Comprehensive troubleshooting guide for common SuperBox infrastructure issues, deployment errors, and runtime problems.

Deployment Issues

Terraform/OpenTofu Errors

Symptom:
Error: error configuring S3 Backend: no valid credential sources
Causes:
  • AWS credentials not configured
  • Incorrect IAM permissions
  • Profile not found
Solutions:
1

Verify credentials

aws sts get-caller-identity
Should return your AWS account ID
2

Check IAM permissions

Ensure your user/role has:
  • s3:* on target buckets
  • lambda:* for Lambda functions
  • iam:CreateRole, iam:AttachRolePolicy
  • logs:CreateLogGroup
3

Re-configure credentials

aws configure --profile superbox
# Enter Access Key ID
# Enter Secret Access Key
# Region: ap-south-1
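To rule out the "Profile not found" cause before re-running `aws configure`, a small check of the AWS CLI files can help. This is a minimal sketch, assuming the default file locations (`~/.aws/credentials` and `~/.aws/config`) and no override through environment variables; `profile_exists` is a hypothetical helper name, not part of any SuperBox or AWS tooling.

```python
import configparser


def profile_exists(profile, credentials_file, config_file):
    """Return True if `profile` is defined in either AWS CLI file."""
    creds = configparser.ConfigParser(interpolation=None)
    creds.read(credentials_file)  # missing files are silently skipped
    conf = configparser.ConfigParser(interpolation=None)
    conf.read(config_file)
    # The credentials file names sections [name]; the config file uses
    # [profile name], except for the default profile, which is [default].
    config_section = "default" if profile == "default" else f"profile {profile}"
    return creds.has_section(profile) or conf.has_section(config_section)
```

For example, `profile_exists("superbox", Path.home() / ".aws" / "credentials", Path.home() / ".aws" / "config")` (with `from pathlib import Path`) reports whether the profile created in step 3 is visible to the CLI.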
Symptom:
Error acquiring the state lock
Lock Info: ID: xxx-xxx-xxx
Causes:
  • Previous deployment interrupted
  • Multiple deployments running simultaneously
Solutions:
# Force unlock (use with caution)
tofu force-unlock <LOCK_ID>

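# Or remove a stale lock object from the S3 backend
# (the object name depends on your backend key and locking setup; list the bucket first to confirm it)
aws s3 rm s3://superbox-terraform-state/.terraform.tfstate.lock.info
Only force-unlock if you’re certain no other deployment is running.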
Symptom:
Error: InvalidParameterValueException: Unzipped size must be smaller than 262144000 bytes
Causes:
  • Lambda deployment package > 250 MB uncompressed
  • Too many dependencies
Solutions:
1

Check package size

# Length is the compressed .zip size in bytes; the 250 MB limit applies to the unzipped contents
Get-Item -Path "SuperBox-Infra/lambda_package.zip" | Select-Object Length
2

Use Lambda Layers

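Move large dependencies to layers. The 250 MB unzipped quota covers the function and all of its layers combined, so this only helps if the total stays under the limit: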
resource "aws_lambda_layer_version" "dependencies" {
  filename   = "layer.zip"
  layer_name = "superbox-deps"
  
  compatible_runtimes = ["python3.11"]
}
3

Exclude unnecessary files

Update packaging script:
exclude_patterns = [
    "**/__pycache__",
    "**/*.pyc",
    "**/tests",
    "**/docs"
]
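The package-size check in step 1 reports the compressed archive size, while the error refers to the unzipped size. The sketch below, using only the Python standard library, sums the uncompressed size of each archive entry and lists the largest ones. The limit constant is taken from the error message; the function names are illustrative.

```python
import zipfile

LAMBDA_UNZIPPED_LIMIT = 262_144_000  # bytes, as quoted in the error message


def unzipped_size(zip_path):
    """Sum the uncompressed size of every entry in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def largest_entries(zip_path, count=10):
    """Return (filename, uncompressed_bytes) for the biggest entries."""
    with zipfile.ZipFile(zip_path) as archive:
        entries = sorted(
            archive.infolist(), key=lambda info: info.file_size, reverse=True
        )
    return [(info.filename, info.file_size) for info in entries[:count]]
```

Calling `unzipped_size("SuperBox-Infra/lambda_package.zip")` and comparing the result with `LAMBDA_UNZIPPED_LIMIT` shows how far over the limit the package is, and `largest_entries` points at the dependencies worth excluding first.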
Symptom:
Error: BucketAlreadyExists: The requested bucket name is not available
Causes:
  • S3 bucket names must be globally unique
  • Resource created in previous deployment
Solutions:
# Add unique suffix to bucket name
resource "aws_s3_bucket" "registry" {
  bucket = "superbox-mcp-registry-${random_id.bucket_suffix.hex}"
}

resource "random_id" "bucket_suffix" {
  byte_length = 4
}

# Or import existing resource
tofu import aws_s3_bucket.registry superbox-mcp-registry

Lambda Runtime Errors

Execution Failures

Error:
ModuleNotFoundError: No module named 'xyz'
Causes:
  • Dependency missing from requirements.txt
  • Package not included in Lambda package
Fix:
# Add to requirements.txt
echo "xyz==1.2.3" >> superbox.ai/requirements.txt

# Rebuild Lambda package
cd SuperBox-Infra
.\scripts\package_lambda.ps1

# Redeploy
tofu apply

S3 Issues

Bucket Access Errors

Causes:
  • Bucket policy blocking access
  • Missing IAM permissions
  • Public access blocked
Debug:
# Check bucket policy
aws s3api get-bucket-policy --bucket superbox-mcp-registry

# Test access
aws s3 ls s3://superbox-mcp-registry
Fix:
resource "aws_s3_bucket_public_access_block" "registry" {
  bucket = aws_s3_bucket.registry.id
  
  block_public_acls       = false  # Allow public access if needed
  block_public_policy     = false
  ignore_public_acls      = false
  restrict_public_buckets = false
}
Error:
An error occurred (NoSuchKey) when calling the GetObject operation
Causes:
  • File doesn’t exist at specified path
  • Incorrect S3 key format
Debug:
# List all objects
aws s3 ls s3://superbox-mcp-registry --recursive

# Check specific path
aws s3 ls s3://superbox-mcp-registry/servers/my-server/
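S3 treats keys as literal strings, so `/servers/my-server/` and `servers/my-server/` are different keys, and a backslash from a Windows path is an ordinary character rather than a separator. The sketch below shows one way to normalize a path before using it as a key. `normalize_s3_key` is a hypothetical helper, and collapsing repeated slashes assumes the registry never uses empty path segments on purpose.

```python
def normalize_s3_key(path):
    """Convert a local-style relative path into a conventional S3 key."""
    key = path.replace("\\", "/")  # Windows separators are not S3 delimiters
    while "//" in key:
        key = key.replace("//", "/")  # collapse accidental empty segments
    return key.lstrip("/")  # a leading slash would become part of the key
```

For instance, a path built in PowerShell as `\servers\my-server\` normalizes to `servers/my-server/`, which matches the prefix used in the listing command above.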

CLI Issues

Authentication Failures

Error:
Error: Authentication failed. Invalid token.
Fix:
# Re-authenticate
superbox auth login

# Or set token manually
$env:SUPERBOX_TOKEN = "your-token-here"

Connection Refused

Error:
Error: Could not connect to SuperBox backend
Causes:
  • Backend server down
  • Network/firewall blocking connection
  • Incorrect API endpoint
Debug:
# Test connectivity
curl https://api.superbox.ai/health

# Check DNS resolution
nslookup api.superbox.ai

# Verify endpoint in config
superbox config get api_url
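To tell a DNS problem apart from a blocked or refused connection in one step, the two checks can be combined. This is a minimal standard-library sketch; `check_endpoint` is an illustrative name, and port 443 is assumed because the health URL above uses HTTPS.

```python
import socket


def check_endpoint(host, port=443, timeout=5.0):
    """Return (dns_ok, tcp_ok) for the given host and port."""
    try:
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return (False, False)  # the name did not resolve
    for family, socktype, proto, _canonname, sockaddr in addresses:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return (True, True)
        except OSError:
            continue  # refused, filtered, or timed out; try the next address
    return (True, False)
```

A result of `(False, False)` from `check_endpoint("api.superbox.ai")` suggests a DNS or endpoint-name issue, while `(True, False)` points toward a firewall, proxy, or a backend that is down.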

Server Execution Errors

MCP Server Failures

Error:
Error: MCP server 'xyz' not found in registry
Fix:
# Search for server
superbox search xyz

# Pull specific version
superbox pull xyz --version 1.2.3

# Or publish if missing
superbox publish ./servers/xyz

CloudWatch Logs Debugging

Log Analysis

1

Access Lambda Logs

aws logs tail /aws/lambda/superbox-mcp-executor --follow
2

Filter Error Logs

fields @timestamp, @message
| filter @message like /Error|Exception|Failed/
| sort @timestamp desc
| limit 100
3

Extract Stack Traces

fields @timestamp, @message
| filter @message like /Traceback/
| display @message

Performance Debugging

Slow Execution

Identify Bottlenecks

fields @timestamp, @duration
| filter @type = "REPORT"
| sort @duration desc
| limit 50
Shows slowest Lambda executions

Cold Start Analysis

fields @timestamp, @initDuration
| filter @type = "REPORT"
| filter @initDuration > 0
| stats count() as cold_starts,
        avg(@initDuration) as avg_init_time
Measure cold start impact
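When log lines have already been saved to a file (for example from `aws logs tail`), the same duration and cold-start figures can be computed offline. The sketch below assumes the usual Lambda `REPORT` line layout with `Duration: ... ms` and an optional `Init Duration: ... ms` field; `summarize_reports` is an illustrative name.

```python
import re
from statistics import mean

# "Duration" must not be the tail of "Billed Duration" or "Init Duration".
DURATION = re.compile(r"(?<!Billed )(?<!Init )Duration: ([\d.]+) ms")
INIT_DURATION = re.compile(r"Init Duration: ([\d.]+) ms")


def summarize_reports(lines):
    """Aggregate duration and cold-start figures from Lambda REPORT lines."""
    durations, inits = [], []
    for line in lines:
        if "REPORT RequestId:" not in line:
            continue  # skip START, END, and application log lines
        match = DURATION.search(line)
        if match:
            durations.append(float(match.group(1)))
        match = INIT_DURATION.search(line)
        if match:  # only present on cold starts
            inits.append(float(match.group(1)))
    return {
        "invocations": len(durations),
        "max_duration_ms": max(durations, default=0.0),
        "cold_starts": len(inits),
        "avg_init_ms": mean(inits) if inits else 0.0,
    }
```

Passing an open file object works directly, since the function only iterates over lines.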

Memory Profiling

import tracemalloc

tracemalloc.start()
# ... your code ...
current, peak = tracemalloc.get_traced_memory()
print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")

External API Latency

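import time

import requests  # third-party: pip install requests

api_url = "https://api.superbox.ai/health"  # example endpoint; replace with the API you call

start = time.perf_counter()  # monotonic clock, suited to measuring intervals
response = requests.get(api_url, timeout=10)  # a timeout avoids waiting until the Lambda itself times out
duration = time.perf_counter() - start
print(f"API call took {duration:.3f}s (status {response.status_code})")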

Networking Issues

VPC Configuration (if using)

Symptom: External API calls time out
Cause: Lambda in private subnet without NAT Gateway
Fix:
resource "aws_lambda_function" "mcp_executor" {
  # Option 1: Remove VPC config (not recommended for production)
  # vpc_config = {}
  
  # Option 2: Use subnet with NAT Gateway
  vpc_config {
    subnet_ids         = [aws_subnet.private_with_nat.id]
    security_group_ids = [aws_security_group.lambda.id]
  }
}
Debug:
# Check security group rules
aws ec2 describe-security-groups --group-ids sg-xxxxx
Fix:
resource "aws_security_group_rule" "lambda_outbound" {
  type              = "egress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.lambda.id
}

Common Error Messages

| Error | Meaning | Solution |
| --- | --- | --- |
| ResourceNotFoundException | Resource (Lambda/S3) doesn’t exist | Check resource names, verify deployment |
| ThrottlingException | Too many requests | Implement exponential backoff |
| InvalidParameterValueException | Invalid function parameter | Validate input parameters |
| ServiceException | AWS service error | Check AWS status page, retry |
| KMSAccessDeniedException | Cannot decrypt environment variables | Update KMS key policy |
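For the `ThrottlingException` entry, exponential backoff means doubling the wait after each failed attempt, usually with random jitter so that concurrent clients do not retry in lockstep. The sketch below is one possible shape, not SuperBox's implementation: `call_with_backoff` and `is_throttled` are hypothetical names, and the delays are placeholder values. AWS SDKs already retry throttled calls on their own, so a wrapper like this is mainly useful around code that sits outside the SDK's retry handling.

```python
import random
import time


def call_with_backoff(func, is_throttled, max_attempts=5,
                      base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff while it is throttled."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            # Re-raise anything that is not throttling, or the final failure.
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # "full jitter" variant
```

The `is_throttled` callback keeps the retry decision explicit, for example `lambda exc: "ThrottlingException" in str(exc)`, and the `sleep` parameter exists so the waiting can be replaced in tests.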

Debug Mode

Enable detailed logging:
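# In lambda.py
import json
import logging
import os

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)  # Change from INFO to DEBUG

# Add verbose logging (inside the handler, where `event` is defined)
logger.debug(f"Received event: {json.dumps(event)}")
# Caution: the Lambda environment includes AWS credentials; avoid leaving this enabled
logger.debug(f"Environment: {os.environ}")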
Update Lambda:
cd SuperBox-Infra
.\scripts\package_lambda.ps1
tofu apply
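Because the Lambda environment contains values such as `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN`, logging `os.environ` verbatim writes credentials to CloudWatch. One option is to mask likely secrets before logging. The sketch below is a simple name-based filter; the marker list and the `redacted_environment` name are assumptions to adapt, not an exhaustive safeguard.

```python
import os

# Substrings that usually indicate a sensitive variable name (an assumption).
SENSITIVE_MARKERS = ("SECRET", "TOKEN", "PASSWORD", "KEY")


def redacted_environment(environ=None):
    """Return a copy of the environment with likely-secret values masked."""
    source = os.environ if environ is None else environ
    return {
        name: "***"
        if any(marker in name.upper() for marker in SENSITIVE_MARKERS)
        else value
        for name, value in source.items()
    }
```

In the debug snippet above, `logger.debug(f"Environment: {redacted_environment()}")` would then replace the direct `os.environ` call.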

Getting Help

Emergency Rollback

If deployment breaks production:
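# Roll back by re-applying the last known-good configuration
# (-backup only sets where the pre-apply copy of local state is written; it does not restore an earlier state)
tofu apply -auto-approve -backup=terraform.tfstate.backup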

# Or destroy and recreate
tofu destroy -target=aws_lambda_function.mcp_executor
tofu apply
Always test infrastructure changes in a staging environment first.