Comprehensive troubleshooting guide for common SuperBox infrastructure issues, deployment errors, and runtime problems.
Deployment Issues
Access Denied / Invalid Credentials
Symptom: Error: error configuring S3 Backend: no valid credential sources
Causes:
AWS credentials not configured
Incorrect IAM permissions
Profile not found
Solutions:
Verify credentials
aws sts get-caller-identity
Should return your AWS account ID
Check IAM permissions
Ensure your user/role has:
s3:* on target buckets
lambda:* for Lambda functions
iam:CreateRole, iam:AttachRolePolicy
logs:CreateLogGroup
Re-configure credentials
aws configure --profile superbox
# Enter Access Key ID
# Enter Secret Access Key
# Region: ap-south-1
Symptom: Error acquiring the state lock
Lock Info: ID: xxx-xxx-xxx
Causes:
Previous deployment interrupted
Multiple deployments running simultaneously
Solutions:
# Force unlock (use with caution)
tofu force-unlock <LOCK_ID>
# Or remove lock from S3 backend
aws s3 rm s3://superbox-terraform-state/.terraform.tfstate.lock.info
Only force-unlock if you’re certain no other deployment is running.
Symptom: Error: InvalidParameterValueException: Unzipped size must be smaller than 262144000 bytes
Causes:
Lambda deployment package > 250 MB uncompressed
Too many dependencies
Solutions:
Check package size
Get-Item -Path "SuperBox-Infra/lambda_package.zip" | Select-Object Length
Use Lambda Layers
Move large dependencies to layers. Layers count toward the same 250 MB unzipped limit, so trim what goes into them as well:
resource "aws_lambda_layer_version" "dependencies" {
  filename            = "layer.zip"
  layer_name          = "superbox-deps"
  compatible_runtimes = ["python3.11"]
}
Exclude unnecessary files
Update packaging script:
exclude_patterns = [
  "**/__pycache__",
  "**/*.pyc",
  "**/tests",
  "**/docs"
]
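The size check above reports the zipped file size, while the limit in the error message applies to the unzipped contents. A minimal sketch of measuring the unzipped total with Python's standard `zipfile` module follows; the function names are illustrative and not part of the SuperBox tooling.

```python
import io
import zipfile

# Limit quoted in the InvalidParameterValueException message.
LAMBDA_UNZIPPED_LIMIT_BYTES = 262_144_000


def unzipped_size(zip_source) -> int:
    """Return the total uncompressed size, in bytes, of every entry in the archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return sum(info.file_size for info in archive.infolist())


def largest_entries(zip_source, count: int = 5):
    """Return the `count` largest entries as (name, uncompressed_bytes) pairs."""
    with zipfile.ZipFile(zip_source) as archive:
        entries = [(info.filename, info.file_size) for info in archive.infolist()]
    return sorted(entries, key=lambda item: item[1], reverse=True)[:count]


if __name__ == "__main__":
    # Build a small in-memory archive so the sketch runs without a real package.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("handler.py", "print('hello')\n")
        archive.writestr("vendor/big.bin", b"\0" * 100_000)

    total = unzipped_size(buffer)
    print(f"Unzipped size: {total} bytes")
    print(f"Within limit: {total < LAMBDA_UNZIPPED_LIMIT_BYTES}")
    for name, size in largest_entries(buffer):
        print(f"{size:>10}  {name}")
```

Pointing `unzipped_size` at the real package path instead of the in-memory buffer shows how close the package is to the limit, and `largest_entries` indicates which dependencies are worth excluding first.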
Symptom: Error: BucketAlreadyExists: The requested bucket name is not available
Causes:
S3 bucket names must be globally unique
Resource created in previous deployment
Solutions:
# Add unique suffix to bucket name
resource "aws_s3_bucket" "registry" {
  bucket = "superbox-mcp-registry-${random_id.bucket_suffix.hex}"
}
resource "random_id" "bucket_suffix" {
  byte_length = 4
}
# Or import existing resource
tofu import aws_s3_bucket.registry superbox-mcp-registry
Lambda Runtime Errors
Execution Failures
Module Import Error
Timeout Error
Memory Exceeded
Permission Denied
Error: ModuleNotFoundError: No module named 'xyz'
Causes:
Dependency missing from requirements.txt
Package not included in Lambda package
Fix:
# Add to requirements.txt
echo "xyz==1.2.3" >> superbox.ai/requirements.txt
# Rebuild Lambda package
cd SuperBox-Infra
.\scripts\package_lambda.ps1
# Redeploy
tofu apply
Error: Task timed out after 30.00 seconds
Causes:
Long-running MCP server execution
External API calls taking too long
Insufficient timeout setting
Fix:
resource "aws_lambda_function" "mcp_executor" {
  timeout = 300 # Increase to 5 minutes
  # Or optimize code
  environment {
    variables = {
      EXECUTION_TIMEOUT = "60"
    }
  }
}
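On the code side, a handler can stop work before Lambda kills it by combining the `EXECUTION_TIMEOUT` variable with the remaining time reported by the Lambda context object. The sketch below assumes `EXECUTION_TIMEOUT` is read as seconds. `time_budget_seconds`, `run_with_deadline`, and `FakeContext` are hypothetical names used for illustration, and only `get_remaining_time_in_millis()` comes from the Lambda Python runtime.

```python
import os
import time


def time_budget_seconds(context, safety_margin: float = 5.0) -> float:
    """Seconds the handler may spend on work before it should stop.

    Takes the smaller of EXECUTION_TIMEOUT (assumed to be seconds) and the
    time Lambda reports as remaining, minus a margin for cleanup.
    """
    remaining = context.get_remaining_time_in_millis() / 1000.0
    configured = float(os.environ.get("EXECUTION_TIMEOUT", "60"))
    return max(0.0, min(configured, remaining - safety_margin))


def run_with_deadline(steps, budget_seconds: float):
    """Run callables in order; return (results, finished).

    `finished` is False when the budget ran out before every step started.
    A step that is already running is not interrupted.
    """
    deadline = time.monotonic() + budget_seconds
    results = []
    for step in steps:
        if time.monotonic() >= deadline:
            return results, False
        results.append(step())
    return results, True


class FakeContext:
    """Stand-in for the Lambda context object, for local runs only."""

    def __init__(self, remaining_ms: int):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self) -> int:
        return self._remaining_ms


if __name__ == "__main__":
    budget = time_budget_seconds(FakeContext(30_000))
    results, finished = run_with_deadline([lambda: "step 1", lambda: "step 2"], budget)
    print(budget, results, finished)
```

Returning a partial result with `finished` set to False lets the caller report a clean timeout instead of the generic "Task timed out" message.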
Error: Runtime exited with error: signal: killed
Runtime.ExitError
Causes:
Lambda running out of memory (default 128 MB)
Fix:
resource "aws_lambda_function" "mcp_executor" {
  memory_size = 512 # Increase from 128 MB
}
Check CloudWatch Logs for memory usage:
fields @timestamp, @maxMemoryUsed, @memorySize
| filter @type = "REPORT"
| sort @timestamp desc
| limit 20
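The same figures appear in the plain-text REPORT line that Lambda writes at the end of each invocation, which helps when reading raw log output instead of running a query. A minimal sketch of extracting them is below. The helper names and the 90% threshold are assumptions chosen for illustration, and the sample line uses a made-up request ID.

```python
import re

# Matches the memory fields of a Lambda REPORT log line.
REPORT_PATTERN = re.compile(
    r"Memory Size: (?P<size>\d+) MB\s+Max Memory Used: (?P<used>\d+) MB"
)


def memory_headroom(report_line: str):
    """Return memory figures from a REPORT line, or None if the line has none."""
    match = REPORT_PATTERN.search(report_line)
    if match is None:
        return None
    size = int(match["size"])
    used = int(match["used"])
    return {"memory_size_mb": size, "max_used_mb": used, "used_fraction": used / size}


def needs_more_memory(report_line: str, threshold: float = 0.9) -> bool:
    """True when peak usage reached `threshold` of the configured memory."""
    stats = memory_headroom(report_line)
    return stats is not None and stats["used_fraction"] >= threshold


if __name__ == "__main__":
    # Sample line with a placeholder request ID.
    sample = (
        "REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
        "Duration: 1523.45 ms\tBilled Duration: 1524 ms\t"
        "Memory Size: 128 MB\tMax Memory Used: 127 MB"
    )
    print(memory_headroom(sample))
    print(needs_more_memory(sample))
```

An invocation that peaks close to its configured size is a likely candidate for the `signal: killed` failure on a slightly larger input, so raising `memory_size` before that happens is usually cheaper than debugging it afterwards.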
Error: AccessDeniedException: User is not authorized to perform: s3:GetObject
Causes:
Lambda IAM role missing S3 permissions
Fix:
resource "aws_iam_role_policy" "lambda_s3_access" {
  role = aws_iam_role.lambda_exec.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ]
      Resource = [
        "${aws_s3_bucket.registry.arn}",
        "${aws_s3_bucket.registry.arn}/*"
      ]
    }]
  })
}
S3 Issues
Bucket Access Errors
Causes:
Bucket policy blocking access
Missing IAM permissions
Public access blocked
Debug:
# Check bucket policy
aws s3api get-bucket-policy --bucket superbox-mcp-registry
# Test access
aws s3 ls s3://superbox-mcp-registry
Fix:
resource "aws_s3_bucket_public_access_block" "registry" {
  bucket                  = aws_s3_bucket.registry.id
  block_public_acls       = false # Allow public access if needed
  block_public_policy     = false
  ignore_public_acls      = false
  restrict_public_buckets = false
}
Error: An error occurred (NoSuchKey) when calling the GetObject operation
Causes:
File doesn’t exist at specified path
Incorrect S3 key format
Debug:
# List all objects
aws s3 ls s3://superbox-mcp-registry --recursive
# Check specific path
aws s3 ls s3://superbox-mcp-registry/servers/my-server/
CLI Issues
Authentication Failures
Error: Error: Authentication failed. Invalid token.
Fix:
# Re-authenticate
superbox auth login
# Or set token manually
$env:SUPERBOX_TOKEN = "your-token-here"
Connection Refused
Error: Error: Could not connect to SuperBox backend
Causes:
Backend server down
Network/firewall blocking connection
Incorrect API endpoint
Debug:
# Test connectivity
curl https://api.superbox.ai/health
# Check DNS resolution
nslookup api.superbox.ai
# Verify endpoint in config
superbox config get api_url
Server Execution Errors
MCP Server Failures
Error: Error: MCP server 'xyz' not found in registry
Fix:
# Search for server
superbox search xyz
# Pull specific version
superbox pull xyz --version 1.2.3
# Or publish if missing
superbox publish ./servers/xyz
Error: Error: Dependency conflict - package 'abc' requires version 2.x but 1.x is installed
Fix:
# Update server manifest (mcp.yaml)
dependencies:
  abc: ">=2.0.0,<3.0.0"
# Or create isolated environment
python -m venv .venv-xyz
.\.venv-xyz\Scripts\Activate.ps1
pip install -r requirements.txt
Error: SecurityError: Attempted to access prohibited resource
Causes:
MCP server trying to access network/filesystem beyond allowed scope
Fix:
# Update sandbox permissions in mcp.yaml
sandbox:
  network: true # Allow network access
  filesystem:
    - "/allowed/path"
    - "/another/path"
CloudWatch Logs Debugging
Log Analysis
Access Lambda Logs
aws logs tail /aws/lambda/superbox-mcp-executor --follow
Filter Error Logs
fields @timestamp, @message
| filter @message like /Error|Exception|Failed/
| sort @timestamp desc
| limit 100
Extract Stack Traces
fields @timestamp, @message
| filter @message like /Traceback/
| display @message
Slow Execution
Identify Bottlenecks
fields @timestamp, @duration
| filter @type = "REPORT"
| sort @duration desc
| limit 50
Shows the slowest Lambda executions.
Cold Start Analysis
fields @timestamp, @initDuration
| filter @type = "REPORT"
| filter @initDuration > 0
| stats count() as cold_starts,
    avg(@initDuration) as avg_init_time
Measures cold start impact.
Memory Profiling
import tracemalloc

tracemalloc.start()
# ... your code ...
current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
print(f"Peak memory: {peak / 1024 / 1024:.2f} MB")
External API Latency
import time

import requests

start = time.perf_counter()  # monotonic clock, suited to measuring intervals
response = requests.get(api_url, timeout=10)  # api_url: the endpoint under test
duration = time.perf_counter() - start
print(f"API call took {duration:.2f}s")
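When several external calls need timing, the start/stop pattern above can be wrapped once and reused. The following is a sketch using only the standard library. The name `timed` and its `sink` parameter are illustrative choices, not part of any SuperBox API.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, sink=print):
    """Report how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        sink(f"{label} took {elapsed:.3f}s")


if __name__ == "__main__":
    # time.sleep stands in for a slow external call.
    with timed("simulated API call"):
        time.sleep(0.05)
```

Passing `sink=logger.info` sends the timings to the Lambda log stream, where they can be filtered alongside the REPORT lines.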
Networking Issues
VPC Configuration (if using)
Lambda Cannot Reach Internet
Symptom: External API calls time out
Cause: Lambda in private subnet without NAT Gateway
Fix:
resource "aws_lambda_function" "mcp_executor" {
  # Option 1: Remove VPC config (not recommended for production)
  # vpc_config = {}
  # Option 2: Use subnet with NAT Gateway
  vpc_config {
    subnet_ids         = [aws_subnet.private_with_nat.id]
    security_group_ids = [aws_security_group.lambda.id]
  }
}
Security Group Blocking Traffic
Debug:
# Check security group rules
aws ec2 describe-security-groups --group-ids sg-xxxxx
Fix:
resource "aws_security_group_rule" "lambda_outbound" {
  type              = "egress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.lambda.id
}
Common Error Messages
| Error | Meaning | Solution |
| --- | --- | --- |
| ResourceNotFoundException | Resource (Lambda/S3) doesn’t exist | Check resource names, verify deployment |
| ThrottlingException | Too many requests | Implement exponential backoff |
| InvalidParameterValueException | Invalid function parameter | Validate input parameters |
| ServiceException | AWS service error | Check AWS status page, retry |
| KMSAccessDeniedException | Cannot decrypt environment variables | Update KMS key policy |
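The exponential backoff suggested for ThrottlingException can be sketched as a small retry wrapper. This version uses full jitter (a random wait up to a capped, doubling delay). `ThrottledError` and `call_with_backoff` are hypothetical names for illustration, and real code would pass the throttling exception raised by whichever client it uses through `retry_on`.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error raised by a client call."""


def call_with_backoff(
    operation,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep=time.sleep,
    retry_on=(ThrottledError,),
):
    """Call `operation`, retrying on throttling with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky():
        # Fails twice, then succeeds, to exercise the retry path.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ThrottledError("rate exceeded")
        return "ok"

    print(call_with_backoff(flaky, base_delay=0.01))
```

The jitter keeps many clients that were throttled at the same moment from retrying in lockstep, and the `max_delay` cap bounds the worst-case wait within a Lambda timeout.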
Debug Mode
Enable detailed logging:
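# In lambda.py
import json
import logging
import os

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)  # Change from INFO to DEBUG
# Add verbose logging (inside the handler, where `event` is defined)
logger.debug(f"Received event: {json.dumps(event)}")
# Caution: the environment includes AWS credentials; keep this to short debugging sessions
logger.debug(f"Environment: {dict(os.environ)}")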
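Logging the full environment, as in the snippet above, also writes the credentials that Lambda injects as environment variables (such as `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN`) to CloudWatch. One way to keep the debug output while masking those values is sketched below. The marker list and the function name are assumptions and should be adjusted to the variables actually in use.

```python
import os

# Substrings that mark a variable name as sensitive (an assumed, adjustable list).
SENSITIVE_MARKERS = ("SECRET", "TOKEN", "PASSWORD", "KEY")


def redacted_environment(environ=None, markers=SENSITIVE_MARKERS) -> dict:
    """Return a copy of the environment with sensitive values masked."""
    source = os.environ if environ is None else environ
    return {
        name: "***" if any(marker in name.upper() for marker in markers) else value
        for name, value in source.items()
    }


if __name__ == "__main__":
    sample = {
        "AWS_REGION": "ap-south-1",
        "AWS_SECRET_ACCESS_KEY": "example-secret",
        "SUPERBOX_TOKEN": "example-token",
    }
    print(redacted_environment(sample))
```

In the handler, `logger.debug(f"Environment: {redacted_environment()}")` then replaces the unmasked call.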
Update Lambda:
cd SuperBox-Infra
.\scripts\package_lambda.ps1
tofu apply
Getting Help
Emergency Rollback
If deployment breaks production:
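# Rollback to previous state
# Note: -backup only sets where the local backend writes its state backup; it does not by itself restore an earlier state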
tofu apply -auto-approve -backup=terraform.tfstate.backup
# Or destroy and recreate
tofu destroy -target=aws_lambda_function.mcp_executor
tofu apply
Always test infrastructure changes in a staging environment first.