Monitor SuperBox infrastructure health, performance, and costs with AWS CloudWatch. Set up dashboards, alarms, and log insights for proactive issue detection.

CloudWatch Dashboard

Create a comprehensive dashboard to visualize all metrics:
1. Access CloudWatch Console

AWS Console → CloudWatch → Dashboards → Create dashboard

2. Add Widgets

Create widgets for Lambda, S3, and application metrics.

3. Configure Refresh

Set auto-refresh to 1 minute for near-real-time monitoring.

Invocations
  • Metric: AWS/Lambda → Invocations
  • Statistic: Sum
  • Period: 1 minute
  • Chart type: Line
Errors
  • Metric: AWS/Lambda → Errors
  • Statistic: Sum
  • Period: 1 minute
  • Chart type: Stacked area (with Invocations)
Duration
  • Metric: AWS/Lambda → Duration
  • Statistics: Average, Maximum, p99
  • Period: 1 minute
  • Chart type: Line
Throttles
  • Metric: AWS/Lambda → Throttles
  • Statistic: Sum
  • Period: 1 minute
  • Chart type: Number
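
If you prefer to keep the dashboard in Terraform alongside the alarms, the widgets above could be expressed roughly as follows. This is a minimal sketch, not the project's actual configuration: the resource name, dashboard name, and grid layout are assumptions, and the region is taken from the SNS ARN used later on this page.

```hcl
# Sketch only: names and layout are illustrative assumptions.
resource "aws_cloudwatch_dashboard" "superbox" {
  dashboard_name = "superbox-monitoring"

  dashboard_body = jsonencode({
    widgets = [
      {
        # Invocations and Errors as a stacked area chart
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title   = "Invocations and Errors"
          region  = "ap-south-1"
          view    = "timeSeries"
          stacked = true
          stat    = "Sum"
          period  = 60
          metrics = [
            ["AWS/Lambda", "Invocations", "FunctionName", "superbox-mcp-executor"],
            ["AWS/Lambda", "Errors", "FunctionName", "superbox-mcp-executor"]
          ]
        }
      },
      {
        # Duration with three statistics on one line chart
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "Duration"
          region = "ap-south-1"
          view   = "timeSeries"
          period = 60
          metrics = [
            ["AWS/Lambda", "Duration", "FunctionName", "superbox-mcp-executor", { stat = "Average" }],
            ["AWS/Lambda", "Duration", "FunctionName", "superbox-mcp-executor", { stat = "Maximum" }],
            ["AWS/Lambda", "Duration", "FunctionName", "superbox-mcp-executor", { stat = "p99" }]
          ]
        }
      },
      {
        # Throttles as a single number
        type   = "metric"
        x      = 0
        y      = 6
        width  = 6
        height = 3
        properties = {
          title  = "Throttles"
          region = "ap-south-1"
          view   = "singleValue"
          stat   = "Sum"
          period = 60
          metrics = [
            ["AWS/Lambda", "Throttles", "FunctionName", "superbox-mcp-executor"]
          ]
        }
      }
    ]
  })
}
```

Auto-refresh is a console setting rather than part of the dashboard body, so it still has to be selected in the CloudWatch console.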

CloudWatch Alarms

Set up proactive alerting for critical issues:

Lambda Alarms

resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
  alarm_name          = "superbox-lambda-high-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  period              = 300  # 5 minutes
  statistic           = "Sum"
  threshold           = 10
  alarm_description   = "Lambda error count exceeds 10 per 5-minute period"
  
  dimensions = {
    FunctionName = "superbox-mcp-executor"
  }
}
Triggers when: More than 10 errors in each of 2 consecutive 5-minute periods
resource "aws_cloudwatch_metric_alarm" "lambda_duration" {
  alarm_name          = "superbox-lambda-slow-execution"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "Duration"
  namespace           = "AWS/Lambda"
  period              = 300
  statistic           = "Average"
  threshold           = 30000  # 30 seconds
  alarm_description   = "Lambda execution time is high"
  
  dimensions = {
    FunctionName = "superbox-mcp-executor"
  }
}
Triggers when: Average duration exceeds 30 seconds for 2 consecutive 5-minute periods
resource "aws_cloudwatch_metric_alarm" "lambda_throttles" {
  alarm_name          = "superbox-lambda-throttled"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "Throttles"
  namespace           = "AWS/Lambda"
  period              = 300
  statistic           = "Sum"
  threshold           = 5
  alarm_description   = "Lambda function is being throttled"
  
  dimensions = {
    FunctionName = "superbox-mcp-executor"
  }
}
Triggers when: More than 5 throttles in 5 minutes
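
The error alarm above fires on an absolute count, while the monitoring checklist at the end of this page targets an error rate below 1%. A rate-based alarm can be sketched with metric math, dividing Errors by Invocations. The alarm name and query IDs here are assumptions chosen for illustration, and the 1% threshold is simply borrowed from the checklist.

```hcl
# Sketch only: alarm name and query IDs are illustrative assumptions.
resource "aws_cloudwatch_metric_alarm" "lambda_error_rate" {
  alarm_name          = "superbox-lambda-high-error-rate"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  threshold           = 1 # percent, taken from the daily-check target
  alarm_description   = "Lambda error rate exceeds 1% of invocations"
  treat_missing_data  = "notBreaching" # idle periods produce no data; do not alarm on them

  # Metric math: errors as a percentage of invocations
  metric_query {
    id          = "error_rate"
    expression  = "100 * errors / invocations"
    label       = "Error rate (%)"
    return_data = true
  }

  metric_query {
    id = "errors"

    metric {
      metric_name = "Errors"
      namespace   = "AWS/Lambda"
      period      = 300
      stat        = "Sum"

      dimensions = {
        FunctionName = "superbox-mcp-executor"
      }
    }
  }

  metric_query {
    id = "invocations"

    metric {
      metric_name = "Invocations"
      namespace   = "AWS/Lambda"
      period      = 300
      stat        = "Sum"

      dimensions = {
        FunctionName = "superbox-mcp-executor"
      }
    }
  }
}
```

A rate is usually a better signal than a raw count once traffic grows, since 10 errors out of 100,000 invocations and 10 errors out of 20 are very different situations.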

SNS Notification Setup

Configure email/SMS alerts:
# Create SNS topic
aws sns create-topic --name superbox-alerts

# Subscribe email
aws sns subscribe \
  --topic-arn arn:aws:sns:ap-south-1:123456789012:superbox-alerts \
  --protocol email \
  --notification-endpoint your-email@example.com

# Confirm subscription via email
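Link alarms to the SNS topic by adding these arguments to each alarm resource (this assumes the topic is also available in Terraform as aws_sns_topic.alerts):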
alarm_actions = [aws_sns_topic.alerts.arn]
ok_actions    = [aws_sns_topic.alerts.arn]
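
The alarm_actions reference above needs a matching aws_sns_topic resource in the same Terraform configuration. If the topic is managed in Terraform rather than created with the CLI, a minimal sketch might look like this. The subscription resource name is an assumption, and the email address is the same placeholder used in the CLI example.

```hcl
# Sketch only: equivalent of the CLI commands above, managed in Terraform.
resource "aws_sns_topic" "alerts" {
  name = "superbox-alerts"
}

resource "aws_sns_topic_subscription" "alerts_email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "your-email@example.com" # placeholder address
}
```

Email subscriptions stay pending until the recipient confirms them, so the confirmation step still applies.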

CloudWatch Logs Insights

Query and analyze Lambda execution logs:

Query Examples

fields @timestamp, @message
| filter @message like /Error:/
| sort @timestamp desc
| limit 100
Shows recent errors with timestamps

Saved Queries

Save frequently used queries for quick access:
  1. Daily Execution Summary
  2. Failed Server Executions
  3. Memory Usage Patterns
  4. Cold Start Analysis
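
Saved queries can also be kept in version control. The sketch below stores the cold start query from the Performance Metrics section as a query definition. The resource name is an assumption, and the log group name assumes Lambda's default /aws/lambda/<function-name> naming for the superbox-mcp-executor function.

```hcl
# Sketch only: assumes the default Lambda log group name.
resource "aws_cloudwatch_query_definition" "cold_start_analysis" {
  name            = "Cold Start Analysis"
  log_group_names = ["/aws/lambda/superbox-mcp-executor"]

  query_string = <<-EOT
    fields @timestamp, @initDuration
    | filter @type = "REPORT"
    | filter @initDuration > 0
    | stats avg(@initDuration) as avg_cold_start,
            max(@initDuration) as max_cold_start,
            count() as cold_starts
  EOT
}
```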

Performance Metrics

Lambda Performance

Cold Start Detection

fields @timestamp, @initDuration
| filter @type = "REPORT"
| filter @initDuration > 0
| stats avg(@initDuration) as avg_cold_start,
        max(@initDuration) as max_cold_start,
        count() as cold_starts
Analyze cold start frequency and duration

Memory Utilization

fields @timestamp, @maxMemoryUsed, @memorySize
| filter @type = "REPORT"
| stats avg(@maxMemoryUsed / @memorySize * 100) as avg_memory_pct,
        max(@maxMemoryUsed) as peak_memory
Monitor memory efficiency

Concurrent Executions

fields @timestamp
| filter @message like /START/
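| stats count() as invocations_started by bin(5m)
Counts invocations started per 5-minute window as a rough proxy for load. This is not true concurrency; for that, graph the AWS/Lambda ConcurrentExecutions metric.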

Error Rate Trend

fields @timestamp
| filter @message like /Error/
| stats count() as errors by bin(1h)
| sort @timestamp desc
Hourly count of log lines containing "Error"
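
Logs Insights queries run on demand. To alarm or chart on the same signal continuously, the log pattern can be turned into a custom metric with a metric filter. This is a sketch under assumptions: the filter name, metric name, and SuperBox/Application namespace are hypothetical, and the log group again assumes Lambda's default naming.

```hcl
# Sketch only: metric name and namespace are hypothetical.
resource "aws_cloudwatch_log_metric_filter" "error_lines" {
  name           = "superbox-error-lines"
  log_group_name = "/aws/lambda/superbox-mcp-executor"
  pattern        = "Error" # case-sensitive term match, mirroring the query above

  metric_transformation {
    name          = "ErrorLogLines"
    namespace     = "SuperBox/Application"
    value         = "1" # each matching log event adds 1
    default_value = "0" # emit 0 when nothing matches
  }
}
```

The resulting metric can then be used in an aws_cloudwatch_metric_alarm or a dashboard widget like any built-in metric.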

Cost Monitoring

Track infrastructure costs:

Cost Explorer Filters

1. Access Cost Explorer

AWS Console → Cost Management → Cost Explorer

2. Filter by Service

  • Service: AWS Lambda
  • Service: Amazon S3
  • Service: CloudWatch
  • Tag: project:superbox

3. Create Budget Alert

Set a budget alert at $50/month with email notification.
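
The budget alert can be defined in Terraform as well. The following is a minimal sketch: the resource and budget names are assumptions, the email address is a placeholder, and the alert is set to fire when actual spend reaches 100% of the $50 limit.

```hcl
# Sketch only: budget name and subscriber address are placeholders.
resource "aws_budgets_budget" "superbox_monthly" {
  name         = "superbox-monthly"
  budget_type  = "COST"
  limit_amount = "50"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100 # percent of the monthly limit
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["your-email@example.com"]
  }
}
```

Adding a second notification block at a lower percentage gives earlier warning before the limit is actually crossed.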

Cost Optimization Queries

-- Lambda cost breakdown by function over the last 30 days
-- (Athena/Trino date syntax; cost_usage stands in for your Cost and Usage Report table)
SELECT
  line_item_resource_id,
  SUM(line_item_unblended_cost) AS cost
FROM cost_usage
WHERE
  product_product_name = 'AWS Lambda'
  AND line_item_usage_start_date >= current_date - interval '30' day
GROUP BY line_item_resource_id
ORDER BY cost DESC

X-Ray Tracing (Optional)

Enable AWS X-Ray for detailed request tracing:

Enable X-Ray

resource "aws_lambda_function" "mcp_executor" {
  # ... other config

  tracing_config {
    mode = "Active"
  }
}
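
Active tracing only produces traces if the function's execution role is allowed to send data to X-Ray. One way to grant that is to attach the AWS managed policy AWSXRayDaemonWriteAccess. In this sketch, aws_iam_role.lambda_exec is a hypothetical name for the function's execution role; substitute whatever role the function actually uses.

```hcl
# Sketch only: aws_iam_role.lambda_exec is a hypothetical role resource.
resource "aws_iam_role_policy_attachment" "lambda_xray" {
  role       = aws_iam_role.lambda_exec.name
  policy_arn = "arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess"
}
```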

Benefits

  • End-to-end request visualization
  • Identify bottlenecks in execution flow
  • Trace external API calls
  • Analyze Lambda initialization time

Grafana Integration (Optional)

For advanced visualization, integrate CloudWatch with Grafana:
1. Install Grafana

# Docker
docker run -d -p 3000:3000 grafana/grafana

2. Add CloudWatch Data Source

  • Go to Configuration → Data Sources
  • Add AWS CloudWatch
  • Configure IAM credentials

3. Import Dashboard

Use a pre-built Lambda monitoring dashboard from the Grafana marketplace.
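
The IAM credentials given to Grafana only need read access to metrics and logs. A read-only policy could be sketched as below. The policy name is an assumption, and the action list is a reasonable starting set rather than an authoritative minimum; check the Grafana CloudWatch data source documentation for the current requirements.

```hcl
# Sketch only: a read-only starting point, not a verified minimum.
resource "aws_iam_policy" "grafana_cloudwatch_read" {
  name = "superbox-grafana-cloudwatch-read"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Read metrics and alarm state for dashboard panels
        Effect = "Allow"
        Action = [
          "cloudwatch:ListMetrics",
          "cloudwatch:GetMetricData",
          "cloudwatch:GetMetricStatistics",
          "cloudwatch:DescribeAlarmsForMetric"
        ]
        Resource = "*"
      },
      {
        # Run Logs Insights queries from Grafana
        Effect = "Allow"
        Action = [
          "logs:DescribeLogGroups",
          "logs:GetLogGroupFields",
          "logs:StartQuery",
          "logs:StopQuery",
          "logs:GetQueryResults",
          "logs:GetLogEvents"
        ]
        Resource = "*"
      }
    ]
  })
}
```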

Anomaly Detection

Enable CloudWatch Anomaly Detection:
resource "aws_cloudwatch_metric_alarm" "lambda_anomaly" {
  alarm_name                = "superbox-lambda-anomaly"
  comparison_operator       = "LessThanLowerOrGreaterThanUpperThreshold"
  evaluation_periods        = 2
  threshold_metric_id       = "e1"
  alarm_description         = "Anomaly detected in Lambda invocations"

  metric_query {
    id          = "e1"
    expression  = "ANOMALY_DETECTION_BAND(m1)"
    label       = "Invocations (Expected)"
    return_data = true
  }

  metric_query {
    id          = "m1"
    return_data = true # the band and the underlying metric must both return data

    metric {
      metric_name = "Invocations"
      namespace   = "AWS/Lambda"
      period      = 300
      stat        = "Sum"

      dimensions = {
        FunctionName = "superbox-mcp-executor"
      }
    }
  }
}

Monitoring Checklist

1. Daily Checks

  • Review error rate (< 1%)
  • Check average duration (< 30s)
  • Verify no throttling events
  • Monitor S3 bucket size growth

2. Weekly Reviews

  • Analyze cost trends
  • Review top error patterns
  • Check cold start frequency
  • Optimize memory allocation

3. Monthly Audits

  • Review CloudWatch log retention
  • Analyze popular MCP servers
  • Update alarm thresholds
  • Clean up old S3 objects
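
Two of the monthly audit items, log retention and S3 cleanup, can be automated so the audit becomes a verification step rather than manual work. The sketch below rests on several assumptions: the log group uses Lambda's default naming, superbox-artifacts is a hypothetical bucket name, and the 30-day and 90-day values are placeholders to adjust to your own requirements.

```hcl
# Sketch only: bucket name and retention periods are assumptions.

# Expire Lambda logs automatically instead of keeping them forever.
# If Lambda already created this log group, import it into Terraform first.
resource "aws_cloudwatch_log_group" "mcp_executor" {
  name              = "/aws/lambda/superbox-mcp-executor"
  retention_in_days = 30
}

# Delete objects after 90 days across the whole bucket.
resource "aws_s3_bucket_lifecycle_configuration" "cleanup" {
  bucket = "superbox-artifacts" # hypothetical bucket name

  rule {
    id     = "expire-old-objects"
    status = "Enabled"

    filter {} # empty filter applies the rule to every object

    expiration {
      days = 90
    }
  }
}
```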
