AWS CloudWatch: Monitoring & Observability

Amazon CloudWatch is AWS's unified observability platform. It collects metrics, logs, and events from AWS services and your own applications so you can detect anomalies, set alarms, and visualize operational health.

In Azure terms: AWS CloudWatch = Azure Monitor + Log Analytics + Application Insights

| Component | Description |
| --- | --- |
| Metrics | Numeric time-series data (CPU usage, request count, latency) |
| Logs | Log groups and log streams from services and applications |
| Alarms | Trigger actions when metrics cross thresholds |
| Dashboards | Visual panels for metrics and logs |
| Events / EventBridge | React to AWS service events or custom events |
| CloudWatch Logs Insights | Serverless log query engine (comparable to KQL in Log Analytics, but with its own query syntax) |
| Synthetics | Canary scripts that continuously test endpoints |
| ServiceLens | Combined view of traces, metrics, and logs (X-Ray integration) |

Most AWS services publish metrics to CloudWatch automatically, for example:

| Service | Example Metrics |
| --- | --- |
| EC2 | CPUUtilization, NetworkIn, DiskReadOps |
| RDS | DatabaseConnections, FreeStorageSpace, ReadLatency |
| Lambda | Invocations, Errors, Duration, Throttles |
| SQS | NumberOfMessagesSent, ApproximateAgeOfOldestMessage |
| ALB | RequestCount, TargetResponseTime, HTTPCode_Target_5XX_Count |

Custom metrics: you can also publish your own metrics from applications, for example with the AWS CLI:

aws cloudwatch put-metric-data \
  --namespace "MyApp" \
  --metric-data \
  MetricName=PageViews,Value=1234,Unit=Count
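
The same data can be published from application code. The sketch below only assembles the request parameters and makes no AWS call; it assumes the result would be passed to boto3's `put_metric_data`. The helper name `build_metric_request` and the `Environment` dimension are illustrative and not part of the original example.

```python
from datetime import datetime, timezone


def build_metric_request(namespace, name, value, unit="Count", dimensions=None):
    """Build keyword arguments shaped like a CloudWatch PutMetricData request.

    Hypothetical helper: it only assembles a dict and never contacts AWS.
    """
    datum = {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        "Timestamp": datetime.now(timezone.utc),
    }
    if dimensions:
        # Dimensions are name/value pairs that identify one series of the metric.
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ]
    return {"Namespace": namespace, "MetricData": [datum]}


request = build_metric_request(
    "MyApp", "PageViews", 1234, dimensions={"Environment": "prod"}
)
print(request["Namespace"], request["MetricData"][0]["MetricName"])

# Assuming boto3 is installed and credentials are configured, the dict could be
# sent with: boto3.client("cloudwatch").put_metric_data(**request)
```

Keeping the payload construction separate from the network call makes the shape easy to check in unit tests before anything is sent.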

Alarms evaluate a metric against a threshold and can trigger:

  • SNS notification (email, SMS, HTTP)
  • Auto Scaling action (scale out/in)
  • EC2 action (stop, reboot, terminate)
  • Systems Manager OpsItem

# Alarm when average EC2 CPU stays above 80% for two consecutive 5-minute periods
aws cloudwatch put-metric-alarm \
  --alarm-name high-cpu \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --dimensions Name=InstanceId,Value=i-0abc123 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-topic
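
To illustrate how the threshold and evaluation periods interact, here is a simplified local model of the evaluation logic for a `GreaterThanThreshold` alarm. It is a sketch, not CloudWatch's actual implementation: missing-data handling and "M out of N" settings are left out, and the function name and sample values are made up.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Return "ALARM", "OK", or "INSUFFICIENT_DATA" for a GreaterThanThreshold alarm.

    Simplified model: the alarm fires only when every one of the most recent
    `evaluation_periods` datapoints is above the threshold.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# Average CPUUtilization per 300-second period, oldest first (made-up values).
cpu = [42.0, 85.5, 79.9, 91.2, 88.0]
print(alarm_state(cpu, threshold=80, evaluation_periods=2))  # prints ALARM
```

With two evaluation periods, a single spike such as the 85.5 above does not fire the alarm on its own; both of the latest datapoints have to breach.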

Applications, services, and AWS resources send logs to CloudWatch Logs:

  • Log Group: Container for log streams (e.g., /aws/lambda/my-function)
  • Log Stream: Sequence of log events from a single source (e.g., one Lambda instance)
  • Retention: Set 1 day to 10 years (default: never expire)
  • Metric Filters: Extract metric values from log patterns
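
Metric filters turn log content into metric datapoints. As a rough local model of that idea (plain Python, not the CloudWatch filter-pattern syntax), the sketch below pulls a numeric field out of JSON log lines. The field name `latency_ms` and the sample lines are assumptions for illustration.

```python
import json


def extract_metric_values(log_lines, field):
    """Return the numeric `field` values found in JSON log lines.

    Lines that are not JSON objects, or that lack a numeric `field`,
    contribute no datapoint.
    """
    values = []
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text lines produce no datapoint
        if not isinstance(event, dict):
            continue
        value = event.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            values.append(value)
    return values


lines = [
    '{"level": "INFO", "latency_ms": 12}',
    "plain text line",
    '{"level": "ERROR", "latency_ms": 340}',
    '{"level": "INFO"}',
]
print(extract_metric_values(lines, "latency_ms"))  # prints [12, 340]
```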

Example CloudWatch Logs Insights queries:

# Find recent Lambda errors (the time range, e.g. the last hour, is chosen when the query is run)
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20

# Top 10 slowest Lambda invocations
filter @type = "REPORT"
| fields @timestamp, @requestId, @duration
| sort @duration desc
| limit 10

| Feature | CloudWatch | Azure Monitor |
| --- | --- | --- |
| Metrics | CloudWatch Metrics | Azure Monitor Metrics |
| Logs | CloudWatch Logs | Log Analytics Workspace |
| Log query | CloudWatch Logs Insights | KQL (Kusto Query Language) |
| Alerting | CloudWatch Alarms | Azure Monitor Alerts |
| Dashboards | CloudWatch Dashboards | Azure Monitor Workbooks / Dashboards |
| APM | AWS X-Ray + ServiceLens | Application Insights |
| Synthetic monitoring | CloudWatch Synthetics | Application Insights Availability Tests |
| Event routing | Amazon EventBridge | Azure Event Grid |

AWS X-Ray provides distributed tracing for applications, the equivalent of Application Insights distributed traces:

  • Trace requests across Lambda, EC2, ECS, API Gateway, DynamoDB
  • Visualize service map showing dependencies and latency
  • Identify bottlenecks and errors in microservices
  • SDK available for Node.js, Python, Java, .NET, Ruby, Go

While CloudWatch is for operational monitoring, AWS CloudTrail is the audit trail: it records API calls made in your AWS account (management events by default; data events are opt-in):

  • Who made the call (IAM user/role)
  • What action (API call)
  • When (timestamp)
  • From where (IP address)
  • On what resource

CloudTrail ≈ Azure Activity Log + Microsoft Entra Audit Logs
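
The who/what/when/where questions map onto fields of a CloudTrail event record. The sketch below reads them from a trimmed, made-up record; real records carry many more fields, and the affected resource is usually found in `requestParameters` or the optional `resources` list. The `summarize` helper is hypothetical.

```python
import json

# Trimmed, made-up CloudTrail record for illustration only.
SAMPLE_RECORD = json.loads("""
{
  "eventTime": "2024-01-01T12:00:00Z",
  "eventSource": "ec2.amazonaws.com",
  "eventName": "StopInstances",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.10",
  "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::123456789012:user/alice"},
  "requestParameters": {"instancesSet": {"items": [{"instanceId": "i-0abc123"}]}}
}
""")


def summarize(record):
    """Reduce a CloudTrail record to who / what / when / where / resource."""
    identity = record.get("userIdentity", {})
    return {
        "who": identity.get("arn", identity.get("type", "unknown")),
        "what": f'{record.get("eventSource")}:{record.get("eventName")}',
        "when": record.get("eventTime"),
        "where": record.get("sourceIPAddress"),
        # Prefer the explicit resources list when present.
        "resource": record.get("resources") or record.get("requestParameters"),
    }


for key, value in summarize(SAMPLE_RECORD).items():
    print(f"{key}: {value}")
```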

  • Set alarms on critical signals such as Lambda error rates and RDS free storage
  • Use metric math to alarm on derived values (e.g., error rate = errors / invocations × 100)
  • Ship application logs to CloudWatch Logs as structured JSON
  • Set log retention to avoid unbounded storage costs
  • Use CloudWatch Container Insights for ECS/EKS workload monitoring
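
As one way to express the error-rate formula above as an alarm, the sketch below builds arguments in the shape that boto3's `put_metric_alarm` accepts for metric math. It only returns a dict and makes no AWS call; the helper name, alarm name pattern, and the choice of two evaluation periods are assumptions.

```python
def error_rate_alarm(function_name, threshold_percent, topic_arn, period=300):
    """Build PutMetricAlarm-style arguments for a Lambda error-rate alarm.

    Hypothetical helper: it returns a dict only and never contacts AWS.
    """

    def lambda_metric(metric_id, metric_name):
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,  # input to the expression, not alarmed on directly
        }

    return {
        "AlarmName": f"{function_name}-error-rate",
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": float(threshold_percent),
        "EvaluationPeriods": 2,
        "AlarmActions": [topic_arn],
        "Metrics": [
            lambda_metric("errors", "Errors"),
            lambda_metric("invocations", "Invocations"),
            {
                "Id": "error_rate",
                "Expression": "100 * errors / invocations",
                "Label": "Error rate (%)",
                "ReturnData": True,  # the alarm evaluates this series
            },
        ],
    }


params = error_rate_alarm(
    "my-function", 5, "arn:aws:sns:us-east-1:123456789012:my-topic"
)
print(params["Metrics"][-1]["Expression"])

# Assuming boto3 is installed and credentials are configured:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Exactly one entry in `Metrics` has `ReturnData` set to true, which is the series the alarm compares against the threshold.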