AWS CloudWatch: Monitoring & Observability
Amazon CloudWatch is AWS's unified observability platform. It collects metrics, logs, and events from AWS services and your own applications so that you can detect anomalies, set alarms, and visualize operational health.
In Azure terms: AWS CloudWatch = Azure Monitor + Log Analytics + Application Insights
CloudWatch Components
| Component | Description |
|---|---|
| Metrics | Numeric time-series data (CPU usage, request count, latency) |
| Logs | Log groups and log streams from services and applications |
| Alarms | Trigger actions when metrics cross thresholds |
| Dashboards | Visual panels for metrics and logs |
| Events / EventBridge | React to AWS service events or custom events |
| CloudWatch Logs Insights | Serverless log query engine (comparable to KQL in Log Analytics, but with its own query syntax) |
| Synthetics | Canary scripts that continuously test endpoints |
| ServiceLens | Traces + metrics + logs combined view (X-Ray integration) |
Metrics
Most AWS services publish metrics to CloudWatch automatically:
| Service | Example Metrics |
|---|---|
| EC2 | CPUUtilization, NetworkIn, DiskReadOps |
| RDS | DatabaseConnections, FreeStorageSpace, ReadLatency |
| Lambda | Invocations, Errors, Duration, Throttles |
| SQS | NumberOfMessagesSent, ApproximateAgeOfOldestMessage |
| ALB | RequestCount, TargetResponseTime, HTTPCode_Target_5XX_Count |
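The metrics in the table above can be read back with the AWS CLI. The helper below is a minimal sketch that wraps `aws cloudwatch get-metric-statistics` to fetch the five-minute average of `CPUUtilization` for one EC2 instance over the last hour. The function name `cpu_average_last_hour` is hypothetical and the instance ID in the usage line is a placeholder.

```bash
# Sketch: 5-minute average CPU for one EC2 instance over the last hour.
# cpu_average_last_hour is a hypothetical helper name, not part of the AWS CLI.
cpu_average_last_hour() {
  local instance_id end_time start_time
  instance_id="$1"
  end_time="$(date +%s)"                 # now, as Unix epoch seconds
  start_time="$((end_time - 3600))"      # one hour earlier

  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions Name=InstanceId,Value="$instance_id" \
    --start-time "$start_time" \
    --end-time "$end_time" \
    --period 300 \
    --statistics Average \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp,Average]' \
    --output text
}
```

Usage: `cpu_average_last_hour i-0abc123`. The `--query` expression sorts the datapoints by timestamp because the API does not return them in a guaranteed order.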
Custom Metrics: Publish your own metrics from apps:
```bash
aws cloudwatch put-metric-data \
  --namespace "MyApp" \
  --metric-data \
    MetricName=PageViews,Value=1234,Unit=Count
```

CloudWatch Alarms
Alarms evaluate a metric against a threshold and can trigger:
- SNS notification (email, SMS, HTTP)
- Auto Scaling action (scale out/in)
- EC2 action (stop, reboot, terminate)
- Systems Manager OpsItem
```bash
# Create an alarm for EC2 CPU > 80%
# (average over two consecutive 5-minute periods)
aws cloudwatch put-metric-alarm \
  --alarm-name high-cpu \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --dimensions Name=InstanceId,Value=i-0abc123 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-topic
```

CloudWatch Logs
Applications, services, and AWS resources send logs to CloudWatch Logs:
- Log Group: Container for log streams (e.g., /aws/lambda/my-function)
- Log Stream: Sequence of log events from a single source (e.g., one Lambda instance)
- Retention: Set from 1 day to 10 years (default: never expire)
- Metric Filters: Extract metric values from log patterns
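Retention and metric filters from the list above can both be configured from the CLI. The following is a minimal sketch using `aws logs put-retention-policy` and `aws logs put-metric-filter`. The helper names, the filter name `error-count`, and the `MyApp` / `ErrorCount` metric names are illustrative assumptions.

```bash
# Sketch: cap retention and count ERROR lines for one log group.
# Helper names and the MyApp/ErrorCount metric are illustrative assumptions.

set_log_retention() {
  # $1 = log group name
  # $2 = retention in days (must be a value CloudWatch accepts, e.g. 1, 7, 30, 90, 365)
  aws logs put-retention-policy \
    --log-group-name "$1" \
    --retention-in-days "$2"
}

count_error_lines() {
  # $1 = log group name
  # Publishes 1 to MyApp/ErrorCount for every log event containing the term ERROR.
  aws logs put-metric-filter \
    --log-group-name "$1" \
    --filter-name error-count \
    --filter-pattern "ERROR" \
    --metric-transformations \
      metricName=ErrorCount,metricNamespace=MyApp,metricValue=1
}
```

Usage: `set_log_retention /aws/lambda/my-function 30` followed by `count_error_lines /aws/lambda/my-function`. The resulting `ErrorCount` metric can then be used in an alarm like any other metric.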
CloudWatch Logs Insights (Query)
```
# Find Lambda errors (choose "last hour" as the query time range)
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
```

```
# Highest billed duration per hour, slowest hours first
fields @timestamp, @billedDuration
| stats max(@billedDuration) as maxDuration by bin(1h)
| sort maxDuration desc
```

CloudWatch vs Azure Monitor
| Feature | CloudWatch | Azure Monitor |
|---|---|---|
| Metrics | CloudWatch Metrics | Azure Monitor Metrics |
| Logs | CloudWatch Logs | Log Analytics Workspace |
| Log query | CloudWatch Logs Insights | KQL (Kusto Query Language) |
| Alerting | CloudWatch Alarms | Azure Monitor Alerts |
| Dashboards | CloudWatch Dashboards | Azure Monitor Workbooks / Dashboards |
| APM | AWS X-Ray + ServiceLens | Application Insights |
| Synthetic monitoring | CloudWatch Synthetics | Application Insights Availability Tests |
| Event routing | Amazon EventBridge | Azure Event Grid |
AWS X-Ray (Distributed Tracing)
AWS X-Ray provides distributed tracing for applications, comparable to distributed tracing in Application Insights:
- Trace requests across Lambda, EC2, ECS, API Gateway, DynamoDB
- Visualize service map showing dependencies and latency
- Identify bottlenecks and errors in microservices
- SDK available for Node.js, Python, Java, .NET, Ruby, Go
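The service map described above can also be inspected from the CLI once an application is sending traces. The helper below is a minimal sketch around `aws xray get-service-graph`. The function name is hypothetical, and the 10-minute window is an arbitrary choice.

```bash
# Sketch: list the services X-Ray saw in the last 10 minutes.
# service_graph_last_10_minutes is a hypothetical helper name.
service_graph_last_10_minutes() {
  local end_time start_time
  end_time="$(date +%s)"               # now, as Unix epoch seconds
  start_time="$((end_time - 600))"     # ten minutes earlier

  aws xray get-service-graph \
    --start-time "$start_time" \
    --end-time "$end_time" \
    --query 'Services[].{Name:Name,Type:Type,Requests:SummaryStatistics.TotalCount}' \
    --output table
}
```

Usage: `service_graph_last_10_minutes`. An empty table usually means no traced requests were recorded in the window.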
CloudTrail (Audit Logging)
While CloudWatch handles operational monitoring, AWS CloudTrail records API activity in your AWS account (management events by default; data events such as S3 object reads must be enabled explicitly). Each event captures:
- Who made the call (IAM user/role)
- What action (API call)
- When (timestamp)
- From where (IP address)
- On what resource
CloudTrail ≈ Azure Activity Log + Microsoft Entra Audit Logs
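The who, what, and when fields listed above can be pulled from recent event history with `aws cloudtrail lookup-events`. This is a minimal sketch; the helper name `who_called` is hypothetical, and the action name in the usage line is only an example.

```bash
# Sketch: show the 10 most recent calls of one API action.
# who_called is a hypothetical helper name.
who_called() {
  # $1 = API action name, e.g. TerminateInstances
  aws cloudtrail lookup-events \
    --lookup-attributes AttributeKey=EventName,AttributeValue="$1" \
    --max-results 10 \
    --query 'Events[].{When:EventTime,Who:Username,What:EventName,Service:EventSource}' \
    --output table
}
```

Usage: `who_called TerminateInstances`. The source IP address and the affected resources are inside each event's `CloudTrailEvent` field, which is a JSON string that has to be parsed separately.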
Best Practices
- Set alarms on critical Lambda error rates and RDS free storage
- Use metric math to alarm on derived metrics (e.g., error rate = errors / invocations × 100)
- Ship application logs to CloudWatch in structured JSON format
- Set log retention to avoid unbounded storage costs
- Use CloudWatch Container Insights for ECS/EKS workload monitoring
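As an illustration of the metric math practice above, the following sketch creates an alarm on a Lambda error rate computed as errors / invocations × 100. The helper names, the 5% threshold, and the alarm naming scheme are assumptions made for the example. The function name is interpolated into JSON directly, so the sketch assumes it contains no quotes or backslashes.

```bash
# Sketch: alarm when a Lambda function's error rate exceeds 5%.
# Helper names, threshold, and alarm name are illustrative assumptions.

error_rate_metrics() {
  # $1 = Lambda function name. Prints the metric math definition as JSON.
  cat <<EOF
[
  {"Id": "errors", "ReturnData": false,
   "MetricStat": {"Period": 300, "Stat": "Sum",
     "Metric": {"Namespace": "AWS/Lambda", "MetricName": "Errors",
       "Dimensions": [{"Name": "FunctionName", "Value": "$1"}]}}},
  {"Id": "invocations", "ReturnData": false,
   "MetricStat": {"Period": 300, "Stat": "Sum",
     "Metric": {"Namespace": "AWS/Lambda", "MetricName": "Invocations",
       "Dimensions": [{"Name": "FunctionName", "Value": "$1"}]}}},
  {"Id": "error_rate", "ReturnData": true, "Label": "Error rate (%)",
   "Expression": "100 * errors / invocations"}
]
EOF
}

error_rate_alarm() {
  # $1 = Lambda function name, $2 = SNS topic ARN to notify
  # Missing data is treated as not breaching.
  aws cloudwatch put-metric-alarm \
    --alarm-name "$1-error-rate" \
    --evaluation-periods 2 \
    --threshold 5 \
    --comparison-operator GreaterThanThreshold \
    --treat-missing-data notBreaching \
    --alarm-actions "$2" \
    --metrics "$(error_rate_metrics "$1")"
}
```

Usage: `error_rate_alarm my-function arn:aws:sns:us-east-1:123456789012:my-topic`. Only the `error_rate` expression has `ReturnData` set to true, so the alarm evaluates the ratio rather than the raw counts.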