Making CloudWatch Work Smarter: Performance Insight Without the Price Tag

In the world of cloud operations, observability is non-negotiable, but there is a hidden cost that often surprises even experienced teams: monitoring itself. Amazon CloudWatch is a powerful tool for tracking system health, application performance and infrastructure usage, but without careful configuration it can quietly turn into a budget drain. Over my years working as an AWS Architect and Cloud Consultant, I've seen this pattern again and again. The good news? A few targeted adjustments can reduce CloudWatch costs dramatically – often by 50% – while maintaining the insight you need. This post shares practical steps to make CloudWatch both cost-effective and technically sound.

CloudWatch: Where the Money Goes

Amazon CloudWatch is the cornerstone of observability on AWS. It brings together metrics, logs and events to give teams visibility into how their applications and infrastructure are performing. It provides real-time insight into system performance, resource utilisation and operational health – helping teams visualise system behaviour, detect anomalies, and automate responses to operational changes.

Of course, everything has a price. CloudWatch charges for what you collect and store – every metric, log and dashboard can add up.

- The free tier covers 5 GB of log data, basic monitoring metrics (those sent from AWS services by default), 10 custom or detailed monitoring metrics, 1 million API requests (excluding the always-paid "Get*" operations), 3 custom dashboards, 10 alarm metrics, and so on. Real-world environments quickly exceed these limits.
- The paid tier is where things get really complicated – even checking the AWS pricing page (aws.amazon.com/cloudwatch/pricing/) may not bring clarity at first glance.
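To get a feel for how these charges compound, here is a back-of-the-envelope sketch in Python. The unit prices are illustrative us-east-1 list prices at the time of writing, and the volumes are made-up example figures – always check the pricing page for your region before relying on any number here:

```python
# Back-of-the-envelope CloudWatch cost estimate.
# Unit prices are illustrative us-east-1 list prices (USD) at the time of
# writing -- check aws.amazon.com/cloudwatch/pricing/ for current figures.
LOG_INGESTION_PER_GB = 0.50        # CloudWatch Logs ingestion (Standard class)
LOG_STORAGE_PER_GB_MONTH = 0.03    # CloudWatch Logs archival storage
CUSTOM_METRIC_PER_MONTH = 0.30     # first 10,000 custom metrics

def monthly_cost(logs_gb: float, custom_metrics: int) -> float:
    """Rough monthly spend for log ingestion + storage plus custom metrics."""
    logs = logs_gb * (LOG_INGESTION_PER_GB + LOG_STORAGE_PER_GB_MONTH)
    metrics = custom_metrics * CUSTOM_METRIC_PER_MONTH
    return logs + metrics

# Hypothetical mid-sized environment: 200 GB of logs, 1,500 custom metrics.
print(f"${monthly_cost(200, 1500):,.2f} per month")
```

Even with these modest example volumes the estimate lands in the hundreds of dollars a month – per environment – which is exactly how the "small but compounding" pattern shows up on a bill.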
Basically, every gigabyte of data collected and stored, and every custom metric you collect, is billable. Without governance, the result is a bill full of small but compounding costs. Fortunately, there are many practical ways to reduce costs while keeping operational visibility intact.

Smarter Monitoring, Lower Costs

Reduce Unnecessary Logging

- Verbose logging: Debug-level logs are essential during development, but expensive in production. Keep verbosity disabled by default in all environments except for temporary debugging sessions.
- CloudTrail data events: CloudTrail data events provide detailed records of operations performed on or within specific AWS resources (the who, what and when of resource-level activity). They can generate gigabytes per day. Unless required for compliance or investigations, disable data events or limit them to specific resources.
- VPC Flow Logs: VPC Flow Logs are another heavy contributor, providing records of the IP traffic going to and from network interfaces in your VPC. They are rarely needed in lower environments. And when you do need them, consider where they are stored…

Store VPC Flow Logs in S3 (Not CloudWatch)

VPC Flow Logs are one of the most common – and expensive – sources of CloudWatch data. They capture details of every IP flow within your VPC, and when stored directly in CloudWatch Logs, the ingestion and storage costs can add up quickly. In one real environment, even collecting flow logs only (with data events stopped) and with just two weeks of retention, 66 GB were stored.

If you do need the events – especially in non-production environments – it is far more efficient to send VPC Flow Logs to S3 instead of CloudWatch. Here's why this works so well:

- Lower cost: S3 storage (especially when combined with Intelligent-Tiering or Glacier) is dramatically cheaper than CloudWatch Logs, which also carries a significant data-ingestion charge. Assuming one full data scan for analysis per month, using S3 is more than 30 times cheaper.
- Sufficient accessibility: Logs stored in CloudWatch are ingested continuously and available for search in near real time in CloudWatch Logs Insights, with queries like:

```
fields @timestamp, srcAddr, dstAddr, action, bytes
| filter action = "ACCEPT"
| stats sum(bytes) as totalBytes by srcAddr, dstAddr
| sort totalBytes desc
| limit 20
```

With S3 configured as storage, logs are not immediately searchable – they are delivered periodically in batches – but querying is still possible via Amazon Athena (SQL on S3), or even via integration with EMR or OpenSearch, meaning you keep the analytical capability without paying for continuous ingestion:

```sql
SELECT srcaddr, dstaddr, action, SUM(bytes) AS total_bytes
FROM vpc_flow_logs
WHERE action = 'ACCEPT'
  AND from_unixtime(start) > current_timestamp - interval '1' day
GROUP BY srcaddr, dstaddr, action
ORDER BY total_bytes DESC
LIMIT 20;
```

- Lifecycle management: You can automate archive and deletion policies with S3 lifecycle rules – something CloudWatch doesn't handle natively.
- Security & compliance: Centralising logs into S3 can simplify access control and auditing.

Set Log Retention Policies

Every log group in CloudWatch should have a defined retention period. Without one, logs are stored indefinitely – and AWS keeps billing for that storage. Set retention aligned to your company's data policies and, if long-term retention is required, export the logs to S3 (in the same or another logging account) for storage and audit. This simple step alone can reduce monthly costs significantly.

Optimise CloudWatch Agent Metrics

The CloudWatch Agent is another hidden cost source. Default configurations often capture far more data than needed. Whilst monitoring performance is important, configuring too many or unused metrics increases your bill without bringing any value – typical examples are overcollection of CPU metrics (per-core measurements nobody reviews) and overcollection of disk metrics (every measurement on every mount point).

Add What Matters

Cost optimisation isn't only about removing data – it's about collecting the right data, more intelligently.
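Before adding anything, though, it is worth codifying the log-side measures above – flow logs delivered to S3, lifecycle rules, and explicit retention on every log group. A minimal Terraform sketch (resource names, bucket references and retention periods are illustrative placeholders, not recommendations):

```hcl
# Illustrative sketch only: names and periods are placeholders.

# Deliver VPC Flow Logs to S3 instead of CloudWatch Logs.
resource "aws_flow_log" "vpc" {
  vpc_id               = aws_vpc.main.id
  traffic_type         = "ALL"
  log_destination_type = "s3"
  log_destination      = aws_s3_bucket.flow_logs.arn
}

# Age flow logs into Glacier, then expire them automatically.
resource "aws_s3_bucket_lifecycle_configuration" "flow_logs" {
  bucket = aws_s3_bucket.flow_logs.id

  rule {
    id     = "flow-log-lifecycle"
    status = "Enabled"
    filter {}

    transition {
      days          = 30
      storage_class = "GLACIER"
    }

    expiration {
      days = 365
    }
  }
}

# Never create a log group without an explicit retention period.
resource "aws_cloudwatch_log_group" "app" {
  name              = "/app/example"
  retention_in_days = 30
}
```

Baking these defaults into templates means the savings persist instead of depending on someone remembering to set retention by hand.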
This can mean adding metrics:

VolumeId

One example is appending a "VolumeId" dimension to the disk metrics collection. This allows tracking of disk utilisation per EBS volume, so you can identify utilisation patterns across instances with multiple attached disks, as well as making it possible to track a volume's performance historically, even where the volume has been attached to multiple instances.

Memory

By default, CloudWatch does not capture memory utilisation metrics for EC2 instances – meaning you could be missing a key indicator of application health. For cache-heavy or data-intensive workloads, adding custom memory and swap metrics to your CloudWatch Agent configuration can provide critical visibility while remaining cost-efficient, and helps you assess whether an instance is right-sized. With these custom metrics in place you can review your instances via the AWS console and make an informed decision on recommendations from AWS Compute Optimizer (which does not itself have sufficient insight into the relevant memory metrics).

Avoid Overuse of Dimensions for Better Aggregation

Each unique combination of metric and dimensions counts as a separate custom metric – and therefore increases your bill. For most infrastructure metrics, using just the InstanceId dimension provides sufficient granularity. Adding multiple dimensions such as InstanceType, Region, AutoScalingGroupName or ImageId can quickly multiply metric counts without adding real analytical value. Keep your configurations lean by using only the dimensions that truly support your monitoring objectives – for example, aggregate CPU metrics at instance level, not across every possible metadata field.

Enforce and Automate Best Practice

All these optimisations can (and should) be built into your default AMIs or Infrastructure-as-Code templates.
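As a sketch of what such a default might contain, here is a CloudWatch Agent configuration fragment along these lines – memory, swap and root-filesystem measurement, with aggregation restricted to InstanceId. The measurement names follow the agent's documented schema; the 60-second interval and the choice of measurements are illustrative, not prescriptive:

```json
{
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "aggregation_dimensions": [["InstanceId"]],
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent", "mem_available_percent"],
        "metrics_collection_interval": 60
      },
      "swap": {
        "measurement": ["swap_used_percent"],
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["/"],
        "metrics_collection_interval": 60
      }
    }
  }
}
```

Keeping `aggregation_dimensions` to `[["InstanceId"]]` is what caps the number of unique metric/dimension combinations – and therefore the bill – while still giving per-instance visibility.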
By standardising CloudWatch configuration – metrics, retention and logging – you avoid cost drift over time. Automation will ensure new environments follow the same rules without manual tuning. For applications where something different is required, you can tailor the metrics with a dedicated app config file for the CloudWatch Agent and overwrite the default one during the initial bootstrap process of the respective instances.

Case Study

Here's how my team helped an international investment bank reduce CloudWatch costs by more than 70% using the optimisations described above. Looking at the CloudWatch costs for a single non-production environment covering multiple business applications over the course of 10 months, we saw an immediate and significant decrease after the first month – a quick win from stopping CloudTrail data events, which sharply reduced the DataProcessing-Bytes charges. The second biggest contributor – MetricMonitorUsage – decreased more steadily over time, as we optimised the custom metrics. This reflected a review period in which we analysed the configuration and proposed recommendations, which we then discussed with the application teams. Over time, these recommendations were rolled out across all environments and became the default configuration. Across four non-production and two production environments for the same department, metric volumes fell from around 150k to around 20k metrics a month overall.

Within a year, we had reduced the annual CloudWatch run rate from $44,000 to $11,500 for one environment. With CloudWatch savings across all environments for that department, and including the savings from CloudTrail optimisation, we achieved a reduction in costs of more than $200,000.

Bringing It All Together: Smarter Logging, Lower Costs

Effective observability is crucial both for ensuring strong application performance and for managing cloud expenses efficiently.
Smart CloudWatch management is about balance: capturing what matters, discarding what doesn't, and automating the rest. Every dollar spent should add tangible value, and monitoring hygiene should be embedded with policy and automation. This doesn't need to be expensive. By setting clear retention, refining metrics, and redirecting logs where appropriate, you can maintain full insight while reducing spend dramatically. With the right technical discipline, CloudWatch turns from a cost driver into a catalyst for cloud excellence.

As part of Phi's Cloud Services practice, I've helped clients identify and eliminate millions in wasted cloud spend. We look forward to doing the same for you.

Venelin Lehchanski, October 2025

If you'd like to make your observability smarter, our team can help you assess, optimise and automate your AWS CloudWatch or Azure Monitor setup for lasting efficiency. Contact us for further information on relevant services: sales@phipartners.com