Reduce Cloud Costs by Automating EC2 Images/AMIs Cleanup with AWS Lambda

Aleks S., Consultant, Cloud Services

Amazon Machine Images (AMIs) are among the cloud resources most likely to accumulate unnoticed. Left unmanaged, they drive up storage costs and create operational clutter across accounts and regions.

"With over 10 years of professional IT experience, half of them on public cloud projects, I have designed, built, supported and improved hybrid and multi-cloud environments across both AWS and Azure. I have seen repeatedly how, as teams iterate on infrastructure and deployments, hundreds of outdated AMIs and related Amazon Elastic Block Store (EBS) snapshots can accumulate and quietly inflate storage costs for cloud customers. And I have seen how Amazon Data Lifecycle Manager (ADLM) is never the complete solution."

The Challenge

Unmanaged AMIs accumulate fast, and without automation they become an invisible and expensive problem across AWS accounts. We needed a safe, automated way to identify, report on, and remove old EC2 images, regardless of origin, without risking active resources.

While ADLM is valuable, it only manages snapshots and AMIs created through its own policies. It does not touch images created manually, via CI/CD pipelines, or with custom automation, nor those shared or replicated in cross-account backup workflows. In every customer environment that Phi supports, the majority of EC2 images and snapshots fall into these categories, making it impossible to enforce lifecycle policies consistently with ADLM alone.

We needed a custom solution that could discover all AMIs and related snapshots regardless of origin, evaluate their creation dates and associations, and remove only those that were safe to delete.
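To make that requirement concrete, here is a minimal sketch of how such discovery and retention checks can work with boto3. This is an illustration, not Phi's actual code: the function names, the 90-day threshold, and the dry-run default are all hypothetical.

```python
"""Hypothetical sketch: discover all self-owned AMIs in a region, flag stale
ones, and (optionally) deregister them plus their EBS snapshots."""
from datetime import datetime, timedelta, timezone


def is_safe_to_delete(image, in_use_amis, max_age_days=90, now=None):
    """Return True if an AMI record (as returned by describe_images) is older
    than the retention window and not referenced by any instance."""
    now = now or datetime.now(timezone.utc)
    created = datetime.fromisoformat(image["CreationDate"].replace("Z", "+00:00"))
    too_old = now - created > timedelta(days=max_age_days)
    return too_old and image["ImageId"] not in in_use_amis


def discover_and_clean(region, dry_run=True):
    import boto3  # imported lazily so the pure helper above needs no AWS access

    ec2 = boto3.client("ec2", region_name=region)
    # Owners=['self'] captures every image owned by this account, regardless
    # of whether it was created manually, by CI/CD, or by an ADLM policy.
    images = ec2.describe_images(Owners=["self"])["Images"]
    # AMIs referenced by any instance (running or stopped) are never touched.
    in_use = {
        inst["ImageId"]
        for page in ec2.get_paginator("describe_instances").paginate()
        for res in page["Reservations"]
        for inst in res["Instances"]
    }
    for image in images:
        if not is_safe_to_delete(image, in_use):
            continue
        snap_ids = [
            bdm["Ebs"]["SnapshotId"]
            for bdm in image.get("BlockDeviceMappings", [])
            if "Ebs" in bdm
        ]
        action = "DRY-RUN: would deregister" if dry_run else "Deregistering"
        print(f"{action} {image['ImageId']} ({image.get('Name')}), snapshots: {snap_ids}")
        if not dry_run:
            ec2.deregister_image(ImageId=image["ImageId"])
            for sid in snap_ids:
                ec2.delete_snapshot(SnapshotId=sid)
```

The key design point, reflected in the solution described below, is separating the retention decision from the AWS calls so the logic can be tested and dry-run safely before any deletion happens.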
In this blog we will share how Phi addressed this challenge with a modular, Lambda-based solution that:

- Runs on a schedule using Amazon EventBridge
- Indexes every AMI and snapshot across regions
- Applies safe, configurable retention logic
- Removes unused resources automatically
- Generates reports for transparency and audit
- Deploys consistently across accounts using Terraform

By breaking the workflow into discrete steps, we could monitor, evaluate, and improve each component independently, while maintaining transparency for our customers. The following end-to-end architecture diagram demonstrates how all the components work together:

Implementation

Our solution consists of three AWS Lambda functions, each running a dedicated Python script that performs a distinct role in the lifecycle management process.

Inventory and Indexing

The Index Lambda creates an inventory (index) of all EC2 AMIs in each AWS region. For every image it captures:

- Name
- AMI ID
- Creation date
- Associated snapshot IDs
- Owning account and region

Once all of the data is gathered, it is uploaded to S3 as a structured JSON report.

Remediation

The Remediation Lambda uses the JSON report to identify stale, unused AMIs and deregister them automatically (or simulate this when dry-run is enabled). It selects only AMIs that:

- are owned by the account (not shared)
- match a configured name pattern
- are older than the allowed age
- are not in use

After cleaning is complete, it uploads a remediation log in CSV format to an S3 bucket for reporting.

Reporting

Once the Index and Remediation Lambdas have finished, the Report Lambda merges the AMI index with the remediation logs into a daily CSV report, as follows:

- Read the log file (produced by the Remediation script) for that region and date.
- Parse it into a dictionary of remediation summaries.
- Read the AMI index JSON created by the Index script.
- Merge each AMI with its remediation info.
- Produce a daily CSV report.

The reports are uploaded to S3 and/or emailed to administrators for further analysis.

Setup & Deployment

Building the solution was only the first part of the project. The ability to deploy it consistently and securely across multiple customer environments is what delivered the real impact. To automate deployment, we used Terraform to provision and configure every component: the Lambda functions, triggers, IAM roles, networking, and alerting.

Modular and Parameterized Design

At the top level, we defined an input object that captures all environment-specific details, including:

- Account alias and tags
- VPC subnets and security groups
- Notification settings (SMTP, alerts)
- Reporting bucket
- Supported regions

This let us deploy the same Lambda-based cleanup solution into multiple AWS accounts with minimal manual changes, making the deployment fully reusable and ideal for multi-account or managed-service environments.

Function Packaging and Deployment

Each Lambda (indexing, remediation, and reporting) is packaged using the Terraform archive_file data source. Each ZIP package is then uploaded and deployed as a Lambda function with common defaults:

- Runtime: Python 3.13
- Timeout: 900 seconds
- Memory: 1 GB
- VPC configuration: attached to customer-specific subnets and security groups

All Lambdas share a consistent set of environment variables (defined in local.defaults.env_vars) to keep the code portable and configuration-driven. These include SMTP settings (for push notifications to affected users), AWS region details, the reporting bucket, CMDB tags, and proxy configuration.

Scheduled Triggers with EventBridge

Each function is triggered automatically through Amazon EventBridge (formerly CloudWatch Events), with an EventBridge rule linked to the Lambda function via a target and permission block. To make better use of our Compute Savings Plan for cyclical workloads, we run non-time-sensitive compute operations outside business hours.
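As a hedged sketch of what such a staggered schedule looks like (in our deployment the rules are created by Terraform; the boto3 calls and rule names below are illustrative only):

```python
"""Illustrative sketch: staggered daily EventBridge schedules for the three
cleanup Lambdas. Rule names are hypothetical."""


def cron_at(hour_utc: int, minute: int) -> str:
    """Build an EventBridge cron expression for a daily run at a fixed UTC time."""
    return f"cron({minute} {hour_utc} * * ? *)"


# Staggered daily schedule (UTC), matching the index -> remediate -> report order.
SCHEDULE = {
    "ami-cleanup-index": cron_at(22, 0),
    "ami-cleanup-remediate": cron_at(22, 15),
    "ami-cleanup-report": cron_at(22, 30),
}


def create_rules(region: str) -> None:
    import boto3  # lazy import: the schedule table above needs no AWS access

    events = boto3.client("events", region_name=region)
    for rule_name, expression in SCHEDULE.items():
        events.put_rule(Name=rule_name, ScheduleExpression=expression, State="ENABLED")
```

The 15-minute gaps give each stage time to finish writing its S3 artifacts before the next stage reads them.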
Indexing begins at 22:00 UTC, Remediation at 22:15, and Reporting at 22:30. The result is a regular, predictable cleanup cycle with no manual intervention.

Monitoring and Alerts

For operational visibility, each Lambda has its own CloudWatch Log Group and an error alarm integrated with Amazon SNS. If any function fails or raises errors, an SNS alert is sent immediately to the configured endpoint (email, webhook, or monitoring system). This setup ensures there are no silent failures: every unexpected issue triggers an alert and can be investigated promptly.

Deployment Workflow

Here's how a typical deployment runs from start to finish:

1. Code preparation – Python scripts are stored in each path (for example src/functions/) and packaged automatically.
2. Terraform apply – Deploys all the infrastructure, including the Lambdas, EventBridge schedules, and monitoring resources.
3. Lambda execution – The Index function inventories AMIs and snapshots, the Remediation function deletes unused resources, and the Report function publishes a summary to S3.
4. Alerting and visibility – Reports land in S3, errors go to SNS, and logs are stored in CloudWatch.

This infrastructure-as-code approach gave us the reliability of AWS-native automation combined with the flexibility of custom logic.

Results

Only a couple of months after deploying the solution in different customer environments, the cost and usage reports showed significant savings.

AMI remediation report:

Snapshot remediation report:

The cost reduction was impressive. The solution was rolled out from July 2025.
Across the relevant environments we reduced AMI- and snapshot-related storage costs by up to 97%, eliminated hundreds of obsolete images, and improved operational hygiene across multiple regions.

Conclusion

By building our own Lambda-based cleanup automation, we achieved three clear outcomes:

- Cost control, by automatically pruning unused AMIs
- Scalability across accounts and regions, through parameterized infrastructure
- Operational visibility, with automated reporting and alerting

Amazon Data Lifecycle Manager remains valuable, but it does not cover the full lifecycle in real-world, multi-account environments. Our approach shows how combining native AWS services with lightweight Python automation can achieve precise, reliable lifecycle management without the limitations of pre-built tools.

In cloud operations, efficiency comes from taking ownership of your own automation. That is exactly what my team helps our clients achieve. By pairing native AWS services with lightweight, reusable automation, Phi helps organisations save money, gain visibility, and enforce operational discipline, keeping cloud estates lean.

If you're looking to strengthen your FinOps practices or automate cloud cost optimisation, we would be happy to help. Contact us for further information on relevant services: sales@phipartners.com