Cloud Cost Optimization: From $22K to $8K Monthly
Real cloud cost reduction project for a SaaS company. Identified and eliminated $168K in annual waste through rightsizing, reserved instances, and architectural changes. Complete breakdown of what worked.
Let's talk about the cloud cost problem nobody warns you about: your AWS bill started at $800/month, which seemed reasonable. Two years later it's $22,000/month and nobody knows exactly why or how to fix it.
We worked with a 40-person SaaS company (call them CloudCo) whose AWS costs had grown 380% year-over-year while their customer base grew only 140%. The math didn't work. Infrastructure was eating 34% of revenue, against an industry norm of 10-15%.
The CTO knew they were overspending, but the infrastructure had grown organically over three years: multiple engineers making decisions in isolation, no central oversight, and no focus on cost during growth mode. "We'll optimize later" became "we're now spending $264,000 annually on AWS."
This is the story of a 6-week cloud cost optimization project that reduced their monthly AWS spend from $22,000 to $8,000—a 64% reduction—while actually improving performance and reliability.
Annual savings: $168,000
Project cost: $12,000
Payback period: 26 days (on the project cost alone)
Here's exactly what we found and how we fixed it.
The $22K Monthly Bill Breakdown
First, we had to understand where the money was going.
Cost by Service (Initial State)
| AWS Service | Monthly Cost | % of Total | Our Reaction |
|---|---|---|---|
| EC2 (compute) | $8,400 | 38% | Expected |
| RDS (databases) | $5,200 | 24% | High but reasonable |
| S3 (storage) | $1,800 | 8% | Should be cheaper |
| Data Transfer | $3,200 | 15% | Way too high |
| NAT Gateway | $1,600 | 7% | Outrageous |
| CloudWatch | $400 | 2% | Normal |
| Other services | $1,400 | 6% | Various |
| Total | $22,000 | 100% | Fixable |
The Red Flags
EC2 instances running 24/7 including:
- Development environments (unused nights and weekends)
- Staging environment (rarely used, always on)
- Production instances oversized for actual load
- Zombie instances (nobody knew what they did)
Database overprovisioning:
- Production RDS: db.r5.4xlarge ($3,100/month)
- Actual usage: 15% CPU, 40% memory
- Staging RDS: db.r5.2xlarge ($1,550/month)
- Staging usage: <5% (barely used)
Data transfer costs ($3,200/month):
- Cross-AZ traffic (architectural problem)
- Unoptimized API responses (sending too much data)
- No CDN (serving static assets from EC2)
NAT Gateway waste ($1,600/month):
- Three NAT Gateways (one per AZ)
- Minimal traffic through them
- Could have used single NAT Gateway
S3 storage inefficiency:
- Everything in Standard tier
- Logs stored forever (compliance didn't require it)
- No lifecycle policies
- Uncompressed backups
The Optimization Process: 6 Weeks
Week 1: Audit and Analysis
Tagged all resources:
- Environment (prod, staging, dev)
- Owner (which team)
- Project
Many resources turned out to have no tags at all (possibly orphaned).
Analyzed utilization with CloudWatch:
- EC2 CPU and memory usage (memory requires the CloudWatch agent; EC2 doesn't publish it by default)
- RDS performance metrics
- Network transfer patterns
- Storage growth trends
Interviewed engineers:
- What's this instance for?
- Why this size?
- Can we turn it off nights/weekends?
- What happens if we reduce it?
Found:
- 23 instances nobody could explain
- 8 development environments running 24/7
- 4 forgotten experiment environments
- Massive overprovisioning across the board
Cost: $3,200 (audit time)
Week 2: Quick Wins (Low-Hanging Fruit)
Killed zombie resources:
- 23 unexplained instances: terminated
- 4 experimental environments: deleted
- 6 old load balancers: removed
- 12 unused EBS volumes: deleted
- Immediate savings: $3,400/month
Scheduled dev/staging environments:
- Auto-stop at 7pm, auto-start at 7am weekdays
- Off completely on weekends
- Used AWS Instance Scheduler
- Immediate savings: $2,100/month (65% time reduction)
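The schedule above (running 7am to 7pm on weekdays, off on weekends) can be checked with a short sketch. It assumes the schedule is evaluated in a single local time zone and that "7pm" means stopped from 19:00 onward; `should_run` and `weekly_running_hours` are hypothetical helpers, not part of Instance Scheduler.

```python
from datetime import datetime

START_HOUR = 7   # auto-start at 7am
STOP_HOUR = 19   # auto-stop at 7pm

def should_run(ts: datetime) -> bool:
    """True if a dev/staging instance should be running at time ts."""
    is_weekday = ts.weekday() < 5  # Monday=0 .. Friday=4
    return is_weekday and START_HOUR <= ts.hour < STOP_HOUR

def weekly_running_hours() -> int:
    """Hours per week the schedule keeps an instance running."""
    return 5 * (STOP_HOUR - START_HOUR)

running = weekly_running_hours()      # 60 of 168 hours
reduction = 1 - running / (7 * 24)    # fraction of hours switched off
print(f"{running} h/week running, {reduction:.0%} fewer hours")
```

The roughly 64% cut in running hours is where the "65% time reduction" comes from. Compute cost scales with those hours, but EBS volumes attached to stopped instances keep billing.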
S3 lifecycle policies:
- Logs to S3 Glacier after 90 days
- Delete logs after 2 years
- Standard → Infrequent Access for old data
- Immediate savings: $600/month
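The lifecycle rules above map onto the JSON document that S3's `PutBucketLifecycleConfiguration` API accepts (for example through boto3's `put_bucket_lifecycle_configuration`). This sketch only builds that document. The `logs/` and `data/` prefixes are hypothetical, and the 30-day Standard-IA transition is an assumption (30 days is the minimum S3 allows before moving objects to Standard-IA).

```python
import json

def build_lifecycle_config() -> dict:
    """Lifecycle rules: logs to Glacier at 90 days, deleted at 2 years; old data to IA."""
    return {
        "Rules": [
            {
                "ID": "logs-archive-then-expire",
                "Filter": {"Prefix": "logs/"},  # hypothetical prefix
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 730},    # 2 years
            },
            {
                "ID": "old-data-to-ia",
                "Filter": {"Prefix": "data/"},  # hypothetical prefix
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            },
        ]
    }

config = build_lifecycle_config()
print(json.dumps(config, indent=2))
# Applying it with boto3 (not executed here; bucket name is hypothetical):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```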
Total week 2 savings: $6,100/month
Cost: $1,800 (implementation)
Week 3: Right-Sizing Compute
Analyzed actual resource usage:
- Production EC2: 18 instances ranging from t3.large to c5.2xlarge
- Average CPU utilization: 12-18%
- Average memory utilization: 35-45%
- Massively overprovisioned
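One way to turn utilization numbers into a sizing decision: within a family such as c5, each size step halves vCPU and memory, so utilization roughly doubles. This minimal sketch assumes peak (not average) utilization as input and a hypothetical 80% headroom ceiling, and steps down while the projected peak stays under that ceiling. `recommend_size` is an illustrative helper, not an AWS tool.

```python
# Sizes in the c5 family, smallest first; each step doubles vCPU and memory.
C5_SIZES = ("c5.large", "c5.xlarge", "c5.2xlarge", "c5.4xlarge")

def recommend_size(current: str, peak_cpu_pct: float, peak_mem_pct: float,
                   ladder: tuple[str, ...] = C5_SIZES,
                   ceiling_pct: float = 80.0) -> str:
    """Step down one size at a time while the projected peak stays <= ceiling_pct."""
    idx = ladder.index(current)
    cpu, mem = peak_cpu_pct, peak_mem_pct
    while idx > 0 and cpu * 2 <= ceiling_pct and mem * 2 <= ceiling_pct:
        idx -= 1
        cpu, mem = cpu * 2, mem * 2  # half the resources, ~double the utilization
    return ladder[idx]

# Application servers from the text: c5.2xlarge peaking at 25% CPU / 40% memory.
print(recommend_size("c5.2xlarge", 25, 40))  # c5.xlarge
```

The ceiling is the key design choice: it preserves headroom for spikes, and auto-scaling handles anything beyond it.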
Right-sizing decisions (example):
Web servers:
- Was: 6× c5.xlarge (4 vCPU, 8GB RAM) = $1,032/month
- Actually needed: c5.large (2 vCPU, 4GB RAM)
- New: 6× c5.large = $516/month
- Savings: $516/month (50%)
Application servers:
- Was: 4× c5.2xlarge (8 vCPU, 16GB RAM) = $1,104/month
- Peak usage: 25% CPU, 40% memory
- New: 4× c5.xlarge (4 vCPU, 8GB RAM) = $552/month
- Savings: $552/month (50%)
Background workers:
- Was: 8× t3.large = $536/month
- Usage: Sporadic (burst workload)
- New: 8× t3.medium = $268/month
- Savings: $268/month (50%)
Database right-sizing:
Production RDS:
- Was: db.r5.4xlarge (16 vCPU, 128GB) = $3,100/month
- Usage: 15% CPU, 40% memory (50GB)
- New: db.r5.xlarge (4 vCPU, 32GB) = $775/month
- Plus read replica optimization
- Savings: $2,325/month (75%!)
Staging/Dev RDS:
- Was: db.r5.2xlarge = $1,550/month
- New: db.t3.large with auto-stop = $280/month
- Savings: $1,270/month (82%)
Total week 3 savings: $4,931/month
Cost: $2,400 (analysis + migration)
Week 4: Reserved Instances & Savings Plans
For resources that must run 24/7:
EC2 Reserved Instances:
- Production instances that won't change
- 1-year reserved instances (we wanted flexibility)
- Partial upfront payment
- Discount: 30-40% vs on-demand
RDS Reserved Instances:
- Production database
- 1-year commitment
- Discount: 35%
Savings Plans (compute):
- $600/month commitment (Savings Plans commitments are set per hour, so about $0.82/hour)
- Flexible across EC2, Fargate, Lambda
- Discount: 25% average
Total week 4 savings: $2,200/month (plus a one-time $8,400 upfront payment for the reserved instances, which the savings easily justify)
Cost: $1,200 (planning + procurement)
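Two pieces of arithmetic sit behind this week. First, Savings Plans commitments are hourly, and the AWS pricing calculator assumes 730 hours per month, so $600/month is about $0.82/hour. Second, at a 25% average discount, that commitment covers roughly $800 of on-demand-equivalent usage. The sketch also spreads the $8,400 reserved instance prepayment over its 1-year term. It is a simplification that assumes the commitment is fully used.

```python
HOURS_PER_MONTH = 730  # convention used by the AWS pricing calculator

def hourly_commitment(monthly_usd: float) -> float:
    return monthly_usd / HOURS_PER_MONTH

def on_demand_covered(commitment_usd: float, discount: float) -> float:
    """On-demand-equivalent usage that a fully used commitment pays for."""
    return commitment_usd / (1 - discount)

def amortized_monthly(upfront_usd: float, term_months: int = 12) -> float:
    return upfront_usd / term_months

sp_hourly = hourly_commitment(600)          # ~0.82 $/hour
sp_covered = on_demand_covered(600, 0.25)   # 800.0
sp_saving = sp_covered - 600                # 200.0 per month
ri_monthly = amortized_monthly(8_400)       # 700.0 per month
print(f"${sp_hourly:.2f}/h, covers ${sp_covered:.0f}, saves ${sp_saving:.0f}/mo; "
      f"RI upfront = ${ri_monthly:.0f}/mo")
```

The amortized $700/month is one of the offsets that separate gross from net savings in the results section.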
Week 5: Network Architecture Optimization
Data transfer analysis:
- $3,200/month in transfer costs
- 60% was cross-AZ traffic (avoidable)
- 25% was serving static assets from EC2 (should be CDN)
- 15% was inefficient API responses (sending too much data)
Architectural changes:
Reduced cross-AZ traffic:
- Moved database readers to same AZ as application servers
- Implemented read replica routing by AZ
- Savings: $1,920/month
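AZ-aware read routing can be as simple as preferring a replica in the caller's own Availability Zone and falling back to any replica. This is a minimal sketch. The `Replica` type, `pick_replica`, and the endpoint names are hypothetical. In practice the application could read its own AZ from the EC2 instance metadata path `placement/availability-zone`.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Replica:
    endpoint: str  # hypothetical read-replica hostname
    az: str

def pick_replica(replicas: list[Replica], client_az: str,
                 rng: Optional[random.Random] = None) -> Replica:
    """Prefer a same-AZ replica (no cross-AZ transfer); otherwise use any replica."""
    if not replicas:
        raise ValueError("no read replicas configured")
    rng = rng or random.Random()
    same_az = [r for r in replicas if r.az == client_az]
    return rng.choice(same_az or replicas)

replicas = [Replica("db-read-a.example.internal", "us-east-1a"),
            Replica("db-read-b.example.internal", "us-east-1b")]
print(pick_replica(replicas, "us-east-1b").endpoint)  # db-read-b.example.internal
```

With the fallback, reads stay available when no local replica exists, at the cost of some cross-AZ transfer in that case.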
Implemented CloudFront CDN:
- Static assets (images, CSS, JS) via CloudFront
- Removed load from EC2
- CDN cost: $180/month
- EC2 data transfer savings: $800/month
- Net savings: $620/month
API response optimization:
- Many API endpoints returning full objects when only fields needed
- Implemented field selection
- Reduced payload sizes 40-70%
- Savings: $480/month in data transfer
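Field selection lets a client request only the attributes it needs, for example through a `fields` query parameter. This is a minimal, framework-agnostic sketch. The `fields=id,name` convention, `select_fields`, and the sample record are all hypothetical.

```python
import json
from typing import Optional

def select_fields(obj: dict, fields_param: Optional[str]) -> dict:
    """Return only the requested top-level keys; no parameter means the full object."""
    if not fields_param:
        return obj
    wanted = {f.strip() for f in fields_param.split(",") if f.strip()}
    return {k: v for k, v in obj.items() if k in wanted}

user = {"id": 42, "name": "Ada", "email": "ada@example.com",
        "preferences": {"theme": "dark", "locale": "en-GB"},
        "audit_log": ["login", "update", "logout"] * 50}

full = json.dumps(user)
slim = json.dumps(select_fields(user, "id,name"))
print(len(full), len(slim))
```

This sketch silently ignores unknown field names. A production API might reject them instead.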
NAT Gateway consolidation:
- Was: 3 NAT Gateways (high availability overkill)
- New: 1 NAT Gateway (good enough for their needs)
- Savings: $1,066/month
Total week 5 savings: $4,086/month
Cost: $2,200 (architecture changes)
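The week 5 figures can be reproduced from the $3,200 transfer bill and its 60/25/15 split. This sketch also spells out an assumption implicit in the NAT number: the $1,600 is treated as split evenly across three gateways, so removing two of them saves two-thirds.

```python
TRANSFER_BILL = 3_200
NAT_BILL = 1_600
CDN_COST = 180

cross_az = TRANSFER_BILL * 60 // 100   # 1,920: removed by AZ-local reads
static = TRANSFER_BILL * 25 // 100     # 800: moved to CloudFront
api = TRANSFER_BILL * 15 // 100        # 480: removed by field selection

cdn_net = static - CDN_COST            # 620 after paying for CloudFront
nat_saving = NAT_BILL * 2 // 3         # 1,066 (2 of 3 gateways removed)

week5_total = cross_az + cdn_net + api + nat_saving
print(week5_total)  # 4086
```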
Week 6: Storage Optimization
S3 comprehensive cleanup:
- Implemented lifecycle policies across all buckets
- Compressed backups (reduced size 60%)
- Deleted truly unnecessary data
- Moved infrequent access to IA tier
- Savings: $1,200/month
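Compressing backups before upload needs only the standard library in most languages. This is a minimal Python sketch using `gzip` and `shutil.copyfileobj`. The file name is hypothetical. How much a backup shrinks depends on its format: plain-text SQL dumps compress far better than archives that are already compressed.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

def compress_backup(src: Path) -> Path:
    """Write src.gz next to src and return its path; the original is left in place."""
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst

with tempfile.TemporaryDirectory() as tmp:
    dump = Path(tmp) / "nightly.sql"  # hypothetical backup file
    dump.write_text("INSERT INTO events VALUES (1, 'signup');\n" * 10_000)
    gz = compress_backup(dump)
    ratio = gz.stat().st_size / dump.stat().st_size
    with gzip.open(gz, "rt") as f:
        round_trip_ok = f.read() == dump.read_text()  # decompresses to the original
    print(f"compressed to {ratio:.1%} of original size")
```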
EBS volume optimization:
- Right-sized EBS volumes (many were oversized)
- Moved some from io1 to gp2 (they didn't need provisioned IOPS)
- Deleted orphaned snapshots (hundreds of them)
- Savings: $600/month
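Finding orphaned snapshots comes down to a set difference: snapshots whose source volume no longer exists. This sketch works on plain dictionaries shaped like the `SnapshotId`/`VolumeId` fields that EC2's `DescribeSnapshots` and `DescribeVolumes` return. The IDs are made up. It also excludes snapshots that back AMIs, because EC2 won't delete those while the AMI is registered. The output is a list of candidates for review, not an automatic delete list.

```python
def orphaned_snapshots(snapshots: list[dict], volumes: list[dict],
                       ami_snapshot_ids: frozenset = frozenset()) -> list[str]:
    """IDs of snapshots whose source volume is gone and that no AMI uses."""
    live_volumes = {v["VolumeId"] for v in volumes}
    return [
        s["SnapshotId"]
        for s in snapshots
        if s.get("VolumeId") not in live_volumes
        and s["SnapshotId"] not in ami_snapshot_ids
    ]

volumes = [{"VolumeId": "vol-0aaa"}]
snapshots = [
    {"SnapshotId": "snap-001", "VolumeId": "vol-0aaa"},  # volume still exists
    {"SnapshotId": "snap-002", "VolumeId": "vol-0bbb"},  # volume deleted
    {"SnapshotId": "snap-003", "VolumeId": "vol-0ccc"},  # volume deleted, backs an AMI
]
print(orphaned_snapshots(snapshots, volumes, frozenset({"snap-003"})))  # ['snap-002']
```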
Total week 6 savings: $1,800/month
Cost: $1,200 (cleanup work)
Results: From $22K to $8K Monthly
Total Savings Breakdown
| Optimization | Monthly Savings | Annual Savings |
|---|---|---|
| Killed zombie resources | $3,400 | $40,800 |
| Scheduled dev/staging | $2,100 | $25,200 |
| Compute right-sizing | $4,931 | $59,172 |
| Reserved instances | $2,200 | $26,400 |
| Network optimization | $4,086 | $49,032 |
| Storage optimization | $1,800 | $21,600 |
| Total Savings | $18,517 | $222,204 |
Wait, that's more than we said!
Yes. Gross savings were $18,517/month, but several offsets reduce the net figure:
- Amortized reserved instance upfront costs
- New costs, such as the CloudFront CDN
- Added monitoring improvements
- The new auto-scaling configuration
Net savings: $14,000/month = $168,000 annually
New monthly AWS cost: $8,000 (down from $22,000)
Reduction: 64%
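The headline numbers can be checked arithmetically. This sketch sums the table's monthly savings, annualizes them, and compares the gross figure with the actual change in the bill. The gap of about $4,500/month is what the offsets listed above account for. The sketch does not try to break that gap down further.

```python
monthly_savings = {
    "Killed zombie resources": 3_400,
    "Scheduled dev/staging": 2_100,
    "Compute right-sizing": 4_931,
    "Reserved instances": 2_200,
    "Network optimization": 4_086,
    "Storage optimization": 1_800,
}

gross_monthly = sum(monthly_savings.values())   # 18,517
gross_annual = gross_monthly * 12               # 222,204
before, after = 22_000, 8_000
net_monthly = before - after                    # 14,000
offsets = gross_monthly - net_monthly           # 4,517
reduction = net_monthly / before                # ~0.636

print(gross_monthly, gross_annual, net_monthly, offsets, f"{reduction:.0%}")
```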
Performance Impact
Surprisingly, performance improved:
- CloudFront CDN made site faster (static assets)
- Better database configuration reduced query times
- Auto-scaling meant resources matched demand
- No degradation from right-sizing (we monitored carefully)
Reliability improved:
- Implemented proper auto-scaling (didn't have before)
- Better monitoring (could see issues earlier)
- Disaster recovery actually tested (wasn't before)
The Ongoing Optimization Process
One-time optimization isn't enough. We implemented ongoing practices:
Monthly Cost Reviews
Cost dashboard in CloudWatch:
- Daily spend by service
- Trending over time
- Anomaly alerts (>20% unexpected increase)
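The >20% anomaly alert can be approximated by comparing each day's spend against a trailing average. This minimal sketch assumes a 7-day window (a hypothetical choice) and daily totals already exported from billing data. AWS also offers a managed Cost Anomaly Detection service that applies its own models.

```python
from statistics import mean

def spend_anomalies(daily_spend: list[float], window: int = 7,
                    threshold: float = 0.20) -> list[int]:
    """Indices of days whose spend exceeds the trailing-window average by > threshold."""
    flagged = []
    for i in range(window, len(daily_spend)):
        baseline = mean(daily_spend[i - window:i])
        if daily_spend[i] > baseline * (1 + threshold):
            flagged.append(i)
    return flagged

# Hypothetical daily totals: steady around $270, then a spike on day 9.
spend = [265, 270, 268, 272, 266, 271, 269, 270, 268, 345, 272]
print(spend_anomalies(spend))  # [9]
```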
Monthly team review:
- Spending by project
- New resources added
- Optimization opportunities
- Forecasting
Tagging Policy
All resources must have:
- Environment (prod/staging/dev)
- Owner (team)
- Project
- Cost-center
Enforcement: Automated tag compliance checker, alerts on untagged resources
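A tag compliance checker mostly compares each resource's tags against the required keys. This sketch runs over plain dictionaries. The exact key spellings (`Environment`, `Owner`, `Project`, `CostCenter`) and the allowed environment values are assumptions based on the policy above, and the resource IDs are made up. On the AWS side, Organizations tag policies and the AWS Config `required-tags` managed rule can enforce similar checks.

```python
REQUIRED_TAGS = ("Environment", "Owner", "Project", "CostCenter")  # assumed key names
ALLOWED_ENVIRONMENTS = {"prod", "staging", "dev"}

def tag_violations(resource_id: str, tags: dict[str, str]) -> list[str]:
    """Describe every way a resource's tags break the policy (empty list = compliant)."""
    problems = [f"{resource_id}: missing tag {key}"
                for key in REQUIRED_TAGS if not tags.get(key)]
    env = tags.get("Environment")
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"{resource_id}: invalid Environment '{env}'")
    return problems

inventory = {
    "i-0abc": {"Environment": "prod", "Owner": "platform",
               "Project": "api", "CostCenter": "eng"},
    "i-0def": {"Environment": "production", "Owner": "data"},
}
for rid, tags in inventory.items():
    for problem in tag_violations(rid, tags):
        print(problem)
```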
Right-Sizing Automation
Automated recommendations:
- Weekly reports on underutilized resources
- Rightsizing suggestions based on usage
- Scheduled instance reviews
Cost Allocation
Showback reporting:
- Each team sees their AWS costs
- Creates accountability
- Encourages cost-conscious decisions
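Showback is essentially a group-by over cost line items, keyed on the owner tag. This sketch uses hypothetical line items. In practice they would come from a billing export such as the Cost and Usage Report, with cost allocation tags activated. Untagged spend goes into its own bucket instead of disappearing.

```python
from collections import defaultdict

def showback(line_items: list[dict]) -> dict[str, float]:
    """Sum cost per Owner tag, largest first; untagged spend is reported as 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get("Owner") or "untagged"
        totals[owner] += item["cost"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

items = [
    {"cost": 1200.0, "tags": {"Owner": "platform"}},
    {"cost": 450.0, "tags": {"Owner": "data"}},
    {"cost": 300.0, "tags": {"Owner": "platform"}},
    {"cost": 90.0, "tags": {}},
]
print(showback(items))  # {'platform': 1500.0, 'data': 450.0, 'untagged': 90.0}
```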
Architecture Review
Before new deployments:
- Cost estimation required
- Architecture review for cost efficiency
- Auto-scaling plans
- Reserved instance strategy
ROI Analysis
Investment
Audit and optimization project: $12,000
Reserved instance upfront: $8,400
Migration and testing time: $6,200
Total investment: $26,600
Returns
Annual savings: $168,000
Payback period: 1.9 months
3-year ROI: 1,795%
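The ROI figures follow from the investment and net savings. This sketch uses the usual definition, (total return minus investment) / investment, over three years. It also backs out the implied monthly revenue from the 34%-of-revenue figure, assuming revenue stayed flat over the project.

```python
investment = 12_000 + 8_400 + 6_200                       # 26,600
annual_savings = 168_000
monthly_savings = annual_savings / 12                     # 14,000

payback_months = investment / monthly_savings             # 1.9
roi_3yr = (3 * annual_savings - investment) / investment  # ~17.95, i.e. ~1,795%

revenue = 22_000 / 0.34            # implied monthly revenue, ~$64.7K
new_share = 8_000 / revenue        # ~0.124 of revenue

print(f"payback {payback_months:.1f} months, 3-year ROI {roi_3yr:.0%}, "
      f"infrastructure now {new_share:.0%} of revenue")
```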
But the real value:
- Infrastructure costs dropped from 34% of revenue to 12%
- Sustainable cost model as they scale
- Better performance and reliability
- Culture of cost consciousness
Lessons for Other SaaS Companies
1. Cloud costs don't optimize themselves
"We'll optimize later" turns into massive waste. Build cost discipline from day one.
2. Most companies overprovision by 50-70%
Engineers default to "better safe than sorry." Monitor actual usage and right-size accordingly.
3. Development/staging should never run 24/7
Unless you have developers working around the clock (you don't), turn them off nights and weekends. 65% savings instantly.
4. Tagging is mandatory, not optional
Can't optimize what you can't attribute. Tag everything from day one.
5. Reserved instances for steady state, on-demand for burst
1-year reserved instances for baseline, auto-scale on-demand for traffic spikes. Best of both worlds.
6. Data transfer is the hidden killer
Cross-AZ traffic, missing CDN, inefficient APIs—these add up fast. Architectural decisions have cost implications.
7. Zombie resources accumulate fast
Without discipline, old experiments and forgotten instances pile up. Regular cleanup is essential.
When This Approach Works vs. Doesn't
This works well when you:
- Have an AWS bill over $5K/month (smaller bills have less optimization potential)
- Haven't done a comprehensive cost optimization in 12+ months
- Let engineers build without cost constraints
- Have no tagging strategy
- Run development/staging environments 24/7
- Are growing faster than you're optimizing
This probably won't work if you:
- Already have rigorous cost discipline
- Recently did a comprehensive optimization
- Have an AWS bill under $2K/month (harder to get meaningful savings)
- Already run a highly optimized architecture
ROI indicators this is worth doing:
- AWS costs growing faster than customer base
- Infrastructure >20% of revenue
- No idea what specific resources cost
- No reserved instances or savings plans
- Resources running underutilized
The Bottom Line
CloudCo spent $12,000 on optimization (plus $8,400 in reserved instance prepayments and $6,200 in migration time) to reduce their annual AWS costs by $168,000.
But here's what they really got:
- Sustainable infrastructure cost model (12% of revenue vs. 34%)
- Better performance from architectural improvements
- Cost discipline and ongoing optimization processes
- Foundation to scale profitably
The question isn't "can we afford cloud cost optimization?"
The question is: "how much are we wasting on cloud resources right now?"
For most SaaS companies with $10K+ monthly AWS bills, the answer is: "40-60%, probably more."
We're Thalamus. Enterprise capability without enterprise gatekeeping.
If your AWS bill makes you wince but you don't know where to start, we should talk. Not because we're definitely the right answer, but because we might help you calculate what poor cloud hygiene is actually costing you.
Sometimes the most valuable consulting is discovering you're running $3,400/month of resources nobody can explain.