
Cloud Cost Optimization: From $22K to $8K Monthly

Real cloud cost reduction project for a SaaS company. Identified and eliminated $168K annual waste through rightsizing, reserved instances, and architectural changes. Complete breakdown of what worked.

January 20, 2025
10 min read
By Thalamus AI


Let's talk about the cloud cost problem nobody warns you about: your AWS bill started at $800/month, which seemed reasonable. Two years later it's $22,000/month and nobody knows exactly why or how to fix it.

We worked with a 40-person SaaS company—call them CloudCo—whose AWS costs had grown 380% year-over-year while their customer base only grew 140%. The math didn't work. Their infrastructure costs were eating 34% of revenue when industry standard is 10-15%.

The CTO knew they were overspending, but infrastructure had grown organically over three years: multiple engineers making decisions in isolation, no central oversight, no focus on cost optimization during growth mode. "We'll optimize later" became "we're now spending $264,000 annually on AWS."

This is the story of a 6-week cloud cost optimization project that reduced their monthly AWS spend from $22,000 to $8,000—a 64% reduction—while actually improving performance and reliability.

Annual savings: $168,000
Project cost: $12,000
Payback period: 26 days (on the project cost alone)

Here's exactly what we found and how we fixed it.

The $22K Monthly Bill Breakdown

First, we had to understand where the money was going.

Cost by Service (Initial State)

AWS Service | Monthly Cost | % of Total | Our Reaction
EC2 (compute) | $8,400 | 38% | Expected
RDS (databases) | $5,200 | 24% | High but reasonable
S3 (storage) | $1,800 | 8% | Should be cheaper
Data Transfer | $3,200 | 15% | Way too high
NAT Gateway | $1,600 | 7% | Outrageous
CloudWatch | $400 | 2% | Normal
Other services | $1,400 | 6% | Various
Total | $22,000 | 100% | Fixable

The Red Flags

EC2 instances running 24/7 including:

  • Development environments (unused nights and weekends)
  • Staging environment (rarely used, always on)
  • Production instances oversized for actual load
  • Zombie instances (nobody knew what they did)

Database overprovisioning:

  • Production RDS: db.r5.4xlarge ($3,100/month)
  • Actual usage: 15% CPU, 40% memory
  • Staging RDS: db.r5.2xlarge ($1,550/month)
  • Staging usage: <5% (barely used)

Data transfer costs ($3,200/month):

  • Cross-AZ traffic (architectural problem)
  • Unoptimized API responses (sending too much data)
  • No CDN (serving static assets from EC2)

NAT Gateway waste ($1,600/month):

  • Three NAT Gateways (one per AZ)
  • Minimal traffic through them
  • Could have used single NAT Gateway

S3 storage inefficiency:

  • Everything in Standard tier
  • Logs stored forever (compliance didn't require it)
  • No lifecycle policies
  • Uncompressed backups

The Optimization Process: 6 Weeks

Week 1: Audit and Analysis

Tagged all resources:

  • Environment (prod, staging, dev)
  • Owner (which team)
  • Project
  • Finding: many resources had no tags at all (possibly orphaned)

Analyzed utilization with CloudWatch:

  • EC2 CPU/memory usage
  • RDS performance metrics
  • Network transfer patterns
  • Storage growth trends

Interviewed engineers:

  • What's this instance for?
  • Why this size?
  • Can we turn it off nights/weekends?
  • What happens if we reduce it?

Found:

  • 23 instances nobody could explain
  • 8 development environments running 24/7
  • 4 forgotten experiment environments
  • Massive overprovisioning across the board

Cost: $3,200 (audit time)

Week 2: Quick Wins (Low-Hanging Fruit)

Killed zombie resources:

  • 23 unexplained instances: terminated
  • 4 experimental environments: deleted
  • 6 old load balancers: removed
  • 12 unused EBS volumes: deleted
  • Immediate savings: $3,400/month

Scheduled dev/staging environments:

  • Auto-stop at 7pm, auto-start at 7am weekdays
  • Off completely on weekends
  • Used AWS Instance Scheduler
  • Immediate savings: $2,100/month (65% time reduction)
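The schedule logic itself is simple enough to sketch. This is not AWS Instance Scheduler (which is configured, not coded); it is a minimal illustration, assuming the 7am-7pm weekday window described above in the instances' local time, of which hours a dev/staging box would be up and how many running hours per week that leaves.

```python
from datetime import datetime, timedelta

# Assumed window from the text: on 07:00-19:00 Monday-Friday, off all weekend.
START_HOUR = 7
STOP_HOUR = 19

def should_run(ts: datetime) -> bool:
    """True if a dev/staging instance should be running at local time `ts`."""
    is_weekday = ts.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and START_HOUR <= ts.hour < STOP_HOUR

def weekly_running_hours(week_start: datetime) -> int:
    """Count the hourly slots in the 168 hours from `week_start` that fall inside the window."""
    return sum(should_run(week_start + timedelta(hours=h)) for h in range(7 * 24))

if __name__ == "__main__":
    on_hours = weekly_running_hours(datetime(2025, 1, 20))  # a Monday at midnight
    print(f"{on_hours} of 168 hours on, {1 - on_hours / 168:.1%} fewer running hours")
```

Sixty of 168 hours means these environments run roughly a third of the week. Note that EBS volumes attached to stopped instances keep billing, so the dollar saving is somewhat smaller than the drop in running hours.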

S3 lifecycle policies:

  • Logs to S3 Glacier after 90 days
  • Delete logs after 2 years
  • Standard → Infrequent Access for old data
  • Immediate savings: $600/month
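A lifecycle policy like this is plain configuration. The sketch below builds it as the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`. The `logs/` and `data/` prefixes and the 60-day Infrequent Access threshold are hypothetical; the article does not give CloudCo's bucket layout or its cutoff for "old data".

```python
import json

# Hypothetical prefixes; the real bucket layout is not described in the article.
LOG_PREFIX = "logs/"
DATA_PREFIX = "data/"

def lifecycle_configuration(ia_after_days: int = 60) -> dict:
    """Build an S3 lifecycle configuration shaped like the Week 2 policy."""
    if ia_after_days < 30:
        # S3 lifecycle rules cannot transition objects to STANDARD_IA before day 30.
        raise ValueError("STANDARD_IA transitions need at least 30 days")
    return {
        "Rules": [
            {   # Logs: move to Glacier after 90 days, delete after about 2 years.
                "ID": "logs-glacier-then-expire",
                "Filter": {"Prefix": LOG_PREFIX},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 730},
            },
            {   # Older general data: Standard -> Standard-IA (day count is an assumption).
                "ID": "data-to-infrequent-access",
                "Filter": {"Prefix": DATA_PREFIX},
                "Status": "Enabled",
                "Transitions": [{"Days": ia_after_days, "StorageClass": "STANDARD_IA"}],
            },
        ]
    }

if __name__ == "__main__":
    config = lifecycle_configuration()
    print(json.dumps(config, indent=2))
    # With boto3 installed and credentials configured, it could be applied with:
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="example-bucket", LifecycleConfiguration=config)
```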

Total week 2 savings: $6,100/month

Cost: $1,800 (implementation)

Week 3: Right-Sizing Compute

Analyzed actual resource usage:

  • Production EC2: 18 instances ranging from t3.large to c5.2xlarge
  • Average CPU utilization: 12-18%
  • Average memory utilization: 35-45%
  • Massively overprovisioned

Right-sizing decisions (examples):

Web servers:

  • Was: 6× c5.xlarge (4 vCPU, 8GB RAM) = $1,032/month
  • Actually needed: c5.large (2 vCPU, 4GB RAM)
  • New: 6× c5.large = $516/month
  • Savings: $516/month (50%)

Application servers:

  • Was: 4× c5.2xlarge (8 vCPU, 16GB RAM) = $1,104/month
  • Peak usage: 25% CPU, 40% memory
  • New: 4× c5.xlarge (4 vCPU, 8GB RAM) = $552/month
  • Savings: $552/month (50%)

Background workers:

  • Was: 8× t3.large = $536/month
  • Usage: Sporadic (burst workload)
  • New: 8× t3.medium = $268/month
  • Savings: $268/month (50%)
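The web-server and application-server decisions follow one rule of thumb: step down a size only if peak utilization would still leave headroom. The sketch below encodes that for the c5 sizes used here, where each step down halves both vCPU and memory. The 85% ceiling and the assumption that halving capacity doubles utilization are illustrative simplifications, not the criteria used in this project.

```python
# c5 sizes where each step halves vCPU and memory:
# c5.large (2 vCPU/4 GB), c5.xlarge (4/8), c5.2xlarge (8/16), c5.4xlarge (16/32).
C5_SIZES = ["c5.large", "c5.xlarge", "c5.2xlarge", "c5.4xlarge"]

def recommend_size(current: str, peak_cpu: float, peak_mem: float,
                   max_util: float = 0.85) -> str:
    """Step down one size at a time while projected peak utilization stays under max_util.

    peak_cpu and peak_mem are fractions (0.25 means 25%). Halving capacity is assumed
    to double utilization; real workloads rarely scale that neatly, so confirm each
    change with monitoring before taking the next step.
    """
    idx = C5_SIZES.index(current)
    cpu, mem = peak_cpu, peak_mem
    while idx > 0 and cpu * 2 <= max_util and mem * 2 <= max_util:
        idx -= 1
        cpu, mem = cpu * 2, mem * 2
    return C5_SIZES[idx]

if __name__ == "__main__":
    # Application servers from the text: peak 25% CPU, 40% memory on c5.2xlarge.
    print(recommend_size("c5.2xlarge", peak_cpu=0.25, peak_mem=0.40))
```

With the application servers' peaks this returns `c5.xlarge`: one step down projects 50% CPU and 80% memory, and a second step would exceed the ceiling.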

Database right-sizing:

Production RDS:

  • Was: db.r5.4xlarge (16 vCPU, 128GB) = $3,100/month
  • Usage: 15% CPU, 40% memory (50GB)
  • New: db.r5.xlarge (4 vCPU, 32GB) = $775/month
  • Plus read replica optimization
  • Savings: $2,325/month (75%!)

Staging/Dev RDS:

  • Was: db.r5.2xlarge = $1,550/month
  • New: db.t3.large with auto-stop = $280/month
  • Savings: $1,270/month (82%)

Total week 3 savings: $4,931/month

Cost: $2,400 (analysis + migration)

Week 4: Reserved Instances & Savings Plans

For resources that must run 24/7:

EC2 Reserved Instances:

  • Production instances that won't change
  • 1-year reserved instances (we wanted flexibility)
  • Partial upfront payment
  • Discount: 30-40% vs on-demand

RDS Reserved Instances:

  • Production database
  • 1-year commitment
  • Discount: 35%

Savings Plans (compute):

  • $600/month commitment
  • Flexible across EC2, Fargate, Lambda
  • Discount: 25% average

Total week 4 savings: $2,200/month (plus a one-time upfront payment of $8,400 for the reserved instances, which the savings justified)

Cost: $1,200 (planning + procurement)
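Whether a reservation pays off comes down to amortizing the upfront payment over the term and comparing the result with the on-demand rate. A minimal sketch, using hypothetical prices rather than CloudCo's actual rates and the common 730-hours-per-month convention:

```python
HOURS_PER_MONTH = 730  # common monthly-hours convention for cloud pricing

def effective_monthly_cost(upfront: float, hourly_recurring: float,
                           term_months: int = 12) -> float:
    """Monthly cost of a partial-upfront reservation, spreading the upfront payment over the term."""
    return upfront / term_months + hourly_recurring * HOURS_PER_MONTH

def discount_vs_on_demand(on_demand_hourly: float, upfront: float,
                          hourly_recurring: float, term_months: int = 12) -> float:
    """Fractional saving versus running the same instance on-demand all month."""
    on_demand_monthly = on_demand_hourly * HOURS_PER_MONTH
    reserved_monthly = effective_monthly_cost(upfront, hourly_recurring, term_months)
    return 1 - reserved_monthly / on_demand_monthly

if __name__ == "__main__":
    # Hypothetical: $0.10/h on-demand vs. $219 upfront + $0.025/h on a 1-year term.
    print(f"{discount_vs_on_demand(0.10, 219, 0.025):.0%} cheaper than on-demand")
```

The same arithmetic explains why reservations suit only the steady baseline: a reserved instance that sits idle still costs its full effective monthly rate.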

Week 5: Network Architecture Optimization

Data transfer analysis:

  • $3,200/month in transfer costs
  • 60% was cross-AZ traffic (avoidable)
  • 25% was serving static assets from EC2 (should be CDN)
  • 15% was inefficient API responses (sending too much data)

Architectural changes:

Reduced cross-AZ traffic:

  • Moved database readers to same AZ as application servers
  • Implemented read replica routing by AZ
  • Savings: $1,920/month
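Routing reads by AZ can live in the application's connection layer. A minimal sketch, assuming each app server knows its own AZ (for example from instance metadata) and holds a list of replica endpoints labeled with theirs; the `Replica` type and endpoint names are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Replica:
    endpoint: str  # hypothetical hostname of a read replica
    az: str        # availability zone the replica runs in

def pick_replica(app_az: str, replicas: list[Replica], rng=random) -> Replica:
    """Prefer a read replica in the caller's AZ; fall back to any replica otherwise.

    Same-AZ reads avoid cross-AZ data transfer charges; the fallback keeps reads
    working if no local replica is available.
    """
    if not replicas:
        raise ValueError("no read replicas configured")
    local = [r for r in replicas if r.az == app_az]
    return rng.choice(local or replicas)

if __name__ == "__main__":
    pool = [Replica("db-read-a.internal", "us-east-1a"),
            Replica("db-read-b.internal", "us-east-1b")]
    print(pick_replica("us-east-1b", pool).endpoint)
```

The fallback trades cost for availability: when the local replica is down, reads still succeed but pay cross-AZ transfer until it returns.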

Implemented CloudFront CDN:

  • Static assets (images, CSS, JS) via CloudFront
  • Removed load from EC2
  • CDN cost: $180/month
  • EC2 data transfer savings: $800/month
  • Net savings: $620/month

API response optimization:

  • Many API endpoints returned full objects when clients needed only a few fields
  • Implemented field selection
  • Reduced payload sizes 40-70%
  • Savings: $480/month in data transfer
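Field selection can be as simple as filtering the response object against a requested list. A minimal sketch, assuming a comma-separated `fields` query parameter (a hypothetical name; the article does not say how CloudCo exposed it) and flat top-level fields only:

```python
import json
from typing import Optional

def select_fields(obj: dict, fields_param: Optional[str]) -> dict:
    """Keep only the top-level keys named in a comma-separated `fields` parameter.

    No parameter (or an empty one) returns the full object, so existing clients are
    unaffected. Unknown field names are ignored rather than rejected.
    """
    if not fields_param:
        return obj
    wanted = {name.strip() for name in fields_param.split(",") if name.strip()}
    return {key: value for key, value in obj.items() if key in wanted}

if __name__ == "__main__":
    # Hypothetical record: a list view needs id and name, not the history blob.
    user = {"id": 42, "name": "Ada", "email": "ada@example.com",
            "history": [{"event": "login", "at": "2025-01-20T09:00:00Z"}] * 50}
    full = json.dumps(user)
    slim = json.dumps(select_fields(user, "id,name"))
    print(len(full), "->", len(slim), "bytes")
```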

NAT Gateway consolidation:

  • Was: 3 NAT Gateways (high availability overkill)
  • New: 1 NAT Gateway (good enough for their needs)
  • Savings: $1,066/month

Total week 5 savings: $4,086/month

Cost: $2,200 (architecture changes)

Week 6: Storage Optimization

S3 comprehensive cleanup:

  • Implemented lifecycle policies across all buckets
  • Compressed backups (reduced size 60%)
  • Deleted truly unnecessary data
  • Moved infrequent access to IA tier
  • Savings: $1,200/month
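Compressing backups before upload takes one function with Python's standard library. A minimal sketch, assuming backups are single files (for example SQL dumps) on local disk before they go to S3; actual compression ratios depend on the data.

```python
import gzip
import shutil
from pathlib import Path

def compress_backup(src: Path) -> Path:
    """Gzip `src` to `src.gz` alongside it and return the new path.

    shutil.copyfileobj streams in chunks, so large dumps are never fully loaded into memory.
    The original file is left untouched; remove it only after verifying the archive.
    """
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst

def space_saved(src: Path, dst: Path) -> float:
    """Fraction of bytes saved by compression (0.6 means 60% smaller)."""
    return 1 - dst.stat().st_size / src.stat().st_size
```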

EBS volume optimization:

  • Right-sized EBS volumes (many were oversized)
  • Moved some volumes from io1 to gp2 (they didn't need provisioned IOPS)
  • Deleted orphaned snapshots (hundreds of them)
  • Savings: $600/month
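Finding orphaned snapshots is essentially a set difference: snapshots whose source volume no longer exists, older than some grace period. The sketch assumes snapshot records shaped like the EC2 `DescribeSnapshots` response (`SnapshotId`, `VolumeId`, `StartTime`) and a set of live volume IDs gathered separately; the 30-day grace period is an assumption. Snapshots that still back an AMI or are needed for retention should be excluded before anything is deleted.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def find_orphaned_snapshots(snapshots: list[dict], live_volume_ids: set[str],
                            min_age_days: int = 30,
                            now: Optional[datetime] = None) -> list[str]:
    """Return IDs of snapshots whose source volume is gone and that are older than min_age_days.

    Each snapshot dict is assumed to carry SnapshotId, VolumeId and a timezone-aware StartTime.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=min_age_days)
    return [
        snap["SnapshotId"]
        for snap in snapshots
        if snap["VolumeId"] not in live_volume_ids and snap["StartTime"] < cutoff
    ]
```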

Total week 6 savings: $1,800/month

Cost: $1,200 (cleanup work)

Results: From $22K to $8K Monthly

Total Savings Breakdown

Optimization | Monthly Savings | Annual Savings
Killed zombie resources | $3,400 | $40,800
Scheduled dev/staging | $2,100 | $25,200
Compute right-sizing | $4,931 | $59,172
Reserved instances | $2,200 | $26,400
Network optimization | $4,086 | $49,032
Storage optimization | $1,800 | $21,600
Total Savings | $18,517 | $222,204

Wait, that's more than we said!

Yes. Gross savings were $18,517/month. But:

  • Reserved instance upfront costs amortized
  • Some new costs (CloudFront CDN)
  • Monitoring improvements added
  • New auto-scaling configuration

Net savings: $14,000/month = $168,000 annually

New monthly AWS cost: $8,000 (down from $22,000)
Reduction: 64%
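The gap between gross and net savings can be reconstructed from the figures in this section. This short calculation uses only numbers stated in the article; the individual offsets (CDN, monitoring, auto-scaling, amortized upfront payments) are not itemized, so only their combined total can be derived.

```python
# Monthly gross savings per optimization, taken from the results table.
gross_by_optimization = {
    "Killed zombie resources": 3_400,
    "Scheduled dev/staging": 2_100,
    "Compute right-sizing": 4_931,
    "Reserved instances": 2_200,
    "Network optimization": 4_086,
    "Storage optimization": 1_800,
}
bill_before, bill_after = 22_000, 8_000

gross = sum(gross_by_optimization.values())
net = bill_before - bill_after
offsets = gross - net  # combined new costs plus amortized reservation payments

print(f"Gross ${gross:,}/mo, net ${net:,}/mo (${net * 12:,}/yr), "
      f"offsets ${offsets:,}/mo, reduction {net / bill_before:.0%}")
```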

Performance Impact

Surprisingly, performance improved:

  • CloudFront CDN made site faster (static assets)
  • Better database configuration reduced query times
  • Auto-scaling meant resources matched demand
  • No degradation from right-sizing (we monitored carefully)

Reliability improved:

  • Implemented proper auto-scaling (didn't have before)
  • Better monitoring (could see issues earlier)
  • Disaster recovery actually tested (wasn't before)

The Ongoing Optimization Process

One-time optimization isn't enough. We implemented ongoing practices:

Monthly Cost Reviews

Cost dashboard in CloudWatch:

  • Daily spend by service
  • Trending over time
  • Anomaly alerts (>20% unexpected increase)
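The anomaly rule is easy to state precisely. A sketch, assuming daily spend figures are already available (for example from a billing export) and that "unexpected" means more than 20% above the trailing seven-day average; the seven-day window is an assumption, not something the article specifies.

```python
from statistics import mean

def is_spend_anomaly(history: list[float], today: float,
                     threshold: float = 0.20, window: int = 7) -> bool:
    """Return True if today's spend exceeds the trailing `window`-day average by more than `threshold`."""
    if len(history) < window:
        return False  # too little history to establish a baseline
    baseline = mean(history[-window:])
    return today > baseline * (1 + threshold)

if __name__ == "__main__":
    last_week = [265.0, 270.0, 262.0, 268.0, 259.0, 240.0, 238.0]  # hypothetical daily USD
    print(is_spend_anomaly(last_week, 340.0))
```

A trailing average adapts as the bill legitimately grows; a fixed budget line would either fire constantly during growth or miss spikes after a large cut like this one.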

Monthly team review:

  • Spending by project
  • New resources added
  • Optimization opportunities
  • Forecasting

Tagging Policy

All resources must have:

  • Environment (prod/staging/dev)
  • Owner (team)
  • Project
  • Cost-center

Enforcement: Automated tag compliance checker, alerts on untagged resources
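A tag compliance checker needs little more than the required keys and, optionally, the allowed values. A minimal sketch, assuming resource tags have already been collected into a dict per resource ID (for instance via the AWS tagging APIs); the resource IDs and the exact `Cost-center` key spelling are illustrative.

```python
REQUIRED_TAGS = ("Environment", "Owner", "Project", "Cost-center")
ALLOWED_ENVIRONMENTS = {"prod", "staging", "dev"}

def tag_violations(tags: dict[str, str]) -> list[str]:
    """List the policy problems for one resource's tags (an empty list means compliant)."""
    problems = [f"missing tag: {key}" for key in REQUIRED_TAGS if not tags.get(key)]
    env = tags.get("Environment")
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"invalid Environment value: {env}")
    return problems

def compliance_report(resources: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map each non-compliant resource ID to its problems; compliant resources are omitted."""
    report = {}
    for resource_id, tags in resources.items():
        problems = tag_violations(tags)
        if problems:
            report[resource_id] = problems
    return report

if __name__ == "__main__":
    inventory = {  # hypothetical resource IDs and tags
        "i-0a1b2c": {"Environment": "prod", "Owner": "platform",
                     "Project": "api", "Cost-center": "eng"},
        "i-0d4e5f": {"Environment": "qa", "Owner": "data"},
    }
    for resource_id, problems in compliance_report(inventory).items():
        print(resource_id, "->", "; ".join(problems))
```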

Right-Sizing Automation

Automated recommendations:

  • Weekly reports on underutilized resources
  • Rightsizing suggestions based on usage
  • Scheduled instance reviews

Cost Allocation

Showback reporting:

  • Each team sees their AWS costs
  • Creates accountability
  • Encourages cost-conscious decisions
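Showback reporting is essentially a group-by on the Owner tag. A minimal sketch, assuming cost line items have been exported (for instance from AWS billing data) into records with a `cost` and a `tags` dict; that record shape is hypothetical. Untagged spend is kept as its own bucket rather than dropped, so gaps in tagging stay visible.

```python
from collections import defaultdict

def showback(line_items: list[dict]) -> dict[str, float]:
    """Sum cost per owning team, largest first; items without an Owner tag go to 'UNTAGGED'."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get("Owner") or "UNTAGGED"
        totals[owner] += item["cost"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

if __name__ == "__main__":
    items = [  # hypothetical monthly line items
        {"cost": 120.0, "tags": {"Owner": "platform"}},
        {"cost": 80.0, "tags": {"Owner": "data"}},
        {"cost": 30.0, "tags": {"Owner": "platform"}},
        {"cost": 15.0},
    ]
    for team, cost in showback(items).items():
        print(f"{team}: ${cost:,.2f}")
```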

Architecture Review

Before new deployments:

  • Cost estimation required
  • Architecture review for cost efficiency
  • Auto-scaling plans
  • Reserved instance strategy

ROI Analysis

Investment

Audit and optimization project: $12,000
Reserved instance upfront: $8,400
Migration and testing time: $6,200
Total investment: $26,600

Returns

Annual savings: $168,000
Payback period: 1.9 months (on the full $26,600 investment)
3-year ROI: about 1,795%
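The payback and ROI figures follow from two standard formulas: payback is investment divided by monthly savings, and ROI is net gain over the period divided by investment. Using only the numbers stated in this section:

```python
investment = {
    "Audit and optimization project": 12_000,
    "Reserved instance upfront": 8_400,
    "Migration and testing time": 6_200,
}
annual_savings = 168_000

total_investment = sum(investment.values())
payback_months = total_investment / (annual_savings / 12)  # investment / monthly savings
roi_3yr = (3 * annual_savings - total_investment) / total_investment  # net gain / investment

print(f"Investment ${total_investment:,}; payback {payback_months:.1f} months; "
      f"3-year ROI {roi_3yr * 100:,.0f}%")
```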

But the real value:

  • Infrastructure costs dropped from 34% of revenue to 12%
  • Sustainable cost model as they scale
  • Better performance and reliability
  • Culture of cost consciousness

Lessons for Other SaaS Companies

1. Cloud costs don't optimize themselves

"We'll optimize later" turns into massive waste. Build cost discipline from day one.

2. Most companies overprovision by 50-70%

Engineers default to "better safe than sorry." Monitor actual usage and right-size accordingly.

3. Development/staging should never run 24/7

Unless you have developers working around the clock (you don't), turn them off nights and weekends. 65% savings instantly.

4. Tagging is mandatory, not optional

Can't optimize what you can't attribute. Tag everything from day one.

5. Reserved instances for steady state, on-demand for burst

1-year reserved instances for baseline, auto-scale on-demand for traffic spikes. Best of both worlds.

6. Data transfer is the hidden killer

Cross-AZ traffic, missing CDN, inefficient APIs—these add up fast. Architectural decisions have cost implications.

7. Zombie resources accumulate fast

Without discipline, old experiments and forgotten instances pile up. Regular cleanup is essential.

When This Approach Works vs. Doesn't

This works well when:

  • Your AWS bill is over $5K/month (smaller bills have less optimization potential)
  • You haven't done comprehensive cost optimization in 12+ months
  • Engineers build without cost constraints
  • You have no tagging strategy
  • Development/staging environments run 24/7
  • You're growing faster than you're optimizing

This probably won't work if you:

  • Already have rigorous cost discipline
  • Recently did comprehensive optimization
  • Have an AWS bill under $2K/month (harder to get meaningful savings)
  • Already run a highly optimized architecture

ROI indicators this is worth doing:

  • AWS costs growing faster than customer base
  • Infrastructure >20% of revenue
  • No idea what specific resources cost
  • No reserved instances or savings plans
  • Resources running underutilized

The Bottom Line

CloudCo spent $12,000 on the optimization project (plus $14,600 in reserved instance prepayments and migration time) to reduce their annual AWS costs by $168,000.

But here's what they really got:

  • Sustainable infrastructure cost model (12% of revenue vs. 34%)
  • Better performance from architectural improvements
  • Cost discipline and ongoing optimization processes
  • Foundation to scale profitably

The question isn't "can we afford cloud cost optimization?"

The question is: "how much are we wasting on cloud resources right now?"

For most SaaS companies with $10K+ monthly AWS bills, the answer is: "at least 40-60%, probably more."

We're Thalamus. Enterprise capability without enterprise gatekeeping.

If your AWS bill makes you wince but you don't know where to start, we should talk. Not because we're definitely the right answer, but because we might help you calculate what poor cloud hygiene is actually costing you.

Sometimes the most valuable consulting is discovering you're running $3,400/month of resources nobody can explain.
