Real cloud cost reduction project for a SaaS company. Identified and eliminated $168K annual waste through rightsizing, reserved instances, and architectural changes. Complete breakdown of what worked.

Cloud Cost Optimization: From $22K to $8K Monthly

Let's talk about the cloud cost problem nobody warns you about: your AWS bill started at $800/month, which seemed reasonable. Two years later it's $22,000/month and nobody knows exactly why or how to fix it.

We worked with a 40-person SaaS company (call them CloudCo) whose AWS costs had grown 380% year-over-year while their customer base grew only 140%. The math didn't work: infrastructure costs were eating 34% of revenue when the industry standard is 10-15%.

The CTO knew they were overspending but infrastructure had grown organically over three years. Multiple engineers making decisions in isolation, no central oversight, no cost optimization focus during growth mode. "We'll optimize later" became "we're now spending $264,000 annually on AWS."

This is the story of a 6-week cloud cost optimization project that reduced their monthly AWS spend from $22,000 to $8,000—a 64% reduction—while actually improving performance and reliability.

Annual savings: $168,000
Project cost: $12,000
Payback period: 26 days (measured against the $12,000 project cost alone)

Here's exactly what we found and how we fixed it.

The $22K Monthly Bill Breakdown

First, we had to understand where the money was going.

Cost by Service (Initial State)

AWS Service	Monthly Cost	% of Total	Our Reaction
EC2 (compute)	$8,400	38%	Expected
RDS (databases)	$5,200	24%	High but reasonable
S3 (storage)	$1,800	8%	Should be cheaper
Data Transfer	$3,200	15%	Way too high
NAT Gateway	$1,600	7%	Outrageous
CloudWatch	$400	2%	Normal
Other services	$1,400	6%	Various
Total	$22,000	100%	Fixable
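
The percentage column is just each line item divided by the total. As a minimal sketch (the figures are copied from the table; the function name is ours, for illustration only), the same breakdown can be reproduced from any cost export:

```python
# Monthly cost per service in USD, copied from the table above.
monthly_cost = {
    "EC2 (compute)": 8400,
    "RDS (databases)": 5200,
    "S3 (storage)": 1800,
    "Data Transfer": 3200,
    "NAT Gateway": 1600,
    "CloudWatch": 400,
    "Other services": 1400,
}


def cost_shares(costs):
    """Return each service's share of the total bill as a rounded percentage."""
    total = sum(costs.values())
    return {name: round(100 * amount / total) for name, amount in costs.items()}


total = sum(monthly_cost.values())
shares = cost_shares(monthly_cost)

# Print the largest line items first, since that is where to look first.
for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name:<18} ${monthly_cost[name]:>6,}  {pct:>3}%")
print(f"{'Total':<18} ${total:>6,}")
```

Sorting by share makes it obvious that compute, databases, and data transfer together account for roughly three quarters of the bill.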

The Red Flags

EC2 instances running 24/7 including:

Development environments (unused nights and weekends)
Staging environment (rarely used, always on)
Production instances oversized for actual load
Zombie instances (nobody knew what they did)

Database overprovisioning:

Production RDS: db.r5.4xlarge ($3,100/month)
Actual usage: 15% CPU, 40% memory
Staging RDS: db.r5.2xlarge ($1,550/month)
Staging usage: <5% (barely used)

Data transfer costs ($3,200/month):

Cross-AZ traffic (architectural problem)
Unoptimized API responses (sending too much data)
No CDN (serving static assets from EC2)

NAT Gateway waste ($1,600/month):

Three NAT Gateways (one per AZ)
Minimal traffic through them
Could have used single NAT Gateway

S3 storage inefficiency:

Everything in Standard tier
Logs stored forever (compliance didn't require it)
No lifecycle policies
Uncompressed backups

The Optimization Process: 6 Weeks

Week 1: Audit and Analysis

Tagged all resources:

Environment (prod, staging, dev)
Owner (which team)
Project
(Many resources had no tags at all, a hint that they might be orphaned.)
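
A tag audit like this boils down to comparing each resource's tags against a required set. The sketch below is illustrative, not the tooling used on the project: the inventory records and IDs are hypothetical, and only the Key/Value tag shape is modeled on what EC2 returns.

```python
REQUIRED_TAGS = {"Environment", "Owner", "Project"}


def missing_tags(resource, required=REQUIRED_TAGS):
    """Return the required tag keys that a resource lacks."""
    present = {tag["Key"] for tag in resource.get("Tags", [])}
    return required - present


def audit(resources):
    """Map resource ID to its sorted missing tag keys (compliant resources are omitted)."""
    report = {}
    for resource in resources:
        gaps = missing_tags(resource)
        if gaps:
            report[resource["ResourceId"]] = sorted(gaps)
    return report


# Hypothetical inventory; the IDs and tag values are made up for illustration.
inventory = [
    {
        "ResourceId": "i-aaa",
        "Tags": [
            {"Key": "Environment", "Value": "prod"},
            {"Key": "Owner", "Value": "platform"},
            {"Key": "Project", "Value": "api"},
        ],
    },
    {"ResourceId": "i-bbb", "Tags": [{"Key": "Environment", "Value": "dev"}]},
    {"ResourceId": "i-ccc"},  # no tags at all: a likely orphan
]

report = audit(inventory)
for resource_id, gaps in report.items():
    print(f"{resource_id}: missing {', '.join(gaps)}")
```

Resources that come back missing every required tag are the ones to take to the engineer interviews first.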

Analyzed utilization with CloudWatch:

EC2 CPU/memory usage
RDS performance metrics
Network transfer patterns
Storage growth trends

Interviewed engineers:

What's this instance for?
Why this size?
Can we turn it off nights/weekends?
What happens if we reduce it?

Found:

23 instances nobody could explain
8 development environments running 24/7
4 forgotten experiment environments
Massive overprovisioning across the board

Cost: $3,200 (audit time)

Week 2: Quick Wins (Low-Hanging Fruit)

Killed zombie resources:

23 unexplained instances: terminated
4 experimental environments: deleted
6 old load balancers: removed
12 unused EBS volumes: deleted
Immediate savings: $3,400/month

Scheduled dev/staging environments:

Auto-stop at 7pm, auto-start at 7am weekdays
Off completely on weekends
Used AWS Instance Scheduler
Immediate savings: $2,100/month (65% time reduction)
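
The 65% figure follows directly from the schedule. A quick check, assuming a 7am to 7pm window on five weekdays (function and variable names are ours):

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_on_hours(start_hour, stop_hour, days_per_week):
    """Hours per week an environment runs under a fixed daily schedule."""
    return (stop_hour - start_hour) * days_per_week


on_hours = weekly_on_hours(start_hour=7, stop_hour=19, days_per_week=5)
off_fraction = 1 - on_hours / HOURS_PER_WEEK

print(f"Running {on_hours} of {HOURS_PER_WEEK} hours per week")
print(f"Instance hours eliminated: {off_fraction:.0%}")
```

This covers instance hours only. Attached EBS volumes keep billing while an instance is stopped, so the saving on the total bill for those environments will be somewhat lower than the reduction in hours.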

S3 lifecycle policies:

Logs to S3 Glacier after 90 days
Delete logs after 2 years
Standard → Infrequent Access for old data
Immediate savings: $600/month
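
For readers who have not written a lifecycle policy before, here is roughly what the log rule looks like as the dictionary that boto3's `put_bucket_lifecycle_configuration` accepts in its `LifecycleConfiguration` argument. The rule ID and the `logs/` prefix are assumptions for illustration; the 90-day and 2-year values come from the list above. The Standard to Infrequent Access rule is left out because the article does not say how old "old data" is.

```python
import json

# One rule: move logs to Glacier after 90 days, delete them after 2 years.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",   # hypothetical rule name
            "Filter": {"Prefix": "logs/"},      # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 2 * 365},
        }
    ]
}

# Printed here rather than applied; applying it would be a call such as
# s3_client.put_bucket_lifecycle_configuration(
#     Bucket="<your-bucket>", LifecycleConfiguration=lifecycle_configuration)
print(json.dumps(lifecycle_configuration, indent=2))
```

Keeping the policy as data makes it easy to review and to apply to every bucket that holds logs, rather than clicking through the console bucket by bucket.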

Total week 2 savings: $6,100/month

Cost: $1,800 (implementation)

Week 3: Right-Sizing Compute

Analyzed actual resource usage:

Production EC2: 18 instances ranging from t3.medium to c5.2xlarge
Average CPU utilization: 12-18%
Average memory utilization: 35-45%
Massively overprovisioned

Right-sizing decisions (example):

Web servers:

Was: 6× c5.xlarge (4 vCPU, 8GB RAM) = $1,032/month
Actually needed: c5.large (2 vCPU, 4GB RAM)
New: 6× c5.large = $516/month
Savings: $516/month (50%)

Application servers:

Was: 4× c5.2xlarge (8 vCPU, 16GB RAM) = $1,104/month
Peak usage: 25% CPU, 40% memory
New: 4× c5.xlarge (4 vCPU, 8GB RAM) = $552/month
Savings: $552/month (50%)
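
Before halving an instance it is worth projecting what utilization will look like afterwards. The sketch below assumes the absolute load stays constant and scales linearly with vCPU count and memory, which is a simplification. The 85% ceiling is a hypothetical threshold, not a figure from the project.

```python
def projected_utilization(current_pct, old_capacity, new_capacity):
    """Utilization after resizing, assuming the absolute load does not change."""
    return current_pct * old_capacity / new_capacity


def fits(current_pct, old_capacity, new_capacity, ceiling_pct=85):
    """True if the projected utilization stays at or below the chosen ceiling."""
    return projected_utilization(current_pct, old_capacity, new_capacity) <= ceiling_pct


# Application servers: c5.2xlarge (8 vCPU, 16GB) down to c5.xlarge (4 vCPU, 8GB).
cpu_after = projected_utilization(25, old_capacity=8, new_capacity=4)
mem_after = projected_utilization(40, old_capacity=16, new_capacity=8)

print(f"Projected peak CPU:    {cpu_after:.0f}%")
print(f"Projected peak memory: {mem_after:.0f}%")
print(f"Fits under ceiling:    {fits(25, 8, 4) and fits(40, 16, 8)}")
```

Under these assumptions CPU has plenty of room after the move, while memory becomes the tighter constraint, so memory is the metric to watch most closely after the change.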

Background workers:

Was: 8× t3.large = $536/month
Usage: Sporadic (burst workload)
New: 8× t3.medium = $268/month
Savings: $268/month (50%)

Database right-sizing:

Production RDS:

Was: db.r5.4xlarge (16 vCPU, 128GB) = $3,100/month
Usage: 15% CPU, 40% memory (50GB)
New: db.r5.xlarge (4 vCPU, 32GB) = $775/month
Plus read replica optimization
Savings: $2,325/month (75%!)

Staging/Dev RDS:

Was: db.r5.2xlarge = $1,550/month
New: db.t3.large with auto-stop = $280/month
Savings: $1,270/month (82%)

Total week 3 savings: $4,931/month
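
The week 3 total is the sum of the five changes above. A short tally, using the before and after prices exactly as listed (the structure and names are ours):

```python
# (name, monthly cost before, monthly cost after), in USD, from the lists above.
changes = [
    ("Web servers", 1032, 516),
    ("Application servers", 1104, 552),
    ("Background workers", 536, 268),
    ("Production RDS", 3100, 775),
    ("Staging/Dev RDS", 1550, 280),
]


def savings_pct(before, after):
    """Percentage saved relative to the original monthly cost."""
    return round(100 * (before - after) / before)


total_savings = 0
for name, before, after in changes:
    saved = before - after
    total_savings += saved
    print(f"{name:<20} ${saved:>5,}/month  ({savings_pct(before, after)}%)")

print(f"{'Total':<20} ${total_savings:>5,}/month")
```

Note that the two database changes account for most of the total, even though there were far more EC2 instances than databases.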

Cost: $2,400 (analysis + migration)

Week 4: Reserved Instances & Savings Plans

For resources that must run 24/7:

EC2 Reserved Instances:

Production instances that won't change
1-year terms rather than 3-year (we wanted flexibility)
Partial upfront payment
Discount: 30-40% vs on-demand

RDS Reserved Instances:

Production database
1-year commitment
Discount: 35%

Savings Plans (compute):

$600/month commitment
Flexible across EC2, Fargate, Lambda
Discount: 25% average

Total week 4 savings: $2,200/month

One-time upfront payment: $8,400 for the reserved instances, recovered in under four months at that savings rate.
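
Why reserve only what runs 24/7? A reservation bills for every hour of the term whether or not the instance is running, so it only beats on-demand when the instance runs more than (1 - discount) of the time. The sketch below is a simplified model that ignores the timing of upfront payments. The 35% discount and the $8,400 and $2,200 figures come from this section; everything else is illustrative.

```python
def reserved_break_even(discount):
    """Fraction of hours an instance must run for a reservation to beat on-demand."""
    return 1 - discount


def cheaper_option(run_fraction, discount):
    """Compare costs relative to running on-demand around the clock (cost 1.0)."""
    on_demand_cost = run_fraction   # pay only for the hours actually used
    reserved_cost = 1 - discount    # pay for every hour, at the discounted rate
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


# Production runs 24/7; a scheduled dev environment runs 60 of 168 hours a week.
print("Break-even run fraction:", f"{reserved_break_even(0.35):.0%}")
print("Production (24/7):      ", cheaper_option(1.0, 0.35))
print("Scheduled dev (60/168): ", cheaper_option(60 / 168, 0.35))

# How long the upfront payment takes to recover at the quoted savings rate.
upfront_recovery_months = 8400 / 2200
print(f"Upfront recovered in {upfront_recovery_months:.1f} months")
```

This is also why scheduling dev and staging came before any commitments: reserving instances that are about to be switched off two thirds of the time would lock in waste.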

Cost: $1,200 (planning + procurement)

Week 5: Network Architecture Optimization

Data transfer analysis:

$3,200/month in transfer costs
60% was cross-AZ traffic (avoidable)
25% was serving static assets from EC2 (should be CDN)
15% was inefficient API responses (sending too much data)
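
Turning those percentages into dollars shows which fix is worth the most. A small calculation using only the numbers above (names are ours):

```python
TRANSFER_TOTAL = 3200  # USD per month, from the bill breakdown

# Share of data transfer cost by cause, in percent, from the analysis above.
cause_share_pct = {
    "cross-AZ traffic": 60,
    "static assets served from EC2": 25,
    "oversized API responses": 15,
}

transfer_breakdown = {
    cause: TRANSFER_TOTAL * pct / 100 for cause, pct in cause_share_pct.items()
}

for cause, dollars in transfer_breakdown.items():
    print(f"{cause:<32} ${dollars:>7,.0f}/month")
```

These three amounts are the same figures that appear as the savings for each architectural change in this section.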

Architectural changes:

Reduced cross-AZ traffic:

Moved database readers to same AZ as application servers
Implemented read replica routing by AZ
Savings: $1,920/month

Implemented CloudFront CDN:

Static assets (images, CSS, JS) via CloudFront
Removed load from EC2
CDN cost: $180/month
EC2 data transfer savings: $800/month
Net savings: $620/month

API response optimization:

Many API endpoints returning full objects when only fields needed
Implemented field selection
Reduced payload sizes 40-70%
Savings: $480/month in data transfer
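
Field selection can be as simple as letting the client name the fields it wants. This is a minimal sketch of the idea, not CloudCo's implementation: the `fields` query parameter convention, the function names, and the sample record are all hypothetical.

```python
import json


def parse_fields_param(raw):
    """Parse a comma-separated value such as 'id, name' into a list of field names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def select_fields(record, fields):
    """Return only the requested keys from a record; unknown names are ignored."""
    return {key: record[key] for key in fields if key in record}


# Hypothetical full object, as an endpoint might return it by default.
user = {
    "id": 42,
    "name": "Ada",
    "email": "ada@example.com",
    "bio": "x" * 500,
    "preferences": {"theme": "dark", "locale": "en-GB"},
}

# What a list view would ask for, e.g. GET /users?fields=id,name
slim = select_fields(user, parse_fields_param("id, name"))

full_size = len(json.dumps(user))
slim_size = len(json.dumps(slim))
print(f"Full payload: {full_size} bytes, selected fields: {slim_size} bytes")
```

The saving on a real API depends entirely on how heavy the omitted fields are, which is why the reduction reported above is a range rather than a single number.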

NAT Gateway consolidation:

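Was: 3 NAT Gateways (one per AZ, more redundancy than they needed)
New: 1 NAT Gateway (good enough for their needs; the trade-off is that an outage in its AZ cuts outbound internet access for every private subnet)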
Savings: $1,066/month

Total week 5 savings: $4,086/month

Cost: $2,200 (architecture changes)

Week 6: Storage Optimization

S3 comprehensive cleanup:

Implemented lifecycle policies across all buckets
Compressed backups (reduced size 60%)
Deleted truly unnecessary data
Moved infrequent access to IA tier
Savings: $1,200/month

EBS volume optimization:

Right-sized EBS volumes (many were oversized)
Moved some to gp2 instead of io1 (didn't need IOPS)
Deleted orphaned snapshots (hundreds of them)
Savings: $600/month

Total week 6 savings: $1,800/month

Cost: $1,200 (cleanup work)

Results: From $22K to $8K Monthly

Total Savings Breakdown

Optimization	Monthly Savings	Annual Savings
Killed zombie resources	$3,400	$40,800
Scheduled dev/staging	$2,100	$25,200
Compute right-sizing	$4,931	$59,172
Reserved instances	$2,200	$26,400
Network optimization	$4,086	$49,032
Storage optimization	$1,800	$21,600
Total Savings	$18,517	$222,204

Wait, that's more than we said!

Yes. Gross savings were $18,517/month. But:

Reserved instance upfront costs amortized
Some new costs (CloudFront CDN)
Monitoring improvements added
New auto-scaling configuration

Net savings: $14,000/month = $168,000 annually

New monthly AWS cost: $8,000 (down from $22,000)
Reduction: 64%
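
To reconcile the two numbers: the table sums to the gross figure, the net figure is the difference between the old and new bills, and the gap is the combined size of the offsets listed above. The article does not break the offsets down individually, so the sketch below only derives their total.

```python
# Gross monthly savings by category, in USD, from the table above.
gross_savings = {
    "Killed zombie resources": 3400,
    "Scheduled dev/staging": 2100,
    "Compute right-sizing": 4931,
    "Reserved instances": 2200,
    "Network optimization": 4086,
    "Storage optimization": 1800,
}

OLD_BILL = 22000
NEW_BILL = 8000

gross_monthly = sum(gross_savings.values())
net_monthly = OLD_BILL - NEW_BILL
offsets_monthly = gross_monthly - net_monthly  # amortized RIs, CDN, monitoring, etc.
reduction_pct = round(100 * net_monthly / OLD_BILL)

print(f"Gross savings: ${gross_monthly:,}/month (${gross_monthly * 12:,}/year)")
print(f"Net savings:   ${net_monthly:,}/month (${net_monthly * 12:,}/year)")
print(f"Offsets:       ${offsets_monthly:,}/month")
print(f"Reduction:     {reduction_pct}%")
```

Reporting the net figure is the honest choice: it is the number that shows up on the invoice.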

Performance Impact

Surprisingly, performance improved:

CloudFront CDN made site faster (static assets)
Better database configuration reduced query times
Auto-scaling meant resources matched demand
No degradation from right-sizing (we monitored carefully)

Reliability improved:

Implemented proper auto-scaling (didn't have before)
Better monitoring (could see issues earlier)
Disaster recovery actually tested (wasn't before)

The Ongoing Optimization Process

One-time optimization isn't enough. We implemented ongoing practices:

Monthly Cost Reviews

Cost dashboard in CloudWatch:

Daily spend by service
Trending over time
Anomaly alerts (>20% unexpected increase)
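
The anomaly rule is easy to express: compare today's spend with a recent baseline and alert when it is more than 20% higher. A minimal sketch, assuming a trailing seven-day average as the baseline (the article does not specify one) and using made-up daily figures close to $8,000/month:

```python
from statistics import mean


def is_anomalous(history, today, threshold=0.20):
    """Flag today's spend if it exceeds the trailing average by more than the threshold."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history)
    return today > baseline * (1 + threshold)


# Hypothetical daily spend in USD for the previous seven days.
last_week = [262, 270, 265, 268, 271, 260, 266]

for todays_spend in (275, 340):
    status = "ALERT" if is_anomalous(last_week, todays_spend) else "ok"
    print(f"${todays_spend}: {status}")
```

A trailing average is deliberately simple. Workloads with strong weekday and weekend patterns may need to compare against the same day of the previous week instead.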

Monthly team review:

Spending by project
New resources added
Optimization opportunities
Forecasting

Tagging Policy

All resources must have:

Environment (prod/staging/dev)
Owner (team)
Project
Cost-center

Enforcement: Automated tag compliance checker, alerts on untagged resources

Right-Sizing Automation

Automated recommendations:

Weekly reports on underutilized resources
Rightsizing suggestions based on usage
Scheduled instance reviews

Cost Allocation

Showback reporting:

Each team sees their AWS costs
Creates accountability
Encourages cost-conscious decisions

Architecture Review

Before new deployments:

Cost estimation required
Architecture review for cost efficiency
Auto-scaling plans
Reserved instance strategy

ROI Analysis

Investment

Audit and optimization project: $12,000
Reserved instance upfront: $8,400
Migration and testing time: $6,200
Total investment: $26,600

Returns

Annual savings: $168,000
Payback period: 1.9 months (measured against the full $26,600 investment)
3-year ROI: 1,795%
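
The 26-day payback quoted at the top of this article and the 1.9 months quoted here use different bases: the first counts only the $12,000 project cost, the second counts the full investment. The sketch below works through both, plus the three-year return, using the figures in this section (function names are ours).

```python
def payback_months(investment, monthly_savings):
    """Months of savings needed to recover an investment."""
    return investment / monthly_savings


def roi_percent(investment, annual_savings, years):
    """Net gain over the period as a percentage of the investment."""
    net_gain = annual_savings * years - investment
    return 100 * net_gain / investment


total_investment = 12_000 + 8_400 + 6_200
monthly_savings = 168_000 / 12

full_payback = payback_months(total_investment, monthly_savings)
project_only_days = payback_months(12_000, monthly_savings) * 365 / 12
three_year_roi = roi_percent(total_investment, 168_000, years=3)

print(f"Total investment:        ${total_investment:,}")
print(f"Payback (full):          {full_payback:.1f} months")
print(f"Payback (project only):  {project_only_days:.0f} days")
print(f"3-year ROI:              {three_year_roi:,.0f}%")
```

Whichever base you prefer, the conclusion is the same: the project paid for itself within the first quarter.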

But the real value:

Infrastructure costs dropped from 34% of revenue to 12%
Sustainable cost model as they scale
Better performance and reliability
Culture of cost consciousness

Lessons for Other SaaS Companies

1. Cloud costs don't optimize themselves

"We'll optimize later" turns into massive waste. Build cost discipline from day one.

2. Most companies overprovision by 50-70%

Engineers default to "better safe than sorry." Monitor actual usage and right-size accordingly.

3. Development/staging should never run 24/7

Unless you have developers working around the clock (you don't), turn these environments off on nights and weekends. That alone cuts their running hours by about 65%.

4. Tagging is mandatory, not optional

Can't optimize what you can't attribute. Tag everything from day one.

5. Reserved instances for steady state, on-demand for burst

1-year reserved instances for baseline, auto-scale on-demand for traffic spikes. Best of both worlds.

6. Data transfer is the hidden killer

Cross-AZ traffic, missing CDN, inefficient APIs—these add up fast. Architectural decisions have cost implications.

7. Zombie resources accumulate fast

Without discipline, old experiments and forgotten instances pile up. Regular cleanup is essential.

When This Approach Works vs. Doesn't

This works well when:

Your AWS bill is over $5K/month (smaller bills have less optimization potential)
You haven't done comprehensive cost optimization in 12+ months
Engineers build without cost constraints
You have no tagging strategy
Development/staging environments run 24/7
You're growing faster than you're optimizing

This probably won't work if:

You already have rigorous cost discipline
You recently did a comprehensive optimization
Your AWS bill is under $2K/month (harder to get meaningful savings)
Your architecture is already highly optimized

ROI indicators this is worth doing:

AWS costs growing faster than customer base
Infrastructure >20% of revenue
No idea what specific resources cost
No reserved instances or savings plans
Resources running underutilized

The Bottom Line

CloudCo spent $12,000 on optimization (plus $14,600 in reserved-instance prepayment and migration time) to reduce their annual AWS costs by $168,000.

But here's what they really got:

Sustainable infrastructure cost model (12% of revenue vs. 34%)
Better performance from architectural improvements
Cost discipline and ongoing optimization processes
Foundation to scale profitably

The question isn't "can we afford cloud cost optimization?"

The question is: "how much are we wasting on cloud resources right now?"

For most SaaS companies with $10K+ monthly AWS bills, the answer is: "at least 40-60%, probably more."

We're Thalamus. Enterprise capability without enterprise gatekeeping.

If your AWS bill makes you wince but you don't know where to start, we should talk. Not because we're definitely the right answer, but because we might help you calculate what poor cloud hygiene is actually costing you.

Sometimes the most valuable consulting is discovering you're running $3,400/month of resources nobody can explain.
