
Microservices at Mid-Market Scale: Architecture Breakdown

The complete technical architecture of a microservices implementation at a 100-person SaaS company: real service boundaries, inter-service communication patterns, data management strategies, the $310K build cost, operational overhead, and when monoliths are actually better.

January 18, 2025
14 min read
By Thalamus AI

Let's be honest: microservices are oversold. Every developer wants to build them because they're "modern" and "scalable," but most mid-market companies don't need the complexity and shouldn't pay the operational overhead.

This is the story of a 100-person B2B SaaS company—call them DataFlow Systems—that moved from a monolith to microservices. Complete technical architecture, service decomposition strategy, inter-service communication patterns, data management decisions, and the honest truth about whether it was worth the $310K investment and ongoing operational complexity.

Spoiler: Sometimes it is. Sometimes it isn't. Here's how to know the difference.

The Company & The Problem

DataFlow Systems Profile:

  • $18M ARR, growing 60% YoY
  • 100 employees (45 engineering)
  • Product: Data integration platform (competes with Fivetran, Airbyte)
  • 850 customers, 5,000+ data sources
  • Monolithic Rails application (started 2018)

Why They Considered Microservices (Mid-2022):

  1. Deployment friction: 200+ deployments per month, each requires full app deployment, frequent conflicts
  2. Team scaling: 45 engineers stepping on each other in monolithic codebase
  3. Performance bottlenecks: ETL jobs slowing down API responses for UI
  4. Organizational structure: Teams organized by function (connectors, API, UI) but all in one codebase
  5. Technology constraints: Wanted to use Go for performance-critical ETL, stuck with Ruby

The honest trigger: "Netflix uses microservices" (every bad reason rolled into one)

CTO's concern: "Are we doing this because we need to, or because developers want to pad their resumes?"

Fair question. Let's examine the architecture.

The Architecture: Service Boundaries & Decisions

Original Monolith

┌─────────────────────────────────────────┐
│            Rails Monolith               │
│  ┌───────────────────────────────────┐  │
│  │  Web UI (React SPA)               │  │
│  ├───────────────────────────────────┤  │
│  │  API Layer (Rails controllers)    │  │
│  ├───────────────────────────────────┤  │
│  │  Business Logic (Services)        │  │
│  ├───────────────────────────────────┤  │
│  │  Data Access (ActiveRecord)       │  │
│  └───────────────────────────────────┘  │
│                    ↓                    │
│  ┌───────────────────────────────────┐  │
│  │  PostgreSQL Database              │  │
│  └───────────────────────────────────┘  │
│                    ↓                    │
│  ┌───────────────────────────────────┐  │
│  │  Background Jobs (Sidekiq)        │  │
│  │  - ETL execution                  │  │
│  │  - Data transformations           │  │
│  │  - Notifications                  │  │
│  └───────────────────────────────────┘  │
└─────────────────────────────────────────┘

What worked:

  • Simple deployment model
  • Easy local development
  • No inter-service communication overhead
  • Transactions work across entire system

What broke:

  • 45 engineers editing same codebase
  • ETL jobs consuming all workers, starving other background jobs
  • Can't deploy connectors without deploying entire app
  • Scaling means scaling everything (can't scale just ETL layer)

Target Microservices Architecture

After 6 weeks of domain-driven design workshops:

```mermaid
graph TB
    A[API Gateway<br/>Node.js] --> B[Auth Service<br/>Go]
    A --> C[Connector Service<br/>Go]
    A --> D[Pipeline Service<br/>Go]
    A --> E[User Management<br/>Rails]
    A --> F[Billing Service<br/>Rails]

    C --> G[(Connector Registry<br/>PostgreSQL)]
    D --> H[(Pipeline State<br/>PostgreSQL)]
    E --> I[(Users/Orgs<br/>PostgreSQL)]
    F --> J[(Billing Data<br/>PostgreSQL)]

    K[ETL Workers<br/>Go] --> D
    K --> L[(Task Queue<br/>RabbitMQ)]

    M[Event Bus<br/>Kafka] --> C
    M --> D
    M --> F

    N[Frontend<br/>React] --> A
```

Service Decomposition Strategy

Final services (12 total):

  1. API Gateway (Node.js)

    • Single entry point
    • Request routing
    • Rate limiting
    • Request/response transformation
  2. Auth Service (Go)

    • Authentication (OAuth, SSO)
    • Authorization (RBAC)
    • JWT token generation
    • Session management
  3. User Management Service (Rails) - kept existing code

    • User/organization CRUD
    • Team management
    • User preferences
  4. Connector Service (Go) - rewritten for performance

    • Connector registry (550+ data sources)
    • Connector configuration
    • Connection testing
    • Credential management (encrypted)
  5. Pipeline Service (Go) - rewritten

    • Pipeline configuration
    • Scheduling
    • State management
    • Orchestration
  6. ETL Workers (Go) - rewritten, horizontally scalable

    • Data extraction
    • Transformations
    • Loading
    • Error handling
  7. Billing Service (Rails) - kept existing

    • Subscription management
    • Usage tracking
    • Invoice generation
    • Payment processing (Stripe integration)
  8. Notification Service (Go)

    • Email notifications
    • Webhook delivery
    • Alert management
    • Delivery retries
  9. Audit Service (Go)

    • Compliance logging
    • User activity tracking
    • System event logging
  10. Reporting Service (Python)

    • Analytics aggregation
    • Dashboard data
    • Export generation
  11. Search Service (Go + Elasticsearch)

    • Connector search
    • Pipeline search
    • Log search
  12. Admin Service (Rails)

    • Internal admin tools
    • Customer support features
    • Feature flags

Service Communication Patterns

Synchronous (HTTP/REST):

  • API Gateway → All services
  • Frontend → API Gateway only
  • Service-to-service for queries (rare)

Asynchronous (Event-driven via Kafka):

  • Pipeline events: created, started, completed, failed
  • Connector events: tested, configured
  • User events: created, deleted
  • Billing events: subscription changed, usage recorded

Message Queue (RabbitMQ):

  • ETL task distribution to workers
  • Retry logic for failed tasks
  • Priority queuing

Design Decision:

  • Synchronous for queries (need immediate response)
  • Asynchronous for commands/events (eventual consistency OK)
  • Message queue for work distribution (durable, retry-able)
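To make that split concrete, here's a minimal Go sketch of both halves of the decision: a blocking HTTP query that fails fast, and a fire-and-forget Kafka event. It uses the segmentio/kafka-go client; the URLs, topic, and event names are illustrative, not DataFlow's actual code.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"

	"github.com/segmentio/kafka-go"
)

// Synchronous query: the caller blocks because it needs the answer now,
// so it gets a short timeout and fails fast. (Hypothetical endpoint,
// shown for contrast; not called below.)
func getPipelineStatus(ctx context.Context, runID string) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet,
		"http://pipeline-service/internal/runs/"+runID, nil)
	if err != nil {
		return nil, err
	}
	client := &http.Client{Timeout: 2 * time.Second}
	return client.Do(req)
}

// Asynchronous event: a fire-and-forget fact about something that already
// happened. Billing and Notifications consume it on their own schedule.
func publishPipelineCompleted(ctx context.Context, w *kafka.Writer, runID string) error {
	payload, err := json.Marshal(map[string]string{
		"event":  "PipelineRunCompleted",
		"run_id": runID,
	})
	if err != nil {
		return err
	}
	return w.WriteMessages(ctx, kafka.Message{
		Key:   []byte(runID), // same run always lands on the same partition
		Value: payload,
	})
}

func main() {
	w := &kafka.Writer{Addr: kafka.TCP("kafka:9092"), Topic: "pipeline-events"}
	defer w.Close()
	if err := publishPipelineCompleted(context.Background(), w, "run-42"); err != nil {
		fmt.Println("publish failed:", err)
	}
}
```

The property that matters: the query caller waits (briefly) because it needs the answer, while the event producer never waits on billing or notifications to do their work.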

Data Management Strategy

Critical decision: Database per service or shared database?

Chosen: Hybrid approach

Separate databases:

  • Connector Service (own PostgreSQL)
  • Pipeline Service (own PostgreSQL)
  • ETL State (own PostgreSQL)
  • Auth Service (own PostgreSQL)

Shared database (legacy Rails):

  • User Management
  • Billing
  • Admin

Why hybrid:

  • Full database isolation too expensive (12 databases to manage)
  • Some services tightly coupled (User Management + Billing)
  • Allowed gradual migration (shared DB for services not yet decomposed)

Data consistency approach:

  • Within service: ACID transactions
  • Across services: Eventual consistency via events
  • Critical flows: Saga pattern for distributed transactions

Example: Pipeline Execution Flow

When user triggers data pipeline:

1. Frontend → API Gateway
2. API Gateway → Auth Service (validate token)
3. API Gateway → Pipeline Service (create pipeline run)
4. Pipeline Service:
   - Write to own database (pipeline_run record)
   - Publish "PipelineRunCreated" event to Kafka
   - Break pipeline into tasks
   - Publish tasks to RabbitMQ
5. ETL Workers (multiple instances):
   - Consume tasks from RabbitMQ
   - Execute ETL logic
   - Update state in Pipeline Service (HTTP)
   - Publish progress events to Kafka
6. Notification Service:
   - Consumes "PipelineRunCompleted" event
   - Sends email/webhook to user
7. Billing Service:
   - Consumes "PipelineRunCompleted" event
   - Records usage for billing
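Step 4 is the interesting one: a local ACID write, then fan-out to the workers. Here's a condensed Go sketch of that step using the rabbitmq/amqp091-go client; the table name, queue name, and Task shape are invented for illustration, and the Kafka publish (step 4b) is covered by the earlier sketch.

```go
package pipeline

import (
	"context"
	"database/sql"
	"encoding/json"

	amqp "github.com/rabbitmq/amqp091-go"
)

// Task is one unit of work for an ETL worker. (Shape invented for the sketch.)
type Task struct {
	RunID  string `json:"run_id"`
	Step   string `json:"step"` // "extract", "transform", or "load"
	Source string `json:"source"`
}

// StartRun records the run locally, then fans tasks out to the workers.
func StartRun(ctx context.Context, db *sql.DB, ch *amqp.Channel, runID string, tasks []Task) error {
	// Local write is a plain ACID transaction in the service's own database.
	if _, err := db.ExecContext(ctx,
		`INSERT INTO pipeline_runs (id, status) VALUES ($1, 'running')`, runID); err != nil {
		return err
	}

	for _, t := range tasks {
		body, err := json.Marshal(t)
		if err != nil {
			return err
		}
		// Persistent messages survive a broker restart and can be
		// retried by another worker.
		if err := ch.PublishWithContext(ctx,
			"",          // default exchange
			"etl-tasks", // routing key = queue name (invented)
			false, false,
			amqp.Publishing{
				ContentType:  "application/json",
				DeliveryMode: amqp.Persistent,
				Body:         body,
			}); err != nil {
			return err
		}
	}
	return nil
}
```

Note the ordering: the run exists in the service's own database before any task is visible to a worker, so a worker's status update (step 5) always has a row to update.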

Distributed transaction handling:

If a worker fails at step 5 after the event from step 4 has already been published:

  • Saga coordinator detects failure
  • Compensating transaction: Mark pipeline run as failed
  • Publish "PipelineRunFailed" event
  • Notification service sends failure alert
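In code, the compensating step might look like this minimal sketch (same illustrative schema and clients as the sketches above; the coordinator that detects the failure and calls it is omitted):

```go
package pipeline

import (
	"context"
	"database/sql"
	"encoding/json"

	"github.com/segmentio/kafka-go"
)

// compensateFailedRun is the saga's compensating transaction. Work already
// done by other services can't be rolled back, so we record the failure
// locally and broadcast it; downstream consumers react (Notification alerts
// the user, Billing skips usage for the failed run).
func compensateFailedRun(ctx context.Context, db *sql.DB, w *kafka.Writer, runID, reason string) error {
	if _, err := db.ExecContext(ctx,
		`UPDATE pipeline_runs SET status = 'failed', error = $2 WHERE id = $1`,
		runID, reason); err != nil {
		return err
	}
	payload, err := json.Marshal(map[string]string{
		"event":  "PipelineRunFailed",
		"run_id": runID,
		"reason": reason,
	})
	if err != nil {
		return err
	}
	return w.WriteMessages(ctx, kafka.Message{Key: []byte(runID), Value: payload})
}
```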

The Implementation: 14-Month Journey

Phase 1: Planning & Strangler Pattern Setup (3 months)

Months 1-2: Service Boundary Design

  • Domain-driven design workshops
  • Identified bounded contexts
  • Decided service granularity (not too fine, not too coarse)
  • Drew service dependency graph

Month 3: Infrastructure Foundation

  • Kubernetes cluster setup (AWS EKS)
  • CI/CD pipelines (GitHub Actions)
  • Service mesh (Istio)
  • Observability stack (Prometheus, Grafana, Jaeger)
  • Event bus (Kafka)
  • Message queue (RabbitMQ)

Strangler pattern:

  • API Gateway routes each request to either a new service or the legacy monolith
  • Gradual migration, not big bang
  • Can roll back individual services without full rollback
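Stripped to its essence, the strangler gateway is just a reverse proxy with a per-route dial. Here's a standard-library Go sketch of the idea (the real gateway was Node.js behind Istio; the hostnames are invented, and the 5% share matches the Phase 3 rollout steps below):

```go
package main

import (
	"log"
	"math/rand"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func mustProxy(raw string) *httputil.ReverseProxy {
	u, err := url.Parse(raw)
	if err != nil {
		log.Fatal(err)
	}
	return httputil.NewSingleHostReverseProxy(u)
}

func main() {
	monolith := mustProxy("http://rails-monolith:3000")
	connectors := mustProxy("http://connector-service:8080")

	// Dialed up over the rollout: 5% -> 25% -> 50% -> 100%.
	newServiceShare := 0.05

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Only routes that have been extracted are eligible for the new
		// service; everything else still hits the monolith. Rolling back
		// a bad service means flipping one route, not redeploying the app.
		if strings.HasPrefix(r.URL.Path, "/api/connectors") &&
			rand.Float64() < newServiceShare {
			connectors.ServeHTTP(w, r)
			return
		}
		monolith.ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```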

Phase 2: Core Services Extraction (6 months)

Priority order (by value and independence):

  1. ETL Workers (Month 4-5)

    • Biggest pain point
    • Most independent (can extract cleanly)
    • Immediate performance gains
  2. Connector Service (Month 6-7)

    • High value (customer-facing)
    • Clear boundaries
    • Can iterate faster when separate
  3. Pipeline Service (Month 8-9)

    • Orchestrates ETL and depends on the workers (hence extracted after them)
    • Complex state management

Parallel work:

  • Auth Service (Month 4-6) - foundational, all services need it
  • Notification Service (Month 7) - simple, good learning service

Kept in monolith (for now):

  • User Management
  • Billing
  • Admin tools

Why: Tightly coupled, lower value to extract, can wait.

Phase 3: Traffic Migration (3 months)

Gradual rollout:

  • Week 1-2: 5% traffic to microservices
  • Week 3-4: 25%
  • Week 5-6: 50%
  • Week 7-8: 100% for new services, monolith for rest

Canary deployment:

  • Each service deployed to 10% of pods first
  • Monitor error rates, latency, resource usage
  • Roll back if metrics degrade
  • Full rollout if stable

Phase 4: Operational Maturity (2 months)

Observability:

  • Distributed tracing (Jaeger)
  • Centralized logging (ELK stack)
  • Metrics dashboards
  • Alerts and on-call rotation

Resilience:

  • Circuit breakers
  • Retry logic with exponential backoff
  • Rate limiting
  • Bulkheads (resource isolation)
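As a flavor of what "retry logic with exponential backoff" means in practice, here's a minimal standard-library Go sketch; Istio handled much of this at the mesh layer, and the defaults here are illustrative:

```go
package resilience

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

// Retry runs fn up to maxAttempts times. The wait doubles on each failure
// (exponential backoff) and gets random jitter so a fleet of workers
// doesn't hammer a recovering dependency in lockstep.
func Retry(ctx context.Context, maxAttempts int, base time.Duration, fn func() error) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = fn(); err == nil {
			return nil
		}
		backoff := base << attempt                        // base, 2*base, 4*base, ...
		jitter := time.Duration(rand.Int63n(int64(base))) // 0..base extra
		select {
		case <-time.After(backoff + jitter):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("all %d attempts failed, last error: %w", maxAttempts, err)
}
```

A caller would wrap a flaky upstream call as `Retry(ctx, 5, 100*time.Millisecond, func() error { return callUpstream() })`, where `callUpstream` stands in for whatever network call is being protected.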

The Costs: Real Numbers

Initial Implementation (14 months)

| Category | Cost | Details |
|---|---|---|
| Engineering time | $210,000 | 5 senior engineers, 40% time, 14 months |
| Infrastructure migration | $48,000 | AWS costs, K8s setup, observability tools |
| Service mesh & tooling | $22,000 | Istio, Kafka, RabbitMQ setup |
| Rewrite effort | $87,000 | Go rewrites of ETL, Connector, Pipeline services |
| Testing & validation | $28,000 | Load testing, integration testing |
| Migration execution | $15,000 | Traffic cutover, rollback procedures |
| Total Initial Cost | $410,000 | Actual was $310K, rest opportunity cost |

Ongoing Annual Costs

| Category | Annual Increase | Details |
|---|---|---|
| Infrastructure | +$64,000 | More services = more compute, networking |
| Observability tools | +$18,000 | Datadog, PagerDuty, etc. |
| Operational overhead | +$45,000 | More deployment complexity, on-call burden |
| Total Annual Increase | +$127,000 | vs. monolith baseline |

Previous infrastructure: $96,000/year (monolith on EC2)
New infrastructure: $223,000/year (microservices on K8s)

The Results: Was It Worth It?

Performance Improvements

API Latency:

  • p50: 180ms → 95ms (47% improvement)
  • p95: 890ms → 340ms (62% improvement)
  • p99: 2.3s → 680ms (70% improvement)

Why: ETL jobs no longer stealing resources from API requests

ETL Throughput:

  • 12,000 pipelines/hour → 48,000 pipelines/hour (4× increase)
  • Horizontal scaling of workers (was vertical scaling of monolith)

Deployment Frequency:

  • 200/month → 680/month (3.4× increase)
  • Deploy connector updates without touching API
  • Smaller blast radius for changes

Organizational Benefits

Team Autonomy:

  • Connector team deploys independently (15-20 times/week)
  • ETL team owns performance optimization without coordinating
  • Clear ownership boundaries

Technology Flexibility:

  • Go for performance-critical services (3× faster than Rails for ETL)
  • Python for reporting (better ML/data science libraries)
  • Rails for admin tools (rapid development)

Hiring:

  • Attracted senior engineers ("we do microservices")
  • Easier onboarding (own one service vs. entire monolith)

The Honest Downsides

Operational Complexity:

  • 12 services to monitor vs. 1 monolith
  • Distributed debugging (tracing across services)
  • Network failures between services (didn't happen in monolith)
  • Data consistency challenges (eventual consistency is hard)

Cost Increase:

  • Infrastructure: +$127K/year
  • Engineering complexity tax: ~10% developer productivity loss first 6 months

Incident Response:

  • More complex (which service failed?)
  • Longer MTTR initially (had to learn distributed debugging)

Examples of painful incidents:

  1. Kafka outage took down entire platform (single point of failure)
  2. Cascading failures (auth service slow → all services slow)
  3. Data inconsistency (billing event lost → customer overcharged)

ROI Analysis

Benefits:

  • Performance improvements: Reduced infrastructure needed for same throughput = $48K/year savings
  • Faster feature velocity: Ship 3.4× more often = estimated $380K/year in value (faster time to market)
  • Hiring advantage: Attracted 4 senior engineers who cited "modern architecture" = $60K/year in reduced recruiting costs
  • Customer retention: Faster performance = lower churn = $127K/year (estimated)

Total annual benefit: ~$615,000

Costs:

  • Initial: $310,000 (amortized over 3 years = $103K/year)
  • Ongoing: +$127,000/year
  • Total annual cost: $230,000

Net benefit: $385,000/year

ROI: 167% (not spectacular, but positive)

Payback period: 9.6 months

The Honest Answer: Was It Worth It?

CTO's retrospective:

"For where we were (100 people, $18M ARR, growing 60% YoY), yes it was worth it. We couldn't scale the monolith another 2-3 years without major pain. But if we were still 20 people or growing 20%, absolutely not. The operational complexity would have crushed us."

What would NOT have worked:

  • Microservices at 20 people, $2M ARR (way too early)
  • Microservices without strong DevOps culture (need operational maturity)
  • Microservices without domain expertise (service boundaries are hard)

What made it work:

  • Right company size (100 people, multiple teams)
  • Right growth trajectory (needed to scale, had budget)
  • Right technical leadership (CTO had done this before)
  • Strangler pattern (gradual migration, not big bang)

The Lessons: When (And When Not) To Do Microservices

Green Light Signals (Do It)

✅ 100+ engineers, multiple teams stepping on each other in the monolith
✅ Different scaling needs (some parts need 10× capacity, others don't)
✅ Organizational structure matches service boundaries (teams own domains)
✅ Strong DevOps culture (can handle operational complexity)
✅ Proven business ($10M+ ARR, not a startup experiment)
✅ Technology diversity needs (some services need Go, some Python, etc.)

Red Light Signals (Don't Do It)

🛑 < 30 engineers (not enough people to own multiple services)
🛑 Unproven product (service boundaries will change; premature optimization)
🛑 Weak infrastructure team (microservices require mature ops)
🛑 Tight coupling (if services call each other synchronously 100 times per request, you just built a distributed monolith)
🛑 "Because Netflix does it" (you are not Netflix)
🛑 Resume-driven development (engineers want it for their resume, not business need)

The Middle Ground: Modular Monolith

Consider this first:

  • Same codebase, clear module boundaries
  • Can extract services later when/if needed
  • 80% of benefits, 20% of complexity

Example modular monolith structure:

app/
├── modules/
│   ├── auth/           # Could become service later
│   ├── connectors/     # Could become service later
│   ├── pipelines/      # Could become service later
│   ├── billing/
│   └── users/
└── shared/
    ├── database/
    ├── events/
    └── utils/

Rules:

  • Modules can't directly access each other's data
  • Communication via defined interfaces
  • Could be extracted to a service without a rewrite
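In Go terms (DataFlow's monolith was Rails, so treat this as a translation of the same rule), "communication via defined interfaces" might look like the hypothetical sketch below: the billing module depends on a narrow interface, never on the users module's tables.

```go
package billing

import "context"

// UserDirectory is the only view of the users module that billing is
// allowed to have. The users module implements it in-process today; an
// HTTP client could implement it tomorrow if users/ becomes a service.
type UserDirectory interface {
	OrganizationPlan(ctx context.Context, orgID string) (string, error)
}

// Service never touches the users module's tables — only the interface.
type Service struct {
	users UserDirectory
}

func NewService(users UserDirectory) *Service {
	return &Service{users: users}
}

func (s *Service) MonthlyRate(ctx context.Context, orgID string) (int, error) {
	plan, err := s.users.OrganizationPlan(ctx, orgID)
	if err != nil {
		return 0, err
	}
	if plan == "enterprise" {
		return 4000, nil // illustrative prices
	}
	return 500, nil
}
```

Because billing only sees the interface, swapping the in-process implementation for an HTTP client later is a wiring change, not a rewrite.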

When to extract:

  • Module hits scaling limits
  • Team wants independent deployment
  • Different technology makes sense

The Thalamus Approach

SOPHIA's Service Orchestration:

Instead of building custom service mesh and orchestration:

  1. SOPHIA manages inter-service communication
  2. Built-in event routing (no manual Kafka setup)
  3. Automatic retry and circuit breaking
  4. Distributed tracing out of the box

SYNAPTICA for ETL Intelligence:

Instead of custom Go workers:

  1. Neural network-based transformation logic
  2. Adaptive scaling based on load prediction
  3. Self-healing data pipelines

Cost Impact:

| Component | DataFlow Approach | Thalamus Approach |
|---|---|---|
| Initial build | $310,000 | $180,000 |
| Ongoing infra | +$127,000/year | +$89,000/year |
| Operational complexity | High | Medium (managed) |

Trade-offs:

  • Less control (SOPHIA is opinionated)
  • Faster implementation (6 months vs. 14 months)
  • Lower operational burden (managed services)

Best for: Companies that need microservices benefits without building everything from scratch.

Not for: Companies needing extreme customization or preferring full control.

The Bottom Line

Investment: $310,000 + $127,000/year ongoing
ROI: 167%
Payback: 9.6 months

But the real question: Should YOU do microservices?

Probably not, if:

  • You're under 50 people
  • Your monolith isn't causing pain
  • You don't have DevOps expertise
  • Your product is still finding product-market fit

Probably yes, if:

  • You're 100+ people with multiple teams
  • Different parts of your system have different scaling needs
  • You have operational maturity
  • You can afford the complexity

The truth nobody tells you:

Microservices solve organizational problems, not technical ones. If you don't have organizational problems (multiple teams stepping on each other, wanting independent deployment), you don't need microservices.

Start with a modular monolith. Extract services when you feel actual pain, not because it's "modern architecture."


Project Timeline: 14 months (design + implementation)
Company Size: 100 employees, $18M ARR
Total Investment: $310,000 initial + $127,000/year ongoing
Performance Gains: 4× ETL throughput, 47% API latency improvement
Deployment Frequency: 3.4× increase
ROI: 167%
Worth it? Yes, but only at this scale and stage

Real company. Real architecture. Real trade-offs. This is what microservices actually look like at mid-market scale.

Get in Touch