The AI Project That Delivered Nothing: What We Learned
A 60-person e-commerce company invested $145K in AI capabilities that never delivered business value. A story of misaligned expectations, poor problem selection, inadequate data, and lessons about AI readiness versus AI hype.
Here's the AI failure nobody talks about: you spend $145,000 and 9 months building "AI-powered" capabilities, launch with fanfare, and six months later quietly shut it down because it delivered zero business value.
We worked with a 60-person e-commerce company—call them ShopCo—that got caught in the AI hype cycle. Their competitors were touting "AI-powered personalization" and "machine learning recommendations." The board asked: "Why don't we have AI?"
So they built AI.
- The project: AI-powered product recommendations and demand forecasting
- The investment: $145,000 over 9 months
- The result: Recommendations were worse than the existing rule-based system; forecasts were less accurate than simple moving averages
- The outcome: Turned off the AI features, went back to simpler approaches, wrote off $145K as an expensive lesson
This is the complete story of an AI project that failed not because the technology didn't work, but because they asked the wrong questions, chose the wrong problems, and didn't have the foundation needed for AI to succeed.
Spoiler: Two years later, they successfully implemented AI—spending $60K on a much simpler problem with much better data. The difference? They learned what AI can actually do vs. what the hype promises.
The Setup: Why They Jumped on AI
First, understand why they thought they needed AI.
The Trigger
Board meeting, Q2 2023:
- Competitor announced "AI-powered personalization engine"
- Another competitor: "Machine learning inventory optimization"
- Board member: "Are we falling behind on AI?"
- CEO: "We need to invest in AI capabilities"
Classic mistake: Decided they needed AI before identifying what problem AI would solve.
The Initial Vision
What they imagined:
- Amazon-level product recommendations
- Netflix-quality personalization
- Accurate demand forecasting
- Inventory optimization
- Competitive advantage through "AI"
The pitch deck (yes, they made a deck):
- "AI will increase conversion 15-20%"
- "Forecasting accuracy improvement of 30%"
- "Reduced inventory carrying costs"
- "Industry-leading customer experience"
- Impressive charts and buzzwords
Where these numbers came from: Vendor white papers and competitor press releases.
Phase 1: The AI Recommendation System (Months 1-5)
The Problem (as defined)
Current state:
- Product recommendations based on simple rules
- "Customers who bought X also bought Y" (basic co-occurrence)
- Category-based suggestions
- Worked okay (2.3% clickthrough, 8% conversion on recommendations)
Desired state:
- "AI-powered personalization"
- Consider customer behavior, preferences, context
- Real-time personalization
- "Amazon-level recommendations"
The Solution (as built)
Hired AI consulting firm ($85K for this phase):
- Promised "cutting-edge machine learning"
- Collaborative filtering algorithms
- Deep learning neural networks
- Real-time recommendation engine
Tech stack:
- Python + TensorFlow
- AWS SageMaker
- Real-time inference endpoint
- "State of the art" architecture
The build process:
Month 1-2: Data preparation
- Collected 2 years of purchase history
- User behavior data (clicks, views, cart adds)
- Product catalog data
- Customer demographics
Month 3-4: Model development
- Tried 5 different algorithms
- Collaborative filtering (user-user and item-item; see the sketch below)
- Content-based filtering
- Hybrid approach with neural networks
- Lots of hyperparameter tuning
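To make the approach concrete, here is a minimal sketch of the item-item collaborative filtering piece: cosine similarity over a user × item purchase matrix. It is an illustration under assumed data shapes, not the consultants' actual TensorFlow/SageMaker code.

```python
import numpy as np

def item_similarity(purchases: np.ndarray) -> np.ndarray:
    """Cosine similarity between items from a (users x items) 0/1 purchase matrix."""
    norms = np.linalg.norm(purchases, axis=0, keepdims=True)
    norms[norms == 0] = 1.0                 # items never sold: avoid division by zero
    normalized = purchases / norms
    return normalized.T @ normalized        # (items x items) similarity matrix

def recommend_for_user(user_row: np.ndarray, sim: np.ndarray, k: int = 5) -> np.ndarray:
    """Score every item by its similarity to what this user already bought."""
    scores = sim @ user_row                 # higher = closer to past purchases
    scores[user_row > 0] = -np.inf          # don't re-recommend items already owned
    return np.argsort(scores)[::-1][:k]     # indices of the top-k candidates

# Toy usage: 3 users x 4 items
purchases = np.array([[1, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1]], dtype=float)
print(recommend_for_user(purchases[0], item_similarity(purchases)))
```

Even at this scale the core problem is visible: with sparse purchase histories most of the matrix is zeros, and new users or new items have nothing to score against.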
Month 5: Deployment
- Real-time API endpoint
- Integration with website
- A/B test setup
- Monitoring dashboard
Launched with excitement.
The Results (Disappointing)
A/B test results (30 days, 50/50 traffic split):
| Metric | Old System | AI System | Change |
|---|---|---|---|
| Rec clickthrough | 2.3% | 1.8% | -22% |
| Conversion on rec | 8.1% | 6.4% | -21% |
| Avg order value | $87 | $83 | -5% |
| Customer satisfaction | 4.2/5 | 4.0/5 | -5% |
AI recommendations were worse across every metric.
What Went Wrong
Post-mortem revealed:
1. Data quality was poor:
- Purchase history had noise (gifts, one-time purchases, returns not marked)
- Behavior data was incomplete (30% of users blocked cookies)
- Many products had sparse data (only sold a few times)
- Seasonality wasn't handled well
2. Cold start problem:
- New products: no data, couldn't recommend
- New users: no history, couldn't personalize
- Fell back to popularity (which the simple system already did)
- 65% of recommendations were fallbacks, not AI output (see the sketch after this list)
3. Model assumptions didn't match reality:
- Assumed taste profiles are stable (they change with seasons, trends)
- Assumed similar users like similar products (many exceptions)
- Didn't handle gift purchases well (bought for others, not self)
- Context-blind (same recommendations for browsing vs. buying mode)
4. Technical problems:
- Model inference slow (350ms average, target was 100ms)
- Recommendation cache got stale
- Cost: $1,200/month AWS SageMaker fees
- Crashed twice during peak traffic
5. Simpler was actually better:
- "Frequently bought together" was 3x more effective
- Category suggestions based on current browse were 2x more effective
- Simple rules were fast, reliable, and worked
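The cold-start fallback in point 2 is worth making concrete. Below is a minimal sketch of the kind of guard logic involved; the thresholds, dataclasses, and the `score` callable are hypothetical stand-ins, not the production code.

```python
from dataclasses import dataclass

MIN_USER_EVENTS = 5    # hypothetical threshold: below this there is nothing to personalize on
MIN_ITEM_SALES = 10    # hypothetical threshold: items sold fewer times are too sparse to score

@dataclass
class User:
    event_count: int   # clicks/views/purchases actually recorded for this user

@dataclass
class Item:
    sku: str
    sale_count: int

def recommend(user, candidates, score, bestsellers, k=5):
    """Use the model only when there is enough history; otherwise fall back to popularity."""
    if user.event_count < MIN_USER_EVENTS:
        return bestsellers[:k]              # cold-start user (e.g. cookies blocked)
    scorable = [i for i in candidates if i.sale_count >= MIN_ITEM_SALES]
    if not scorable:
        return bestsellers[:k]              # cold-start items
    return [i.sku for i in sorted(scorable, key=score, reverse=True)[:k]]
```

With 30% of visitors blocking cookies and many sparse SKUs, branches like these fired on roughly 65% of requests, which is why most of what shipped as "AI recommendations" was really the old popularity list.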
Decision: Turned off AI recommendations, went back to rule-based system.
Cost: $85,000 + 5 months
Phase 2: The AI Demand Forecasting (Months 6-9)
"Okay, recommendations didn't work, but forecasting will be perfect for AI!"
The Problem (as defined)
Current state:
- Demand forecasting using 3-month moving average
- Manual adjustments for seasonality
- Purchase orders based on forecast + safety stock
- Inventory turnover: 4.2x annually
- Stockout rate: 6.8%
- Overstock rate: 12.3%
Desired state:
- AI predicts demand with "30% better accuracy"
- Optimal inventory levels
- Reduce stockouts and overstock
- Better cash flow
The Solution (as built)
Continued with AI consulting firm ($45K this phase):
- "Machine learning is perfect for time-series forecasting"
- LSTM neural networks
- Ensemble models
- "Industry-leading accuracy"
The build:
Month 6: Data collection
- 3 years of sales history
- 2,400 SKUs
- Seasonal patterns
- Promotion history
- External factors (weather, trends, etc.)
Month 7: Model development
- LSTM for time-series
- Prophet (Facebook's forecasting tool; see the sketch below)
- Traditional ARIMA for comparison
- Ensemble combining all three
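As an illustration of what one ensemble member looks like, here is a minimal per-SKU Prophet forecast. The column names, weekly granularity, and 12-week horizon are assumptions; the actual pipeline also combined LSTM and ARIMA models.

```python
import pandas as pd
from prophet import Prophet  # pip install prophet

def forecast_sku(weekly_sales: pd.DataFrame, horizon_weeks: int = 12) -> pd.DataFrame:
    """Fit Prophet on one SKU's weekly sales and forecast the next `horizon_weeks` weeks.

    `weekly_sales` uses Prophet's expected columns:
    'ds' (week start date) and 'y' (units sold that week).
    """
    model = Prophet(yearly_seasonality=True, weekly_seasonality=False)
    model.fit(weekly_sales)
    future = model.make_future_dataframe(periods=horizon_weeks, freq="W")
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(horizon_weeks)
```

Note the catch the post-mortem later surfaced: with only three years of history, each SKU gives the model at most three examples of every yearly seasonal pattern it is trying to learn.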
Month 8: Validation and tuning
- Back-testing on historical data
- Hyperparameter optimization
- Cross-validation
- Looked promising in testing!
Month 9: Production deployment
- Automated forecasting pipeline
- Weekly forecast generation
- Integration with purchasing system
- Monitoring and alerts
Launched with cautious optimism.
The Results (Also Disappointing)
90-day evaluation vs. simple moving average:
| Metric | Moving Avg | AI Forecast | Change |
|---|---|---|---|
| Forecast error (MAPE) | 23.4% | 26.1% | +12% worse |
| Stockout rate | 6.8% | 8.2% | +21% worse |
| Overstock rate | 12.3% | 14.7% | +20% worse |
| Inventory turns | 4.2x | 3.8x | -10% worse |
| Cost | $0 | $800/month | +$800/month |
AI forecasting was worse than the simple 3-month moving average.
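For reference, the baseline that won is about ten lines of code. A minimal sketch of a 3-month moving-average forecast and the MAPE calculation used to compare the two (the toy numbers are illustrative, not ShopCo's data):

```python
import numpy as np

def moving_average_forecast(history: np.ndarray, window: int = 3) -> float:
    """Forecast next month's demand as the mean of the last `window` months."""
    return float(np.mean(history[-window:]))

def mape(actual, predicted) -> float:
    """Mean absolute percentage error; lower is better."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    mask = actual != 0                      # skip zero-demand months to avoid dividing by zero
    return float(np.mean(np.abs((actual[mask] - predicted[mask]) / actual[mask])) * 100)

# Toy back-test: forecast each month from the three months before it
sales = np.array([120, 135, 128, 150, 160, 145, 170], dtype=float)
preds = [moving_average_forecast(sales[:i]) for i in range(3, len(sales))]
print(f"MAPE: {mape(sales[3:], preds):.1f}%")
```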
What Went Wrong (Again)
Another post-mortem:
1. Not enough data:
- Many SKUs only had 1-2 years sales history
- New products: no historical data
- Seasonal products: only 3-4 data points per year
- AI models need lots of data, and they didn't have it
2. External factors were unpredictable:
- Model tried to learn from weather, social trends, competitor actions
- These relationships were noisy or non-existent
- Added complexity without value
3. Promotion effects misunderstood:
- AI couldn't distinguish organic demand from promotion-driven spikes
- Forecasts assumed promotions would repeat
- Real purchasing decisions were more strategic
4. Simple worked better:
- Moving average was stable, understandable
- Easy to adjust manually for known factors
- Buyers had domain knowledge AI didn't capture
- "Black box" AI couldn't explain its predictions
5. Overfitting:
- Models fit historical data well
- Failed on future data (what matters)
- Classic overfitting problem (see the sketch below)
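The overfitting in point 5 is usually a symptom of how the models are evaluated, so the guard against it is worth sketching: hold out the most recent months and score the model only on data that comes strictly after everything it trained on (a random split leaks future information into training). Names and the holdout length are assumptions.

```python
import numpy as np

def time_based_split(series: np.ndarray, holdout: int = 12):
    """Split a time series so the last `holdout` periods are reserved for evaluation."""
    return series[:-holdout], series[-holdout:]

monthly_demand = np.arange(48, dtype=float)   # placeholder for 4 years of monthly demand
train, test = time_based_split(monthly_demand, holdout=12)
# Fit any model on `train`, forecast 12 steps ahead, then compute MAPE against `test`.
# A model that looks great on `train` but misses `test` badly is overfitting.
```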
Decision: Abandoned AI forecasting, went back to moving averages + buyer judgment.
Cost: $45,000 + 4 months
The Total Failure
Financial Cost
| Phase | Investment | Result | Value |
|---|---|---|---|
| AI Recommendations | $85,000 | Turned off after 30 days | $0 |
| AI Forecasting | $45,000 | Abandoned after 90 days | $0 |
| Opportunity cost | $15,000 | Delayed actual improvements | Negative |
| Total | $145,000 | Nothing left in production | $0 |
Organizational Cost
Beyond money:
- 9 months focused on AI instead of real improvements
- Team morale hit ("we spent all that for nothing?")
- Board skepticism ("AI doesn't work")
- Delayed simpler initiatives that would have helped
- Reputational cost (quietly removed "AI-powered" from marketing)
The Real Opportunity Cost
What they could have done with $145K and 9 months:
- Improved product photography ($30K) - proven ROI
- Better product descriptions ($15K) - proven ROI
- Faster website (performance optimization: $25K) - proven ROI
- Better search functionality ($35K) - proven ROI
- Email marketing automation ($20K) - proven ROI
- Customer service tooling ($20K) - proven ROI
Estimated combined impact: 15-25% revenue increase
Instead: Chased AI and got nothing.
The Post-Mortem: What We Got Wrong
1. Started with Solution, Not Problem
Mistake: "We need AI" came before "what problem are we solving?"
Should have: Identified business problems, evaluated if AI was right solution
Lesson: AI is a tool, not a goal. Start with problems, not technology.
2. Unrealistic Expectations
Mistake: Expected Amazon/Netflix-level results with fraction of their data and investment
Should have: Understood their data limitations and what's actually achievable
Lesson: Amazon has billions of data points and spent millions building their recommendation engine. You have thousands of data points and $85K. Results will differ.
3. Data Readiness
Mistake: Didn't validate data quality before building models
Should have: Data audit first, model second
Lesson: "Garbage in, garbage out" isn't just a saying. Most AI projects fail on data, not algorithms.
4. Ignored the Baseline
Mistake: Didn't respect that simple solutions were working okay
Should have: Quantified baseline performance, set realistic improvement targets
Lesson: When simple rules already achieve a 2.3% CTR and AI has to beat that, you're fighting uphill. AI's advantage shows up where good simple rules don't exist.
5. Wrong Problems for AI
Mistake: Chose problems where AI wasn't actually better solution
Should have: Evaluated if AI was right tool for these specific problems
Lesson: Not every problem is an AI problem. Sometimes simple rules, statistics, or human judgment are better.
6. Consultant Incentives
Mistake: Hired firm that sold AI, expected objective advice
Should have: Separated problem assessment from solution delivery
Lesson: If you ask an AI consulting firm whether you need AI, the answer is always "yes."
Two Years Later: Successful AI Implementation
The story doesn't end with failure. ShopCo learned and tried again.
The Different Approach (2025)
Problem identification first:
- Customer support receiving same questions repeatedly
- 40% of support tickets were simple FAQs
- Support team spending 15 hours/week answering same questions
- Cost: $35,000 annually in support time
Solution evaluation:
- Could AI chatbot handle FAQs?
- Problem well-suited: narrow domain, lots of examples, clear success metric
- Alternative: better FAQ page (cheaper but less effective)
- Decision: Try AI, but with realistic expectations
Implementation:
- Used OpenAI API, no custom model training (see the sketch after this list)
- $60K for implementation (chatbot + knowledge base integration)
- 6-week timeline
- Clear success metric: Reduce FAQ support tickets by 50%
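A minimal sketch of the shape of that integration: retrieve the relevant FAQ entries, ask the model to answer only from them, and hand off to a human otherwise. The model name, prompt wording, and helper names are assumptions, not ShopCo's production code.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a support assistant for an e-commerce store. Answer ONLY using the "
    "FAQ entries provided. If the answer is not covered, reply with exactly HANDOFF."
)

def answer_faq(question: str, faq_entries: list[str]) -> str | None:
    """Return an answer grounded in the FAQ, or None to escalate to a human agent."""
    context = "\n\n".join(faq_entries)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"FAQ entries:\n{context}\n\nCustomer question: {question}"},
        ],
    )
    answer = response.choices[0].message.content.strip()
    return None if answer == "HANDOFF" else answer
```

The important design decision is the explicit handoff path: the bot only takes the FAQ-shaped tickets and escalates everything else instead of guessing.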
Results:
- FAQ tickets reduced 67% (better than target)
- Support team time freed: 22 hours/week
- Customer satisfaction maintained (4.1/5)
- Cost: $60K one-time + $400/month API costs
- ROI: Paid for itself in 10 months
Why This Worked
Right problem:
- Narrow, well-defined
- Lots of training data (support ticket history)
- Clear success metric
- AI actually better than alternatives
Realistic expectations:
- Target: 50% reduction (not 90%)
- Knew it wouldn't be perfect
- Human handoff for complex questions
- Measured actual performance, not projected
Appropriate technology:
- Used existing AI (OpenAI) instead of building custom
- Simpler, faster, cheaper
- Focused on integration and UX, not AI research
Data quality:
- Support tickets were well-structured
- Questions were labeled and categorized
- High-quality training data
Lessons for AI Projects
Before Starting Any AI Project
Questions to answer honestly:
1. What specific business problem are we solving?
- Not "we need AI"
- Specific, measurable problem
2. How does the current solution work?
- What's the baseline performance?
- Why isn't it good enough?
3. Why is AI the right solution?
- What can AI do that simple rules/statistics can't?
- What are the alternatives?
4. Do we have the data?
- Quality: Is it accurate and clean?
- Quantity: Do we have enough examples?
- Labeling: Is it properly categorized?
5. What does success look like?
- Specific metrics
- Realistic targets (not vendor promises)
- Timeline for evaluation
6. What's the fallback plan?
- If AI doesn't work, then what?
- Can we fail fast and cheap?
Red Flags for AI Projects
- "We need AI" before identifying specific problem
- Expectations based on Google/Amazon results (you're not Google/Amazon)
- No baseline performance measurement
- Data assessment comes after project starts
- Success metrics are vague ("better personalization")
- Vendor/consultant selling AI, not solving problems
- Board/executive pressure to "do AI"
- No plan B if AI doesn't work
Green Flags for AI Projects
- Specific business problem with measurable impact
- Current solution has clear limitations
- AI demonstrably better than alternatives
- High-quality, abundant data available
- Realistic success metrics set
- Pilot/MVP approach with fast feedback
- Team understands AI capabilities and limitations
- Clear ROI path
When AI Makes Sense (and When It Doesn't)
Based on failure + eventual success:
Good AI Problems
- Large amounts of data (10,000+ examples)
- Narrow, well-defined domain
- Clear success metrics
- Pattern recognition (rather than forecasting complex systems)
- Humans struggle with scale (not complexity)
- Acceptable error rate
- Lots of examples to learn from
Bad AI Problems
- Small datasets (hundreds of examples)
- Broad, ill-defined domain
- Vague success criteria
- Forecasting complex systems with many unknowns
- Simple rules work well enough
- Zero error tolerance required
- Trying to replace human judgment on complex decisions
The Bottom Line
ShopCo spent $145,000 and 9 months on AI projects that delivered zero value.
Then they spent $60,000 and 6 weeks on an AI project that solved a real problem and paid for itself.
The difference wasn't the technology. It was the approach.
First attempt:
- "We need AI" → find problems for AI to solve
- Unrealistic expectations from vendor promises
- Poor data, wrong problems, no baseline
Second attempt:
- Real problem → evaluate if AI is right solution
- Realistic expectations based on actual data
- Right problem, good data, clear metrics
AI isn't magic. It's a tool. Like any tool, it works great for some jobs and terribly for others.
The expensive lesson: Figure out if you have an AI problem before building an AI solution.
We're Thalamus. Enterprise capability without enterprise gatekeeping.
If you're considering AI projects, we should talk. Not to sell you AI (we'll tell you if you don't need it), but to help you identify if AI is the right tool for your actual problems.
Sometimes the most valuable consulting is preventing you from wasting $145K on AI you don't need.
And sometimes the best AI strategy is knowing when not to use AI.