ASO Testing: How to Know What's Working
A/B testing content for AI search requires different approaches than traditional SEO. Learn how to test effectively, what to test, timeframes that reveal insights, and how to avoid false conclusions from small sample sizes.
You've optimized your content for AI search. You're following best practices. Your content is thorough and well-structured.
But is it actually working?
Traditional SEO testing was straightforward: change a title tag, wait two weeks, check Google rankings.
AI Search Optimization testing is different:
- AI recommendations don't have "rankings" you can track
- Small sample sizes make statistical significance hard
- Causation is harder to prove
- Time lag between changes and impact varies
- Multiple AI platforms behave differently
But you still need to test. Otherwise you're optimizing blind.
Let me show you how to test ASO effectively, what actually reveals insights, and how to avoid the false conclusions that come from small sample sizes.
Why ASO Testing Is Harder Than SEO Testing
Traditional SEO Testing
The process:
- Change page title
- Wait for Google to recrawl (days to weeks)
- Check rankings for target keywords
- Measure traffic change
- Clear result: Rankings up/down, traffic up/down
Why this worked:
- Rankings were observable (you could see position 5 vs position 3)
- Large sample sizes (millions of searches)
- Clear cause and effect
- Relatively stable (rankings don't fluctuate wildly daily)
ASO Testing Challenges
The problems:
Problem #1: No observable "rankings"
You can't check your "ChatGPT ranking" for "best CRM software."
AI either recommends you or it doesn't. You can test by asking questions, but:
- You don't know how often AI actually recommends you
- You don't know your "recommendation share" vs competitors
- You can't see gradual improvement (ranked #8 → #6 → #4)
Problem #2: Small sample sizes
Traditional SEO:
- Keyword gets 10,000 searches/month
- Even a change that captures 10% more of those searches = 1,000 additional sessions per month
- Easy to measure statistically
AI Search:
- How many people are asking AI about your category? Unknown
- How many of those get your business recommended? Unknown
- How many click through? Unknown
- Probably dozens to hundreds, not thousands
Small samples = harder to prove significance
Problem #3: Platform diversity
Traditional SEO: Google dominates (90%+ market share)
AI Search: Multiple platforms (ChatGPT, Claude, Perplexity, Gemini, etc.)
To test comprehensively, you need to test across platforms.
Problem #4: Non-deterministic responses
Ask Google the same question 10 times: same results
Ask ChatGPT the same question 10 times: potentially a different answer each time
AI responses have variance built in, making testing noisier.
Problem #5: Time lag uncertainty
Traditional SEO: Google recrawls in days/weeks
AI Search: When do AI models incorporate your updated content?
- Real-time web search: Immediately
- Training data updates: Months
- Cached data: Unknown
You might change content today, but impact might not be measurable for weeks or months.
What to Test in ASO
Given these challenges, what's actually testable?
Test #1: AI Recommendation Frequency
What you're testing: How often does AI recommend your business when asked relevant questions?
Method:
1. Create test queries
Develop 20-30 questions real customers might ask:
For CRM company:
- "What's the best CRM for small businesses?"
- "I need CRM software that integrates with Gmail"
- "Affordable CRM for consulting firms"
- "How do I choose a CRM for my company?"
- "CRM with good customer support"
Make these realistic. Think like customers, not marketers.
2. Test across platforms
For each question, query:
- ChatGPT
- Claude
- Perplexity
- Google Gemini
- Microsoft Copilot
3. Record recommendations
Track whether your business was:
- Recommended (mentioned by name)
- Not recommended (competitors mentioned instead)
- Category mentioned but no specific recommendations
4. Calculate recommendation rate
Baseline (before optimization):
- 30 questions × 5 platforms = 150 tests
- Recommended: 12 times
- Recommendation rate: 8%
After optimization (4 weeks later):
- 30 questions × 5 platforms = 150 tests
- Recommended: 24 times
- Recommendation rate: 16%
Result: 2x improvement in recommendation frequency
Timeframe: Test monthly. ASO changes take time to propagate.
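If you log each query/platform test in a spreadsheet or script, the tally itself is simple. Here's a minimal Python sketch, assuming one record per query/platform pair with an illustrative `recommended` field (the field names are placeholders, not a standard):

```python
from collections import Counter

# One record per (query, platform) test; field names are illustrative.
results = [
    {"query": "best CRM for small businesses", "platform": "ChatGPT", "recommended": True},
    {"query": "best CRM for small businesses", "platform": "Perplexity", "recommended": False},
    # ... the remaining query/platform pairs from your test run
]

total = len(results)
recommended = sum(r["recommended"] for r in results)
print(f"Overall recommendation rate: {recommended / total:.0%}")

# Per-platform breakdown, useful for spotting platforms where you lag
hits_per_platform = Counter()
tests_per_platform = Counter()
for r in results:
    tests_per_platform[r["platform"]] += 1
    hits_per_platform[r["platform"]] += r["recommended"]

for platform, tests in tests_per_platform.items():
    print(f"{platform}: {hits_per_platform[platform] / tests:.0%}")
```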
Test #2: Recommendation Quality
What you're testing: Are you being recommended for the right reasons? Does the AI describe you accurately?
Method:
1. Analyze recommendation context
When AI recommends you, what does it say?
Good recommendation: "For small consulting firms, consider [Your CRM]. It's designed for professional services, includes time tracking, and integrates well with QuickBooks."
This shows: AI understands your positioning, target market, and key differentiators.
Weak recommendation: "You could try [Your CRM]."
This shows: AI knows your name but not much else.
Wrong recommendation: "[Your CRM] is popular with enterprise companies needing advanced customization."
This shows: AI has incorrect information about you.
2. Track accuracy metrics
- Correct positioning: % of recommendations that describe you accurately
- Key features mentioned: % that mention your actual differentiators
- Target market correct: % that recommend you to right customer type
3. Optimize for accuracy
If AI describes you inaccurately:
- Your content might not clearly communicate positioning
- AI might be pulling from outdated sources
- Competitors' content might be influencing AI's understanding
Adjust content to improve accuracy.
Test #3: Content Resonance
What you're testing: Which content pieces get cited most by AI?
Method:
1. Create distinct content pieces
Write several articles on related topics:
- "How to Choose CRM Software: Complete Guide"
- "CRM Features That Actually Matter for Small Business"
- "CRM Integration: What You Need to Know"
- "CRM Pricing: How Much Should You Pay?"
2. Track citations
When AI recommends you, does it reference specific content?
Example AI response: "According to [Your Company]'s guide on CRM pricing, most small businesses pay $15-50 per user monthly..."
This tells you: That guide is resonating with AI.
3. Analyze patterns
Which content gets cited most?
- Long-form guides vs. specific articles
- Practical how-to vs. conceptual content
- Comparison content vs. educational content
4. Double down on what works
If AI frequently cites your comparison content, create more comparisons.
If how-to guides get ignored, maybe your audience prefers a different format.
Test #4: Query Intent Matching
What you're testing: Do you appear for the right types of questions?
Method:
1. Categorize queries by intent
- Information intent: "What is CRM software?"
- Comparison intent: "Salesforce vs HubSpot"
- Solution intent: "Best CRM for real estate agents"
- Implementation intent: "How to set up CRM"
2. Test across intents
Your goal might be:
- Appear in 50% of solution intent queries (high priority)
- Appear in 30% of comparison queries
- Appear in 20% of information queries (lower priority)
- Appear in 10% of implementation queries
3. Measure actual performance
Test 10 queries per category:
- Solution: 3/10 recommendations (30%)
- Comparison: 5/10 (50%)
- Information: 1/10 (10%)
- Implementation: 0/10 (0%)
Analysis:
- Performing well in comparison (above target)
- Under-performing in solution queries (need improvement)
- Information queries below target (10% vs. 20%), but lower priority
- Implementation queries need work
4. Adjust content strategy
Focus on improving solution-intent content since that's highest priority and currently under-performing.
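To make that gap analysis repeatable, compare observed rates against your targets programmatically. A minimal sketch using the illustrative numbers above (the intent labels and targets are just this example's assumptions):

```python
# Targets and observed recommendation rates per query intent
# (illustrative numbers from the example above; swap in your own results).
targets = {"solution": 0.50, "comparison": 0.30, "information": 0.20, "implementation": 0.10}
observed = {"solution": 0.30, "comparison": 0.50, "information": 0.10, "implementation": 0.00}

for intent, target in targets.items():
    gap = observed[intent] - target
    status = "above target" if gap >= 0 else "below target"
    print(f"{intent:<15} observed {observed[intent]:.0%}  target {target:.0%}  {status} ({gap:+.0%})")
```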
How to Run ASO Tests
Step 1: Establish Baseline (Month 0)
Before any optimization, measure current state.
Test protocol:
- Create 30 test queries
- Test each query on 5 platforms
- Record: Recommendations, accuracy, context
- Calculate baseline metrics
Baseline metrics:
- Overall recommendation rate: 8%
- Accuracy rate: 65% (of recommendations)
- Key feature mentioned: 40%
- Target market correct: 70%
This is your starting point.
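It helps to fix the shape of a single test observation now, so the baseline and every retest are computed identically. A minimal sketch with illustrative field names (not a standard schema); note that the accuracy metrics use recommendations, not total tests, as their denominator:

```python
from dataclasses import dataclass

@dataclass
class TestObservation:
    """One query tested on one platform; all field names are illustrative."""
    query: str
    platform: str
    recommended: bool          # mentioned by name at all
    positioning_correct: bool  # described accurately (only meaningful if recommended)
    feature_mentioned: bool    # a key differentiator appears in the answer
    target_market_correct: bool

def baseline_metrics(observations: list[TestObservation]) -> dict[str, float]:
    recs = [o for o in observations if o.recommended]
    return {
        "recommendation_rate": len(recs) / len(observations),
        # Accuracy metrics are calculated over recommendations, not all tests
        "accuracy_rate": sum(o.positioning_correct for o in recs) / len(recs) if recs else 0.0,
        "feature_mentioned": sum(o.feature_mentioned for o in recs) / len(recs) if recs else 0.0,
        "target_market_correct": sum(o.target_market_correct for o in recs) / len(recs) if recs else 0.0,
    }
```

Run the same function on each month's observations and the numbers stay directly comparable.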
Step 2: Make Changes (Month 1)
Implement one significant change.
Why one change at a time?
- Can attribute impact to specific change
- Learn what works
- Avoid confounding variables
Example changes:
- Publish comprehensive guide on key topic
- Restructure homepage content
- Add detailed product comparison page
- Improve "About" page clarity
- Add case studies with specific use cases
Document exactly what you changed and when.
Step 3: Wait for Propagation (Month 1-2)
How long to wait before retesting?
Minimum: 4 weeks
Why:
- AI platforms need time to discover/process new content
- Some platforms cache data
- Training data updates take time
- Need enough user queries to accumulate
During wait period:
- Don't make additional major changes
- Monitor any real-time metrics you have
- Prepare for retest
Step 4: Retest (Month 2)
Run exact same test protocol:
- Same 30 queries
- Same 5 platforms
- Same recording methodology
New results:
- Overall recommendation rate: 14% (was 8%)
- Accuracy rate: 80% (was 65%)
- Key feature mentioned: 60% (was 40%)
- Target market correct: 85% (was 70%)
Calculate improvements:
- Recommendation rate: +6 percentage points (+75% relative)
- Accuracy: +15 percentage points
- Feature mentions: +20 percentage points
- Target market accuracy: +15 percentage points
Step 5: Analyze Significance
Is this improvement real or random noise?
With small samples, be careful about over-interpreting.
Statistical significance testing:
Baseline: 8% recommendation rate (12 out of 150 tests)
New: 14% recommendation rate (21 out of 150 tests)
Chi-square test:
- With these counts (12/150 vs. 21/150), a chi-square test gives roughly p ≈ 0.10: suggestive, but not significant at the conventional 0.05 threshold
- An improvement this size could still be random variation, so treat it as a promising signal and look for the trend to hold in the next cycle
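If you want to run the check yourself, here's a minimal sketch using SciPy (assuming that's in your toolkit); with the illustrative counts above it lands around p ≈ 0.10, which is exactly why the rule of thumb below leans on larger samples and larger effects:

```python
from scipy.stats import chi2_contingency

# 2x2 table: [recommended, not recommended] for baseline vs. retest
# (12/150 and 21/150 are the illustrative counts from this example).
table = [[12, 150 - 12],
         [21, 150 - 21]]

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # ~0.10 here: not significant at 0.05
```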
Rule of thumb for ASO:
- Sample size of 100+ tests per time period
- Improvement of 50%+ more meaningful than 10%
- Test multiple times to confirm trend
Step 6: Repeat and Refine (Month 3+)
Continue monthly testing:
- Track trends over time
- Make iterative improvements
- Build understanding of what works
After 6 months:
- Clear pattern of what content AI favors
- Recommendation rate stable at 18-22%
- Accuracy consistently 80%+
This is defensible data showing ASO is working.
Common Testing Pitfalls
Pitfall #1: Testing Too Soon
The mistake: Change content, test 3 days later, see no improvement, conclude it doesn't work.
The problem: AI platforms haven't processed your changes yet.
Better approach: Wait 4+ weeks minimum before retesting.
Pitfall #2: Too Small Sample Size
The mistake: Test 5 queries, get recommended 2 times, declare success.
The problem: 5 queries is too small. Could be luck.
Better approach: 20-30 queries minimum per test cycle.
Pitfall #3: Changing Multiple Things
The mistake: Rewrite entire website, add 10 new pages, restructure everything. Then test.
The problem: What caused any improvement? No idea.
Better approach: One major change per test cycle. Learn from each change.
Pitfall #4: Confirmation Bias
The mistake: Cherry-pick queries that show improvement, ignore ones that got worse.
The problem: Seeing what you want to see, not reality.
Better approach: Pre-define query set. Test all queries every time. Report all results.
Pitfall #5: Ignoring Platform Differences
The mistake: Test only ChatGPT, optimize only for ChatGPT.
The problem: Users use multiple platforms. What works on ChatGPT might not work elsewhere.
Better approach: Test across multiple platforms. Optimize for general AI search, not one platform.
Pitfall #6: Optimizing for Test Queries Only
The mistake: You wrote the test queries, so you optimize content specifically for those exact queries.
The problem: Real users ask different questions.
Better approach: Regularly rotate test queries. Use variations. Think like real customers.
Advanced Testing Approaches
A/B Testing Different Content Approaches
For companies with multiple pages:
Test: Two different content approaches on similar pages.
Example:
Product Page A: Traditional feature list, benefits, pricing
Product Page B: Story-driven, use case focused, customer outcomes
Hypothesis: Story-driven content gets recommended more by AI.
Test protocol:
- Create 15 queries relevant to each page
- Test monthly for 3 months
- Track recommendation frequency for each page
Results after 3 months:
- Page A recommended: 12% of queries
- Page B recommended: 19% of queries
Conclusion: Story-driven approach performs better. Apply to more pages.
Competitor Benchmarking
Track not just your performance, but competitors.
Method:
For same 30 test queries:
- Track your recommendations
- Track Competitor A recommendations
- Track Competitor B recommendations
- Track Competitor C recommendations
Competitive landscape:
| Company | Recommendation Rate | Change vs. Last Month |
|---|---|---|
| You | 14% | +6 pts |
| Competitor A | 22% | +2 pts |
| Competitor B | 18% | -1 pt |
| Competitor C | 8% | +1 pt |
Analysis:
- You're improving faster than competitors
- Competitor A still leads but growth slowing
- Your rate of improvement suggests you could match them in 4-6 months
This contextualizes your performance.
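The same test log can produce this table if each response records which companies were mentioned by name. A minimal sketch, with an illustrative record structure:

```python
from collections import Counter

# Each entry records which companies a single AI response mentioned by name
# (structure and names are illustrative).
responses = [
    {"query": "best CRM for small businesses", "platform": "ChatGPT",
     "mentioned": {"You", "Competitor A"}},
    {"query": "affordable CRM for consulting firms", "platform": "Claude",
     "mentioned": {"Competitor A", "Competitor B"}},
    # ... one entry per query/platform test
]

mentions = Counter()
for r in responses:
    mentions.update(r["mentioned"])  # each company counted once per response

total_tests = len(responses)
for company, count in mentions.most_common():
    print(f"{company:<14} {count / total_tests:.0%} of tests")
```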
Time Series Analysis
Track metrics over extended period.
6-month tracking:
| Month | Rec Rate | Accuracy | Feature Mentions |
|---|---|---|---|
| Month 1 | 8% | 65% | 40% |
| Month 2 | 14% | 80% | 60% |
| Month 3 | 16% | 82% | 65% |
| Month 4 | 17% | 85% | 70% |
| Month 5 | 19% | 88% | 72% |
| Month 6 | 20% | 90% | 75% |
Trend analysis:
- Consistent upward trajectory
- Diminishing returns (gains getting smaller)
- Possibly approaching plateau
Insights:
- ASO efforts are working
- May need different strategies to continue improvement
- Current approach yielding steady but declining marginal gains
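A quick way to make the diminishing-returns point concrete is to look at month-over-month gains rather than raw rates. A tiny sketch using the numbers from the table above:

```python
# Month-over-month change in recommendation rate, from the table above
rec_rate = [8, 14, 16, 17, 19, 20]  # percent, months 1-6

gains = [curr - prev for prev, curr in zip(rec_rate, rec_rate[1:])]
print(gains)  # [6, 2, 1, 2, 1] -> one large initial jump, then small steady gains
```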
What Good ASO Testing Data Looks Like
After 6 months of systematic testing, you should have:
1. Baseline and current metrics
- Started: 8% recommendation rate
- Current: 20% recommendation rate
- 2.5x improvement
2. Trend data
- Month-by-month improvement
- Accelerating, stable, or decelerating trend
3. Platform breakdown
- ChatGPT: 25% recommendation rate
- Claude: 20%
- Perplexity: 18%
- Gemini: 15%
- Copilot: 12%
4. Content insights
- Long-form guides: 30% citation rate
- Comparison pages: 25%
- How-to articles: 20%
- Product pages: 15%
5. Query intent performance
- Solution queries: 35% (highest)
- Comparison queries: 22%
- Information queries: 15%
- Implementation queries: 10%
6. Accuracy metrics
- 90% of recommendations describe you correctly
- 75% mention key differentiators
- 85% recommend to correct target market
This is data you can use to:
- Prove ASO is working
- Guide future content decisions
- Identify opportunities for improvement
- Justify continued investment
The Bottom Line
ASO testing is harder than traditional SEO testing because:
- No observable rankings
- Smaller sample sizes
- Non-deterministic AI responses
- Multiple platforms to test
- Uncertain time lags
But it's still necessary and possible.
The testing framework:
1. Monthly testing cycle
- 30 test queries
- 5 platforms
- 150 total tests per cycle
2. Track multiple metrics
- Recommendation rate (primary)
- Recommendation accuracy
- Feature mentions
- Target market accuracy
3. Make one significant change per cycle
- Learn what works
- Attribute improvements correctly
4. Wait 4+ weeks between changes
- Allow time for propagation
- Get meaningful samples
5. Test consistently over 6+ months
- Build trend data
- Identify patterns
- Prove causation
6. Be rigorous about avoiding pitfalls
- No cherry-picking
- Sufficient sample sizes
- Test across platforms
- Control for confounding variables
With systematic testing, you can:
- Prove ASO is working (or not)
- Identify what content approaches work best
- Optimize based on data, not guesses
- Justify continued investment in ASO
Most businesses aren't testing AI search at all. Any systematic testing gives you an advantage.
Start simple: 20 queries, 3 platforms, monthly testing. That's better than nothing.
Get more sophisticated over time as you learn what insights matter most for your business.
ASO without testing is just guessing. Test to know what's actually working.