
ASO Testing: How to Know What's Working

A/B testing content for AI search requires different approaches than traditional SEO. Learn how to test effectively, what to test, timeframes that reveal insights, and how to avoid false conclusions from small sample sizes.

January 6, 2025
13 min read
By Thalamus AI

You've optimized your content for AI search. You're following best practices. Your content is thorough and well-structured.

But is it actually working?

Traditional SEO testing was straightforward: change a title tag, wait two weeks, check your Google rankings.

AI Search Optimization testing is different:

  • AI recommendations don't have "rankings" you can track
  • Small sample sizes make statistical significance hard
  • Causation is harder to prove
  • Time lag between changes and impact varies
  • Multiple AI platforms behave differently

But you still need to test. Otherwise you're optimizing blind.

Let me show you how to test ASO effectively, what actually reveals insights, and how to avoid the false conclusions that come from small sample sizes.

Why ASO Testing Is Harder Than SEO Testing

Traditional SEO Testing

The process:

  1. Change page title
  2. Wait for Google to recrawl (days to weeks)
  3. Check rankings for target keywords
  4. Measure traffic change
  5. Clear result: Rankings up/down, traffic up/down

Why this worked:

  • Rankings were observable (you could see position 5 vs position 3)
  • Large sample sizes (millions of searches)
  • Clear cause and effect
  • Relatively stable (rankings didn't fluctuate wildly from day to day)

ASO Testing Challenges

The problems:

Problem #1: No observable "rankings"

You can't check your "ChatGPT ranking" for "best CRM software."

AI either recommends you or it doesn't. You can test by asking questions, but:

  • You don't know how often AI actually recommends you
  • You don't know your "recommendation share" vs competitors
  • You can't see gradual improvement (ranked #8 → #6 → #4)

Problem #2: Small sample sizes

Traditional SEO:

  • Keyword gets 10,000 searches/month
  • Even a ranking improvement that captures 10% of those searches = 1,000 additional sessions
  • Easy to measure statistically

AI Search:

  • How many people are asking AI about your category? Unknown
  • How many of those get your business recommended? Unknown
  • How many click through? Unknown
  • Probably dozens to hundreds, not thousands

Small samples = harder to prove significance

Problem #3: Platform diversity

Traditional SEO: Google dominates (90%+ market share)

AI Search: Multiple platforms (ChatGPT, Claude, Perplexity, Gemini, etc.)

To test comprehensively, you need to test across platforms.

Problem #4: Non-deterministic responses

Ask Google the same question 10 times: Same results

Ask ChatGPT the same question 10 times: Potentially different answers each time

AI responses have variance built in, making testing noisier.

Problem #5: Time lag uncertainty

Traditional SEO: Google recrawls in days/weeks

AI Search: When do AI models incorporate your updated content?

  • Real-time web search: Immediately
  • Training data updates: Months
  • Cached data: Unknown

You might change content today, but impact might not be measurable for weeks or months.

What to Test in ASO

Given these challenges, what's actually testable?

Test #1: AI Recommendation Frequency

What you're testing: How often does AI recommend your business when asked relevant questions?

Method:

1. Create test queries

Develop 20-30 questions real customers might ask:

For CRM company:

  • "What's the best CRM for small businesses?"
  • "I need CRM software that integrates with Gmail"
  • "Affordable CRM for consulting firms"
  • "How do I choose a CRM for my company?"
  • "CRM with good customer support"

Make these realistic. Think like customers, not marketers.

2. Test across platforms

For each question, query:

  • ChatGPT
  • Claude
  • Perplexity
  • Google Gemini
  • Microsoft Copilot

3. Record recommendations

Track whether your business was:

  • Recommended (mentioned by name)
  • Not recommended (competitors mentioned instead)
  • Category mentioned but no specific recommendations

4. Calculate recommendation rate

Baseline (before optimization):

  • 30 questions × 5 platforms = 150 tests
  • Recommended: 12 times
  • Recommendation rate: 8%

After optimization (4 weeks later):

  • 30 questions × 5 platforms = 150 tests
  • Recommended: 24 times
  • Recommendation rate: 16%

Result: 2x improvement in recommendation frequency

Timeframe: Test monthly. ASO changes take time to propagate.
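
The arithmetic is simple enough to script. Here's a minimal Python sketch, assuming you log each test by hand into a CSV; the column names and file name are placeholders, not a required format:

```python
import csv
from collections import Counter

# Hypothetical log layout, filled in by hand after each AI query:
# date, query, platform, outcome, notes
# where outcome is one of: recommended, competitor_only, no_recommendation

def recommendation_rate(log_path):
    """Share of tests in which your business was recommended by name."""
    with open(log_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return 0.0
    outcomes = Counter(row["outcome"] for row in rows)
    return outcomes["recommended"] / len(rows)

# e.g. one monthly cycle = 30 queries x 5 platforms = 150 rows
print(f"Recommendation rate: {recommendation_rate('aso_tests_month_1.csv'):.0%}")
```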

Test #2: Recommendation Quality

What you're testing: Are you being recommended for the right reasons? Does the AI describe you accurately?

Method:

1. Analyze recommendation context

When AI recommends you, what does it say?

Good recommendation: "For small consulting firms, consider [Your CRM]. It's designed for professional services, includes time tracking, and integrates well with QuickBooks."

This shows: AI understands your positioning, target market, and key differentiators.

Weak recommendation: "You could try [Your CRM]."

This shows: AI knows your name but not much else.

Wrong recommendation: "[Your CRM] is popular with enterprise companies needing advanced customization."

This shows: AI has incorrect information about you.

2. Track accuracy metrics

  • Correct positioning: % of recommendations that describe you accurately
  • Key features mentioned: % that mention your actual differentiators
  • Target market correct: % that recommend you to right customer type

3. Optimize for accuracy

If AI describes you inaccurately:

  • Your content might not clearly communicate positioning
  • AI might be pulling from outdated sources
  • Competitors' content might be influencing AI's understanding

Adjust content to improve accuracy.
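
The same test log can carry these scores. A minimal sketch, assuming three hand-scored yes/no columns on each recommended row (the column names are placeholders):

```python
import csv

def accuracy_metrics(log_path):
    """Accuracy rates computed only over tests where you were recommended.

    Assumes three hand-scored yes/no columns on each recommended row:
    positioning_correct, features_mentioned, target_market_correct.
    """
    with open(log_path, newline="") as f:
        recommended = [r for r in csv.DictReader(f) if r["outcome"] == "recommended"]
    if not recommended:
        return {}

    def share(column):
        return sum(r[column] == "yes" for r in recommended) / len(recommended)

    return {
        "correct_positioning": share("positioning_correct"),
        "key_features_mentioned": share("features_mentioned"),
        "target_market_correct": share("target_market_correct"),
    }
```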

Test #3: Content Resonance

What you're testing: Which content pieces get cited most by AI?

Method:

1. Create distinct content pieces

Write several articles on related topics:

  • "How to Choose CRM Software: Complete Guide"
  • "CRM Features That Actually Matter for Small Business"
  • "CRM Integration: What You Need to Know"
  • "CRM Pricing: How Much Should You Pay?"

2. Track citations

When AI recommends you, does it reference specific content?

Example AI response: "According to [Your Company]'s guide on CRM pricing, most small businesses pay $15-50 per user monthly..."

This tells you: That guide is resonating with AI.

3. Analyze patterns

Which content gets cited most?

  • Long-form guides vs. specific articles
  • Practical how-to vs. conceptual content
  • Comparison content vs. educational content

4. Double down on what works

If AI frequently cites your comparison content, create more comparisons.

If how-to guides never get cited, that format may not be resonating, and a different approach is worth testing.
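
One way to tally this, assuming you add a hand-filled cited_url column to the test log whenever the AI references a specific page:

```python
import csv
from collections import Counter

def citation_counts(log_path):
    """Tally how often each content piece is referenced in AI answers.

    Assumes a hand-filled cited_url column: the page the AI referenced,
    left blank when no specific content was cited.
    """
    with open(log_path, newline="") as f:
        cited = [row["cited_url"] for row in csv.DictReader(f) if row.get("cited_url")]
    return Counter(cited)

# Example output: Counter({'/guides/crm-pricing': 9, '/compare/crm-tools': 6, ...})
```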

Test #4: Query Intent Matching

What you're testing: Do you appear for the right types of questions?

Method:

1. Categorize queries by intent

  • Information intent: "What is CRM software?"
  • Comparison intent: "Salesforce vs HubSpot"
  • Solution intent: "Best CRM for real estate agents"
  • Implementation intent: "How to set up CRM"

2. Test across intents

Your goal might be:

  • Appear in 50% of solution intent queries (high priority)
  • Appear in 30% of comparison queries
  • Appear in 20% of information queries (lower priority)
  • Appear in 10% of implementation queries

3. Measure actual performance

Test 10 queries per category:

  • Solution: 3/10 recommendations (30%)
  • Comparison: 5/10 (50%)
  • Information: 1/10 (10%)
  • Implementation: 0/10 (0%)

Analysis:

  • Performing well in comparison queries (above target)
  • Under-performing in solution queries (your highest priority, so the biggest gap to close)
  • Information queries a bit below target, but they're lower priority
  • Implementation queries need work

4. Adjust content strategy

Focus on improving solution-intent content since that's highest priority and currently under-performing.
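
A sketch of that comparison, assuming each logged test carries an intent label; the targets below are the example numbers from this section, not benchmarks:

```python
import csv
from collections import defaultdict

# Example targets by query intent (share of tests where you want a recommendation)
TARGETS = {"solution": 0.50, "comparison": 0.30, "information": 0.20, "implementation": 0.10}

def intent_performance(log_path):
    """Recommendation rate per intent, assuming each logged test has an intent label."""
    totals, hits = defaultdict(int), defaultdict(int)
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["intent"]] += 1
            hits[row["intent"]] += row["outcome"] == "recommended"
    return {
        intent: {"actual": hits[intent] / totals[intent], "target": TARGETS.get(intent)}
        for intent in totals
    }
```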

How to Run ASO Tests

Step 1: Establish Baseline (Month 0)

Before any optimization, measure current state.

Test protocol:

  1. Create 30 test queries
  2. Test each query on 5 platforms
  3. Record: Recommendations, accuracy, context
  4. Calculate baseline metrics

Baseline metrics:

  • Overall recommendation rate: 8%
  • Accuracy rate: 65% (of recommendations)
  • Key feature mentioned: 40%
  • Target market correct: 70%

This is your starting point.

Step 2: Make Changes (Month 1)

Implement one significant change.

Why one change at a time?

  • Can attribute impact to specific change
  • Learn what works
  • Avoid confounding variables

Example changes:

  • Publish comprehensive guide on key topic
  • Restructure homepage content
  • Add detailed product comparison page
  • Improve "About" page clarity
  • Add case studies with specific use cases

Document exactly what you changed and when.

Step 3: Wait for Propagation (Month 1-2)

How long to wait before retesting?

Minimum: 4 weeks

Why:

  • AI platforms need time to discover/process new content
  • Some platforms cache data
  • Training data updates take time
  • Need enough user queries to accumulate

During wait period:

  • Don't make additional major changes
  • Monitor any real-time metrics you have
  • Prepare for retest

Step 4: Retest (Month 2)

Run exact same test protocol:

  • Same 30 queries
  • Same 5 platforms
  • Same recording methodology

New results:

  • Overall recommendation rate: 14% (was 8%)
  • Accuracy rate: 80% (was 65%)
  • Key feature mentioned: 60% (was 40%)
  • Target market correct: 85% (was 70%)

Calculate improvements:

  • Recommendation rate: +75%
  • Accuracy: +15 percentage points
  • Feature mentions: +20 percentage points
  • Target market accuracy: +15 percentage points

Step 5: Analyze Significance

Is this improvement real or random noise?

With small samples, be careful about over-interpreting.

Statistical significance testing:

Baseline: 8% recommendation rate (12 out of 150 tests)

New: 14% recommendation rate (21 out of 150 tests)

Chi-square test:

  • On these counts (12/150 vs. 21/150), a standard chi-square test gives p ≈ 0.10
  • That's encouraging, but it isn't below the conventional 0.05 threshold, so treat it as a promising signal and confirm it over another cycle
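
If you want to run the test yourself, scipy's chi-square function handles the 2x2 table of recommended vs. not-recommended counts. A minimal sketch using the example numbers above:

```python
from scipy.stats import chi2_contingency

# 2x2 table: [recommended, not recommended] for each test cycle
baseline = [12, 150 - 12]   # 8% of 150 tests
retest   = [21, 150 - 21]   # 14% of 150 tests

chi2, p_value, dof, _ = chi2_contingency([baseline, retest], correction=False)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
# On these counts p comes out around 0.10: encouraging, but not below 0.05,
# so confirm the trend with another cycle before treating it as proven.
```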

Rule of thumb for ASO:

  • Sample size of 100+ tests per time period
  • A relative improvement of 50%+ is far more convincing than one of 10%
  • Test multiple times to confirm trend

Step 6: Repeat and Refine (Month 3+)

Continue monthly testing:

  • Track trends over time
  • Make iterative improvements
  • Build understanding of what works

After 6 months:

  • Clear pattern of what content AI favors
  • Recommendation rate stable at 18-22%
  • Accuracy consistently 80%+

This is defensible data showing ASO is working.

Common Testing Pitfalls

Pitfall #1: Testing Too Soon

The mistake: Change content, test 3 days later, see no improvement, conclude it doesn't work.

The problem: AI platforms haven't processed your changes yet.

Better approach: Wait 4+ weeks minimum before retesting.

Pitfall #2: Sample Size Too Small

The mistake: Test 5 queries, get recommended 2 times, declare success.

The problem: 5 queries is too small. Could be luck.

Better approach: 20-30 queries minimum per test cycle.

Pitfall #3: Changing Multiple Things

The mistake: Rewrite entire website, add 10 new pages, restructure everything. Then test.

The problem: What caused any improvement? No idea.

Better approach: One major change per test cycle. Learn from each change.

Pitfall #4: Confirmation Bias

The mistake: Cherry-pick queries that show improvement, ignore ones that got worse.

The problem: Seeing what you want to see, not reality.

Better approach: Pre-define query set. Test all queries every time. Report all results.

Pitfall #5: Ignoring Platform Differences

The mistake: Test only ChatGPT, optimize only for ChatGPT.

The problem: Users use multiple platforms. What works on ChatGPT might not work elsewhere.

Better approach: Test across multiple platforms. Optimize for general AI search, not one platform.

Pitfall #6: Optimizing for Test Queries Only

The mistake: Optimizing content specifically for the fixed set of queries you know you'll test.

The problem: Real users ask different questions.

Better approach: Regularly rotate test queries. Use variations. Think like real customers.

Advanced Testing Approaches

A/B Testing Different Content Approaches

For companies with multiple pages:

Test: Two different content approaches on similar pages.

Example:

Product Page A: Traditional feature list, benefits, pricing

Product Page B: Story-driven, use case focused, customer outcomes

Hypothesis: Story-driven content gets recommended more by AI.

Test protocol:

  • Create 15 queries relevant to each page
  • Test monthly for 3 months
  • Track recommendation frequency for each page

Results after 3 months:

  • Page A recommended: 12% of queries
  • Page B recommended: 19% of queries

Conclusion: Story-driven approach performs better. Apply to more pages.

Competitor Benchmarking

Track not just your performance, but competitors.

Method:

For same 30 test queries:

  • Track your recommendations
  • Track Competitor A recommendations
  • Track Competitor B recommendations
  • Track Competitor C recommendations

Competitive landscape:

Company         Recommendation Rate   Change
You             14%                   +6% vs last month
Competitor A    22%                   +2%
Competitor B    18%                   -1%
Competitor C    8%                    +1%

Analysis:

  • You're improving faster than competitors
  • Competitor A still leads but growth slowing
  • Your rate of improvement suggests you could match them in 4-6 months

This contextualizes your performance.
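
If the log also records every vendor the AI named (say, in a semicolon-separated companies_recommended column, an assumed convention rather than a standard one), the same data produces the whole competitive table:

```python
import csv
from collections import Counter

COMPANIES = ["You", "Competitor A", "Competitor B", "Competitor C"]  # placeholder names

def competitive_rates(log_path):
    """Recommendation rate per company over the same query set.

    Assumes a companies_recommended column holding a semicolon-separated
    list of every vendor the AI named in its answer.
    """
    with open(log_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return {}
    mentions = Counter()
    for row in rows:
        for name in row["companies_recommended"].split(";"):
            if name.strip():
                mentions[name.strip()] += 1
    return {company: mentions[company] / len(rows) for company in COMPANIES}
```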

Time Series Analysis

Track metrics over extended period.

6-month tracking:

Month     Rec Rate   Accuracy   Feature Mentions
Month 1   8%         65%        40%
Month 2   14%        80%        60%
Month 3   16%        82%        65%
Month 4   17%        85%        70%
Month 5   19%        88%        72%
Month 6   20%        90%        75%

Trend analysis:

  • Consistent upward trajectory
  • Diminishing returns (gains getting smaller)
  • Possibly approaching plateau

Insights:

  • ASO efforts are working
  • May need different strategies to continue improvement
  • Current approach yielding steady but declining marginal gains
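
A quick way to see the curve flattening is to look at month-over-month deltas, which takes only a few lines:

```python
# Monthly recommendation rates from the table above
rates = [0.08, 0.14, 0.16, 0.17, 0.19, 0.20]

deltas = [round(later - earlier, 2) for earlier, later in zip(rates, rates[1:])]
print(deltas)  # [0.06, 0.02, 0.01, 0.02, 0.01] -- shrinking gains point to a plateau
```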

What Good ASO Testing Data Looks Like

After 6 months of systematic testing, you should have:

1. Baseline and current metrics

  • Started: 8% recommendation rate
  • Current: 20% recommendation rate
  • 2.5x improvement

2. Trend data

  • Month-by-month improvement
  • Accelerating, stable, or decelerating trend

3. Platform breakdown

  • ChatGPT: 25% recommendation rate
  • Claude: 20%
  • Perplexity: 18%
  • Gemini: 15%
  • Copilot: 12%

4. Content insights

  • Long-form guides: 30% citation rate
  • Comparison pages: 25%
  • How-to articles: 20%
  • Product pages: 15%

5. Query intent performance

  • Solution queries: 35% (highest)
  • Comparison queries: 22%
  • Information queries: 15%
  • Implementation queries: 10%

6. Accuracy metrics

  • 90% of recommendations describe you correctly
  • 75% mention key differentiators
  • 85% recommend to correct target market

This is data you can use to:

  • Prove ASO is working
  • Guide future content decisions
  • Identify opportunities for improvement
  • Justify continued investment

The Bottom Line

ASO testing is harder than traditional SEO testing because:

  • No observable rankings
  • Smaller sample sizes
  • Non-deterministic AI responses
  • Multiple platforms to test
  • Uncertain time lags

But it's still necessary and possible.

The testing framework:

1. Monthly testing cycle

  • 30 test queries
  • 5 platforms
  • 150 total tests per cycle

2. Track multiple metrics

  • Recommendation rate (primary)
  • Recommendation accuracy
  • Feature mentions
  • Target market accuracy

3. Make one significant change per cycle

  • Learn what works
  • Attribute improvements correctly

4. Wait 4+ weeks between changes

  • Allow time for propagation
  • Get meaningful samples

5. Test consistently over 6+ months

  • Build trend data
  • Identify patterns
  • Prove causation

6. Be rigorous about avoiding pitfalls

  • No cherry-picking
  • Sufficient sample sizes
  • Test across platforms
  • Control for confounding variables

With systematic testing, you can:

  • Prove ASO is working (or not)
  • Identify what content approaches work best
  • Optimize based on data, not guesses
  • Justify continued investment in ASO

Most businesses aren't testing AI search at all. Any systematic testing gives you an advantage.

Start simple: 20 queries, 3 platforms, monthly testing. That's better than nothing.

Get more sophisticated over time as you learn what insights matter most for your business.

ASO without testing is just guessing. Test to know what's actually working.
