From Detection to Proof: The Causal Evidence Ladder for AI Growth
Published June 4, 2026 · 7 min read · Global Gravity
By now, the full AI Measurement Partner framework is in place: MMP's structural blind spot (Article 1), the AI traffic iceberg (Article 2), the Signal Bridge (Article 3), and multi-vertical outcome measurement (Article 4).
One final question remains — the hardest one:
You see AI sources growing and conversions growing. But is it "because of" AI, or just "coincidentally at the same time"?
This is the chasm between detection and proof. Crossing it requires not more data but a more rigorous standard of evidence.
#### The CMO's Real Question
The performance marketing team has spent months building AI source visibility, connecting the MMP Signal Bridge, and bringing the AI channel conversion funnel into view. Now they approach the CMO and CFO for budget to increase GEO (generative engine optimization) investment.
The CMO asks: "AI-sourced conversions are indeed growing. But the whole market is growing. How do you prove our GEO investment drove this growth, rather than natural ChatGPT user growth?"
This question is fundamentally about causal inference — separating your contribution from market trends when both happen simultaneously.
This is not unique to AI measurement. The advertising industry has explored causal inference for decades — from A/B testing to Google's Geo Lift and Meta's Conversion Lift. But AI has a unique difficulty: you cannot "turn off" AI's recommendation of your brand — you cannot make ChatGPT recommend you in group A and not in group B.
This means traditional ad causal verification methods need adaptation.
#### The Five-Level Causal Evidence Ladder
In CitationGraph, we structure the path from "observational data" to "causal proof" into five levels:
C0: Observational Correlation. The baseline. AI source visits grew X%, conversions grew Y%. This is correlation, not causation. Evidence strength: Low. Use case: initial trend discovery.
C1: Benchmarked Comparison. Add control baselines to observational data. Compare AI-sourced users vs. non-AI users: KYC completion rates, deposit amounts, LTV. If AI-sourced users show meaningful differences across multiple dimensions, causal evidence strengthens — but selection bias remains possible. Evidence strength: Medium-low.
C2: Trend Discontinuity. Search for "breakpoints" in time — did metrics show a trend-defying jump after a specific event (deploying new GEO content, optimizing Schema, updating llms.txt)? This leverages natural experiments. Evidence strength: Medium.
C3: Statistical Controls (DiD / IV). Use rigorous statistical methods. Difference-in-Differences: find a "treatment group" (product lines where you did GEO optimization) and a "control group" (where you didn't). Compare growth differences over the same period. Instrumental Variables: use exogenous variation in AI crawler frequency as an instrument to estimate AI exposure's causal effect on conversions. Evidence strength: Medium-high.
C4: Controlled Experiments (Geo Lift / Holdout). The gold standard. Design and execute a controlled experiment:
Geo Lift: Select treatment and control regions. Increase GEO investment in the treatment regions and hold the control regions constant. Compare AI source growth between the two groups over a 4-6 week experiment window and test the difference for statistical significance.
Holdout: More aggressive, but stronger evidence. Pause GEO investment in certain regions or product lines. If AI source metrics decline during the pause and recover after resumption, that is strong causal evidence.
Evidence strength: High. Use case: CFO/Board budget justification.
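As a rough illustration, the C3 difference-in-differences comparison and the C4 Geo Lift readout share the same core arithmetic: the change in the treated group minus the change in the control group. The sketch below is not CitationGraph's implementation. The function names and all region-level numbers are hypothetical, and the permutation test is one simple way to get a p-value when only a handful of regions are available.

```python
import random
from statistics import mean


def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences: treated change minus control change."""
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))


def permutation_p_value(treat_changes, ctrl_changes, n_perm=10000, seed=0):
    """Two-sided permutation test on per-region changes (post minus pre).

    Region labels are reshuffled at random; the p-value is the share of
    reshuffles whose gap is at least as large as the observed gap.
    """
    rng = random.Random(seed)
    observed = mean(treat_changes) - mean(ctrl_changes)
    pooled = list(treat_changes) + list(ctrl_changes)
    k = len(treat_changes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical weekly AI-sourced signups per region (illustrative only).
treat_pre = [100, 120, 90, 110]
treat_post = [130, 150, 115, 140]
ctrl_pre = [105, 95, 115, 100]
ctrl_post = [112, 101, 124, 108]

did = did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post)
treat_changes = [post - pre for pre, post in zip(treat_pre, treat_post)]
ctrl_changes = [post - pre for pre, post in zip(ctrl_pre, ctrl_post)]
p_value = permutation_p_value(treat_changes, ctrl_changes)

print(f"DiD lift: {did:.2f} signups per region, permutation p = {p_value:.3f}")
```

The estimate is only as good as the parallel-trends assumption: treated and control regions must have been on similar trajectories before the intervention.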
#### Why Most Companies Stall at C0
Most companies' AI source analysis stops at C0. Three real obstacles:
Obstacle 1: Insufficient data foundation. C2+ methods require 3-6 months of historical data. But most brands have not yet deployed basic AI source detection — no historical data means no trend analysis. The earlier you start detecting, the sooner you accumulate the data assets needed for causal verification.
Obstacle 2: Methodology barriers. DiD, IV, and Geo Lift require statistical expertise most performance marketing teams lack. This is why CitationGraph designs causal verification as a product feature — not a coding exercise.
Obstacle 3: Organizational willingness. Holdout experiments mean intentionally pausing investment in some markets, which requires management sign-off. Many teams are unwilling to risk "metrics declining during pause."
CitationGraph does not require clients to leap to C4. We provide a progressive path:
Day 1-30: Establish C0 baseline. Deploy AI source detection. Begin accumulating data. See the basic AI channel funnel on CitationGraph.
Day 30-90: Upgrade to C1. Connect custom conversion events via the Custom Outcome Layer. Compare AI-sourced vs. non-AI user behavior. Produce first AI channel quality analysis report.
Day 90-180: Push to C2-C3. Sufficient historical data accumulated. Run trend discontinuity analysis. If the client has multiple product lines or markets, run DiD analysis.
Day 180+: Execute C4 experiments. Design Geo Lift or Holdout experiments. Execute for 4-6 weeks. Produce CFO/Board-grade causal evidence reports.
Key principle: each level delivers independent value. C0 answers "does AI traffic exist?" C1 answers "what is AI user quality?" C2 answers "did my GEO action have an effect?" You do not need C4 to guide decisions — but C4 is the ultimate weapon for convincing the CFO.
#### Why CitationGraph Is Better Suited for Causal Verification Than Self-Build
Cross-client baselines. A single brand's AI source growth may be industry-wide. CitationGraph's cross-client data can separate "industry growth" from "brand-specific growth" — the key control group for DiD analysis.
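Citation SOV as an instrument. CitationGraph's AI answer citation monitoring (Citation SOV) provides a candidate instrumental variable: changes in brand citation rates in AI answers. This variable is influenced by GEO actions but independent of ad spending, which removes one major confounder. For valid IV estimation it must also affect conversions only through AI exposure (the exclusion restriction), an assumption that has to be argued case by case rather than taken for granted.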
Data completeness. Causal analysis is extremely sensitive to data quality. CitationGraph's multi-level visibility (client-side to server-side) ensures the most complete AI source data possible — the foundation for reliable causal analysis.
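To make the instrumental-variable idea above concrete, here is a toy sketch of the single-instrument (Wald) estimator on synthetic data. Everything in it is an assumption for illustration: the variable names, the data-generating process, and the effect size. It does not test whether Citation SOV or crawler frequency is actually a valid instrument; it only shows why a valid one helps when an unobserved factor drives both exposure and conversions.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Naive regression slope of y on x (biased if x is confounded)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Wald / single-instrument IV estimate: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


TRUE_EFFECT = 2.0  # assumed causal effect of exposure on conversions
rng = random.Random(0)
instrument, exposure, conversions = [], [], []
for _ in range(5000):
    z = rng.gauss(0, 1)  # instrument (stand-in for citation-rate change)
    u = rng.gauss(0, 1)  # unobserved confounder (e.g. underlying brand demand)
    x = 0.8 * z + u + rng.gauss(0, 1)  # AI exposure, driven by z and by u
    y = TRUE_EFFECT * x + 3.0 * u + rng.gauss(0, 1)  # conversions
    instrument.append(z)
    exposure.append(x)
    conversions.append(y)

ols = ols_slope(exposure, conversions)
iv = iv_slope(instrument, exposure, conversions)
print(f"true effect {TRUE_EFFECT}, naive OLS {ols:.2f}, IV {iv:.2f}")
```

In this synthetic setup the naive slope overstates the effect because the confounder pushes exposure and conversions in the same direction, while the IV estimate lands close to the assumed true value.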
#### Answering the Ultimate Question
Back to the series' starting point: "I don't know how to calculate the output."
With CitationGraph's five-level causal evidence ladder, the performance marketing lead can now answer the CFO:
C0: "AI sources bring X citations, Y web visits, Z app installs per month."
C1: "AI-sourced users complete KYC at 15% higher rates and deposit 22% more than ad-sourced users."
C2: "In the 4 weeks after publishing GEO-optimized content, AI-sourced signups grew X%, exceeding the market growth rate."
C3: "DiD analysis shows product lines with GEO optimization had AI-sourced signups grow significantly more than control lines — difference Y% (p < 0.05)."
C4: "We ran a 6-week holdout experiment. In holdout regions where GEO investment was paused, AI-sourced signups dropped Z% relative to control regions. After resuming, they recovered. GEO's incremental ROI is W:1."
Each level is more persuasive than the last. And the path begins the day you deploy CitationGraph.
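A C1 statement like the KYC comparison above is a difference in rates between two groups, so a quick check that it is not sampling noise is a two-proportion z-test. The sketch below uses only the Python standard library. The user counts are hypothetical and chosen so the relative lift comes out at 15%. A significant result here still says nothing about selection bias, which is why C1 sits low on the ladder.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare two completion rates.

    Returns (relative_lift, z, two_sided_p_value) for group A versus group B,
    using the pooled-variance normal approximation.
    """
    rate_a, rate_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a / rate_b - 1, z, p_value


# Hypothetical counts: 460 of 1,000 AI-sourced users complete KYC (46%),
# versus 1,600 of 4,000 ad-sourced users (40%).
lift, z_score, p_value = two_proportion_z_test(460, 1000, 1600, 4000)
print(f"relative lift {lift:.1%}, z = {z_score:.2f}, p = {p_value:.4f}")
```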
#### Core Argument
Detecting AI traffic is step one, but far from the endpoint. Performance marketing teams need not more data but higher-grade evidence. Getting from observational correlation (C0) to controlled experimental causal proof (C4) takes five rungs. CitationGraph's Causal Evidence Ladder turns this path from "academic methodology" into "product feature": any brand can progressively move from C0 to C4 and use CFO-grade evidence to justify AI/GEO investment.
#### FAQ
Q1: How much traffic does a Geo Lift experiment require?
A: It depends on the effect size you want to detect. As a rule of thumb, if AI sources generate 1,000+ conversion events per month, a 4-6 week window can detect incremental effects of roughly 10-15%. CitationGraph provides statistical power analysis during experiment design to confirm the experiment can detect the effect you care about.
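For intuition on where a figure like this comes from, here is a back-of-the-envelope power calculation. It is a simplification and not the power analysis CitationGraph runs: it assumes conversions in each arm behave like Poisson counts with equal baselines, ignores region-level variation, and uses a normal approximation. The function name and the per-arm volumes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_lift(conversions_per_arm, alpha=0.05, power=0.80):
    """Smallest relative lift detectable between two equal-sized arms.

    Assumes Poisson-like counts, so the difference of two arm totals has
    variance of about 2 * conversions_per_arm under the null.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)  # desired power
    return (z_alpha + z_beta) * sqrt(2 / conversions_per_arm)


# Hypothetical volumes: 700 and 1,400 conversions per arm over the test window.
mde_small = minimum_detectable_lift(700)
mde_large = minimum_detectable_lift(1400)
print(f"700 per arm: {mde_small:.1%}, 1,400 per arm: {mde_large:.1%}")
```

Under these assumptions the two cases come out at roughly 15% and 11%. Real geo experiments usually need more volume than this suggests, because regions differ from each other in ways a Poisson model does not capture.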
Q2: Can you do causal verification with only one market (no geographic split)?
A: Yes. Holdout experiments can split by product line (pause GEO for certain product lines). Time-series analysis (C2) needs no geographic split — just a sufficiently long timeline and a clear intervention event. C3 DiD can use "product line" or "content topic" as the split dimension.
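A minimal version of the single-market time-series check mentioned above: fit a straight-line trend to the weeks before the intervention, project it forward, and measure how far the actual post-intervention values sit above that projection. This is a sketch under strong assumptions (a linear pre-trend, no seasonality, no other events in the window). The series is hypothetical and deliberately noise-free so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def trend_break(series, intervention_index):
    """Compare post-intervention values with the projected pre-trend.

    Returns (mean_excess_per_period, relative_excess_over_projection).
    """
    pre = series[:intervention_index]
    post = series[intervention_index:]
    slope, intercept = fit_line(list(range(len(pre))), pre)
    projected = [intercept + slope * t for t in range(intervention_index, len(series))]
    excess = [actual - proj for actual, proj in zip(post, projected)]
    return sum(excess) / len(excess), sum(excess) / sum(projected)


# Hypothetical weekly AI-sourced signups: 8 weeks before a GEO content launch
# (steady growth of 2 per week), then 4 weeks after it.
weekly_signups = [100, 102, 104, 106, 108, 110, 112, 114, 131, 133, 135, 137]
mean_excess, relative_excess = trend_break(weekly_signups, intervention_index=8)
print(f"excess per week: {mean_excess:.1f}, relative to trend: {relative_excess:.1%}")
```

Because the pre-trend already absorbs steady market growth, the excess is the part of the jump the existing trajectory does not explain. It cannot rule out a coinciding external event, which is why C2 ranks below DiD and experiments.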
Q3: How are causal verification results presented to the CFO?
A: CitationGraph's causal evidence reports are designed for non-technical decision-makers. The core output is one number: "After stopping GEO investment, AI-sourced signups are projected to decline X% (95% CI: Y%-Z%)." It comes with an ROI conversion: "Each $1 of GEO investment generates $W in incremental revenue." The CFO does not need to understand DiD or IV. She needs a number she can sign off on.
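One way a headline number with a confidence interval could be produced from a holdout is a percentile bootstrap over regions, followed by a simple ROI conversion. The sketch below is illustrative only. The regional figures, function names, and revenue and spend inputs are all hypothetical, and it is not a description of how CitationGraph builds its reports.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def bootstrap_decline_ci(holdout, control, n_boot=5000, level=0.95, seed=0):
    """Percentile-bootstrap CI for mean(holdout) minus mean(control).

    Inputs are per-region percentage changes in AI-sourced signups.
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        h = [rng.choice(holdout) for _ in holdout]  # resample regions
        c = [rng.choice(control) for _ in control]
        diffs.append(_mean(h) - _mean(c))
    diffs.sort()
    lower = diffs[int((1 - level) / 2 * n_boot)]
    upper = diffs[int((1 + level) / 2 * n_boot) - 1]
    return _mean(holdout) - _mean(control), lower, upper


def incremental_roi(baseline_revenue, decline_fraction, geo_spend):
    """Revenue attributed to GEO (what would be lost without it) per $1 spent."""
    return baseline_revenue * abs(decline_fraction) / geo_spend


# Hypothetical % change in AI-sourced signups per region during the pause.
holdout_regions = [-14, -9, -12, -17, -11, -13]  # GEO paused
control_regions = [2, -1, 3, 0, 1, -2]  # GEO unchanged

point, low, high = bootstrap_decline_ci(holdout_regions, control_regions)
roi = incremental_roi(baseline_revenue=200_000, decline_fraction=-0.10, geo_spend=10_000)
print(f"projected decline {point:.1f}% (95% CI: {low:.1f}% to {high:.1f}%), ROI {roi:.1f}:1")
```

With only a handful of regions the interval is a rough guide, so in practice the experiment design should fix the number of regions before the pause starts.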