Two requests appear in your server logs.
The first: User-Agent is GPTBot/1.0, requesting /products/solar-panel-400w, from an OpenAI IP range.
The second: User-Agent is ChatGPT-User/1.0, requesting the same /products/solar-panel-400w, from the same OpenAI IP range.
In GA4, both requests are invisible — GA4 depends on client-side JavaScript, which AI crawlers do not execute. In GSC, they are also invisible — GSC's AI reports only track Google's own AI features. In most analytics tools, they are either ignored or lumped together as "bot traffic."
But these two requests carry fundamentally different commercial meaning.
GPTBot visits to train a model. It systematically crawls your product pages, feeding content into OpenAI's training dataset. This is a long-term signal — your content is being incorporated into AI's knowledge base, and future AI responses may reference your brand and products. But it does not mean a user is asking about you right now.
ChatGPT-User visits because a user is asking ChatGPT about you in a conversation right now. This is a real-time, high-intent commercial signal: ChatGPT has determined that your product page could help answer that person's question, so it fetched the page in real time for current information.
The first request carries low direct commercial value (indirect contribution to long-term brand knowledge). The second represents a high-intent potential customer actively researching your product.
If your analytics system cannot distinguish these two crawler types, you are measuring fundamentally different commercial signals with the same metric.
Based on official documentation from OpenAI, Anthropic, Google, and other AI platforms, AI crawler visits can be classified into five intent types. This is not a theoretical taxonomy — each intent corresponds to specific official User-Agent identifiers that can be precisely identified through server-side log analysis.
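As a minimal sketch of that User-Agent matching, the snippet below maps the crawler names listed in this article to the five intents. The token list is taken from this article only, the function name `classify_intent` is hypothetical, and case-insensitive substring matching is an assumption rather than any platform's documented rule. User-Agent strings can also be spoofed, so a production system would likely pair this with IP verification.

```python
from typing import Optional

# Crawler token -> intent, using only the crawler names listed in this article.
INTENT_BY_TOKEN = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "Meta-ExternalAgent": "training",
    "GoogleOther": "training",
    "Bytespider": "training",
    "CCBot": "training",
    "FacebookExternalHit": "indexing",
    "Facebot": "indexing",
    "OAI-SearchBot": "search",
    "PerplexityBot": "search",
    "Claude-SearchBot": "search",
    "Google-CloudVertexBot": "search",
    "YisouSpider": "search",
    "ChatGPT-User": "user_fetch",
    "Claude-User": "user_fetch",
    "Claude-Web": "user_fetch",
    "meta-externalfetcher": "user_fetch",
    "Perplexity-User": "user_fetch",
    "Google-Agent": "user_action",
    "Manus-User": "user_action",
    "NovaAct": "user_action",
}


def classify_intent(user_agent: str) -> Optional[str]:
    """Return the intent for a User-Agent string, or None if no known token matches."""
    ua = user_agent.lower()
    for token, intent in INTENT_BY_TOKEN.items():
        # Assumption: a case-insensitive substring match is good enough for a sketch.
        if token.lower() in ua:
            return intent
    return None


# The two requests from the opening example land in different intents.
print(classify_intent("GPTBot/1.0"))        # training
print(classify_intent("ChatGPT-User/1.0"))  # user_fetch
```

Anything that returns `None` is either a human browser or a crawler outside this list, and is better reported separately than folded into a generic "bot traffic" bucket.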
Training
Representative crawlers: GPTBot (OpenAI), ClaudeBot (Anthropic), Meta-ExternalAgent (Meta), GoogleOther (Google), Bytespider (ByteDance), CCBot (Common Crawl)
Purpose: Systematic content crawling for model training datasets. These crawlers are characterized by high frequency, broad coverage, and no concern for real-time accuracy — they are collecting data, not answering user questions.
Commercial value: Long-term foundational value. Your content entering training data means AI models may "know" your brand in future responses. But this is indirect and long-term — you cannot trace a specific order back to a training crawl.
Brand action: Ensure robots.txt allows crawling of the content you want used for training. Control content quality: if training data contains errors, AI models may propagate them. Consider using llms.txt to provide structured brand knowledge.
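One way to confirm that robots.txt says what you intend is to test it with Python's standard `urllib.robotparser`. The sketch below is illustrative only: the robots.txt rules and the `/checkout/cart` path are made up for the example, and `is_allowed` is a hypothetical helper. Note that Python's parser applies the first matching rule in file order, which is why `Allow` is listed before `Disallow`; individual crawlers may resolve conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: let GPTBot crawl product pages only,
# and leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /products/
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Check whether `user_agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(is_allowed(ROBOTS_TXT, "GPTBot", "/products/solar-panel-400w"))  # True
print(is_allowed(ROBOTS_TXT, "GPTBot", "/checkout/cart"))              # False
print(is_allowed(ROBOTS_TXT, "ClaudeBot", "/checkout/cart"))           # True
```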
Indexing
Representative crawlers: FacebookExternalHit/Facebot (Meta), platform-specific search index crawlers
Purpose: Building indexes for AI search engines, analogous to Googlebot's crawling for traditional search. Unlike Training, Indexing focuses on structural page elements (title, description, Schema, Open Graph) rather than full text.
Commercial value: Infrastructure value. Being indexed is a prerequisite for being recommended. If an AI search engine hasn't indexed your page, it cannot recommend you. But being indexed does not equal being recommended.
Brand action: Ensure Schema completeness, meta tag accuracy, and correct Open Graph tags on key pages. This is the foundation layer of AI discoverability.
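A rough audit of those structural elements can be done with the standard library alone. The sketch below scans a page for a title, a meta description, three Open Graph tags, and a JSON-LD block. The `REQUIRED` list is an assumption chosen for illustration, not an official checklist from any AI platform, and `HeadAudit` and `missing_elements` are hypothetical names.

```python
from html.parser import HTMLParser

# Assumed checklist for illustration; adjust to your own key pages.
REQUIRED = ("title", "description", "og:title", "og:description", "og:image", "json-ld")

SAMPLE_HTML = """
<html><head><title>Solar Panel 400W</title>
<meta name="description" content="400 W monocrystalline panel.">
<meta property="og:title" content="Solar Panel 400W">
<script type="application/ld+json">{"@type": "Product", "name": "Solar Panel 400W"}</script>
</head><body></body></html>
"""


class HeadAudit(HTMLParser):
    """Collects which structural elements are present and non-empty."""

    def __init__(self):
        super().__init__()
        self.found = set()
        self._in_title = False
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Open Graph uses property="og:...", classic meta tags use name="...".
            key = a.get("property") or a.get("name")
            if key and a.get("content"):
                self.found.add(key.lower())
        elif tag == "script" and a.get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if data.strip():
            if self._in_title:
                self.found.add("title")
            if self._in_jsonld:
                self.found.add("json-ld")


def missing_elements(html: str) -> list:
    """Return the REQUIRED elements that the page does not provide."""
    audit = HeadAudit()
    audit.feed(html)
    return [name for name in REQUIRED if name not in audit.found]


print(missing_elements(SAMPLE_HTML))  # ['og:description', 'og:image']
```

This only checks presence, not correctness; validating the JSON-LD against schema.org types would be a separate step.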
Search
Representative crawlers: OAI-SearchBot (OpenAI), PerplexityBot (Perplexity), Claude-SearchBot (Anthropic), Google-CloudVertexBot (Google), YisouSpider (China)
Purpose: AI search engines fetching relevant pages in real time while processing a user's search query. These crawlers target specific pages related to specific queries — they are not running site-wide scans.
Commercial value: Medium-high value. A Search crawler visit means a user is currently searching for a topic related to your page through an AI search engine. This is a market demand signal — someone is looking for products or information in your category.
Brand action: Optimize Answer-First page structure — ensure the page contains a direct answer to the target query within the first 200 characters. Optimize FAQ structure. Ensure price, inventory, and other critical information is current.
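The 200-character guideline can be spot-checked with a very crude heuristic: confirm that the terms a direct answer must contain appear near the top of the page's visible text. The sketch below assumes you have already extracted that text; `answers_early` is a hypothetical helper, and the product copy and price are invented sample data. Passing this check does not prove the page answers the query well, only that the key facts are not buried.

```python
import re


def answers_early(page_text: str, required_terms: list, window: int = 200) -> bool:
    """True if every required term appears within the first `window` characters."""
    # Collapse whitespace so layout-only line breaks do not eat into the window.
    head = re.sub(r"\s+", " ", page_text).strip()[:window].lower()
    return all(term.lower() in head for term in required_terms)


# Invented sample copy for illustration.
answer_first = "The Solar Panel 400W costs $249 and ships in 2 days. " + "More detail follows. " * 20
buried = "Welcome to our store! " * 15 + "The Solar Panel 400W costs $249."

print(answers_early(answer_first, ["400W", "$249"]))  # True
print(answers_early(buried, ["$249"]))                # False
```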
User Fetch
Representative crawlers: ChatGPT-User (OpenAI), Claude-User/Claude-Web (Anthropic), meta-externalfetcher (Meta), Perplexity-User
Purpose: A user in an AI conversation asks the AI to access or learn about a specific page. The AI platform fetches the page on the user's behalf in real time, integrating content into the conversation response.
Commercial value: High value. This is the closest AI behavior to a human visit. A real person is actively asking AI to help them understand you — they may be comparing you against competitors, checking your return policy, or confirming your pricing. This is a high-intent signal.
Brand action: Ensure key conversion pages (product details, pricing, policies) are AI-friendly — structured, accurate, and current. Include clear CTAs on these pages, because AI may include your CTA in the response to the user.
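To decide which conversion pages deserve this attention first, it helps to see where User Fetch visits actually land. The sketch below counts user_fetch hits per path from already-parsed log records. It assumes each record is a `(path, user_agent)` pair; the token tuple comes from the crawler names in this section, and both `top_user_fetch_pages` and the sample hits are hypothetical.

```python
from collections import Counter

# User Fetch crawler tokens named in this section, lowercased for matching.
USER_FETCH_TOKENS = (
    "chatgpt-user",
    "claude-user",
    "claude-web",
    "meta-externalfetcher",
    "perplexity-user",
)


def top_user_fetch_pages(hits, n=3):
    """Return the n paths with the most user_fetch visits as (path, count) pairs."""
    counts = Counter(
        path
        for path, user_agent in hits
        if any(token in user_agent.lower() for token in USER_FETCH_TOKENS)
    )
    return counts.most_common(n)


# Invented sample records for illustration.
hits = [
    ("/products/solar-panel-400w", "ChatGPT-User/1.0"),
    ("/products/solar-panel-400w", "GPTBot/1.0"),  # training crawl, not counted
    ("/policies/returns", "Claude-User/1.0"),
    ("/products/solar-panel-400w", "Perplexity-User/1.0"),
    ("/pricing", "Mozilla/5.0 (Windows NT 10.0) Chrome/124.0"),  # human browser
]

print(top_user_fetch_pages(hits))
# [('/products/solar-panel-400w', 2), ('/policies/returns', 1)]
```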
User Action
Representative crawlers: Google-Agent (Google), Manus-User (Manus), NovaAct (Amazon)
Purpose: An AI agent executes specific operations on behalf of a user — filling forms, initiating queries, comparing prices, even completing purchase flows. This is the frontier of Agentic Commerce, still in early stages.
Commercial value: Highest value. A User Action crawler visit means a user is asking AI to do something on your website on their behalf. This goes beyond understanding: it is action. As AI agent capabilities grow and the Agentic Commerce protocols (UCP/ACP) mature, the commercial value of this layer is likely to rise sharply.
Brand action: Ensure complete structured data, programmatically accessible forms, and available pricing/inventory APIs. This is foundational preparation for Agentic Commerce.
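As one concrete piece of that preparation, product pages can expose price and availability as schema.org JSON-LD so that an agent does not have to scrape them from rendered HTML. The sketch below builds a `Product` with a nested `Offer`. The helper name `product_jsonld`, the SKU, the price, and the example.com URL are all invented for illustration; the schema.org type and property names are real.

```python
import json


def product_jsonld(name: str, sku: str, price: float, currency: str, in_stock: bool, url: str) -> dict:
    """Build a schema.org Product with one Offer, ready to embed as JSON-LD."""
    availability = "https://schema.org/InStock" if in_stock else "https://schema.org/OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "offers": {
            "@type": "Offer",
            "url": url,
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": availability,
        },
    }


# Invented sample values for illustration.
doc = product_jsonld(
    name="Solar Panel 400W",
    sku="SP-400W",
    price=249,
    currency="USD",
    in_stock=True,
    url="https://example.com/products/solar-panel-400w",
)

# Place the output inside <script type="application/ld+json"> on the product page.
print(json.dumps(doc, indent=2))
```

Generating this block from the same source as the visible price keeps the two from drifting apart, which matters once an agent acts on the number it reads.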
training (long-term) → indexing (base) → search (medium-high) → user_fetch (high) → user_action (highest)
Commercial value ascends from left to right.
This is not a linear increase — the value jump from search to user_fetch is the most significant, representing the qualitative shift from "the AI ecosystem is paying attention to you" to "a specific user is actively learning about you."
GA4 is a client-side analytics tool — it only runs in users' browsers. AI crawlers are not browsers and do not execute JavaScript, so in GA4's world they simply do not exist. GA4 cannot classify crawler intent because it cannot see crawlers at all.
GSC's AI reports only track Google's own AI feature impressions. They do not track third-party AI crawler visits and do not perform intent classification.
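Since neither tool sees these requests, the raw material has to come from server access logs. The sketch below parses one line in the common Apache/Nginx "combined" log format and pulls out the path and User-Agent. It assumes your server writes that format; `parse_log_line` is a hypothetical helper, and the sample line uses a reserved documentation IP address, not a real OpenAI address.

```python
import re
from typing import Optional

# Combined log format: ip ident user [time] "method path protocol" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the parsed fields of one combined-format log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record


# Invented sample line for illustration (203.0.113.0/24 is reserved for documentation).
sample = (
    '203.0.113.7 - - [12/May/2025:10:15:32 +0000] '
    '"GET /products/solar-panel-400w HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; ChatGPT-User/1.0)"'
)

record = parse_log_line(sample)
print(record["path"])        # /products/solar-panel-400w
print(record["user_agent"])  # Mozilla/5.0 (compatible; ChatGPT-User/1.0)
```

The `user_agent` field is what intent classification works from; the `ip` field is what you would check against each platform's published ranges to guard against spoofed User-Agents.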
Achieving crawler intent classification requires three conditions:
Gravity's CitationGraph platform provides intelligent classification based on crawler intent, covering major AI platform crawlers.
With five intent types understood, brands can craft more precise GEO strategies:
| Intent | What You Should Do | What You Can Stop Doing |
|---|---|---|
| Training | Provide accurate brand facts, use llms.txt | No need to optimize page speed for training crawlers |
| Indexing | Ensure complete Schema, meta tags, Open Graph | No need to provide real-time data for indexing |
| Search | Optimize Answer-First content and FAQ structure | Don't worry about search crawler frequency |
| User Fetch | Ensure key pages are accurate, real-time, with CTAs | Monitor which pages get the most user_fetch visits |
| User Action | Ensure structured data and APIs are available | Start preparing for Agentic Commerce now |
Theory is complete. The next article goes practical: how a DTC brand upgrades from "AI traffic shows 0.5% in GA4" to "full-view AI impact of 8–12%" through a four-level monitoring architecture.
A: Through User-Agent field matching. OpenAI uses three standard UAs: GPTBot (training), OAI-SearchBot (search), ChatGPT-User (user_fetch). Anthropic uses ClaudeBot (training), Claude-SearchBot (search), Claude-User (user_fetch). Google uses GoogleOther (training), Google-CloudVertexBot (search), Google-Agent (user_action). These UA identifiers are officially published by each platform.
A: Primarily long-term impact. Once your content enters training data, AI models may "know" your brand when answering future related questions. If your brand description in the training data is inaccurate (outdated product info, wrong pricing), AI may propagate those errors. Ensuring content accuracy for training is the most important brand action.
A: The biggest difference: user_fetch does not trigger GA4. It is AI visiting your page on behalf of a user, without executing JavaScript or creating a session. But the commercial signal it represents is equally strong, because there really is a human actively learning about you. From a commercial value perspective, user_fetch should be treated as a "high-quality near-human visit."
A: Still in early stages. Google-Agent is currently the most common user_action crawler, primarily appearing in Google Gemini's Agent features. Manus-User and NovaAct represent emerging AI agent trends. While volumes are small now, growth will accelerate rapidly as Agentic Commerce protocols (UCP/ACP) mature. We recommend brands start monitoring now to establish baseline data.
A: Chinese AI platforms' UA standards are less standardized and transparent than OpenAI/Anthropic/Google, but Gravity's CitationGraph platform covers crawler identification for multiple mainstream Chinese AI platforms. Specific intent classification requires combining UA matching with behavioral analysis.