Not All AI Crawlers Are Equal: Five Intent Types, Five Business Values

Not All AI Crawlers Are Equal: Five Intent Types, Five Business Values | Gravity Founder's Column

Two requests appear in your server logs.

The first: User-Agent is GPTBot/1.0, requesting /products/solar-panel-400w, from an OpenAI IP range.

The second: User-Agent is ChatGPT-User/1.0, requesting the same /products/solar-panel-400w, from the same OpenAI IP range.

In Google Analytics 4 (GA4), both requests are invisible: GA4 depends on client-side JavaScript, which AI crawlers do not execute. In Google Search Console (GSC), they are also invisible, because GSC's AI reports only track Google's own AI features. In most analytics tools, they are either ignored or lumped together as "bot traffic."

But these two requests carry fundamentally different commercial meaning.

GPTBot visits to train a model. It systematically crawls your product pages, feeding content into OpenAI's training dataset. This is a long-term signal — your content is being incorporated into AI's knowledge base, and future AI responses may reference your brand and products. But it does not mean a user is asking about you right now.

ChatGPT-User visits because a user is asking ChatGPT about you in a conversation right now. This is a real-time, high-intent commercial signal. ChatGPT has determined that your product page could help answer that user's question, so it fetches the page in real time to get current information.

The first request carries low direct commercial value (indirect contribution to long-term brand knowledge). The second represents a high-intent potential customer actively researching your product.

If your analytics system cannot distinguish these two crawler types, you are measuring fundamentally different commercial signals with the same metric.
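A minimal sketch of telling the two opening requests apart, assuming the server writes the common "combined" access-log format. The sample log lines, the documentation-range IP addresses, the simplified User-Agent strings, and the labels in `SIGNALS` are illustrative assumptions, not real traffic or an official schema.

```python
import re

# Combined log format:
# ip identity user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Only the two OpenAI agents from the opening example; labels are illustrative.
SIGNALS = {
    "GPTBot": "training crawl (long-term signal)",
    "ChatGPT-User": "user-initiated fetch (real-time signal)",
}

def describe_request(line: str) -> str:
    """Return '<path>: <signal>' for one access-log line."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return "unparsed"
    for token, label in SIGNALS.items():
        if token in match["ua"]:
            return f'{match["path"]}: {label}'
    return f'{match["path"]}: other traffic'

# Hypothetical log lines with simplified User-Agent strings.
sample_lines = [
    '192.0.2.10 - - [01/Jan/2025:10:00:00 +0000] '
    '"GET /products/solar-panel-400w HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    '192.0.2.11 - - [01/Jan/2025:10:00:05 +0000] '
    '"GET /products/solar-panel-400w HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)"',
]

for line in sample_lines:
    print(describe_request(line))
```

Both lines request the same path and return the same status; only the User-Agent field separates the long-term signal from the real-time one.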

Five AI Crawler Intent Types

Based on official documentation from OpenAI, Anthropic, Google, and other AI platforms, AI crawler visits can be classified into five intent types. This is not a theoretical taxonomy — each intent corresponds to specific official User-Agent identifiers that can be precisely identified through server-side log analysis.

Intent One: Training

Representative crawlers: GPTBot (OpenAI), ClaudeBot (Anthropic), Meta-ExternalAgent (Meta), GoogleOther (Google), Bytespider (ByteDance), CCBot (Common Crawl)

Purpose: Systematic content crawling for model training datasets. These crawlers are characterized by high frequency, broad coverage, and no concern for real-time accuracy — they are collecting data, not answering user questions.

Commercial value: Long-term foundational value. Your content entering training data means AI models may "know" your brand in future responses. But this is indirect and long-term — you cannot trace a specific order back to a training crawl.

Brand action: Ensure robots.txt allows training crawlers to reach the content you want included in training data. Control content quality: if training data contains errors, AI models may propagate them. Consider using llms.txt to provide structured brand knowledge.
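One way to verify the robots.txt part of this action is to test your rules with Python's standard `urllib.robotparser` before deploying them. The robots.txt content and the paths below are hypothetical; substitute your own file and the pages you care about.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot may crawl product pages but not
# internal drafts; CCBot is blocked entirely.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /products/
Disallow: /internal/

User-agent: CCBot
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Check whether `user_agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

for agent, path in [
    ("GPTBot", "/products/solar-panel-400w"),
    ("GPTBot", "/internal/pricing-draft"),
    ("CCBot", "/products/solar-panel-400w"),
]:
    verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, path) else "blocked"
    print(f"{agent} {path}: {verdict}")
```

Note that robots.txt is advisory: this check tells you what a compliant crawler should do, not what every crawler actually does.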

Intent Two: Indexing

Representative crawlers: FacebookExternalHit/Facebot (Meta), platform-specific search index crawlers

Purpose: Building indexes for AI search engines, analogous to Googlebot's crawling for traditional search. Unlike Training, Indexing focuses on structural page elements (title, description, Schema, Open Graph) rather than full text.

Commercial value: Infrastructure value. Being indexed is a prerequisite for being recommended. If an AI search engine hasn't indexed your page, it cannot recommend you. But being indexed does not equal being recommended.

Brand action: Ensure Schema completeness, meta tag accuracy, and correct Open Graph tags on key pages. This is the foundation layer of AI discoverability.
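A rough audit of these elements can be scripted with Python's standard `html.parser`. This is a sketch under assumptions: the set of "required" Open Graph properties in `REQUIRED_OG` is an illustrative choice, and the check only confirms that elements are present, not that their values are accurate.

```python
from html.parser import HTMLParser

# Assumed minimum set of Open Graph properties; adjust to your own standard.
REQUIRED_OG = ("og:title", "og:description", "og:image")

class PageAudit(HTMLParser):
    """Records whether a page has a title, meta description, JSON-LD, and OG tags."""

    def __init__(self):
        super().__init__()
        self.found = {"title": False, "meta_description": False, "json_ld": False}
        self.og_properties = set()
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            if (a.get("name") or "").lower() == "description" and a.get("content"):
                self.found["meta_description"] = True
            prop = a.get("property") or ""
            if prop.startswith("og:") and a.get("content"):
                self.og_properties.add(prop)
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self.found["json_ld"] = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # A non-empty text node inside <title> counts as a title.
        if self._in_title and data.strip():
            self.found["title"] = True

def missing_elements(html: str) -> list[str]:
    """List the audited elements that the page does not contain."""
    audit = PageAudit()
    audit.feed(html)
    missing = [name for name, ok in audit.found.items() if not ok]
    missing += [p for p in REQUIRED_OG if p not in audit.og_properties]
    return missing
```

Calling `missing_elements(page_html)` on each key page gives a short to-do list per URL; an empty list means every audited element was found.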

Intent Three: Search

Representative crawlers: OAI-SearchBot (OpenAI), PerplexityBot (Perplexity), Claude-SearchBot (Anthropic), Google-CloudVertexBot (Google), YisouSpider (China)

Purpose: AI search engines fetching relevant pages in real time while processing a user's search query. These crawlers target specific pages related to specific queries — they are not running site-wide scans.

Commercial value: Medium-high value. A Search crawler visit means a user is currently searching for a topic related to your page through an AI search engine. This is a market demand signal — someone is looking for products or information in your category.

Brand action: Optimize Answer-First page structure — ensure the page contains a direct answer to the target query within the first 200 characters. Optimize FAQ structure. Ensure price, inventory, and other critical information is current.
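The 200-character guideline can be spot-checked mechanically. The sketch below assumes you have already extracted the page's visible text and chosen the key terms a direct answer to the target query should contain; both the function name and the sample text are hypothetical.

```python
import re

def answer_in_opening(page_text: str, key_terms: list[str], window: int = 200) -> bool:
    """True if every key term appears within the first `window` characters.

    Whitespace is collapsed first so that layout indentation does not
    consume the character budget.
    """
    opening = re.sub(r"\s+", " ", page_text).strip()[:window].lower()
    return all(term.lower() in opening for term in key_terms)

# Hypothetical page text: the answer (wattage and price) comes first.
answer_first = "The 400W solar panel costs $249 and ships within two days. " + "Details follow. " * 40
# Same facts, but buried after a long introduction.
answer_buried = "Welcome to our store. " * 20 + "The 400W solar panel costs $249."

print(answer_in_opening(answer_first, ["400W", "$249"]))
print(answer_in_opening(answer_buried, ["400W", "$249"]))
```

A term match is only a proxy: it shows the facts are near the top of the page, not that the sentence containing them actually answers the query.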

Intent Four: User Fetch

Representative crawlers: ChatGPT-User (OpenAI), Claude-User/Claude-Web (Anthropic), meta-externalfetcher (Meta), Perplexity-User

Purpose: A user in an AI conversation asks the AI to access or learn about a specific page. The AI platform fetches the page on the user's behalf in real time, integrating content into the conversation response.

Commercial value: High value. This is the closest AI behavior to a human visit. A real person is actively asking AI to help them understand you — they may be comparing you against competitors, checking your return policy, or confirming your pricing. This is a high-intent signal.

Brand action: Ensure key conversion pages (product details, pricing, policies) are AI-friendly — structured, accurate, and current. Include clear CTAs on these pages, because AI may include your CTA in the response to the user.

Intent Five: User Action

Representative crawlers: Google-Agent (Google), Manus-User (Manus), NovaAct (Amazon)

Purpose: An AI agent executes specific operations on behalf of a user — filling forms, initiating queries, comparing prices, even completing purchase flows. This is the frontier of Agentic Commerce, still in early stages.

Commercial value: Highest value. A User Action crawler visit means a user is asking AI to do something on your website on their behalf. This goes beyond understanding; it is action. As AI agent capabilities grow and the UCP/ACP protocols mature, this layer's commercial value is likely to grow rapidly.

Brand action: Ensure complete structured data, programmatically accessible forms, and available pricing/inventory APIs. This is foundational preparation for Agentic Commerce.
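For the structured-data part, a common approach is to emit schema.org `Product` and `Offer` markup as JSON-LD. The sketch below generates such a block; the function name, product name, SKU, price, and URL are hypothetical placeholders, and a real page would typically carry more fields.

```python
import json

def product_json_ld(name: str, sku: str, price: float, currency: str,
                    in_stock: bool, url: str) -> str:
    """Build a schema.org Product/Offer JSON-LD document as a string."""
    availability = (
        "https://schema.org/InStock" if in_stock else "https://schema.org/OutOfStock"
    )
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",  # always two decimals, as a string
            "priceCurrency": currency,
            "availability": availability,
        },
    }
    return json.dumps(data, indent=2)

# Hypothetical product values for illustration only.
markup = product_json_ld(
    name="Solar Panel 400W",
    sku="SP-400W",
    price=249,
    currency="USD",
    in_stock=True,
    url="https://example.com/products/solar-panel-400w",
)
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Generating the markup from the same source as the visible price and stock status keeps the two from drifting apart, which matters when an agent acts on the structured values.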

The Intent Spectrum and Ascending Value

training → indexing → search → user_fetch → user_action
Long-term → Base → Medium-high → High → Highest
← Ascending commercial value →

This is not a linear increase — the value jump from search to user_fetch is the most significant, representing the qualitative shift from "the AI ecosystem is paying attention to you" to "a specific user is actively learning about you."
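In code, the spectrum is naturally an ordered enumeration. The sketch below uses ranks 1 to 5 purely as an ordering; because the increase is not linear, the numbers should be compared but not summed or averaged as if they were value weights. The visit data is hypothetical.

```python
from enum import IntEnum

class Intent(IntEnum):
    """Ordinal rank only: higher means closer to a user acting right now."""
    TRAINING = 1
    INDEXING = 2
    SEARCH = 3
    USER_FETCH = 4
    USER_ACTION = 5

VALUE_LABEL = {
    Intent.TRAINING: "Long-term",
    Intent.INDEXING: "Base",
    Intent.SEARCH: "Medium-high",
    Intent.USER_FETCH: "High",
    Intent.USER_ACTION: "Highest",
}

def user_driven(visits, floor=Intent.USER_FETCH):
    """Keep visits at or above `floor`, i.e. past the search/user_fetch jump."""
    return [(path, intent) for path, intent in visits if intent >= floor]

# Hypothetical classified visits: (path, intent).
visits = [
    ("/products/solar-panel-400w", Intent.TRAINING),
    ("/products/solar-panel-400w", Intent.USER_FETCH),
    ("/pricing", Intent.SEARCH),
    ("/checkout", Intent.USER_ACTION),
]

for path, intent in user_driven(visits):
    print(path, intent.name.lower(), VALUE_LABEL[intent])
```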

Why GA4 and GSC Cannot Make This Distinction

GA4 is a client-side analytics tool — it only runs in users' browsers. AI crawlers are not browsers and do not execute JavaScript, so in GA4's world they simply do not exist. GA4 cannot classify crawler intent because it cannot see crawlers at all.

GSC's AI reports only track Google's own AI feature impressions. They do not track third-party AI crawler visits and do not perform intent classification.

Achieving crawler intent classification requires three conditions:

Server-side log analysis — directly reading the User-Agent field from HTTP requests
Official UA standard library — matching against official User-Agent identifiers published by OpenAI, Anthropic, Google, and others
Behavioral analysis — for crawlers that don't use standard UAs, secondary classification through visit frequency, path patterns, and header characteristics
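The three conditions can be sketched as a small classifier that takes a User-Agent string read from the server log, matches it against a token table, and falls back to behavior when nothing matches. This is an illustrative sketch, not CitationGraph's implementation: the token table simply mirrors the crawlers named in this article, and the `VisitStats` fields and fallback thresholds are hypothetical values you would tune against your own traffic.

```python
from dataclasses import dataclass
from typing import Optional

# UA tokens and intents as listed in this article (matched case-insensitively).
UA_INTENTS = {
    "GPTBot": "training", "ClaudeBot": "training", "Meta-ExternalAgent": "training",
    "GoogleOther": "training", "Bytespider": "training", "CCBot": "training",
    "FacebookExternalHit": "indexing", "Facebot": "indexing",
    "OAI-SearchBot": "search", "PerplexityBot": "search", "Claude-SearchBot": "search",
    "Google-CloudVertexBot": "search", "YisouSpider": "search",
    "ChatGPT-User": "user_fetch", "Claude-User": "user_fetch", "Claude-Web": "user_fetch",
    "meta-externalfetcher": "user_fetch", "Perplexity-User": "user_fetch",
    "Google-Agent": "user_action", "Manus-User": "user_action", "NovaAct": "user_action",
}

@dataclass
class VisitStats:
    """Aggregates for one client over a time window (hypothetical fields)."""
    requests_per_hour: float
    distinct_paths: int

def classify(user_agent: str, stats: Optional[VisitStats] = None) -> str:
    ua = user_agent.lower()
    # Longest tokens first, so a more specific token wins over a shorter one.
    for token in sorted(UA_INTENTS, key=len, reverse=True):
        if token.lower() in ua:
            return UA_INTENTS[token]
    # Behavioral fallback: high frequency plus broad coverage resembles a
    # training crawl. Both thresholds are assumptions, not published figures.
    if stats and stats.requests_per_hour >= 100 and stats.distinct_paths >= 50:
        return "training_suspected"
    return "unclassified"

print(classify("Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)"))
print(classify("python-requests/2.31", VisitStats(requests_per_hour=500, distinct_paths=200)))
```

Keeping the fallback result labeled as "suspected" preserves the distinction between a match on a published identifier and an inference from behavior.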

Gravity's CitationGraph platform provides intelligent classification based on crawler intent, covering major AI platform crawlers.

Intent-Based Differentiated GEO Strategy

With five intent types understood, brands can craft more precise GEO strategies:

Intent	What You Should Do	What You Can Stop Doing
Training	Provide accurate brand facts, use llms.txt	No need to optimize page speed for training crawlers
Indexing	Ensure complete Schema, meta tags, Open Graph	No need to provide real-time data for indexing
Search	Optimize Answer-First content and FAQ structure	Don't worry about search crawler frequency

What Comes Next

Theory is complete. The next article goes practical: how a DTC brand upgrades from "AI traffic shows 0.5% in GA4" to "full-view AI impact of 8–12%" through a four-level monitoring architecture.

FAQ

Q1: How do I identify different AI crawler intents in my server logs?

A: Through User-Agent field matching. OpenAI uses three standard UAs: GPTBot (training), OAI-SearchBot (search), ChatGPT-User (user_fetch). Anthropic uses ClaudeBot (training), Claude-SearchBot (search), Claude-User (user_fetch). Google uses GoogleOther (training), Google-CloudVertexBot (search), Google-Agent (user_action). Each platform publishes these UA identifiers in its crawler documentation; the intent labels in parentheses follow the classification used in this article.

Q2: What is the practical impact of training crawlers on brands?

A: Primarily long-term impact. Once your content enters training data, AI models may "know" your brand when answering future related questions. If your brand description in the training data is inaccurate (outdated product info, wrong pricing), AI may propagate those errors. Ensuring content accuracy for training is the most important brand action.

Q3: How does user_fetch differ from a regular human visit?

A: The biggest difference: user_fetch does not trigger GA4. An AI platform visits your page on behalf of a user, without executing JavaScript or creating a session. But the commercial signal it represents is equally strong, because there really is a human actively learning about you. From a commercial value perspective, user_fetch should be treated as a "high-quality near-human visit."

Q4: Are user_action crawlers common today?

A: Still in early stages. Google-Agent is currently the most common user_action crawler, primarily appearing in Google Gemini's Agent features. Manus-User and NovaAct represent emerging AI agent trends. While volumes are small now, growth will accelerate rapidly as Agentic Commerce protocols (UCP/ACP) mature. We recommend brands start monitoring now to establish baseline data.
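Establishing a baseline can start as simply as counting classified hits per day. The sketch below assumes each log record has already been reduced to a date and an intent label; the function name and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def daily_baseline(records):
    """Map each intent to its mean hits per observed day.

    `records` is an iterable of (iso_date, intent) pairs. Days on which an
    intent had no hits are not counted, so this is a mean over active days.
    """
    per_intent = defaultdict(lambda: defaultdict(int))
    for day, intent in records:
        per_intent[intent][day] += 1
    return {intent: mean(days.values()) for intent, days in per_intent.items()}

# Hypothetical pre-classified records.
records = [
    ("2025-01-01", "user_action"),
    ("2025-01-01", "user_action"),
    ("2025-01-02", "user_action"),
    ("2025-01-01", "training"),
]

for intent, avg in daily_baseline(records).items():
    print(f"{intent}: {avg} hits per active day")
```

With a baseline in place, a later rise in user_action volume can be measured against your own history rather than guessed at.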

Q5: Do Chinese AI platform crawlers have intent classifications too?

A: Chinese AI platforms document their crawler UAs less consistently and less transparently than OpenAI, Anthropic, or Google, but Gravity's CitationGraph platform covers crawler identification for multiple mainstream Chinese AI platforms. Intent classification for these crawlers requires combining UA matching with behavioral analysis.