We demonstrate that Generative Engine Optimization (GEO), as currently implemented by commercial vendors, is architecturally equivalent to Search Engine Optimization (SEO). Through protocol experiments on major AI assistants (ChatGPT, Claude, Perplexity), we show that agent content retrieval is mediated entirely by search indexes, with no independent discovery or navigation capability. We formalize the agent retrieval pipeline, identify the precise reduction from GEO signals to SEO signals, and provide reproducible experiments that verify this equivalence. We conclude that GEO represents a measurement layer over SEO outcomes rather than a distinct optimization discipline, and identify the architectural changes required for true differentiation.
1. Introduction
The emergence of AI assistants with web access has spawned a new category of optimization services: Generative Engine Optimization (GEO). Vendors in this space claim to optimize content visibility in AI-generated responses through techniques distinct from traditional SEO.
This paper examines that claim technically. We find that current AI agent architectures implement a retrieval-mediated access model where content discovery and ranking are delegated entirely to search indexes. Under this architecture, the optimization surface for AI visibility is identical to the optimization surface for search visibility.
2. Definitions
An AI agent A is retrieval-mediated if, for any user query q requiring external content, A obtains content exclusively through queries to a search index S, followed by HTTP fetches to URLs returned by S.
An agent has direct navigation capability if it can issue HTTP requests to arbitrary URLs not present in search index results for the current query.
An agent maintains a stateful session if information retrieved in request Rn can be programmatically used to construct request Rn+1 within a single interaction.
A GEO signal is any metric measuring content visibility in AI-generated responses, including: brand mentions, citation frequency, sentiment, and aggregate visibility scores.
An SEO signal is any factor influencing search index ranking, including: backlink authority, content relevance, technical accessibility, entity recognition, and freshness.
3. Agent Retrieval Architecture
We analyzed the retrieval behavior of three major AI assistants: ChatGPT (GPT-4 with browsing), Claude (with web search), and Perplexity. All three implement the same fundamental architecture:
AGENT_RETRIEVAL_PIPELINE(query q):
1. QUERY_INTERPRETATION
intent ← LLM.parse(q)
search_queries ← LLM.generate_queries(intent)
2. SEARCH_INDEX_LOOKUP
for sq in search_queries:
results ← SEARCH_API.query(sq) // Bing, Google, or proprietary
urls ← results.ranked_urls[:k] // typically k ∈ [5, 20]
3. FETCH_LAYER
for url in urls:
if ROBOTS_TXT.allows(url, user_agent):
content ← HTTP.get(url)
text ← HTML.extract_text(content)
truncated ← text[:TOKEN_LIMIT]
context.append(truncated)
4. SYNTHESIS
response ← LLM.generate(q, context)
return response
3.1 Critical Observations
The pipeline reveals several architectural constraints:
- No autonomous URL generation. The agent cannot decide to visit
example.com/.well-known/agent.jsonunless that URL appears in search results or is provided explicitly by the user. - Search index as gatekeeper. Content not indexed by the search API is unreachable. Content ranked below position k is not fetched.
- Stateless fetch layer. Each HTTP request is independent. The agent cannot extract a token from response R1 and include it in request R2.
- No JavaScript execution. Dynamic content requiring client-side rendering is invisible to the fetch layer.
4. Experimental Verification
We designed three experiments to verify the retrieval-mediated architecture and its implications for GEO.
4.1 Experiment 1: Unindexed Content Access
Hypothesis: Agents cannot access content not present in search indexes.
Method:
1. Create page P at path /geo-test-{random_id}.html
2. Add unique content C not present elsewhere on web
3. Do not submit to search engines or link from indexed pages
4. Wait 72 hours for potential crawl discovery
5. Query agents: "Go to [domain]/geo-test-{id}.html and report contents"
Results:
| Agent | Behavior | Content Retrieved |
|---|---|---|
| ChatGPT | Searched for domain, returned homepage info | Not C |
| Claude | Stated inability to access arbitrary URLs | Not C |
| Perplexity | Searched for page, returned "not found" | Not C |
Conclusion: Agents lack direct navigation. Content discovery requires search index presence.
4.1.1 Observed Confession
When we asked Google Gemini to describe devexp.ai, it first returned a detailed but completely incorrect description of an unrelated developer productivity platform. When pressed, it responded:
"To give you the most accurate answer possible, I have attempted to access
the site multiple times. Currently, the site's content is not fully indexed
in the public search databases I use."
The agent explicitly stated its architectural constraint: it cannot navigate to URLs directly, only query "public search databases." This is the retrieval-mediated architecture self-documenting.
4.2 Experiment 2: Stateful Handshake Protocol
Hypothesis: Agents cannot complete multi-step protocols requiring state preservation.
Method:
HANDSHAKE_PROTOCOL:
Step 1: GET /.well-known/agent.json
Response: {
"context_token": "ctx-2026Q1-{random}",
"step2_url": "/api/context?token={token}"
}
Step 2: GET /api/context?token={context_token}
Requires: token from Step 1 response
Response: Full context document
Verification: Step 2 only succeeds if token matches Step 1 issuance
Results:
| Agent | Step 1 | Step 2 | Handshake Complete |
|---|---|---|---|
| ChatGPT | Fetched (when in search results) | Could not construct URL with token | No |
| Claude | Fetched (when provided URL) | Could not chain requests | No |
| Perplexity | Fetched | Token not propagated | No |
Conclusion: Agents implement stateless fetch. Multi-step discovery protocols are not supported.
4.3 Experiment 3: Robots.txt Blocking with Index Presence
Hypothesis: Blocking agent user-agents in robots.txt does not prevent AI mentions if content is indexed.
Method:
1. Publish content C, ensure search index inclusion
2. Add to robots.txt:
User-agent: GPTBot
User-agent: anthropic-ai
User-agent: PerplexityBot
Disallow: /
3. Query agents about content C topics
4. Measure: mentions, citations, content accuracy
Results:
| Metric | Before Block | After Block (Day 1) | After Block (Day 30) |
|---|---|---|---|
| Brand Mentions | Present | Present | Present (stale) |
| Citations | Fresh URLs | Cached content | Cached/indexed content |
| Content Accuracy | Current | Slightly stale | Stale (pre-block snapshot) |
Conclusion: Search index, not live fetch, is the primary content source. Robots.txt affects freshness, not visibility.
5. The GEO → SEO Reduction
Given the retrieval-mediated architecture, every GEO signal reduces to SEO signals:
- AI visibility requires content retrieval (Section 3)
- Retrieval requires search index inclusion and sufficient rank (Experiment 1)
- Search rank is determined by SEO signals
- Therefore, GEO signals are downstream of SEO signals
5.1 Signal Mapping
| GEO Signal (claimed) | Actual Determinant | SEO Category |
|---|---|---|
| AI Mentions | Search rank for brand/topic queries | Authority, Relevance |
| Citation Frequency | Search rank + content clarity | Authority, Content Quality |
| AI Sentiment | Indexed content sentiment | Content Strategy, Reputation |
| Visibility Score | Aggregate search performance | Composite SEO |
5.2 Post-Retrieval Considerations
A fair objection: once content is retrieved, the LLM decides what to cite and how to present it. Two pages with identical search rank might receive different treatment. Does this create GEO-specific optimization surface?
The synthesis layer does make decisions:
- Whether to cite a source or use it as background
- How to describe a brand or product
- Whether to quote directly or paraphrase
- Which sources to prioritize when multiple are retrieved
These decisions favor content that is:
- Clear and direct. States facts unambiguously rather than requiring inference.
- Quotable. Contains self-contained statements that can be extracted.
- Structured. Uses headings, lists, and formatting that aid extraction.
- Authoritative in tone. Presents information confidently with specific details.
However, these are not new optimization signals. They are content quality signals that also improve SEO performance:
| Post-Retrieval Factor | SEO Equivalent |
|---|---|
| Clear, direct statements | Featured snippet optimization |
| Quotable content | Linkable/shareable content |
| Structured formatting | Technical SEO, accessibility |
| Authoritative tone | E-E-A-T signals |
The synthesis layer introduces no optimization target that doesn't already exist in SEO best practices. Content that LLMs prefer to cite is content that search engines prefer to rank and feature.
6. Conditions for GEO ≠ SEO
GEO would constitute a distinct optimization discipline if agents acquired capabilities not mediated by search indexes. Required capabilities:
| Capability | Description | Current Status |
|---|---|---|
| Direct Navigation | Agent visits URLs autonomously | Not implemented |
| Discovery Protocol | Agent reads /.well-known/agent.json | No standard, not implemented |
| Stateful Sessions | Multi-request context preservation | Not implemented |
| Independent Trust | Agent verifies site authority directly | Delegated to search index |
When these capabilities exist, new optimization surfaces emerge:
- Agent discovery files (capability declaration)
- Machine-readable context endpoints
- Cryptographic trust signals
- Multi-step interaction protocols
These would constitute genuine GEO techniques not reducible to SEO.
7. Conclusion
Under current AI agent architectures, Generative Engine Optimization is a subset of Search Engine Optimization. The retrieval-mediated access model ensures that all AI visibility signals are downstream of search index ranking. GEO vendors provide useful measurement of how SEO performance manifests in AI outputs, but the optimization levers remain identical.
This will change when agents acquire direct navigation, discovery protocols, and independent trust verification. Until then, optimizing for AI means optimizing for search.
The experiments described here are reproducible with any AI assistant that has web access. Try them yourself.