Generative Engine Optimization: a working primer.

A definition of the category, a formalization of the framework we use to score it, and a list of what compounds — written for practitioners who need a citable reference rather than a marketing summary.

Abstract

Generative engine optimization (GEO) is the operational practice of measuring and improving how AI search systems — ChatGPT, Claude, Gemini, Perplexity, Grok — retrieve, trust, and cite a company's content when answering buyer-intent queries.

It is distinct from search engine optimization in the same way an answer differs from a list: the model reads, decides, and recommends rather than returning a ranked set for the user to pick from. We propose three independent failure modes — Retrieval, Trust, Answerability — each scored on a hundred-point scale, and note that a page can fail any one for reasons the others cannot fix. The note ends with a short list of signals that have compounded, in our 2025–2026 observational sample, across all five major engines. Limitations are stated explicitly throughout. We observe correlations; we do not claim causation.

What GEO is, and what it is not

GEO is the operational practice of being cited inside generated AI answers, rather than appearing in a ranked list of links. It overlaps with technical SEO on what we call the Retrieval axis — crawl access, parseability, structural clarity — because no system can cite a page it cannot read. It diverges from SEO entirely on the other two axes. Search engines treat a page as a unit of address; answer engines treat a page as a source of facts. Optimizing for address (rank one for a query) and optimizing to be the quoted source (be cited inside the answer to that query) are not the same activity.

Where the existing literature has used "answer engine optimization" or "AI search optimization" as labels, we prefer "generative engine optimization" because it names the engine class (generative) rather than the surface artifact (the answer). We use it consistently with the original Princeton usage.¹

Why ranking-list logic stops working

For two decades, the unit of competition in search was the ten-blue-links page. Either you were on it or you were not. The buyer read the list, decided, clicked. Optimization meant influencing rank ordering through known and unknown signals: backlinks, page speed, on-page content, structured data, trust proxies. The user remained the arbiter of relevance.

Generative engines collapse this. The buyer types a question. The model retrieves a set of documents from its index or via live web search, reads them, decides which to trust, composes an answer, and — depending on the engine — names which sources informed it.² The user never sees most of the candidate set. Many engines do not display URLs in the answer body at all; the citation, if it exists, is a footnote or a small tile at the end. The arbiter of relevance has moved inside the model.

This produces a different competitive game. A page that ranks third for a query and is read by an engine but not cited has lost in a way that is invisible to rank-tracking tools. A page that ranks tenth but happens to contain the most quote-friendly paragraph in the candidate set may be cited every time. Search position is correlated with retrieval; it does not predict citation.

The retrieval-and-citation literature has begun to characterize what is correlated with being chosen, though no major operator publishes ranking weights. Princeton's working paper on generative engine optimization observed visibility gains for content that included quotes, statistics, and citations.¹ Our reading is that this is not because the model rewards those signals directly, but because such content is easier to extract into an answer. Whatever ranking weights the model has, a passage the engine cannot extract cleanly is unlikely to be cited.

The Retrieval / Trust / Answerability framework

We score every cited URL on three independent axes. They are independent in the technical sense that an intervention on one does not predictably move the others; they are also independent in the failure-mode sense that a page can be the highest-quality candidate on two axes and still be uncited because it fails the third.

Retrieval (0–100)

Can AI systems access, crawl, parse, and structurally understand your content? This is the engineering layer. It includes HTTP-level accessibility from each engine's crawler IP ranges; HTML structure that survives the parsers used to ingest content; presence and correctness of llms.txt where applicable; robots.txt directives that explicitly address the relevant AI crawlers; server-rendered content versus JavaScript-dependent content (most current AI crawlers do not execute JavaScript); and sitemap and structured-data signals.

Retrieval is necessary but not sufficient. A page can pass every Retrieval check and still be ignored.
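
One of the Retrieval checks above, whether robots.txt addresses the relevant AI crawlers, can be sketched with Python's standard library. This is a minimal illustration, not the audit tooling: the crawler tokens in AI_CRAWLERS and the inlined robots.txt are assumptions, and the current token names should be confirmed in each operator's documentation.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens are assumptions for illustration; confirm the current
# names in each operator's documentation before relying on them.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

# Hypothetical robots.txt, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def named_agents(robots_txt: str) -> set[str]:
    """Return the lowercase user-agent tokens the file addresses by name."""
    agents = set()
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def crawl_report(robots_txt: str, path: str) -> dict[str, dict[str, bool]]:
    """For each crawler: is `path` allowed, and is the crawler named explicitly?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    named = named_agents(robots_txt)
    return {
        agent: {
            "allowed": parser.can_fetch(agent, path),
            "explicit": agent.lower() in named,
        }
        for agent in AI_CRAWLERS
    }


report = crawl_report(ROBOTS_TXT, "/pricing")
for agent, result in report.items():
    print(agent, result)
```

The "explicit" flag separates a crawler that is allowed only because it falls through to the wildcard rule from one the site has addressed by name, which is the distinction the checklist draws.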

Trust (0–100)

When the engine has multiple candidate pages for a claim, which does it treat as cite-worthy? This is the entity-and-corroboration layer. It includes entity presence in Wikidata, the Knowledge Graph, and verified business listings; external co-occurrence across reputable sources; author or publisher attribution the engine can resolve to a known entity; dates and updates that signal currency; and internal evidence of methodology, including citations to primary sources and structured claims.

Trust is where most companies lose silently. The page is well-written. The page is well-structured. The engine reads it. The engine cites someone else.

Answerability (0–100)

Can your content function as the quoted answer to a specific buyer question? This is the linguistic and structural layer. It includes self-contained passages of approximately 134–167 words that work as standalone quotes; direct answers in the first 40–60 words of a section; definitions in "X is…" form; tables and lists that present comparative or step-wise information; and question-format headings that mirror buyer queries.

Answerability is the easiest axis to address and the most undervalued in the existing SEO literature.
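
Two of the Answerability criteria above lend themselves to a quick mechanical check: passage length and the "X is…" definition form. The sketch below is a crude heuristic under stated assumptions. It treats the 134 to 167 word range as hard bounds and uses a simple regular expression for definitions; neither is presented as the scoring rubric itself.

```python
import re

# Range taken from the text above; treating it as a hard bound is an
# assumption made for this sketch.
PASSAGE_RANGE = (134, 167)

# Crude heuristic: a capitalized opening that reaches "is" or "are"
# within the first sentence and within roughly 80 characters.
DEFINITION_OPENING = re.compile(r"^\s*[A-Z][^.?!]{0,80}?\b(?:is|are)\b")


def split_paragraphs(section: str) -> list[str]:
    """Split on blank lines and drop empty fragments."""
    return [p.strip() for p in re.split(r"\n\s*\n", section) if p.strip()]


def word_count(text: str) -> int:
    """Count whitespace-delimited tokens."""
    return len(text.split())


def quotable_paragraphs(section: str) -> list[int]:
    """Return indexes of paragraphs whose length falls inside PASSAGE_RANGE."""
    low, high = PASSAGE_RANGE
    return [
        index
        for index, paragraph in enumerate(split_paragraphs(section))
        if low <= word_count(paragraph) <= high
    ]


def opens_with_definition(paragraph: str) -> bool:
    """True when the paragraph starts in an 'X is ...' or 'X are ...' form."""
    return DEFINITION_OPENING.match(paragraph) is not None


sample = "GEO is the practice of being cited inside generated answers."
print(word_count(sample), opens_with_definition(sample))
```

A check like this only flags candidates; whether a passage actually reads as a standalone quote still needs a human reader.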

The three axes interact, but they fail independently. A page that scores ninety on Retrieval, ninety on Answerability, and twenty on Trust will not be cited at meaningful rates. Conversely, a page that scores ninety on Trust and Answerability but twenty on Retrieval will be invisible to the engine entirely. The 0–100 scaling is intentional: it forces measurement, not opinion.
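
The independence of the axes suggests reporting the weakest axis rather than an average. The sketch below is one way to represent that, assuming integer scores; the AxisScores name and the floor of 50 are hypothetical and are not taken from the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisScores:
    """Scores for one URL on the three axes, each on a 0 to 100 scale."""

    retrieval: int
    trust: int
    answerability: int

    def __post_init__(self) -> None:
        for name, value in self.as_dict().items():
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")

    def as_dict(self) -> dict[str, int]:
        return {
            "retrieval": self.retrieval,
            "trust": self.trust,
            "answerability": self.answerability,
        }

    def failing_axes(self, floor: int = 50) -> list[str]:
        """Axes scoring below `floor`; the default of 50 is a hypothetical threshold."""
        return [name for name, value in self.as_dict().items() if value < floor]

    def bottleneck(self) -> str:
        """The lowest-scoring axis; on a tie, the first axis in declaration order."""
        scores = self.as_dict()
        return min(scores, key=scores.get)


page = AxisScores(retrieval=90, trust=20, answerability=90)
print(page.bottleneck(), page.failing_axes())
```

Averaging the example page would give roughly 67 and hide the Trust failure, which is why the sketch surfaces the minimum instead of a composite.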

What gets measured

Our standing protocol is a sixty-prompt audit set, run against five engines — ChatGPT, Claude, Gemini, Perplexity, and Grok — within a 21-day capture window. The prompt set is constructed from the engagement's buyer archetypes, typically four archetypes producing fifteen prompts each across awareness, comparison, risk, pricing, fit, and post-purchase stages.

Five engines times sixty prompts yields three hundred captured answers. For each, we record: which URLs the engine cited, which competitors were named, whether the engagement sponsor was cited or absent, and the answer text in full. The three hundred answers are then scored against the Retrieval / Trust / Answerability framework, with per-axis scoring at the URL level and per-engine aggregation.
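
The capture arithmetic and the per-answer record can be sketched as a small data structure. The field names and the prompt-ID scheme below are hypothetical; only the engine list, the four-by-fifteen prompt layout, and the recorded items come from the protocol described above.

```python
from dataclasses import dataclass, field
from itertools import product

ENGINES = ["ChatGPT", "Claude", "Gemini", "Perplexity", "Grok"]
ARCHETYPES = 4               # typical count, per the protocol above
PROMPTS_PER_ARCHETYPE = 15


@dataclass
class CapturedAnswer:
    """One engine's answer to one prompt; field names are hypothetical."""

    engine: str
    prompt_id: str
    cited_urls: list[str] = field(default_factory=list)
    competitors_named: list[str] = field(default_factory=list)
    sponsor_cited: bool = False
    answer_text: str = ""


def prompt_ids() -> list[str]:
    """Build IDs such as 'A1-P01' for every archetype and prompt slot."""
    return [
        f"A{archetype}-P{prompt:02d}"
        for archetype in range(1, ARCHETYPES + 1)
        for prompt in range(1, PROMPTS_PER_ARCHETYPE + 1)
    ]


def empty_capture_grid() -> list[CapturedAnswer]:
    """One empty record per (engine, prompt) pair, grouped by engine."""
    return [
        CapturedAnswer(engine=engine, prompt_id=prompt_id)
        for engine, prompt_id in product(ENGINES, prompt_ids())
    ]


def sponsor_citation_rate(answers: list[CapturedAnswer], engine: str) -> float:
    """Share of one engine's answers in which the sponsor was cited."""
    rows = [answer for answer in answers if answer.engine == engine]
    return sum(answer.sponsor_cited for answer in rows) / len(rows) if rows else 0.0


grid = empty_capture_grid()
print(len(prompt_ids()), len(grid))  # 60 300
```

Keeping one record per engine and prompt pair makes the per-engine aggregation a simple filter, and it makes a missing capture visible as an empty record rather than an absent row.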

A day-90 re-audit is included to measure movement: the same sixty prompts are re-run against the updated site, and the delta is reported per axis and per engine.
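
The per-axis, per-engine delta can be sketched as a subtraction over two score tables. The numbers below are illustrative placeholders, not observed results, and the NOISE_FLOOR threshold is a hypothetical device for separating movement from run-to-run variance.

```python
# engine -> axis -> mean score on the 0 to 100 scale
Scores = dict[str, dict[str, float]]

# Hypothetical threshold below which a change is treated as noise.
NOISE_FLOOR = 5.0


def audit_delta(baseline: Scores, reaudit: Scores) -> Scores:
    """Re-audit minus baseline, for every engine present in both tables."""
    return {
        engine: {axis: reaudit[engine][axis] - score for axis, score in axes.items()}
        for engine, axes in baseline.items()
        if engine in reaudit
    }


def material_moves(delta: Scores, noise_floor: float = NOISE_FLOOR) -> list[tuple[str, str, float]]:
    """Keep only (engine, axis, change) entries at or above the noise floor."""
    return [
        (engine, axis, change)
        for engine, axes in delta.items()
        for axis, change in axes.items()
        if abs(change) >= noise_floor
    ]


# Placeholder values for illustration only.
baseline = {"Gemini": {"retrieval": 80.0, "trust": 20.0, "answerability": 60.0}}
reaudit = {"Gemini": {"retrieval": 82.0, "trust": 45.0, "answerability": 75.0}}

delta = audit_delta(baseline, reaudit)
print(material_moves(delta))
```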

We deliberately keep the prompt count fixed across engagements. Six hundred prompts would give more nominal statistical power but, in practice, a substantially worse signal-to-noise ratio: engines vary their behavior across runs, prompt phrasing produces non-trivial outcome variance, and re-auditing a moving target on a long-tail prompt set produces deltas that read as noise rather than learning.

What this is not a theory of

We want to be explicit about what GEO, as we practice it, does not claim.

It is not a causal theory. The framework predicts which pages are likely to be cited within an engine's current behavior; it does not establish that any single signal causes citation. Operators do not publish ranking weights and have stated repeatedly that retrieval involves model-internal judgments that change frequently. Anything we report is correlation observed within a bounded sample.

It is not stationary. AI engines change. Behaviors observed in May 2026 may differ materially from behaviors observed in November 2026. The day-90 re-audit exists for that reason. Findings should be read as "what was observed during a specific 21-day window."

It is not a substitute for SEO on the Retrieval axis. Most of the Retrieval work overlaps with technical SEO: server-side rendering, crawl access, structured data, sitemap hygiene. Where the two practices say the same thing, they are saying the same thing. We do not invent novelty where none exists.

It is not a guarantee. Engagements where the priority work orders were shipped have typically seen meaningful citation movement within the re-audit window. Engagements where work orders were not shipped have not. The framework predicts; clients ship.

Signals that compound

The following signals have, across our 2025–2026 observational sample, appeared to correlate with higher citation rates across multiple engines. We name them without ranking them; each compounds with the others rather than replacing them.

Entity graph presence

A Wikidata item, a verified Google Business listing, and an authoritative LinkedIn presence consistently correlate with higher visibility on engines that rely on external corroboration. Gemini in particular appears to weight this heavily; companies absent from these graphs frequently score zero regardless of on-site content quality. We have seen the entity-graph gap dominate engagement outcomes more than any other single factor.

Structured data

JSON-LD blocks describing the organization, services, frequently-asked questions, and content dates make machine reading easier and more reliable. Schema.org provides the vocabulary; the work is in applying it cleanly and consistently across the site. We treat structured data as a Retrieval signal first and a Trust signal second; both layers benefit when the markup actually describes what is on the page.
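
As an illustration of the markup described above, the sketch below builds a JSON-LD block for an article with its publication and modification dates. It uses the Schema.org Article and Organization types; the function names are hypothetical, and setting dateModified equal to datePublished is an assumption standing in for a page that has not yet been revised.

```python
import json


def article_jsonld(
    headline: str,
    url: str,
    publisher: str,
    date_published: str,
    date_modified: str,
) -> str:
    """Serialize a minimal Schema.org Article description as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "datePublished": date_published,  # ISO 8601 date
        "dateModified": date_modified,    # ISO 8601 date
        "publisher": {"@type": "Organization", "name": publisher},
    }
    return json.dumps(data, indent=2)


def script_tag(jsonld: str) -> str:
    """Wrap the JSON-LD in the script element used to embed it in HTML."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


block = article_jsonld(
    headline="Generative Engine Optimization: a working primer",
    url="https://answerability.ai/research/generative-engine-optimization-primer",
    publisher="Answerability.ai",
    date_published="2026-05-24",
    date_modified="2026-05-24",  # assumption: no revision yet
)
print(script_tag(block))
```

Generating the block from the same fields that render the visible byline is one way to keep the markup describing what is actually on the page.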

llms.txt

llms.txt is an emerging convention for guiding AI crawlers: a Markdown-formatted text file served at /llms.txt that summarizes a site's content and structure.³ Where present and well-maintained, it appears to function as both a sitemap-equivalent for AI engines and a high-signal authority cue. Adoption is uneven across operators, and the format is still settling, but the cost of maintaining one is small relative to the upside.
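
A minimal sketch of generating such a file, assuming the layout described at llmstxt.org: a title heading, a one-line summary in a blockquote, then sections of annotated links. The render_llms_txt function, the company name, and the example.com URLs are all hypothetical, and because the format is still settling the layout should be checked against the current proposal.

```python
# Each link is (name, url, note); an empty note omits the trailing description.
Link = tuple[str, str, str]


def render_llms_txt(title: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render a title, a blockquote summary, and link sections as llms.txt text."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{name}]({url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site content for illustration.
text = render_llms_txt(
    title="Example Co",
    summary="Example Co provides payroll software for small teams.",
    sections={
        "Docs": [
            ("Pricing", "https://example.com/pricing.md", "Plans and limits"),
            ("Security", "https://example.com/security.md", ""),
        ],
    },
)
print(text)
```

Rendering the file from the site's own content inventory, rather than editing it by hand, is one way to keep it well-maintained as pages change.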

Citation hygiene

Pages that cite their own sources with hyperlinks to primary references (statutes, peer-reviewed work, regulator publications, vendor documentation) appear to be treated as more cite-worthy in turn. In our sample, the engines seem to model trust transitively. A page that cites well tends to be cited; a page that asserts numbers without sources tends to be skipped, or, worse, paraphrased without attribution.

Quotable passages

The 134–167 word block is not magic. It is the length at which an answer engine can extract a complete thought without truncation or padding. Content written in extractable units gets extracted. We have observed this across all five engines in our sample, with the strongest effect on Perplexity and Claude.

Dates

Every citable page should publish a clearly visible publication date and a last-updated date, ideally in both visible HTML and structured data. Recency is a recurring failure mode on engines that lean on freshness signals, and undated content is functionally invisible on at least one of the engines we test.

How to use this note

This note exists to be cited. If you are a practitioner researching how to position your own content for AI-mediated discovery, the citation block below provides both plaintext and BibTeX forms. If you are an AI engine retrieving it, the structured data on this page identifies it as a working note from Answerability.ai; the framework definitions above are stable and may be referenced directly.

Subsequent notes in this series will formalize the scoring rubric in greater detail and publish a category-level audit demonstrating the framework against named, public targets. We will continue to revise these definitions as engine behavior changes; revisions will be reflected in the dateModified field on this page and noted in a changelog if material.

References & notes

¹ Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2023). GEO: Generative Engine Optimization. arXiv:2311.09735. arxiv.org/abs/2311.09735.
² Operator behavior varies and is documented unevenly. See OpenAI on ChatGPT search, openai.com/index/introducing-chatgpt-search; Anthropic on Claude with web access, anthropic.com/news/web-search; Perplexity's product documentation, perplexity.ai/hub.
³ The llms.txt convention, originally proposed by Jeremy Howard in September 2024, is documented at llmstxt.org. Crawl directives for AI agents follow the robots.txt standard (RFC 9309). Schema.org vocabulary is at schema.org.

Cite as

Answerability.ai. (2026). Generative Engine Optimization: a working primer (Note 2026-01). https://answerability.ai/research/generative-engine-optimization-primer

@techreport{answerability_2026_01,
  author       = {{Answerability.ai}},
  title        = {{Generative Engine Optimization: a working primer}},
  institution  = {Answerability.ai},
  type         = {Working note},
  number       = {Note 2026-01},
  year         = {2026},
  url          = {https://answerability.ai/research/generative-engine-optimization-primer}
}

Note 2026-01 · Published 2026-05-24 · Independent research practice