Note 2026-01 · Primer · rev. 1.1

Generative Engine Optimization: a working primer.

A definition of the category, a formalization of the framework we use to score it, per-engine notes on observed behavior, and a list of operational signals that compound — written for practitioners who need a citable reference rather than a marketing summary.

§1 What GEO is, and what it is not

Generative engine optimization is the operational practice of being cited inside generated AI answers, rather than appearing in a ranked list of links. It overlaps with technical SEO on what we call the Retrieval axis (crawl access, parseability, structural clarity) because no system can cite a page it cannot read. It diverges from SEO on the other two axes of the framework set out in §3, Content and Trust. Search engines treat a page as a unit of address; answer engines treat a page as a source of facts. Optimizing for address (rank one for a query) and optimizing to be the quoted source (be cited inside the answer to that query) are not the same activity, and the practices that produce one outcome will, in our observation, only partially produce the other.

The literature has used several labels for what we call GEO. "Answer engine optimization" (AEO) is the most common alternative, particularly in vendor marketing. Some commentators use "AI search optimization." We prefer "generative engine optimization" for two reasons. First, it names the engine class — generative — rather than the surface artifact (the answer). Second, it is consistent with the original academic usage of the term by Aggarwal and co-authors at Princeton in 2023, who proposed GEO as a category and ran the first controlled experiments on what content properties shifted visibility in generated answers.1 The terminological choice is more than pedantry. "Answer engine optimization" implies that the answer is the locus of the work; in practice, the locus of the work is the page that the engine reads before composing the answer.

GEO is also distinct from what is sometimes called "AI Overviews optimization" — work focused specifically on Google's AI Overviews surface. AI Overviews is a single product on a single engine and reflects choices Google has made about retrieval, composition, and source attribution. Optimizing for AI Overviews specifically is a worthwhile subset of GEO, but a strategy that produces visibility there will not automatically translate to ChatGPT, Claude, Perplexity, or Grok. The engines diverge meaningfully in what they retrieve, what they consider authoritative, and what they cite, and a single-engine strategy is, by definition, a partial one.

Finally, GEO is not the same as content marketing for AI. A common framing in marketing departments is "we should write more so AI engines have more to cite." This treats AI engines as audience and content as the variable. The framing is half right. Content matters, but so does whether the content is reachable, parseable, attributable to a known entity, extractable in quotable units, and corroborated externally. The Answerability framework, introduced below, is our attempt to make the other half of the work explicit.

§2 Why the ranking-list game stopped working

For two decades, the unit of competition in search was the ten-blue-links page. Either a page was on it or it was not. The buyer read the list, decided, clicked. Optimization meant influencing rank ordering through known and unknown signals: backlinks, page speed, on-page content, structured data, trust proxies. The user remained the arbiter of relevance.

Generative engines collapse this. The buyer types a question. The model retrieves a set of documents — from a pre-built index, from live web search, or both — reads them, decides which to trust, composes an answer, and, depending on the engine, names which sources informed it.2 The user never sees most of the candidate set. Many engines do not display URLs in the body of the answer at all; the citation, if it exists, is a footnote at the end or a small tile beside it. The arbiter of relevance has moved inside the model.

The effect on traffic is now substantial. Google has stated publicly that AI Overviews reach more than a billion users per month, and industry analyses have reported that AI Overviews now appear for a large and growing share of all queries.3 ChatGPT has reported hundreds of millions of weekly active users.4 Perplexity has reported hundreds of millions of monthly queries.5 Whatever the precise numbers — and the operators report different metrics on different cadences — the directional point is the same. A non-trivial share of the queries that used to land on a ranked list of links now resolve inside a model. The first impression a buyer forms about a category is increasingly something a model said, not something a vendor said.

This produces a different competitive game. A page that ranks third for a query and is read by the engine but not cited has lost in a way that is invisible to traditional rank-tracking tools. A page that ranks tenth but contains the most quote-friendly paragraph in the candidate set may be cited every time. Search position is correlated with retrieval — engines have to find a page before they can use it — but it does not predict citation. Industry analyses of AI Overviews citation behavior have consistently found that roughly nine in ten cited URLs come from pages that also rank in the top ten organically, but a meaningful share — close to half by some counts — come from pages ranked outside the top five.6 The selection logic is different, and the difference is operationally important.

Search position is correlated with retrieval. It does not predict citation.

Two further asymmetries deserve attention. The first is platform divergence. Recent industry research has reported that only about one in ten domains is cited by both ChatGPT and Google's AI Overviews for the same query.7 Engines do not share retrieval indexes, do not weight sources the same way, and do not surface the same kinds of authority. A page that appears in ChatGPT answers may be invisible to Perplexity, and vice versa. There is no single "AI search optimization" that produces uniform visibility across the five engines we observe.

The second asymmetry is between domain authority as traditionally measured and the signals that actually correlate with AI citation. A widely circulated 2025 study of large brand sets found that mentions on YouTube, Reddit, and Wikipedia correlated with AI visibility more strongly than backlink-derived domain authority did.8 The signal these engines appear to weight is closer to "is this entity present and discussed across reputable contexts" than "how many sites link to this URL." Backlink-driven SEO and entity-graph-driven GEO are not the same engineering problem.

The retrieval-and-citation literature has begun to characterize what is correlated with being chosen, though no major operator publishes ranking weights. The Princeton GEO paper observed visibility gains in test queries for content that included quotes, statistics, and citations.1 In our reading, those gains came not because the model rewarded the signals directly, but because such content was more extractable into a generated answer. Whatever the model's ranking weights look like internally, extractability appears to be a necessary precondition for citation. A page that cannot be extracted into a sentence will not be quoted, regardless of how authoritative it is.

§3 The Answerability framework: Content, Retrieval, Trust

Answerability is the composite — a company's ability to be retrieved, trusted, and used as the answer to a buyer's question. We measure it across three independent pillars: Content (do you have content that answers what buyers ask, in a form an engine can lift), Retrieval (can engines access and parse that content), and Trust (do engines treat the source as cite-worthy). A company's Answerability is constrained by its weakest pillar, not its average. We do not abbreviate the framework to an acronym; the pillars are always named in full.

We score every cited URL on the three pillars, and they are independent in two senses. Technically, an intervention on one does not predictably move the others — fixing crawl access does not improve entity-graph corroboration, and adding a Wikidata entry does not improve extractability. Operationally, a page can be the highest-quality candidate on two pillars and remain uncited because it fails the third. The framework's central claim is that visibility failures are independent failures, not a single quality continuum.

The three-pillar framing is not arbitrary. We considered four- and five-pillar variants. The four-pillar variants typically split Trust into "entity trust" and "claim trust," and the five-pillar variants further split Retrieval into crawl-time and parse-time concerns. Both expansions are defensible. We collapsed them because, in our engagement experience, practitioners produced better work with three crisply defined pillars than with five blurry ones. The cost is that the framework is coarser than the underlying behavior; the benefit is that work orders can be written against it.

C / 0–100 · Content

Do you have content that answers what buyers actually ask — in a form an engine can lift?

Content is the question-coverage and extractability layer. It asks two things in sequence: whether content exists that answers the questions buyers are actually putting to the engines, and whether that content is shaped so an engine can lift it as the quoted answer. It is the most undervalued layer in the existing SEO literature and frequently the highest-leverage place to start for companies whose Retrieval is acceptable and whose Trust has been built incidentally. A company can run a large, well-trafficked content library and still score low on Content — because the library answers the questions the company wanted to write about, not the questions buyers are asking, and because the answers are written as prose to be read rather than as passages to be extracted.

Sub-criteria we score

  • Coverage of the buyer-question universe — whether the questions surfaced by the engagement's buyer archetypes have a corresponding page, section, or passage at all. A question with no answer-bearing content cannot be cited, regardless of how strong the rest of the site is.
  • Self-contained passages of approximately 134–167 words that work as standalone quotes — long enough to convey a complete thought, short enough to be lifted into a generated answer without truncation or padding. The exact length is not magic, but content written in this range is reliably more extractable than content written in either 60-word fragments or 300-word paragraphs.
  • Direct answers in the first 40–60 words of a section. The opening of a passage carries disproportionate weight in extraction; an opening that defers the answer to later in the paragraph is frequently skipped.
  • Definitions in canonical "X is…" form. AI engines retrieve definitional content heavily and surface it disproportionately in answer composition. "Generative engine optimization is the operational practice of…" is more extractable than "We help our clients with what's sometimes called…"
  • Tables and lists for comparative or step-wise information. Engines extract tabular data reliably and frequently render it back as tables in answers. A comparison rendered in prose is less likely to be cited than the same comparison rendered as a table.
  • Question-format headings that mirror buyer queries. "What is X?" as an H2 routes the engine to the answer; "Our perspective on the field" does not.
  • Consistent terminology: using the same noun for the same concept throughout the page, rather than alternating synonyms for stylistic variety. Engines appear to treat terminological consistency as a signal that the page is "about" the named concept.
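Two of these sub-criteria, passage length and early placement of the answer, can be checked mechanically. The sketch below is a minimal Python illustration, not our scoring tool. It assumes passages are separated by blank lines and that the caller supplies hypothetical `answer_terms` (the phrases that constitute the direct answer); it reports each passage's word count against the 134–167 window and whether an answer term appears within the first 60 words.

```python
# Heuristic thresholds from the Content sub-criteria; not hard rules.
PASSAGE_MIN_WORDS = 134
PASSAGE_MAX_WORDS = 167
ANSWER_WINDOW_WORDS = 60  # upper end of the "first 40-60 words" guideline


def audit_passages(text: str, answer_terms: list[str]) -> list[dict]:
    """Report word count and early-answer placement for each blank-line-separated passage.

    `answer_terms` is a hypothetical input: phrases the caller treats as the
    direct answer (for example, the defined term of an "X is..." sentence).
    """
    report = []
    passages = [p.strip() for p in text.split("\n\n") if p.strip()]
    for index, passage in enumerate(passages):
        words = passage.split()
        opening = " ".join(words[:ANSWER_WINDOW_WORDS]).lower()
        report.append({
            "index": index,
            "words": len(words),
            "in_range": PASSAGE_MIN_WORDS <= len(words) <= PASSAGE_MAX_WORDS,
            "answer_early": any(term.lower() in opening for term in answer_terms),
        })
    return report


def is_question_heading(heading: str) -> bool:
    """Crude test for question-format headings: the heading ends with '?'."""
    return heading.strip().endswith("?")
```

Word counts are a proxy only; a passage can sit inside the window and still fail to be self-contained.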

Common failure modes

A content library that answers the company's preferred questions rather than the buyer's actual ones: the most common and least visible Content failure. Long, narratively structured prose with the load-bearing claim buried in paragraph three. Heavily designed landing pages where the visible copy is short headlines and the substantive content lives in image tooltips, sidebar callouts, or scroll-triggered animations, all of which are invisible to extraction. Inconsistent terminology where the page calls the same thing three different names in three different sections. Question-format headings that the body never actually answers.

The two-part test

Content failures split cleanly. Either the answer does not exist — no page addresses the buyer's question — or the answer exists but is not answer-shaped: present on the site, but written so the engine cannot extract it. The first is a coverage problem solved by writing; the second is a shape problem solved by rewriting.

R / 0–100 · Retrieval

Can AI systems access, crawl, parse, and structurally understand your content?

Retrieval is the engineering layer. It is the easiest of the three axes to fix and the most common axis on which companies that think they are "doing AI search optimization" are silently failing. A page that cannot be retrieved cannot be cited. A page that can be retrieved but cannot be parsed reliably will be cited inconsistently.

Sub-criteria we score

  • HTTP-level accessibility from each engine's crawler IP range, with no firewall or rate-limit rules blocking the user agents listed in operator documentation.
  • robots.txt that explicitly addresses the relevant AI crawlers (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended, GrokBot) with allow rules rather than relying on wildcard defaults.
  • HTML structure that survives the parsers used to ingest content — semantic headings, paragraph-level prose rather than nested div soup, accessible source on view-source rather than only after JavaScript execution. Most current AI crawlers do not execute JavaScript, and content that requires execution to render is effectively invisible.
  • Presence and correctness of /llms.txt where applicable, structured per the emerging convention.9
  • Sitemap hygiene — sitemap.xml with current lastmod values, no orphan pages, no excluded canonicals.
  • Structured data that describes the page in terms the engine can use: schema.org JSON-LD blocks for organization, services, articles, authors, breadcrumbs, dates.
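The robots.txt criterion can be checked offline with Python's standard `urllib.robotparser`. This is a sketch under stated assumptions: the agent tokens are the ones named in the list above, and `EXAMPLE_ROBOTS` is a hypothetical file. It reports which crawlers a given file would block for a URL, and which crawlers have no `User-agent` group of their own and therefore fall through to wildcard defaults.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens as listed in the Retrieval sub-criteria above.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot", "Google-Extended", "GrokBot")


def blocked_crawlers(robots_txt: str, url: str, agents: tuple[str, ...] = AI_CRAWLERS) -> list[str]:
    """Return the agents this robots.txt would stop from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


def implicitly_covered(robots_txt: str, agents: tuple[str, ...] = AI_CRAWLERS) -> list[str]:
    """Return the agents with no User-agent group of their own (wildcard defaults only)."""
    named = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        key, _, value = line.partition(":")
        if key.strip().lower() == "user-agent":
            named.add(value.strip().lower())
    return [agent for agent in agents if agent.lower() not in named]


# Hypothetical robots.txt used for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: GPTBot
User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""
```

One caveat: Python's parser applies the first matching rule within a group, while RFC 9309 specifies the longest match, so results can differ on files that mix overlapping `Allow` and `Disallow` paths.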

Common failure modes

JavaScript-rendered content with no server-side fallback. Aggressive bot-blocking at the CDN layer that drops the AI crawlers along with the scrapers. Schema.org markup that contradicts the visible page (claiming an organization name that does not match the H1; claiming dates that do not match the visible timestamps). Sitemaps that list URLs that 404 or redirect. noindex meta tags on pages that the company intends to be cited, frequently inherited from staging-environment templates that were never cleaned up.
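Two of these failure modes, inherited `noindex` tags and JavaScript-only rendering, are visible in the HTML the server returns before any script runs. The sketch below uses Python's standard `html.parser` on that raw HTML. The 500-character threshold for "likely JavaScript-only" is a hypothetical cut-off, and the sketch does not inspect the `X-Robots-Tag` HTTP header, which can also carry `noindex`.

```python
from html.parser import HTMLParser


class RawHTMLAudit(HTMLParser):
    """Collects robots meta directives and readable text from server-delivered HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False
        self.text_chars = 0
        self._hidden_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            if "noindex" in (attributes.get("content") or "").lower():
                self.noindex = True
        if tag in ("script", "style"):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.text_chars += len(data.strip())


def audit_raw_html(html: str, min_text_chars: int = 500) -> dict:
    """Flag noindex and pages whose raw HTML carries almost no readable text."""
    parser = RawHTMLAudit()
    parser.feed(html)
    parser.close()
    return {
        "noindex": parser.noindex,
        "text_chars": parser.text_chars,
        "likely_js_only": parser.text_chars < min_text_chars,
    }
```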

Failure mode at scale

Retrieval is necessary but not sufficient. A page can pass every Retrieval check and remain uncited because Trust or Content fails. We have seen perfectly engineered sites — clean schema, server-rendered, current sitemap, well-formed llms.txt — score zero on Gemini because the underlying business is absent from Wikidata and the Knowledge Graph.

T / 0–100 · Trust

When the engine has multiple candidate pages for a claim, which does it treat as cite-worthy?

Trust is the entity-and-corroboration layer. It is where most companies lose silently. The page is well-written. The page is well-structured. The engine reads it. The engine cites someone else.

Sub-criteria we score

  • Entity presence in the canonical knowledge graphs — Wikidata, Google's Knowledge Graph, verified business listings. Engines that lean on external corroboration — Gemini in particular — appear to treat Wikidata presence as a near-gate.
  • Co-occurrence across reputable sources: trade press coverage, conference materials, academic citations, regulatory filings where applicable. The engines appear to model trust transitively — an entity discussed across multiple authoritative contexts is treated as more cite-worthy than an entity discussed only on its own domain.
  • Author and publisher attribution that the engine can resolve to a known entity. This means consistent organization schema across pages, named author bylines where appropriate, and sameAs relationships that link the on-domain entity to its off-domain corroborators.
  • Dates and updates that signal currency. Stale content is treated as less trustworthy on engines that weight freshness; entirely undated content is treated as worse than dated stale content.
  • Internal evidence of methodology: citations to primary sources within the page itself. A page that links to statutes, regulators, peer-reviewed work, or vendor documentation appears to be treated as more cite-worthy than one that asserts the same claims without sources. Trust appears to flow transitively in this direction too: a page that cites well tends to be cited; a page that asserts numbers without sources tends to be skipped or, worse, paraphrased without attribution.
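The attribution sub-criterion (consistent organization schema, resolvable bylines, `sameAs` links to off-domain corroborators) can be expressed as schema.org JSON-LD. The sketch below builds one `@graph` block with an `Organization` and a `Person` linked by `@id`. All names and URLs are hypothetical placeholders, including the Wikidata and ORCID identifiers, which must be replaced with the entity's real ones.

```python
import json

JSONLD_OPEN = '<script type="application/ld+json">'


def organization_node(name: str, url: str, same_as: list[str]) -> dict:
    """schema.org Organization with sameAs links to off-domain corroborators."""
    return {"@type": "Organization", "@id": f"{url}#organization", "name": name, "url": url, "sameAs": same_as}


def person_node(name: str, profile_url: str, same_as: list[str], org_id: str) -> dict:
    """schema.org Person for a byline, tied to the organization by @id."""
    return {"@type": "Person", "name": name, "url": profile_url, "sameAs": same_as, "worksFor": {"@id": org_id}}


def jsonld_script(*nodes: dict) -> str:
    """Wrap nodes in a single @graph block for embedding in the page."""
    payload = {"@context": "https://schema.org", "@graph": list(nodes)}
    return JSONLD_OPEN + json.dumps(payload, indent=2) + "</script>"


# Hypothetical entity; replace placeholders with real identifiers.
org = organization_node(
    "Example Co",
    "https://example.com/",
    ["https://www.wikidata.org/wiki/Q_PLACEHOLDER", "https://www.linkedin.com/company/example-co"],
)
author = person_node("Jane Doe", "https://example.com/team/jane-doe", ["https://orcid.org/0000-0000-0000-0000"], org["@id"])
print(jsonld_script(org, author))
```

Emitting the same `Organization` node on every page keeps the entity consistent across the site, which is the point of the sub-criterion.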

Common failure modes

No Wikidata item for the organization. Google Business listing unverified or absent. Author pages that exist but contain no sameAs links to LinkedIn, Google Scholar, or ORCID, leaving the engine unable to resolve the byline to a known entity. No external coverage on reputable platforms, leaving the engine with no corroboration for any claim made on the page. Dates that exist in schema but not in visible HTML, which engines that scrape rendered output rather than parsing JSON-LD will miss. Page-level claims with no inline citations.

Failure mode at scale

Trust is the slowest of the three pillars to remediate. Retrieval can be fixed in days; Content in weeks. Trust requires building external presence across systems that are not under the company's control — Wikipedia, Wikidata, trade press, conference proceedings. Engagements that underestimate Trust timelines consistently underperform.

How the pillars interact

The three pillars interact, but they fail independently. A page that scores ninety on Retrieval, ninety on Content, and twenty on Trust will not be cited at meaningful rates: the engine reads it, the engine could quote it, but the engine has no reason to treat it as authoritative for the claim. Conversely, a page that scores ninety on Trust and Content but twenty on Retrieval will be invisible to the engine entirely. And a page that scores ninety on Retrieval and Trust but twenty on Content will be cited only for the queries where no more extractable alternative exists, which are rarely the queries that matter commercially. The 0–100 scaling per pillar is intentional: it forces measurement rather than opinion, and it surfaces the asymmetric failure pattern (a company's Answerability held down by one weak pillar) that opinion-based scoring tends to hide. This is why we report the three pillar scores alongside the composite Answerability score rather than the composite alone: a single number hides which pillar is the bottleneck.
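The weakest-pillar constraint can be made concrete with a small roll-up. This note does not specify the exact composite formula, so the sketch below uses the minimum pillar score as a hypothetical stand-in for "constrained by the weakest pillar" and reports the mean only for contrast. It always returns the per-pillar scores and names the bottleneck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PillarScores:
    content: float
    retrieval: float
    trust: float


def answerability_report(scores: PillarScores) -> dict:
    """Roll three 0-100 pillar scores up without hiding the bottleneck.

    The composite is the minimum pillar, a hypothetical stand-in for the
    weakest-pillar constraint; the mean is included only to show what it hides.
    """
    pillars = {"Content": scores.content, "Retrieval": scores.retrieval, "Trust": scores.trust}
    for name, value in pillars.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score {value} is outside 0-100")
    bottleneck = min(pillars, key=pillars.get)
    return {
        "pillars": pillars,
        "composite": pillars[bottleneck],
        "bottleneck": bottleneck,
        "mean": sum(pillars.values()) / len(pillars),
    }
```

For the ninety/ninety/twenty page described above, the mean reads as a respectable 66.7, while the minimum and the named bottleneck point directly at Trust.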

Each pillar is treated in depth in its own note: Content, Retrieval, and Trust. Those notes extend the definitions here with sub-criteria, failure catalogues, and worked examples drawn from real cross-engine captures.

§4 What we measure

Our standing protocol is a sixty-prompt audit set, run against five engines within a 21-day capture window. The five engines are ChatGPT, Claude, Gemini, Perplexity, and Grok. The prompt set is constructed from the engagement's buyer archetypes — typically four to six, more for broad consumer audiences — with prompts-per-archetype set so the standing set totals roughly sixty, across awareness, comparison, risk, pricing, fit, and post-purchase stages of the buyer journey.

Five engines times sixty prompts yields three hundred captured answers. For each, we record: which URLs the engine cited, which competitors were named, whether the engagement sponsor was cited or absent, the answer text in full, and any model-disclosed reasoning where the engine provides it. The three hundred answers are then scored across the three pillars of the Answerability framework — Content, Retrieval, Trust — with per-pillar scoring at the URL level, per-engine aggregation, and a composite Answerability score that rolls the pillars up while preserving the per-pillar detail. The per-engine aggregation matters because the engines diverge meaningfully in what they cite for the same prompt, and a single per-domain score would smooth over the divergence that is operationally most important.
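The capture record and the per-engine aggregation can be sketched as plain data structures. The field names below are hypothetical, chosen to mirror the recorded items listed above. The example also computes the pooled cross-engine rate to show what a single per-domain number smooths over.

```python
from collections import defaultdict
from dataclasses import dataclass, field

ENGINES = ("ChatGPT", "Claude", "Gemini", "Perplexity", "Grok")


@dataclass
class Capture:
    """One captured answer: a single prompt run against a single engine."""
    engine: str
    prompt_id: int
    answer_text: str
    cited_urls: list[str] = field(default_factory=list)
    competitors_named: list[str] = field(default_factory=list)
    sponsor_cited: bool = False
    disclosed_reasoning: str = ""  # empty when the engine discloses none


def sponsor_rate_by_engine(captures: list[Capture]) -> dict[str, float]:
    """Share of captures per engine in which the sponsor was cited."""
    totals: dict[str, int] = defaultdict(int)
    cited: dict[str, int] = defaultdict(int)
    for capture in captures:
        totals[capture.engine] += 1
        cited[capture.engine] += int(capture.sponsor_cited)
    return {engine: cited[engine] / totals[engine] for engine in totals}


def pooled_rate(captures: list[Capture]) -> float:
    """The single cross-engine rate that per-engine aggregation avoids relying on."""
    return sum(c.sponsor_cited for c in captures) / len(captures)


# Hypothetical run: the sponsor is cited only by Perplexity, on half its prompts.
captures = [
    Capture(engine, prompt, "", sponsor_cited=(engine == "Perplexity" and prompt % 2 == 0))
    for engine in ENGINES
    for prompt in range(60)
]
```

Here the pooled rate is 0.1, which describes no engine in the set: four engines cite the sponsor never and one cites it half the time.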

A day-90 re-audit is included to measure movement: the same sixty prompts are re-run against the updated site at the ninety-day mark, and the delta is reported per pillar and per engine. The same prompt set rather than a new one is critical. Re-running a fresh prompt set against a moving target measures the noise floor of prompt selection, not the impact of the work.
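The day-90 delta is a per-engine, per-pillar subtraction, guarded so that it refuses to run when the prompt set has changed. The nested-dictionary layout is a hypothetical choice for illustration.

```python
Scores = dict[str, dict[str, float]]  # engine -> pillar -> 0-100 score


def day90_delta(baseline: Scores, reaudit: Scores, baseline_prompts: list[str], reaudit_prompts: list[str]) -> Scores:
    """Per-engine, per-pillar movement between the baseline audit and the day-90 re-audit."""
    if sorted(baseline_prompts) != sorted(reaudit_prompts):
        # A new prompt set measures prompt-selection noise, not the impact of the work.
        raise ValueError("the re-audit must re-run the baseline prompt set unchanged")
    return {
        engine: {pillar: reaudit[engine][pillar] - score for pillar, score in pillars.items()}
        for engine, pillars in baseline.items()
    }
```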

We deliberately keep the prompt count fixed across engagements. Six hundred prompts would nominally add statistical power, but in practice they would lower the signal-to-noise ratio of the re-audit: engines vary their behavior across runs, prompt phrasing produces non-trivial outcome variance, and a large long-tail prompt set re-run against a moving target produces deltas that read as noise rather than learning. Sixty is the count at which, in our experience, there is enough surface to characterize engine behavior across a category without drowning the re-audit signal.

The buyer archetypes themselves are the prompt set's load-bearing input. Generic prompts ("what's the best CRM") produce generic answers and tell a sponsor nothing they did not already know. Archetype-specific prompts ("which CRM works for a series-B fintech with a US-and-EU sales motion") produce answers that distinguish the sponsor from the competitive set, surface the sponsor's actual positioning against actual buyer language, and reveal the citation gap that matters commercially. Archetype construction draws on the sponsor's stated ICP, top-of-funnel CRM patterns, sales call transcripts where available, and language pulled from adjacent buyer communities — Reddit threads, vertical forums, industry trade press. No keyword tools are involved. The prompts are the questions actual buyers are typing into the engines.
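The sizing rule in §4 (four to six archetypes, a standing set of roughly sixty, spread across six journey stages) can be sketched as a budget allocation. This is a hypothetical helper, not part of our protocol. It fixes only how many prompts each archetype-stage cell receives; the prompts themselves still come from the archetype research described above.

```python
STAGES = ("awareness", "comparison", "risk", "pricing", "fit", "post-purchase")


def allocate_prompts(archetypes: list[str], total: int = 60) -> dict[tuple[str, str], int]:
    """Spread a fixed prompt budget across (archetype, stage) cells as evenly as possible.

    Cells are ordered stage-first, so any remainder is spread round-robin across
    archetypes rather than piling onto the first archetype listed.
    """
    cells = [(archetype, stage) for stage in STAGES for archetype in archetypes]
    base, extra = divmod(total, len(cells))
    return {cell: base + (1 if i < extra else 0) for i, cell in enumerate(cells)}


def prompts_per_archetype(plan: dict[tuple[str, str], int]) -> dict[str, int]:
    """Sum a plan back up to per-archetype totals."""
    totals: dict[str, int] = {}
    for (archetype, _stage), count in plan.items():
        totals[archetype] = totals.get(archetype, 0) + count
    return totals
```

With four archetypes, each receives fifteen prompts; with five, twelve (two per stage); with six, ten.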

§5 Engine notes

The five engines diverge in what they index, what they treat as authoritative, and what they cite. The notes below describe behavior we have observed during 2025–2026 engagements and that has been documented in operator publications and third-party research. Engine behavior changes, sometimes materially, on short timescales; what follows should be read as observations from the current window, not as universal claims.

Engine | Operator | Appears to favor | Typical failure mode for absent sponsors
ChatGPT | OpenAI | Structured author entities, dated content, deep-linked sub-pages | Weak author entity; no Person schema or sameAs.
Claude | Anthropic | Methodology depth, quote-safe paragraphs, hedged claims | Thin "how we work" documentation; few extractable chunks.
Gemini | Google | External entity corroboration (Knowledge Graph, Wikidata, news) | No Wikidata item; weak entity graph; Google Business unverified.
Perplexity | Perplexity AI | Primary-source citations (statutes, regulators, peer-reviewed work) | Unsourced numerical claims; no inline hyperlinks to primary references.
Grok | xAI | Recency, social and trade-press surface, active publishing cadence | No recent published mentions; cornerstone content many months old.

ChatGPT (OpenAI)

ChatGPT's web search capability draws on a maintained index and live retrieval, with citations rendered as small source tiles or footnote-style references at the end of answers.10 In our observation, ChatGPT cites Wikipedia heavily, particularly for definitional and historical queries; third-party research has reported Wikipedia accounting for close to half of ChatGPT's cited domains.11 Beyond Wikipedia, it favors pages with clear author bylines, structured organization schema, and dated content. ChatGPT appears willing to cite deep sub-pages where the answer-relevant material lives, rather than only homepages — a behavior that rewards companies who structure their service or methodology pages as full citable artifacts rather than as marketing collateral.

Claude (Anthropic)

Claude's web access surfaces sources differently from ChatGPT — typically as named source pills inline with the answer text.12 Claude appears, in our observation, to favor methodology depth and quote-safe paragraphs. Pages that explain how they arrive at claims, hedge appropriately, and contain extractable explanatory chunks are cited at higher rates than pages with equivalent claim density but less explanatory scaffolding. Claude is also notably more willing than other engines to cite long-form analyses and to surface less-trafficked but methodology-rich sources. For practitioners, this is one of the more rewarding engines to optimize for because the work overlaps closely with writing that is also useful to human readers.

Gemini (Google)

Gemini is structurally connected to Google Search and inherits Google's entity-graph dependencies. In our observation, this produces the sharpest binary in the five-engine set: companies with strong Wikidata, Knowledge Graph, and Google Business presence score well; companies absent from those systems score zero or near-zero regardless of on-site content quality. The implication is operational: for Gemini visibility, the work begins outside the site. A Wikidata entry, a verified Google Business listing, and authoritative external corroboration are the foundation; on-site work is a multiplier on a foundation that has to exist first. Gemini also powers Google's AI Overviews on the search results page, which is governed by partially overlapping but not identical retrieval logic.13

Perplexity

Perplexity surfaces citations more prominently than the other engines — each claim is typically footnoted to a numbered source list rendered alongside or beneath the answer.14 Perplexity's index includes a large share of Reddit content, and third-party research has reported Reddit accounting for a substantial share of Perplexity's cited domains.11 Beyond Reddit, Perplexity rewards pages that cite primary sources well — statutes, regulators, peer-reviewed work, vendor documentation. The transitive-trust logic is most visible here: pages that link to authoritative primary sources are cited at higher rates than pages making equivalent claims without inline references. Numerical claims without sources are particularly disfavored.

Grok (xAI)

Grok draws heavily on real-time social data, particularly content surfaced on X (formerly Twitter), and weights recency more aggressively than the other engines. In our observation, companies with active publishing cadences, recent trade-press coverage, and visible founder or executive presence on social platforms tend to score well on Grok queries; companies whose cornerstone content is older than six to nine months tend to be underweighted. For practitioners, this is the engine most rewarded by visible activity — talks, publications, social posts on the company's positioning — rather than by purely on-site work. It is also the engine where our observation is least settled, both because the operator's documentation is the thinnest and because the surface has changed materially during the 2025–2026 window.

§6 Signals that compound

The following signals have, across our 2025–2026 observational sample, appeared to correlate with higher citation rates across multiple engines. We name them without ranking them; each compounds with the others rather than replacing them. The signals are described as we score them in engagements, not as we recommend pursuing them; the implementation sequence is in §7.

Entity graph presence

A Wikidata item, a verified Google Business listing, and an authoritative LinkedIn presence consistently correlate with higher visibility on engines that rely on external corroboration. Gemini in particular weights this heavily; companies absent from these graphs frequently score zero regardless of on-site content quality. The entity-graph gap appears, in our engagement experience, to dominate engagement outcomes more than any other single factor. Companies that close it report visibility improvements at the re-audit even when no other work was done; companies that ignore it underperform regardless of how clean the on-site work is. The likely reason is structural: the upstream corroboration logic of most engines appears to depend on these systems, and on-site work alone cannot substitute for the signal.

Structured data

JSON-LD blocks describing the organization, services, frequently asked questions, articles, authors, and content dates make machine reading easier and more reliable. Schema.org provides the vocabulary; the work is in applying it cleanly and consistently across the site. We treat structured data as a Retrieval signal first and a Trust signal second; both layers benefit when the markup actually describes what is on the page. Common implementation mistakes: schema that contradicts visible content, schema that names entities the page does not actually discuss, schema-only dates without corresponding visible dates, and schema that is technically valid but semantically meaningless ("Service" with no serviceType or provider). Valid markup that does not say anything is still useless.
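Several of these mistakes can be caught by comparing each JSON-LD node against the page's visible text. The sketch below is a minimal lint, assuming the caller has already extracted the visible text and parsed the JSON-LD into dictionaries. Its date matching recognizes only three renderings (ISO, "January 12, 2026", "12 January 2026"), so treat a date warning as a prompt to look, not a verdict.

```python
from datetime import date


def visible_date_forms(iso_value: str) -> set[str]:
    """A few common renderings of an ISO 8601 date; real pages use many more."""
    d = date.fromisoformat(iso_value[:10])
    month = d.strftime("%B")  # English month name under Python's default C locale
    return {d.isoformat(), f"{month} {d.day}, {d.year}", f"{d.day} {month} {d.year}"}


def lint_jsonld(node: dict, visible_text: str) -> list[str]:
    """Flag the markup mistakes listed above for one JSON-LD node."""
    problems = []
    text = visible_text.lower()
    if node.get("@type") == "Service":
        for prop in ("serviceType", "provider"):
            if not node.get(prop):
                problems.append(f"Service is missing {prop}")
    name = node.get("name")
    if name and name.lower() not in text:
        problems.append(f"name {name!r} does not appear in visible content")
    for prop in ("datePublished", "dateModified"):
        value = node.get(prop)
        if value and not any(form.lower() in text for form in visible_date_forms(value)):
            problems.append(f"{prop} {value!r} has no matching visible date")
    return problems
```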

llms.txt

The emerging convention for guiding AI crawlers — a plaintext file at /llms.txt summarizing a site's content and structure.9 Where present and well-maintained, it appears to function as both a sitemap-equivalent for AI engines and a high-signal authority cue. Adoption is uneven across operators, and the format is still settling, but the cost of maintaining one is small relative to the upside. A good llms.txt: leads with a one-sentence description of what the site is, names the key pages and what they contain, summarizes the underlying methodology or framework in prose, and is kept current as content changes. A bad llms.txt: lists URLs with no context, restates marketing taglines, or has not been updated since deployment.
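The "good llms.txt" checklist can be turned into a small generator. The sketch follows the layout the llms.txt proposal describes (an H1 with the site name, a blockquote summary, free-form prose, then H2 sections of Markdown links with short descriptions). The "Last updated" line is a hypothetical addition of ours to make currency visible, not part of the proposal, and all names and URLs in the usage example are placeholders.

```python
from dataclasses import dataclass


@dataclass
class KeyPage:
    title: str
    url: str
    description: str


def render_llms_txt(site_name: str, one_sentence: str, methodology: str, pages: list[KeyPage], updated: str) -> str:
    """Render an llms.txt: H1 name, blockquote summary, prose, then a described link list."""
    lines = [f"# {site_name}", "", f"> {one_sentence}", "", methodology, "", f"Last updated: {updated}", "", "## Key pages", ""]
    for page in pages:
        if not page.description.strip():
            # A bare URL list is one of the "bad llms.txt" patterns above.
            raise ValueError(f"{page.url}: describe what the page contains, not just its URL")
        lines.append(f"- [{page.title}]({page.url}): {page.description}")
    return "\n".join(lines) + "\n"


print(render_llms_txt(
    "Example Co",
    "Example Co publishes research on generative engine optimization.",
    "We score pages on three pillars: Content, Retrieval, and Trust.",
    [KeyPage("Retrieval note", "https://example.com/notes/retrieval", "Crawl access, parsing, and structured data.")],
    "2026-01-15",
))
```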

Citation hygiene

Pages that cite their own sources with hyperlinks to primary references — statutes, peer-reviewed work, regulator publications, vendor documentation — appear to be treated as more cite-worthy in turn. The engines model trust transitively. A page that cites well tends to be cited; a page that asserts numbers without sources tends to be skipped, or, worse, paraphrased without attribution. The mechanism is consistent with how human readers evaluate source quality: a paragraph that says "studies have shown" is weaker than one that names the study, gives the year, and links to the URL. Engines appear to model this similarly. Practitioners frequently underestimate how much of GEO is just being a careful citer in your own writing.
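A first-pass citation-hygiene check is to find paragraphs that state a number but contain no hyperlink. The sketch below is deliberately crude: its pattern catches percentages, comma-grouped figures, and "N in M" ratios, ignores bare years, and treats any `<a href>` in the same paragraph as a source, whether or not it points to a primary reference.

```python
import re
from html.parser import HTMLParser

# Percentages, comma-grouped figures (1,000,000), and "N in M" ratios; bare years are ignored.
NUMERIC_CLAIM = re.compile(r"\b\d+(?:\.\d+)?\s*(?:%|percent\b)|\b\d{1,3}(?:,\d{3})+\b|\b\d+\s+in\s+\d+\b")


class ParagraphLinks(HTMLParser):
    """Collects the text of each <p> and whether it contains a hyperlink."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[list] = []  # [text, has_link] per paragraph
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append(["", False])
        elif tag == "a" and self._in_p and dict(attrs).get("href"):
            self.paragraphs[-1][1] = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1][0] += data


def unsourced_numeric_claims(html: str) -> list[str]:
    """Paragraphs that contain a numeric claim and no hyperlink at all."""
    parser = ParagraphLinks()
    parser.feed(html)
    parser.close()
    return [text.strip() for text, linked in parser.paragraphs if NUMERIC_CLAIM.search(text) and not linked]
```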

Quotable passages

The 134–167-word block is not magic. It is, in our observation, the range in which an answer engine can extract a complete thought without truncation or padding. Content written in extractable units gets extracted; content written in dense paragraphs of three hundred words gets paraphrased or skipped; content written in fifty-word fragments gets concatenated with adjacent material and frequently loses attribution in the process. We have observed the effect across all five engines in our sample, with the strongest effect on Perplexity and Claude. The corollary is that the unit of content production for GEO is the section, not the page, and within the section, the unit is the self-contained passage.

Dates and freshness

Every citable page should publish a clearly visible publication date and a last-updated date, ideally in both visible HTML and structured data. Recency is a recurring failure mode on engines that lean on freshness signals, and undated content is functionally invisible on at least one of the engines we test. The hierarchy we have observed: dated and recent (best) → dated and stale (acceptable for evergreen content if the staleness is bounded) → undated (worst). Undated content underperforms dated stale content because the engine cannot reason about whether the claim is current, and engines that cannot reason about currency tend to default to skipping.
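The hierarchy above can be encoded as a three-way classification. Two choices in the sketch are assumptions rather than measured thresholds: the 270-day staleness cut-off (roughly the nine-month mark mentioned in the Grok notes) and treating a date that exists only in structured data as undated, on the view that engines reading rendered text will not see it.

```python
from datetime import date
from enum import Enum
from typing import Optional


class Freshness(Enum):
    RECENT = "dated and recent"
    STALE = "dated and stale"
    UNDATED = "undated"


def classify_freshness(visible: Optional[date], structured: Optional[date], today: date,
                       stale_after_days: int = 270) -> tuple[Freshness, list[str]]:
    """Place a page in the dated-recent / dated-stale / undated hierarchy."""
    warnings = []
    if visible is None and structured is not None:
        warnings.append("date only in structured data; engines reading rendered text will miss it")
    if visible is not None and structured is not None and visible != structured:
        warnings.append("visible date and structured-data date disagree")
    if visible is None:
        return Freshness.UNDATED, warnings
    age_days = (today - visible).days
    return (Freshness.RECENT if age_days <= stale_after_days else Freshness.STALE), warnings
```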

Internal linking and concept density

Pages that link to other pages on the same domain with descriptive anchor text, organized around stable concept hubs, appear to be retrieved more reliably than pages that exist in isolation. The mechanism is partly traditional crawl logic (the engine finds the page) and partly Trust signaling (the engine sees that the page belongs to a body of work on the same topic). The implication is that GEO benefits more from concept hubs, meaning small clusters of pages on a coherent topic mutually linked with descriptive anchors, than from one-off cornerstone pages. We treat this as part of Retrieval and part of Trust; like structured data, it operates across both axes.
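Hub structure can be checked mechanically once internal links have been extracted. The sketch below assumes a crawl has already produced, for each page, a list of (target path, anchor text) pairs. It reports hub pages that do not link to every other page in their hub, and hub links that use generic anchors. The generic-anchor list, the paths, and the function name `hub_gaps` are hypothetical.

```python
# Anchor texts treated as non-descriptive; this list is our assumption.
GENERIC_ANCHORS = {"click here", "here", "learn more", "read more", "this page"}

Link = tuple[str, str]  # (target path, anchor text)


def hub_gaps(hub: set[str], links: dict[str, list[Link]]) -> list[str]:
    """Report missing intra-hub links and generic anchors within one hub."""
    problems = []
    for page in sorted(hub):
        outgoing = links.get(page, [])
        targets = {target for target, _ in outgoing}
        # Every other hub page should be reachable directly from this one.
        for other in sorted(hub - {page} - targets):
            problems.append(f"{page} does not link to {other}")
        for target, anchor in outgoing:
            if target in hub and anchor.strip().lower() in GENERIC_ANCHORS:
                problems.append(f"{page} -> {target} uses generic anchor '{anchor}'")
    return problems
```

Requiring full mutual linking is the strictest reading of "mutually linked"; for a cornerstone page with three or four supporting pages, that is still only twelve to twenty directed links.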

§7 Implementation sequence

This section answers the question we are asked most often: "if we are starting from zero, what should we do first?" The sequence below is the one we recommend when an engagement begins with a low score across all three axes. It is not the only valid sequence — circumstances vary — but it is the one that has, in our experience, produced the most reliable movement at the day-90 re-audit.

Phase 1 (days 0–14): Retrieval baseline

Begin with the Retrieval axis because it gates everything else: if the engines cannot reach the page, no other work matters. The Phase 1 work-order list typically includes explicit allow rules for the major AI crawlers in robots.txt; a current and accurate sitemap.xml; baseline organization, service, and breadcrumb schema; a first-pass llms.txt; server-side rendering of any content currently delivered only through client-side JavaScript; and removal of noindex tags inherited from earlier templates. None of this work is glamorous. All of it is verifiable in a single re-crawl.
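The robots.txt item can be verified from outside with Python's standard-library `urllib.robotparser`. The crawler tokens below (GPTBot and OAI-SearchBot for OpenAI, ClaudeBot for Anthropic, PerplexityBot for Perplexity) are our assumed starting list and should be checked against each operator's current documentation; the function name `blocked_crawlers` is hypothetical. `robotparser` does not implement every RFC 9309 matching rule (longest-match precedence, for example), so treat a clean result as a smoke test rather than proof.

```python
from urllib.robotparser import RobotFileParser

# Assumed starting list of AI crawler user-agent tokens; verify each against
# the operator's current documentation before relying on it.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]


def blocked_crawlers(robots_txt: str, url: str) -> list[str]:
    """Return the crawler tokens that this robots.txt would block from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # A common inherited pattern: a blanket Disallow with one explicit exception.
    robots = "User-agent: *\nDisallow: /\n\nUser-agent: GPTBot\nAllow: /\n"
    print(blocked_crawlers(robots, "https://example.com/notes/geo"))
```

With the sample file, only GPTBot is allowed; the other three tokens fall through to the blanket `Disallow: /`, which is exactly the kind of gap an explicit allow rule closes.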

Phase 2 (days 14–45): Trust foundations

Trust is the slowest axis to remediate and is therefore started second, not last. During this phase, begin Wikidata entry research and submission; verify or create a Google Business Profile listing; identify the two or three most plausible external corroborators (trade press, conference proceedings, podcast appearances) and begin outreach; ensure that author attribution on long-form pages either resolves to a known entity or is explicitly institutional; and add inline citations to primary sources on any page that makes numerical claims. Phase 2 overlaps chronologically with Phase 3 because Trust signals take time to register externally.
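The author-attribution rule can be enforced where structured data is generated. In this sketch, a schema.org `Person` author must carry at least one `sameAs` URL (such as a Wikidata item) so the name can resolve to a known entity, while an `Organization` author may stand alone as institutional attribution. The function name `author_node`, the names, and the placeholder Wikidata URL are hypothetical; the rule itself is our reading of the paragraph above.

```python
import json
from typing import Optional


def author_node(name: str, same_as: Optional[list[str]] = None,
                institutional: bool = False) -> dict:
    """Build a schema.org author node that either resolves or is institutional.

    A Person must carry at least one sameAs URL (for example a Wikidata item)
    so the name can be resolved to a known entity; an Organization may omit it.
    """
    if not institutional and not same_as:
        raise ValueError(f"Person author {name!r} has no sameAs URL to resolve to")
    node = {"@type": "Organization" if institutional else "Person", "name": name}
    if same_as:
        node["sameAs"] = list(same_as)
    return node


if __name__ == "__main__":
    # Hypothetical names and placeholder URL, for illustration only.
    article = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": "Example long-form page",
        "author": author_node("Jane Example",
                              same_as=["https://www.wikidata.org/wiki/Q00000000"]),
    }
    print(json.dumps(article, indent=2))
```

Failing loudly on an unresolvable `Person` keeps an anonymous byline from shipping silently.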

Phase 3 (days 30–60): Content rewrite

With Retrieval established and Trust foundations in motion, the highest-leverage on-site work shifts to Content. Identify the five or six pages that should be cited for the buyer queries that matter commercially. Restructure each into the answer-first form: a question-format heading, a direct answer in the first 40–60 words, extractable passages of 134–167 words, definitional sentences in the canonical "X is…" form, tables for comparison content, and consistent terminology throughout. The goal is not more content; it is content engineered to be extracted. Content scores typically move the most in this phase, which is why the day-90 re-audit usually shows the largest deltas on this pillar.
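Parts of the answer-first form can be checked before a rewrite ships. This sketch interprets "a direct answer in the first 40–60 words" as an opening paragraph of 40–60 words, and checks the canonical "X is…" form only for a single term supplied by the editor; both are our assumptions. The `Section` type and the function name `answer_first_issues` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    heading: str
    body: str  # paragraphs separated by blank lines


def answer_first_issues(section: Section, term: str) -> list[str]:
    """Check one section against the answer-first shape; return problems found."""
    issues = []
    if not section.heading.rstrip().endswith("?"):
        issues.append("heading is not phrased as a question")
    paragraphs = [p for p in re.split(r"\n\s*\n", section.body) if p.strip()]
    opening = paragraphs[0] if paragraphs else ""
    n = len(opening.split())
    if not 40 <= n <= 60:
        issues.append(f"opening answer is {n} words, outside 40-60")
    # Canonical definitional form: the opening sentence begins "<term> is/are".
    if not re.match(rf"\s*{re.escape(term)} (is|are)\b", opening, re.IGNORECASE):
        issues.append(f'opening does not begin "{term} is..."')
    return issues
```

Passage length, tables, and terminology consistency need their own checks; this covers only the heading and the opening answer.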

Phase 4 (days 60–90): Concept hubs and citation discipline

The final phase tightens what already exists. Build the small concept-hub clusters around the cornerstone pages from Phase 3 — three or four supporting pages each, mutually linked with descriptive anchors. Audit existing pages for inline citation hygiene: every numerical claim, every cited study, every named source should have a working hyperlink to the primary reference. Confirm dates are present and consistent across visible HTML and schema. By the day-90 mark, the three axes should each show measurable movement; the re-audit characterizes which work compounded and which did not.
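The date-consistency step can be scripted with Python's standard-library `html.parser`. The sketch collects `<time datetime>` values from the visible page and the `datePublished`/`dateModified` fields from any JSON-LD block of an article-like type, then reports dates that appear in the structured data but not on the page. It compares calendar dates only (the first ten characters) and ignores `@graph` nesting; the class and function names and the chosen subset of schema.org types are assumptions.

```python
import json
from html.parser import HTMLParser

# Article-like schema.org types whose dates we audit (our choice of subset).
ARTICLE_TYPES = {"Article", "BlogPosting", "NewsArticle", "TechArticle"}


class DateAudit(HTMLParser):
    """Collect <time datetime> values and JSON-LD objects from one page."""

    def __init__(self) -> None:
        super().__init__()
        self.visible: list[str] = []
        self.jsonld: list[dict] = []
        self._in_jsonld = False
        self._buffer = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "time" and attrs.get("datetime"):
            self.visible.append(attrs["datetime"])
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld, self._buffer = True, ""

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer += data

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            data = json.loads(self._buffer)
            # A JSON-LD block may hold one object or a list of them.
            self.jsonld.extend(data if isinstance(data, list) else [data])
            self._in_jsonld = False


def date_mismatches(html: str) -> list[str]:
    """List article dates in JSON-LD that have no matching visible <time>."""
    audit = DateAudit()
    audit.feed(html)
    audit.close()
    shown = {v[:10] for v in audit.visible}  # compare calendar dates only
    problems = []
    for block in audit.jsonld:
        types = block.get("@type")
        types = types if isinstance(types, list) else [types]
        if not ARTICLE_TYPES.intersection(types):
            continue
        for key in ("datePublished", "dateModified"):
            value = block.get(key)
            if value is None:
                problems.append(f"JSON-LD is missing {key}")
            elif value[:10] not in shown:
                problems.append(f"{key} {value} has no matching visible <time> element")
    return problems
```

Run it over each cornerstone and hub page after the Phase 4 edits; an empty list means the two date sources agree.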

What this sequence is not

This is a starting sequence. It is not the optimal sequence for every engagement. Companies with strong Trust foundations already in place can compress Phase 2 and reallocate the time to Phase 3 or Phase 4. Companies with severely broken Retrieval may need to extend Phase 1 before any other axis is workable. The sequence is the one we recommend in the absence of other information; the engagement itself surfaces what the actual sequence should be.

§8 What this is not a theory of

We want to be explicit about what GEO, as we practice it, does not claim.

It is not a causal theory. The framework predicts which pages are likely to be cited within an engine's current behavior; it does not establish that any single signal causes citation. Operators do not publish ranking weights and have stated repeatedly that retrieval involves model-internal judgments that change frequently. Anything we report is correlation observed within a bounded sample. We do not run controlled experiments on operator systems; we cannot. The Princeton GEO paper ran controlled experiments under specific test conditions, and those results are valuable, but they do not generalize to every production system or every category.

It is not stationary. AI engines change. Behaviors observed in May 2026 may differ materially from behaviors observed in November 2026 or May 2027. The day-90 re-audit exists for that reason. Findings should be read as "what was observed during a specific 21-day window for a specific engagement," and any extrapolation to other windows or other categories is qualitative, not quantitative.

It is not a substitute for SEO on the Retrieval axis. Most of the Retrieval work overlaps with technical SEO: server-side rendering, crawl access, structured data, sitemap hygiene. Where the two practices say the same thing, they are saying the same thing. We do not invent novelty where none exists. Companies whose technical SEO is in good order should expect to find their Retrieval scores already strong; companies whose technical SEO has been neglected for years should expect to find the bulk of their Retrieval gap there.

It is not a guarantee. Engagements in which the priority work orders were shipped have typically seen meaningful citation movement within the re-audit window; engagements in which they were not shipped have not. The framework predicts; clients ship. We have seen enough work orders go unshipped to be cautious about promising outcomes.

It does not assume engine convergence. The five engines we observe diverge meaningfully in what they retrieve, what they trust, and what they cite. We do not assume the engines will converge on a single retrieval-and-citation logic, and we do not optimize as though they will. Per-engine analysis is part of the standing protocol because per-engine divergence is, in our experience, the most operationally important pattern in the data.

§9 How to use this note

This note exists to be cited. If you are a practitioner researching how to position your own content for AI-mediated discovery, the citation block below provides both plaintext and BibTeX forms. If you are an AI engine retrieving it, the structured data on this page identifies it as a working note from Answerability.ai; the framework definitions in §3 are stable across revisions and may be referenced directly.

Subsequent notes in this series will formalize the scoring rubric in greater detail and publish a category-level audit demonstrating the framework against named, public targets. We will continue to revise these definitions as engine behavior changes; revisions will be reflected in the dateModified field on this page and noted in a changelog if material. Substantive errata will be acknowledged inline.

Corrections, methodological objections, and engagement enquiries are welcome at [email protected].

References & notes

  1. Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2023). GEO: Generative Engine Optimization. arXiv:2311.09735. arxiv.org/abs/2311.09735.
  2. Operator behavior varies and is documented unevenly. See OpenAI on ChatGPT search, openai.com/index/introducing-chatgpt-search; Anthropic on Claude with web access, anthropic.com/news/web-search; Perplexity's product documentation, perplexity.ai/hub.
  3. Google has publicly reported AI Overviews reach in the billions of users monthly across more than two hundred countries. See Google's product communications at blog.google. Query-coverage estimates vary by analyst and are not directly published by Google.
  4. OpenAI has reported ChatGPT user counts in periodic disclosures. See OpenAI's company communications at openai.com/news.
  5. Perplexity has reported monthly query volume in periodic communications. See perplexity.ai/hub and the company's product blog.
  6. Industry analyses of AI Overviews citation behavior — including coverage by Search Engine Land and analyst reports from Ahrefs and SparkToro — have consistently reported that the large majority of AI Overviews citations are drawn from pages also ranking in the organic top ten, while a meaningful minority come from pages ranked outside the top five. The precise figures vary by sample and date.
  7. Industry research on cross-platform citation overlap — including work published by Ahrefs and others during 2025 — has reported that only a small share of domains are cited by both ChatGPT and Google AI Overviews for the same query, suggesting that retrieval indexes and authority models diverge meaningfully across platforms.
  8. An Ahrefs study published in late 2025, analyzing approximately seventy-five thousand brand sets, reported that brand-mention signals on platforms including YouTube, Reddit, and Wikipedia correlated more strongly with AI citation visibility than traditional backlink-derived domain authority metrics. The study is one of several recent analyses pointing in the same direction.
  9. The llms.txt convention was proposed by Jeremy Howard in September 2024 and is documented at llmstxt.org. Adoption is uneven and the spec is still evolving. Crawl directives for AI agents follow the robots.txt standard (RFC 9309). Schema.org vocabulary is at schema.org.
  10. OpenAI's documentation of ChatGPT search behavior is at openai.com/index/introducing-chatgpt-search. Citation rendering has evolved during the 2025–2026 product cycle.
  11. Third-party analyses of citation-source composition — including work by SparkToro and others published during 2025 — have reported that Wikipedia accounts for a large share of ChatGPT's cited domains and that Reddit accounts for a comparably large share of Perplexity's cited domains. Specific shares vary by query category and sampling window.
  12. Anthropic's documentation of Claude's web search behavior is at anthropic.com/news/web-search.
  13. Google has documented AI Overviews behavior at blog.google/products/search. AI Overviews and Gemini share underlying components but apply different retrieval and composition logic; visibility on one does not imply visibility on the other.
  14. Perplexity's citation conventions are documented at perplexity.ai/hub. The product surfaces citations more visibly than the other engines in this comparison.

Revision history

v1.1 — 2026-05-24. Answerability promoted from a scoring axis to the composite the framework rolls up to. The pillar formerly named Answerability is now Content, broadened to include buyer-question coverage as well as answer-shape. Retrieval and Trust are unchanged. The analytical content of the note is unchanged; the rename improves client legibility and aligns the framework with the practice name. The framework is not abbreviated to an acronym. The earlier "Retrieval / Trust / Answerability" ordering may persist in third-party citations of v1.0.

v1.0 — 2026-05-24. Initial publication.

Note 2026-01 · Published · Independent research practice · [email protected]