Guide · The retrieval layer

The retrieval layer nobody optimizes for

Every argument about "AI visibility" happens at the chatbot. The decisions about whether you make it into the answer happen one floor down — in search infrastructure most companies have never heard of, let alone optimized for.

[Image: layered translucent diagram sheets on a dark surface, representing the stacked retrieval layer beneath AI]

When you ask ChatGPT about a company, the chatbot is the part you can see. Behind it sits a quieter machine: a system that turns your question into searches, sends those searches to one or more web indexes, and pulls back the handful of pages it will actually read. That middle system — the part that decides which documents even enter the room — is the retrieval layer. It is where being found is won or lost, and almost nobody is optimizing for it, because almost nobody knows it's there.
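
The flow described above (question, searches, indexes, a handful of pages) can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not how any particular chatbot is built: the `fan_out` rewrite rules, the in-memory `INDEXES`, the example URLs, and the word-overlap scoring are all hypothetical stand-ins for real query planning, real search APIs, and real ranking.

```python
from typing import Dict, List

# Hypothetical in-memory "indexes": each maps a URL to page text.
INDEXES: Dict[str, Dict[str, str]] = {
    "index_a": {
        "a.example/review": "acme storage review and pricing",
        "a.example/wiki": "acme company history and founders",
    },
    "index_b": {
        "b.example/thread": "forum thread comparing acme storage pricing",
    },
}


def fan_out(question: str) -> List[str]:
    """Turn one question into several searches (toy rewrite rules)."""
    return [question, f"{question} review", f"{question} pricing"]


def search(index: Dict[str, str], query: str) -> Dict[str, int]:
    """Score each page by how many query words it contains; drop zero scores."""
    terms = set(query.lower().split())
    scores: Dict[str, int] = {}
    for url, text in index.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scores[url] = overlap
    return scores


def retrieve(question: str, k: int = 2) -> List[str]:
    """Fan out, query every index, and keep only the k best-scoring pages."""
    totals: Dict[str, int] = {}
    for query in fan_out(question):
        for index in INDEXES.values():
            for url, score in search(index, query).items():
                totals[url] = totals.get(url, 0) + score
    # Highest total first; ties broken alphabetically for determinism.
    ranked = sorted(totals, key=lambda url: (-totals[url], url))
    return ranked[:k]


print(retrieve("acme storage"))
```

With `k=2`, the wiki page matches every search yet never reaches the model: it is indexed, but it does not make the cut. That cut-off is the part of the system this guide calls the retrieval layer.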

Define your terms

Retrieval Surface: the specific slice of documents, pages, and entity records that AI systems can reach, parse, and repeatedly surface when answering questions about a company — the machine-readable shadow your brand casts across embeddings indexes, structured feeds, and agentic search layers. It is not the same as "your website."

The reason this matters now, and didn't 18 months ago, is that the retrieval layer stopped being one thing. The answer layer is plural, embeddings-driven, and agent-consumed. Three shifts, each worth understanding.

The plumbing changed, and nobody told the marketers

For a decade, "search a web index programmatically" effectively meant the Bing Search API. A huge share of apps that needed real-time web data, including early AI products, were quietly built on it. Then Microsoft retired the Bing Search API on August 11, 2025, having disabled the creation of new API keys months earlier, in March 2025 [1]. Apps broke; pipelines had to be re-plumbed.

Microsoft's own replacement, Grounding with Bing Search inside Azure AI Foundry, isn't a like-for-like swap: it returns synthesized answers with citations rather than raw results, and it comes with platform lock-in. Independent replacements exist, but reporting at the time put their cost anywhere from 40% to 483% higher than the old API [1]. A boring piece of infrastructure went away, and the boring change reshaped who AI systems can see.

A new layer rose: search built for machines

Into that gap came a category of search engines whose customer is not a human but a model. They return clean, structured, token-efficient content designed to drop straight into an LLM's context window. Described by the role each plays, as of May 2026:

| Layer | What it is | What it returns to the model |
| --- | --- | --- |
| Exa | Neural / embeddings search (plus keyword and an auto mode); built for agents and RAG | Concept-matched excerpts ("highlights"), for roughly a 90% token reduction vs full pages [2] |
| Brave | An independently crawled web index: 35B+ pages, 100M+ changes/day [3] | Pre-chunked, relevance-ranked markdown for grounding |
| Tavily | A search layer purpose-built for RAG pipelines | Aggregated results from up to 20 sites per call [3] |
| Perplexity API | A retrieval + synthesis service | Summarized answers with citations, rather than raw context |

These are roles, not rankings, and deliberately so: benchmarks disagree on a "best." In one independent 100-query test, the top several providers were statistically indistinguishable [4]. Which is the point: this is a plural layer, and no single index sees the whole web, or all of you.

The key word is embeddings. Neural search of the kind Exa offers doesn't match your keywords; it matches concepts, by turning text into vectors and finding what sits nearest in meaning. So a page about "difficulties accumulating renewable power" can surface for the query "battery storage challenges" even though it never uses those words, because the two sit close together in vector space. Keyword SEO does not reliably move that.
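
To make the keyword-versus-concept distinction concrete, here is a minimal sketch that compares the two ways of matching. The three-dimensional vectors are hand-made for illustration and are not output from any real embedding model (real embeddings have hundreds or thousands of dimensions); `cosine` and `keyword_overlap` are hypothetical helper names.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def keyword_overlap(a: str, b: str) -> int:
    """Number of words the two strings share (what keyword matching sees)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


# Hand-made toy vectors (an assumption, NOT real embedding output).
VECTORS: Dict[str, List[float]] = {
    "battery storage challenges": [0.9, 0.8, 0.1],
    "difficulties accumulating renewable power": [0.8, 0.9, 0.2],
    "quarterly earnings call transcript": [0.1, 0.2, 0.9],
}

query = "battery storage challenges"
candidates = [text for text in VECTORS if text != query]

# Rank candidates by closeness in vector space, nearest first.
ranked = sorted(
    candidates,
    key=lambda text: cosine(VECTORS[query], VECTORS[text]),
    reverse=True,
)

for text in ranked:
    similarity = cosine(VECTORS[query], VECTORS[text])
    print(f"{text!r}: shared words={keyword_overlap(query, text)}, cosine={similarity:.2f}")
```

Both candidates share zero words with the query, so a keyword matcher cannot tell them apart. In vector space the renewable-power phrase sits far closer to the query than the earnings transcript does, which is why it ranks first.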

You have a Retrieval Surface — and it's probably thin, and inconsistent

Here is the reframe. The chatbots you test are the surface; underneath, agents are querying these embeddings-native indexes, and you may simply not be in some of them. Being "on Google" does not mean being legible to Exa, retrievable from Brave's index, or chunked cleanly by Tavily. Your Retrieval Surface is the union of where the machine layer can actually reach you — and for most companies it is both thinner and more uneven than they assume.

Uneven, because each layer holds a different index and matches differently. So the same question can resolve differently — or not at all — depending on which retrieval path an engine took. That is not a glitch; it is the structure.

Illustrative reconstruction: one query · three retrieval paths
"Who are the leading firms for [your category]?"

Path A: Firm A · Firm B · Firm C (built from encyclopedic + review-site sources)

Path B: Firm B · Firm D · Firm E (built from community discussion and recent threads)

Path C: your firm is not retrieved on this path

Illustrative, not a transcript. Different retrieval layers hold different indexes, so the same question yields different shortlists — and on some paths a real company simply isn't reachable. The shape is the lesson; the names are stand-ins.
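
Treating each index as a set makes the "union" framing concrete. The following is a minimal sketch, assuming hypothetical index names and page paths (none of it is measured data): the Retrieval Surface is the union of what each index can reach, and the unevenness shows up as the difference between that union and what every index holds.

```python
from typing import Dict, List, Set

# Hypothetical: which of one company's pages each index can reach.
REACHABLE: Dict[str, Set[str]] = {
    "index_a": {"/about", "/pricing"},
    "index_b": {"/about"},
    "index_c": set(),  # the company is simply not present in this index
}


def retrieval_surface(reachable: Dict[str, Set[str]]) -> Set[str]:
    """Union of pages reachable through at least one index."""
    return set().union(*reachable.values())


def reachable_everywhere(reachable: Dict[str, Set[str]]) -> Set[str]:
    """Pages that every index can reach (often empty)."""
    return set.intersection(*reachable.values())


def unreachable_on(reachable: Dict[str, Set[str]]) -> List[str]:
    """Indexes on which the company does not appear at all."""
    return sorted(name for name, pages in reachable.items() if not pages)


print("surface:", sorted(retrieval_surface(REACHABLE)))
print("everywhere:", sorted(reachable_everywhere(REACHABLE)))
print("missing from:", unreachable_on(REACHABLE))
```

In this toy data the surface holds two pages, no page is reachable on every path, and one index cannot see the company at all. An engine that happens to route through that index returns a shortlist without you.
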
[Diagram: the retrieval stack, where being found is decided. You ask → chatbot → fan-out queries → search APIs (Exa / Brave / Tavily) → Indexed Web (everything the search APIs can reach, billions of pages) → Cited Web (the few sources that survive into the final answer). Your slice shrinks at each step; the gap between Indexed and Cited is what a diagnostic measures, and your reachable slice across both is your Retrieval Surface.]
Your slice narrows at every step: being present in an index does not mean being retrieved, and being retrieved does not mean being cited. The Retrieval Surface is the reachable union, and the gap to the Cited Web is where most companies lose.
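
The narrowing can be expressed as a simple funnel. This is a sketch with made-up counts for one hypothetical company, not measured data; `funnel` and the stage names are illustrative choices that mirror the indexed, retrieved, cited steps in the diagram.

```python
from typing import List, Optional, Tuple

# Hypothetical page counts at each stage (assumed, not measured).
STAGES: List[Tuple[str, int]] = [
    ("indexed", 200),
    ("retrieved", 20),
    ("cited", 2),
]


def funnel(stages: List[Tuple[str, int]]) -> List[Tuple[str, int, float]]:
    """Return (stage, count, share kept from the previous stage) per stage."""
    rows: List[Tuple[str, int, float]] = []
    previous: Optional[int] = None
    for name, count in stages:
        if previous is None:
            kept = 1.0  # first stage: nothing to compare against
        elif previous == 0:
            kept = 0.0  # nothing survived the earlier stage
        else:
            kept = count / previous
        rows.append((name, count, kept))
        previous = count
    return rows


for name, count, kept in funnel(STAGES):
    print(f"{name:<10} {count:>4} pages  ({kept:.0%} of previous stage)")
```

Reading the shares stage by stage shows where the loss happens. In this toy data each step keeps one page in ten, so an indexed page has roughly a one-in-a-hundred chance of being cited.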

How to think about your Retrieval Surface

You don't need our help to start reasoning about this. The useful questions are concrete, and they map to the things the layer actually checks:

Answer those honestly and you'll already know more about your Retrieval Surface than most of your competitors know about theirs. None of it requires buying anything; it requires looking at the layer instead of the chatbot.

We map this layer for a living — measuring a company's Retrieval Surface across the five major engines is what the AI Answerability Diagnostic does. But the structure above holds whether or not you ever talk to us. That's rather the point: the retrieval layer is a real, describable thing, and it was worth describing.

References

  1. Bing Search API retirement (effective Aug 11, 2025; key creation disabled March 2025) and replacement guidance: Microsoft Lifecycle announcement. On replacement cost (40–483% higher) and the re-architecture impact: PPC Land (2025); The Register (2025).
  2. Exa product and benchmark claims (neural/keyword/auto; "highlights" ≈ 90% token reduction; 54.4% on FRAMES vs Perplexity 44.5% / Brave 21.6%): exa.ai; morphllm.com (accessed May 2026).
  3. Search-API roles and index sizes (Brave's independent index ~35B+ pages, 100M+ changes/day, pre-chunked markdown; Tavily aggregating up to 20 sites/call): AIMultiple, "Agentic Search" (2026) and vendor documentation.
  4. Benchmark indistinguishability among top providers (100-query evaluation): AIMultiple, "Agentic Search" (2026).
