The Answerability Index · commercial insurance brokers · pilot · Real capture · 2026-05-28 · updated 2026-06-01

We asked five AI systems which commercial insurance brokers to consider. They named 72 — and agreed on almost none.

Name: Which Commercial Insurance Brokers AI Systems Surface — The Answerability Index
Published: 2026-05-28
License: https://answerability.ai/terms

Across six commercial buyer-intent prompts, each run three times against ChatGPT, Claude, Gemini, Perplexity, and Grok in May 2026, the five engines named 72 distinct brokerages with a mean inter-engine Jaccard overlap of 0.38. No prompt produced a unanimous top broker, and the three most-cited brokerages, aggregated across all six prompts, captured only 27% of total citations. That is far less consensus than in B2B SaaS, where the same protocol produced 0.85 overlap and three unanimous winners. ChatGPT consistently defaults to global brokers (Marsh, Aon, Lockton); Perplexity and Grok lean toward insurtechs and regional specialists (Embroker, Founder Shield, Vouch). The two factors most consistently associated with surfacing appear to be corroboration density and entity clarity. A brokerage operating in the United States today faces five different shortlists in five different engines, and the question for any in-market broker is which shortlist it is on, not how high it ranks.

Five engines · five shortlists. Buyer prompt: brokers for venture-backed startup cyber & E&O (INS-02)

72 brokers named · 0.38 inter-engine overlap · 0% unanimous top broker · 27% held by the top three aggregated across all six prompts · the exhibit above shows only INS-02
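The 27% figure is a concentration share: citations held by the three most-cited brokers, divided by all citations. A minimal sketch, assuming every mention of a broker in an answer counts as one citation (the page does not state the unit); `top_k_share` and the citation log below are hypothetical, not the Index's own pipeline.

```python
from collections import Counter


def top_k_share(citations: list[str], k: int = 3) -> float:
    """Share of all citations held by the k most-cited brokers."""
    counts = Counter(citations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Ties at the cut-off do not change the result: tied brokers have equal counts.
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Hypothetical citation log: one entry per time a broker was named in an answer.
citations = (
    ["Marsh"] * 5
    + ["Aon"] * 4
    + ["Hub International"] * 3
    + ["Embroker", "Vouch", "Founder Shield", "Lockton", "Gallagher"] * 2
    + ["Nadler", "Corgi", "Insureon", "Alliance Risk"]
)
print(f"{top_k_share(citations):.0%}")  # 12 of 26 citations → 46%
```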

Executive takeaway

Commercial-insurance retrieval is fragmented by risk type, company size, and specialty language — not settled. Broad broker authority wins the generic questions; explicit, machine-readable specialization wins the high-intent ones (cyber, D&O, construction). The consequence for a broker: AI is already reshaping discovery — which firms make the shortlist — well before it touches placement, which still runs on human expertise.

What this means for you

0.38 inter-engine overlap. Five AI engines disagree on which broker to name — for every commercial buyer prompt we tested, the engines return five different shortlists.
72 brokers named, no unanimous winner on any prompt. ChatGPT defaults to global brokers (Marsh, Aon, Lockton); Perplexity and Grok lean toward regional specialists and insurtechs (Embroker, Founder Shield, Vouch) instead.
Embroker is the only firm named by all five engines on the startup cyber & E&O prompt (INS-02). When a buyer's question matches the content you publish, retrieval can flip from molten to settled. That is the unit of opportunity here.
Implication for your firm: your prospect arrives at the first call with a name already in mind, and which name depends on which engine they used. The Diagnostic identifies which engines surface your firm on which lines of business, and where the closeable gap sits.

Cost of inaction: with 72 named brokers and no unanimous winner, each engine routes its prospects to a different consideration set. If your firm is missing from three of the five engines, the buyers who ask those engines are routed past you, now rather than someday.

For what to do about it, see the Commercial Insurance Brokers industry brief →

What this page measures

The rows are not a ranking. They record observed surfacing: how often a company entered the AI-mediated consideration set across a bounded battery of buyer questions. The heatmap maps citation territory: for each question the engines repeatedly surface a small set of companies, and those companies currently hold the answer layer for that question. The question is not "who is best?" but "who appears when the buyer asks?"

Observed surfacing — not endorsement, advice, or a suitability judgment. This measures observed AI surfacing behavior, not broker quality, broker suitability, or recommendation quality. Insurance is regulated and varies by state and line; nothing here is insurance advice or a solicitation. These pages sit inside the same Content / Retrieval / Trust architecture as the rest of our working papers on AI-mediated buyer discovery.
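The overlap figures on this page are Jaccard indices: for two engines' broker sets A and B, |A ∩ B| / |A ∪ B|, so 1.0 means identical shortlists and 0.0 means no shared names. A minimal sketch of how a mean inter-engine overlap could be computed, assuming the index is averaged over every engine pair within a prompt (ten pairs for five engines) and then across prompts; the page names the metric but not the averaging order, and the broker sets below are hypothetical.

```python
from itertools import combinations
from statistics import mean


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical (1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def prompt_overlap(by_engine: dict[str, set[str]]) -> float:
    """Mean Jaccard index over every pair of engines for one prompt (needs >= 2 engines)."""
    return mean(jaccard(a, b) for a, b in combinations(by_engine.values(), 2))


def category_overlap(by_prompt: dict[str, dict[str, set[str]]]) -> float:
    """Mean of the per-prompt overlaps (assumed averaging order)."""
    return mean(prompt_overlap(engines) for engines in by_prompt.values())


# Hypothetical brokers named by three engines for one prompt.
one_prompt = {
    "engine_a": {"Marsh", "Aon", "Lockton"},
    "engine_b": {"Marsh", "Aon", "Hub International"},
    "engine_c": {"Marsh", "Embroker", "Vouch"},
}
print(round(prompt_overlap(one_prompt), 2))  # (0.5 + 0.2 + 0.2) / 3 → 0.3
```

Averaging per prompt first keeps one unusually settled or unusually scattered question from being hidden inside a pooled set of names.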

What AI returns depends on the kind of risk being placed

Observation: The 0.38 overlap is an average, and it hides the real pattern. Hold one buyer question fixed and vary only the engine, and the firms change. Vary the question, and the type of firm changes too. Across the six situations we tested, four distinct retrieval patterns appeared.

Buyer situation	What AI surfaces	Why (mechanism)	Implication
Established & generic mid-market manufacturer	National brokers — Lockton, Aon, Marsh. The most concentrated question (overlap 0.27).	Breadth plus the largest corroborated public footprint; the engines reach for the obvious incumbents.	Incumbents dominate; on-page work alone rarely displaces them.
Complex / multi-line multi-location placement	The widest, most fragmented field — 23 firms, overlap 0.11, no consensus.	No single firm is corroborated as "the" answer for complex risk, so the engines spread.	Open territory — specialization and segment content can claim it.
Digital-native startup cyber & E&O	Insurtech wins: Embroker (named by all five engines), Founder Shield, Vouch — the global brokers fall back.	Content-first, machine-readable firms, corroborated in startup channels, out-surface size.	A legible, answer-shaped digital presence beats incumbency.
Product-led construction liability	Carriers appear alongside brokers.	The query implies a coverage product, not an advisor, so distribution shifts upstream.	Brokers must own the advisory framing to stay in the set.

Implication: "AI visibility" is not one thing in insurance; it is per line of business. A broker can own the manufacturer answer and be entirely absent from cyber. A single score would hide that, so the measurement has to be taken line by line.

Why insurance retrieval fragments

Most categories settle on a canonical answer. Commercial insurance resists it for structural reasons — and knowing them is the difference between "we should do some SEO" and knowing which page to build.

Specialization is the unit of relevance. Retrieval rewards explicit risk-class language — "cyber for SaaS," "wrap-up for general contractors," "D&O for pre-IPO." A generic "commercial insurance" page matches everything weakly and owns nothing.
Broker and carrier blur. Brokers place coverage; carriers underwrite it. When a prompt sounds product-like, the engines pull carriers into a broker question — so the firm that frames itself as the advisor for a risk, not the product, holds the slot.
Local versus national. National brokers own breadth; regional and specialty firms own depth in a line or a geography. The engines resolve that tension differently, which may explain why Perplexity surfaces regional shops the others never name.
Entity ambiguity from consolidation. A roll-up industry means acquired brands, regional names, and parent groups blur together; the engines struggle to credit one firm, and reputation leaks across entities.
Complex placements reward specificity. The harder the risk, the more the engines favor narrow, demonstrated expertise over size — which is why complex, multi-line placement is the most fragmented question of all.

Category temperature: how settled is the answer?

Frozen

High cross-engine overlap (Jaccard ≥ ~0.55) and low rank variance — the engines have converged on a small canonical set. On-page changes alone are unlikely to displace the top tier; the question shifts to defending edge cases.

Molten

Moderate overlap (~0.30–0.55) and high churn across engines — the answer has not set. Adjacent firms enter and exit the set, and answer-shaped, entity-clear, retrievable content can still claim citation territory.

Commercial insurance brokers are a molten category — mean inter-engine overlap 0.38, and no prompt produced the same top broker across all five engines. For calibration, US airlines is frozen (0.64); industrial machinery is molten (0.34).

Temperature	What it means	What tends to work
Frozen	The same firms across every engine	Brand authority and corroboration; on-page work alone rarely displaces the top tier
Molten	The set reshuffles by engine; no consensus	Claim territory — answer-shaped, entity-clear content can still change who gets named
Fragmented	A different answer for each buyer question	Persona- and line-specific retrieval surfaces — win each question, not "the category"
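The two defined bands can be read as a threshold rule on mean overlap. A minimal sketch, assuming the approximate cut-offs (~0.55 and ~0.30) are applied as hard thresholds and ignoring the rank-variance half of each definition; the page defines no overlap band below ~0.30, so the sketch returns "unbanded" there rather than guessing. `temperature` is a hypothetical helper.

```python
def temperature(mean_overlap: float) -> str:
    """Classify a category by mean inter-engine Jaccard overlap (overlap only)."""
    if not 0.0 <= mean_overlap <= 1.0:
        raise ValueError("Jaccard overlap must lie in [0, 1]")
    if mean_overlap >= 0.55:
        return "frozen"
    if mean_overlap >= 0.30:
        return "molten"
    return "unbanded"  # no overlap threshold is given below ~0.30


# Calibration points quoted above.
for label, overlap in [("US airlines", 0.64),
                       ("commercial insurance brokers", 0.38),
                       ("industrial machinery", 0.34)]:
    print(f"{label}: {temperature(overlap)}")
```

Because "Fragmented" is defined by disagreement from one buyer question to the next rather than by a single number, detecting it would need the per-prompt overlaps, not just their mean.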

How to read this

Observed surfacing & cross-engine divergence · 5 engines · 6 prompts · 3 runs/engine · captured 2026-05-28

Surfacing rate (0% to 100%): share of the 6 prompts in which the broker was surfaced

σ — cross-engine divergence (std. dev. across the 5 engines)


A. Even the leaders are contested. The most-surfaced brokers (Hub International, Marsh, Aon, Gallagher) still swing hard by engine: Marsh appears on every ChatGPT run but on only about half of Gemini's and Perplexity's. No broker was the unanimous top pick on a single prompt, and on average only ~1.3 brokers per question were named by all five engines.
B. The split is engine personality, not noise. ChatGPT leans on the global giants (Marsh, Aon, Lockton at 100%); Perplexity ranks those low and instead surfaces a long tail of regional and specialty brokers no other engine names. Ask ChatGPT and Perplexity the same question and you get different brokers.
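The heatmap legend defines two quantities per broker: a surfacing rate for each engine (share of the 6 prompts in which the broker was surfaced) and σ, the standard deviation of those rates across the 5 engines. A minimal sketch, assuming a broker counts as surfaced on a prompt if it appears in at least one of the three runs, and using the population standard deviation; the page specifies neither rule. The run data and helper names are hypothetical.

```python
from statistics import pstdev

# prompt -> list of runs, each run being the set of brokers one engine named.
Runs = dict[str, list[set[str]]]


def surfacing_rate(prompts: Runs, broker: str) -> float:
    """Share of prompts where `broker` appears in at least one run (assumed rule)."""
    hits = sum(1 for runs in prompts.values() if any(broker in run for run in runs))
    return hits / len(prompts)


def divergence(by_engine: dict[str, Runs], broker: str) -> float:
    """σ: population standard deviation of the broker's rate across engines."""
    return pstdev([surfacing_rate(p, broker) for p in by_engine.values()])


# Hypothetical two-prompt capture for two engines (the real battery has 6 and 5).
capture = {
    "engine_a": {"p1": [{"Marsh", "Aon"}, {"Marsh"}, {"Aon"}],
                 "p2": [{"Aon"}, {"Marsh"}, {"Lockton"}]},
    "engine_b": {"p1": [{"Aon"}, {"Aon"}, {"Aon"}],
                 "p2": [{"Embroker"}, {"Marsh"}, {"Embroker"}]},
}
print(surfacing_rate(capture["engine_a"], "Marsh"))  # 1.0
print(surfacing_rate(capture["engine_b"], "Marsh"))  # 0.5
print(divergence(capture, "Marsh"))                  # 0.25
```

A broker that every engine surfaces equally often, whether always or never, has σ = 0; a high σ marks the engine-personality split described in callout B.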

One question, five answers

The clearest way to see fragmentation is to ask all five engines the same question and read across. Here are the brokers each engine surfaces for a venture-backed startup buying cyber & E&O coverage, top three by surfacing. ChatGPT names the global giants; every other engine leads with a digital-native firm.

Engine	1st	2nd	3rd
ChatGPT	Aon	Marsh	Lockton
Claude	Embroker	Corgi	Woodruff Sawyer
Gemini	Embroker	Insureon	Founder Shield
Perplexity	Founder Shield	Embroker	Nadler
Grok	Embroker	Alliance Risk	Founder Shield

Top three per engine for "Which insurance brokers are commonly suggested for venture-backed startups buying cyber and E&O coverage?", observed across three runs, 2026-05-28. Only ChatGPT returns the incumbents; the other four surface digital-native firms. The same buyer, on a different engine, meets a different market.
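The exhibit compresses three runs per engine into a top three, and the page separately notes which firm every engine named. A minimal sketch of both steps, assuming brokers are ranked by how many runs named them (ties broken alphabetically for a stable order) and that "named by an engine" means named in any run; the engines and runs below are hypothetical and only echo the exhibit's shape.

```python
from collections import Counter


def top_three(runs: list[list[str]]) -> list[str]:
    """Rank brokers by the number of runs naming them; keep the first three."""
    counts = Counter(b for run in runs for b in set(run))  # count once per run
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [broker for broker, _ in ranked[:3]]


def named_by_all(by_engine: dict[str, list[list[str]]]) -> set[str]:
    """Brokers that every engine named in at least one run."""
    per_engine = [set().union(*runs) for runs in by_engine.values()]
    return set.intersection(*per_engine)


# Hypothetical runs for two engines on one prompt.
capture = {
    "engine_a": [["Aon", "Marsh", "Embroker"], ["Aon", "Marsh", "Lockton"],
                 ["Aon", "Lockton", "Marsh"]],
    "engine_b": [["Embroker", "Corgi"], ["Embroker", "Woodruff Sawyer"],
                 ["Corgi", "Embroker"]],
}
for engine, runs in capture.items():
    print(engine, top_three(runs))
print(named_by_all(capture))  # {'Embroker'}
```

In this toy data Embroker is named by both engines without making engine_a's top three, which is how a firm can be named by every engine yet be missing from one engine's top-three column.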

The six prompts behind these numbers (commercial / B2B buyer-intent)

Which commercial insurance brokers are commonly recommended for mid-market manufacturers?
Which insurance brokers are commonly suggested for venture-backed startups buying cyber and E&O coverage?
Which commercial insurance brokers are commonly surfaced for multi-location businesses with complex risk?
Which insurance brokers are commonly recommended for construction firms with liability exposure?
Which commercial insurance brokers are commonly recommended for healthcare practices and groups?
Which insurance brokers are commonly suggested for placing management-liability and D&O coverage?

Scope — commercial brokers, US. We measured the firms that help a business buy insurance (brokers and agencies), not carriers/underwriters; carriers that surfaced (e.g. Chubb, Travelers, The Hartford) were set aside as a different question. Broker names were canonicalized to a parent where engines used variants (e.g. "Arthur J. Gallagher & Co." → Gallagher; the Marsh-family names → Marsh). The rows shown are the brokers surfaced across multiple engines; a long tail of single-engine regional shops is summarized in the notes, not named individually.
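The scope rules above amount to two passes over each extracted name: map variants to a parent, then set carriers aside. A minimal sketch, assuming a hand-maintained alias table keyed on a normalized spelling and that duplicates within one answer collapse after canonicalization. Only the Gallagher mapping, the Marsh-family rule, and the three carrier names come from this page; the specific Marsh alias is an illustrative assumption, and the page's manual review of ambiguous aliases is not something a lookup table replaces.

```python
import re

# Hypothetical alias table: normalized spelling -> canonical parent.
ALIASES = {
    "arthur j gallagher co": "Gallagher",
    "gallagher": "Gallagher",
    "marsh": "Marsh",
    "marsh mclennan agency": "Marsh",  # illustrative Marsh-family variant
}
# Carriers this page names as set aside; a real list would be longer.
CARRIERS = {"Chubb", "Travelers", "The Hartford"}


def normalize(name: str) -> str:
    """Lower-case, turn '&' and other punctuation into spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def canonicalize(name: str) -> str:
    """Return the parent name if the alias is known, else the trimmed original."""
    return ALIASES.get(normalize(name), name.strip())


def brokers_only(names: list[str]) -> list[str]:
    """Canonicalize each name and drop carriers, keeping first-seen order."""
    out: list[str] = []
    for canonical in map(canonicalize, names):
        if canonical not in CARRIERS and canonical not in out:
            out.append(canonical)
    return out


print(brokers_only(["Arthur J. Gallagher & Co.", "Chubb", "Gallagher", "Embroker"]))
# ['Gallagher', 'Embroker']
```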

Strategic reading: a molten field is contestable

Unlike a frozen category, no single broker owns the commercial-insurance answer. The most-surfaced names are large national brokers — but even they are named inconsistently, and a wide field of regional and specialty firms surfaces on one engine and not the others. For a broker, that is the opportunity: the answer is not locked.

What appears to move surfacing here, in rough order: corroboration density (how redundantly the web names you for a line and a segment), entity clarity (whether your firm resolves cleanly across acquired brands and regional offices, a recurring problem in a roll-up-heavy industry), retrieval surface (whether an AI crawler can reach your practice and line-of-business pages), and answer-shaped content for the specific buyer question: cyber for startups, D&O, contractor liability. A broker named by only one engine has a legible exposure, and a path to claim the territory the others haven't settled.

Two engines, two different brokers

The clearest pattern is not a ranking — it is that the engines have personalities. ChatGPT behaves like a brand-name search: it returns the global brokers (Marsh, Aon, Lockton, Gallagher) on almost every prompt. Perplexity behaves like a directory crawl: it ranks those giants low and surfaces regional and specialty brokers — names no other engine mentions — drawn from agency directories and local "best brokers" lists. Claude, Gemini, and Grok sit between the two. The same buyer need, asked of two systems, returns two different shortlists.

And one firm punches above its size. Hub International surfaces most overall, ahead of both Marsh and Aon, despite not being a top-tier global broker by revenue. One plausible reading is that in a molten category a deep, machine-readable content footprint can out-surface market share. Either way, AI prominence and real-world size are not the same ranking.

Search builds a candidate list. AI builds the consideration set.

Observation: Search hands a buyer a candidate list, a page of links to work through. The five engines hand back a consideration set, a short named shortlist, before the buyer clicks anything. The shortlisting has already happened.

Mechanism: Before answering, the model read the directories, the trade press, and the firms' own pages, and handed back a handful of names. In this capture the engines leaned on Risk & Insurance, Insurance Journal, and broker directories more than on the firms' own claims, which suggests earned media carries more weight than self-description.

Implication: The consideration set now forms privately, inside the model, before a sales conversation. If your firm is not in it, you are not losing the deal; you were never in the running.

Observed hypotheses

Patterns this capture is consistent with — stated as hypotheses, not conclusions, given a bounded sample.

Specialization is legible. Firms with explicit vertical or line specialization appear more often on specialized prompts — insurtech on cyber/startup, specialty brokers on complex placement.
Breadth wins the generic question. Broad, established prompts favor the largest national brokers and the directories that aggregate them.
Advice versus product shifts the surface. When a prompt implies a coverage product, carriers enter the set; when it implies advisory or complex placement, brokers dominate.

What to do next

Find the lines where you never appear — that is where the consideration set forms without you.
Build answer-shaped, segment-specific pages (cyber, D&O, construction) rather than one "commercial insurance" page.
Resolve your entity across acquired brands and offices so the engines credit a single firm.
Earn third-party corroboration — trade press, directories, associations — which the engines weight over your own claims.

Methodology note. A bounded pilot capture: 5 AI systems, 6 commercial buyer-intent prompts, 3 runs per engine, captured 2026-05-28. We measured commercial insurance brokers (firms that place coverage), not carriers; carrier names that surfaced were set aside. Rows show observed surfacing within this prompt battery — not endorsements, quality rankings, broker suitability, or general market-share estimates. Broker names were canonicalized from extracted outputs; ambiguous aliases were reviewed. The Answerability Index · pilot.

Research publication based on sampled AI outputs collected on 2026-05-28. Findings reflect observed outputs in this sample and are not statements of company quality, broker suitability, recommendation quality, or business performance, and are not insurance advice or a solicitation.
