When AI engines disagree, the disagreement has shape.

Category temperature: frozen, fragmented, and molten markets — with a two-axis frame for the structure inside fragmentation.

Method: 5 industries · 6 buyer-intent prompts per industry · 5 AI engines (ChatGPT, Claude, Gemini, Perplexity, Grok) · 3 runs each · 450 captures total · programmatically captured and scored line-by-line over a 21-day window in May 2026. Firm names are canonicalised via per-industry alias dictionaries; captures and aliases are versioned at /clients/answerability/_{slug}_pagedata.json. The pipeline is deterministic and reproducible by sector. We report observed patterns; the underlying engines are non-stationary.
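As a minimal sketch of the canonicalisation step described in the method line, the snippet below maps variant spellings to one canonical firm name before any overlap is scored. The alias entries, the `normalise` and `canonicalise` helpers, and the fallback behaviour for unknown names are all illustrative assumptions; the actual alias dictionaries are the versioned files referenced above and are not reproduced here.

```python
import re

# Hypothetical alias dictionary for one industry: normalised variant -> canonical name.
# These entries are illustrative and are not taken from the versioned alias files.
ALIASES = {
    "jp morgan private bank": "J.P. Morgan Private Bank",
    "j.p. morgan private bank": "J.P. Morgan Private Bank",
    "jpmorgan private bank": "J.P. Morgan Private Bank",
    "mariner wealth": "Mariner Wealth",
    "mariner wealth advisors": "Mariner Wealth",
}


def normalise(raw: str) -> str:
    """Lower-case, trim, and collapse internal whitespace."""
    return re.sub(r"\s+", " ", raw.strip().lower())


def canonicalise(raw: str, aliases: dict[str, str]) -> str:
    """Return the canonical firm name, or the trimmed raw name if no alias matches."""
    return aliases.get(normalise(raw), raw.strip())


# The capture grid from the method line: industries x prompts x engines x runs.
total_captures = 5 * 6 * 5 * 3

print(canonicalise("  JP Morgan  Private Bank ", ALIASES))
print(total_captures)
```

Falling back to the raw name (rather than raising) is a design choice for the sketch: it keeps unknown firms visible in the counts so they can be added to the alias file later.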

The answer layer has a temperature

When buyers ask AI to recommend firms, the engines do not always agree. Sometimes they agree almost completely — in our B2B SaaS & Industrial Manufacturing capture, three of six buyer prompts produced a single firm named first by every one of the five engines.¹ Sometimes they barely overlap — in Personal Injury Law, the same six-prompt design produced 80 distinct firms across engines, with only 24% inter-engine overlap (Jaccard).²

That spread is not noise. Across the five sectors we measured, the inter-engine overlap moves predictably with other features — how many firms are named in total, whether any single firm is unanimous, whether each engine returns the same firms across runs. These properties together describe what we call category temperature: the shape of AI agreement and disagreement on a given commercial question.

Frozen and fragmented categories are not just different in score; they are structurally different markets. The work of being named by AI in each is also structurally different. Below we describe the three top-level temperatures, then introduce a sub-distinction inside fragmented — polar versus diffuse — that turns out to matter for the work. Each industry we measured sits somewhere on this frame. The frame is the contribution; the placement is the evidence.

Categories are not equally contestable, and the engines tell you which is which — if you read the disagreement instead of averaging it away.

Three states: frozen, fragmented, molten

Frozen. High cross-engine consensus. The engines name the same handful of firms across prompts and runs. The category is settled in the public record, and the engines reflect that consensus cleanly. B2B SaaS & Industrial Manufacturing is the clearest case we have measured: 0.85 inter-engine overlap, with Procore (construction PM), ServiceTitan (HVAC/plumbing field service), and Lincoln Electric (arc welding) named first by every engine on their respective prompts.¹
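One way to check for a unanimous first-named firm on a single prompt is sketched below. The function name, the input shape (engine mapped to the first-named firm in each run), and the requirement that every run of every engine agree are assumptions made for illustration; the firm labels are placeholders, not capture data.

```python
from typing import Optional


def unanimous_first(first_named: dict[str, list[str]]) -> Optional[str]:
    """Return the firm named first in every run of every engine, else None.

    first_named maps an engine name to the first-named firm in each of its runs
    for one prompt. Names are assumed to be canonicalised already.
    """
    firms = {firm for runs in first_named.values() for firm in runs}
    return next(iter(firms)) if len(firms) == 1 else None


# Placeholder data for two prompts (hypothetical, two engines shown for brevity).
frozen_prompt = {
    "engine_1": ["Firm A", "Firm A", "Firm A"],
    "engine_2": ["Firm A", "Firm A", "Firm A"],
}
split_prompt = {
    "engine_1": ["Firm A", "Firm A", "Firm A"],
    "engine_2": ["Firm B", "Firm A", "Firm B"],
}

print(unanimous_first(frozen_prompt))
print(unanimous_first(split_prompt))
```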

Fragmented. Low cross-engine consensus. The engines hold meaningfully different shortlists for the same buyer question. Most of the industries we measured — Commercial Insurance Brokers (0.38), Wealth Management & RIAs (0.32), Personal Injury Law (0.24) — sit here. Inside fragmented, two distinct mechanisms appear; we describe them in the next section.

Molten. No answer has formed yet. The engines do not commit because the public record does not commit. This is rare in established commercial categories; we did not observe a molten sector in this pilot. The state is described for taxonomic completeness — emerging product categories, technologies that have not yet accumulated a corroboration network — and to clarify what fragmented is not. Fragmented categories have answers; they have too many of them, distributed unevenly across engines. Molten categories have none.

Earlier notes (2026-02) used “molten” more broadly to mean “still shifting, not yet settled.” Here we use it more narrowly — categories that have not formed an answer at all — to make room for the structure inside fragmentation described next.

Figure 1 · Three top-level temperatures

Three temperatures, distinguished by the degree to which AI engines have settled on an answer for a category. The first two are the typical states for established commercial markets; the third is rare.

Fragmented markets split two ways

Two of the industries we measured have similar inter-engine overlap scores — Wealth Management & RIAs at 0.32 and Personal Injury Law at 0.24 — and both sit firmly in the fragmented region. Read the captures more carefully and the two look structurally different.

In wealth management, each AI engine commits clearly to a roster of firms. ChatGPT consistently names trust companies and private banks — Bessemer Trust, Rockefeller Capital, J.P. Morgan Private Bank, Pathstone. The other four engines consistently name independent RIAs — Creative Planning, Mariner Wealth, Cresset, Mercer Advisors. Each engine is internally stable across runs; the engines disagree because they are pulling from different corroboration networks.³ In Personal Injury Law's mass-tort tail, no engine is internally stable on any particular firm; the named mass-tort plaintiff firms rotate widely even within a single engine's three runs.² Same Jaccard region, opposite mechanism.

We propose two axes to describe this:

Cross-engine consensus (high ↔ low) — do all five engines name the same firms?
Within-engine concentration (concentrated ↔ scattered) — does each engine name the same firms across runs of the same prompt?
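The two axes can be sketched as two Jaccard-based scores. This is one plausible operationalisation, not necessarily the scoring the pipeline uses: the note does not specify how overlap is aggregated, so the mean-pairwise aggregation, the pooling of runs into a per-engine roster, and all function names here are assumptions. The toy inputs are placeholder firm labels.

```python
from itertools import combinations
from statistics import mean


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 1.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise(sets: list[set[str]]) -> float:
    """Mean Jaccard over all unordered pairs (needs at least two sets)."""
    return mean(jaccard(a, b) for a, b in combinations(sets, 2))


def cross_engine_consensus(captures: dict[str, list[set[str]]]) -> float:
    """Pool each engine's runs into one roster, then compare rosters across engines."""
    rosters = [set().union(*runs) for runs in captures.values()]
    return mean_pairwise(rosters)


def within_engine_concentration(captures: dict[str, list[set[str]]]) -> float:
    """Average over engines of how much each engine's own runs overlap."""
    return mean(mean_pairwise(runs) for runs in captures.values())


# Toy captures for one prompt: engine -> set of named firms per run (hypothetical).
polar = {
    "engine_1": [{"A", "B"}, {"A", "B"}, {"A", "B"}],
    "engine_2": [{"C", "D"}, {"C", "D"}, {"C", "D"}],
}
diffuse = {
    "engine_1": [{"A"}, {"B"}, {"C"}],
    "engine_2": [{"D"}, {"E"}, {"F"}],
}

print(cross_engine_consensus(polar), within_engine_concentration(polar))
print(cross_engine_consensus(diffuse), within_engine_concentration(diffuse))
```

Both toy cases score zero on cross-engine consensus, but only the polar case scores high on within-engine concentration, which is the distinction the second axis is meant to capture.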

Figure 2 · The two-axis frame inside fragmentation

Quadrant mapping · industries by mechanism · work required
Quadrant	Industries	Mechanism	Work required
Frozen (high consensus · concentrated)	B2B SaaS & Industrial Manufacturing	One firm holds the answer across engines and runs; corroboration is dense in the public record.	Get inside the moat, or compete in a different category.
Polar (low consensus · concentrated)	Wealth Management & RIAs; Commercial Insurance Brokers	Each engine commits to a stable roster; the rosters differ because the engines pull from different corroboration networks (press vs community, brokers vs insurtechs).	Target the specific engines holding rosters that exclude you, via the source-set those engines weight.
Diffuse (low consensus · scattered)	Personal Injury Law (mass-tort tail)	No source-set is dense enough to anchor any engine; named firms rotate even within a single engine's runs.	Seed corroboration broadly across several source families; marginal evidence has a wider distribution of effect.
Hybrid (varies by sub-category)	Multi-location Home Services	Different sub-categories sit in different cells: restoration is frozen on national consolidators (SERVPRO, Paul Davis); HVAC and plumbing fragment across franchise networks.	Pick the sub-category, then run the appropriate cell's playbook.

The fourth quadrant — high cross-engine consensus with scattered within-engine rosters — is theoretically possible but absent from this pilot. The table mirrors the diagram so the classification is readable by machines as well as by readers.
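To make the quadrant logic concrete, here is a minimal classifier over the two axis scores. The cut-off values are placeholders chosen for illustration only: the note reports no numeric thresholds, and it does not publish within-engine concentration scores, so the inputs in the usage lines are hypothetical as well.

```python
def classify(
    consensus: float,
    concentration: float,
    consensus_cut: float = 0.6,      # hypothetical threshold, not from the note
    concentration_cut: float = 0.5,  # hypothetical threshold, not from the note
) -> str:
    """Map the two axis scores (each in [0, 1]) to a cell of the two-axis frame."""
    high_consensus = consensus >= consensus_cut
    concentrated = concentration >= concentration_cut
    if high_consensus and concentrated:
        return "frozen"
    if not high_consensus and concentrated:
        return "polar"
    if not high_consensus and not concentrated:
        return "diffuse"
    # High consensus with scattered rosters: the cell absent from this pilot.
    return "unobserved (high consensus, scattered)"


# Hypothetical score pairs, one per cell.
print(classify(0.9, 0.9))
print(classify(0.3, 0.9))
print(classify(0.2, 0.2))
print(classify(0.9, 0.1))
```

Hybrid is not a return value here because it is a property of an industry whose sub-categories classify differently, not a cell in its own right.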

Five industries on the spectrum

Plotting the five sectors against the simpler inter-engine-overlap dimension gives the spectrum below. The right pole is where the engines disagree most; the left pole is where they agree most.

Figure 3 · Inter-engine overlap, five industries

Frozen (engines agree) · B2B SaaS & Industrial Manufacturing 0.85 · Multi-location Home Services 0.54 · Commercial Insurance Brokers 0.38 · Wealth Management & RIAs 0.32 · Personal Injury Law 0.24 · Fragmented (engines diverge)

Inter-engine overlap (Jaccard) across six buyer-intent prompts, five engines, three runs each. Higher = more consensus on the named firms.

Read with the two-axis frame, the spectrum tells a slightly richer story than the single number suggests. B2B SaaS & Industrial Manufacturing sits firmly in frozen, with three unanimous winners across all five engines.¹ Multi-location Home Services is the cleanest hybrid: on restoration prompts (HMS-02, HMS-06) SERVPRO is unanimous across all engines, while HVAC and plumbing prompts split across franchise networks (Mr. Rooter on three engines, Benjamin Franklin on Perplexity, 1-800-Plumber on Claude).⁴ Commercial Insurance Brokers and Wealth Management both sit in polar. In insurance, ChatGPT routes startup cyber and E&O to global brokers (Marsh, Aon, Lockton) while the other four engines converge on insurtechs (Embroker, Founder Shield);⁵ in wealth, ChatGPT names trust companies and private banks while the other four lean independent-RIA. Personal Injury Law is the most fragmented sector we measured and the only one with a substantial diffuse region in the mass-tort tail, though it also has its own polar element (Morgan & Morgan dominates ChatGPT, Claude, and Gemini; Perplexity replaces it with mass-tort specialty firms).²

What changes the work required

The temperature describes the shape of the market. It also changes the work facing a firm that wants to be named by AI in its category.

Frozen markets. Engines reflect a dense public consensus around named leaders. The corroboration is in trade press, integration directories, structured documentation, well-known install bases. Two paths: build the corroboration evidence the engines already weight (case studies in the same trade outlets, presence in the same integration directories, schema and authorship the engines parse), or pick a different category to contest. The displacement work is concentration on a specific corroboration network.

Polar markets. Each engine has stable rosters but the rosters differ. The mechanism we observe is source-set divergence: ChatGPT's retrieval surface appears to draw heavily from press and ranking lists (Barron's, Forbes, established trade media); the other engines pull more from community sources (Reddit threads, SmartAsset, kitces.com) and category-specific databases (broker directories, certification registries). One plausible reading is that each engine's retrieval surface is weighted toward different source families, and those families surface different firms. The work for a firm absent from one engine's roster is targeted at the source-set that powers that specific engine. Engine-specific, source-specific.

Diffuse markets. No source-set is dense enough to anchor any engine to a stable answer. We observe diffuse fragmentation most clearly in the long tail of mass-tort plaintiff firms, where each engine's named set rotates across runs. The likely driver is buyer-archetype heterogeneity: the engines name different firms because there is no single canonical buyer question to anchor against (mass-tort case types span talc, glyphosate, hernia mesh, defective devices). The work is breadth: seed corroboration across several source families, because any single piece of marginal evidence has a wider spread of possible effects.

Hybrid markets. The sub-category determines the work. A multi-location operator may sit in two cells at once (frozen on restoration, polar on HVAC) and need two different programs running in parallel.
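The hybrid case can be sketched as a lookup: assign each sub-category a cell, then group by cell to see how many distinct programs an operator needs. The playbook strings paraphrase the "Work required" column of the quadrant table; the function name, the input shape, and the example operator are assumptions for illustration.

```python
# Playbooks per cell, paraphrased from the "Work required" column.
PLAYBOOKS = {
    "frozen": "Get inside the moat, or compete in a different category.",
    "polar": "Target the engines whose rosters exclude you, via the source-set they weight.",
    "diffuse": "Seed corroboration broadly across several source families.",
}


def programs_for(operator_cells: dict[str, str]) -> dict[str, list[str]]:
    """Group an operator's sub-categories by cell, one program per distinct cell."""
    grouped: dict[str, list[str]] = {}
    for sub_category, cell in operator_cells.items():
        if cell not in PLAYBOOKS:
            raise ValueError(f"unknown cell {cell!r} for {sub_category!r}")
        grouped.setdefault(cell, []).append(sub_category)
    return grouped


# Example operator matching the text: frozen on restoration, polar on HVAC.
operator = {"restoration": "frozen", "hvac": "polar"}
for cell, subs in programs_for(operator).items():
    print(cell, subs, "->", PLAYBOOKS[cell])
```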

We observe these mechanisms in the captures; we do not claim them as exhaustive or causal. The full per-industry build queues — which specific assets, in which order — live in the Diagnostic. The temperature reading tells you which kind of build queue applies.

Limits and next questions

This is a working frame, induced from a finite sample.

Five datapoints. The taxonomy is built from the sectors we have measured to date. Later captures may surface mechanisms we have not yet seen — the empty quadrant in Figure 2 (high cross-engine consensus with scattered within-engine rosters) is theoretically possible and we cannot rule out cases that will land there.

Engines move. A bounded capture window is a snapshot. The placement of a sector can shift across model updates, retrieval index changes, and changes in the underlying web. Monthly re-runs (the Visibility Intelligence subscription artefact) are how we track movement.

Observed, not predictive. We describe the structure the engines currently exhibit. We do not claim a generative model of why they exhibit it. The preceding section, on what changes the work required, offers candidate mechanisms; they are observations consistent with the data, not laws.

Open questions for later notes: does a category's temperature predict its volatility over time? Do specific source-set interventions move a polar market toward a different equilibrium? What does the inter-prompt structure inside a single industry look like — are some prompts frozen while others fragment? We expect to revisit these as additional sectors and time-series captures land.

References

1. B2B SaaS & Industrial Manufacturing — AI Answerability Diagnostic. Answerability.ai, Industries. 0.85 inter-engine overlap; three unanimous winners across six buyer-intent prompts. Pilot capture 2026-05-30.
2. Personal Injury Law Firms — AI Answerability Diagnostic. Answerability.ai, Industries. 0.24 inter-engine overlap; zero unanimous winners; conservative-legal voice throughout, consistent with ABA Model Rule 7.1. Pilot capture 2026-05-30.
3. Wealth Management & RIAs — AI Answerability Diagnostic. Answerability.ai, Industries. 0.32 inter-engine overlap; ChatGPT systematic outlier (trust companies and private banks) versus the other four engines (independent RIAs). Pilot capture 2026-05-30.
4. Multi-location Home Services — AI Answerability Diagnostic. Answerability.ai, Industries. 0.54 inter-engine overlap; restoration unanimous (SERVPRO + Paul Davis), HVAC and plumbing fragmented across franchise networks. Pilot capture 2026-05-30.
5. Commercial Insurance Brokers — AI Answerability Diagnostic. Answerability.ai, Industries. 0.38 inter-engine overlap; ChatGPT routes startup cyber to global brokers; other four engines converge on insurtechs. Pilot capture 2026-05-28.
6. Citation territory: mapping the AI answer layer (Note 2026-02). Answerability.ai, Insights. Predecessor frame; introduces frozen/molten as a binary.
7. The Answerability Index. Answerability.ai. Full per-sector capture data and heatmaps.

Cite as

Answerability.ai. (2026). When AI engines disagree: frozen, fragmented, and molten markets (Note 2026-03). https://answerability.ai/insights/category-temperature

@techreport{answerability_2026_03,
  author       = {{Answerability.ai}},
  title        = {{When AI engines disagree: frozen, fragmented, and molten markets}},
  institution  = {Answerability.ai},
  type         = {Working note},
  number       = {Note 2026-03},
  year         = {2026},
  url          = {https://answerability.ai/insights/category-temperature}
}

The AI Answerability Diagnostic measures your sector's temperature for your specific firm — which engines hold which rosters, where the gaps are, and the build queue that would put you inside the consideration set.

Note 2026-03 · Published 2026-05-31 · hello@answerability.ai