AI Strategy · Jun 12, 2026 · 4 min read

The First Vendor Evaluation Now Happens Inside a Language Model

A 6,000-prompt audit found a pharma brand ranked fourth in AI answers exactly where its positioning said it should be first. B2B buying now runs through models most sellers cannot see into.

In a 2025–26 audit, GSK tested how AI assistants describe one of its treatments for chronic obstructive pulmonary disease. Across 6,000 prompts covering nine points in a clinician’s decision process, the brand ranked first when the question was broad. When the prompt narrowed to patients starting treatment for the first time—the segment the brand is positioned to own—it dropped to fourth¹.

The mismatch matters because the audience has moved. OpenEvidence, a clinical decision-support assistant, reports daily use by more than 40 percent of U.S. physicians and 20 million queries in January 2026, up from 2.6 million in December 2024². The figures are the company’s own, but the direction matches what researchers are documenting across B2B categories: buyers brief themselves through AI tools before any salesperson knows they exist. IDC forecasts that 62 percent of traditional B2B demand generation will be AI-led by 2028³.

The buyer arrives pre-briefed

B2B go-to-market was built on controlled channels: sales representatives, distribution networks, partner ecosystems, owned media. Awareness traveled through peers and industry relationships, and evaluation was slow because understanding complex offerings took work. The slowness favored incumbents—it gave relationship investments time to pay off.

AI assistants compress that sequence. At IMI, a UK engineering company, executives describe HVAC installers who no longer search Google; they ask ChatGPT or Gemini which product to use and arrive equipped to push the conversation with suppliers⁴. The traffic is invisible to sellers—the researchers call it the dark funnel—and a competitor’s framing can become the category default if a model finds it easier to retrieve and distill.

Machine readability is infrastructure

The fragility of the new channel shows up in mechanical details. The same GSK audit found that every model tested anchored its answers in GOLD, the leading global guideline for the disease, which is updated annually. Yet the models consistently cited the 2024 edition rather than the current one. The cause was mundane: the guideline’s website had changed how it published. Files that once opened as embedded PDFs now downloaded on click, and the documents stopped being machine-readable.

A formatting decision on one website was propagating outdated clinical guidance at scale. The lesson for sellers is narrower but uncomfortable: visibility in AI answers depends on publishing mechanics (schema markup, open access, page structure) that no executive reviews, because until recently they did not matter. The dynamic extends a familiar point about what gets documented: when a system reads artifacts instead of asking people, the structure of the artifact becomes the message.

What early movers are doing

The researchers propose a four-part response—coordinating the narrative across functions, making content citable, building credibility signals, and monitoring AI answers¹. In practice the moves converge on three habits. Companies are unifying language across marketing, technical documentation, and communications so models retrieve one story instead of five. They are restructuring content for ingestion: product schema, comparison tables, short direct answers to the questions buyers ask models.
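As one illustration of restructuring content for ingestion, the sketch below builds schema.org JSON-LD for a product page and a short question-and-answer block. It is a minimal sketch, not a method described in the research: the product name, description, brand, and question text are hypothetical placeholders, and nothing here guarantees how any given model retrieves or weights such markup.

```python
import json


def product_jsonld(name: str, description: str, brand: str) -> dict:
    """Build a schema.org Product object as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "brand": {"@type": "Brand", "name": brand},
    }


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data: dict) -> str:
    """Wrap JSON-LD in the script tag used to embed it in an HTML page."""
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


# Hypothetical example values, for illustration only.
product = product_jsonld(
    name="Example Valve X100",
    description="Thermostatic balancing valve for commercial heating systems.",
    brand="Example Co",
)
faq = faq_jsonld([
    ("Which valve suits a retrofit in a commercial heating system?",
     "Example Valve X100 fits standard pipework without re-piping."),
])

print(as_script_tag(product))
print(as_script_tag(faq))
```

The question in the FAQ block is phrased the way a buyer might put it to an assistant, which is the point of the "short direct answers" habit: the page states the answer in one sentence instead of leaving a model to infer it from marketing copy.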

And they are running generative listening programs—standing audits of how AI assistants answer the prompts that matter in their category. GSK’s COPD audit is one. IMI runs another, and it surfaced a finding that should unsettle every seller: customers tend to accept the AI answer without probing it.
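A standing audit of this kind reduces to a simple tally: for each recorded answer, note where the brand appears, then summarize by decision stage. The sketch below assumes the answers have already been collected and parsed into ranked brand lists. The stage names, brand names, records, and the 1.5 rank threshold are all invented for illustration; none of it is data or method from the GSK or IMI audits.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical audit records: (decision stage, brands in the order the answer ranked them).
RECORDS = [
    ("broad question", ["BrandA", "BrandB", "BrandC"]),
    ("broad question", ["BrandA", "BrandC", "BrandB"]),
    ("first-time treatment", ["BrandB", "BrandC", "BrandD", "BrandA"]),
    ("first-time treatment", ["BrandC", "BrandB", "BrandD", "BrandA"]),
    ("first-time treatment", ["BrandB", "BrandD", "BrandC"]),  # BrandA not mentioned
]


def rank_by_stage(records, brand):
    """Return {stage: (mean rank when mentioned, share of answers mentioning the brand)}."""
    ranks = defaultdict(list)
    totals = defaultdict(int)
    for stage, ranking in records:
        totals[stage] += 1
        if brand in ranking:
            ranks[stage].append(ranking.index(brand) + 1)  # 1-based rank
    return {
        stage: (mean(ranks[stage]) if ranks[stage] else None,
                len(ranks[stage]) / totals[stage])
        for stage in totals
    }


def flag_gaps(summary, target_stages, max_rank=1.5):
    """List target stages where the brand is never mentioned or its mean rank exceeds max_rank.

    max_rank is an arbitrary illustrative threshold, not a recommended value.
    """
    return [
        stage for stage in target_stages
        if summary.get(stage, (None, 0.0))[0] is None or summary[stage][0] > max_rank
    ]


summary = rank_by_stage(RECORDS, "BrandA")
for stage, (avg_rank, mention_share) in summary.items():
    print(f"{stage}: mean rank {avg_rank}, mentioned in {mention_share:.0%} of answers")

print("Gaps:", flag_gaps(summary, ["first-time treatment"]))
```

Splitting the summary by stage is the design choice that matters: an overall average would hide exactly the pattern the GSK audit surfaced, a brand that leads on broad questions and slips in the segment it is positioned to own.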

Three problems the playbook leaves open

Circularity.
A Digitas pilot on B2B payment solutions found that more than 80 percent of the sources the models leaned on came from the brands themselves⁵. When the judge reads mostly the contestants’ own material, visibility measures content operations as much as product quality. That favors incumbents with large content teams, whatever their offering deserves.

The treadmill.
Every generative-optimization tactic generalizes. When all vendors in a category structure content for retrieval, the advantage cancels, and the retrieval logic itself belongs to model providers who can change it without notice. The spend looks less like differentiation and more like a new cost of being findable.

Deference.
If buyers accept synthesized answers without probing, an error is no longer a wasted click; it flows into decisions. In regulated categories such as medicine, finance, and industrial safety, there is no settled answer to who is accountable when a model misrepresents a product. The researchers raise the prospect of legal crisis in their closing line; the topic deserves more than a line.

• • •

A company’s market position now has a machine-readable layer, and that layer can drift out of alignment with strategy without anyone noticing. GSK positioned its brand to own first-line treatment and found it ranked fourth precisely there; no dashboard the company controlled could have surfaced the gap. The early advantage may belong less to the companies with the best content than to the ones that thought to check what the models are saying about them.

¹ From research published in June 2026 by Amit Joshi, Ivy Buche, and Caroline Schwaer, based on executive interviews and company audits.
² Company-reported figures, cited in the same research.
³ IDC forecast, cited in the same research. A forecast, not a measurement.
⁴ Executive interviews with IMI, reported in the same research.
⁵ Pilot by Digitas UK covering B2B fintech payment solutions in the UK and U.S. markets, reported in the same research.
