Fleet Intelligence · Internal

How AI Is Rewriting Search — and what we do about it

Deep dive · 24 June 2026 · 10-agent web-verified research fan-out, triggered by Adam's LinkedIn-signal scan + the OMG "How AI Is Rewriting Search for 2026" white paper. Every claim cross-checked against primary sources.

TL;DR

The research is unusually consistent: Ahrefs, Princeton, Semrush, SE Ranking, Vercel and a dozen named practitioners independently land in the same place. That convergence is the signal.

The one-sentence version

AI cites where the web already talks about you (brand mentions), pulls from clean raw HTML (not JS, not schema tricks, not llms.txt), and rewards original data over restatement — so the win is getting mentioned + indexed in Bing/Brave + publishing things only we have, not on-page tinkering.

The most valuable output of this dive is that it kills wasted effort: schema-for-citations, llms.txt, TikTok and Quora are all dead ends. The real levers are cheap and mostly things we haven't done yet.

The 5 highest-confidence conclusions

Each one surfaced independently across multiple studies — so we can act on them with confidence.

1 · Off-site brand mentions > backlinks > on-page schema · Verified

Ahrefs (75k brands): branded web mentions correlate with AI visibility at 0.664 vs 0.218 for backlinks. ConvertMate (80M citations), SE Ranking (129k domains) and practitioners Capper, Williams-Cook and Petrovic all agree.

→ The game shifts from "write better pages" to "get mentioned more." This is the practitioner version of the "branded queries +31%" screenshot.

2 · Our biggest concrete gap: money-page indexing in Bing + Brave · Verified

ChatGPT runs on Bing (~87% citation match — Seer / Glenn Gabe). Claude runs on Brave. We've only ever submitted homepages. Submitting every money-page URL is the cheapest way to unlock two whole engines.

→ Still pending for MI, MHQ, PeptideClear, FundBiz, BBL, SEOCompare, Kartapay.
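
A minimal sketch of the Bing half of this, assuming submission goes through the IndexNow protocol. The host, key and URL list are placeholders, the key file is assumed to sit at the site root, and `build_indexnow_requests` is a hypothetical helper name. Brave is deliberately left out because nothing here establishes that it accepts IndexNow.

```python
import json
from urllib.parse import urlparse
from urllib.request import Request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"
MAX_URLS_PER_POST = 10_000  # IndexNow's per-request cap


def build_indexnow_requests(host, key, urls):
    """Return one POST request per batch of money-page URLs on `host`."""
    # IndexNow only accepts URLs that belong to the host holding the key file.
    own = [u for u in urls if urlparse(u).netloc == host]
    requests = []
    for start in range(0, len(own), MAX_URLS_PER_POST):
        body = {
            "host": host,
            "key": key,
            # Assumption: the key file is served from the site root.
            "keyLocation": f"https://{host}/{key}.txt",
            "urlList": own[start:start + MAX_URLS_PER_POST],
        }
        requests.append(Request(
            INDEXNOW_ENDPOINT,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json; charset=utf-8"},
            method="POST",
        ))
    return requests
```

Each returned request can be sent with `urllib.request.urlopen`. Building the requests separately from sending them keeps the batching logic testable without touching the network.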

3 · Schema & llms.txt are hygiene, NOT citation levers · Verified

This contradicts the OMG white paper ("schema matters more than backlinks"). The hard evidence says no:

  • Ahrefs 1,885-page test: adding JSON-LD gave no lift (AIO −4.6%).
  • Williams-Cook deliberately broke his schema — LLMs read the HTML anyway ("a placebo").
  • Cyrus Shepard scored llms.txt 2.0/10; Adobe's log audit: it gets 1.1% of AI-bot traffic; Ahrefs: 97% never read; Google confirms neither is used.

→ Keep both as baseline hygiene. Spend zero incremental effort chasing citations through them.

4 · Raw HTML wins: AI crawlers don't run JavaScript · Verified

Vercel/MERJ (500M+ fetches): GPTBot, OAI-SearchBot, ClaudeBot and PerplexityBot read raw HTML only — JS-injected content is invisible to them. Our static Astro is a structural advantage here.

→ Risk = anything client-side-rendered: the JS calculators (FitCalcs / datesandtimes / babydata), studio bars, JS-injected schema (the Rochelle failure mode). We should codify a render-gate so it can't regress.
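
A rough sketch of what such a render-gate could check, written in Python for illustration (the real gate would presumably live in the existing scan tooling). It reads raw HTML the way a non-JS crawler would: text inside `<script>` and `<style>` is ignored, so anything that only appears after client-side rendering shows up as missing. `render_gate` and the `must_have` phrase list are hypothetical names.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect the text a non-JS crawler sees, skipping script/style bodies."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.parts = []
        self.has_static_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
            # JSON-LD present in the raw HTML was not injected by JS.
            if dict(attrs).get("type") == "application/ld+json":
                self.has_static_jsonld = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def render_gate(raw_html, must_have):
    """Return (phrases missing from raw HTML, whether static JSON-LD exists)."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split()).lower()
    missing = [p for p in must_have if p.lower() not in text]
    return missing, parser.has_static_jsonld
```

Feeding it the un-rendered response body for a calculator page, with the page's key answer phrases as `must_have`, would flag any answer that only exists after the script runs.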

5 · Information gain: original data is the moat · Verified

Princeton GEO (causal): adding statistics +32%, quotations +41%, cited sources +28%; keyword-stuffing −9%. Google's info-gain patent literally ranks redundant content lower.

→ Same force as the Discover "combined cards" finding: when 30 publishers cover one story, the commodity version gets absorbed by the AIO and the citation goes to whoever has the unique data. Proprietary numbers get cited and clicked; restatement gets deleted.

The strategic frame

AI is a low-volume, high-intent branding channel — not a traffic channel.

AI referral traffic is <1% and largely invisible (no referrer; AI Mode uses noreferrer), but AI visitors convert several times better than organic: Adobe measures 4.4×, and Seer's rates (ChatGPT 15.9%, Claude ~16.8% vs organic 1.76%) work out to roughly 9×. Zero-click is now 68% of US Google searches.

→ Budget zero AI referral traffic. Measure branded search volume + citations (Bing Webmaster's free "AI Performance" grounding-queries report), not clicks. This reinforces our lead-gen model — it's lead quality, not volume.

What we need to do

Deduped across all 10 research streams, tiered by effort and decision-needed.

QUICK WINS  No decision — start now

  1. Submit money-page URLs (not just homepages) to Bing + Brave. Build a submit_brave.py sibling to the existing submit_indexnow.py. Biggest unclaimed lever — unlocks ChatGPT (Bing) + Claude (Brave).
  2. Enrol every domain in Bing Webmaster Tools "AI Performance" (public preview since Feb 2026). It reports Grounding Queries — the literal phrases Copilot/ChatGPT use to cite us — plus Citation Share vs competitors. Free GSC-for-AI. Pipe into the weekly question-monitor.
  3. Audit every Cloudflare zone for accidental AI-bot blocking. The footgun: CF now blocks AI bots by default on new domains and delisted Perplexity (Aug 2025). If any zone has Bot Fight Mode / managed AI rules on, we're paying to be invisible. Add a FLEET_REALITY check that curls each zone as PerplexityBot / OAI-SearchBot / GPTBot and asserts a 200.
  4. Stop all llms.txt / schema-for-citations effort — redirect that energy into mentions + freshness.
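
The zone audit in item 3 could be sketched roughly as below. The User-Agent strings are simplified stand-ins (the vendors' full strings are longer), and `audit_zone` / `fetch_status` are hypothetical names. One caveat to keep in mind: a 200 here only shows that the User-Agent token itself is not blocked from our IP; it does not prove the genuine bot is treated the same way.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Assumption: simplified UA tokens, not the vendors' exact strings.
AI_BOT_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "OAI-SearchBot": "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
}


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status for `url`, or None if the request never completed."""
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:  # must precede URLError, its parent class
        return err.code
    except URLError:
        return None


def audit_zone(url, fetch=fetch_status):
    """Map each AI bot that did NOT get a 200 to the status it received."""
    blocked = {}
    for bot, user_agent in AI_BOT_AGENTS.items():
        status = fetch(url, user_agent)
        if status != 200:
            blocked[bot] = status
    return blocked
```

An empty dict from `audit_zone("https://<zone>/")` is the pass condition; the `fetch` parameter exists so the check can be unit-tested with a fake fetcher.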

BUILD  Gates + assets (fold into fleet-tools & /build-site)

  1. Upgrade the gates we already run with the validated findings:
       • passage-shape gate → require a question-H2 + crisp first-sentence answer in 120–180 words (both Indig and SALT studies agree);
       • add an info-gain prewrite check (every money page must declare ≥1 net-new asset: own data, stat, test, or contrarian take);
       • add a raw-HTML render gate to regression-scan.mjs (flag JS-injected content/schema).
  2. Fan-out coverage on pillar/cluster hubs — answer the 8–12 decomposed sub-queries AI Mode generates, not just the head term (Semrush: +150% citations). Our cluster IA already does half of this.
  3. Original data assets, quarterly, per site — MI rate tracker, peptide paper index, merchant-fee benchmark. The info-gain moat. Publish the data ones to GitHub + HuggingFace with citation metadata so they become grounding sources.
  4. On-site transcript pages for the daily YouTube videos we already make — captures both surfaces: YouTube is the #1 AIO/Gemini source (incl. ~23% in finance), and the text transcript captures ChatGPT/Perplexity, which video alone does not. Add VideoObject + Clip schema, chapter by question.
  5. Brand-mention / entity layer (the #1 lever): Wikidata items for our Companies-House brands + entity-home About pages with sameAs clusters + founder Person entities for the YMYL/finance authors; plus digital PR off our own data, HARO answers, review-site profiles (Trustpilot/G2/Clutch per vertical), and genuine Reddit participation.
  6. Turn fleet_bot_hits into a citation detector — we already log ASN; add bot verification (catch the ~5–8% spoofs via published IP-range JSONs) and a "we were just cited" alert on ChatGPT-User / Perplexity-User / Claude-User real-time fetches. Earliest possible signal we're in an answer.
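
For the bot-verification step in item 6, a minimal sketch using only the standard library. It assumes the published IP-range JSONs have already been fetched and flattened into lists of CIDR strings; the ranges in the usage note are documentation-only placeholders, not any vendor's real ranges, and `classify_hit` is a hypothetical name.

```python
import ipaddress


def load_ranges(cidrs):
    """Parse CIDR strings into network objects (IPv4 or IPv6)."""
    return [ipaddress.ip_network(c, strict=False) for c in cidrs]


def is_verified(ip, ranges):
    """True if `ip` falls inside any of the published ranges."""
    addr = ipaddress.ip_address(ip)
    # Membership is simply False when address and network versions differ.
    return any(addr in net for net in ranges)


def classify_hit(user_agent, ip, ranges_by_bot):
    """Return (bot name, 'verified' | 'spoofed'), or (None, 'not-ai-bot')."""
    ua = user_agent.lower()
    for bot, ranges in ranges_by_bot.items():
        if bot.lower() in ua:
            return bot, "verified" if is_verified(ip, ranges) else "spoofed"
    return None, "not-ai-bot"
```

With `ranges = {"ChatGPT-User": load_ranges(["192.0.2.0/24"])}`, a log line whose User-Agent contains `ChatGPT-User` and whose IP falls in that range would classify as verified, and only verified hits would raise the "we were just cited" alert.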

WATCH / DEFENSIVE

DROP

llms.txt · schema-as-citation-lever · Quora/StackExchange seeding · dedicated TikTok · Perplexity Pages as owned content · pay-per-crawl / HTTP 402 (actively harmful for us).

Engine backends & platform levers

Each engine sources differently — only ~10–14% domain overlap. There's no single play.

Engine | Index it uses | The lever
ChatGPT | Bing + OpenAI cache | Be indexed/submitted in Bing (87% match). Reddit + Wikipedia + review sites.
Claude | Brave ("AI Grounding") | Submit money-page URLs to Brave; row-level tables; clean HTML.
Gemini / AIO | Google + Shopping Graph (renders JS) | YouTube, entity/Knowledge-Graph signals, query fan-out coverage.
Perplexity | Own crawler + Bing/Google blend | Strongest recency bias: freshness + Reddit + listicles.
Copilot | Bing | Same as ChatGPT; under-cites Reddit, over-cites authoritative/structured pages.

Where to earn mentions (off-domain)

The technical layer

Query fan-out · Verified mechanism

AI Mode decomposes one query into ~8–12 parallel sub-queries (NoGood) and synthesises the answer. Google publishes no count. Optimise for the sub-queries via broad pillar/cluster coverage. Tool: Qforia (iPullRank). Semrush: targeting 10–20 fan-out queries per article lifted citations ~150%.

Embeddings & chunks

Retrieval scores each chunk by cosine similarity in isolation. Sweet spot ~256 tokens (128–512, 10–20% overlap, heading-aligned). Write self-contained chunks with no anaphora ("as noted above") — that lets a passage rank for queries not literally on the page. ~85% of retrieved pages are never cited (the "considered vs cited" funnel).
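
To make the chunking numbers concrete, here is a toy sketch. It makes two loud assumptions: whitespace-separated words stand in for tokens, and bag-of-words counts stand in for a real embedding model. It illustrates overlapping windows and per-chunk cosine scoring only; it is not how any particular engine retrieves.

```python
import math
from collections import Counter


def chunk_words(text, size=256, overlap=0.15):
    """Split text into windows of `size` words that overlap by `overlap`."""
    words = text.split()
    step = max(1, int(size * (1 - overlap)))
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def cosine(a, b):
    """Cosine similarity between word-count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def best_chunk(query, chunks):
    """Score every chunk in isolation and return the top (score, chunk)."""
    return max((cosine(query, c), c) for c in chunks)
```

Because each chunk is scored on its own, a chunk that opens with "as noted above" carries none of the terms it refers back to, which is the practical argument for self-contained passages.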

On-page structure · Verified

The 2024 Google API leak

Confirmed real signals: NavBoost (clicks, "last longest click"), siteAuthority (feeds Q*), twiddlers (post-ranking re-rank), hostAge (sandbox for fresh spam), three freshness dates, Chrome data. Whether these feed AI Mode selection is a plausible hypothesis, not confirmed — AI selection likely rides the same core stack + entities + cross-source consensus.

Entity establishment (the new lever we'd missed)

The highest non-content ROI for Gemini/AIO is making our brands machine-recognised entities. The opening: Wikidata's notability bar is low — any Companies-House / Crunchbase-registered brand qualifies (Criterion 2: "serious public references"). A Wikidata item feeds Google's Knowledge Graph (property P2671) → Gemini / AIO / Perplexity cards, and goes live in a day. Low risk.
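
As an illustration of the entity-home side of this, a small sketch that emits Organization markup with a sameAs cluster for an About page. The brand name, URLs and the Wikidata ID are placeholders, and `entity_jsonld` is a hypothetical helper. The output is meant to be written into the server-rendered HTML at build time rather than injected by JavaScript.

```python
import json


def entity_jsonld(name, url, same_as):
    """Build a static JSON-LD <script> tag describing one brand entity."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # De-duplicate and sort so rebuilds produce a stable diff.
        "sameAs": sorted(set(same_as)),
    }
    return (
        '<script type="application/ld+json">'
        + json.dumps(data, indent=2)
        + "</script>"
    )


tag = entity_jsonld(
    "Example Brand Ltd",                       # placeholder
    "https://example.com/",                    # placeholder
    [
        "https://www.wikidata.org/wiki/Q00000000",  # placeholder Q-ID
        "https://www.linkedin.com/company/example-brand",
    ],
)
```

This stays in the hygiene category described earlier: the point is a consistent, machine-readable identity for the brand, not a citation boost.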

The 7 LinkedIn signals — verdicts

Signal | Real? | What it means for us
Brave powers Claude | Yes | Brave = Claude's index. Submit money pages; use row-level tables; clean HTML; caption videos.
HTTP 402 / pay-per-crawl (AWS, Cloudflare) | Yes | Not for us: we're citation-seekers, not a publisher. Defensive only: make sure no CF zone blocks AI bots.
CMA forces Google transparency | Softened | UK-only. Fair ranking (incl. AIO) + notice of significant changes + complaints process, live ~Dec 2026. Bank ranking-drop evidence; keep opted in to AIO.
Discover: social + AIO + combined cards | Trend yes | Commodity content gets summarised away. Differentiated angle + original data + YouTube/X = the defence. (Exact %s are Reach's proprietary figures.)
AI-traffic tracking is broken | Yes | Stop counting AI clicks. Track branded search + ChatGPT-User log hits + GSC AI-Mode filter.
Model-agnostic ops | n/a | Ops, not SEO. Our fleet is already prompt/CLAUDE.md-portable.
GBP agent | n/a | We don't run GBP-listing brands.

Myths to stop repeating

"Schema / llms.txt drive AI citations." The OMG white paper leans on this; every controlled study says no.
"Google must warn before every update / pay publishers." CMA reality: notice of significant changes only, UK-only, opt-out + attribution, no payment scheme.
"TikTok is replacing Google for Gen Z." Reversed into 2026 — the preference actually halved; ChatGPT is now the bigger challenger.
"AI Mode runs ~20 sub-searches (per Tom Capper)." Unverified conflation; Google publishes no number, real range ~8–12.