AI Search Intelligence: Deep Dive (24 Jun 2026)

TL;DR

The research is unusually consistent: Ahrefs, Princeton, Semrush, SE Ranking, Vercel and a dozen named practitioners independently reach the same conclusions. That convergence is the signal.

The one-sentence version

AI cites where the web already talks about you (brand mentions), pulls from clean raw HTML (not JS, not schema tricks, not llms.txt), and rewards original data over restatement — so the win is getting mentioned + indexed in Bing/Brave + publishing things only we have, not on-page tinkering.

The most valuable output of this dive is that it kills wasted effort: schema-for-citations, llms.txt, TikTok and Quora are all dead ends. The real levers are cheap and mostly things we haven't done yet.

The 5 highest-confidence conclusions

Each one surfaced independently across multiple studies — so we can act on them with confidence.

1 · Off-site brand mentions > backlinks > on-page schema (Verified)

Ahrefs (75k brands): branded web mentions correlate with AI visibility at 0.664 vs 0.218 for backlinks. ConvertMate (80M citations), SE Ranking (129k domains) and practitioners Capper, Williams-Cook and Petrovic all agree.

→ The game shifts from "write better pages" to "get mentioned more." This is the practitioner version of the "branded queries +31%" screenshot.

2 · Our biggest concrete gap: money-page indexing in Bing + Brave (Verified)

ChatGPT runs on Bing (~87% citation match — Seer / Glenn Gabe). Claude runs on Brave. We've only ever submitted homepages. Submitting every money-page URL is the cheapest way to unlock two whole engines.

→ Still pending for MI, MHQ, PeptideClear, FundBiz, BBL, SEOCompare, Kartapay.
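
A minimal sketch of the Bing half of that submission, assuming the existing submit_indexnow.py follows the public IndexNow protocol (one JSON POST per host carrying host, key, keyLocation and urlList). The function names, the example key and the key-file location are illustrative assumptions, not the script's actual contents. The research above does not say how Brave accepts URL submissions, so that side is deliberately left out.

```python
import json
import urllib.request
from urllib.parse import urlparse

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"
MAX_URLS_PER_POST = 10_000  # per-request limit in the IndexNow protocol


def build_indexnow_payloads(host, key, urls):
    """Group de-duplicated, same-host URLs into IndexNow request bodies."""
    seen = set()
    same_host = []
    for url in urls:
        # IndexNow expects every URL in a request to belong to `host`.
        if urlparse(url).netloc == host and url not in seen:
            seen.add(url)
            same_host.append(url)
    payloads = []
    for start in range(0, len(same_host), MAX_URLS_PER_POST):
        payloads.append({
            "host": host,
            "key": key,
            # Assumption: the key file sits at the site root as <key>.txt.
            "keyLocation": f"https://{host}/{key}.txt",
            "urlList": same_host[start:start + MAX_URLS_PER_POST],
        })
    return payloads


def post_payload(payload, timeout=10):
    """Send one payload and return the HTTP status code."""
    request = urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

build_indexnow_payloads only builds the request bodies, so it can be dry-run against a sitemap export of money pages before anything is sent; post_payload is the only part that touches the network.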

3 · Schema & `llms.txt` are hygiene, NOT citation levers (Verified)

This contradicts the OMG white paper ("schema matters more than backlinks"). The hard evidence says no:

Ahrefs 1,885-page test: adding JSON-LD gave no lift (AIO −4.6%).
Williams-Cook deliberately broke his schema — LLMs read the HTML anyway ("a placebo").
Cyrus Shepard scored llms.txt 2.0/10; Adobe's log audit found it receives 1.1% of AI-bot traffic; Ahrefs found 97% of llms.txt files are never read; Google confirms it uses neither schema nor llms.txt for citations.

→ Keep both as baseline hygiene. Spend zero incremental effort chasing citations through them.

4 · Raw HTML wins: AI crawlers don't run JavaScript (Verified)

Vercel/MERJ (500M+ fetches): GPTBot, OAI-SearchBot, ClaudeBot and PerplexityBot read raw HTML only — JS-injected content is invisible to them. Our static Astro is a structural advantage here.

→ Risk = anything client-side-rendered: the JS calculators (FitCalcs / datesandtimes / babydata), studio bars, JS-injected schema (the Rochelle failure mode). We should codify a render-gate so it can't regress.
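
A rough sketch of what such a render gate could check, written in Python for illustration rather than as the regression-scan.mjs implementation. It parses the un-rendered HTML the way a non-JS crawler would see it and reports any required phrase or JSON-LD block that is absent. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RawHtmlProbe(HTMLParser):
    """Collects visible text and JSON-LD blocks from un-rendered HTML."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.jsonld_blocks = []
        self._skip_tag = None   # "script" or "style" while inside one
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_tag = tag
            script_type = (dict(attrs).get("type") or "").lower()
            self._in_jsonld = tag == "script" and script_type == "application/ld+json"

    def handle_endtag(self, tag):
        if tag == self._skip_tag:
            self._skip_tag = None
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.jsonld_blocks.append(data)
        elif self._skip_tag is None:
            # Text outside <script>/<style> is what a raw-HTML crawler can read.
            self.text_parts.append(data)


def render_gate(raw_html, must_contain, require_jsonld=False):
    """Return a list of failures; an empty list means the page passes."""
    probe = RawHtmlProbe()
    probe.feed(raw_html)
    probe.close()
    text = " ".join(" ".join(probe.text_parts).split()).lower()
    failures = [
        f"missing in raw HTML: {phrase!r}"
        for phrase in must_contain
        if phrase.lower() not in text
    ]
    if require_jsonld and not any(block.strip() for block in probe.jsonld_blocks):
        failures.append("no JSON-LD in raw HTML")
    return failures
```

Run against the raw response body (no headless browser), a page whose answer text or schema only appears after client-side JavaScript executes fails the gate, which is the regression we want to catch.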

5 · Information gain: original data is the moat (Verified)

Princeton GEO (causal): adding statistics +32%, quotations +41%, cited sources +28%; keyword-stuffing −9%. Google's information-gain patent describes scoring redundant content lower.

→ Same force as the Discover "combined cards" finding: when 30 publishers cover one story, the commodity version gets absorbed by the AIO and the citation goes to whoever has the unique data. Proprietary numbers get cited and clicked; restatement gets deleted.

The strategic frame

AI is a low-volume, high-intent branding channel — not a traffic channel.

AI referral traffic is <1% and largely invisible (no referrer; AI Mode uses noreferrer), but AI visitors convert several times better than organic: Adobe reports 4.4×, and Seer's figures (ChatGPT 15.9%, Claude ~16.8% vs organic 1.76%) work out to roughly 9×. Zero-click is now 68% of US Google searches.

→ Budget for zero AI referral traffic. Measure branded search volume + citations (Bing Webmaster's free "AI Performance" grounding-queries report), not clicks. This reinforces our lead-gen model: the value is lead quality, not volume.

What we need to do

Deduped across all 10 research streams, tiered by effort and decision-needed.

QUICK WINS: no decision needed, start now

Submit money-page URLs (not just homepages) to Bing + Brave. Build a submit_brave.py sibling to the existing submit_indexnow.py. Biggest unclaimed lever — unlocks ChatGPT (Bing) + Claude (Brave).
Enrol every domain in Bing Webmaster Tools "AI Performance" (public preview since Feb 2026). It reports Grounding Queries — the literal phrases Copilot/ChatGPT use to cite us — plus Citation Share vs competitors. Free GSC-for-AI. Pipe into the weekly question-monitor.
Audit every Cloudflare zone for accidental AI-bot blocking. The footgun: CF now blocks AI bots by default on new domains and delisted Perplexity (Aug 2025). If any zone has Bot Fight Mode / managed AI rules on, we're paying to be invisible. Add a FLEET_REALITY check that curls each zone as PerplexityBot / OAI-SearchBot / GPTBot and asserts a 200.
Stop all llms.txt / schema-for-citations effort — redirect that energy into mentions + freshness.
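
The Cloudflare audit in the list above could be sketched roughly as follows. The User-Agent strings are simplified placeholders (the exact strings should be copied from each vendor's documentation), and the function names are hypothetical. One caveat: a request that only spoofs the User-Agent from our own IP may be treated differently from a genuine bot, so a 200 here is a necessary check, not proof the real crawler gets through.

```python
import urllib.error
import urllib.request

# Placeholder UA strings: assumptions for illustration, not the vendors' exact values.
BOT_USER_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "OAI-SearchBot": "Mozilla/5.0 (compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)",
}


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status for a GET with the given User-Agent, or None on network failure."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 403 from a bot-blocking rule
    except urllib.error.URLError:
        return None


def check_zones(urls, fetch=fetch_status):
    """Return (url, bot, status) for every bot/URL pair that did not get a 200."""
    blocked = []
    for url in urls:
        for bot, user_agent in BOT_USER_AGENTS.items():
            status = fetch(url, user_agent)
            if status != 200:
                blocked.append((url, bot, status))
    return blocked
```

The fetch parameter is injectable so the check can be unit-tested without network access; in the fleet check it would be left at its default and the result asserted to be an empty list.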

BUILD: gates + assets (fold into fleet-tools & /build-site)

Upgrade the gates we already run with the validated findings: passage-shape gate → require a question-H2 + crisp first-sentence answer in 120–180 words (both Indig and SALT studies agree); add an info-gain prewrite check (every money page must declare ≥1 net-new asset — own data, stat, test, or contrarian take); add a raw-HTML render gate to regression-scan.mjs (flag JS-injected content/schema).
Fan-out coverage on pillar/cluster hubs — answer the 8–12 decomposed sub-queries AI Mode generates, not just the head term (Semrush: +150% citations). Our cluster IA already does half of this.
Original data assets, quarterly, per site — MI rate tracker, peptide paper index, merchant-fee benchmark. The info-gain moat. Publish the data ones to GitHub + HuggingFace with citation metadata so they become grounding sources.
On-site transcript pages for the daily YouTube videos we already make — captures both surfaces: YouTube is the #1 AIO/Gemini source (incl. ~23% in finance), and the text transcript captures ChatGPT/Perplexity, which video alone does not. Add VideoObject + Clip schema, chapter by question.
Brand-mention / entity layer (the #1 lever): Wikidata items for our Companies-House brands + entity-home About pages with sameAs clusters + founder Person entities for the YMYL/finance authors; plus digital PR off our own data, HARO answers, review-site profiles (Trustpilot/G2/Clutch per vertical), and genuine Reddit participation.
Turn fleet_bot_hits into a citation detector — we already log ASN; add bot verification (catch the ~5–8% spoofs via published IP-range JSONs) and a "we were just cited" alert on ChatGPT-User / Perplexity-User / Claude-User real-time fetches. Earliest possible signal we're in an answer.
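
For the citation detector in the last item above, a minimal sketch of the verification step using Python's standard ipaddress module. It assumes the vendors' published range files follow a `{"prefixes": [{"ipv4Prefix": ...}]}` layout (an assumption to confirm per vendor), and the ranges in the usage note are documentation placeholders, not real bot ranges. Function names and the shape of a fleet_bot_hits row are hypothetical.

```python
import ipaddress

# Real-time fetch agents named above; a verified hit suggests we are in a live answer.
REALTIME_AGENTS = ("ChatGPT-User", "Perplexity-User", "Claude-User")


def parse_prefixes(ranges_json):
    """Turn a published IP-range document into a list of ip_network objects."""
    networks = []
    for entry in ranges_json.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def classify_hit(user_agent, ip, ranges_by_agent):
    """Return (agent, status) where status is 'verified', 'unverified' or 'untracked'."""
    lowered = user_agent.lower()
    for agent in REALTIME_AGENTS:
        if agent.lower() in lowered:
            address = ipaddress.ip_address(ip)
            networks = ranges_by_agent.get(agent, [])
            # A claimed bot UA from outside the published ranges is a likely spoof.
            verified = any(address in network for network in networks)
            return agent, ("verified" if verified else "unverified")
    return None, "untracked"


def should_alert(user_agent, ip, ranges_by_agent):
    """True only for a real-time agent whose source IP checks out."""
    return classify_hit(user_agent, ip, ranges_by_agent)[1] == "verified"
```

For example, with `ranges = {"ChatGPT-User": parse_prefixes({"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}]})}`, a ChatGPT-User hit from 192.0.2.10 would raise the alert and the same User-Agent from any other address would be logged as unverified.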

WATCH / DEFENSIVE

CMA fair-ranking complaints process (~Dec 2026, UK-only) — keep banking dated ranking-drop evidence; keep content opted in to AIO.
Harden lead forms for agent-fill — agent browsers (Comet, ChatGPT Atlas) will fill our forms, and submissions look like bot fraud. Keep a capture-first step 1; add a "likely-agent" lead class; don't auto-bin.
Agentic commerce (ACP / AP2 / Perplexity Buy) — all US-only, retail-only, no UK date. Monitor, don't build.

DROP

llms.txt · schema-as-citation-lever · Quora/StackExchange seeding · dedicated TikTok · Perplexity Pages as owned content · pay-per-crawl / HTTP 402 (actively harmful for us).

Engine backends & platform levers

Each engine sources differently — only ~10–14% domain overlap. There's no single play.

Engine	Index it uses	The lever
ChatGPT	Bing + OpenAI cache	Be indexed/submitted in Bing (87% match). Reddit + Wikipedia + review sites.
Claude	Brave ("AI Grounding")	Submit money-page URLs to Brave; row-level tables; clean HTML.
Gemini / AIO	Google + Shopping Graph (renders JS)	YouTube, entity/Knowledge-Graph signals, query fan-out coverage.
Perplexity	Own crawler + Bing/Google blend	Strongest recency bias — freshness + Reddit + listicles.
Copilot	Bing	Same as ChatGPT; under-cites Reddit, over-cites authoritative/structured pages.

Where to earn mentions (off-domain)

Reddit — #1 cross-LLM cited domain (Google's $60M deal; ~49% of AIOs). Wins on exact query-intent + real experience, not upvotes. Volatile + ban-happy → genuine participation only, never link-drop.
Review sites (barnacle) — #2 source type (14%); being on ≥2 platforms = 3.4× cite likelihood. B2B software → G2/Capterra/GetApp; consultancy → Clutch (66–85%); consumer → Trustpilot. Barnacle (being mentioned) is safe; parasite (hosting your pages there) is penalised.
YouTube — #1 AIO source (~23%, incl. finance). Chapters become separately citable. But video ≈ 0% in text chatbots → you need the on-site transcript text for ChatGPT/Perplexity.
Skip: Quora & StackExchange (shrinking, barely cited outside Google).

The technical layer

Query fan-out (Verified mechanism)

AI Mode decomposes one query into ~8–12 parallel sub-queries (NoGood) and synthesises the answer. Google publishes no count. Optimise for the sub-queries via broad pillar/cluster coverage. Tool: Qforia (iPullRank). Semrush: targeting 10–20 fan-out queries per article lifted citations ~150%.

Embeddings & chunks

Retrieval scores each chunk by cosine similarity in isolation. Sweet spot ~256 tokens (128–512, 10–20% overlap, heading-aligned). Write self-contained chunks with no anaphora ("as noted above") — that lets a passage rank for queries not literally on the page. ~85% of retrieved pages are never cited (the "considered vs cited" funnel).
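
To make the chunking numbers concrete, here is a small sketch under loud assumptions: whitespace-split words stand in for model tokens, and a bag-of-words cosine stands in for the dense embeddings a real retriever would use. It shows fixed-size chunks with proportional overlap and each chunk being scored against the query on its own, which is why a chunk that leans on "as noted above" loses the context it needs.

```python
import math
from collections import Counter


def chunk_tokens(tokens, size=256, overlap_ratio=0.15):
    """Split tokens into windows of `size` that overlap by roughly `overlap_ratio`."""
    step = max(1, int(size * (1 - overlap_ratio)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


def cosine(a_tokens, b_tokens):
    """Cosine similarity of two token lists using term-count vectors."""
    a_counts, b_counts = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a_counts[t] * b_counts[t] for t in a_counts.keys() & b_counts.keys())
    norm = math.sqrt(sum(v * v for v in a_counts.values())) * math.sqrt(
        sum(v * v for v in b_counts.values())
    )
    return dot / norm if norm else 0.0


def best_chunk(query_tokens, chunks):
    """Index of the chunk that scores highest against the query, judged in isolation."""
    return max(range(len(chunks)), key=lambda i: cosine(query_tokens, chunks[i]))
```

With the defaults, a 600-token page yields three chunks with a 39-token overlap between neighbours. Heading-aligned splitting would replace the fixed step with section boundaries, which this sketch does not attempt.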

On-page structure (Verified)

Indig (18k citations): first 30% of page = 44.2% of ChatGPT citations; named-entity density in cited content ~20.6% vs 5–8% normal (roughly 2.6–4×); 78.4% of question-citations come from headings.
SALT (AI Mode): pixel-depth doesn't matter, but a subheading immediately followed by its answer does.
Reconciled rule: question-H2 + crisp first sentence, 120–180 words, entity-rich, with a cited stat.
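
The reconciled rule could be turned into a passage-shape check along these lines. This is a sketch, not the existing gate: the 30-word ceiling for a "crisp" first sentence and the digit test for "a cited stat" are assumptions chosen for illustration, and entity density is not checked at all.

```python
import re


def passage_shape_issues(heading, body, min_words=120, max_words=180,
                         max_first_sentence_words=30):
    """Return a list of rule violations for one H2 section; empty means it passes."""
    issues = []
    if not heading.strip().endswith("?"):
        issues.append("heading is not a question")

    word_count = len(body.split())
    if not (min_words <= word_count <= max_words):
        issues.append(f"passage is {word_count} words, expected {min_words}-{max_words}")

    # First sentence = text up to the first ., ! or ? that is followed by whitespace.
    first_sentence = re.split(r"(?<=[.!?])\s+", body.strip(), maxsplit=1)[0]
    if len(first_sentence.split()) > max_first_sentence_words:
        issues.append("first sentence is too long to read as a direct answer")

    # Crude proxy for "contains a stat": at least one digit somewhere in the passage.
    if not re.search(r"\d", body):
        issues.append("no statistic found")
    return issues
```

Each H2 and the text up to the next heading would be passed in as one (heading, body) pair; a non-empty result fails the page with the listed reasons.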

The 2024 Google API leak

Confirmed real signals: NavBoost (clicks, "last longest click"), siteAuthority (feeds Q*), twiddlers (post-ranking re-rank), hostAge (sandbox for fresh spam), three freshness dates, Chrome data. Whether these feed AI Mode selection is a plausible hypothesis, not confirmed — AI selection likely rides the same core stack + entities + cross-source consensus.

Entity establishment (the new lever we'd missed)

The highest non-content ROI for Gemini/AIO is making our brands machine-recognised entities. The opening: Wikidata's notability bar is low, and any Companies-House / Crunchbase-registered brand should qualify (Criterion 2: "serious public references"). A Wikidata item can feed Google's Knowledge Graph (the link is recorded on the item as property P2671, Google Knowledge Graph ID) and from there Gemini / AIO / Perplexity cards, and it goes live in a day. Low risk.

Wikidata items for the finance/health brands first (MI, ltdturnaround, Finterra, PeptideClear) → put the Q-id into each site's sameAs.
Entity-home About page per site: Organization JSON-LD with a stable @id + full sameAs (Wikidata, Companies House, LinkedIn, Crunchbase, G2).
Founder Person entities — Adam / Jimmy / Oliver are currently bylines with no entity backing; this is the E-E-A-T lever Google weights hardest for YMYL/finance.
Wikipedia: hold. Bar too high (3+ independent feature articles); premature/paid attempts risk bans. Wikidata delivers ~80% of the benefit at ~5% of the risk. Wikipedia is itself the single most-cited domain in ChatGPT (26–48% of its citations), so it is worth earning later, not faking now.
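
As an illustration of the entity-home markup described in the list above, a small generator for the Organization JSON-LD. The "/#organization" fragment for the stable @id is a common convention assumed here, not something the research specifies, and every URL in the usage note (including the Wikidata Q-id) is a placeholder.

```python
import json


def organization_jsonld(site_url, name, same_as):
    """Build an Organization JSON-LD dict with a stable @id and de-duplicated sameAs."""
    base = site_url.rstrip("/")
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        # Assumption: one fixed fragment per site so every page can reference the same node.
        "@id": f"{base}/#organization",
        "name": name,
        "url": f"{base}/",
        # dict.fromkeys removes duplicates while keeping the given order.
        "sameAs": list(dict.fromkeys(same_as)),
    }


def as_script_tag(document):
    """Serialise the dict into the <script> tag to place in the static HTML."""
    return (
        '<script type="application/ld+json">'
        + json.dumps(document, indent=2)
        + "</script>"
    )
```

For example, `organization_jsonld("https://example.com/", "Example Ltd", ["https://www.wikidata.org/wiki/Q0", "https://www.linkedin.com/company/example"])` produces a node other pages can point at via its @id. Emitting the tag at build time keeps the markup in the raw HTML.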

The 7 LinkedIn signals — verdicts

Signal	Real?	What it means for us
Brave powers Claude	Yes	Brave = Claude's index. Submit money pages; use row-level tables; clean HTML; caption videos.
HTTP 402 / pay-per-crawl (AWS, Cloudflare)	Yes	Not for us — we're citation-seekers, not a publisher. Defensive only: make sure no CF zone blocks AI bots.
CMA forces Google transparency	Softened	UK-only. Fair ranking (incl. AIO) + notice of significant changes + complaints process, live ~Dec 2026. Bank ranking-drop evidence; keep opted in to AIO.
Discover: social + AIO + combined cards	Trend yes	Commodity content gets summarised away. Differentiated angle + original data + YouTube/X = the defence. (Exact %s are Reach's proprietary figures.)
AI-traffic tracking is broken	Yes	Stop counting AI clicks. Track branded search + ChatGPT-User log hits + GSC AI-Mode filter.
Model-agnostic ops	n/a	Ops, not SEO. Our fleet is already prompt/CLAUDE.md-portable.
GBP agent	n/a	We don't run GBP-listing brands.

Myths to stop repeating

"Schema / llms.txt drive AI citations." The OMG white paper leans on this; every controlled study says no.

"Google must warn before every update / pay publishers." CMA reality: notice of significant changes only, UK-only, opt-out + attribution, no payment scheme.

"TikTok is replacing Google for Gen Z." Reversed into 2026 — the preference actually halved; ChatGPT is now the bigger challenger.

"AI Mode runs ~20 sub-searches (per Tom Capper)." Unverified conflation; Google publishes no number, real range ~8–12.