The research is unusually consistent — Ahrefs, Princeton, Semrush, SE Ranking, Vercel and a dozen named practitioners independently land on the same place. That convergence is the signal.
The one-sentence version
AI cites where the web already talks about you (brand mentions), pulls from clean raw HTML (not JS, not schema tricks, not llms.txt), and rewards original data over restatement — so the win is getting mentioned + indexed in Bing/Brave + publishing things only we have, not on-page tinkering.
The most valuable output of this dive is that it kills wasted effort: schema-for-citations, llms.txt, TikTok and Quora are all dead ends. The real levers are cheap and mostly things we haven't done yet.
Each one surfaced independently across multiple studies — so we can act on them with confidence.
Ahrefs (75k brands): branded web mentions correlate with AI visibility at 0.664 vs 0.218 for backlinks. ConvertMate (80M citations), SE Ranking (129k domains) and practitioners Capper, Williams-Cook and Petrovic all agree.
→ The game shifts from "write better pages" to "get mentioned more." This is the practitioner version of the "branded queries +31%" screenshot.
ChatGPT runs on Bing (~87% citation match — Seer / Glenn Gabe). Claude runs on Brave. We've only ever submitted homepages. Submitting every money-page URL is the cheapest way to unlock two whole engines.
→ Still pending for MI, MHQ, PeptideClear, FundBiz, BBL, SEOCompare, Kartapay.
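A minimal sketch of what a per-brand batch submission could look like, assuming the standard IndexNow JSON body (the protocol Bing accepts). The host, key and URLs are placeholders, and `build_indexnow_payload` is a hypothetical helper, not the contents of our existing submit_indexnow.py. Brave submission is not covered here.

```python
import json
from urllib.parse import urlparse

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"  # shared IndexNow endpoint


def build_indexnow_payload(host, key, urls):
    """Return the JSON body for one IndexNow batch submission.

    Only URLs on `host` are kept; duplicates are dropped; order is preserved.
    """
    seen = set()
    url_list = []
    for url in urls:
        if urlparse(url).netloc != host or url in seen:
            continue
        seen.add(url)
        url_list.append(url)
    return {
        "host": host,
        "key": key,
        # Assumption: the key file is hosted at the site root.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": url_list,
    }


payload = build_indexnow_payload(
    "example.com",           # placeholder host
    "hypothetical-key-123",  # placeholder key
    [
        "https://example.com/",
        "https://example.com/pricing",
        "https://example.com/pricing",     # duplicate, dropped
        "https://other.example.org/page",  # wrong host, dropped
    ],
)
body = json.dumps(payload).encode("utf-8")
# Sending is left out on purpose: the request would be a POST of `body` to
# INDEXNOW_ENDPOINT with Content-Type "application/json; charset=utf-8".
```

Keeping the payload builder separate from the network call makes it easy to dry-run the URL list for each brand before anything is submitted.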
Schema and llms.txt are hygiene, NOT citation levers (Verified)
This contradicts the OMG white paper ("schema matters more than backlinks"). The hard evidence says no:
llms.txt 2.0/10; Adobe's log audit: it gets 1.1% of AI-bot traffic; Ahrefs: 97% never read; Google confirms neither is used.
→ Keep both as baseline hygiene. Spend zero incremental effort chasing citations through them.
Vercel/MERJ (500M+ fetches): GPTBot, OAI-SearchBot, ClaudeBot and PerplexityBot read raw HTML only — JS-injected content is invisible to them. Our static Astro is a structural advantage here.
→ Risk = anything client-side-rendered: the JS calculators (FitCalcs / datesandtimes / babydata), studio bars, JS-injected schema (the Rochelle failure mode). We should codify a render-gate so it can't regress.
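One way the render-gate could work, sketched under assumptions: parse the raw HTML exactly as served (no JS execution), then fail the page if a required phrase or a JSON-LD block is missing. `RawHtmlAudit` and `render_gate` are hypothetical names, and the phrases to require per page would be our own choice.

```python
from html.parser import HTMLParser


class RawHtmlAudit(HTMLParser):
    """Collect what a non-rendering bot can see: visible text and JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.jsonld_blocks = 0
        self._skip = None  # name of the <script>/<style> tag we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = tag
            script_type = (dict(attrs).get("type") or "").lower()
            if tag == "script" and script_type == "application/ld+json":
                self.jsonld_blocks += 1

    def handle_endtag(self, tag):
        if tag == self._skip:
            self._skip = None

    def handle_data(self, data):
        # Script and style bodies are not visible text, so they are ignored.
        if self._skip is None:
            self.text_parts.append(data)


def render_gate(raw_html, required_phrases, require_jsonld=True):
    """Return a list of failures; an empty list means the page passes."""
    audit = RawHtmlAudit()
    audit.feed(raw_html)
    text = " ".join(" ".join(audit.text_parts).split()).lower()
    failures = [
        f"missing in raw HTML: {phrase!r}"
        for phrase in required_phrases
        if phrase.lower() not in text
    ]
    if require_jsonld and audit.jsonld_blocks == 0:
        failures.append("no JSON-LD block in raw HTML")
    return failures


static_page = """<html><head>
<script type="application/ld+json">{"@type": "Organization"}</script>
</head><body><h1>Loan calculator</h1><p>Monthly payment: 412</p></body></html>"""

js_page = """<html><body><div id="app"></div>
<script>document.getElementById("app").innerText = "Monthly payment: 412";</script>
</body></html>"""

print(render_gate(static_page, ["Monthly payment"]))  # passes: []
print(render_gate(js_page, ["Monthly payment"]))      # two failures
```

The second page fails both checks because its only copy of the text lives inside a script body, which is the failure mode described above.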
Princeton GEO (causal): adding statistics +32%, quotations +41%, cited sources +28%; keyword-stuffing −9%. Google's info-gain patent describes scoring redundant content lower.
→ Same force as the Discover "combined cards" finding: when 30 publishers cover one story, the commodity version gets absorbed by the AIO and the citation goes to whoever has the unique data. Proprietary numbers get cited and clicked; restatement gets deleted.
AI is a low-volume, high-intent branding channel — not a traffic channel.
AI referral traffic is <1% and largely invisible (no referrer; AI Mode uses noreferrer), but AI visitors convert ~4–5× organic or better (Adobe 4.4×; Seer's figures, ChatGPT 15.9% and Claude ~16.8% vs organic 1.76%, imply roughly 9×). Zero-click is now 68% of US Google searches.
→ Budget zero AI referral traffic. Measure branded search volume + citations (Bing Webmaster's free "AI Performance" grounding-queries report), not clicks. This reinforces our lead-gen model — it's lead quality, not volume.
Deduped across all 10 research streams, tiered by effort and decision-needed.
submit_brave.py sibling to the existing submit_indexnow.py. Biggest unclaimed lever: unlocks ChatGPT (Bing) + Claude (Brave).
llms.txt / schema-for-citations effort; redirect that energy into mentions + freshness.
regression-scan.mjs (flag JS-injected content/schema).
fleet_bot_hits into a citation detector: we already log ASN; add bot verification (catch the ~5–8% spoofs via published IP-range JSONs) and a "we were just cited" alert on ChatGPT-User / Perplexity-User / Claude-User real-time fetches. Earliest possible signal we're in an answer.
llms.txt · schema-as-citation-lever · Quora/StackExchange seeding · dedicated TikTok · Perplexity Pages as owned content · pay-per-crawl / HTTP 402 (actively harmful for us).
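For the bot-verification step in the fleet_bot_hits item above, a rough sketch of the check: match the user-agent against the three real-time fetchers, then confirm the source IP falls inside that vendor's published ranges. The schema of fleet_bot_hits is not shown here, so `classify_hit` and its arguments are hypothetical, and the CIDR blocks are documentation placeholders, not real vendor ranges.

```python
import ipaddress

# Real-time fetcher user-agent tokens named in the action item above.
CITATION_AGENTS = ("ChatGPT-User", "Perplexity-User", "Claude-User")


def parse_ranges(cidrs):
    """Turn CIDR strings (e.g. loaded from a vendor's published JSON) into networks."""
    return [ipaddress.ip_network(cidr, strict=False) for cidr in cidrs]


def classify_hit(user_agent, remote_ip, ranges_by_agent):
    """Return (agent, verified) for one log line.

    (None, False) means the hit is not a citation fetch at all.
    verified=False with an agent name means a likely spoof.
    """
    ua = user_agent.lower()
    for agent in CITATION_AGENTS:
        if agent.lower() in ua:
            ip = ipaddress.ip_address(remote_ip)
            networks = ranges_by_agent.get(agent, [])
            return agent, any(ip in net for net in networks)
    return None, False


# Placeholder range (TEST-NET-1), standing in for a vendor's published list.
ranges = {"ChatGPT-User": parse_ranges(["192.0.2.0/24"])}

genuine = classify_hit("Mozilla/5.0; compatible; ChatGPT-User/1.0", "192.0.2.15", ranges)
spoofed = classify_hit("Mozilla/5.0; compatible; ChatGPT-User/1.0", "203.0.113.9", ranges)
human = classify_hit("Mozilla/5.0 (X11; Linux x86_64)", "192.0.2.15", ranges)

print(genuine)  # ('ChatGPT-User', True)
print(spoofed)  # ('ChatGPT-User', False)
print(human)    # (None, False)
```

Only the verified case would trigger the "we were just cited" alert; unverified hits would be counted toward the spoof rate instead.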
Each engine sources differently — only ~10–14% domain overlap. There's no single play.
| Engine | Index it uses | The lever |
|---|---|---|
| ChatGPT | Bing + OpenAI cache | Be indexed/submitted in Bing (87% match). Reddit + Wikipedia + review sites. |
| Claude | Brave ("AI Grounding") | Submit money-page URLs to Brave; row-level tables; clean HTML. |
| Gemini / AIO | Google + Shopping Graph (renders JS) | YouTube, entity/Knowledge-Graph signals, query fan-out coverage. |
| Perplexity | Own crawler + Bing/Google blend | Strongest recency bias — freshness + Reddit + listicles. |
| Copilot | Bing | Same as ChatGPT; under-cites Reddit, over-cites authoritative/structured pages. |
AI Mode decomposes one query into ~8–12 parallel sub-queries (NoGood) and synthesises the answer. Google publishes no count. Optimise for the sub-queries via broad pillar/cluster coverage. Tool: Qforia (iPullRank). Semrush: targeting 10–20 fan-out queries per article lifted citations ~150%.
Retrieval scores each chunk by cosine similarity in isolation. Sweet spot ~256 tokens (128–512, 10–20% overlap, heading-aligned). Write self-contained chunks with no anaphora ("as noted above") — that lets a passage rank for queries not literally on the page. ~85% of retrieved pages are never cited (the "considered vs cited" funnel).
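To make the chunking numbers concrete, here is a toy sketch: fixed windows of ~256 tokens with 15% overlap, each scored against a query in isolation. It is an illustration under loud assumptions: whitespace tokens and bag-of-words vectors stand in for a real tokenizer and dense embeddings, heading alignment is not implemented, and nothing here claims to match any engine's actual pipeline.

```python
import math
from collections import Counter


def chunk_tokens(tokens, size=256, overlap=0.15):
    """Split a token list into windows of `size` tokens with fractional `overlap`."""
    step = max(1, int(size * (1 - overlap)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# A 600-token page becomes three overlapping chunks.
chunks = chunk_tokens([f"tok{i}" for i in range(600)])
print([len(c) for c in chunks])  # [256, 256, 166]

# Each chunk is scored alone, so a passage that leans on earlier context loses.
query = Counter("brave index claude".split())
self_contained = Counter("claude uses the brave index for grounding".split())
anaphoric = Counter("as noted above it uses that index".split())
print(cosine(query, self_contained) > cosine(query, anaphoric))  # True
```

The anaphoric passage scores lower because the entities it refers to sit in a different chunk, which is why self-contained writing matters more than keyword placement.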
Confirmed real signals: NavBoost (clicks, "last longest click"), siteAuthority (feeds Q*), twiddlers (post-ranking re-rank), hostAge (sandbox for fresh spam), three freshness dates, Chrome data. Whether these feed AI Mode selection is a plausible hypothesis, not confirmed — AI selection likely rides the same core stack + entities + cross-source consensus.
The highest non-content ROI for Gemini/AIO is making our brands machine-recognised entities. The opening: Wikidata's notability bar is low — any Companies-House / Crunchbase-registered brand qualifies (Criterion 2: "serious public references"). A Wikidata item feeds Google's Knowledge Graph (property P2671) → Gemini / AIO / Perplexity cards, and goes live in a day. Low risk.
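The `@id` + `sameAs` markup mentioned in this section could be generated along these lines. This is a sketch with placeholder values: the brand name, the Wikidata QID and the profile URLs are all hypothetical, and `organization_jsonld` is an illustrative helper rather than existing fleet code.

```python
import json


def organization_jsonld(name, site_url, same_as):
    """Build a schema.org Organization node with a stable @id and sameAs links."""
    base = site_url.rstrip("/")
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "@id": f"{base}/#organization",   # stable identifier reused on every page
        "name": name,
        "url": f"{base}/",
        "sameAs": sorted(set(same_as)),   # de-duplicated, deterministic order
    }


node = organization_jsonld(
    "Example Brand",
    "https://example.com/",
    [
        "https://www.wikidata.org/wiki/Q0000000",          # placeholder QID
        "https://www.linkedin.com/company/example-brand",  # placeholder profile
        "https://www.linkedin.com/company/example-brand",  # duplicate, dropped
    ],
)

# Emit as a static tag so it is present in the raw HTML, not injected by JS.
script_tag = (
    '<script type="application/ld+json">'
    + json.dumps(node, indent=2)
    + "</script>"
)
print(script_tag)
```

Generating the node from one function per brand keeps the `@id` identical across pages, which is what lets the separate profiles resolve to a single entity.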
sameAs.
@id + full sameAs (Wikidata, Companies House, LinkedIn, Crunchbase, G2).

| Signal | Real? | What it means for us |
|---|---|---|
| Brave powers Claude | Yes | Brave = Claude's index. Submit money pages; use row-level tables; clean HTML; caption videos. |
| HTTP 402 / pay-per-crawl (AWS, Cloudflare) | Yes | Not for us — we're citation-seekers, not a publisher. Defensive only: make sure no CF zone blocks AI bots. |
| CMA forces Google transparency | Softened | UK-only. Fair ranking (incl. AIO) + notice of significant changes + complaints process, live ~Dec 2026. Bank ranking-drop evidence; keep opted in to AIO. |
| Discover: social + AIO + combined cards | Trend yes | Commodity content gets summarised away. Differentiated angle + original data + YouTube/X = the defence. (Exact %s are Reach's proprietary figures.) |
| AI-traffic tracking is broken | Yes | Stop counting AI clicks. Track branded search + ChatGPT-User log hits + GSC AI-Mode filter. |
| Model-agnostic ops | n/a | Ops, not SEO. Our fleet is already prompt/CLAUDE.md-portable. |
| GBP agent | n/a | We don't run GBP-listing brands. |