AI Alignment, Jun 11, 2026

Discoverability: getting your brand into the sources AI reads

AI can't recommend a brand it never sees. Discoverability is layer one of AI Alignment — making sure models can crawl you, find you in the sources they trust, and read you in machine-friendly ways.

Felix Norton

Co-founder & CEO

Everything in AI Alignment starts here, because nothing else is possible until it’s true: a model can’t mention a brand it can’t find. Discoverability is the foundation layer — being present, crawlable, and readable in the places AI looks before it writes an answer.

Most “we’re invisible in ChatGPT” problems are, at root, discoverability problems. Before you spend a cent on clever content, it’s worth understanding how a model finds anything at all.

How AI actually finds things

There’s a myth worth killing first: that AI answers come purely from what the model “memorised” during training. Sometimes they do. But increasingly, modern AI search retrieves before it answers — a pattern researchers call retrieval-augmented generation, or RAG¹.

How AI retrieves and cites: a question triggers a search of the web and knowledge bases; relevant sources are retrieved; the model generates an answer grounded in them and cites them.

In plain terms, three things happen when a buyer asks a question:

Retrieval. The system interprets the question and fetches relevant content from a live web search, a search index, or a knowledge base.
Augmentation. It assembles the best of what it found into context for the model.
Generation. The model writes an answer grounded in that retrieved material — and, on tools like Perplexity and AI Overviews, cites it.
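To make the three steps concrete, here is a toy sketch in Python. It is not how any particular AI product works: the word-overlap scoring stands in for a real search index, the URLs are hypothetical, and the generation step is left as a comment rather than a real model call. What it does show is that a source which is never retrieved can never be cited.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    url: str   # hypothetical URLs, for illustration only
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for a real search index."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, corpus: list[Source], k: int = 2) -> list[Source]:
    """Retrieval: rank sources by how many question words they share."""
    q = tokenize(question)
    scored = [(len(q & tokenize(s.text)), s) for s in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [s for score, s in ranked[:k] if score > 0]


def augment(question: str, sources: list[Source]) -> str:
    """Augmentation: pack the retrieved text into the model's context."""
    numbered = [f"[{i}] {s.url}: {s.text}" for i, s in enumerate(sources, 1)]
    return "Sources:\n" + "\n".join(numbered) + f"\n\nQuestion: {question}"


def answer(question: str, corpus: list[Source]) -> dict:
    sources = retrieve(question, corpus)
    prompt = augment(question, sources)
    # Generation: a real system would send `prompt` to a language model here.
    # The model can only cite what retrieval handed it.
    return {"prompt": prompt, "citations": [s.url for s in sources]}


corpus = [
    Source("https://example.com/reviews/acme",
           "Acme CRM review: pricing, features and support for small teams"),
    Source("https://example.com/forum/best-crm",
           "Thread: best CRM for small teams, several users recommend Acme"),
    Source("https://example.com/recipes", "Slow cooker chili recipe"),
]
print(answer("best CRM for small teams", corpus)["citations"])
```

If the review page and the forum thread did not exist, the citation list would be empty, however good the hypothetical Acme product might be.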

This is great news, because it means discoverability is winnable. You don’t have to wait for the next training run to be “remembered.” If your brand is present in the sources a model retrieves at answer-time, you can show up today. If it isn’t, you won’t — no matter how good your product is.

So discoverability has two halves: can AI read your own site, and is your brand present in the third-party sources AI trusts.

Half one: can AI read your site?

If AI crawlers can’t fetch your pages, you’ve opted out of the retrieved web. This breaks in a few common, fixable ways.

You’re blocking AI bots, maybe by accident. AI companies identify their crawlers with named user agents: OpenAI uses GPTBot and OAI-SearchBot, Anthropic uses ClaudeBot, and Perplexity uses PerplexityBot. Google works differently: Google-Extended is a robots.txt control token rather than a separate crawler, and it governs whether content Google has already crawled can be used for its Gemini models. Many sites block some of these in robots.txt, sometimes deliberately, often via a plugin or default someone enabled and forgot. Worse, infrastructure can block them for you: Cloudflare began blocking AI crawlers by default for newly added domains, which means a brand can be invisible to AI without anyone on the team ever making that choice. Check it.

Your content is locked behind JavaScript or logins. If the substance of a page only appears after heavy client-side rendering, or sits behind a gate, retrieval often misses it. The facts that matter about you should be in plain, server-rendered HTML.
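One rough way to test this is to look at the raw HTML a server returns, before any JavaScript runs, and check whether your key facts appear in the visible text. The sketch below uses only Python’s standard library. The function name missing_facts and the sample markup are hypothetical, and the check is deliberately simple: it ignores text inside script and style blocks and does a case-insensitive substring match.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that sits outside <script> and <style> blocks."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.chunks.append(data)


def missing_facts(raw_html: str, facts: list[str]) -> list[str]:
    """Return the facts that do NOT appear in the page's visible text."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    # Collapse whitespace so text split across inline tags still matches.
    text = " ".join(" ".join(parser.chunks).split()).lower()
    return [fact for fact in facts if fact.lower() not in text]


# A client-rendered page: the price exists only inside a script payload.
js_only = """<html><body><div id="root"></div>
<script>window.__DATA__ = {"price": "Plans from $49 per month"}</script>
</body></html>"""

# A server-rendered page: the same fact is in plain HTML.
server_rendered = "<h1>Acme CRM</h1><p>Plans from <b>$49</b> per month</p>"

print(missing_facts(js_only, ["$49 per month"]))
print(missing_facts(server_rendered, ["$49 per month"]))
```

Feed it the body you get back from a plain fetch of your own page, with the handful of facts you most want AI to repeat. Anything it reports as missing is worth moving into server-rendered HTML.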

Your important facts aren’t on crawlable pages at all. Pricing trapped in an image, product details only inside a PDF or a video, key claims living solely on a third-party platform — all of it is hard to retrieve.

A quick self-check from the command line tells you whether a bot can even fetch your homepage:

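# Fetch only the response headers while identifying as GPTBot
curl -A "GPTBot" -I https://yourdomain.com/
# Fetch robots.txt as that user agent would receive it
curl -A "GPTBot" https://yourdomain.com/robots.txt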

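If you get a 403, a redirect to a challenge page, or a Disallow covering the bot, that’s your first fix, and it’s usually a same-day one. Bear in mind that this test sends only the user-agent string from your own IP address, so a firewall that verifies crawler IPs may treat it differently from the real bot.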
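Reading a robots.txt by eye gets error-prone once it has more than a few groups. As a minimal sketch, Python’s standard-library urllib.robotparser can report which AI tokens a given robots.txt disallows for a given URL. The helper name blocked_tokens and the sample file are hypothetical, and the standard-library parser implements the basic rules, so real crawlers may interpret edge cases such as wildcards differently.

```python
from urllib.robotparser import RobotFileParser

# The tokens named in this article; extend the tuple as needed.
AI_TOKENS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def blocked_tokens(robots_txt: str, url: str,
                   tokens: tuple[str, ...] = AI_TOKENS) -> list[str]:
    """Return the tokens that this robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [token for token in tokens if not parser.can_fetch(token, url)]


# Hypothetical robots.txt: GPTBot is blocked everywhere, everyone else
# is only kept out of /admin/.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_tokens(sample, "https://yourdomain.com/pricing"))
```

Paste in the robots.txt you fetched with curl and test the pages that matter most, such as pricing and product pages, not just the homepage.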

Make yourself easy to parse. Beyond “not blocked,” you want “easy to read.” That means clean HTML, real headings, descriptive link text, and increasingly an llms.txt file: a short Markdown file at the root of your site that gives AI consumers a plain-language map of your most important content. It’s an emerging convention, not a magic switch, but it’s cheap to publish and easy to keep current.
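For a sense of what such a file contains, here is a small sketch that assembles one in Python. It assumes the layout described in the public llms.txt proposal as commonly summarised: a title line, a one-line summary in a blockquote, then sections of annotated links. The function name build_llms_txt, the brand, and the URLs are all hypothetical, so check the current proposal before publishing.

```python
def build_llms_txt(name: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Assemble llms.txt text from a name, a summary, and link sections.

    Each section maps a heading to (title, url, note) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        lines += [f"- [{title}]({url}): {note}" for title, url, note in links]
    return "\n".join(lines) + "\n"


llms_txt = build_llms_txt(
    name="Acme",
    summary="CRM for small teams.",
    sections={
        "Product": [
            ("Pricing", "https://example.com/pricing", "Plans and limits"),
            ("Features", "https://example.com/features", "What the product does"),
        ],
    },
)
print(llms_txt)
```

Generating the file from the same data that drives your site is one way to keep it from drifting out of date.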

Half two: be present in the sources AI trusts

Here’s the part that surprises traditional SEOs: your own website is not the main event. When Ahrefs studied 75,000 brands, the factors most correlated with AI visibility were about presence across the web — brand web mentions, mentions on YouTube, brand search demand — far more than on-site metrics like Domain Rating or page count. As they put it, AI visibility isn’t just about your website; it’s about how widely your brand shows up².

And models lean heavily on a predictable set of third-party sources. Across tens of millions of AI citations, the most-cited domains skew toward Reddit, YouTube, Wikipedia, and LinkedIn³, with review sites like G2 and Yelp showing up heavily on recommendation queries. The exact mix varies by platform and shifts over time — but the lesson is stable: the answer about you is often assembled from places you don’t own.

That means discoverability work includes:

Being on the platforms models cite. A real presence (not a ghost profile) on the review sites, communities, and directories relevant to your category.
Being in the comparison content. “Best X” listicles, roundups, and alternatives pages — the raw material for category and comparison answers.
Being talked about, not just present. A mention in a discussion thread or an industry article is a retrievable signal; a static profile nobody links to barely is.

This is the bridge to the Authority layer, where we go deep on earning those mentions. Discoverability is making sure you’re present and readable; authority is making sure you’re present enough, in credible enough places, to get chosen.

Your Discoverability checklist

Work top to bottom — the cheap technical fixes first, then the ongoing presence work.

AI crawlers (GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended) are allowed in robots.txt.
No infrastructure-level block (Cloudflare / WAF / bot manager) is silently stopping AI bots.
Core facts (what you do, pricing, key features) are in server-rendered HTML, not images, PDFs, or JS-only.
Clean semantic structure: real <h1>/<h2>s, descriptive link text, sensible page titles.
An llms.txt (and a current sitemap.xml) is published.
You have a genuine, accurate presence on the review sites, communities, and directories for your category.
You appear in at least some “best [category]” and “[competitor] alternatives” content.
You know which domains AI cites for your category (from your audit) and you’re present on them.

Further reading: ⁴

Sources

1. arXiv: A survey on retrieval-augmented generation (RAG)
2. Ahrefs: Top brand visibility factors (75k brands)
3. Search Engine Land: AI search engines cite Reddit, YouTube, and LinkedIn most (study)
4. Ahrefs: The 10 most-cited domains across 78.6M searches
