AI Alignment · Jun 12, 2026 · 6 min read

How to run an AI visibility audit (and read the results)

You can't fix what you can't see. This is a step-by-step AI visibility audit — pick your models, build a prompt set, score yourself across the four layers, and turn the results into priorities. Free template included.

[Image: a clipboard-style audit scorecard checking a brand's presence across ChatGPT, Google AI Overviews, and Perplexity.]

Every AI Alignment project should start the same way: by finding out where you actually stand. Not where you think you stand — where you actually stand, in the answers your buyers are reading right now.

This is the audit. It takes a focused afternoon to do by hand, and it gives you a baseline across all four layers of the framework — Discoverability, Clarity, Authority, Trust. Do it once before you change anything, and you’ll have a “before” you can measure against. Skip it, and you’re optimising blind.

Here’s the process, end to end, plus a template you can copy.

Step 1 — Pick the models that matter to you

You don’t need to track every AI. Start with the surfaces your buyers actually use. For most B2B and consumer brands in 2026 that’s:

  • ChatGPT — by far the largest, and the default for most buyers.
  • Google AI Overviews — because it sits on top of the searches people still do on Google.
  • Perplexity — smaller, but research- and citation-heavy, and over-indexed among power users.

If you’re enterprise or technical, add Gemini, Claude, and Grok. But don’t let “track everything” stop you from starting: three models tracked consistently beat eight models tracked once.

Step 2 — Build your prompt set (the new keyword research)

In AI search, the question is the new keyword. Your prompt set is the list of questions a real buyer would ask on the way to choosing something in your category. This is the most important — and most skipped — step. A weak prompt set gives you a meaningless audit.

Build 15–30 prompts across the buyer journey:

Starter prompt set
  • Category / discovery: “What’s the best [category] for [segment]?” · “Top tools for [job to be done]?”
  • Comparison: “[You] vs [Competitor]” · “Alternatives to [Competitor]”
  • Use-case / fit: “Best [category] for [specific situation, e.g. a 5-person agency]”
  • Brand-specific: “What does [your brand] do?” · “Is [your brand] any good?” · “How much does [your brand] cost?”
  • Objection / risk: “Is [your brand] safe / legit?” · “[Your brand] reviews” · “Problems with [your brand]”

Organise them with tags — persona, funnel stage, and which layer they test. (Brand-specific prompts mostly test Clarity and Trust; category and comparison prompts test Discoverability and Authority.)
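If you'd rather generate the tagged prompt set than type it out, here is a minimal Python sketch. The `Prompt` class, the `TEMPLATES` list, and the brand, category, and competitor names are all hypothetical placeholders, and only three of the five prompt types are shown; the layer tags follow the mapping described above.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Prompt:
    text: str
    stage: str      # funnel stage tag (hypothetical labels)
    layers: tuple   # which framework layers the prompt mostly tests


# Hypothetical templates; swap in the questions your own buyers ask.
TEMPLATES = [
    ("What's the best {category} for {segment}?", "discovery",
     ("Discoverability", "Authority")),
    ("{brand} vs {competitor}", "comparison",
     ("Discoverability", "Authority")),
    ("What does {brand} do?", "brand", ("Clarity", "Trust")),
]


def build_prompt_set(brand, category, segments, competitors):
    """Fill every template with every segment/competitor pair."""
    prompts = []
    for template, stage, layers in TEMPLATES:
        for segment, competitor in product(segments, competitors):
            text = template.format(brand=brand, category=category,
                                   segment=segment, competitor=competitor)
            prompts.append(Prompt(text, stage, layers))
    # Templates that ignore a placeholder produce repeats; drop them, keep order.
    return list(dict.fromkeys(prompts))


prompt_set = build_prompt_set(
    brand="ExampleCo", category="CRM",
    segments=["5-person agencies", "startups"], competitors=["RivalCo"],
)
for p in prompt_set:
    print(p.stage, "|", p.text, "|", ", ".join(p.layers))
```

Add a persona tag the same way if you track more than one buyer type.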

Step 3 — Run them and capture the answers

Ask each prompt in each model. For every answer, record five things:

  1. Mentioned? Were you named at all? (Discoverability)
  2. Accurate? Was what it said about you correct? (Clarity)
  3. Favourable? Was the tone positive, neutral, or negative? (Trust)
  4. Position / share. Were you first, buried, or one of many? Who else was named? (Authority)
  5. Sources. Which URLs/domains did it cite? (Discoverability + Authority)

A note on method: AI answers vary between runs and over time, and they personalise. Use a clean session (logged out / no memory), and run each prompt a couple of times so you’re recording the typical answer, not a fluke. Date everything — answers drift, and the drift is the story.
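One way to keep those five fields consistent across runs is a small record type, plus a helper that picks the typical answer from repeated runs. This is a sketch, not a prescribed schema: the `Observation` class, its field names, and the sample data are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Observation:
    prompt: str
    model: str
    run_date: date              # date everything, because answers drift
    mentioned: bool             # Discoverability
    accurate: Optional[bool]    # Clarity; None when you were not mentioned
    sentiment: str              # Trust: "+", "0" or "-"
    position: Optional[int]     # Authority: 1 = named first; None if absent
    competitors: List[str] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)


def typical(values):
    """Most common value across repeated runs; ties go to the earliest seen."""
    return Counter(values).most_common(1)[0][0]


# Hypothetical data: the same prompt run three times in one model.
runs = [
    Observation("What does ExampleCo do?", "ChatGPT", date(2026, 6, 12),
                True, True, "+", 1),
    Observation("What does ExampleCo do?", "ChatGPT", date(2026, 6, 12),
                True, True, "0", 2),
    Observation("What does ExampleCo do?", "ChatGPT", date(2026, 6, 12),
                False, None, "+", None),
]

typical_mentioned = typical(r.mentioned for r in runs)
typical_sentiment = typical(r.sentiment for r in runs)
print(typical_mentioned, typical_sentiment)
```

Record the typical value in your worksheet, and keep the raw runs so you can see how often the answer flips.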

Step 4 — Score across the four layers

Roll your raw notes up into a score per layer. Keep it simple — a 0–5 scale is plenty. The point isn’t false precision; it’s spotting which layer is your weakest link.

| Layer | What you’re scoring | 0–5 guide |
|---|---|---|
| Discoverability | % of category/comparison prompts where you appear at all | 0 = never appears · 5 = appears in nearly every relevant answer |
| Clarity | Accuracy of what models say about you | 0 = frequently wrong / confused with others · 5 = consistently accurate |
| Authority | Position & citation share vs competitors | 0 = competitors dominate, you’re absent · 5 = you’re a default named option |
| Trust | Sentiment & whether you’re recommended | 0 = negative / warned-about · 5 = actively recommended |

Your lowest score is your starting point. The framework is sequential for a reason — fix Discoverability before Authority, Clarity before Trust.
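As a rough illustration of the roll-up, the sketch below turns a mention rate into a 0–5 score and picks the layer to start with. The linear mapping (rate × 5, rounded half up) is an assumption, not part of the framework, and the other three scores are judgment calls you enter by hand. Ties are resolved in framework order, which is also an assumption.

```python
# Framework order: earlier layers are fixed first.
LAYERS = ["Discoverability", "Clarity", "Authority", "Trust"]


def rate_to_score(hits, total):
    """Map a hit rate to 0-5, rounding half up (an assumed, linear mapping)."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Integer form of floor(5 * hits / total + 0.5); avoids float rounding.
    return (10 * hits + total) // (2 * total)


def starting_layer(scores):
    """Lowest-scoring layer; on a tie, the earlier layer in framework order."""
    return min(LAYERS, key=lambda layer: scores[layer])


# Hypothetical audit: you appeared in 9 of 12 category/comparison answers.
scores = {
    "Discoverability": rate_to_score(9, 12),
    "Clarity": 2,      # scored by hand from your notes
    "Authority": 2,    # scored by hand from your notes
    "Trust": 3,        # scored by hand from your notes
}
print(scores, "start with:", starting_layer(scores))
```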

Step 5 — The audit template

Copy the worksheet below into a spreadsheet (Excel or Google Sheets). Use one row per prompt × model; the scorecard at the bottom becomes your baseline.

AI Visibility Audit — worksheet
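| Prompt | Model | Mentioned? (Y/N) | Accurate? (Y/N/NA) | Sentiment (+/0/-) | Your position | Competitors named | Sources cited | Notes |
|---|---|---|---|---|---|---|---|---|
| “Best [category] for [segment]” | ChatGPT | | | | | | | |
| “Best [category] for [segment]” | AI Overviews | | | | | | | |
| “[You] vs [Competitor]” | Perplexity | | | | | | | |
| “What does [brand] do?” | ChatGPT | | | | | | | |
| “[Brand] reviews” | ChatGPT | | | | | | | |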

Baseline scorecard (fill once the table is complete):

| Layer | Score (0–5) | Biggest gap observed |
|---|---|---|
| Discoverability | | |
| Clarity | | |
| Authority | | |
| Trust | | |
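If your prompt set is long, you can generate the blank worksheet instead of typing it. The sketch below writes one row per prompt × model as CSV text using Python's standard `csv` module. The function name and the sample prompts are hypothetical, and the column labels are lightly adapted from the worksheet header above.

```python
import csv
import io
from itertools import product

COLUMNS = [
    "Prompt", "Model", "Mentioned? (Y/N)", "Accurate? (Y/N/NA)",
    "Sentiment (+/0/-)", "Your position", "Competitors named",
    "Sources cited", "Notes",
]


def blank_worksheet(prompts, models):
    """Return CSV text with a header row and one empty row per prompt x model."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    for prompt, model in product(prompts, models):
        # Only the first two columns are pre-filled; the rest are for your notes.
        writer.writerow([prompt, model] + [""] * (len(COLUMNS) - 2))
    return buffer.getvalue()


worksheet = blank_worksheet(
    prompts=["Best CRM for agencies", "What does ExampleCo do?"],
    models=["ChatGPT", "AI Overviews", "Perplexity"],
)
print(worksheet)
```

Save the returned text to a `.csv` file and open it in Excel or Google Sheets.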

Step 6 — Read the results like a diagnostician

The scores tell you where; the patterns tell you why. A few common diagnoses:

  • Absent from category prompts, fine on brand prompts. You have a Discoverability problem — models can describe you when asked directly, but don’t surface you unprompted. Work on source presence and authority.
  • Mentioned but wrong. A Clarity problem. Models found you but have a bad entity record — wrong category, dated facts, confused with a competitor.
  • Mentioned, accurate, but never recommended. A Trust problem. You’re a known option, not a preferred one — usually a sentiment or reviews gap.
  • Same competitor everywhere, citing the same handful of domains. That’s your Authority target list. Those domains are where the answer is being decided.

Write the diagnosis down in plain language: “We’re invisible on comparison queries because every answer cites three review sites we’re barely on.” That sentence is worth more than any score.
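To find the "same handful of domains" quickly, you can count the hosts in your Sources cited column. A minimal sketch, assuming you recorded full URLs with a scheme; the function name and sample URLs are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def top_cited_domains(cited_urls, n=5):
    """Count how often each domain is cited and return the n most common."""
    domains = []
    for url in cited_urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:  # skips entries without a scheme, where netloc is empty
            domains.append(host)
    return Counter(domains).most_common(n)


# Hypothetical citations collected across several answers.
cited = [
    "https://www.reviews.example.com/best-crm",
    "https://reviews.example.com/rivalco-vs-others",
    "https://blog.example.org/crm-roundup",
    "https://reviews.example.com/top-10",
]
print(top_cited_domains(cited, n=2))
```

The domains at the top of that list are candidates for your Authority target list.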

Step 7 — Turn it into priorities

Don’t try to fix everything. Pick the lowest layer, pick the two or three highest-impact gaps inside it, and take those into your 90-day plan. Re-run the same prompt set monthly so you can see movement — the audit isn’t a one-time event, it’s your scoreboard. (When you’re ready to make that scoreboard a proper dashboard, we wrote a guide to GEO reporting, and you can see what a tracked baseline looks like in our Q1 2026 GEO benchmark.)
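For the monthly re-run, comparing scorecards is a per-layer subtraction. The sketch below uses hypothetical scores and a hypothetical function name.

```python
def score_movement(baseline, current):
    """Per-layer change since the baseline (positive = improvement)."""
    return {layer: current[layer] - baseline[layer] for layer in baseline}


# Hypothetical scorecards from two audit runs of the same prompt set.
baseline = {"Discoverability": 2, "Clarity": 3, "Authority": 1, "Trust": 3}
month_one = {"Discoverability": 3, "Clarity": 3, "Authority": 1, "Trust": 2}

movement = score_movement(baseline, month_one)
for layer, delta in movement.items():
    print(f"{layer}: {delta:+d}")
```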

Further reading: 1, 2, 3

Sources

  1. Search Engine Land — 8 GEO metrics to track in 2026

  2. Ahrefs — How to track AI Overviews: mentions, citations, and click loss

  3. Search Engine Land — How to reverse-engineer LLM brand visibility
