ai.txt vs robots.txt vs llms.txt: which file does what
A modern site may now publish several machine-readable files at once.
The temptation is to ask which single file "solves AI governance".
That is the wrong question.
These files do not replace one another. They solve different parts of the same problem.
The cleanest mental model is this:
- robots.txt = access
- ai.txt or adjacent usage signals = usage
- llms.txt = attention
And even that trio is not the whole stack.
robots.txt is the access layer
robots.txt is still the first governance file most crawlers check.
It answers a simple question:
Can this crawler fetch this URL?
That makes it the natural place for:
- per-agent crawl rules;
- path-level allow and disallow logic;
- classic search crawl control;
- many training crawler opt-outs;
- crawl hygiene on low-value paths.
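A minimal sketch of what those rules can look like. The paths are placeholders; GPTBot appears only as a familiar example of a documented training crawler, and the agents you actually address should follow your own policy.

```
# Default group: classic crawl hygiene on low-value paths
User-agent: *
Disallow: /cart/
Disallow: /internal-search/

# Per-agent rule: opt a documented training crawler out entirely
User-agent: GPTBot
Disallow: /

# Per-agent rule: give a specific search crawler broader access
User-agent: Googlebot
Allow: /
```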
Its strength is portability and familiarity.
Its limit is just as important: robots.txt says nothing precise about what the crawler may do with the content after retrieval.
ai.txt and related usage signals are the usage layer
The purpose of an ai.txt-style file is different.
It is about downstream use, not crawl access.
It answers a different question:
What kinds of AI use do I allow for this content?
That can include distinctions such as:
- training;
- retrieval;
- summarization;
- generation;
- query-time model input.
The ecosystem is still uneven here. Support is not universal. Different actors document different mechanisms.
That is why it is often safer to think in terms of a broader usage layer rather than one magical file. Depending on the system, that usage layer may involve:
- ai.txt;
- AI usage policy files;
- Content-Signal style directives such as search, ai-input, and ai-train;
- structured governance JSON surfaces.
The key idea is the same in every case:
crawl access and downstream use are not the same decision.
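One concrete illustration of that split is the Content-Signal style directive mentioned above, which is designed to sit alongside ordinary robots.txt rules: the Allow and Disallow lines keep deciding access, while the signal expresses downstream-use preferences. Treat the snippet as a sketch; the exact syntax and, more importantly, which parsers honor it still vary across the ecosystem.

```
# Access decision: crawlers may fetch these URLs
User-agent: *
Allow: /

# Usage signal: content may be searched and used as query-time input,
# but not used for training
Content-Signal: search=yes, ai-input=yes, ai-train=no
```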
llms.txt is the attention layer
llms.txt solves yet another problem.
It does not primarily decide access. It does not primarily decide downstream rights.
It helps machine readers understand what matters most on the site.
It answers:
What should a model or machine reader focus on first?
That makes it useful for:
- pointing to core documentation;
- highlighting high-value pages;
- compressing reading paths;
- reducing the chance that models wander through low-value pages first.
That is why llms.txt works best as a guidance layer, not as a policy-enforcement layer.
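For orientation, the llms.txt proposal uses plain Markdown: an H1 title, a short blockquote summary, then sections of curated links. Everything below is placeholder content.

```
# Example Docs

> Example Docs is the reference documentation for the Example API.

## Core documentation

- [Quickstart](https://example.com/docs/quickstart.md): install, authenticate, first request
- [API reference](https://example.com/docs/api.md): endpoints, parameters, errors

## Optional

- [Changelog](https://example.com/changelog.md): release history
```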
Why the three-file model is useful but still incomplete
The three-file model is useful because it prevents conceptual collapse.
Without it, teams start asking one file to do everything.
But modern control still extends beyond those three surfaces.
Search snippet and indexing behavior still belong elsewhere
If the real question is about Search indexing, previews, or snippets, the right layer is often not ai.txt or llms.txt.
It is page-level Search controls such as:
- meta robots;
- X-Robots-Tag;
- noindex;
- nosnippet;
- data-nosnippet;
- max-snippet.
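For example (which directives a given crawler honors varies, so check each crawler's own documentation):

```
<!-- In the HTML head: keep the page out of the index entirely... -->
<meta name="robots" content="noindex">

<!-- ...or keep it indexed but cap how much text can appear as a snippet -->
<meta name="robots" content="max-snippet:50">

<!-- Inline: exclude a specific passage from snippets -->
<p data-nosnippet>Details that should not appear in search previews.</p>
```

The same directives can also be sent as an HTTP response header (X-Robots-Tag: noindex, nosnippet), which is the usual route for non-HTML resources such as PDFs.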
User-triggered agents may sit outside normal crawl logic
Some agent traffic is user-initiated rather than automatic.
That means robots.txt alone may not be the whole answer.
This is especially important when a vendor explicitly distinguishes between automatic crawl and user-triggered access.
Signed agents move the problem to infrastructure
Once the traffic is signed, verified, or allowlisted at the CDN or WAF layer, you are outside the three-file model.
At that point, the correct controls may include:
- edge allowlisting;
- bot verification;
- infrastructure rules;
- runtime permissioning.
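What that looks like depends entirely on the platform. As a sketch only, here are two Cloudflare-style rules; the field names assume Cloudflare's bot-management fields and will differ, or not exist, on other CDN and WAF products.

```
Rule 1 (action: skip remaining bot rules)
  expression: (cf.bot_management.verified_bot)

Rule 2 (action: block)
  expression: (http.user_agent contains "GPTBot" and not cf.bot_management.verified_bot)
```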
The better question: which control surface fits which job?
Instead of asking which file wins, ask which surface matches the actual problem.
| Problem | Primary surface |
|---|---|
| Allow or block crawl by agent and path | robots.txt |
| Express downstream AI use posture | ai.txt, AI usage signals, or related usage surfaces |
| Guide models to the right content | llms.txt |
| Control Search indexing or snippet behavior | meta robots / X-Robots-Tag |
| Publish precedence and machine-readable policy context | governance JSON and policy surfaces |
| Allow signed or verified agents | edge / CDN / WAF infrastructure |
This is the practical reason Better Robots.txt should be understood as a governance stack rather than as a single-file generator.
How Better Robots.txt approaches the stack
Better Robots.txt helps keep the stack more coherent.
It helps site owners align:
- crawl policy;
- AI usage posture;
- llms.txt guidance;
- broader governance explanations and machine-readable entrypoints.
That does not mean every ecosystem reads every file.
It means the site owner publishes a clearer, more internally consistent position across the layers that exist.
The core principle
The safest principle is not "publish one magic AI file".
The safest principle is:
separate access, usage, attention, and enforcement.
That is how you avoid both overclaiming and policy contradiction.