
ai.txt vs robots.txt vs llms.txt: which file does what

A modern site may now publish several machine-readable files at once.

The temptation is to ask which single file "solves AI governance".

That is the wrong question.

These files do not replace one another. They solve different parts of the same problem.

The cleanest mental model is this:

  • robots.txt = access
  • ai.txt or adjacent usage signals = usage
  • llms.txt = attention

And even that trio is not the whole stack.

robots.txt is the access layer

robots.txt is still the first governance file most crawlers check.

It answers a simple question:

Can this crawler fetch this URL?

That makes it the natural place for:

  • per-agent crawl rules;
  • path-level allow and disallow logic;
  • classic search crawl control;
  • many training crawler opt-outs;
  • crawl hygiene on low-value paths.
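As a concrete sketch, a robots.txt covering several of these jobs might look like this (Googlebot and GPTBot are real documented user-agents; the paths are illustrative, and each vendor's own documentation is the authority on its agent string):

```
# Classic search crawl control
User-agent: Googlebot
Disallow: /search-results/

# Training crawler opt-out
User-agent: GPTBot
Disallow: /

# Crawl hygiene on low-value paths for everyone else
User-agent: *
Disallow: /cart/
Disallow: /tmp/
```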

Its strength is portability and familiarity.

Its limit is just as important: robots.txt says nothing precise about what the crawler may do with the content after retrieval.
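Python's standard library makes that access-only scope easy to see: `urllib.robotparser` answers "may this agent fetch this URL?" and nothing else. The rules below are illustrative:

```python
from urllib import robotparser

# An illustrative robots.txt: one training bot blocked entirely,
# everyone else blocked only from /private/.
rules = """User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# The parser answers access questions only; it knows nothing about
# what a permitted crawler may do with the content afterwards.
print(rp.can_fetch("GPTBot", "https://example.com/docs/intro"))    # False
print(rp.can_fetch("SomeBot", "https://example.com/private/a"))    # False
print(rp.can_fetch("SomeBot", "https://example.com/docs/intro"))   # True
```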

ai.txt and adjacent signals are the usage layer

The purpose of an ai.txt style file is different.

It is about downstream use, not crawl access.

It answers a different question:

What kinds of AI use do I allow for this content?

That can include distinctions such as:

  • training;
  • retrieval;
  • summarization;
  • generation;
  • query-time model input.

The ecosystem is still uneven here. Support is not universal. Different actors document different mechanisms.

That is why it is often safer to think in terms of a broader usage layer rather than one magical file. Depending on the system, that usage layer may involve:

  • ai.txt;
  • AI usage policy files;
  • Content-Signal style directives such as search, ai-input, and ai-train;
  • structured governance JSON surfaces.
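One emerging shape for that usage layer embeds Content-Signal style directives alongside ordinary crawl rules, roughly like this (the exact directive names and placement vary between proposals, so treat this as a sketch rather than settled syntax):

```
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Allow: /
```

Note what the sketch expresses: the crawl layer says "fetch is allowed", while the signal line separately says "use for search and query-time input, but not for training".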

The key idea is the same in every case:

crawl access and downstream use are not the same decision.

llms.txt is the attention layer

llms.txt solves yet another problem.

It does not primarily decide access. It does not primarily decide downstream rights.

It helps machine readers understand what matters most on the site.

It answers:

What should a model or machine reader focus on first?

That makes it useful for:

  • pointing to core documentation;
  • highlighting high-value pages;
  • compressing reading paths;
  • reducing the chance that models wander through low-value pages first.
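Under the proposed llms.txt format, that guidance takes the shape of a small Markdown file: an H1 title, a short blockquote summary, then sections of annotated links. The URLs below are placeholders:

```
# Example Docs

> Concise developer documentation for the Example API.

## Core documentation

- [Quickstart](https://example.com/docs/quickstart.md): setup and first request
- [API reference](https://example.com/docs/api.md): endpoints and parameters

## Optional

- [Changelog](https://example.com/changelog.md): release history
```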

That is why llms.txt works best as a guidance layer, not as a policy-enforcement layer.

Why the three-file model is useful but still incomplete

The three-file model is useful because it prevents conceptual collapse.

Without it, teams start asking one file to do everything.

But modern control still extends beyond those three surfaces.

Search snippet and indexing behavior still belong elsewhere

If the real question is about Search indexing, previews, or snippets, the right layer is often not ai.txt or llms.txt.

It is page-level Search controls such as:

  • meta robots
  • X-Robots-Tag
  • noindex
  • nosnippet
  • data-nosnippet
  • max-snippet
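These controls live on the page or the response itself, not in a root file. For example, the same "do not index, do not show a snippet" instruction can be stated as an HTML meta tag or, for non-HTML resources, as an HTTP response header:

```
<!-- In the page's <head> -->
<meta name="robots" content="noindex, nosnippet">
```

```
X-Robots-Tag: noindex, nosnippet
```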

User-triggered agents may sit outside normal crawl logic

Some agent traffic is user-initiated rather than automatic.

That means robots.txt may not be the full control answer.

This is especially important when a vendor explicitly distinguishes between automatic crawl and user-triggered access.

Signed agents move the problem to infrastructure

Once the traffic is signed, verified, or allowlisted at the CDN or WAF layer, you are outside the three-file model.

At that point, the correct controls may include:

  • edge allowlisting;
  • bot verification;
  • infrastructure rules;
  • runtime permissioning.
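Bot verification at this layer typically follows the reverse-then-forward DNS pattern that large crawler operators document: reverse-resolve the connecting IP to a hostname, check the hostname against an operator-owned domain, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal sketch with injectable resolvers so it can run without live DNS (the domain suffixes and fake data are illustrative):

```python
import socket

def verify_crawler(ip, allowed_suffixes,
                   reverse=lambda ip: socket.gethostbyaddr(ip)[0],
                   forward=socket.gethostbyname):
    """Reverse-then-forward DNS check for a claimed crawler IP.

    1. Reverse-resolve the IP to a hostname.
    2. Require the hostname to end in an operator-owned suffix.
    3. Forward-resolve the hostname and confirm it maps back to the IP.
    """
    try:
        host = reverse(ip)
    except OSError:
        return False
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False
    try:
        return forward(host) == ip
    except OSError:
        return False

# Offline demonstration with fake resolvers standing in for real DNS:
fake_reverse = {"66.0.0.1": "crawl-66-0-0-1.examplebot.example.com"}.get
fake_forward = {"crawl-66-0-0-1.examplebot.example.com": "66.0.0.1"}.get

print(verify_crawler("66.0.0.1", (".examplebot.example.com",),
                     reverse=fake_reverse, forward=fake_forward))  # True
print(verify_crawler("66.0.0.1", (".otherbot.example.net",),
                     reverse=fake_reverse, forward=fake_forward))  # False
```

The forward-confirmation step is what defeats spoofed reverse DNS: anyone can point a PTR record at a vendor-looking hostname, but only the vendor controls the forward record for that hostname.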

The better question: which control surface fits which job?

Instead of asking which file wins, ask which surface matches the actual problem.

| Problem | Primary surface |
| --- | --- |
| Allow or block crawl by agent and path | robots.txt |
| Express downstream AI use posture | ai.txt, AI usage signals, or related usage surfaces |
| Guide models to the right content | llms.txt |
| Control Search indexing or snippet behavior | meta robots / X-Robots-Tag |
| Publish precedence and machine-readable policy context | governance JSON and policy surfaces |
| Allow signed or verified agents | edge / CDN / WAF infrastructure |

This is the practical reason Better Robots.txt should be understood as a governance stack rather than as a single-file generator.

How Better Robots.txt approaches the stack

Better Robots.txt helps keep the stack more coherent.

It helps site owners align:

  • crawl policy;
  • AI usage posture;
  • llms.txt guidance;
  • broader governance explanations and machine-readable entrypoints.

That does not mean every ecosystem reads every file.

It means the site owner publishes a clearer, more internally consistent position across the layers that exist.

The core principle

The safest principle is not "publish one magic AI file".

The safest principle is:

separate access, usage, attention, and enforcement.

That is how you avoid both overclaiming and policy contradiction.