ai.txt vs robots.txt vs llms.txt: which file does what

The machine governance landscape now includes three distinct files that site owners can publish. Each addresses a different question: robots.txt controls access, ai.txt controls usage, and llms.txt controls attention. Understanding which file solves which problem is essential for building a coherent governance posture.

robots.txt — the access layer

Robots.txt is the oldest and most widely respected governance file. Created in 1994 as the Robots Exclusion Protocol, it tells crawlers which paths they may and may not access. It operates at the crawl level: before fetching a page, a compliant bot checks robots.txt to see whether that path is allowed.

Robots.txt answers one question: can this bot visit this URL?

It does not say anything about what a bot may do with the content it retrieves. A page that is allowed in robots.txt can be indexed for search, used for AI training, archived, or scraped for competitive intelligence. The permission is binary — access or no access — with no conditions attached.

This is both its strength and its limitation. It is the most reliable control layer because virtually every well-behaved crawler checks it. But it cannot express nuanced preferences like "allow retrieval but block training."
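To make the binary nature concrete, here is a minimal robots.txt illustration. The path and the per-agent rule are hypothetical examples; GPTBot is the user agent OpenAI publishes for its crawler.

```
# Allow general crawling but keep a private path off-limits
User-agent: *
Disallow: /private/

# Block one AI crawler entirely (GPTBot is OpenAI's crawler)
User-agent: GPTBot
Disallow: /
```

Note that each rule is all-or-nothing per path: there is no way to say "fetch this page, but do not train on it."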

ai.txt — the usage layer

The ai.txt file is an emerging convention that addresses the gap robots.txt cannot fill. Where robots.txt controls whether a bot can fetch content, ai.txt declares what a bot may do with the content it fetches.

ai.txt answers: what is this bot permitted to do with my content?

Common usage categories include training (using content to build or refine AI models), retrieval (fetching content in real time to answer user queries), summarization (condensing content into shorter outputs), and generation (using content as a basis for new text).

A site owner might allow retrieval with attribution while prohibiting training. Or permit summarization for academic purposes but not for commercial products. This granularity is impossible with robots.txt alone.
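There is no single ratified ai.txt syntax yet, so the sketch below is purely illustrative: the directive names and categories are assumptions modeled on robots.txt-style rules, not an adopted standard.

```
# Illustrative only — ai.txt syntax is not yet standardized
User-agent: *
Allow: retrieval        # hypothetical directive: real-time answering permitted
Disallow: training      # hypothetical directive: no use in model building
Allow: summarization    # hypothetical directive: condensed outputs permitted
```

Whatever syntax ultimately wins, the point is the same: usage permissions are expressed per category rather than per path.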

Adoption is still early. Most AI crawlers do not yet actively check for ai.txt during their crawl process. But the file establishes documented intent — which matters both for legal purposes and for future ecosystem adoption.

llms.txt — the attention layer

The llms.txt file serves a completely different purpose from the other two. It does not restrict access or declare usage preferences. Instead, it guides AI systems toward the content that matters most on a site.

llms.txt answers: what should this AI system focus on?

A large website may contain thousands of pages. Without guidance, an AI system either crawls everything or guesses importance from URL structure and link patterns. llms.txt provides explicit guidance: here are the pages that matter, here is what each contains, and here is how they relate to each other.

Think of it as a sitemap for AI: it directs attention rather than restricting access.
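A short sketch of what such a file can look like, following the commonly proposed Markdown layout (an H1 title, a blockquote summary, then H2 sections of annotated links). The site, URLs, and descriptions here are hypothetical placeholders.

```markdown
# Example Docs Site

> Hypothetical documentation site; the pages below are its highest-value content.

## Docs

- [Getting Started](https://example.com/docs/getting-started): installation and first steps
- [API Reference](https://example.com/docs/api): endpoints and parameters

## Optional

- [Changelog](https://example.com/changelog): release history
```

Each link comes with a one-line description, so an AI system knows not just where to go but what it will find there.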

How they work together

The three files form a complementary stack. Each addresses a distinct layer of the governance problem:

  1. robots.txt — can this bot access this page? (access control)
  2. ai.txt — what may this bot do with the content? (usage control)
  3. llms.txt — what content should this bot prioritize? (attention guidance)

Used independently, each file is useful. Used together, they provide AI systems with a complete picture: where to go, what to read, how to use it, and what the site owner permits.

The key principle is consistency. A page blocked in robots.txt should not appear in llms.txt. A usage restriction in ai.txt should align with the per-agent rules in robots.txt. Better Robots.txt generates these files from a single configuration to prevent contradictions — the llms.txt module and the AI governance settings produce coordinated output.
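The first consistency rule can be checked mechanically. As a minimal sketch using Python's standard-library robots.txt parser, the snippet below flags any llms.txt entry that a general-purpose bot would be blocked from fetching; the file contents and URLs are hypothetical.

```python
from urllib import robotparser

# Hypothetical robots.txt that blocks /private/ for all agents.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical list of URLs extracted from an llms.txt file.
llms_txt_urls = [
    "https://example.com/docs/getting-started",
    "https://example.com/private/draft",  # contradiction: blocked above
]

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Any URL listed in llms.txt but unfetchable per robots.txt is a contradiction.
contradictions = [u for u in llms_txt_urls if not parser.can_fetch("*", u)]
print(contradictions)  # → ['https://example.com/private/draft']
```

Running a check like this in CI, or generating both files from one configuration as described above, keeps the access layer and the attention layer from disagreeing.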

What about ai-manifest.json and .well-known files?

Beyond the three core files, a broader ecosystem of machine-readable governance surfaces is emerging. The machine governance file stack describes the full architecture, from .well-known/ai-governance.json to entity-graph.jsonld. These files extend the governance surface beyond the three core files but follow the same principle: each file answers a specific question for a specific audience.