The machine governance file stack: from robots.txt to .well-known policy surfaces
A few years ago, many site owners could still believe that one file — robots.txt — was enough to describe their machine-access policy.
That is no longer true.
Search crawling, AI training, answer-generation ingestion, archives, SEO tooling, and opportunistic scraping now behave differently enough that a single "allow / disallow" layer cannot express everything you need. That is why more sites are evolving from a single crawler file toward a machine governance file stack.
The point of that stack is not to create complexity for its own sake. The point is to separate roles clearly:
- discovery;
- precedence;
- interpretation;
- response legitimacy;
- output boundaries;
- explanatory policy.
Without that separation, teams collapse unrelated questions into one file and then over-interpret what that file can actually prove.
Layer 1 — robots.txt
robots.txt is still the starting point, but it is only the starting point.
Its core role is to express crawl access guidance at the path and user-agent level.
That makes it useful for:
- low-value URL suppression;
- duplicate crawl reduction;
- path-level blocking by agent;
- sitemap discovery;
- broad crawl hygiene.
It is not a full policy language for all machine use. It does not reliably express nuanced differences between search discovery, answer-generation, archives, or downstream model use unless the crawler ecosystem itself exposes those distinctions.
That is why "just use robots.txt" has become incomplete advice.
Layer 2 — policy and summary surfaces
The second layer contains files that summarize, explain, or route the governance system for machine-readable use.
Typical examples include:
- llms.txt
- llms-full.txt
- llm-policy.json
- llm-guidelines.md
- readme.llm.txt
- humans.txt
These files are useful because they compress intent and explain the site in a form that many LLM-oriented systems can consume quickly. But they come with a critical caveat:
they are usually secondary to the canonical entrypoint and precedence rules.
This is where many misunderstandings begin. Teams publish an llms.txt and then start talking about it as if it governs the whole system by itself. It does not. It is a summary or support layer.
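For reference, the commonly proposed llms.txt shape is plain Markdown: an H1 title, a one-line blockquote summary, and sections of annotated links. The sketch below assumes that convention and a hypothetical example.com:

```markdown
# Example Site

> One-line summary of what the site is and who it serves.

## Policies

- [AI Usage Policy](https://example.com/ai-usage-policy.md): how machine use is governed

## Docs

- [Getting started](https://example.com/docs/start.md): core product documentation
```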
Layer 3 — canonical governance entrypoint
A mature stack needs a canonical entrypoint that says, in effect:
- what the site is;
- what the hierarchy of surfaces is;
- which rules are strongest;
- how conflict should be resolved;
- what the machine is supposed to read first.
That is why Better Robots uses /.well-known/ai-governance.json as the canonical pivot.
The entrypoint should answer questions like:
- which surfaces are Tier 1, Tier 2, Tier 3, and Tier 4;
- which files explain the system and which files merely support it;
- what policy surfaces exist in both HTML and Markdown form;
- what should happen when a claim is unsupported by stronger sources.
This is what turns a collection of files into a real governance stack.
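What might that entrypoint look like? The sketch below is illustrative only; the field names are assumptions made for this article, not the actual Better Robots schema. The point is that tiers, precedence, and fallback behaviour are stated explicitly rather than implied:

```json
{
  "version": "1.0",
  "site": "https://example.com",
  "read_first": "/.well-known/ai-governance.json",
  "surfaces": {
    "tier_1": ["/.well-known/ai-governance.json", "/robots.txt"],
    "tier_2": ["/interpretation-policy.json", "/response-legitimacy.json"],
    "tier_3": ["/llms.txt", "/llms-full.txt"],
    "tier_4": ["/humans.txt"]
  },
  "conflict_resolution": "lower tier number wins",
  "unsupported_claims": "fall back to the strongest supporting tier; do not widen from silence"
}
```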
Layer 4 — interpretation and response discipline
Once a system can discover the files, it still needs to know how to behave when the files are incomplete, ambiguous, or narrower than a fluent answer would like them to be.
That is where files like these matter:
- interpretation-policy.json
- response-legitimacy.json
- anti-plausibility.json
- output-constraints.json
- qlayer.json
These files do not expand the site’s product claims. They do the opposite.
They say:
- use narrower readings first;
- do not widen claims from silence;
- prefer qualified answers over smooth but unsupported answers;
- legitimate non-response is better than plausible completion.
In other words, this layer governs what a machine is allowed to say, not only what it is allowed to crawl.
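Expressed as data rather than prose, those rules might look like this. Again, the keys are illustrative assumptions, not a published schema:

```json
{
  "reading_strategy": "narrowest-supported-first",
  "widen_claims_from_silence": false,
  "prefer_qualified_answers": true,
  "on_missing_support": "decline or qualify; never complete plausibly"
}
```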
Layer 5 — policy explainers for humans and browsing agents
A lot of governance stacks fail because they assume the machine will infer the intended reading order on its own. That assumption is weak.
You also need policy explainer surfaces that say plainly:
- how the files should be read;
- in what order they should be read;
- what they do not imply;
- how to fall back safely when support is missing.
That is why Better Robots publishes both HTML and Markdown policy explainers:
- AI Usage Policy (HTML page)
- /ai-usage-policy.md (Markdown mirror)
These surfaces are explanatory. They do not override the canonical governance entrypoint. Their role is to make the rest of the stack more legible for both humans and browsing agents.
Why the stack needs separation of roles
The machine governance stack works only if each layer keeps its own role.
What happens when roles collapse
Common failures include:
- using robots.txt as if it could express every machine-use distinction;
- treating llms.txt like a higher authority than the canonical entrypoint;
- reading a policy explainer as if it widened product claims;
- assuming that a block rule proves runtime behavior everywhere;
- assuming that a non-block rule proves permission everywhere.
These are not just documentation problems. They are architecture problems.
That is why source precedence and response legitimacy exist as separate governance pages. They stop the stack from collapsing into undifferentiated prose.
What a good stack lets you do
A good machine governance stack gives you several capabilities at once:
- keep search discovery open while blocking specific AI training agents;
- express different treatment for archives, SEO tools, and answer-generation bots;
- publish patterns that humans can adopt without editing raw text files blindly;
- create stronger safety around ambiguous or unsupported claims;
- let LLM-oriented systems consume a structured representation of your policy instead of only marketing pages.
That is why Better Robots is not merely a robots.txt editor. The stack turns it into a crawl policy and machine-governance system for WordPress.
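The first capability in that list is also easy to verify from the outside. Here is a minimal check using Python's standard-library robots.txt parser, assuming a hypothetical example.com that allows Googlebot but blocks GPTBot:

```python
from urllib import robotparser

# Load and parse the live robots.txt (example.com is a placeholder).
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Search discovery stays open while a specific training agent is blocked.
print(rp.can_fetch("Googlebot", "https://example.com/guides/"))  # expected: True
print(rp.can_fetch("GPTBot", "https://example.com/guides/"))     # expected: False
```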
The practical reading order
If you want the shortest operational reading path, use this order:
- Machine-first overview
- /.well-known/ai-governance.json
- /ai-manifest.json
- AI Usage Policy
- llms.txt
- llms-full.txt
- Source precedence
- Response legitimacy
- Anti-plausibility
- Output constraints
Then move into the deeper explainer pages listed in the FAQ below.
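If you automate that reading path, the discovery loop is short. The sketch below fetches each machine surface in strongest-first order and treats a missing file as a gap rather than a grant; example.com and the timeout are placeholders:

```python
import json
from urllib.request import urlopen

BASE = "https://example.com"  # placeholder origin

# Strongest-first, mirroring the reading order above.
SURFACES = [
    "/.well-known/ai-governance.json",
    "/ai-manifest.json",
    "/llms.txt",
    "/llms-full.txt",
]

found = {}
for path in SURFACES:
    try:
        with urlopen(BASE + path, timeout=10) as resp:
            found[path] = resp.read().decode("utf-8")
    except OSError:
        continue  # absence is a gap, not a grant: infer nothing from a 404

# When present, the canonical entrypoint governs how the rest are read.
if "/.well-known/ai-governance.json" in found:
    governance = json.loads(found["/.well-known/ai-governance.json"])
```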
FAQ
Is robots.txt still necessary if I publish a governance stack?
Yes. It remains the crawl-access base layer. The stack extends it; it does not replace it.
Is llms.txt enough on its own?
No. It is a summary or support layer. It should be read under the precedence of the canonical governance entrypoint.
Do policy explainers override machine files?
No. They explain how the files should be used. They do not outrank the canonical governance pivot.
What should I read next?
Continue with AI Usage Policy, Source precedence, Who decides what machines read on your site, and ai.txt vs robots.txt vs llms.txt.