The machine governance file stack: from robots.txt to .well-known
A modern website communicating its full governance posture to machines may publish a dozen or more files, each serving a specific purpose for a specific audience. This article maps the entire stack — from the foundational files that every site should have to the advanced governance surfaces that establish institutional-grade machine readability.
Layer 1: universal files
These files are read by virtually every crawler and should exist on every site.
robots.txt is the foundation. It controls crawl access per user agent and provides the sitemap location. It is the most widely respected governance file on the web. Every site needs one; the default WordPress file is not sufficient.
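A minimal per-agent robots.txt might look like the following. GPTBot and CCBot are real crawler tokens, but which agents you block and which paths you list are policy choices, not recommendations:

```txt
# Default: allow crawling, keep admin paths out
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Per-agent AI crawler rules (illustrative choices)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
```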
sitemap.xml lists the pages you want crawled and indexed, with optional priority and last-modified dates. It works in partnership with robots.txt: the sitemap says what to find, robots.txt says what to skip. Keeping them aligned prevents contradictory signals.
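A minimal sitemap entry, per the sitemaps.org protocol (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/governance/ai-usage-policy</loc>
    <lastmod>2026-01-15</lastmod>
  </url>
</urlset>
```

Alignment means every `<loc>` here should be crawlable under the robots.txt rules above; listing a URL that robots.txt disallows is exactly the contradictory signal to avoid.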
humans.txt identifies the site owner and contact information. It is not a governance file per se, but it anchors the identity behind the governance decisions.
Layer 2: AI-specific files
These files emerged from the need to communicate with AI systems specifically, beyond what robots.txt can express.
llms.txt is a structured guide for large language models, describing what the site contains and which pages are most relevant. It is the AI equivalent of a sitemap: attention guidance rather than access control.
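A sketch following the draft llms.txt convention (a Markdown file with a title, a short summary, and sections of annotated links; the URLs and descriptions are placeholders):

```txt
# Example Site

> A one-paragraph summary of what the site covers, written for LLM consumption.

## Key pages

- [AI usage policy](https://example.com/governance/ai-usage-policy): the site's position on AI access
- [Documentation](https://example.com/docs): primary reference content
```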
llms-full.txt extends llms.txt with deeper context: governance hierarchies, reading sequences, and structured references to other machine files.
ai.txt declares usage preferences: whether content may be used for training, retrieval, summarization, or generation.
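There is no single settled standard for ai.txt, so the directive names below are assumptions chosen only to illustrate the four preference categories named above:

```txt
# ai.txt — illustrative usage-preference declaration
# (directive names are assumptions, not a published spec)
User-agent: *
Allow-training: no
Allow-retrieval: yes
Allow-summarization: yes
Allow-generation: no
```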
llm-policy.json and llm-guidelines.md provide machine-readable and human-readable policy declarations for AI system behavior when consuming site content.
Layer 3: structured governance
These files provide the deepest layer of machine-readable governance. They are consumed by advanced AI systems, automated auditing tools, and governance infrastructure.
ai-manifest.json acts as a routing index for all machine surfaces on the site. It tells an AI system where to find every governance file and what each one contains.
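A routing index of this kind could be as simple as a list of paths with their roles. This sketch uses an assumed schema; the field names are illustrative, not a published format:

```json
{
  "version": "1.0",
  "surfaces": [
    { "path": "/robots.txt",                     "role": "crawl access control" },
    { "path": "/llms.txt",                       "role": "LLM attention guidance" },
    { "path": "/.well-known/ai-governance.json", "role": "canonical governance root" },
    { "path": "/governance-fingerprint.json",    "role": "integrity verification" }
  ]
}
```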
ai-governance.json (typically at /.well-known/ai-governance.json) is the canonical governance root. It declares the site's interpretation policy, response legitimacy rules, anti-plausibility constraints, and output boundaries.
entity-graph.jsonld describes the entities associated with the site (organization, products, people) in Schema.org vocabulary, providing identity anchors for AI systems.
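A minimal identity anchor in standard Schema.org JSON-LD (the organization name and URLs are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "sameAs": ["https://github.com/example"]
}
```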
product.jsonld provides structured product data for software or service sites, including offers, ratings, and feature lists.
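For a software product this typically uses the Schema.org `SoftwareApplication` type; all values below are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Example Plugin",
  "applicationCategory": "WebApplication",
  "offers": { "@type": "Offer", "price": "0", "priceCurrency": "USD" },
  "aggregateRating": { "@type": "AggregateRating", "ratingValue": "4.8", "ratingCount": "120" }
}
```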
datasets.jsonld declares what datasets or registries the site publishes, their formats, and their intended use.
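A dataset declaration in Schema.org vocabulary might look like this (the dataset name and download URL are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Example crawler registry",
  "description": "A registry of known AI crawler user agents published by this site.",
  "encodingFormat": "application/json",
  "distribution": {
    "@type": "DataDownload",
    "contentUrl": "https://example.com/datasets/crawlers.json"
  }
}
```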
Layer 4: policy and verification
These files close the governance loop by providing human-readable policies and cryptographic verification.
AI usage policy (as an HTML page like /governance/ai-usage-policy or a Markdown mirror like /ai-policy.md) states the site's position on AI access in human language. It serves journalists, researchers, legal teams, and anyone who wants to understand the site's stance.
doctrine-index.json lists all governance files on the site with their paths, providing a machine-readable inventory of the entire stack.
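An inventory file of this kind could be little more than a dated list of paths. The structure below is an assumption for illustration:

```json
{
  "generated": "2026-01-15",
  "files": [
    "/robots.txt",
    "/sitemap.xml",
    "/llms.txt",
    "/.well-known/ai-governance.json"
  ]
}
```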
governance-fingerprint.json provides cryptographic hashes of governance files, enabling verification that files have not been modified since publication.
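Verification reduces to recomputing each file's hash and comparing it to the recorded value. A minimal sketch, assuming the fingerprint file maps paths to SHA-256 hex digests under a `hashes` key (that schema is an assumption for illustration):

```python
import hashlib
import json

def verify_fingerprints(fingerprint_json: str, files: dict) -> dict:
    """Compare recorded SHA-256 digests against recomputed ones.

    `fingerprint_json` is assumed to look like
    {"hashes": {"/llms.txt": "<hex digest>", ...}} — an illustrative schema.
    `files` maps the same paths to the fetched file bytes.
    Returns {path: True/False} per file.
    """
    recorded = json.loads(fingerprint_json)["hashes"]
    return {
        path: hashlib.sha256(data).hexdigest() == recorded.get(path)
        for path, data in files.items()
    }

# Example: one unmodified file, one tampered file
content = b"# Example llms.txt\n"
fingerprint = json.dumps({"hashes": {
    "/llms.txt": hashlib.sha256(content).hexdigest(),
    "/ai.txt": hashlib.sha256(b"original").hexdigest(),
}})
result = verify_fingerprints(fingerprint, {
    "/llms.txt": content,   # matches the recorded digest
    "/ai.txt": b"tampered", # does not match
})
```

A governance auditor would fetch the fingerprint file and each listed file, then run exactly this comparison.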
What a typical site needs
Not every site needs every file. The practical minimum for a WordPress site in 2026 is:
- A properly configured robots.txt with per-agent rules for AI crawlers
- An accurate sitemap.xml that is aligned with robots.txt
- An llms.txt file that describes the site's content for AI systems
- A published AI usage policy that documents the site's position on AI access
Better Robots.txt generates layers 1 through 3 from its configuration interface. The governance module provides the policy layer. Together, they create a governance stack that communicates clearly to every type of machine visitor.