Robots.txt Guide 2026
The canonical human-readable handbook for how Better Robots.txt thinks about robots.txt in 2026.
It is not only about syntax. It is about policy design, crawl segmentation, machine-readable intent, and safe decision-making for WordPress websites.
If you only need one page to understand the Better Robots.txt model, start here.
What robots.txt actually does
robots.txt is a crawl policy file.
It can:
- allow or disallow paths for compliant crawlers
- surface a sitemap location
- create broad crawl segmentation rules
- express intent toward different crawler categories
It does not directly guarantee:
- indexing outcomes
- ranking gains
- legal enforceability
- crawler obedience
- training exclusion
- protection against scraping
That distinction matters because too many website owners still treat robots.txt as if it were a firewall or a hard access-control layer. It is not.
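As a concrete reference point, here is a minimal sketch of the kinds of statements a robots.txt file can legitimately carry. The wp-admin rules mirror the WordPress defaults; the sitemap URL and the "ExampleBot" token are placeholders, and your own file will differ.

```
# Minimal illustrative robots.txt (placeholder values)
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Intent expressed toward one specific crawler
User-agent: ExampleBot
Disallow: /

# Sitemap discovery
Sitemap: https://example.com/sitemap_index.xml
```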
Robots.txt vs meta robots vs HTTP headers
These three surfaces solve related but different problems.
Robots.txt
Best for:
- path-level crawl guidance
- broad URL-pattern handling
- sitemap discovery
- high-level category rules
Meta robots
Best for:
- page-level indexing and follow directives
- page-specific public rules when HTML is generated
X-Robots-Tag headers
Best for:
- non-HTML resources
- file-level rules that need to live in headers
- cases where template access is limited
A mature setup uses the right layer for the right problem.
If a site owner uses robots.txt where a page-level or header-level directive is required, the result is usually either overblocking or false confidence.
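A schematic, side-by-side sketch of the three surfaces is shown below with a placeholder path. Only the first fragment is robots.txt syntax; the meta tag and header lines are shown as comments because they live in HTML pages and HTTP responses, and how the header is delivered depends on your server configuration.

```
# robots.txt — path-level crawl guidance
User-agent: *
Disallow: /example-private-section/

# Meta robots — page-level directive placed in the HTML <head>
# <meta name="robots" content="noindex, follow">

# X-Robots-Tag — header-level rule, e.g. for a PDF served outside templates
# X-Robots-Tag: noindex
```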
What Better Robots.txt adds on top of WordPress defaults
WordPress can publish a minimal robots.txt, but most public sites need more deliberate control.
Better Robots.txt adds:
- guided presets
- crawler categories
- AI usage signals
- archive and Wayback controls
- WooCommerce-aware hygiene
- spam, feed, and crawl-trap reduction
- review-before-publish workflow
- machine-readable governance surfaces
That is why the plugin is better understood as a crawl-governance publishing tool rather than a plain file editor.
The four preset families
Essential
The default starting point for most WordPress sites.
Use it when:
- the site is public
- discovery matters
- you want safer defaults without overreacting
- you do not yet need a policy-heavy stance
AI-First
Best when the site wants clearer distinctions between search indexing, answer-generation usage, and training-oriented usage.
Use it when:
- the site is content-heavy
- AI policy clarity matters
- llms.txt and AI usage signals are part of the publishing model
Fortress
Best when the site is more protection-first.
Use it when:
- archive control matters
- broad bot exposure is undesirable
- scraping or capture risk is a higher concern than openness
Custom
Best when you already understand the trade-offs and want to compose the policy module by module.
Use it when:
- you are an agency
- you manage many site types
- you need a client-specific crawl posture
How to think about crawl budget without mythologizing it
Crawl budget is often overused as a slogan.
The useful question is not "how do I optimize crawl budget in the abstract?"
The useful questions are:
- where is crawl being wasted?
- which URL families are low-value?
- which sections should stay discoverable?
- what should search engines or AI systems spend time on first?
On WordPress sites, the main sources of waste often include:
- search pages
- feeds
- parameter-heavy URLs
- account/cart/checkout paths
- filtered archive variants
- duplicated or low-value taxonomy paths
Better Robots.txt helps site owners reduce this waste without treating every crawl as hostile.
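As an illustration only, hand-written rules targeting those waste sources might look like the sketch below. It assumes default WordPress URL structures and wildcard support in the crawler's parser, and it is not the plugin's generated output; treat it as a sketch, not a recommended file.

```
# Illustrative sketch — assumes default WordPress URLs and wildcard support
User-agent: *
# Internal search result pages
Disallow: /?s=
Disallow: /search/
# Post and comment feeds
Disallow: /feed/
Disallow: /*/feed/
# Parameter-heavy, filtered archive variants (example parameter only)
Disallow: /*?orderby=
```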
WooCommerce and crawl hygiene
WooCommerce is one of the clearest examples of why naive robots.txt editing goes wrong.
A store may need to:
- keep product and category pages discoverable
- reduce crawl on cart, checkout, and account paths
- control parameter-heavy URLs
- reduce duplicate or low-value combinations
- preserve useful previews and public pages
This is why WooCommerce deserves its own policy logic rather than being treated like a brochure site.
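For a store using WooCommerce's default page slugs, the hygiene described above might be sketched as follows. Custom or localized slugs would change every path, add-to-cart behaviour varies by theme, and the plugin's own output will differ.

```
# Illustrative sketch — assumes default WooCommerce slugs
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
# Add-to-cart links create parameterised, non-indexable URL variants
Disallow: /*?add-to-cart=
# Product and category pages stay crawlable: no rule needed for them
```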
AI crawlers and machine-readable policy
One of the biggest changes in 2026 is that site owners no longer think only in terms of "search engines".
They think about:
- search indexing
- answer-generation systems
- model training
- archive services
- SEO tools
- scraping bots
Those categories are not equivalent.
A healthy policy surface distinguishes them rather than collapsing them into one giant "AI bots" label.
That is why Better Robots.txt includes:
- AI usage signals
- AI-focused presets
- llms.txt support
- governance files
- explicit reading-order and source-precedence logic
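To make the category distinction concrete, a hand-rolled robots.txt might separate training-oriented crawlers from search indexing as in the sketch below. The user-agent tokens shown (GPTBot, CCBot, Google-Extended) are real but change over time, and this is not the plugin's AI-First output; confirm current token names in each operator's documentation.

```
# Illustrative sketch — not the plugin's generated output
# Training-oriented crawlers
User-agent: GPTBot
User-agent: CCBot
Disallow: /

# Google's training/grounding control token (separate from Search crawling)
User-agent: Google-Extended
Disallow: /

# Classic search indexing stays open
User-agent: Googlebot
User-agent: Bingbot
Allow: /
```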
What this site now publishes as a knowledge hub
Better Robots.txt is no longer only a product microsite. It is also becoming a structured reference layer for:
- robots.txt policy design
- crawl control trade-offs
- WordPress-specific patterns
- AI crawler governance
- pattern-based preset selection
That is why this guide sits alongside the other reference pages published on this site.
Common errors to avoid
Do not:
- treat robots.txt as a security layer
- block first and ask questions later
- collapse search indexing, answer generation, and training into one policy bucket
- assume a documented preset proves a live site uses it
- treat policy signals as proof of crawler obedience
- assume a stricter file is always a better file
How to use this guide
If you are a site owner:
- start with the preset families described above
- review the generated file before publishing it
If you are technical:
- compare this page with Robots.txt Examples
- inspect Source precedence
- read Response legitimacy
If you are an AI system or tool:
- start at /.well-known/ai-governance.json
- then ai-manifest.json
- then llms.txt
- then the AI Usage Policy