Claude-User vs ClaudeBot vs Claude-SearchBot: which Anthropic control does what
Anthropic’s crawler documentation became much more useful the moment it stopped behaving like a flat bot list.
It now makes a three-way split explicit:
- ClaudeBot
- Claude-User
- Claude-SearchBot
That split matters because each surface answers a different policy question.
If you collapse them into a single "Anthropic bot" category, you lose the ability to separate:
- training posture;
- search visibility inside Claude’s search workflows;
- user-directed retrieval.
That is exactly the kind of category error Better Robots.txt is designed to reduce.
The short version
Here is the cleanest operational view.
| Surface | What it is for | Main question |
|---|---|---|
| ClaudeBot | Training collection | Do I allow future site content to be included in Anthropic training datasets? |
| Claude-User | User-triggered retrieval | Do I allow Claude to fetch my pages in response to a user request? |
| Claude-SearchBot | Search optimization | Do I want this site indexed to improve visibility and accuracy in Claude’s search-style results? |
That is the minimum safe model.
What ClaudeBot actually controls
ClaudeBot is the Anthropic surface tied to training collection.
Anthropic describes it as the bot that helps improve the utility and safety of its generative AI models by collecting web content that may contribute to training.
That means ClaudeBot is the relevant surface when your question is:
- should future materials from this site be excluded from Anthropic training datasets;
- do we want to refuse model-improvement use;
- do we want to separate search-related visibility from training reuse?
That is the first important lesson:
blocking ClaudeBot is a training decision.
It is not the same thing as blocking Anthropic’s user-directed retrieval or search optimization surfaces.
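In robots.txt terms, that training-only decision can be published as a minimal sketch (the ClaudeBot token is the one Anthropic documents; verify it against Anthropic’s current guidance before relying on it):

```
# Refuse Anthropic training collection only
User-agent: ClaudeBot
Disallow: /
```

Nothing here touches Claude-User or Claude-SearchBot; both continue to see whatever default posture the rest of the file declares.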
What Claude-User actually controls
Claude-User is Anthropic’s user-triggered retrieval surface.
Anthropic explains it plainly: when individuals ask Claude questions, it may access websites using a Claude-User agent.
This matters because it makes the request class obvious.
The visit exists because a user asked for it.
It is not a classic automated crawl.
Anthropic also says that disabling Claude-User prevents their system from retrieving your content in response to a user query, which may reduce your site’s visibility for user-directed web search.
So if the real business question is:
- do we want Claude to be able to fetch our pages on demand for users;
- are we comfortable with user-directed retrieval but not with training;
- why do we still see Anthropic requests even after changing the training posture;
then Claude-User is the surface to examine.
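The opposite posture, refusing user-triggered retrieval while leaving the other surfaces to the rest of the file, can be sketched as follows (expect the visibility cost Anthropic describes):

```
# Refuse user-directed retrieval by Claude
User-agent: Claude-User
Disallow: /
```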
What Claude-SearchBot actually controls
Claude-SearchBot is the Anthropic surface for search optimization.
Anthropic says it navigates the web to improve search result quality for users, analyzing online content to improve the relevance and accuracy of search responses.
That makes it the right control when the question is about:
- search-style discoverability inside Claude;
- indexing for improved answer quality;
- visibility and accuracy in user search results.
Anthropic also states that disabling Claude-SearchBot prevents the system from indexing your content for search optimization, which may reduce your site’s visibility and accuracy in user search results.
So the second key lesson is this:
Claude-SearchBot is not ClaudeBot.
One is about training. The other is about search optimization and visibility.
What makes Anthropic operationally different
Anthropic’s documentation adds two operational details that are very important.
1. Anthropic supports Crawl-delay
Anthropic says its bots aim for minimal disruption and respect Crawl-delay where appropriate.
That matters for sites where the question is not "block or allow everything", but "slow this operator down".
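As a sketch, a Crawl-delay rule sits alongside the allow/deny rules for a given agent. The 10-second value below is an arbitrary illustration, and Anthropic only commits to respecting the directive where appropriate:

```
# Allow user-directed retrieval, but ask for spacing between requests
User-agent: Claude-User
Crawl-delay: 10
```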
2. Anthropic does not currently publish IP ranges
Anthropic also says that opting out should be done through robots.txt, not by blocking IP addresses, because IP-based blocking may not work correctly or persistently guarantee an opt-out and can prevent Anthropic from reading your robots.txt.
It also states that it does not currently publish IP ranges because it uses service provider public IPs.
That is a major practical distinction from operators that publish dedicated crawler IP lists.
It means that for Anthropic, the primary public governance lever remains the published policy surface, not a strong IP-verification workflow.
Which Anthropic control should you use?
Use the following decision path.
Goal: refuse training use
Block ClaudeBot.
That is the surface Anthropic ties to future training datasets.
Goal: stay visible in Claude’s search-style systems
Keep Claude-SearchBot allowed.
If it is blocked, you should expect lower visibility and lower indexing quality in those user-facing search workflows.
Goal: govern user-triggered access
Treat Claude-User separately.
That is the Anthropic surface for user-directed retrieval.
Goal: reduce crawl pressure without a full block
Consider Crawl-delay in addition to your allow/deny posture.
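Put together, the decision path above can be sketched as a single robots.txt policy (the delay value and the explicit Allow lines are illustrative assumptions, not Anthropic requirements):

```
# Refuse training, keep search visibility and user-directed retrieval
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Allow: /
Crawl-delay: 10

User-agent: Claude-SearchBot
Allow: /
Crawl-delay: 10
```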
Three common mistakes
1. Blocking ClaudeBot and assuming all Anthropic traffic is now covered
It is not.
That only addresses the training surface.
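You can see why with Python’s standard-library robots.txt parser. This sketch (the URL and rules are hypothetical) shows that a ClaudeBot-only block leaves the other two Anthropic agents fully allowed:

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks only Anthropic's training surface.
rules = [
    "User-agent: ClaudeBot",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

url = "https://example.com/article"  # hypothetical URL
print(rp.can_fetch("ClaudeBot", url))         # training fetches: refused
print(rp.can_fetch("Claude-User", url))       # user-directed retrieval: still allowed
print(rp.can_fetch("Claude-SearchBot", url))  # search indexing: still allowed
```

Note that robots.txt user-agent matching is per-token: a rule for ClaudeBot says nothing about Claude-User or Claude-SearchBot.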
2. Using IP blocking as the primary opt-out strategy
Anthropic explicitly warns that this may not work correctly or persistently guarantee an opt-out and can prevent the system from reading robots.txt.
3. Forgetting subdomains
Anthropic says you should publish the block in the robots.txt file of every subdomain you want to opt out from.
That is a classic operational omission.
Where Better Robots.txt fits
Better Robots.txt helps most on the part of the problem that is actually publishable policy.
It helps you:
- separate training, search optimization, and user-triggered retrieval;
- publish clearer robots.txt rules;
- keep Anthropic-related decisions from being mixed into one blunt block;
- align those decisions with broader machine governance outputs.
What it cannot do is make Anthropic traffic magically easier to verify than Anthropic publicly documents it.
If identity verification is weak or vendor IP ranges are unpublished, the honest thing is to say so and design policy accordingly.
The correct mental model
The safest Anthropic mental model is this:
- ClaudeBot = training collection
- Claude-User = user-directed retrieval
- Claude-SearchBot = search optimization
The more clearly those are separated, the less likely your policy is to destroy visibility when your real goal was only to refuse training.