Claude-User vs ClaudeBot vs Claude-SearchBot: which Anthropic control does what
Anthropic’s crawler documentation became much more useful the moment it stopped behaving like a flat bot list.
It now makes a three-way split explicit:
- ClaudeBot
- Claude-User
- Claude-SearchBot
That split matters because each surface answers a different policy question.
If you collapse them into a single "Anthropic bot" category, you lose the ability to separate:
- training posture;
- search visibility inside Claude’s search workflows;
- user-directed retrieval.
That is exactly the kind of category error Better Robots.txt is designed to reduce.
The short version
Here is the cleanest operational view.
| Surface | What it is for | Main question |
|---|---|---|
| ClaudeBot | Training collection | Do I allow future site content to be included in Anthropic training datasets? |
| Claude-User | User-triggered retrieval | Do I allow Claude to fetch my pages in response to a user request? |
| Claude-SearchBot | Search optimization | Do I want this site indexed to improve visibility and accuracy in Claude’s search-style results? |
That is the minimum safe model.
What ClaudeBot actually controls
ClaudeBot is the Anthropic surface tied to training collection.
Anthropic describes it as the bot that helps improve the utility and safety of its generative AI models by collecting web content that may contribute to training.
That means ClaudeBot is the relevant surface when your question is:
- should future materials from this site be excluded from Anthropic training datasets;
- do we want to refuse model-improvement use;
- do we want to separate search-related visibility from training reuse?
That is the first important lesson:
blocking ClaudeBot is a training decision.
It is not the same thing as blocking Anthropic’s user-directed retrieval or search optimization surfaces.
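In robots.txt terms, that training-only decision can be published as a minimal sketch (the ClaudeBot token is the one Anthropic documents; verify it against Anthropic’s current guidance before relying on it):

```
# Refuse Anthropic training collection only
User-agent: ClaudeBot
Disallow: /
```

Nothing here touches Claude-User or Claude-SearchBot; both continue to see whatever default posture the rest of the file declares.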
What Claude-User actually controls
Claude-User is Anthropic’s user-triggered retrieval surface.
Anthropic explains it plainly: when individuals ask Claude questions, it may access websites using a Claude-User agent.
This matters because it makes the request class obvious.
The visit exists because a user asked for it.
It is not a classic automated crawl.
Anthropic also says that disabling Claude-User prevents their system from retrieving your content in response to a user query, which may reduce your site’s visibility for user-directed web search.
So if the real business question is:
- do we want Claude to be able to fetch our pages on demand for users;
- are we comfortable with user-directed retrieval but not with training;
- why do we still see Anthropic requests even after changing the training posture;
then Claude-User is the surface to examine.
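The opposite posture, refusing user-triggered retrieval while leaving the other surfaces to the rest of the file, can be sketched as follows (expect the visibility cost Anthropic describes):

```
# Refuse user-directed retrieval by Claude
User-agent: Claude-User
Disallow: /
```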
What Claude-SearchBot actually controls
Claude-SearchBot is the Anthropic surface for search optimization.
Anthropic says it navigates the web to improve search result quality for users, analyzing online content to improve the relevance and accuracy of search responses.
That makes it the right control when the question is about:
- search-style discoverability inside Claude;
- indexing for improved answer quality;
- visibility and accuracy in user search results.
Anthropic also states that disabling Claude-SearchBot prevents the system from indexing your content for search optimization, which may reduce your site’s visibility and accuracy in user search results.
So the second key lesson is this:
Claude-SearchBot is not ClaudeBot.
One is about training. The other is about search optimization and visibility.
What makes Anthropic operationally different
Anthropic’s documentation adds two operational details that are very important.
1. Anthropic supports Crawl-delay
Anthropic says its bots aim for minimal disruption and respect Crawl-delay where appropriate.
That matters for sites where the question is not "block or allow everything", but "slow this operator down".
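As a sketch, a Crawl-delay rule sits alongside the allow/deny rules for a given agent. The 10-second value below is an arbitrary illustration, and Anthropic only commits to respecting the directive where appropriate:

```
# Allow user-directed retrieval, but ask for spacing between requests
User-agent: Claude-User
Crawl-delay: 10
```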
2. Anthropic does not currently publish IP ranges
Anthropic also says that opting out should be done through robots.txt, not by blocking IP addresses, because IP-based blocking may not work correctly or persistently guarantee an opt-out and can prevent Anthropic from reading your robots.txt.
It also states that it does not currently publish IP ranges because it uses service provider public IPs.
That is a major practical distinction from operators that publish dedicated crawler IP lists.
It means that for Anthropic, the primary public governance lever remains the published policy surface, not a strong IP-verification workflow.
Which Anthropic control should you use?
Use the following decision path.
Goal: refuse training use
Block ClaudeBot.
That is the surface Anthropic ties to future training datasets.
Goal: stay visible in Claude’s search-style systems
Keep Claude-SearchBot allowed.
If it is blocked, you should expect lower visibility and lower indexing quality in those user-facing search workflows.
Goal: govern user-triggered access
Treat Claude-User separately.
That is the Anthropic surface for user-directed retrieval.
Goal: reduce crawl pressure without a full block
Consider Crawl-delay in addition to your allow/deny posture.
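Put together, the decision path above can be sketched as a single robots.txt policy (the delay value and the explicit Allow lines are illustrative assumptions, not Anthropic requirements):

```
# Refuse training, keep search visibility and user-directed retrieval
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Allow: /
Crawl-delay: 10

User-agent: Claude-SearchBot
Allow: /
Crawl-delay: 10
```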
Three common mistakes
1. Blocking ClaudeBot and assuming all Anthropic traffic is now covered
It is not.
That only addresses the training surface.
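You can see why with Python’s standard-library robots.txt parser. This sketch (the URL and rules are hypothetical) shows that a ClaudeBot-only block leaves the other two Anthropic agents fully allowed:

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks only Anthropic's training surface.
rules = [
    "User-agent: ClaudeBot",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

url = "https://example.com/article"  # hypothetical URL
print(rp.can_fetch("ClaudeBot", url))         # training fetches: refused
print(rp.can_fetch("Claude-User", url))       # user-directed retrieval: still allowed
print(rp.can_fetch("Claude-SearchBot", url))  # search indexing: still allowed
```

Note that robots.txt user-agent matching is per-token: a rule for ClaudeBot says nothing about Claude-User or Claude-SearchBot.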
2. Using IP blocking as the primary opt-out strategy
Anthropic explicitly warns that this may not work correctly or persistently guarantee an opt-out and can prevent the system from reading robots.txt.
3. Forgetting subdomains
Anthropic says you should publish the block in the robots.txt file of every subdomain you want to opt out from.
That is a classic operational omission.
Where Better Robots.txt fits
Better Robots.txt helps most on the part of the problem that is actually publishable policy.
It helps you:
- separate training, search optimization, and user-triggered retrieval;
- publish clearer robots.txt rules;
- keep Anthropic-related decisions from being mixed into one blunt block;
- align those decisions with broader machine governance outputs.
What it cannot do is make Anthropic traffic magically easier to verify than Anthropic publicly documents it.
If identity verification is weak or vendor IP ranges are unpublished, the honest thing is to say so and design policy accordingly.
The correct mental model
The safest Anthropic mental model is this:
- ClaudeBot = training collection
- Claude-User = user-directed retrieval
- Claude-SearchBot = search optimization
The more clearly those are separated, the less likely your policy is to destroy visibility when your real goal was only to refuse training.