ChatGPT-User vs GPTBot vs OAI-SearchBot: which OpenAI control does what
One of the easiest mistakes to make in modern machine-access governance is treating all OpenAI traffic as one thing.
It is not.
A practical OpenAI model now needs at least four distinct surfaces in view:
- OAI-SearchBot
- GPTBot
- ChatGPT-User
- ChatGPT agent
If those are collapsed into one mental bucket, teams start asking the wrong question.
They say "Should we block OpenAI?" when the real questions are much more specific:
- do we want visibility in ChatGPT search;
- do we want to refuse training;
- do we want to govern user-triggered visits;
- do we need verified or allowlisted agent access at the edge?
That is exactly why "Why robots.txt is not enough for user-triggered AI agents" matters as a pillar, not as an edge case.
The short version
Here is the cleanest way to separate the main OpenAI surfaces.
| Surface | What it is for | Main question |
|---|---|---|
| OAI-SearchBot | Search visibility in ChatGPT search features | Do I want this site surfaced in ChatGPT search answers? |
| GPTBot | Training collection | Do I allow this content to be used in training generative AI foundation models? |
| ChatGPT-User | User-triggered visits from ChatGPT or Custom GPTs | Do I want this kind of on-demand retrieval, knowing it is not normal automatic crawl? |
| ChatGPT agent | Signed and allowlistable agent traffic | Is this now an edge, identity, or runtime-permissions problem rather than a plain crawl-policy problem? |
That distinction is the whole article.
What OAI-SearchBot actually controls
OAI-SearchBot is the OpenAI surface that matters for search visibility.
Its job is not training. Its job is to surface websites in ChatGPT search features.
That means it is the OpenAI control you care about when the main business goal is:
- appearing in ChatGPT search answers;
- being surfaced, cited, and linked in ChatGPT search features;
- keeping search-oriented discoverability open.
This is also where a lot of teams accidentally damage themselves.
If you block OAI-SearchBot, your content is not supposed to be shown in ChatGPT search answers. OpenAI also says the site can still appear as a navigational link, which is a much narrower outcome than being properly surfaced in search answers.
So the first rule is simple:
If your concern is discoverability in ChatGPT search, think OAI-SearchBot, not GPTBot.
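As a minimal sketch, keeping that discoverability open looks like the stanza below. The user agent token matches OpenAI's published documentation; the blanket Allow is illustrative rather than a recommendation for every site.

```
# Keep ChatGPT search surfacing open (illustrative policy)
User-agent: OAI-SearchBot
Allow: /
```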
What GPTBot actually controls
GPTBot is the OpenAI surface for training collection.
Its role is to crawl content that may be used in training OpenAI’s generative AI foundation models.
That means GPTBot is the right control when your real question is:
- can this content be used for training;
- do we want to refuse future model-improvement use;
- do we want to separate search visibility from training reuse?
This is the key nuance:
blocking GPTBot is a training posture.
It is not the same thing as opting out of ChatGPT search visibility.
A site can allow OAI-SearchBot while disallowing GPTBot. In fact, OpenAI explicitly documents this as a normal choice.
That is a good example of why flat "allow OpenAI / block OpenAI" thinking is too crude to be useful.
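A hedged sketch of that documented split is below. The agent tokens are OpenAI's published ones; the all-or-nothing paths are examples, and each site should scope its own rules.

```
# Opt out of training collection
User-agent: GPTBot
Disallow: /

# Keep ChatGPT search surfacing open
User-agent: OAI-SearchBot
Allow: /
```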
What ChatGPT-User changes
ChatGPT-User introduces a different class of request.
It is used for certain user actions in ChatGPT and Custom GPTs. When a user asks ChatGPT a question, it may visit a web page using the ChatGPT-User user agent. OpenAI also notes that users may interact with external applications through GPT Actions.
This matters because ChatGPT-User is not described as automatic web crawling.
It is user-triggered access.
That is why OpenAI says robots.txt rules may not apply to these requests.
This is also why ChatGPT-User should not be treated as a Search control. OpenAI explicitly says it is not used to determine whether content may appear in Search. That remains an OAI-SearchBot question.
So if your team is asking:
- why did ChatGPT visit this page on demand;
- why do we still see some retrieval behavior even though training was blocked;
- why is Search visibility still separate from user-triggered visits;
the answer is usually that you are looking at ChatGPT-User, not a classic crawler.
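If you still want to publish a stated preference for ChatGPT-User, a robots.txt stanza can carry it, but, as noted above, OpenAI says these rules may not apply to user-triggered requests, so treat it as a signal rather than an enforcement mechanism. The token below is OpenAI's documented one; the path is purely illustrative.

```
# Stated preference only: may not apply to user-triggered requests,
# and has no bearing on ChatGPT search inclusion (that is OAI-SearchBot)
User-agent: ChatGPT-User
Disallow: /private/   # illustrative path
```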
Where ChatGPT agent fits
There is another OpenAI surface that is easy to miss if you stay trapped inside the robots.txt frame.
OpenAI also documents ChatGPT agent as a signed and allowlistable agent in major edge ecosystems such as Akamai, Cloudflare, and HUMAN.
That means some OpenAI-related traffic should not be modeled primarily as a plain public crawler rule.
It should be modeled as a verified identity and infrastructure policy problem.
That kind of control lives in places like:
- CDN allowlisting;
- WAF rules;
- trusted agent directories;
- runtime permissions and session-aware policy.
This is exactly why "Robots.txt vs signed agent allowlisting" belongs in the same cluster as the OpenAI crawler breakdown.
Which OpenAI control should you use?
Use the following decision path.
Goal: appear in ChatGPT search features
Keep OAI-SearchBot allowed.
If your site is blocked there, you should not expect normal inclusion in ChatGPT search answers.
Goal: refuse training while keeping search visibility
Disallow GPTBot, but do not block OAI-SearchBot unless you also intend to lose normal ChatGPT search surfacing.
Goal: reason clearly about user-triggered visits
Treat ChatGPT-User as a separate class from search and training.
Do not assume that blocking a training crawler answers the user-triggered retrieval question.
Goal: allow a trusted agent through controlled infrastructure
That is not mainly a robots.txt problem.
Treat it as an edge verification and allowlisting problem.
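Putting the robots.txt-addressable parts of that decision path together, a composite policy might look like the sketch below. The agent tokens are OpenAI's documented ones, the specific choices and paths are examples only, and the ChatGPT agent case is deliberately absent because it lives at the edge, not in robots.txt.

```
# Goal: appear in ChatGPT search features
User-agent: OAI-SearchBot
Allow: /

# Goal: refuse training while keeping search visibility
User-agent: GPTBot
Disallow: /

# Goal: publish a preference for user-triggered visits
# (may not apply to these requests; not a Search control)
User-agent: ChatGPT-User
Disallow: /drafts/   # illustrative path

# ChatGPT agent: handled via edge verification and allowlisting, not here
```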
Three common mistakes
1. Blocking GPTBot and thinking Search visibility is gone
It is not the Search control.
If the real issue is ChatGPT search visibility, OAI-SearchBot is the relevant surface.
2. Blocking OAI-SearchBot and then wondering why citation visibility drops
That is the expected outcome.
Search surfacing and training posture are not the same thing.
3. Treating ChatGPT-User and ChatGPT agent as the same operational class
They are not.
ChatGPT-User is about user-triggered visits. ChatGPT agent belongs to a signed or allowlistable agent model at the edge.
Where Better Robots.txt fits
Better Robots.txt helps on the publishing and governance side of this problem.
It helps site owners:
- separate OAI-SearchBot from GPTBot;
- avoid mixing discoverability decisions with training decisions;
- coordinate robots.txt with AI usage signals and llms.txt;
- keep the published policy clearer across several machine-readable surfaces.
What it does not do is turn signed-agent access into a simple checkbox problem.
When the real question becomes verified identity, allowlisting, runtime permissions, or edge enforcement, you are outside the core role of the plugin and into infrastructure territory.
The correct mental model
The safest OpenAI mental model now is this:
- OAI-SearchBot = search visibility
- GPTBot = training collection
- ChatGPT-User = user-triggered retrieval
- ChatGPT agent = signed or allowlistable agent traffic
The clearer that separation is in your policy, the less likely you are to block the wrong thing for the wrong reason.