Robots.txt vs meta robots vs x-robots-tag: when to use each
WordPress site owners have three distinct mechanisms to control how search engines and other bots handle their content: the robots.txt file, the meta robots tag, and the X-Robots-Tag HTTP header. All three influence crawler behavior, but they operate at different layers, affect different things, and solve different problems.
Choosing the wrong one — or using one when you need another — leads to content that stays indexed when it should not, content that disappears when it should not, or crawl budget wasted on pages that deliver no value.
Robots.txt: controlling access before the crawl
Robots.txt works at the crawl level. When a bot requests your robots.txt file, it reads the rules and decides which URLs it is allowed to fetch. If a URL is disallowed, a well-behaved crawler never requests it.
This is important to understand: robots.txt prevents crawling, not indexing. If a disallowed page has inbound links from other sites, Google may still list the URL in search results — it just will not have crawled the page to know what is on it. The result is a thin, sometimes embarrassing search listing with no snippet and no useful information.
Use robots.txt when: You want to prevent bots from accessing entire sections of your site (admin areas, staging directories, internal search results, low-value parameter pages). It is the right tool for managing crawl budget at scale and for setting category-level policies for different types of bots.
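As a minimal sketch of what this looks like for a WordPress site (the paths are illustrative, not a recommended universal config):

```txt
# robots.txt — applies to all well-behaved crawlers
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
# internal search results
Disallow: /?s=
# staging directory
Disallow: /staging/
```

Note the Allow line: WordPress themes and plugins make front-end requests to admin-ajax.php, so blocking all of /wp-admin/ without that exception can hide functionality from crawlers.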
Do not use robots.txt when: You want to remove a specific page from search results. For that, you need a noindex directive.
Meta robots: controlling indexing at the page level
The meta robots tag lives inside the HTML <head> of individual pages. It tells search engines what to do with a page after they have crawled it. Common values include noindex (do not add this page to the index), nofollow (do not follow links on this page), and noarchive (do not show a cached copy).
This is the correct tool for removing individual pages from search results. Unlike robots.txt, a meta robots noindex directive tells the search engine explicitly: you may crawl this page, but do not show it in results.
Use meta robots when: You want fine-grained, page-level control over indexing. Common cases include thank-you pages, internal search result pages, tag archives with thin content, or pages that exist for logged-in users but should not appear in search.
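As a concrete illustration, a thank-you page that should stay out of search results would carry the tag in its head:

```html
<head>
  <!-- Crawl me, but do not index me or follow my links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```

In practice on WordPress, this tag is usually set per-page through an SEO plugin's indexing controls rather than edited into the template by hand.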
Do not use meta robots when: The page should not be crawled at all. If you use noindex but leave the page accessible in robots.txt, crawlers will still fetch it on every crawl cycle just to read the noindex tag. That is wasted crawl budget.
X-Robots-Tag: controlling indexing at the server level
The X-Robots-Tag is an HTTP response header that carries the same directives as meta robots (noindex, nofollow, noarchive, and others) but is delivered at the server level rather than embedded in HTML.
This distinction matters for one critical reason: X-Robots-Tag works on any file type. PDFs, images, XML files, JSON endpoints — anything served over HTTP can carry an X-Robots-Tag. The meta robots tag only works inside HTML documents.
Use X-Robots-Tag when: You need to control indexing for non-HTML resources (PDFs you want deindexed, images you want excluded, API endpoints that should not appear in search). It is also useful when you cannot modify page templates but have access to server configuration or a plugin that sets HTTP headers.
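For instance, assuming an Apache server with mod_headers enabled, a server-level rule could mark every PDF response as noindex (the pattern is illustrative):

```apache
# .htaccess or vhost config: send X-Robots-Tag on all PDF responses
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

On nginx, the equivalent is an add_header directive inside a location block matching the same file extension.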
Do not use X-Robots-Tag when: A meta robots tag would do the job. Adding HTTP headers requires server-level or plugin-level configuration, which is harder to audit than a tag visible in page source. Use the simplest tool that solves the problem.
The overlap trap
The most common confusion is using robots.txt to try to deindex a page. This does not work. If you block a URL in robots.txt, crawlers cannot access the page — which means they cannot read the noindex tag you put there. The page may remain in Google's index indefinitely as a URL-only listing.
The correct sequence is:
- Add noindex to the page (via meta robots or X-Robots-Tag).
- Wait for crawlers to see the directive and drop the page from the index.
- Only then, if desired, add a robots.txt disallow to stop future crawl attempts.
Reversing this order — blocking in robots.txt first — prevents the noindex from ever being read.
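The trap can be demonstrated with Python's standard-library robots.txt parser. The rules below are hypothetical: they block /thank-you/, the same page that supposedly carries a noindex tag.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows the page we want deindexed.
rules = """
User-agent: *
Disallow: /thank-you/
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A well-behaved crawler checks robots.txt before fetching the page.
allowed = parser.can_fetch("Googlebot", "https://example.com/thank-you/")
print(allowed)  # False: the page is never fetched, so the noindex
                # tag sitting on it is never read.
```

Since can_fetch returns False, the crawler stops before the request, and the noindex directive might as well not exist.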
Where AI crawlers fit
Traditional search engine crawlers respect all three mechanisms with reasonable consistency. AI crawlers introduce a new variable.
Most AI crawlers (GPTBot, ClaudeBot, CCBot) read and respect robots.txt. However, not all of them process meta robots or X-Robots-Tag in the same way search engines do. Some AI systems fetch content for training or retrieval without rendering the page, which means they may never encounter an in-page meta tag.
For AI crawlers specifically, robots.txt is currently the most reliable control layer. It operates before the content is fetched, making it the one mechanism you can count on regardless of how the bot processes the response.
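If the goal is to opt out of these bots entirely, the policy is a per-agent disallow. The user-agent tokens below are the ones named above; a full sitewide block is only one possible policy, and a narrower Disallow path works the same way:

```txt
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```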
Choosing the right tool
The decision tree is straightforward:
- Want to block crawl access for entire sections or specific bot categories? Use robots.txt.
- Want to remove individual HTML pages from search results? Use meta robots with noindex.
- Want to control indexing of non-HTML files? Use X-Robots-Tag.
- Want to control AI crawler access? Start with robots.txt and supplement with specific headers where supported.
Each tool solves a different problem. Using them in combination — not as interchangeable alternatives — gives you precise control over how every type of bot interacts with every type of content on your site.