How to audit your robots.txt in 5 minutes
Most site owners have no idea what their robots.txt currently says. They set it up once — or never touched it at all — and assume it is working correctly. A five-minute audit can reveal whether your file is helping your SEO, doing nothing useful, or actively causing damage.
Here is the checklist.
Step 1: verify the file exists and is served correctly (30 seconds)
Open your browser and navigate to yourdomain.com/robots.txt. You should see a plain text file with recognizable robots.txt directives: User-agent, Disallow, Allow, Sitemap.
If you see a 404 error, your site has no robots.txt. This means every crawler has full access to everything, which may or may not be what you want.
If you see your homepage HTML instead of a text file, either WordPress is not generating its virtual robots.txt or a plugin or server configuration is intercepting the request. Either way, this needs investigation.
If you see the file but it contains only the WordPress default (User-agent: *, Disallow: /wp-admin/, and Allow: /wp-admin/admin-ajax.php), your crawl policy is effectively nonexistent.
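The triage above can be sketched as a small helper. This is an illustrative classifier, not an official check; the labels and the "default-only" rule set are assumptions based on the cases described in this step.

```python
# Sketch of the Step 1 triage: given the response to a GET of
# yourdomain.com/robots.txt, return a rough diagnosis.
# Labels and the WordPress-default rule set are illustrative.
def classify_robots_response(status: int, content_type: str, body: str) -> str:
    if status == 404:
        return "missing: every crawler has full access"
    # HTML instead of plain text means something intercepted the request.
    if "html" in content_type.lower() or body.lstrip().startswith("<"):
        return "intercepted: something is serving HTML instead of the file"
    directives = [l.strip() for l in body.splitlines()
                  if l.strip() and not l.strip().startswith("#")]
    wp_default = {"user-agent: *", "disallow: /wp-admin/",
                  "allow: /wp-admin/admin-ajax.php"}
    # Nothing beyond the WordPress default means no real crawl policy.
    if {l.lower() for l in directives} <= wp_default:
        return "default-only: crawl policy is effectively nonexistent"
    return "ok: custom directives present"
```

Feed it the status code, Content-Type header, and body from any HTTP client to get a quick verdict.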
Step 2: check for accidental blocks on important content (60 seconds)
Read through every Disallow rule and ask: does this pattern match any page I want search engines to index?
Common accidents include:
- A Disallow for /category/ that removes all category archives from search.
- A Disallow for /page/ that matches paginated archives but also matches legitimate pages with "page" in the URL.
- A Disallow for /wp-content/ that blocks the CSS and JavaScript files crawlers need to render your pages.
- A trailing slash mismatch. Disallow rules are prefix matches: Disallow: /blog blocks /blog, /blog/, /blog/anything, and even /blogroll, while Disallow: /blog/ only blocks paths under /blog/.
If you are unsure whether a rule is blocking something important, check the robots.txt report in Search Console (Google retired its standalone robots.txt tester in 2023) or simply search site:yourdomain.com inurl:pattern to see which pages match the pattern.
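You can also test rules locally with Python's standard-library robots.txt parser instead of guessing. A minimal sketch; example.com and the paths are placeholders:

```python
# Test whether a path is allowed under a given robots.txt body,
# using Python's stdlib parser.
from urllib.robotparser import RobotFileParser

def is_allowed(rules: str, path: str, agent: str = "*") -> bool:
    """True if `agent` may fetch `path` under these robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, "https://example.com" + path)

# The trailing-slash difference from the checklist:
no_slash   = "User-agent: *\nDisallow: /blog"    # prefix match
with_slash = "User-agent: *\nDisallow: /blog/"   # only paths under /blog/

is_allowed(no_slash, "/blogroll")     # blocked: "/blogroll" starts with "/blog"
is_allowed(with_slash, "/blogroll")   # allowed
is_allowed(with_slash, "/blog")       # allowed: does not start with "/blog/"
is_allowed(with_slash, "/blog/post")  # blocked
```

Note that real crawlers may implement precedence details slightly differently than the stdlib parser, so treat this as a sanity check, not a guarantee.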
Step 3: check for AI crawler rules (30 seconds)
Search the file for any of these user-agent strings: GPTBot, ClaudeBot, CCBot, Google-Extended, Bytespider, Applebot-Extended, PerplexityBot, FacebookBot, Amazonbot.
If none of these appear, your robots.txt makes no distinction between search engine crawlers and AI crawlers. Every AI bot has the same access as Googlebot. This is a deliberate choice if you intend it, but an oversight if you do not.
If some appear but not others, check whether the missing ones matter to your site. New AI crawlers appear regularly, and a file last updated in 2024 is likely missing agents that are active in 2026.
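If you decide to restrict AI crawlers, the rules look like this. An illustrative fragment only: verify the current list of agent names before deploying, since it changes regularly.

```
# Illustrative: block common AI training crawlers while leaving
# search crawlers untouched. Check agent names before deploying.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Because these are separate User-agent groups, your existing User-agent: * rules continue to apply to every crawler not named here.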
Step 4: check the sitemap directive (15 seconds)
Look for a line starting with Sitemap: followed by a URL. If it is missing, add one. If it is present, verify that the URL actually works by opening it in your browser. A sitemap directive pointing to a 404 is worse than no directive at all, because it explicitly sends crawlers to a dead end.
If your SEO plugin generates a sitemap index (like /sitemap_index.xml), the directive should point to the index, not to individual child sitemaps.
Step 5: check for contradictions (60 seconds)
Three common contradictions to look for:
Disallowed URLs that appear in your sitemap. Open your sitemap and search for any path that is also disallowed in robots.txt. If you find matches, you are simultaneously inviting and blocking crawlers from the same content.
A Disallow: / rule under User-agent: * that blocks everything. This is occasionally set during development and forgotten. It removes your entire site from all search engine crawl access.
Multiple rules for the same user-agent that conflict with each other. An Allow for /blog/ combined with a Disallow for /blog/category/ is fine: the longer, more specific rule wins. But an Allow for / combined with a Disallow for / on the same agent is a genuine tie; Google resolves ties in favor of Allow, but other crawlers may not, so remove the contradiction rather than rely on parser behavior.
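The first contradiction, sitemap URLs that robots.txt also disallows, can be checked mechanically. A sketch using the stdlib parser; the agent name and URLs are placeholders:

```python
# Find sitemap URLs that robots.txt simultaneously disallows,
# i.e. content you are inviting and blocking at the same time.
from urllib.robotparser import RobotFileParser

def blocked_sitemap_urls(robots_txt: str, sitemap_urls: list[str],
                         agent: str = "Googlebot") -> list[str]:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in sitemap_urls if not parser.can_fetch(agent, u)]
```

Run it with your robots.txt body and the URL list from your sitemap; any URL it returns is a contradiction to resolve.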
Step 6: check freshness (15 seconds)
When was your robots.txt last updated? If your site has changed significantly — new content types, new plugins, a WooCommerce store added, a migration from HTTP to HTTPS — your robots.txt should reflect those changes.
A robots.txt written for a 50-page brochure site does not serve a 5,000-page content site. A robots.txt that predates the AI crawler era does not address GPTBot, ClaudeBot, or any of the agents that have emerged since 2023.
What to do with the results
If your audit revealed problems, you have two paths. You can manually edit the file and redeploy, which requires understanding robots.txt syntax and testing each change. Or you can use a tool like Better Robots.txt that generates the file from a guided configuration, with a preview step that shows exactly what will change before it goes live.
Either way, the five minutes you spent reading your current file are the most valuable five minutes in your technical SEO maintenance cycle. A misconfigured robots.txt can silently damage your indexing, your crawl efficiency, and your AI governance posture — and the only way to catch it is to actually look.