The 5 most common robots.txt mistakes and how to fix them
A robots.txt file is deceptively simple: a few lines of plain text, no programming language, no compilation step. And yet most WordPress sites ship a robots.txt that does too little, does too much, or actively works against their own SEO goals.
After reviewing hundreds of robots.txt files across WordPress installations of all sizes, five mistakes appear again and again. Each one is easy to make and surprisingly costly to leave in place.
1. Using the default WordPress robots.txt without changes
WordPress generates a minimal robots.txt automatically. It typically contains little more than a User-agent: * directive, a Disallow: /wp-admin/ rule, and an Allow: exception for admin-ajax.php. That is better than nothing, but it is not a crawl policy. It is a placeholder.
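In recent WordPress versions, the virtual file served at /robots.txt looks roughly like this (newer versions also re-allow admin-ajax.php so front-end AJAX requests are not blocked):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```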
The default file says nothing about AI crawlers, nothing about archive bots, nothing about SEO tool scrapers, and nothing about the dozens of internal WordPress paths that generate duplicate or low-value content. Leaving this default in place is like leaving your front door unlocked and assuming only friends will walk in.
The fix: Replace the default with a purposeful robots.txt that reflects what your site actually needs. Better Robots.txt presets exist specifically for this: they give you a working policy in seconds, not a blank starting point.
2. Blocking CSS and JavaScript files
This mistake peaked around 2015, but it still shows up in the wild. Some site owners add Disallow: /wp-content/themes/ or Disallow: /wp-includes/ thinking they are hiding internal resources from crawlers.
The problem is that Googlebot renders pages. It loads CSS and JavaScript to understand layout, content hierarchy, and user experience signals. When you block those files, Google cannot render your page properly. The result is degraded indexing and often lower rankings.
The fix: Never block CSS or JavaScript directories unless you have a specific, documented reason. If you are unsure whether a rule is blocking rendering, use Google Search Console's URL inspection tool to see what Google actually sees.
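If a directory genuinely must be restricted, the safer pattern is a narrow Disallow paired with explicit Allow exceptions for rendering assets. A sketch (the directory name here is a placeholder, and the * wildcard is supported by the major search engines, though not guaranteed by every crawler):

```
User-agent: *
# Restrict the directory, but keep rendering assets crawlable:
Disallow: /private-dir/
Allow: /private-dir/*.css
Allow: /private-dir/*.js
```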
3. Disallowing pages you actually want indexed
This one sounds obvious, but it happens more often than you would expect. A common scenario: someone adds /category/ to their disallow list because they read a blog post about thin content. Six months later, they wonder why their category pages have vanished from search results.
Another frequent case: blocking /page/ to prevent paginated archives from being crawled, without realizing that some themes use /page/ in paths that matter.
The fix: Before adding any Disallow rule, check which URLs match the pattern. Use a crawl tool or simply search site:yourdomain.com inurl:/category/ to see what you would be removing. A Disallow is not a suggestion. It is an instruction that well-behaved crawlers will follow immediately.
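One quick way to test a candidate rule before it goes live is Python's standard-library robots.txt parser. This is a minimal sketch: the rule and URLs below are hypothetical, and you would substitute a sample of your own site's URLs.

```python
from urllib.robotparser import RobotFileParser

# The Disallow rule you are considering adding.
candidate_rules = """
User-agent: *
Disallow: /category/
"""

parser = RobotFileParser()
parser.parse(candidate_rules.splitlines())

# Sample URLs from your site: check which ones the rule would block.
urls = [
    "https://example.com/category/news/",      # prefix match: blocked
    "https://example.com/my-category-guide/",  # no match: still crawlable
]

for url in urls:
    allowed = parser.can_fetch("*", url)
    print(f"{url} -> {'allowed' if allowed else 'blocked'}")
```

Because Disallow matches URL paths by prefix, a rule like /category/ blocks everything underneath it but leaves paths that merely contain the word untouched.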
4. Forgetting the sitemap directive
The Sitemap: directive in robots.txt tells crawlers where to find your XML sitemap. It is not technically a robots.txt rule — it is an extension that all major search engines support. And yet a surprising number of sites omit it entirely.
Without this line, crawlers still discover your sitemap through Search Console, Bing Webmaster Tools, or by guessing /sitemap.xml. But relying on indirect discovery is fragile. The Sitemap: directive is the only place in robots.txt where you actively point crawlers toward content instead of away from it.
The fix: Add a Sitemap: line at the bottom of your robots.txt pointing to your main sitemap URL. If your SEO plugin generates a sitemap index, point to the index, not to individual child sitemaps.
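The directive takes a full absolute URL, not a relative path. The exact filename depends on your SEO plugin (sitemap_index.xml is common but not universal), so the URL below is a placeholder:

```
Sitemap: https://example.com/sitemap_index.xml
```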
5. Treating all bots the same
This is the most consequential mistake in 2026 and the one least discussed in traditional SEO guides.
A single User-agent: * block applies the same rules to Googlebot, GPTBot, CCBot, Bytespider, AhrefsBot, and every other crawler that reads your file. But these bots have fundamentally different purposes. Google indexes your pages for search. GPTBot may use your content for training or retrieval. Archive bots preserve snapshots. SEO tools scrape competitive data. Treating them all identically means you cannot set different boundaries for different uses.
The fix: Use specific User-agent blocks for categories that matter to your site. At minimum, separate search engines, AI crawlers, and known bad bots into distinct groups with distinct rules. This is the core principle behind Better Robots.txt's preset system: each category of bot gets a policy that matches its behavior and your intent.
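As one possible sketch of that grouping: the bot names below are real crawler tokens, but the policy choices are illustrative, not a recommendation for every site. Grouping several User-agent lines over one rule set is valid per the Robots Exclusion Protocol (RFC 9309):

```
# Search engines: full access
User-agent: Googlebot
User-agent: Bingbot
Disallow:

# AI crawlers: blocked (adjust to your own policy)
User-agent: GPTBot
User-agent: CCBot
Disallow: /

# Everyone else: default policy
User-agent: *
Disallow: /wp-admin/
```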
The pattern behind these mistakes
All five mistakes share a common root: they treat robots.txt as a set-and-forget file instead of an active policy document. Your site changes. The crawler ecosystem changes. The expectations around AI usage, archiving, and data extraction change. A robots.txt that was adequate two years ago may be actively harmful today.
The best practice is to review your robots.txt at least quarterly, ideally with a tool that shows you what each rule does before it goes live. That is exactly what the review step in Better Robots.txt is designed for: you see the final output, line by line, before anything changes on your site.