Robots.txt and JavaScript rendering: why SPA sites have crawl problems

Single-page applications, JavaScript-rendered sites, and headless WordPress setups share a common vulnerability: they depend on JavaScript execution to produce their content. If a crawler does not render JavaScript, it sees an empty page. And if robots.txt prevents a crawler from accessing the JavaScript files, even crawlers that do render JavaScript cannot do their job.

How modern crawlers handle JavaScript

Googlebot renders JavaScript. It uses a headless Chromium browser to execute scripts and process the DOM, then indexes the rendered content. This means that content generated by JavaScript frameworks — React, Vue, Next.js, Nuxt — is generally indexable by Google, provided the rendering completes within a reasonable time.

However, Googlebot's rendering has a key limitation: it operates in two waves. In the first wave, Google fetches the raw HTML. In the second wave, which may happen hours or days later, it renders the JavaScript. This delay means that dynamically generated content is indexed more slowly than static HTML content.

Most AI crawlers do not render JavaScript at all. GPTBot, ClaudeBot, CCBot, and others typically fetch the raw HTML response without executing scripts. If your content is generated client-side, these crawlers see an empty or near-empty page. This is why robots.txt remains the most reliable control layer for AI bots: it operates before content is fetched, independent of rendering capability.
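Because robots.txt is checked before any fetch, it works the same way for non-rendering AI bots as for Googlebot. A minimal sketch of per-bot rules using the user-agent tokens named above (the blanket Disallow is illustrative; scope it to the paths you actually want to withhold):

```
# Each AI crawler gets its own group; robots.txt is read
# before any HTML is fetched, so rendering never enters into it.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
```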

The CSS and JavaScript blocking trap

This is one of the most common robots.txt mistakes, and it hits JavaScript-dependent sites hardest. If your robots.txt blocks paths like /wp-content/themes/, /wp-includes/, or /static/js/, you prevent Googlebot from accessing the script and stylesheet files it needs to render your pages.

The result: Googlebot fetches your HTML, attempts to render it, fails because the required scripts are blocked, and indexes the page based on whatever static HTML exists — which for a JavaScript-rendered site may be nothing meaningful.
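A hypothetical robots.txt showing the trap, using the paths mentioned above (often added with the intent of "hiding internals" rather than controlling indexing):

```
# Problematic: these rules block the very assets
# Googlebot needs in order to render the page.
User-agent: *
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Disallow: /static/js/
```

With rules like these, the rendered result is whatever the static HTML shell contains, which on a client-rendered site may be nothing.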

Google Search Console's URL Inspection tool shows this clearly. The "rendered HTML" view shows what Googlebot actually saw after rendering. If scripts are blocked, the rendered output is missing the content that depends on those scripts.

What to allow in robots.txt for JavaScript sites

For any site that depends on JavaScript rendering, the robots.txt must explicitly allow access to all script and stylesheet resources. The safest approach is:

Do not block any path under /wp-content/, /wp-includes/, /static/, /assets/, or wherever your theme and plugin assets are served. Googlebot needs access to these files to render pages correctly.

If you must restrict access to specific asset directories, use precise paths rather than broad patterns. Block /wp-content/plugins/specific-plugin/admin/ rather than /wp-content/plugins/.
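A sketch of the precise-path approach (the plugin directory name is the illustrative one from the paragraph above; substitute your own):

```
User-agent: *
# Too broad -- this would also block any scripts and styles
# that plugins serve for the front end:
# Disallow: /wp-content/plugins/

# Precise -- blocks only the one directory that needs restricting:
Disallow: /wp-content/plugins/specific-plugin/admin/
```

Everything not matched by a Disallow rule stays crawlable by default, so no explicit Allow lines are needed for the remaining asset paths.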

Test the result using Google Search Console's URL Inspection tool. Compare the rendered HTML against the live page. If they match, your robots.txt is not interfering with rendering.

The meta robots interaction

On JavaScript-rendered sites, the interaction between robots.txt and meta robots becomes more complex. If your noindex directive is injected by JavaScript rather than present in the static HTML, crawlers that do not render JavaScript will never see it.

For AI crawlers specifically, this means a JavaScript-injected noindex tag has no effect. The crawler fetches the raw HTML, does not execute JavaScript, and never encounters the directive. Only robots.txt — which is checked before any HTML is fetched — works reliably for crawlers that do not render JavaScript.
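The point can be demonstrated directly: a crawler that does not execute JavaScript only ever sees directives present in the raw HTML. A minimal stdlib-only sketch (the helper names and sample markup are hypothetical, for illustration):

```python
from html.parser import HTMLParser

class MetaRobotsParser(HTMLParser):
    """Collects the content of <meta name="robots"> tags found in static HTML."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if a.get("name", "").lower() == "robots":
                self.directives.append(a.get("content", "").lower())

def static_noindex(html: str) -> bool:
    """True only if a noindex directive exists in the raw HTML.
    A JavaScript-injected tag never shows up here, which is exactly
    what a non-rendering crawler experiences."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return any("noindex" in d for d in parser.directives)

# A typical client-rendered shell: the noindex would be added later
# by app.js, so the raw HTML contains no directive at all.
spa_shell = '<html><head><script src="/app.js"></script></head><body></body></html>'
static_page = '<html><head><meta name="robots" content="noindex"></head></html>'

print(static_noindex(spa_shell))   # False: directive invisible without rendering
print(static_noindex(static_page)) # True: directive present in raw HTML
```

Running this against your own pages' raw HTML (for example, the response from a plain HTTP fetch with no browser) shows what GPTBot or CCBot would actually see.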

Practical recommendations

For JavaScript-heavy WordPress sites, the robots.txt strategy should account for rendering dependencies. Better Robots.txt handles this through its resource and asset settings, which ensure that script and stylesheet paths remain accessible while still blocking genuinely low-value endpoints.

The principle is: block endpoints, not assets. Admin pages, search results, and cart paths should be blocked. Script bundles, stylesheets, and font files should remain open.
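The principle sketched as a robots.txt fragment (the endpoint paths shown are common WordPress defaults and an illustrative cart path; adjust them to your own setup):

```
User-agent: *
# Endpoints: low-value or duplicate pages, safe to block
Disallow: /wp-admin/
Disallow: /?s=
Disallow: /cart/

# Front-end code still needs this admin endpoint on many WordPress sites
Allow: /wp-admin/admin-ajax.php

# Assets: nothing under /wp-content/ or /wp-includes/ is disallowed,
# so script bundles, stylesheets, and fonts stay open for rendering.
```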