
Robots.txt for WooCommerce: blocking crawl waste without breaking your store

WooCommerce is the most popular e-commerce platform on WordPress, powering millions of online stores. It is also one of the most aggressive generators of low-value URLs. A store with 500 products can easily produce 50,000 crawlable URL variations through attribute filters, sorting options, cart states, and paginated archives.

Without a properly configured robots.txt, search engine crawlers spend the majority of their time fetching pages that have no business being in a search index.

The WooCommerce URL problem

A default WooCommerce installation creates several categories of URLs that should never be crawled:

Cart and checkout pages. Every state of the cart, every step of the checkout, and every variation of the account page generates a unique URL. These are transactional endpoints for logged-in users. No search engine should ever see them.

Faceted navigation and attribute filters. When a customer filters products by size, color, price range, or any other attribute, WooCommerce generates a parameterized URL. A store with 5 filter dimensions and 10 values each can produce hundreds of thousands of filter combinations, each one a thin page with a subset of the same products.

Sorting and ordering parameters. URLs with ?orderby=price or ?orderby=date create duplicates of every product archive page. The content is identical except for the sort order — pure crawl waste.

Add-to-cart URLs. WooCommerce generates ?add-to-cart= links that are functional endpoints, not content pages. Crawlers that follow these links consume resources without gaining anything useful.

Product variation URLs. For variable products, each combination of attributes can generate a distinct URL. A t-shirt available in 5 colors and 4 sizes produces 20 URL variations of essentially the same product.

What to block

A WooCommerce-aware robots.txt should address each category:

The cart, checkout, and account paths are the most straightforward. Disallow: /cart/, Disallow: /checkout/, and Disallow: /my-account/ remove transactional pages from the crawlable surface. These pages have no search value and should always be blocked.
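Assuming the default WooCommerce page slugs (stores that renamed these pages will need the adjusted paths), a minimal fragment for the transactional endpoints looks like this:

```
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
```

Because these are simple path prefixes with no wildcards, every major crawler interprets them identically.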

Faceted navigation parameters require blocking the query patterns that generate filter URLs. The exact syntax depends on your theme and filter plugin, but common patterns include Disallow: /*?filter_ for WooCommerce's layered-navigation filters and Disallow: /*?attribute_pa_ for the attribute query strings WooCommerce appends to variation links.

Sorting parameters can be blocked with Disallow: /*?orderby=. Because that pattern only matches the parameter in first position, pairing it with Disallow: /*&orderby= also catches URLs where orderby follows another parameter. Together they prevent crawlers from following sorting links on archive pages.

Add-to-cart endpoints should be blocked with Disallow: /*?add-to-cart= (and Disallow: /*&add-to-cart= for combined query strings) to prevent crawlers from triggering cart actions.

Feed URLs for product categories and tags generate thin duplicate content. Blocking /product-category/*/feed/ and /product-tag/*/feed/ removes them.
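Putting these rules together, a sketch of the WooCommerce-specific section might look like the following. The filter_ and attribute_pa_ prefixes assume the stock WooCommerce widgets; a third-party filter plugin may use different parameter names, so verify against your store's actual URLs first:

```
User-agent: *
# Transactional pages
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
# Faceted navigation and attribute filters
Disallow: /*?filter_
Disallow: /*&filter_
Disallow: /*?attribute_pa_
# Sorting parameters, in first or subsequent position
Disallow: /*?orderby=
Disallow: /*&orderby=
# Add-to-cart endpoints
Disallow: /*?add-to-cart=
Disallow: /*&add-to-cart=
# Thin feed duplicates
Disallow: /product-category/*/feed/
Disallow: /product-tag/*/feed/
```

Note that the wildcard (*) rules rely on the extended matching that Google, Bing, and most major crawlers support; they are not part of the original robots.txt specification, and simpler bots may ignore them.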

What to keep

The instinct to block aggressively is dangerous in e-commerce. Some URLs that look like crawl waste are actually important for SEO:

Product pages should always remain crawlable. They are the primary content of your store and the pages most likely to appear in search results.

Product category pages serve as landing pages for category-level searches. Blocking /product-category/ would remove your entire category structure from search.

The main shop page is your store's root archive. It should remain accessible.

Product image URLs should remain accessible so your images can appear in Google Image Search, which can be a significant traffic source for visual products.

Paginated product archives (/page/2/, /page/3/) are a judgment call. For large catalogs, blocking pagination reduces crawl waste. For small catalogs, keeping it ensures all products are discoverable through browsing.
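If you decide to block pagination on a large catalog, a pattern like this works, assuming the default /shop/ slug and /page/N/ permalink structure; make sure every product is listed in an XML sitemap first, since paginated archives may be a product's only internal crawl path:

```
Disallow: /shop/page/
Disallow: /product-category/*/page/
```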

The structured data consideration

WooCommerce sites that use product structured data (Product schema, Review schema, Offer schema) depend on crawlers being able to access product pages to read the markup. Blocking any path that contains product structured data means Google cannot validate the schema, which removes your eligibility for rich results.

Before adding any disallow rule, verify that it does not affect pages with structured data that drives rich results in search.
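One way to run that check is Python's built-in urllib.robotparser, pointed at a draft of your rules. A caveat: the stdlib parser implements the original prefix-matching spec and does not understand * wildcards, so this sketch only exercises path-prefix rules; wildcard parameter rules need a tool like Google's open-source robots.txt parser. The example.com URLs below are hypothetical stand-ins for your own product and cart URLs:

```python
from urllib.robotparser import RobotFileParser

# Draft path-prefix rules (a sketch; adjust slugs to your store)
rules = """
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Pages carrying Product schema that must stay crawlable
must_allow = [
    "https://example.com/product/blue-t-shirt/",
    "https://example.com/product-category/shirts/",
]
# Crawl-waste URLs that should be blocked
must_block = [
    "https://example.com/cart/",
    "https://example.com/checkout/",
]

for url in must_allow:
    assert parser.can_fetch("*", url), f"unexpectedly blocked: {url}"
for url in must_block:
    assert not parser.can_fetch("*", url), f"unexpectedly allowed: {url}"
print("draft rules behave as intended")
```

Running this before deploying a new robots.txt catches the worst failure mode: a rule that silently blocks the product pages your rich results depend on.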

Better Robots.txt and WooCommerce

The plugin detects WooCommerce automatically when it is active and adds a dedicated e-commerce optimization module to the settings. This module provides targeted controls for cart paths, checkout paths, faceted navigation, sorting parameters, and feed URLs — each with a clear explanation of what the rule does.

The Essential preset includes conservative WooCommerce protections. The Fortress preset is more aggressive, blocking additional parameter patterns and restricting bot categories that are particularly wasteful for e-commerce sites. The Custom mode lets you toggle each WooCommerce rule individually, with a preview of the generated robots.txt before anything changes.

The goal is not to block everything a crawler might touch. It is to block the specific URL patterns that generate crawl waste while preserving the pages that drive traffic and revenue.