
Robots.txt for SaaS and web applications: protecting dashboards and API endpoints

SaaS products and web applications have a different crawl surface than content-driven websites. The marketing site — landing pages, pricing, documentation, blog — should be crawled and indexed. But the application itself — user dashboards, admin panels, API endpoints, OAuth flows, webhooks — should be invisible to crawlers entirely.

The challenge is that both live on the same domain. A single robots.txt must express a policy that is open for the public-facing content and restrictive for the application internals.

What to block

SaaS sites typically need to block several categories of application paths:

Authentication endpoints. Login pages, registration flows, password reset forms, and OAuth callback URLs are functional endpoints, not content pages. They should never appear in search results and should not consume crawl budget.
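As a sketch, the authentication surface can be fenced off with a handful of Disallow rules. The exact paths here (/login, /register, /password-reset, /oauth/) are assumptions; match them to your application's actual routes:

```
User-agent: *
# Functional auth endpoints — not content pages
Disallow: /login
Disallow: /register
Disallow: /password-reset
Disallow: /oauth/
```

Because robots.txt rules are prefix matches, Disallow: /login also covers variants like /login?next=/dashboard.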

User dashboards and account areas. Paths like /app/, /dashboard/, /account/, /settings/, or /admin/ serve the logged-in application experience. Even though these paths typically require authentication to render meaningful content, crawlers may still attempt to fetch them and receive login pages or error responses — thin pages that can end up indexed.

API endpoints. REST APIs at /api/, GraphQL endpoints, webhook receivers, and other programmatic interfaces are not web pages. Crawling them wastes resources and can produce unintended side effects if any endpoint changes state in response to a GET request.
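A corresponding sketch for the programmatic surface (the endpoint paths are assumptions):

```
User-agent: *
# Programmatic interfaces — never meant for crawlers
Disallow: /api/
Disallow: /graphql
Disallow: /webhooks/
```

Note that robots.txt only deters well-behaved crawlers; any endpoint that mutates state must still be protected by authentication regardless of what robots.txt says.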

Internal tooling paths. Health check endpoints (/healthz), metrics endpoints (/metrics), internal admin tools, and development-only paths should be blocked.
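A minimal sketch for the internal-tooling category:

```
User-agent: *
# Operational endpoints — no crawl value
Disallow: /healthz
Disallow: /metrics
```

Ideally these endpoints are also restricted at the network level, with the robots.txt entry as a crawl-budget courtesy rather than a security measure.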

Session and state parameters. URLs containing session tokens, CSRF tokens, or user-specific state parameters create an infinite crawl space that no robot should enter.
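Google and Bing both support the * wildcard (an extension beyond the original robots.txt standard), which makes it possible to block state-bearing query parameters wherever they appear in a URL. The parameter names below are illustrative:

```
User-agent: *
# Block session/state tokens in any query string (hypothetical parameter names)
Disallow: /*?session=
Disallow: /*&session=
Disallow: /*?token=
Disallow: /*&token=
```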

What to keep open

The marketing and documentation surface should remain fully crawlable:

The landing page, features page, pricing page, and other conversion-oriented content are the primary SEO surfaces. These should never be blocked.

Documentation pages, FAQs, and knowledge base articles drive organic traffic and should be indexed.

Blog content and use cases provide the informational layer that captures top-of-funnel search traffic.

API documentation (as distinct from API endpoints) is content that developers search for and should be indexed.
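If the API reference happens to live under a blocked prefix, an Allow rule can carve it back out — Google resolves conflicting rules by picking the most specific (longest) match. Assuming a hypothetical layout where the docs sit at /api/docs/:

```
User-agent: *
Disallow: /api/
# Longer match wins, so the docs stay crawlable
Allow: /api/docs/
```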

The pattern

The practical pattern for most SaaS robots.txt files is a targeted denylist inside a default-open policy:

Block specific application paths explicitly (Disallow: /app/, Disallow: /api/, Disallow: /dashboard/). Keep the marketing site open by not blocking its paths. Add AI crawler rules for the categories that matter to your content strategy.
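Putting those three steps together, a minimal SaaS robots.txt might look like the sketch below. The application paths, the sitemap URL, and the choice of GPTBot as the example AI crawler are all assumptions to adapt:

```
# Default policy: marketing site open, application walled off
User-agent: *
Disallow: /app/
Disallow: /dashboard/
Disallow: /api/
Disallow: /admin/

# Example AI crawler rule — adjust to your content strategy
User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```

Remember that robots.txt is advisory: it shapes crawler behavior, but it is not an access control for the paths it lists.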

This differs from the approach on WordPress content sites, where the goal is to clean up low-value content paths. On SaaS sites, the goal is to wall off the application while keeping the marketing surface wide open.

WordPress-based SaaS

Some SaaS products use WordPress for their public-facing site while running the application on a separate stack or subdomain. In this case, the WordPress robots.txt only needs to address the marketing site. If the application runs on the same domain under a prefix like /app/, Better Robots.txt can handle both the WordPress-standard protections and the application-specific blocks through the advanced settings.
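In that same-domain setup, the combined file might look something like this sketch (the /app/ prefix and the specific WordPress rules are assumptions; Better Robots.txt generates its own equivalents through its settings):

```
User-agent: *
# WordPress-standard protections
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
# Application running under the same domain
Disallow: /app/
```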