Robots.txt for SaaS and web apps: protecting dashboards and API endpoints

SaaS products and web applications create a very specific crawl problem: the same domain often hosts two radically different surfaces.

On one side, you have the public marketing and documentation layer:

  • homepage;
  • features;
  • pricing;
  • blog;
  • help docs;
  • developer docs;
  • changelog pages.

Those surfaces should usually remain open because they are the discovery layer of the product.

On the other side, you have the application surface:

  • dashboards;
  • authenticated account areas;
  • app settings;
  • API endpoints;
  • webhook receivers;
  • OAuth flows;
  • private workspaces;
  • internal admin paths.

Those surfaces should usually not participate in web crawling at all.

That is why robots.txt for SaaS is not mainly a "block everything weird" exercise. It is a boundary design problem. You need a policy that stays open for public acquisition surfaces while staying restrictive for app paths that do not belong in search or in broad machine discovery.
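In file form, that boundary looks something like the following minimal sketch, assuming /app/ and /api/ are the application prefixes (adjust these to your architecture):

  User-agent: *
  Allow: /

  # Application boundary: explicit, narrow blocks
  Disallow: /app/
  Disallow: /api/

Because robots.txt matching uses the most specific (longest) path prefix, the Disallow rules win for anything under /app/ or /api/, while the broad Allow keeps everything else open.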

The core distinction: public documentation vs application internals

The first thing to preserve is the difference between:

  • public surfaces that explain the product;
  • internal surfaces that only exist for logged-in workflows.

This sounds obvious, but it is where many SaaS sites go wrong. Teams often discover crawl issues only after account URLs, dashboards, query-heavy paths, or API endpoints start appearing in logs and coverage reports.

The right pattern is to keep the product layer open and make the application boundary explicit.

This is also why source precedence matters: the site’s public product role should remain primary, while governance and app-boundary rules narrow the internal crawl surface without collapsing everything into one broad block.

What usually belongs in the blocked application layer

A typical SaaS robots.txt strategy blocks several classes of paths.

1. Authenticated dashboards

Examples include:

  • /app/
  • /dashboard/
  • /workspace/
  • /account/
  • /settings/

These paths are not useful public documents. Even when they resolve to login redirects, they still create crawl waste and noisy bot traffic.
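A sketch of the corresponding rules, assuming nothing public lives under these prefixes (if it does, see the classic-mistake section below):

  User-agent: *
  Disallow: /app/
  Disallow: /dashboard/
  Disallow: /workspace/
  Disallow: /account/
  Disallow: /settings/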

2. Authentication and session flows

Examples include:

  • /login/
  • /signin/
  • /signup/
  • /register/
  • /password-reset/
  • OAuth callback paths

These URLs are operational endpoints, not discovery content.
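A sketch for this class. A Disallow prefix without a trailing slash also matches the bare path and query variants like /login?next=..., which is usually what you want here. The OAuth path below is hypothetical; substitute your provider's actual callback route.

  User-agent: *
  Disallow: /login
  Disallow: /signin
  Disallow: /signup
  Disallow: /register
  Disallow: /password-reset
  # Hypothetical OAuth callback prefix; use your real route
  Disallow: /oauth/callback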

3. API and webhook endpoints

Examples include:

  • /api/
  • /graphql
  • /webhooks/
  • /integrations/callback/

These are particularly important because they may respond in unexpected ways to bot traffic, and they can create monitoring noise or rate-limit pressure.
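A sketch of the API-layer rules. /graphql is listed without a trailing slash so that both /graphql and /graphql?query=... are matched by prefix:

  User-agent: *
  Disallow: /api/
  Disallow: /graphql
  Disallow: /webhooks/
  Disallow: /integrations/callback/

One caveat: if human-readable API documentation lives under a path like /api/docs/ (hypothetical here), you need a narrower rule set or an explicit Allow so those docs stay crawlable.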

4. Internal tooling and monitoring paths

Examples include:

  • /admin/
  • /internal/
  • /health/
  • /metrics/
  • /status/

Some of these should be protected at the infrastructure layer, but even when stronger protections exist, robots.txt still helps reduce low-value crawl attempts.
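A sketch for these paths. Keep in mind that robots.txt is itself public, so listing a path here advertises its existence; as noted above, the real protection for admin and monitoring endpoints belongs at the authentication and infrastructure layer.

  User-agent: *
  Disallow: /admin/
  Disallow: /internal/
  Disallow: /health/
  Disallow: /metrics/
  Disallow: /status/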

What usually belongs in the open public layer

The public layer should remain clearly crawlable, because it is the actual acquisition surface.

Typical open SaaS surfaces include:

  • home;
  • features;
  • pricing;
  • comparison pages;
  • product docs;
  • API docs for humans;
  • blog posts;
  • case studies;
  • changelog and release notes.

This is where a lot of SaaS teams make the opposite mistake: they over-block because they are thinking like app operators instead of site publishers. The result is an application that is protected, but a marketing surface that becomes less discoverable than it should be.

The classic SaaS robots.txt mistake

The classic mistake is a blunt block on a large prefix that accidentally includes useful documentation or public landing pages.

Examples:

  • a block on /docs/ because some documentation paths were noisy;
  • a block on /account/ when part of the onboarding docs actually lives below it;
  • a wildcard rule that captures both app URLs and public pricing or integration pages.

This is why a pattern library matters. A SaaS site should not be treated like a brochure site, but it should not be treated like a private app-only surface either.

The right answer is almost always path classification, not blanket fear.
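As a sketch of what classification looks like in practice, compare a blunt block with a narrower one (the /account/... paths here are hypothetical):

  # Too broad: also removes public onboarding docs living under /account/
  Disallow: /account/

  # Narrower: block the app surface, keep the public docs reachable
  Disallow: /account/settings/
  Disallow: /account/billing/
  Allow: /account/docs/

Under standard longest-match rules, the Allow line wins for anything under /account/docs/ even if a broader /account/ block remains elsewhere in the group.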

Why JS-heavy SaaS sites are extra fragile

SaaS sites often rely on JavaScript frameworks, hybrid rendering, or app shells. That creates a second layer of risk.

If you block the wrong assets or app entrypoints, you may:

  • harm the rendering of public docs or marketing pages;
  • confuse crawler interpretation of canonical content;
  • create incomplete fetches for public pages that still matter.

This is why the companion article on robots.txt and JavaScript rendering belongs in the same cluster as this one. You are not only deciding which paths should stay private. You are also deciding which public surfaces still need supporting resources to render and be interpreted correctly.
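A sketch of the pattern, assuming the public pages load their CSS and JavaScript from a /static/ directory (substitute your real build-output path):

  User-agent: *
  Disallow: /app/

  # Keep shared build assets fetchable so public pages render for crawlers;
  # redundant if nothing blocks /static/, but it makes the intent explicit
  Allow: /static/

The practical check is simple: for each public page that matters, confirm that none of its render-critical assets fall under a Disallow prefix.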

How Better Robots helps with SaaS patterns

Better Robots is useful for SaaS because the problem is not only "write a few disallow lines." The real problem is making sure that:

  • the public marketing layer stays open;
  • the app layer stays bounded;
  • AI governance decisions remain separate from product discovery decisions;
  • the final file can be reviewed safely before publishing.

In practice that means:

  • using a preset or pattern as the starting point;
  • then adapting the blocked prefixes for your app architecture;
  • then reviewing the result through the Review & Save step;
  • and only then publishing.

If your goal also includes AI controls, the AI & LLM Governance settings should be treated as a separate policy layer, not collapsed into the same broad app-path block.
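A sketch of that layering, using two real AI crawler user-agents (GPTBot and CCBot) as examples; which bots you list is a policy decision, not a template:

  # Discovery layer: search crawlers stay open, app paths stay blocked
  User-agent: *
  Disallow: /app/
  Disallow: /api/

  # AI/LLM governance layer: a separate, deliberate decision
  User-agent: GPTBot
  Disallow: /

  User-agent: CCBot
  Disallow: /

Keeping the groups separate means you can tighten or relax the AI layer later without touching the rules that protect the application boundary.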

A practical SaaS review checklist

Before publishing a SaaS robots.txt, ask:

  1. Which paths are public acquisition surfaces?
  2. Which paths are authenticated application surfaces?
  3. Which paths are API or webhook endpoints?
  4. Which public docs depend on JS assets or rendering paths that must stay accessible?
  5. Are any broad wildcard rules catching useful pages by accident?
  6. Have we separated app privacy concerns from AI training concerns?

If you cannot answer these cleanly, your file is probably too broad.

What robots.txt cannot do for SaaS security

It is important not to overstate robots.txt.

It can help reduce crawl waste and make intent clearer, but it does not:

  • authenticate users;
  • secure dashboards;
  • protect APIs from malicious traffic;
  • replace access controls;
  • replace network or infrastructure security.

That is why output constraints matter. A good governance answer must not upgrade robots.txt into a full security system. It is a crawl-governance signal, not a substitute for real access control.

FAQ

Should SaaS apps block login pages?

Usually yes. Login and signup pages are operational endpoints, not content pages worth indexing, so blocking them as crawl targets is reasonable.

Should public docs for a SaaS stay open?

Usually yes. Docs, pricing, features, and blog pages are the public discovery layer of the product.

Does blocking /api/ solve API security?

No. It may reduce low-value crawl attempts, but real API protection must happen through authentication, authorization, rate limiting, and infrastructure controls.

Continue with Robots.txt and JavaScript rendering, The 5 most common robots.txt mistakes, Patterns, and Advanced settings.