
Bing, noarchive, nocache, and Copilot: how this differs from robots.txt

The Bing ecosystem is one of the best reminders that not every AI-control decision is a crawler-token decision.

Sometimes the correct layer is not a separate User-agent block in robots.txt.

Sometimes the correct layer is a page-level meta instruction.

That is exactly what Microsoft documented when it extended noarchive and nocache for Bing Chat and, more broadly, the Copilot-style answer layer.

The short version

Here is the simplest distinction.

  • robots.txt (crawl access): Should Bingbot or another crawler fetch this URL at all?
  • nocache (limited answer and training use): Can Bing use only the URL, title, and snippet rather than the full content?
  • noarchive (stronger exclusion from AI answer use): Should this content stay out of Bing Chat or Copilot answers and training use while still remaining in search results?
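The layer split can be sketched concretely. First, the crawl-access layer, a robots.txt fragment (the path here is illustrative, not a recommendation):

```text
# robots.txt — governs whether Bing's crawler may fetch the URL at all
User-agent: bingbot
Disallow: /private/
```

And the page-level layer, which governs how already-indexed content may be reused:

```html
<!-- Meta robots — governs answer and training use, not crawl access -->
<meta name="robots" content="nocache">
```

Note that the two snippets answer different questions: the first controls fetching, the second controls reuse of content that is already in the Bing Index.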

That difference is why Robots.txt vs meta robots vs x-robots-tag remains such a foundational article.

What Microsoft documented

Microsoft’s public guidance for Bing Chat introduced a practical extension of standard meta controls.

It explained that publishers could use existing controls to decide how content in the Bing Index should be used in Bing Chat and in training Microsoft’s generative AI foundation models.

The key cases are these.

No special tag

If a page does not use nocache or noarchive, Microsoft says the content may be included in Bing Chat answers and may also be used in training foundation models.

nocache

If a page uses nocache, Microsoft says the content may still be included in Bing Chat answers, but Bing will only display the URL, title, and snippet in the answer.

It also says that for content in the Bing Index labeled nocache, only the URL, title, and snippet may be used in training Microsoft’s generative AI foundation models.
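On a page, this is typically expressed as a standard robots meta element in the document head (a minimal sketch; whether you target `robots` generally or a Bing-specific user agent is a policy choice for your site):

```html
<!-- Page may appear in Bing Chat answers, but only the
     URL, title, and snippet are displayed and used for training -->
<meta name="robots" content="nocache">
```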

noarchive

If a page uses noarchive, Microsoft says the content will not be included in Bing Chat answers, will not be linked in those answers, and will not be used in training Microsoft’s generative AI foundation models.
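The page-level form mirrors the nocache case. For non-HTML resources such as PDFs, the same directive can also be delivered as an HTTP response header via X-Robots-Tag, the header-based counterpart of meta robots mentioned earlier (a sketch; exactly how you set response headers depends on your server):

```html
<!-- Excluded from Bing Chat / Copilot answers and from training;
     still eligible to appear in Bing search results -->
<meta name="robots" content="noarchive">
```

```text
X-Robots-Tag: noarchive
```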

Both together

Microsoft says that if both nocache and noarchive are present, Bing treats the page as nocache.
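In other words, for a page like this (a hedged sketch; directives can also appear as separate meta elements rather than one comma-separated value):

```html
<!-- Per Microsoft's documentation, Bing resolves this combination
     as nocache — the LESS restrictive of the two controls -->
<meta name="robots" content="nocache, noarchive">
```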

That is an operational detail worth knowing because it is not what many teams intuitively expect.

Why this is not the same thing as robots.txt

The control question here is not only:

"May the crawler fetch this URL?"

It is also:

"How may content already in the Bing Index be reused in answer and training contexts?"

That is a different problem.

robots.txt is still relevant when the real issue is crawl access for bingbot or another crawler.

But Bing’s own AI-related documentation makes it clear that some downstream answer and training choices are governed by page-level directives such as noarchive and nocache.

This is exactly why a serious machine-access policy cannot flatten every decision into a single file.

What this means in practice

Use the following decision path.

Goal: control crawl access

Use robots.txt.

That is still the correct layer when the question is whether Bing’s crawler should fetch the URL.
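A minimal robots.txt sketch for that question (the path is illustrative):

```text
# Keeps Bing's crawler out of /drafts/ entirely.
# Says nothing about answer or training use of pages already indexed.
User-agent: bingbot
Disallow: /drafts/
```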

Goal: remain in search results while limiting AI answer use

Use nocache or noarchive, depending on how restrictive you want to be.

Microsoft explicitly says that content tagged nocache or noarchive can still appear in Bing search results.

That is one of the most important practical details.

Goal: stay linkable in AI answers but restrict full-content reuse

nocache is the more limited route.

It allows Bing to use the URL, title, and snippet instead of full content.

Goal: keep the page out of Copilot-style answers entirely

noarchive is the stronger exclusion surface.

Three common mistakes

1. Expecting robots.txt alone to express Bing answer-use policy

That is too narrow.

Bing explicitly documented noarchive and nocache as meaningful controls for Bing Chat and related AI use.

2. Assuming noarchive removes the page from Bing search results

Microsoft says it does not.

The content can still appear in search results.

3. Publishing both nocache and noarchive without understanding precedence

Microsoft says that if both are present, Bing treats the content as nocache.

That is the kind of nuance teams need to know before they assume they have chosen the stricter control.

Where Better Robots.txt fits

Better Robots.txt does not turn page-level Bing meta controls into robots.txt directives.

That would be the wrong layer.

Where it helps is by making the layer distinction explicit:

  • crawl access belongs to robots.txt;
  • some downstream answer-use decisions belong to meta controls;
  • other machine-governance questions may belong to usage signals, llms.txt, or infrastructure.

That clarity is valuable in itself because it prevents category mistakes.

The correct mental model

The safest Bing mental model is this:

  • robots.txt = crawl access
  • nocache = limited answer and training use
  • noarchive = stronger exclusion from Copilot-style answer use and training use
  • Bing search visibility can still remain even when those meta controls are used

The less you mix those layers together, the less likely you are to publish contradictory policy.