robots.txt

The file at /robots.txt that tells crawlers which URLs they may and may not request. Convention, not enforcement.

`robots.txt` is a plain-text file served from the root of a host (for example `https://example.com/robots.txt`) that lists which URL paths well-behaved crawlers should avoid. Compliance is voluntary: search engines and most legitimate bots respect it; malicious scrapers often don’t. Rules are grouped under a `User-agent` line (which crawler the group applies to), followed by `Allow` or `Disallow` directives that match URL paths by prefix.
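To make the `User-agent` / `Allow` / `Disallow` pairing concrete, here is a minimal sketch of how a polite crawler might check a URL before requesting it, using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` name, and the `example.com` URLs are made up for illustration. The `Allow` line is placed before the broader `Disallow` so that parsers using first-match and parsers using longest-match agree on the result.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/help
Disallow: /admin

User-agent: ExampleBot
Disallow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules let `user_agent` request `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch("SomeBot", "https://example.com/blog/post"))     # True
print(may_fetch("SomeBot", "https://example.com/admin/users"))   # False
print(may_fetch("SomeBot", "https://example.com/admin/help"))    # True
print(may_fetch("ExampleBot", "https://example.com/blog/post"))  # False
```

Nothing in this check is enforced by the server: a crawler that skips the call to `may_fetch` can still request every one of these URLs.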

The usual minimum is to disallow auth-gated paths (`/admin`, `/dashboard`, `/account`), API routes, and any duplicate or low-value URLs. The file should also include a `Sitemap:` line giving the absolute URL of the XML sitemap, so crawlers find it without having to guess.
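As a rough sketch of that minimum, the helper below renders a `robots.txt` body from a list of paths and a sitemap URL. The function name `build_robots_txt`, the `/api/` path, and the `example.com` sitemap URL are assumptions for illustration; a real site would substitute its own paths.

```python
def build_robots_txt(disallowed_paths: list[str], sitemap_url: str) -> str:
    """Render a minimal robots.txt: one group for all crawlers plus a sitemap."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    # The Sitemap line is independent of any User-agent group, so it is set
    # apart by a blank line. It should be an absolute URL.
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


print(build_robots_txt(
    ["/admin", "/dashboard", "/account", "/api/"],  # hypothetical paths
    "https://example.com/sitemap.xml",              # hypothetical sitemap URL
))
```

Generating the file from the same route list the application uses is one way to keep it from drifting out of date, though a hand-written static file works just as well for small sites.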

A common mistake is using `robots.txt` to "hide" sensitive URLs. It does the opposite: the file is publicly readable, so anyone can see the list of paths you’d rather they didn’t visit, and a `Disallow` rule does nothing to stop a direct request. For real privacy, require authentication.
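To see why the file works against secrecy, consider how little effort it takes to turn it into a list of "interesting" paths. The sketch below pulls every `Disallow` value out of a `robots.txt` body; the function name and the sample paths are hypothetical, and the parsing is deliberately simplified.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow value from a robots.txt body."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file contents, for illustration only.
sample = """\
User-agent: *
Disallow: /admin
Disallow: /internal/reports  # not linked anywhere
"""
print(disallowed_paths(sample))  # ['/admin', '/internal/reports']
```

Any path that shows up in this output needs its own access control; listing it in `robots.txt` only advertises that it exists.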