robots.txt

The file at /robots.txt that tells crawlers which URLs they may and may not request. Convention, not enforcement.

`robots.txt` is a plain-text file served from the root of a host (for example `https://example.com/robots.txt`) that lists which URL paths crawlers should avoid. Compliance is voluntary: search engines and most legitimate bots respect it; malicious scrapers often don’t. Rules are organized into groups: each group names one or more crawlers with `User-agent` lines, then lists the `Allow` and `Disallow` path prefixes that apply to them.
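
To make the rule structure concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`, which does roughly what a well-behaved crawler does before requesting a URL. The file contents, the `ExampleBot` and `SomeCrawler` names, and the `example.com` URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for every crawler, one for a single bot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin

User-agent: ExampleBot
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A generic crawler falls under the "*" group.
print(parser.can_fetch("SomeCrawler", "https://example.com/blog/"))        # True
print(parser.can_fetch("SomeCrawler", "https://example.com/admin/users"))  # False

# ExampleBot has its own group, which blocks the whole site.
print(parser.can_fetch("ExampleBot", "https://example.com/blog/"))         # False
```

Nothing here stops a request from being made; the check only matters if the crawler chooses to run it and honor the answer.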

The usual minimum is to disallow auth-gated paths (`/admin`, `/dashboard`, `/account`), API routes, and any duplicate or low-value URLs. The file should also include a `Sitemap:` line giving the absolute URL of the XML sitemap, so crawlers find it without having to guess.
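
As a rough sketch of that minimum, the Python snippet below renders such a file and reads it back with the standard library's `urllib.robotparser`. The `render_robots_txt` helper, the `/api/` prefix, and the `example.com` sitemap URL are assumptions chosen for illustration, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def render_robots_txt(disallowed_paths, sitemap_url):
    """Render one catch-all group plus a Sitemap line (hypothetical helper)."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


robots_txt = render_robots_txt(
    ["/admin", "/dashboard", "/account", "/api/"],  # "/api/" is an assumed prefix
    "https://example.com/sitemap.xml",
)
print(robots_txt)

# Read the file back the way a compliant crawler would.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.site_maps())                                       # sitemap URLs found
print(parser.can_fetch("*", "https://example.com/dashboard"))  # blocked path
print(parser.can_fetch("*", "https://example.com/pricing"))    # ordinary page
```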

A common mistake is using `robots.txt` to "hide" sensitive URLs. It does the opposite: the file is publicly readable, so anyone can see the list of paths you’d rather they didn’t visit. For real privacy, require authentication.
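
To illustrate how little effort that takes, the Python sketch below pulls every `Disallow` path out of a `robots.txt` body with plain string handling. The `disallowed_paths` function and the sample file, including the `/internal-reports/` path, are hypothetical.

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow value, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file; in practice anyone can download the real one.
SAMPLE = """\
User-agent: *
Disallow: /admin
Disallow: /internal-reports/  # meant to stay unnoticed
"""

print(disallowed_paths(SAMPLE))  # ['/admin', '/internal-reports/']
```

A dozen lines turn the file into a list of places worth probing, which is why the paths it names still need access control of their own.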
