What Is Robots.txt?
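Robots.txt is a plain-text file that tells search engine crawlers which pages or sections of your site they may or may not crawl. It must sit at the root of the host it applies to (example.com/robots.txt). It is advisory: reputable crawlers follow it, but it does not enforce access control.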
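Crawlers look for the file only at the root of a host, so the robots.txt URL that governs any page can be derived from that page's scheme and host alone. Here is a minimal Python sketch. The helper name `robots_url` is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?id=7"))  # https://example.com/robots.txt
print(robots_url("https://shop.example.com/cart"))       # https://shop.example.com/robots.txt
```

The second call shows that a subdomain resolves to its own file. Rules in example.com/robots.txt do not cover shop.example.com.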
Basic Syntax
User-Agent
Specifies which crawler the rules apply to:
- User-agent: * (all crawlers)
- User-agent: Googlebot (Google only)
- User-agent: Bingbot (Bing only)
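A crawler follows the group whose User-agent value matches its own name, compared case-insensitively. It falls back to the `*` group only when no named group matches. The Python sketch below models that lookup in simplified form. The dictionary layout and the `select_group` name are hypothetical, and the sketch ignores RFC 9309's rule that several groups naming the same crawler are merged:

```python
def select_group(groups: dict[str, list[str]], crawler: str) -> list[str]:
    """Pick the rule lines a crawler should follow (simplified lookup).

    groups maps each User-agent value to its rule lines (hypothetical layout).
    """
    wanted = crawler.lower()
    for agent, rules in groups.items():
        # Crawler names are compared case-insensitively.
        if agent != "*" and agent.lower() == wanted:
            return rules
    # No named group matched: use the catch-all group, or no rules at all.
    return groups.get("*", [])


groups = {
    "*": ["Disallow: /private/"],
    "Googlebot": ["Disallow: /drafts/"],
}

print(select_group(groups, "googlebot"))  # ['Disallow: /drafts/']
print(select_group(groups, "Bingbot"))    # ['Disallow: /private/']
```

In this example Googlebot ignores `Disallow: /private/` entirely. A crawler with its own group does not also inherit the `*` rules, so any rules you want to share must be repeated in each named group.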
Disallow
Blocks access to specified paths:
- Disallow: /private/
- Disallow: /admin/
- Disallow: /*.pdf$
Allow
Explicitly permits access to a path, typically to carve an exception out of a broader Disallow rule:
- Allow: /public/
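When an Allow rule and a Disallow rule both match a URL, RFC 9309 and Google apply the most specific rule, which is the one with the longest matching path. If the two are equally long, Allow wins. The order of the lines in the file does not matter. The Python sketch below shows this precedence using plain prefix matching only, with wildcards deliberately left out. The `is_allowed` helper and the example paths are hypothetical:

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether path may be crawled under one group's rules.

    rules is a list of (directive, path_prefix) pairs, e.g. ("disallow", "/private/").
    Simplified sketch: plain prefix matching, no * or $ wildcards.
    """
    best_len = -1
    allowed = True  # if no rule matches, the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty value never matches anything
        is_allow = directive.lower() == "allow"
        # The longest match wins; on a tie, Allow beats Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [
    ("disallow", "/private/"),
    ("allow", "/private/press-kit/"),
]
print(is_allowed(rules, "/private/payroll.html"))        # False
print(is_allowed(rules, "/private/press-kit/logo.png"))  # True
print(is_allowed(rules, "/blog/"))                       # True
```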
Common Use Cases
Block Specific Directories
User-agent: *
Disallow: /admin/
Disallow: /private/
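Python's standard-library `urllib.robotparser` can check simple directory rules like these before you deploy them. It supports only plain prefix matching and applies the first matching rule in file order. That makes it fine for rules like the ones above, but not for wildcard patterns or Google's longest-match precedence. `ExampleBot` is a hypothetical crawler name:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/admin/users", "/private/notes.html", "/administrator", "/blog/"]:
    allowed = parser.can_fetch("ExampleBot", "https://example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

`/administrator` stays crawlable because the rule `/admin/` ends with a slash. Writing `Disallow: /admin` without the slash would block it too.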
Block Specific File Types
User-agent: *
Disallow: /*.pdf$
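In this pattern, `*` matches any sequence of characters and `$` anchors the match to the end of the URL. As a result, `/*.pdf$` blocks `/docs/report.pdf` but not `/report.pdf?download=1`. `urllib.robotparser` treats `*` and `$` as literal characters, so the Python sketch below translates the pattern into a regular expression instead. The helpers `pattern_to_regex` and `matches` are hypothetical and follow the RFC 9309 wildcard semantics:

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a robots.txt path pattern (with * and $) into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything, then turn each escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    # A trailing '$' in robots.txt means "end of URL" (\Z = end of string).
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    # Rules match from the start of the path, so use .match(), not .search().
    return pattern_to_regex(pattern).match(path) is not None


for path in ["/docs/report.pdf", "/report.pdf?download=1", "/pdf-guide.html"]:
    print(path, matches("/*.pdf$", path))
```

Without the `$`, the pattern `/*.pdf` would also match `/report.pdf?download=1`, because it would then match any path containing `.pdf`.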
Block Search Results Pages
User-agent: *
Disallow: /search
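Rules match by prefix, so `Disallow: /search` blocks the results page and every query-string variant of it. It also blocks any other path that merely starts with `/search`. Here is a quick check with Python's `urllib.robotparser`. The paths and the `ExampleBot` name are illustrative:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /search"])

for path in ["/search", "/search?q=shoes", "/search-tips.html", "/blog/search-guide"]:
    allowed = parser.can_fetch("ExampleBot", "https://example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

Here `/search-tips.html` is blocked as well, which may not be what you intended. Check rules like this against your real URL structure before deploying them.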
Important Considerations
Robots.txt Doesn't Block Indexing
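Robots.txt controls crawling, not indexing. A disallowed URL can still appear in search results, usually without a description, if other pages link to it. To keep a page out of results, add a noindex meta tag (or an X-Robots-Tag HTTP header) and leave the page crawlable. A crawler that is blocked from fetching the page never sees the noindex directive.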
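A crawler has to fetch a page to see its noindex tag, so a page that is both disallowed and marked noindex can still show up in results. The Python sketch below flags that conflict. The `NOINDEX_PAGES` list is a hypothetical input you would build from your own site. Note that `urllib.robotparser` understands prefix rules only:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical URLs whose HTML carries <meta name="robots" content="noindex">.
NOINDEX_PAGES = [
    "https://example.com/private/old-offer.html",
    "https://example.com/thank-you.html",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A disallowed page is never fetched, so its noindex tag cannot take effect.
conflicts = [url for url in NOINDEX_PAGES if not parser.can_fetch("*", url)]
for url in conflicts:
    print("noindex will not be seen (blocked by robots.txt):", url)
```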
Don't Block CSS/JS
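Google renders pages much like a browser does. If robots.txt blocks the CSS or JavaScript files a page depends on, Google may not render or understand the page correctly, which can hurt its rankings.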
Be Careful with Disallow
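Rules match by prefix, so a short path blocks more than you might expect. Disallow: /p also blocks /products/ and /pricing, and Disallow: / blocks the entire site. Blocking the wrong pages can remove important content from search results.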
Testing Your Robots.txt
- Google Search Console robots.txt report (which replaced the older robots.txt Tester)
- Bing Webmaster Tools
- Third-party testing tools
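Alongside these tools, a small script can catch accidental over-blocking before you deploy. It checks that the pages and assets you need crawled, including CSS and JavaScript, are still fetchable. The sketch below uses `urllib.robotparser`, which handles prefix rules only and applies the first matching rule. The `find_blocked` helper and the `MUST_CRAWL` URLs are hypothetical placeholders for your own:

```python
from urllib.robotparser import RobotFileParser


def find_blocked(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs in urls that robots_txt would block for agent (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /assets/
"""

# Hypothetical URLs that must stay crawlable, including rendering resources.
MUST_CRAWL = [
    "https://example.com/",
    "https://example.com/products/",
    "https://example.com/assets/site.css",
    "https://example.com/assets/app.js",
]

for url in find_blocked(ROBOTS_TXT, MUST_CRAWL):
    print("blocked:", url)
```

In this example the `/assets/` rule is the mistake the check catches. In a CI pipeline you could fail the build whenever the returned list is non-empty.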
Including Your Sitemap
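Add your sitemap location with a Sitemap line. It must be an absolute URL. By convention it goes at the end of the file, but it is independent of the User-agent groups and can appear anywhere: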
Sitemap: https://example.com/sitemap.xml
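To confirm the line is picked up, you can parse the file with Python's `urllib.robotparser` (Python 3.8 and later). Its `site_maps()` method returns the listed sitemap URLs, or `None` if there are none:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

If your site has several sitemaps, list each one on its own Sitemap line.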