Robots.txt Guide: Controlling Search Engine Crawlers

What Is Robots.txt?

Robots.txt is a plain-text file that tells search engine crawlers which pages or sections of your site they may or may not crawl. It must sit at the root of your domain (example.com/robots.txt); crawlers do not look for it in subdirectories.

Basic Syntax

User-Agent

Specifies which crawler the rules apply to:

User-agent: *            # all crawlers
User-agent: Googlebot    # Google only
User-agent: Bingbot      # Bing only
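
As a rough illustration of how a crawler picks the group that applies to it, the sketch below uses Python's standard urllib.robotparser module. The rules, the /drafts/ path, and the SomeOtherBot name are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one for every other crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own group, so only /drafts/ is blocked for it.
googlebot_drafts = parser.can_fetch("Googlebot", "https://example.com/drafts/post")
googlebot_admin = parser.can_fetch("Googlebot", "https://example.com/admin/")

# A crawler without its own group falls back to the "*" group.
other_admin = parser.can_fetch("SomeOtherBot", "https://example.com/admin/")

print(googlebot_drafts, googlebot_admin, other_admin)
```

Note that a crawler with its own group follows only that group, so Googlebot is not bound by the /admin/ rule here. Repeat shared rules in each group if every crawler should obey them.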

Disallow

Blocks access to specified paths:

Disallow: /private/
Disallow: /admin/
Disallow: /*.pdf$
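
Plain Disallow paths are matched as prefixes of the URL path, which is easy to get subtly wrong. The following minimal sketch, again based on urllib.robotparser, shows what a trailing slash does. The is_allowed helper and the test paths are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /private/",
    "Disallow: /admin/",
])


def is_allowed(path: str) -> bool:
    """Return True if a generic crawler may fetch the given path."""
    return parser.can_fetch("*", "https://example.com" + path)


blocked = is_allowed("/private/report.html")  # inside /private/
open_page = is_allowed("/privately-owned")    # does not start with /private/
no_slash = is_allowed("/private")             # lacks the trailing slash

print(blocked, open_page, no_slash)
```

Dropping the trailing slash from the rule (Disallow: /private) would widen it to every path that begins with /private, including /privately-owned.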

Allow

Explicitly permits access to a path, typically to carve out an exception inside a directory that a Disallow rule blocks:

Allow: /public/
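
When an Allow and a Disallow rule both match a URL, Google and the robots.txt standard (RFC 9309) use the most specific rule, meaning the longest matching path, and prefer Allow on a tie. The sketch below models that behavior for plain prefixes only. The is_allowed function, the rule tuples, and the paths are hypothetical, and some parsers (Python's urllib.robotparser appears to be one) instead apply the first matching rule in file order.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Apply the longest-match rule to plain path prefixes (no wildcards).

    `rules` holds (directive, prefix) pairs such as ("disallow", "/private/").
    """
    best_len = -1
    allowed = True  # a path that no rule matches is allowed
    for directive, prefix in rules:
        # An empty value matches nothing; skip non-matching prefixes too.
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # A longer prefix wins; on equal length, Allow beats Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("disallow", "/private/"), ("allow", "/private/press/")]

press = is_allowed("/private/press/kit.html", rules)  # Allow is more specific
notes = is_allowed("/private/notes.html", rules)      # only Disallow matches
home = is_allowed("/", rules)                         # no rule matches

print(press, notes, home)
```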

Common Use Cases

Block Specific Directories

User-agent: *
Disallow: /admin/
Disallow: /private/

Block Specific File Types

User-agent: *
Disallow: /*.pdf$
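
To see what the * and $ characters in this pattern do, here is a small sketch that translates a robots.txt path pattern into a regular expression. The pattern_to_regex helper is a hypothetical name, not part of any library, and as far as I know Python's urllib.robotparser does not interpret these wildcards itself.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


pdf_rule = pattern_to_regex("/*.pdf$")

# re.match anchors at the start of the path, like a robots.txt rule does.
matches_pdf = bool(pdf_rule.match("/files/report.pdf"))
matches_query = bool(pdf_rule.match("/files/report.pdf?download=1"))
matches_html = bool(pdf_rule.match("/files/report.html"))

print(matches_pdf, matches_query, matches_html)
```

Because of the $ anchor, a PDF URL with a query string is not covered by this rule. Dropping the $ (Disallow: /*.pdf) would block those URLs as well.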

Block Search Results Pages

User-agent: *
Disallow: /search

Important Considerations

Robots.txt Doesn't Block Indexing

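Robots.txt controls crawling, not indexing. If a blocked page is linked from elsewhere, its URL may still appear in search results. To keep a page out of the index, use a noindex meta tag and leave the page crawlable: a crawler cannot see a noindex tag on a page that robots.txt stops it from fetching.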
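
If you want to confirm that a page actually carries a noindex directive, a check along the following lines may help. It is only a sketch using Python's built-in html.parser. The RobotsMetaParser class, the has_noindex function, and the sample HTML are invented for the example, and it does not cover the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```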

Don't Block CSS/JS

Google renders pages to understand their content and layout, so blocking the CSS and JavaScript files they depend on can hurt SEO.

Be Careful with Disallow

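Blocking the wrong pages can hurt your SEO significantly. A single "Disallow: /" under "User-agent: *" stops compliant crawlers from crawling the entire site, so review every rule before you deploy it.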

Testing Your Robots.txt

Google Search Console robots.txt report (the successor to the retired robots.txt Tester)
Bing Webmaster Tools
Third-party testing tools
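
Alongside these tools, you can run a quick local check before deploying. The sketch below compares a robots.txt draft against a table of expected results using urllib.robotparser. The check_rules helper, the sample rules, and the paths are assumptions, and because this parser handles plain prefixes only, wildcard rules need a different tool.

```python
from urllib.robotparser import RobotFileParser


def check_rules(robots_txt: str, expectations: dict[str, bool], agent: str = "*") -> list[str]:
    """Return the paths whose allow/block result differs from what was expected."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for path, should_be_allowed in expectations.items():
        if parser.can_fetch(agent, "https://example.com" + path) != should_be_allowed:
            failures.append(path)
    return failures


robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

# Map each path to True (should be crawlable) or False (should be blocked).
failures = check_rules(robots_txt, {
    "/": True,
    "/admin/login": False,
    "/search?q=shoes": False,
    "/blog/robots-guide": True,
})

print(failures or "all expectations met")
```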

Including Your Sitemap

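Add your sitemap location with the Sitemap directive, using a full absolute URL. The directive is independent of User-agent groups, so it can go anywhere in the file; placing it at the end is a common convention: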

Sitemap: https://example.com/sitemap.xml
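
To verify that the Sitemap line is picked up regardless of where it sits, a parser such as Python's urllib.robotparser (its site_maps() method requires Python 3.8 or later) can read it back. The rules below are illustrative only.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "Sitemap: https://example.com/sitemap.xml",  # placed first on purpose
    "",
    "User-agent: *",
    "Disallow: /admin/",
])

# site_maps() returns a list of sitemap URLs, or None when the file has none.
sitemaps = parser.site_maps()

# The crawl rules are unaffected by where the Sitemap line appears.
admin_allowed = parser.can_fetch("*", "https://example.com/admin/")

print(sitemaps, admin_allowed)
```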

Written by SerpUp Admin

SEO expert and digital marketing specialist at SerpUp.