Robots.txt Guide: Controlling Search Engine Crawlers

Learn how to use robots.txt effectively to control how search engines crawl your website.

What Is Robots.txt?

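Robots.txt is a plain-text file that tells search engine crawlers which pages or sections of your site they may or may not crawl. It must be located at the root of your domain (for example, example.com/robots.txt). The rules are advisory: reputable crawlers follow them, but the file does not enforce access control.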

Basic Syntax

User-Agent

Specifies which crawler the rules apply to:

  • User-agent: * (all crawlers)
  • User-agent: Googlebot (Google only)
  • User-agent: Bingbot (Bing only)
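
How crawlers pick a group can be checked offline with Python's standard-library urllib.robotparser. The sketch below is only an illustration: the /drafts/ and /private/ paths and the example.com URLs are made up for the example. It suggests that a crawler with its own group follows only that group and does not combine it with the * rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


group_parser = build_parser(ROBOTS_TXT)

# Googlebot has its own group, so the catch-all /private/ rule is not applied to it.
googlebot_private = group_parser.can_fetch("Googlebot", "https://example.com/private/page")
googlebot_drafts = group_parser.can_fetch("Googlebot", "https://example.com/drafts/post")
# Bingbot has no dedicated group, so it falls back to the * group.
bingbot_private = group_parser.can_fetch("Bingbot", "https://example.com/private/page")

print(googlebot_private, googlebot_drafts, bingbot_private)
```

If a rule should apply to every crawler, it has to be repeated in each named group as well as in the * group.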

Disallow

Blocks access to specified paths:

  • Disallow: /private/
  • Disallow: /admin/
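  • Disallow: /*.pdf$ (any URL ending in .pdf; paths start with /, and the * and $ wildcards are supported by major crawlers such as Google and Bing)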

Allow

Explicitly allows access, typically to carve out an exception inside a disallowed path:

  • Allow: /public/
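
A carve-out of this kind can be sketched with urllib.robotparser. The /private/press/ path below is a hypothetical example, not something from a real site. One caveat: Python's parser applies the first matching rule in file order, whereas Google applies the most specific (longest) matching rule. Listing the Allow line before the broader Disallow gives the same answer under both approaches.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /private/ but keep one subdirectory crawlable.
ALLOW_RULES = """\
User-agent: *
Allow: /private/press/
Disallow: /private/
"""

allow_parser = RobotFileParser()
allow_parser.parse(ALLOW_RULES.splitlines())

# The Allow rule matches first (and is also the longest match).
press_allowed = allow_parser.can_fetch("*", "https://example.com/private/press/launch.html")
# Only the Disallow rule matches here.
notes_allowed = allow_parser.can_fetch("*", "https://example.com/private/notes.html")

print(press_allowed, notes_allowed)
```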

Common Use Cases

Block Specific Directories

User-agent: *
Disallow: /admin/
Disallow: /private/

Block Specific File Types

User-agent: *
Disallow: /*.pdf$
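
To see what the * and $ wildcards do, here is a minimal hand-written matcher. It is a sketch of the wildcard behavior described above, not the code any search engine actually runs, and rule_matches is a name invented for this example. Python's urllib.robotparser does simple prefix matching and does not interpret these wildcards, which is why the sketch uses a regular expression instead.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match anchors at the start of the path, mirroring prefix matching.
    return re.match(regex, path, flags=re.DOTALL) is not None


print(rule_matches("/*.pdf$", "/files/report.pdf"))
print(rule_matches("/*.pdf$", "/files/report.pdf?download=1"))
print(rule_matches("/*.pdf", "/files/report.pdf?download=1"))
```

The second call shows the effect of $: a PDF URL with a query string no longer ends in .pdf, so the anchored rule does not cover it. Dropping the $ makes the rule match any URL that contains .pdf.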

Block Search Results Pages

User-agent: *
Disallow: /search
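
Because rules are prefix matches, Disallow: /search covers more than the search page itself. The sketch below uses urllib.robotparser with made-up example.com URLs to show which paths the rule catches.

```python
from urllib.robotparser import RobotFileParser

SEARCH_RULES = """\
User-agent: *
Disallow: /search
"""

search_parser = RobotFileParser()
search_parser.parse(SEARCH_RULES.splitlines())

# Hypothetical URLs chosen to show the prefix behavior.
candidates = [
    "https://example.com/search",
    "https://example.com/search?q=shoes",
    "https://example.com/searchlight-reviews",
    "https://example.com/blog/search-tips",
]

# Map each URL to True (crawlable) or False (blocked).
search_results = {url: search_parser.can_fetch("*", url) for url in candidates}

for url, allowed in search_results.items():
    print("allowed" if allowed else "blocked", url)
```

Note that /searchlight-reviews is blocked too, since it starts with /search. Only the last URL stays crawlable, because the rule matches from the start of the path.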

Important Considerations

Robots.txt Doesn't Block Indexing

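If a blocked page is linked from elsewhere, its URL may still appear in search results. To keep a page out of the index, use a noindex meta tag (or an X-Robots-Tag HTTP header) and leave the page crawlable, because a crawler that is blocked by robots.txt never sees the noindex directive.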
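
As a rough way to confirm that a page actually carries a noindex directive, the sketch below scans HTML for a robots meta tag using Python's standard html.parser module. The class and function names are invented for this example, and it only inspects the meta tag, not the X-Robots-Tag header or crawler-specific tags such as name="googlebot".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page markup for illustration.
sample_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(sample_page))
```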

Don't Block CSS/JS

Google renders pages to understand them, so blocking the CSS and JavaScript files they depend on can hurt SEO.

Be Careful with Disallow

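A single overly broad rule can stop crawlers from reaching large parts of your site. For example, Disallow: / blocks the entire site, so review every rule before deploying it.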

Testing Your Robots.txt

  • Google Search Console robots.txt report (the standalone robots.txt Tester has been retired)
  • Bing Webmaster Tools
  • Third-party testing tools
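
Alongside those tools, a small local check can catch mistakes before a file goes live. The sketch below is one possible approach using urllib.robotparser. The find_mismatches helper and the expected results are hypothetical, and Python's parser does not handle * or $ wildcards, so it only suits plain prefix rules.

```python
from urllib.robotparser import RobotFileParser


def find_mismatches(robots_txt: str, expectations: dict, agent: str = "*") -> list:
    """Compare can_fetch results against expected values.

    Returns a list of (url, expected, actual) tuples for every mismatch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    mismatches = []
    for url, expected in expectations.items():
        actual = parser.can_fetch(agent, url)
        if actual != expected:
            mismatches.append((url, expected, actual))
    return mismatches


DRAFT_RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

# Hypothetical expectations: True means the URL should stay crawlable.
EXPECTED = {
    "https://example.com/": True,
    "https://example.com/blog/post": True,
    "https://example.com/admin/login": False,
    "https://example.com/private/": False,
}

problems = find_mismatches(DRAFT_RULES, EXPECTED)
print("OK" if not problems else problems)
```

An empty result means every URL behaved as expected. Any tuple in the list points at a rule that blocks or allows more than intended.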

Including Your Sitemap

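Add your sitemap's absolute URL. The Sitemap line is not tied to any User-agent group, so placing it at the end of the file is a common convention: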

Sitemap: https://example.com/sitemap.xml
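
To confirm that a parser picks up the Sitemap line, the sketch below reads it back with urllib.robotparser. It assumes Python 3.8 or later, where RobotFileParser.site_maps() is available. The surrounding rules are illustrative.

```python
from urllib.robotparser import RobotFileParser

SITEMAP_RULES = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

sitemap_parser = RobotFileParser()
sitemap_parser.parse(SITEMAP_RULES.splitlines())

# site_maps() returns a list of sitemap URLs, or None if there are none.
sitemaps = sitemap_parser.site_maps()
print(sitemaps)
```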

Written by SerpUp Admin

SEO expert and digital marketing specialist at SerpUp.
