Robots.txt Generator

Generate robots.txt with crawl rules, sitemap pointers, and AI bot blocking presets.

robots.txt

User-agent: *
Allow: /
Disallow: /account/
Disallow: /checkout/

Sitemap: https://www.example.com/sitemap.xml

Mastering Robots.txt for Technical SEO

The robots.txt file is the gatekeeper of your website. Under the Robots Exclusion Protocol (standardized as RFC 9309), well-behaved web crawlers (such as Googlebot, Bingbot, or AhrefsBot) request it before crawling any other URL on your domain. A misconfigured file, such as a stray Disallow: / under User-agent: *, can block crawlers from your entire site and cause your pages to drop out of Google's results.

When to Use Allow vs. Disallow

  • Disallow: Use this to keep crawlers out of low-value or non-public directories. Common examples include /admin/, /checkout/, /cart/, internal search result pages (/?s=), or staging environments. This preserves your crawl budget for important content.
  • Allow: Use this when you have disallowed a parent directory but want to make a specific sub-directory crawlable. For example, you might Disallow: /assets/ but Allow: /assets/public-images/. Crawlers that follow the standard, including Googlebot, apply the most specific (longest) matching rule, so the Allow takes precedence for that sub-directory.
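
The precedence between Allow and Disallow can be sketched in a few lines of Python. This is a minimal illustration of longest-match resolution, not a full parser: the function name is_allowed and the (directive, prefix) rule format are assumptions made for this example, and it handles plain path prefixes only (no * or $ wildcards).

```python
def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, prefix) pairs, e.g. ("Disallow", "/assets/").
    Simplification: plain prefix matching only, no * or $ wildcards.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # The longer prefix wins; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("Disallow", "/assets/"), ("Allow", "/assets/public-images/")]
print(is_allowed("/assets/public-images/logo.png", rules))  # True
print(is_allowed("/assets/private/report.pdf", rules))      # False
print(is_allowed("/blog/", rules))                          # True
```

Not every parser resolves conflicts this way. Some apply the first matching rule in file order, so listing the narrower Allow line before the broader Disallow is the safer ordering.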

The Danger of robots.txt for Hiding Content

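A common SEO mistake is using robots.txt to hide private pages (like a PDF or a secret landing page). Robots.txt is public. Anyone can view it by appending /robots.txt to your domain, so every path you list there is advertised. Furthermore, if an external site links to your disallowed page, Google may still index the URL itself. If you need to keep a page out of search results, use a <meta name="robots" content="noindex"> tag (or, for non-HTML files such as PDFs, an X-Robots-Tag: noindex HTTP response header) and leave the URL crawlable, because a crawler can only obey a noindex directive it is allowed to fetch. For genuinely private content, use password protection instead.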
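
To confirm that a page really carries the noindex directive, you can inspect its HTML. The following is a rough sketch using Python's standard html.parser module; the names RobotsMetaFinder and has_noindex are made up for this example, and it only looks at the meta tag, not at the X-Robots-Tag header.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content may hold several comma-separated directives
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page))  # True
```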

How to Use the Robots.txt Generator

Step-by-step guide

  1. Set User Agent

    Use * for all crawlers, or specify a particular bot like Googlebot, Bingbot, or GPTBot.

  2. Add Allow/Disallow Rules

    Add rules for the paths you want to allow or block. For example, add Disallow rules for /account/ or /checkout/ to keep crawlers out of private sections.

  3. Configure Sitemap & Delay

    Enter the absolute URL of your sitemap and optionally set a crawl delay to slow down bots that honor the directive.

  4. Copy or Download

    Copy the generated robots.txt or download the file. Upload it to the root of your website domain so that it is served at /robots.txt.
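
The same four steps can be approximated in a short script. This is only a sketch of what such a generator might do, not the tool's actual implementation; the function build_robots_txt and its input format are assumptions made for illustration.

```python
def build_robots_txt(groups, sitemap=None):
    """Render robots.txt text.

    `groups` is a list of dicts with the keys "user_agent", "rules"
    (a list of (directive, path) pairs) and an optional "crawl_delay".
    """
    lines = []
    for group in groups:
        lines.append(f"User-agent: {group['user_agent']}")
        for directive, path in group["rules"]:
            lines.append(f"{directive}: {path}")
        if group.get("crawl_delay") is not None:
            lines.append(f"Crawl-delay: {group['crawl_delay']}")
        lines.append("")  # a blank line separates groups
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


text = build_robots_txt(
    [
        {
            "user_agent": "*",
            "rules": [
                ("Allow", "/"),
                ("Disallow", "/account/"),
                ("Disallow", "/checkout/"),
            ],
        }
    ],
    sitemap="https://www.example.com/sitemap.xml",
)
print(text)
```

With these inputs the script prints the same file as the example near the top of this page. Save the output as robots.txt and upload it to the site root.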

Frequently Asked Questions

About the Robots.txt Generator

A robots.txt file is a simple text file placed in your website's root directory that tells search engine crawlers (like Googlebot) which pages or files they can or cannot request from your site. It is the first thing a crawler checks before accessing your content.

A robots.txt file prevents crawling, but it does not guarantee that a page stays out of the index. If other sites link to your disallowed page, Google might still index the URL (though it won't know the content). To prevent indexing, use a "noindex" meta tag instead, and do not disallow that page in robots.txt, because the crawler has to fetch the page to see the tag.

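The User-agent directive specifies which crawler the rules that follow apply to. An asterisk (*) means the rules apply to all web crawlers. You can specify "Googlebot" to target only Google, or "Bingbot" for Bing. A crawler that follows the standard and finds a group naming it specifically obeys that group and ignores the * group, so repeat any shared rules in each named group.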
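
Group selection can be checked with Python's standard urllib.robotparser module. The sketch below assumes a hypothetical file that blocks GPTBot entirely while only keeping other crawlers out of /checkout/; the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /checkout/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# GPTBot has its own group, so only that group applies to it.
print(parser.can_fetch("GPTBot", "https://www.example.com/blog/"))       # False
# Bingbot has no group of its own and falls back to the * group.
print(parser.can_fetch("Bingbot", "https://www.example.com/blog/"))      # True
print(parser.can_fetch("Bingbot", "https://www.example.com/checkout/"))  # False
```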

Including the absolute URL of your XML sitemap in your robots.txt file is a best practice. It lets any visiting crawler discover the sitemap, and through it your important pages, without a separate submission. The Sitemap line is independent of the User-agent groups and can appear anywhere in the file.
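
As a quick check that the Sitemap line is readable by a parser, the sketch below uses urllib.robotparser from the Python standard library. It assumes Python 3.8 or later, where the site_maps() method is available, and the file content is a made-up example.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /account/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file has none.
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```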

The Crawl-delay directive asks crawlers to wait a certain number of seconds between requests. This is useful for large sites on slow servers, where it keeps a crawler from overloading the server. Note: Googlebot does not support the Crawl-delay directive and adjusts its crawl rate based on how your server responds, but other bots such as Bingbot respect it.
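
To see how a parser reads the delay for each bot, here is a small sketch using urllib.robotparser (crawl_delay() is available from Python 3.6). The file content is a hypothetical example; the Disallow rule is repeated in the Bingbot group because a named group replaces the * group for that crawler.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: Bingbot
Disallow: /account/
Crawl-delay: 10

User-agent: *
Disallow: /account/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.crawl_delay("Bingbot"))    # 10
print(parser.crawl_delay("Googlebot"))  # None (no delay set in the * group)
```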

It must be placed in the top-level root directory of your website domain. For example, it must be accessible at https://www.yourdomain.com/robots.txt. If you put it in a subdirectory, crawlers will not find it. The file applies only to the host it is served from, so each subdomain needs its own robots.txt.
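
Because the location is fixed, the robots.txt URL for any page can be derived from the page URL alone. A minimal sketch with urllib.parse follows; the helper name robots_url and the shop subdomain are illustrative assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.yourdomain.com/blog/post?id=1"))
# https://www.yourdomain.com/robots.txt
print(robots_url("https://shop.yourdomain.com/cart"))
# https://shop.yourdomain.com/robots.txt
```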
