Robots.txt Generator - Create Robot Rules
Generate robots.txt files for your website. Allow or block crawlers, set crawl delay, add sitemap URL. Visual builder.
About robots.txt
robots.txt tells search engine crawlers which pages they can and cannot access. It is placed at the root of your website (e.g., https://example.com/robots.txt).
robots.txt: What It Does, What It Doesn't
A robots.txt file at your domain root tells well-behaved crawlers which paths they should and shouldn't crawl. It's been the de facto standard for 30 years (formalized as RFC 9309 in 2022), but it's commonly misunderstood: it isn't a security boundary, it isn't enforced, and "disallowed" pages can still show up in search results in some form. Used correctly, it's a useful tool for managing crawl budget and keeping low-value paths out of the index.
What robots.txt actually controls
- Crawling. Whether a compliant bot fetches a URL.
- Crawl rate. Some bots respect Crawl-delay (see the sketch after this list).
- Sitemap discovery. The Sitemap: directive tells crawlers where your sitemap is.
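A minimal sketch showing both directives together (the domain and the 10-second value are placeholders; Googlebot ignores Crawl-delay, though Bing and others respect it):

User-agent: *
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml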
What it does NOT control:
- Indexing. Google can index a URL it never crawled (based on backlinks). Use <meta name="robots" content="noindex"> or an X-Robots-Tag header for that (snippets after this list).
- Access. Bots can ignore robots.txt. Use auth or an IP allowlist for sensitive endpoints.
- Removing existing pages from search. Disallow blocks future crawls; existing snippets persist until Google decides to drop them. Use noindex plus a fresh crawl.
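For reference, the two noindex mechanisms mentioned above look like this. The meta tag goes in the page's <head>; the header form also works for non-HTML resources such as PDFs:

<meta name="robots" content="noindex">

X-Robots-Tag: noindex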
Common patterns
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /*?
Sitemap: https://example.com/sitemap.xml
This blocks admin and API paths plus any URL with a query string (often duplicate-content variants).
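If you want to sanity-check rules programmatically, Python's standard-library urllib.robotparser can parse a file and answer can-fetch questions. A minimal sketch; note the stdlib parser does simple prefix matching and does not implement the * and $ wildcard extensions, so only the plain path rules from the pattern above are tested here:

from urllib.robotparser import RobotFileParser

# Only the prefix rules; the stdlib parser would not treat /*? as a wildcard.
rules = """
User-agent: *
Disallow: /admin/
Disallow: /api/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

for url in ("https://example.com/", "https://example.com/admin/users"):
    print(url, rp.can_fetch("*", url))

# Prints:
# https://example.com/ True
# https://example.com/admin/users False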
Pitfalls
- Disallow: /. Blocks your entire site from search. Easy to leave in after a staging-to-prod copy.
- Wildcards aren't standardized. Most major bots (Google, Bing) support * and $. Some smaller crawlers don't.
- Trailing slash matters. Disallow: /admin blocks /admin/, /admin.html, and /admin-tool/, because it's a prefix match. Add the slash if you mean only the directory; see the comparison after this list.
- Per-bot rules. A User-agent: Googlebot group must precede the catch-all in some implementations. Test with the robots.txt report in Search Console.
- Caching. Crawlers cache robots.txt for hours; changes can take a day to fully propagate.
- Listing private paths. Counter-intuitively, Disallow: /secret-admin/ tells the world that path exists. Use auth instead.
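To make the trailing-slash pitfall concrete, here is a side-by-side sketch (the paths are illustrative):

# Prefix match: blocks /admin, /admin/, /admin.html, and /admin-tool/
Disallow: /admin

# Directory only: blocks /admin/ and everything under it,
# but not /admin.html or /admin-tool/
Disallow: /admin/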
The right modern stance
Use robots.txt for crawl budget management on large sites (block faceted search URLs, infinite calendars, internal tools). Use noindex meta tags or X-Robots-Tag headers for things you actually don't want in search results. Use proper authentication for things that are private. Don't conflate the three.
For broader SEO context, see our developer tools roundup.
Frequently Asked Questions
Will Disallow: keep my page out of Google?
It blocks crawling, but Google can still list the URL based on external links — without a snippet. To keep a page entirely out of search, use noindex and let Google crawl it (or use authentication).
Do bots really obey robots.txt?
Major search engines and reputable services do. Scrapers, AI training crawlers, and malicious bots routinely ignore it. Treat it as a policy signal, not enforcement.
How do I block a specific bot?
Add a User-agent group naming that bot; reputable crawlers will honor it (sketch below), but hostile crawlers won't, so block those at the firewall or WAF layer.
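A sketch of such a group ("BadBot" is a placeholder; use the token the crawler sends in its User-Agent string):

User-agent: BadBot
Disallow: /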
Where should the file live?
At the root of your domain — https://example.com/robots.txt. It does not work on subpaths.