What is robots.txt? Complete Guide + Free Generator

robots.txt is a plain text file placed at the root of your website (yourdomain.com/robots.txt) that tells search engine crawlers — like Googlebot, Bingbot, and AI crawlers — which pages or sections they are and aren't allowed to access.

It's one of the first files any search engine crawler checks when it visits your website. Getting it wrong can accidentally block Google from crawling your entire site.

How Does robots.txt Work?

robots.txt uses a simple syntax with four main directives:

  • User-agent — specifies which crawler the rule applies to (* means all crawlers)
  • Disallow — the URL path that crawler cannot access
  • Allow — explicitly permits a path (overrides a broader Disallow)
  • Sitemap — tells crawlers where to find your XML sitemap

A basic robots.txt looks like this:

User-agent: *
Disallow: /admin/
Disallow: /checkout/
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
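
To see how a parser might read these directives, here is a minimal Python sketch that feeds the example above to the standard library's urllib.robotparser. The helper name load_rules is an illustrative assumption. This parser applies the first matching rule in file order, which is not necessarily how every search engine resolves conflicts, so treat the results as an approximation.

```python
from urllib.robotparser import RobotFileParser

# The basic example from above, embedded as a string for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt content (hypothetical helper) and return the parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    # The wildcard group applies to any crawler name, including Googlebot.
    print(rules.can_fetch("Googlebot", "https://yourdomain.com/admin/users"))     # False
    print(rules.can_fetch("Googlebot", "https://yourdomain.com/products/shoes"))  # True
    print(rules.site_maps())  # ['https://yourdomain.com/sitemap.xml']
```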

What Should You Block in robots.txt?

Block sections that crawlers have no reason to fetch:

  • Admin areas: /admin/, /wp-admin/, /dashboard/
  • Login/register pages: /login, /register
  • Checkout and cart: /cart/, /checkout/
  • Search result pages: /search/ (creates infinite crawlable URLs)
  • Duplicate content: paginated pages after page 5, filter/sort URLs
  • Internal tools: staging environments, API endpoints, test pages

What Should You NOT Block in robots.txt?

These are the most common and damaging robots.txt mistakes:

❌ Never block your CSS and JavaScript files

Google needs to render your pages to evaluate them. If you block CSS/JS, Google can't see how your pages actually look, and pages that render as broken or thin may rank worse than they should.

❌ Never block pages you want to rank

A page that's Disallowed in robots.txt can still appear in search results (with no snippet), but it cannot be properly indexed or ranked. If you want a page to rank, it must be crawlable.

❌ Don't confuse robots.txt with noindex

robots.txt controls crawling. The noindex meta tag controls indexing. If you Disallow a page in robots.txt, Google can't read the noindex tag either — the page may still appear in search results as a "known URL" with no description.
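
One way to catch this conflict is to cross-check the URLs you have marked noindex against your robots.txt. The sketch below is a rough illustration using Python's urllib.robotparser; the function name find_noindex_conflicts and the sample URLs are assumptions, and the list of noindex URLs is supplied by hand rather than detected from the pages.

```python
from urllib.robotparser import RobotFileParser


def find_noindex_conflicts(robots_txt, noindex_urls, user_agent="Googlebot"):
    """Return the noindex URLs that robots.txt stops user_agent from crawling.

    A crawler that cannot fetch a page never sees its noindex tag,
    so every URL returned here is a page whose noindex is unreadable.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    robots_txt = "User-agent: *\nDisallow: /old/\n"
    noindex_urls = [
        "https://yourdomain.com/old/promo",   # blocked, so the noindex is never read
        "https://yourdomain.com/thank-you",   # crawlable, so the noindex works
    ]
    print(find_noindex_conflicts(robots_txt, noindex_urls))
    # ['https://yourdomain.com/old/promo']
```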

robots.txt for AI Crawlers (2025)

In 2025, you should also configure your robots.txt for AI language model crawlers. These bots are used by ChatGPT, Claude, Perplexity, and others to scrape content for their training data and real-time search features.

# Default group: applies to every crawler not named in a group below
User-agent: *
Allow: /

# Block AI training crawlers (optional)
User-agent: GPTBot
Disallow: /

# Allow AI search features (recommended — this brings referral traffic)
User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
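
To sanity-check a per-crawler configuration like this one, the following Python sketch reports which user agents may fetch a given URL. It uses urllib.robotparser from the standard library; the function name access_report and the sample URL are illustrative assumptions, and the comments in the embedded file are shortened for the example.

```python
from urllib.robotparser import RobotFileParser

# The AI crawler example from above, with comments shortened.
AI_ROBOTS_TXT = """\
User-agent: *
Allow: /

# Block AI training crawlers (optional)
User-agent: GPTBot
Disallow: /

# Allow AI search features
User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""


def access_report(robots_txt, user_agents, url):
    """Map each user agent name to True (allowed) or False (blocked) for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


if __name__ == "__main__":
    agents = ["Googlebot", "GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]
    report = access_report(AI_ROBOTS_TXT, agents, "https://yourdomain.com/blog/post")
    for agent, allowed in report.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With the file above, only GPTBot should be reported as blocked; Googlebot falls through to the wildcard group.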

Use the Optyxo Robots.txt Generator to create a correctly formatted robots.txt with AI crawler configurations in seconds — no syntax knowledge required.

How to Test Your robots.txt

After creating or modifying your robots.txt:

  1. Visit yourdomain.com/robots.txt in your browser — verify it loads correctly
  2. Use the Optyxo Robots.txt Tester — test specific URLs to see if they're allowed or blocked
  3. Check Google Search Console → Settings → robots.txt to see what Google reads
  4. Run a full Optyxo SEO audit — it checks robots.txt as part of Technical SEO and flags accidental blocks
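
If you prefer to script these checks, a small regression test can compare what your robots.txt does with what you expect. This is a minimal sketch built on Python's urllib.robotparser; the function name find_unexpected and the sample rules are assumptions for illustration. For a live site, the same parser can load the file with its set_url() and read() methods instead of a string.

```python
from urllib.robotparser import RobotFileParser


def find_unexpected(robots_txt, expectations, user_agent="Googlebot"):
    """Compare parser results with the expected outcome for each URL.

    expectations maps URL -> True (should be crawlable) or False (should be blocked).
    Returns a list of (url, expected, actual) tuples, one per mismatch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    mismatches = []
    for url, expected in expectations.items():
        actual = parser.can_fetch(user_agent, url)
        if actual != expected:
            mismatches.append((url, expected, actual))
    return mismatches


if __name__ == "__main__":
    # "Disallow: /blog" is a deliberate mistake that blocks the whole blog.
    robots_txt = "User-agent: *\nDisallow: /admin/\nDisallow: /blog\n"
    expectations = {
        "https://yourdomain.com/blog/robots-guide": True,
        "https://yourdomain.com/admin/": False,
    }
    for url, expected, actual in find_unexpected(robots_txt, expectations):
        print(f"{url}: expected allowed={expected}, got allowed={actual}")
```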

robots.txt Syntax Rules

  • The file must be named exactly robots.txt (lowercase)
  • It must be at the root of your domain: yourdomain.com/robots.txt
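  • One directive per line. Comments start with #; a trailing comment after a directive is valid under RFC 9309, but keeping comments on their own lines is the safest choice for simple parsers
  • Paths are case-sensitive: /Admin/ and /admin/ are different rules
  • Disallow: / blocks everything; Disallow: (empty) blocks nothing
  • The most specific (longest) matching rule takes precedence in Google and other crawlers that follow RFC 9309; some simpler parsers apply the first match instead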
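
The precedence and empty-Disallow rules can be sketched in a few lines of Python. This is a simplified illustration, not a full parser: it assumes plain prefix matching without the * and $ wildcards, resolves ties in favor of Allow, and the function name is_allowed is hypothetical.

```python
def is_allowed(rules, path):
    """Decide whether path is allowed under a single group of rules.

    rules is a list of (directive, pattern) pairs, where directive is
    "Allow" or "Disallow". The longest matching pattern wins; if an Allow
    and a Disallow match with equal length, Allow wins.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if pattern == "":
            continue  # an empty pattern matches nothing, so it blocks nothing
        if not path.startswith(pattern):
            continue
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len = len(pattern)
            allowed = is_allow
    return allowed


if __name__ == "__main__":
    rules = [("Disallow", "/shop/"), ("Allow", "/shop/sale/")]
    print(is_allowed(rules, "/shop/sale/item"))         # True: the longer Allow wins
    print(is_allowed(rules, "/shop/cart"))              # False: only the Disallow matches
    print(is_allowed([("Disallow", "/")], "/any/page"))  # False: everything is blocked
    print(is_allowed([("Disallow", "")], "/any/page"))   # True: empty Disallow blocks nothing
```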

Does robots.txt Affect SEO?

Directly: yes. If you accidentally Disallow important pages, they can't rank. Deliberately blocking thin content, duplicate URLs, and admin areas improves crawl efficiency — Googlebot can spend its crawl budget on your important pages instead of wasting it on 200 filter variations of the same product page.

Frequently Asked Questions

Is robots.txt required for SEO?

Not required, but strongly recommended. Without a robots.txt, search engines may crawl any URL they can discover, including admin areas, checkout pages, and other content you probably don't want crawled. Most crawlers also look for the Sitemap directive in robots.txt as a signal to find your XML sitemap.

Can robots.txt block a page from appearing in Google?

robots.txt blocks crawling, not indexing. Google may still know a blocked page exists (from backlinks or internal links) and show it in results as a URL-only listing with no description. To keep a page out of Google, leave it crawlable and add a noindex tag; do not also Disallow it, because Google can't read the noindex on a page it isn't allowed to crawl. For urgent cases, the Removals tool in Google Search Console hides a URL temporarily while the noindex takes effect.

How do I create a robots.txt file?

Use the free Optyxo Robots.txt Generator — select your CMS, choose which sections to block, configure AI crawler access, and download the correctly formatted file. Or create it manually as a plain text file (.txt) using the syntax rules above.
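
If you would rather script the manual route, the sketch below assembles a file from a list of groups. It is a minimal illustration, not how the Optyxo generator works; the function name build_robots_txt and its parameters are assumptions.

```python
def build_robots_txt(groups, sitemap_url=None):
    """Build robots.txt content from a list of rule groups.

    groups is a list of (user_agent, disallow_paths, allow_paths) tuples.
    Returns the file content as a string ending with a newline.
    """
    lines = []
    for user_agent, disallow_paths, allow_paths in groups:
        lines.append(f"User-agent: {user_agent}")
        lines.extend(f"Disallow: {path}" for path in disallow_paths)
        lines.extend(f"Allow: {path}" for path in allow_paths)
        lines.append("")  # a blank line separates groups
    if sitemap_url:
        lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    content = build_robots_txt(
        [("*", ["/admin/", "/checkout/"], ["/"])],
        sitemap_url="https://yourdomain.com/sitemap.xml",
    )
    print(content, end="")
```

Save the returned string as robots.txt and upload it to the root of your domain.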

What happens if my robots.txt has errors?

Syntax errors in robots.txt cause unpredictable behavior — different crawlers interpret malformed files differently. Some may ignore the entire file, others may stop at the error. Always test your robots.txt with the Optyxo Robots.txt Tester after any changes.
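
A quick lint pass can catch the most common mistakes before a crawler does. The following Python sketch is a rough check under stated assumptions: it only recognizes the four directives covered in this guide (extend KNOWN_DIRECTIVES if you use others), and the function name lint_robots_txt and its messages are hypothetical.

```python
# Assumption: only the directives covered in this guide are recognized.
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap"}


def lint_robots_txt(text):
    """Return a list of (line_number, message) for lines that look malformed."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        key = name.lower()
        if key not in KNOWN_DIRECTIVES:
            problems.append((number, f"unrecognized directive '{name}'"))
        elif key == "user-agent":
            seen_user_agent = True
        elif key in ("allow", "disallow"):
            if not seen_user_agent:
                problems.append((number, f"{name} appears before any User-agent line"))
            elif value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/'"))
    return problems


if __name__ == "__main__":
    broken = "Disallow: /tmp/\nUser-agent: *\nDisalow: /admin/\nDisallow: checkout\nAllow /\n"
    for number, message in lint_robots_txt(broken):
        print(f"line {number}: {message}")
```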

Should I block AI crawlers in robots.txt?

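It depends on your goals. Blocking AI training crawlers (GPTBot) asks them not to collect your content for AI training datasets. However, blocking the crawlers behind AI search features (such as ChatGPT-User and PerplexityBot) means you won't appear in AI-powered search answers, missing a growing source of referral traffic. Vendors often run separate user agents for training, search, and user-initiated fetches, so confirm each bot's purpose in the vendor's documentation before you allow or block it. Most websites should allow AI search crawlers while blocking training-only bots.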
