
Robots.txt

Root directory file instructing search engine and AI crawlers which pages to crawl or avoid—now critical for managing GPTBot, PerplexityBot, and ClaudeBot.

Updated March 15, 2026
SEO

Definition

Robots.txt is a text file placed in a website's root directory that provides crawling instructions to web robots (bots and crawlers) about which pages or sections of the site should or should not be crawled. It follows the Robots Exclusion Standard and serves as the first communication between your website and any crawler—including the AI crawlers that now dominate many sites' traffic.

In 2026, robots.txt management has become a strategic AI visibility decision. AI crawlers—GPTBot (OpenAI), PerplexityBot, ClaudeBot (Anthropic), Google-Extended, and others—now account for over 95% of crawler traffic on many websites. Your robots.txt configuration directly determines whether these AI systems can access and potentially cite your content. Blocking AI crawlers means your content won't appear in ChatGPT, Perplexity, or Claude responses.

Key robots.txt directives include User-agent (which crawler the rules apply to), Disallow (paths to block), Allow (exceptions within blocked paths), Sitemap (location of XML sitemaps), and Crawl-delay (request pacing). You can set rules for specific AI crawlers independently—for example, allowing GPTBot while restricting other bots, or granting all AI crawlers access to your blog but blocking them from gated content.
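A short sketch showing each of these directives together, with hypothetical paths and sitemap URL:

```
# Rules for OpenAI's crawler
User-agent: GPTBot
Disallow: /premium/           # block this section...
Allow: /premium/samples/      # ...except the free samples inside it

# Default rules for all other crawlers
User-agent: *
Disallow: /admin/
Crawl-delay: 10               # request pacing; ignored by some crawlers, including Googlebot

# Sitemap location (applies to all crawlers)
Sitemap: https://example.com/sitemap.xml
```

Groups are separated by blank lines, and a crawler uses the most specific `User-agent` group that matches it, falling back to `*` if none does.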

Robots.txt works alongside the newer llms.txt proposal, which serves as an AI-specific complement. While robots.txt tells crawlers what not to access, llms.txt proactively guides AI systems to your most valuable, citation-worthy content.
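Under the llmstxt.org proposal, llms.txt is a markdown file served from the site root; a minimal hypothetical sketch (all names and URLs are placeholders):

```markdown
# Example Corp

> Example Corp makes inventory software. This file points AI systems
> to our most citation-worthy pages.

## Documentation

- [Quickstart](https://example.com/docs/quickstart): Set up in ten minutes
- [API reference](https://example.com/docs/api): Endpoints and authentication

## Blog

- [Buying guide](https://example.com/blog/buying-guide): Choosing inventory software
```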

Important limitations: robots.txt only controls crawling, not indexing. Pages blocked by robots.txt can still appear in search results if linked from other sites (use noindex meta tags to prevent indexing). Robots.txt is a voluntary standard—malicious bots may ignore it. Never use robots.txt to hide sensitive data; use proper authentication instead.
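You can preview how a compliant crawler would interpret your rules with Python's standard-library parser. A small sketch using a hypothetical rule set that blocks GPTBot from a premium section:

```python
from urllib import robotparser

# Hypothetical rules: block GPTBot from /premium/, allow everyone else everywhere
rules = """\
User-agent: GPTBot
Disallow: /premium/

User-agent: *
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A compliant GPTBot must skip the gated section...
print(rp.can_fetch("GPTBot", "https://example.com/premium/report"))         # False
# ...but may still crawl the blog, and other bots are unrestricted
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))              # True
print(rp.can_fetch("PerplexityBot", "https://example.com/premium/report"))  # True
```

Keep in mind this only models what well-behaved crawlers do; nothing in robots.txt technically prevents access.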

Best practices: keep rules simple and readable, don't block CSS/JavaScript needed for rendering, reference your XML sitemaps, test changes before deployment (Google Search Console's robots.txt report or a third-party validator can flag syntax errors), and regularly review your AI crawler rules as new bots emerge and your content strategy evolves.

Examples of Robots.txt

  • A publisher allows GPTBot and PerplexityBot access to their articles but blocks them from paywalled premium content, balancing AI visibility with content monetization
  • An e-commerce site blocks AI crawlers from checkout, account, and filtered product pages while allowing access to product pages and buying guides—directing AI citation toward valuable content
  • A media company reviews their robots.txt and discovers they accidentally blocked ClaudeBot, explaining why their content never appears in Claude responses—fixing it restores AI visibility within weeks
  • A SaaS company creates user-agent-specific rules allowing all AI crawlers to access their documentation and blog while blocking admin and staging directories
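One way to write the SaaS scenario above (directory names are illustrative). Under RFC 9309's longest-match precedence, the more specific Allow lines override the blanket Disallow for those paths:

```
# Named AI crawlers: documentation and blog only
User-agent: GPTBot
User-agent: PerplexityBot
User-agent: ClaudeBot
User-agent: Google-Extended
Allow: /docs/
Allow: /blog/
Disallow: /

# All other crawlers: keep out of admin and staging only
User-agent: *
Disallow: /admin/
Disallow: /staging/
```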


Frequently Asked Questions about Robots.txt

Should I allow or block AI crawlers in robots.txt?

For most businesses seeking AI visibility, allow AI crawlers (GPTBot, PerplexityBot, ClaudeBot, Google-Extended) access to your public, citation-worthy content. Block them from private areas, gated content, and low-value pages. Blocking all AI crawlers means your content won't appear in AI responses—a significant visibility loss as AI search captures 12–15% market share.
