Robots.txt

Root directory file instructing search engine and AI crawlers which pages to crawl or avoid—now critical for managing GPTBot, PerplexityBot, and ClaudeBot.
Updated May 6, 2026
SEO

Definition

Robots.txt is a text file placed in a website's root directory that provides crawling instructions to web robots (bots and crawlers) about which pages or sections of the site should or should not be crawled. It follows the Robots Exclusion Standard and serves as the first communication between your website and any crawler—including the AI crawlers that now dominate many sites' traffic.

In 2026, robots.txt management has become a strategic AI visibility decision. AI crawlers—GPTBot (OpenAI), PerplexityBot, ClaudeBot (Anthropic), Google-Extended, and others—now account for over 95% of crawler traffic on many websites. Your robots.txt configuration directly determines whether these AI systems can access and potentially cite your content. Blocking AI crawlers means your content won't appear in ChatGPT, Perplexity, or Claude responses.

Key robots.txt directives include User-agent (which crawler the rules apply to), Disallow (paths to block), Allow (exceptions within blocked paths), Sitemap (location of XML sitemaps), and Crawl-delay (request pacing; a non-standard directive that some major crawlers, including Google, ignore). You can set rules for specific AI crawlers independently—for example, allowing GPTBot while restricting other bots, or granting all AI crawlers access to your blog but blocking them from gated content.
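As an illustration, a minimal robots.txt using each of these directives might look like the following sketch (the domain and all paths are placeholders):

```text
# Rules for OpenAI's training crawler
User-agent: GPTBot
Disallow: /premium/
Allow: /premium/free-sample/

# Rules for every other crawler
User-agent: *
Disallow: /admin/
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
```

Modern parsers apply the most specific (longest) matching rule, so the Allow line carves an exception out of the Disallow block above it.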

Robots.txt works alongside the newer llms.txt standard, which serves as an AI-specific complement. While robots.txt tells crawlers what not to access, llms.txt proactively guides AI systems to your most valuable, citation-worthy content.
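Under the llms.txt proposal, the file is a markdown document placed in the site root that curates links for AI systems rather than restricting them. A minimal sketch, with all names and URLs hypothetical:

```text
# Example Co.

> Example Co. builds widgets. The pages below best explain our products and documentation.

## Documentation
- [Getting started](https://example.com/docs/start): installation and first steps

## Blog
- [Widget buying guide](https://example.com/blog/buying-guide): how to choose between widget models
```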

Important limitations: robots.txt only controls crawling, not indexing. Pages blocked by robots.txt can still appear in search results if linked from other sites (use noindex meta tags to prevent indexing). Robots.txt is a voluntary standard—malicious bots may ignore it. Never use robots.txt to hide sensitive data; use proper authentication instead.
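For example, to keep a page out of indexes you would leave it crawlable (so the directive can actually be seen) and add noindex, either in the HTML or as an HTTP header:

```text
<!-- In the page's <head>: -->
<meta name="robots" content="noindex">

HTTP header alternative, useful for non-HTML files such as PDFs:
X-Robots-Tag: noindex
```

If robots.txt blocks the page, crawlers never fetch it and therefore never see the noindex directive, which is why the two mechanisms should not be combined on the same URL.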

Best practices: keep rules simple and readable, don't block CSS/JavaScript needed for rendering, reference your XML sitemaps, test changes before deployment (for example with Search Console's robots.txt report or a local parser), and regularly review your AI crawler rules as new bots emerge and your content strategy evolves.
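One way to test rules before deployment is with Python's standard-library parser. The sketch below checks hypothetical rules against a few representative URLs:

```python
# Sanity-check robots.txt rules locally before deploying.
# The rules and paths below are hypothetical examples.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Disallow: /premium/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot may fetch articles but not paywalled premium content.
print(parser.can_fetch("GPTBot", "/articles/how-to"))  # True
print(parser.can_fetch("GPTBot", "/premium/report"))   # False
# Crawlers without their own group fall through to the * group.
print(parser.can_fetch("SomeBot", "/admin/login"))     # False
```

Running checks like these in CI catches the common failure mode in the examples below: accidentally blocking an AI crawler you meant to allow.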

Current relevance: Robots.txt still matters for traditional rankings, but it also shapes whether AI answer engines can discover, trust, and cite a page. Strong implementation supports crawlability, passage extraction, structured understanding, and freshness signals across Google, Bing, ChatGPT, Perplexity, and agentic browsing tools.

Examples of Robots.txt

  • A publisher allows GPTBot and PerplexityBot access to their articles but blocks them from paywalled premium content, balancing AI visibility with content monetization
  • An e-commerce site blocks AI crawlers from checkout, account, and filtered product pages while allowing access to product pages and buying guides—directing AI citation toward valuable content
  • A media company reviews their robots.txt and discovers they accidentally blocked ClaudeBot, explaining why their content never appears in Claude responses—fixing it restores AI visibility within weeks
  • A SaaS company creates user-agent-specific rules allowing all AI crawlers to access their documentation and blog while blocking admin and staging directories
  • An SEO team reviews robots.txt alongside AI Overview citations, Bing/Copilot visibility, sitemap freshness, structured data validation, and AI crawler access before updating priority pages.
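The publisher and e-commerce scenarios above might translate into robots.txt rules like these (all paths hypothetical):

```text
# Publisher: AI crawlers may cite articles but not paywalled content
User-agent: GPTBot
User-agent: PerplexityBot
Allow: /articles/
Disallow: /premium/

# E-commerce: keep all bots out of checkout, accounts, and filtered listings
User-agent: *
Disallow: /checkout/
Disallow: /account/
Disallow: /*?filter=
```

Grouping several User-agent lines over one rule set is valid under the Robots Exclusion Protocol (RFC 9309), and the * wildcard in paths is supported by major crawlers.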


Terms related to Robots.txt

AI Web Crawlers

Bots deployed by AI companies to fetch web content for training and retrieval—comprising 95%+ of tracked crawler traffic, led by GPTBot and PerplexityBot.

AI

LLMs.txt

LLMs.txt is a proposed standard: a markdown file in a website's root directory that guides AI crawlers and language models to a site's most important content, serving as an AI-specific complement to robots.txt.

GEO

Crawl Budget

The number of pages search engine and AI bots will crawl within a timeframe—critical for large sites to ensure important content gets discovered and indexed.

SEO

XML Sitemaps

Structured files listing website URLs with metadata to guide search engine and AI crawler discovery, crawling priority, and content freshness.

SEO

Crawling and Indexing

How search engines and AI crawlers discover, analyze, and index web content—including GPTBot, ClaudeBot, and the emerging llms.txt standard for AI access.

SEO

AI Indexing

How AI systems discover, process, and store web content for generating responses—distinct from traditional search indexing and critical for GEO.

AI

OpenAI Crawlers

OpenAI crawlers such as GPTBot, OAI-SearchBot, and ChatGPT-User have different purposes for training, ChatGPT search, and user-triggered browsing.

AI

IndexNow

IndexNow is a protocol for notifying participating search engines about changed URLs, improving freshness for search and AI-grounded answers.

SEO

TDM Rights Reservation

TDM rights reservation is the use of legal and technical notices to reserve rights around text and data mining by AI systems.

AI

AI Crawler Logs

AI crawler logs are server log records showing how AI bots, retrieval agents, and user-triggered AI browsers access a site.

Analytics

Frequently Asked Questions about Robots.txt

Should I allow or block AI crawlers in robots.txt?

For most businesses seeking AI visibility, allow AI crawlers (GPTBot, PerplexityBot, ClaudeBot, Google-Extended) access to your public, citation-worthy content. Block them from private areas, gated content, and low-value pages. Blocking all AI crawlers means your content won't appear in AI responses—a significant visibility loss as AI search captures 12–15% of search market share.
