Definition
Robots.txt is a plain text file placed in a website's root directory that tells web robots (crawlers) which pages or sections of the site they may or may not crawl. It follows the Robots Exclusion Standard (formalized as RFC 9309) and serves as the first point of communication between your website and any crawler, including the AI crawlers that now account for a large share of many sites' traffic.
In 2026, robots.txt management has become a strategic AI-visibility decision. AI crawlers—GPTBot (OpenAI), PerplexyBot's successor PerplexityBot, ClaudeBot (Anthropic), Google-Extended, and others—reportedly account for the majority of crawler traffic on some websites. Your robots.txt configuration directly determines whether these AI systems can access, and potentially cite, your content: blocking their crawlers makes it unlikely your content will surface in ChatGPT, Perplexity, or Claude responses.
Key robots.txt directives include User-agent (which crawler a rule group applies to), Disallow (paths to block), Allow (exceptions within blocked paths), Sitemap (location of your XML sitemaps), and Crawl-delay (request pacing; nonstandard and ignored by some crawlers, including Googlebot). You can set rules for specific AI crawlers independently—for example, allowing GPTBot while restricting other bots, or granting all AI crawlers access to your blog but blocking them from gated content.
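Put together, a minimal robots.txt combining these directives might look like the following sketch (the domain and paths are illustrative):

```txt
# OpenAI's crawler: full access except gated content
User-agent: GPTBot
Allow: /
Disallow: /premium/

# Anthropic's crawler: pace requests (not all bots honor Crawl-delay)
User-agent: ClaudeBot
Crawl-delay: 10

# Everyone else: keep out of the admin area
User-agent: *
Disallow: /admin/

# Sitemaps are declared outside any group and apply to all crawlers
Sitemap: https://example.com/sitemap.xml
```

Rule groups are matched per User-agent: a bot that matches a named group follows only that group, not the `User-agent: *` fallback.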
Robots.txt works alongside the newer llms.txt proposal, an AI-specific complement. While robots.txt tells crawlers what not to access, llms.txt proactively points AI systems to your most valuable, citation-worthy content.
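Per the llms.txt proposal, the file is a markdown document served from the site root; a minimal sketch (the company, summary, and URLs are hypothetical):

```txt
# Example Co

> Example Co provides billing APIs. The most useful docs and guides are listed below.

## Documentation
- [Quickstart](https://example.com/docs/quickstart): set up in five minutes
- [API reference](https://example.com/docs/api): endpoints and authentication

## Blog
- [Pricing guide](https://example.com/blog/pricing-guide): how the plans compare
```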
Important limitations: robots.txt only controls crawling, not indexing. Pages blocked by robots.txt can still appear in search results if linked from other sites; to prevent indexing, allow crawling and add a noindex meta tag or X-Robots-Tag header (a crawler must be able to fetch the page to see the noindex signal—a page that is both Disallowed and noindexed never gets the hint). Robots.txt is also a voluntary standard: malicious bots may simply ignore it. Never use robots.txt to hide sensitive data; use proper authentication instead.
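The two common forms of the noindex signal can be sketched as follows (the nginx location path is illustrative):

```txt
<!-- Option 1: meta tag in the page's <head> -->
<meta name="robots" content="noindex">

# Option 2: X-Robots-Tag response header, e.g. in an nginx config
location /internal-reports/ {
    add_header X-Robots-Tag "noindex";
}
```

Either way, the path must not be Disallowed in robots.txt, or crawlers will never see the signal.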
Best practices: keep rules simple and readable, don't block the CSS and JavaScript needed for rendering, reference your XML sitemaps, test changes before deployment with a robots.txt testing tool (such as the robots.txt report in Google Search Console), and regularly review your AI crawler rules as new bots emerge and your content strategy evolves.
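Rules can also be sanity-checked offline before deployment; a small sketch using Python's standard-library robot parser (the draft rules and URLs are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Draft robots.txt content to verify before deploying
rules = """\
User-agent: GPTBot
Disallow: /premium/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot is blocked from /premium/ but allowed elsewhere
print(parser.can_fetch("GPTBot", "https://example.com/premium/report"))  # False
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))       # True
# Bots without their own group fall back to the wildcard rules
print(parser.can_fetch("SomeOtherBot", "https://example.com/admin/"))    # False
```

The same parser can load a live file via `set_url()` and `read()`, which makes it easy to diff staging rules against production.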
Examples of Robots.txt
- A publisher allows GPTBot and PerplexityBot access to their articles but blocks them from paywalled premium content, balancing AI visibility with content monetization
- An e-commerce site blocks AI crawlers from checkout, account, and filtered product pages while allowing access to product pages and buying guides—directing AI citation toward valuable content
- A media company reviews their robots.txt and discovers they accidentally blocked ClaudeBot, explaining why their content never appears in Claude responses—fixing it restores AI visibility within weeks
- A SaaS company creates user-agent-specific rules allowing all AI crawlers to access their documentation and blog while blocking admin and staging directories
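The SaaS scenario above might be expressed like this (the crawler names are real; the directory layout is illustrative):

```txt
# AI crawlers: public docs and blog are fair game, internals are not
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /admin/
Disallow: /staging/

# All other crawlers: same restrictions
User-agent: *
Disallow: /admin/
Disallow: /staging/
```

Listing several User-agent lines above one rule set applies that group to all of them.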
