
LLMs.txt

LLMs.txt is a proposed specification for controlling how AI crawlers and language models access website content, functioning as a robots.txt equivalent specifically designed for LLM interactions.

Updated March 15, 2026
GEO

Definition

LLMs.txt is a specification proposed by Jeremy Howard of Answer.AI in late 2024 that gives website owners granular control over how large language models and AI crawlers interact with their content. Just as robots.txt became the standard for communicating with traditional search engine crawlers, LLMs.txt is emerging as the equivalent standard for the age of AI-powered search and content consumption.

The specification addresses a fundamental gap in the web's infrastructure. While robots.txt tells search engine bots which pages they can and cannot crawl, it was never designed to handle the nuances of AI model training, retrieval-augmented generation, or AI-powered search. LLMs.txt fills this gap by providing directives specifically tailored to how language models discover, process, and attribute content.

An LLMs.txt file is placed at the root of a domain (e.g., example.com/llms.txt) and uses a key-value syntax with several directives:

  • User-LLM: Specifies which AI models or providers the rules apply to (e.g., GPT, Claude, Gemini, or a wildcard for all)
  • Allow: Explicitly permits AI access to specific content paths
  • Disallow: Blocks AI models from accessing certain content
  • Attribution: Defines how the site expects to be credited when its content is used in AI responses
  • License: Specifies the licensing terms under which content may be used by AI systems
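A minimal file using these directives might look like the following. This is a sketch of the syntax described above; the specific paths, model names, and attribution string are illustrative, not prescribed by the specification:

```
# Rules for all AI models
User-LLM: *
Allow: /blog/
Allow: /docs/
Disallow: /private/
Attribution: required; cite "Example Corp" with page URL
License: CC-BY-4.0

# Stricter rules for one provider
User-LLM: GPT
Disallow: /
```

More specific User-LLM blocks override the wildcard block for the provider they name, mirroring how per-agent groups work in robots.txt.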

The specification serves multiple stakeholders. Publishers gain fine-grained control over which content AI systems can access, going beyond the binary allow/block of robots.txt. AI companies get a clear, machine-readable signal about content permissions, reducing legal ambiguity. Users benefit from AI systems that respect creator preferences, leading to more trustworthy and properly attributed AI responses.

From a GEO perspective, LLMs.txt represents a strategic opportunity. By explicitly allowing AI access to key content while setting clear attribution requirements, publishers can increase their visibility in AI-generated responses while maintaining control over their intellectual property. A well-configured LLMs.txt file signals to AI crawlers that your content is available, authoritative, and should be cited.

Adoption has grown steadily since the proposal. By early 2026, thousands of websites have implemented LLMs.txt files, and several major AI providers have begun respecting the directives. The specification complements rather than replaces robots.txt—websites typically maintain both files, with robots.txt handling traditional search engine crawlers and LLMs.txt managing AI-specific access.

The relationship between LLMs.txt and its companion specification, LLMs-full.txt, is also important. While LLMs.txt provides access control directives, LLMs-full.txt offers a complete Markdown rendering of site content optimized for AI consumption. Together, they form a comprehensive framework for managing AI interactions with web content.

Implementation best practices include:

  • Being specific about which content to allow rather than using broad wildcards
  • Setting clear attribution requirements that AI systems can follow programmatically
  • Regularly reviewing and updating directives as AI capabilities evolve
  • Monitoring AI crawler logs to verify compliance with your LLMs.txt directives
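The last practice, compliance monitoring, can be automated. The sketch below assumes the key-value syntax described in this article and standard combined-format access logs; the parser, the `AI_AGENTS` list, and the example paths are all illustrative, not part of any official tooling:

```python
import re

def parse_llms_txt(text):
    """Parse the key-value syntax into {user-llm token: [disallowed path prefixes]}."""
    rules = {}
    current = None
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key.lower() == "user-llm":
            current = value
            rules.setdefault(current, [])
        elif key.lower() == "disallow" and current is not None:
            rules[current].append(value)
    return rules

# Known AI crawler user-agent substrings (illustrative, not exhaustive).
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def find_violations(log_lines, rules):
    """Flag AI-crawler requests to paths disallowed for '*' (all models)."""
    disallowed = rules.get("*", [])
    request = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')
    violations = []
    for line in log_lines:
        match = request.search(line)
        if not match:
            continue
        path = match.group(1)
        agent = next((a for a in AI_AGENTS if a in line), None)
        if agent and any(path.startswith(prefix) for prefix in disallowed):
            violations.append((agent, path))
    return violations
```

Running `find_violations` over a day's access log against your parsed rules surfaces any AI crawler that fetched a disallowed path, which you can then follow up on with the provider.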

As AI search becomes an increasingly significant traffic and visibility channel, LLMs.txt is evolving from a nice-to-have into a core component of technical GEO strategy. Organizations that proactively configure their LLMs.txt files position themselves to maximize AI visibility while protecting their content rights.

Examples of LLMs.txt

  • A major news publisher adds an LLMs.txt file that allows AI access to all article content but requires attribution with the journalist's name and publication date, resulting in consistently cited references in ChatGPT and Perplexity responses
  • An e-commerce site configures LLMs.txt to allow AI crawlers to access product descriptions and reviews but blocks access to pricing pages, ensuring product discovery through AI while preventing price comparison scraping
  • A SaaS company uses LLMs.txt to explicitly allow access to its documentation and blog while disallowing access to gated content, increasing the frequency of AI-generated recommendations that link to their free resources
  • A medical information site sets User-LLM directives to allow only specific AI providers that have agreed to proper medical disclaimer attribution, ensuring health content is presented with appropriate context
  • A university research repository implements LLMs.txt with Creative Commons licensing directives, making it clear that AI systems can use and cite their published papers under specific attribution terms
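As a concrete sketch, the e-commerce scenario above could be expressed with a file like this (paths illustrative):

```
User-LLM: *
Allow: /products/
Allow: /reviews/
Disallow: /pricing/
Attribution: required
```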


Frequently Asked Questions about LLMs.txt


How does LLMs.txt differ from robots.txt?

Robots.txt was designed for traditional search engine crawlers and offers basic allow/disallow directives for page-level access. LLMs.txt is purpose-built for AI language models and includes additional directives like Attribution, License, and User-LLM that address AI-specific concerns such as content attribution in generated responses, licensing terms for AI training data, and per-model access control. The two files complement each other—robots.txt manages traditional crawlers while LLMs.txt handles AI interactions.
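To illustrate the complementary roles, a site might serve both files side by side (contents illustrative):

```
# robots.txt — traditional crawlers, binary allow/block
User-agent: *
Disallow: /admin/

# llms.txt — AI-specific directives layered on top
User-LLM: *
Allow: /articles/
Attribution: required; cite author and publication date
License: all-rights-reserved
```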
