Diffbot

Diffbot crawls and structures web pages into a knowledge graph that is sold for AI training, retrieval, and data enrichment.

What is Diffbot?

Diffbot is the AI crawler operated by the company of the same name. It crawls and structures web pages into a knowledge graph that is sold for AI training, retrieval, and data enrichment.

Diffbot matters for AI visibility because the pages it collects can shape what large language models learn about your brand, products, and expertise. Allowing it can help AI systems describe and recommend you more accurately, while disallowing it signals that your content should stay out of training data. Either way, knowing when Diffbot visits is the first step to managing how your brand shows up in AI search.

Tracking which AI crawlers and agents reach your site, and what they do once there, is the foundation of generative engine optimization. See our guides to AI crawlers and robots.txt to control automated access and protect your AI search visibility.

Want to see every AI bot hitting your site? Promptwatch turns your server and CDN logs into a live view of AI crawler and agent traffic, so you can watch ChatGPT, Claude, Perplexity, Gemini, and others crawl your pages and connect those visits to real citations and revenue. Learn more in AI crawler logs.

How to handle Diffbot

If you do not want Diffbot using your content for AI training, disallow it in robots.txt. If you are comfortable contributing to AI systems, leave it allowed.

To control Diffbot, add a rule for its user agent to your robots.txt:

User-agent: Diffbot
Disallow: /

Diffbot generally honors robots.txt directives.
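
Before deploying the rule, you can check how a standards-following parser reads it. The sketch below uses Python's standard-library urllib.robotparser; the example.com URL and the SomeOtherBot name are placeholders chosen for illustration. It only shows how the rule is interpreted, not whether any particular crawler actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Diffbot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL; substitute a page from your own site.
url = "https://example.com/guides/premium"

diffbot_allowed = is_allowed(ROBOTS_TXT, "Diffbot", url)
# "SomeOtherBot" is a made-up agent used to show the rule is scoped to Diffbot.
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", url)

print("Diffbot allowed:", diffbot_allowed)
print("Other bot allowed:", other_allowed)
```

With this rule, the parser reports Diffbot as blocked everywhere while agents without a matching group remain allowed.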

Examples

  • A publisher reviews server logs, sees Diffbot requesting long-form articles, and decides whether to allow or disallow it in robots.txt.
  • A site owner adds a Disallow rule for Diffbot to keep premium guides out of AI training datasets.
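
The log review in the first example can be approximated with a short script. This is a rough sketch that assumes your server writes the common "combined" access log format and that Diffbot requests can be recognized by "diffbot" appearing in the User-Agent field; the sample lines and their user-agent strings are invented for illustration, and because user agents can be spoofed the counts are indicative only.

```python
import re
from collections import Counter

# Assumption: "combined" log format, i.e.
# ip - - [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_diffbot_paths(lines):
    """Count requested paths for lines whose User-Agent mentions Diffbot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "diffbot" in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


# Invented sample lines; replace with lines read from your own access log.
SAMPLE = [
    '203.0.113.5 - - [10/Jan/2025:12:00:00 +0000] "GET /guides/a HTTP/1.1" 200 512 "-" "ExampleAgent Diffbot"',
    '203.0.113.5 - - [10/Jan/2025:12:00:05 +0000] "GET /guides/a HTTP/1.1" 200 512 "-" "ExampleAgent Diffbot"',
    '203.0.113.5 - - [10/Jan/2025:12:00:09 +0000] "GET /pricing HTTP/1.1" 200 128 "-" "ExampleAgent Diffbot"',
    '198.51.100.7 - - [10/Jan/2025:12:01:00 +0000] "GET /guides/a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

hits = count_diffbot_paths(SAMPLE)
for path, count in hits.most_common():
    print(path, count)
```

A tally like this makes it easier to see whether the crawler is concentrating on long-form or premium pages before you decide on a Disallow rule.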

Frequently asked questions about Diffbot

Diffbot is operated by the company Diffbot and serves as its AI crawler.
