Is BM25 still used now that we have embeddings?

Yes. BM25 remains a common first-stage candidate generator because it is fast, predictable, and excellent at exact matches of specific terms—product names, codes, versions—that dense vector search can miss. Most modern systems combine BM25 with vector retrieval rather than replacing it.

How does BM25 fit into a modern retrieval pipeline?

Typically as the lexical half of a hybrid search stack: BM25 supplies high-recall keyword matches, dense embedding retrieval adds semantic matches, the two result sets are fused, and a cross-encoder reranker orders the merged candidates before the top passages go to the language model.

Why does BM25 matter for GEO?

Because exact wording still influences retrieval. Content that uses consistent, precise terminology and exact entity names matches the lexical stage that BM25 powers, helping it enter the candidate set answer engines synthesize from—complementing the semantic match that embeddings provide.

BM25 - AI SEO & GEO Glossary

Definition

BM25 (Best Matching 25) is a ranking function from information retrieval that scores how relevant a document is to a query based on the query's terms. It improves on simple term-frequency counting by accounting for how often a term appears in a document (with diminishing returns), how rare the term is across the whole corpus (inverse document frequency), and document length, so that long documents are not unfairly favored. The result is a fast, robust lexical (keyword-based) relevance score.
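The three ingredients described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75` (common conventions, assumed here), the smoothed non-negative IDF variant, and the sample documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption); real engines use analyzers.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rarer terms get a larger weight (smoothed, non-negative IDF).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = k1 * (1 - b + b * len(toks) / avg_len)
            # Term frequency saturates: each repeat adds less than the last.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Hypothetical corpus: two documents mention the error code, one does not.
docs = [
    "restart the service to clear error E4021",
    "general troubleshooting guide for network errors",
    "error E4021 appears when the license file is missing",
]
scores = bm25_scores("E4021", docs)
# The document without the code scores 0; of the two matches,
# the shorter document scores higher because of length normalization.
```

Raising `b` toward 1 strengthens the length penalty, while raising `k1` lets repeated terms keep adding to the score for longer before saturating.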

Despite being decades old, BM25 remains a workhorse in 2026 AI search. It is widely used as the first-stage candidate generator in retrieval pipelines—including AI answer engines—precisely because it is cheap, predictable, and excellent at exact matching of specific terms like product names, error codes, version numbers, and rare keywords that vector search can miss. Bing Copilot, for example, reportedly uses BM25 over its web index as a primary candidate generator before reranking.

In practice BM25 rarely works alone. Modern systems combine it with dense embedding-based retrieval in a hybrid search stack and then apply a cross-encoder reranker for final precision. BM25 supplies high-recall lexical matches; vectors add semantic understanding; the reranker sorts the merged set.
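The fusion step can be illustrated with reciprocal rank fusion (RRF), one widely used way to merge ranked lists. The text above does not say which fusion method any particular system uses, so RRF is an assumption here, as are the constant `k=60`, the document IDs, and the two input rankings. The reranking stage is left out.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into a single ranking."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); higher ranks contribute more.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document IDs by their summed contribution, best first.
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical outputs of the two first-stage retrievers, best hit first.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

candidates = reciprocal_rank_fusion([bm25_hits, vector_hits])
# Documents found by both retrievers (doc_a, doc_c) rise above those found by one.
```

RRF only looks at rank positions, so it sidesteps the problem that BM25 scores and vector similarities live on different scales. In a full pipeline, `candidates` would then be passed to the cross-encoder reranker.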

For GEO (generative engine optimization), BM25's persistence is a reminder that exact wording still matters. Using consistent, precise terminology and named entities, not just semantically adjacent phrasing, helps your content match the lexical stage of retrieval and survive into the candidate set that answer engines synthesize from.

Examples of BM25

A search system uses BM25 to instantly surface documents containing an exact error code that a semantic model might overlook.
A hybrid pipeline runs BM25 and vector retrieval in parallel, fuses the results, and reranks them before sending the top passages to an LLM.
Bing Copilot reportedly generates candidates with BM25 over its web index, then applies a reranker to refine relevance before synthesis.
A GEO team improves lexical match by auditing pages to use consistent, precise terminology and exact entity names that BM25-style retrieval can score directly.

Frequently Asked Questions about BM25

What does BM25 actually measure?

BM25 scores a document's relevance to a query using term frequency (with diminishing returns), inverse document frequency (rarer terms count more), and document-length normalization so long documents are not unfairly favored. It is a lexical, keyword-matching relevance function.
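Written out, one common formulation of the score looks like the following. The symbols are introduced here for illustration: $f(q_i, D)$ is the number of times query term $q_i$ occurs in document $D$, $|D|$ is the document length in tokens, $\text{avgdl}$ is the average document length in the corpus, $N$ is the number of documents, $n(q_i)$ is the number of documents containing $q_i$, and $k_1$ and $b$ are tuning parameters. Several IDF variants exist; the smoothed, non-negative one is shown.

```latex
\text{score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\text{avgdl}}\right)}

\text{IDF}(q_i) = \ln\!\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)
```

The fraction grows with $f(q_i, D)$ but levels off, which is the diminishing-returns effect. The $b\,|D|/\text{avgdl}$ term enlarges the denominator for longer-than-average documents, which is the length normalization.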

Terms related to BM25

Vector Search

Hybrid Search

Semantic Search

Reranking

Embeddings

RAG (Retrieval-Augmented Generation)

Passage Ranking

Retrieval Coverage
