Definition
BM25 (Best Matching 25) is a ranking function from information retrieval that scores how relevant a document is to a query based on the query's terms. It improves on simple term-frequency counting by accounting for three things: how often a term appears in a document (with diminishing returns for repeated occurrences), how rare the term is across the whole corpus (inverse document frequency), and how long the document is relative to the corpus average, so that long documents are not unfairly favored. The result is a fast, robust lexical (keyword-based) relevance score.
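The three ingredients described above can be made concrete with a small scoring function. This is a minimal sketch of the commonly cited Okapi BM25 formula, not the implementation of any particular search engine. The function name `bm25_scores`, the pre-tokenized inputs, and the parameter defaults `k1=1.5` and `b=0.75` are illustrative assumptions; the IDF uses a smoothed variant chosen here so that scores stay positive.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query.

    query: list of terms; docs: non-empty list of token lists.
    k1 controls term-frequency saturation, b controls length normalization
    (both defaults are assumed, commonly used values).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Documents longer than average are penalized; shorter ones boosted.
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF: rare terms weigh more; log(1 + x) stays positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency: extra occurrences add less and less.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


corpus = [
    ["error", "e1234", "on", "startup"],
    ["startup", "guide", "for", "new", "users"],
    ["error", "handling", "overview"],
]
scores = bm25_scores(["e1234"], corpus)
```

In this toy corpus only the first document contains the exact token `e1234`, so it is the only one with a non-zero score. Raising `k1` lets repeated terms count for more before saturating, while setting `b` to 0 switches length normalization off.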
Despite being decades old, BM25 remains a workhorse in 2026 AI search. It is widely used as the first-stage candidate generator in retrieval pipelines—including AI answer engines—precisely because it is cheap, predictable, and excellent at exact matching of specific terms like product names, error codes, version numbers, and rare keywords that vector search can miss. Bing Copilot, for example, reportedly uses BM25 over its web index as a primary candidate generator before reranking.
In practice BM25 rarely works alone. Modern systems combine it with dense embedding-based retrieval in a hybrid search stack and then apply a cross-encoder reranker for final precision. BM25 supplies high-recall lexical matches; vectors add semantic understanding; the reranker sorts the merged set.
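The merge step in such a hybrid stack can be sketched as follows. The text above does not say how the lexical and vector results are fused, so this example assumes reciprocal rank fusion (RRF), one common choice. The function name, the document IDs, and the constant `k=60` are all illustrative assumptions, and the reranker is left out.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical outputs of the two first-stage retrievers, best match first.
bm25_ranking = ["doc_err_code", "doc_changelog", "doc_faq"]
vector_ranking = ["doc_faq", "doc_err_code", "doc_tutorial"]

candidates = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

RRF only needs ranks, not raw scores, which sidesteps the problem that BM25 scores and vector similarities live on different scales. In a full pipeline, `candidates` would then be handed to the cross-encoder reranker for final ordering.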
For GEO, BM25's persistence is a reminder that exact wording still matters. Using consistent, precise terminology and named entities—not just semantically adjacent phrasing—helps your content match the lexical stage of retrieval and survive into the candidate set that answer engines synthesize from.
Examples of BM25
- A search system uses BM25 to surface documents containing an exact error code that a semantic model might overlook.
- A hybrid pipeline runs BM25 and vector retrieval in parallel, fuses the results, and reranks them before sending the top passages to an LLM.
- Bing Copilot reportedly generates candidates with BM25 over its web index, then applies a reranker to refine relevance before synthesis.
- A GEO team improves lexical match by auditing pages to use consistent, precise terminology and exact entity names that BM25-style retrieval can score directly.
