Summary
A retrieval pipeline that blends BM25, dense vectors, and a cross-encoder reranker so exact-match precision and semantic recall can work together instead of competing.
BM25 + Dense + Reranking
A hybrid retrieval project that combines lexical search, dense vectors, and reranking to recover the exact matches pure vector search misses.
Project Brief
Problem
Pure vector retrieval feels magical until the query depends on a brittle exact phrase, identifier, or part number — then recall collapses silently.
Hypothesis
If lexical and semantic retrieval are combined instead of treated as competing approaches, search quality improves on the queries that usually break RAG.
Outcome
Built and benchmarked four retrieval configurations (BM25, dense, hybrid RRF, hybrid + rerank) across 30 queries in 3 categories. Hybrid+Rerank reached 96.7% Recall@5 overall, with the clearest gains on semantic queries, where BM25 alone hits 75% and the reranker lifts recall to 90%.
Goals
Technologies Used
This project is an argument against false choices. Lexical retrieval and semantic retrieval are both useful, and the system gets stronger when they are allowed to contribute at different stages. BM25 handles precision — it finds documents that contain the exact terms in the query. Dense bi-encoders handle recall — they find documents with the right meaning even when the phrasing is different. Reranking handles ordering — it looks at query-document pairs together and decides which candidate should actually win.
The result is a stack that behaves more like a practical search system and less like an embedding demo. That matters whenever people search with exact identifiers, awkward phrasing, or queries that require both literal and contextual matching.
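To make the precision stage concrete, here is a minimal Okapi BM25 scorer in plain Python. This is a sketch, not the project's actual implementation: the function name `bm25_scores`, the pre-tokenized input format, and the parameter defaults `k1=1.5`, `b=0.75` are illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query with Okapi BM25.

    docs is a list of token lists; returns one score per doc.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N          # average doc length
    df = Counter(t for d in docs for t in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            # Lucene-style IDF: always non-negative
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation with length normalization
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(s)
    return scores
```

The length-normalization term is what keeps a long document from winning just by repeating a term, while the saturating TF curve is why one exact identifier match is enough to surface a short document.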
The lesson here was that “modern” is not automatically “better.” Some of the strongest AI systems are hybrids because they respect what older search methods still do extremely well. BM25 on semantic-heavy queries only reaches 75% — but it is still the right first stage because it provides signal that dense retrieval simply does not have. The reranker can only improve what the fusion surface already contains; the strength of the hybrid comes from combining two genuinely different signals before the reranking pass.
01
BM25 alone achieves 75% Recall@5 on semantic queries; the missing 25% are queries where the exact search terms don't appear verbatim in the relevant document. Dense retrieval closes this to 80% but still misses paraphrase-heavy edge cases.
02
Hybrid RRF (Reciprocal Rank Fusion of BM25 and dense ranked lists) reaches 85% recall on semantic queries without any reranking, confirming that the two signals are complementary rather than redundant.
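Reciprocal Rank Fusion itself is only a few lines: each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in. A minimal sketch (the constant k=60 is the commonly used default, assumed here rather than taken from the project):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of doc ids.

    A doc's fused score is sum(1 / (k + rank)) across the lists
    that contain it; docs absent from a list simply contribute 0.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only consumes ranks, it sidesteps the problem of BM25 scores and cosine similarities living on incompatible scales, which is exactly why it works as a fusion layer between two genuinely different retrievers.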
03
Hybrid+Rerank (a cross-encoder second pass over the RRF candidate pool) hits 90% on semantic queries and 96.7% overall: a 5.5-point lift over pure BM25 and a 3.6-point lift over dense alone across all 30 queries.
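The reranking pass has a simple shape: score each (query, document) pair jointly, then reorder the candidate pool. The sketch below uses a toy token-overlap scorer as a stand-in for the real cross-encoder (the project uses ms-marco-MiniLM-L-6-v2); the names `rerank` and `overlap_score` are illustrative, not the project's API.

```python
def rerank(query, candidates, score_fn, top_k=5):
    """Second-pass rerank: score each (query, doc) pair jointly,
    then return the top_k candidates in descending score order.

    score_fn stands in for a cross-encoder; a real system would
    batch-score the pairs with a model such as ms-marco-MiniLM-L-6-v2.
    """
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def overlap_score(query, doc):
    """Toy stand-in scorer: fraction of query tokens found in the doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)
```

The key design point survives the toy scorer: the first stages only need to get the right document into the candidate pool, because the reranker sees query and document together and decides the final order.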
Analysis
Recall@5 by Query Category — Four Retrieval Configurations
30 queries across 3 categories evaluated at K=5. Exact-match and mixed queries saturate across all configurations — the story is in semantic queries, where BM25 alone hits 75% recall and Hybrid+Rerank closes to 90%. Hover each bar for Recall@5 and MRR per configuration. Corpus: 20 technical AI/ML documents embedded with all-MiniLM-L6-v2. Reranker: ms-marco-MiniLM-L-6-v2.
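The two metrics in the chart can be computed with two small helpers. This sketch assumes exactly one relevant document per query, which may not match the project's actual judgment format:

```python
def recall_at_k(retrieved, relevant, k=5):
    """Fraction of queries whose relevant doc appears in the top-k.

    retrieved: list of ranked doc-id lists, one per query.
    relevant:  the single relevant doc id for each query.
    """
    hits = sum(1 for r, rel in zip(retrieved, relevant) if rel in r[:k])
    return hits / len(retrieved)

def mrr(retrieved, relevant):
    """Mean Reciprocal Rank of the first relevant doc per query;
    a query whose relevant doc is never retrieved contributes 0."""
    total = 0.0
    for r, rel in zip(retrieved, relevant):
        if rel in r:
            total += 1.0 / (r.index(rel) + 1)
    return total / len(retrieved)
```

Recall@5 answers "did the right document make the pool at all," while MRR rewards putting it near the top, which is why the reranked configuration moves MRR even on categories where recall is already saturated.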
Connect
If your retrieval stack works in demos but falls apart on exact identifiers or brittle queries, I would be happy to compare notes on the hybrid approach.
You are reaching John Meyer, Security Engineer → AI.