
Hybrid Retrieval Pipeline

BM25 + Dense + Reranking

A hybrid retrieval project that combines lexical search, dense vectors, and reranking to recover the exact matches pure vector search misses.

Project Brief

  • Corpus / queries: 20 docs / 30 queries
  • Best Recall@5: 96.7% (Hybrid+Rerank)
  • Semantic gain: +15% over BM25
  • Evaluation: Recall@5 + MRR
01 - Project Brief

Problem, Hypothesis, Outcome.

Summary

A retrieval pipeline that blends BM25, dense vectors, and a cross-encoder reranker so exact-match precision and semantic recall can work together instead of competing.

Problem

Pure vector retrieval feels magical until the query depends on a brittle exact phrase, identifier, or part number — then recall collapses silently.

Hypothesis

If lexical and semantic retrieval are combined instead of treated as competing approaches, search quality improves on the queries that usually break RAG.

Outcome

Built and benchmarked a four-stage pipeline (BM25, Dense, Hybrid RRF, Hybrid+Rerank) across 30 queries in 3 categories. Hybrid+Rerank reached 96.7% Recall@5 overall, with the clearest gains on semantic queries where BM25 alone hits 75% and the reranker closes the gap to 90%.

02 - Goals & Stack

What the build was trying to do.

Goals

  • Recover exact matches that pure embeddings flatten away.
  • Preserve semantic recall for fuzzier queries.
  • Make the retrieval stack explainable enough to tune deliberately.

Technologies Used

  • BM25 (Okapi, implemented from scratch)
  • sentence-transformers (all-MiniLM-L6-v2)
  • Cross-encoder reranking (ms-marco-MiniLM-L-6-v2)
  • Reciprocal Rank Fusion (RRF)
  • Retrieval evaluation (Recall@5, MRR)

03 - Breakdown & Notes

Implementation notes.

Breakdown

This project is an argument against false choices. Lexical retrieval and semantic retrieval are both useful, and the system gets stronger when they are allowed to contribute at different stages. BM25 handles precision — it finds documents that contain the exact terms in the query. Dense bi-encoders handle recall — they find documents with the right meaning even when the phrasing is different. Reranking handles ordering — it looks at query-document pairs together and decides which candidate should actually win.
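To make the precision stage concrete, here is a minimal from-scratch Okapi BM25 scorer in the spirit of this build (k1=1.5, b=0.75, as in the build notes). The toy corpus, tokenization, and document contents are illustrative placeholders, not the project's actual data.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score every document (a list of tokens) against a tokenized query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N          # average document length
    # df: number of documents containing each query term
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue  # term absent from the corpus contributes nothing
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores

# Hypothetical two-document corpus: an exact-identifier query should
# score the document containing the literal terms far higher.
docs = [
    "hybrid retrieval combines bm25 and dense vectors".split(),
    "pure vector search misses exact part numbers".split(),
]
print(bm25_scores("exact part numbers".split(), docs))
```

The split-on-whitespace tokenizer is the simplest possible choice; any real corpus would want lowercasing and punctuation handling on top.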

The result is a stack that behaves more like a practical search system and less like an embedding demo. That matters whenever people search with exact identifiers, awkward phrasing, or queries that require both literal and contextual matching.

Build notes

  • BM25 is implemented from scratch (Okapi BM25, k1=1.5, b=0.75) — no external retrieval library required.
  • Dense retrieval uses sentence-transformers bi-encoder (all-MiniLM-L6-v2) embedded locally; cosine similarity over a 20-document corpus.
  • Reciprocal Rank Fusion merges the BM25 and dense ranked lists without any tunable weight parameter — just $1/(k + rank)$ summed per document.
  • The cross-encoder (ms-marco-MiniLM-L-6-v2) runs as a second stage over the top-20 RRF candidates, not the full corpus, so latency stays low.
  • Evaluation: 30 manually annotated queries with known relevant document IDs; Recall@5 and MRR measured per configuration.
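The fusion step described above can be sketched directly from the formula: each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in. The k=60 constant is the commonly used default from the original RRF paper, and the doc IDs are hypothetical.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Merge ranked lists with Reciprocal Rank Fusion: sum 1/(k + rank) per document."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(fused, key=fused.get, reverse=True)

# Illustrative top-4 lists from the two first-stage retrievers
bm25_top = ["d3", "d1", "d7", "d2"]
dense_top = ["d1", "d5", "d3", "d9"]
print(rrf_fuse([bm25_top, dense_top]))
```

Note there is genuinely nothing to tune here besides k; a document ranked well by both retrievers rises to the top without any score normalization across the two systems.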

Lessons Learned

The lesson here was that “modern” is not automatically “better.” Some of the strongest AI systems are hybrids because they respect what older search methods still do extremely well. BM25 on semantic-heavy queries only reaches 75% — but it is still the right first stage because it provides signal that dense retrieval simply does not have. The reranker can only improve what the fusion surface already contains; the strength of the hybrid comes from combining two genuinely different signals before the reranking pass.

04 - Analysis

Findings.

01

BM25 alone achieves 75% Recall@5 on semantic queries — the 25% gap is queries where the exact search terms don't appear verbatim in the relevant document. Dense retrieval closes this to 80% but still misses paraphrase-heavy edge cases.

02

Hybrid RRF (Reciprocal Rank Fusion of BM25 and dense ranked lists) reaches 85% recall on semantic queries without any reranking, confirming that the two signals are complementary rather than redundant.

03

Hybrid+Rerank (cross-encoder second pass over the RRF candidate pool) hits 90% on semantic and 96.7% overall — a +5.5% lift over pure BM25 and +3.6% over dense alone across all 30 queries.
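For reference, the two metrics behind these findings can be computed per query roughly as follows; the doc IDs are hypothetical and the exact annotation format used in the project is an assumption.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Per-query Recall@k: share of the relevant docs that appear in the top k."""
    top = set(ranked_ids[:k])
    return len(top & set(relevant_ids)) / len(relevant_ids)

def mrr(ranked_ids, relevant_ids):
    """Reciprocal rank of the first relevant document (0 if none retrieved)."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# One query whose single relevant doc "d4" comes back at rank 2
print(recall_at_k(["d9", "d4", "d1"], {"d4"}, k=5))  # 1.0
print(mrr(["d9", "d4", "d1"], {"d4"}))               # 0.5
```

Averaging these per-query values over the 30 annotated queries gives the headline numbers; with one relevant doc per query, 96.7% Recall@5 corresponds to 29 of 30 queries hit.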

Analysis

Recall@5 by Query Category — Four Retrieval Configurations


30 queries across 3 categories evaluated at K=5, with Recall@5 and MRR reported per configuration. Exact-match and mixed queries saturate across all configurations — the story is in semantic queries, where BM25 alone hits 75% recall and Hybrid+Rerank closes to 90%. Corpus: 20 technical AI/ML documents embedded with all-MiniLM-L6-v2. Reranker: ms-marco-MiniLM-L-6-v2.


Worth a conversation?

If your retrieval stack works in demos but falls apart on exact identifiers or brittle queries, I would be happy to compare notes on the hybrid approach.


You are reaching

John Meyer

Security Engineer → AI

  • Open to roles
  • Contract + consulting
  • Architecture advisory