Bag-of-Documents Product Search

Side-by-side comparison of six retrieval architectures on 1.2M Amazon ESCI products. Pick a mode for each column, run a query, and see how the rankings differ. Numbers in parentheses are Recall@10 (R@10) on the 22,458-query ESCI evaluation set.
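The headline metric can be pinned down with a short sketch. It assumes R@10 is the usual macro-averaged Recall@10: the share of each query's relevant products found in its top 10, averaged over queries. Which ESCI labels (Exact, Substitute, Complement, Irrelevant) count as relevant is not stated here, so the `qrels` input and both function names are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of a query's relevant products that appear in its top-k.

    Returns 0.0 for a query with no relevant items; many evaluations
    drop such queries instead (an assumption either way).
    """
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall_at_k(runs, qrels, k=10):
    """Macro-average recall@k over all judged queries.

    runs[qid]  : ranked product ids returned for query qid (best first)
    qrels[qid] : product ids judged relevant for qid (hypothetical input;
                 the relevance cut on ESCI labels is an assumption)
    """
    return sum(recall_at_k(runs.get(q, []), qrels[q], k) for q in qrels) / len(qrels)
```

For example, a query that finds one of its two relevant products in the top 10 contributes 0.5, and the per-query values are then averaged.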

Base MiniLM retrieval (15.60%): dense baseline, no fine-tuning.
BM25 retrieval (20.33%): bm25s with k1=0.3, b=0.6, tuned for short, keyword-stuffed product titles.
RRF(BM25, base) (non-BoD hybrid baseline): vanilla hybrid retrieval that fuses BM25 and base MiniLM with Reciprocal Rank Fusion; on this corpus it actually loses to BM25 alone.
BM25 + 3-way ensemble rerank (fast SOTA, R@10 21.61%, ~50ms/query): BM25 top-50 reranked by three BoD-trained MiniLM encoders via sumsim fusion, i.e. cosine similarity averaged across the three encoders.
BM25 + sumsim + BGE (bridge tier, R@10 23.10%, E@1 46.89%, ~2.5s/query on Space CPU): compared with the quality SOTA, drops LiYuan from the fusion and halves the candidate pool to BM25 top-50; sumsim and BGE are fused with equal 0.5/0.5 weights.
BM25 + sumsim + LiYuan + BGE (quality SOTA, R@10 23.57%, E@1 47.95%, ~5-15s/query on Space CPU): weighted 3-way fusion (sumsim 0.4, LiYuan 0.2, BGE 0.4) over BM25 top-100. Gains +1.96pp R@10 and +5.42pp E@1 over the fast SOTA.
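To see why a small `k1` suits short, keyword-stuffed titles, here is a minimal from-scratch Okapi BM25 scorer using the k1=0.3, b=0.6 values above. It is not the bm25s implementation: the Lucene-style IDF, the naive whitespace tokenizer, and the name `bm25_scores` are assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=0.3, b=0.6):
    """Score every doc against the query with Okapi BM25 (Lucene-style IDF).

    Tokenization is a naive lowercase whitespace split, an assumption made
    for illustration; bm25s applies its own tokenizer and stopword handling.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of docs containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        # b controls length normalization: longer titles are penalized more as b -> 1.
        norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # A small k1 saturates term frequency quickly, so repeating a
            # keyword in a stuffed title adds little beyond the first hit.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

On the toy titles "usb c cable", "usb c cable usb c cable fast charging usb", and "hdmi cable", the query "usb cable" scores the stuffed title only about 5% above the plain one at k1=0.3; at k1=1.5 the gap grows to roughly 18%, rewarding the repetition.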
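The RRF baseline can be sketched as below. Reciprocal Rank Fusion scores each product by summing 1/(k + rank) over the input rankings. The k=60 constant and the name `rrf_fuse` are assumptions, since the demo does not state its settings.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over ranked lists of doc ids (best first).

    k=60 is the constant from the original RRF paper; the value this
    demo uses is not stated, so treat it as an assumption.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))
```

Calling `rrf_fuse([bm25_ids, minilm_ids])` yields one fused list. Only ranks enter the formula, so a weak list gets the same say as a strong one; that is one plausible, unverified reason the hybrid can trail BM25 alone.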
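The bridge and quality tiers both rerank a BM25 candidate pool by a weighted sum of model scores. Sumsim cosines and reranker outputs live on different scales, so this minimal sketch min-max normalizes each model's scores per query before weighting. That normalization is an assumption, as are the names `minmax` and `weighted_fusion`; the demo does not document how it combines the scales.

```python
def minmax(scores):
    """Rescale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_fusion(candidates, score_lists, weights):
    """Rerank candidates by a weighted sum of per-model normalized scores.

    score_lists[i][j] is model i's raw score for candidates[j].
    Per-query min-max normalization is an assumption, not the demo's
    documented method.
    """
    normed = [minmax(scores) for scores in score_lists]
    fused = [
        sum(w * model_scores[j] for w, model_scores in zip(weights, normed))
        for j in range(len(candidates))
    ]
    # Sort candidate indices by fused score, highest first.
    order = sorted(range(len(candidates)), key=lambda j: -fused[j])
    return [candidates[j] for j in order]
```

For the quality tier, pass sumsim, LiYuan, and BGE score lists with weights `(0.4, 0.2, 0.4)`; for the bridge tier, pass only sumsim and BGE with `(0.5, 0.5)`.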

The fast SOTA runs three query forward passes (one per encoder) against precomputed product embeddings, then averages the cosine similarities, keeping wall-clock latency under 100ms. The bridge tier adds 50 BGE-reranker forward passes (one per candidate); the quality SOTA adds 100 LiYuan and 100 BGE forward passes.
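The fast path is cheap because only the query is encoded at request time; product vectors are table lookups. This sketch assumes each encoder has its own precomputed product-embedding table and that sumsim is the mean of per-encoder cosines, as described above. `sumsim_rerank` and its arguments are hypothetical, and real vectors would come from the three BoD-trained MiniLM encoders rather than toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sumsim_rerank(candidate_ids, query_vecs, product_vecs):
    """Average per-encoder cosine and sort candidates, best first.

    query_vecs[e]        : query embedding from encoder e (one forward pass each)
    product_vecs[e][pid] : precomputed embedding of product pid under encoder e
    Summing instead of averaging would give the same order.
    """
    n_enc = len(query_vecs)
    scores = {
        pid: sum(cosine(query_vecs[e], product_vecs[e][pid]) for e in range(n_enc)) / n_enc
        for pid in candidate_ids
    }
    ranked = sorted(candidate_ids, key=lambda pid: -scores[pid])
    return ranked, scores
```

In the fast tier, `candidate_ids` would be the BM25 top-50, so the request-time cost is three encoder passes plus 150 cheap dot products.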