BM25 Beyond Query-Document Similarity

Title: BM25 Beyond Query-Document Similarity
Publication Type: Conference Paper
Year of Publication: 2019
Authors: Aklouche, B.; Bounhas, I.; Slimani, Y.
Conference Name: 26th International Symposium on String Processing and Information Retrieval (SPIRE 2019), Segovia, Spain, October 7-9, 2019
Keywords: Ad-hoc IR, BM25, Co-occurrence graph, Query Expansion, Term discriminative power

Abstract. The massive growth of information produced and shared online has made retrieving relevant documents a difficult task. Query Expansion (QE) based on term co-occurrence statistics has been widely applied in an attempt to improve retrieval effectiveness. However, selecting good expansion terms from co-occurrence graphs is challenging. In this paper, we present an adapted version of the BM25 model that measures the similarity between terms. First, a context window-based approach is applied over the entire corpus to construct the term co-occurrence graph. Then, using the proposed adaptation of BM25, candidate expansion terms are selected according to their similarity to the whole query. This measure stands out for its ability to evaluate the discriminative power of terms and to select terms that are semantically related to the query. Experiments on two ad-hoc TREC collections (the standard Robust04 collection and the new TREC Washington Post collection) show that our proposal outperforms the baselines across three state-of-the-art IR models and leads to significant improvements in retrieval effectiveness.
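The abstract does not give the adapted formula. The Python sketch below is therefore only one plausible reading of the two steps it describes, not the authors' method. It assumes three things. First, a candidate term's co-occurrence neighborhood plays the role of a BM25 document. Second, co-occurrence counts play the role of term frequencies. Third, IDF is computed over all neighborhoods in the graph, giving score(c, Q) = Σ_{q∈Q} IDF(q) · f(c,q)·(k1+1) / (f(c,q) + k1·(1 − b + b·len(c)/avglen)). The function names `build_cooccurrence_graph` and `expansion_terms`, the window size, the defaults k1 = 1.2 and b = 0.75, and the Lucene-style IDF are illustrative assumptions. The toy corpus is invented for demonstration.

```python
import math
from collections import Counter, defaultdict


def build_cooccurrence_graph(docs, window=5):
    """Undirected term co-occurrence graph: graph[a][b] counts how often a and b
    occur at most `window - 1` token positions apart in some document."""
    graph = defaultdict(Counter)
    for tokens in docs:
        for i, term in enumerate(tokens):
            # Look ahead only, so each position pair inside the window is counted once.
            for j in range(i + 1, min(i + window, len(tokens))):
                other = tokens[j]
                if other != term:
                    graph[term][other] += 1
                    graph[other][term] += 1
    return graph


def expansion_terms(query_terms, graph, top_k=10, k1=1.2, b=0.75):
    """Rank candidate expansion terms by a BM25-style similarity to the whole query.

    Hypothetical adaptation (assumption, not the paper's formula): each candidate's
    neighborhood acts as a "document", co-occurrence counts act as term frequencies,
    and IDF is computed over all neighborhoods, so query terms that co-occur with
    almost everything contribute little (a proxy for low discriminative power).
    """
    if not graph:
        return []
    n_terms = len(graph)
    lengths = {t: sum(nbrs.values()) for t, nbrs in graph.items()}
    avg_len = sum(lengths.values()) / n_terms

    def idf(q):
        df = len(graph.get(q, {}))  # number of distinct terms co-occurring with q
        return math.log((n_terms - df + 0.5) / (df + 0.5) + 1.0)

    query_idf = {q: idf(q) for q in query_terms}
    scores = {}
    for cand, nbrs in graph.items():
        if cand in query_idf:
            continue  # never propose a query term as its own expansion
        # BM25 length normalization: penalize candidates with very large neighborhoods.
        norm = k1 * (1 - b + b * lengths[cand] / avg_len)
        score = 0.0
        for q, w in query_idf.items():
            tf = nbrs.get(q, 0)
            if tf:
                score += w * tf * (k1 + 1) / (tf + norm)
        if score > 0:
            scores[cand] = score
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


docs = [
    "stock market crash hits investors".split(),
    "market volatility worries investors".split(),
    "stock prices fall as market slides".split(),
    "football match ends in draw".split(),
]
graph = build_cooccurrence_graph(docs, window=3)
print([t for t, _ in expansion_terms(["stock", "market"], graph, top_k=3)])
# → ['crash', 'fall', 'prices']
```

Summing over all query terms is what makes the score a similarity to the whole query rather than to a single query term. In the toy run, "crash" and "fall", which co-occur with both "stock" and "market", outrank every term linked to only one of them.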