Traditional information retrieval (IR) research was carried out essentially in English and was fueled by the annual Text Retrieval Conferences (TREC) sponsored by NIST (the National Institute of Standards and Technology). NIST has accumulated large amounts of standard data (text collections, queries, and relevance judgments) so that IR researchers can compare their techniques on common data sets. More recently, IR researchers have taken a real interest in languages other than English. TREC now includes multilingual data, and other organizations sponsor similar annual evaluations for European languages (CLEF) and Asian languages (NTCIR: Chinese, Japanese, and Korean). Arabic has been included in the TREC cross-lingual track and in the TDT (Topic Detection and Tracking) evaluations. The availability of standard Arabic data sets from NIST and the Linguistic Data Consortium (LDC) has in turn spurred rapid progress in information retrieval and other natural language processing for Arabic. Arabic is an interesting case to study in IR because it is a highly inflected language. In this sub-topic, we study several problems related to IR systems (lemmatization, morphological analysis, indexing), and we use the Hadith corpora as a knowledge base.
“On the Usage of a Classical Arabic Corpus as a Language Resource: Related Research and Key Challenges”, ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 18, no. 3, Article 23, 2019.
“Building a Morpho-Semantic Knowledge Graph for Arabic Information Retrieval”, Information Processing & Management, vol. 57, no. 6, 2019.
“Arabic Cross-Language Information Retrieval: A Review”, ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15, no. 3, Article 18, 44 pages, 2016.
“Towards a Comprehensive Approach for Analyzing and Representing Arabic Documents in the Social Semantic Web” (in Arabic), in International Symposium on Computer Science and Engineering, Hammamet, Tunisia, May 201-21, 2010, pp. 197–210.