A Survey of Techniques for Improving Information Retrieval through Query Expansion

pp. 73-81


Author(s)

Surabhi Solanki 1,2,*, Seema Verma 3, Sachin Kumar 4

1. Banasthali Vidyapith, Rajasthan, CO-304022, India

2. Bennett University, Greater Noida CO-201310, India

3. National Institute of Technical Teachers' Training and Research, Bhopal, CO-462002, India

4. Parul University, Vadodara CO-391760, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijitcs.2025.02.07

Received: 26 Mar. 2024 / Revised: 12 Oct. 2024 / Accepted: 5 Feb. 2025 / Published: 8 Apr. 2025

Index Terms

Query Expansion (QE), Information Retrieval (IR), Recall, Precision, Term Weighting, Term Ranking

Abstract

This paper presents a comprehensive survey of query expansion (QE) techniques in information retrieval (IR). It discusses the core techniques, the data sources they draw on, and the methodologies used in the query expansion process. The survey identifies four main steps in expanding a query: preprocessing of data sources and term extraction, term weighting and ranking, term selection, and finally query expansion. A key finding is that effective text normalization and stopword removal provide the necessary foundation for QE. Introducing contextually relevant terms significantly strengthens relevance feedback and thesaurus-based (WordNet) expansion techniques, which experiments conducted over many years have shown to improve retrieval effectiveness significantly. The survey also covers manual query expansion techniques and discusses several automated approaches to improving retrieval effectiveness. By reviewing the related literature and methodologies, this work gives an overview of how query expansion techniques have evolved over time and achieved better results in IR systems. It offers a valuable resource for researchers and practitioners in information retrieval, shedding light on the advancements, challenges, and future directions in query expansion research.
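
The four steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes a pseudo-relevance-feedback setting in which the top-ranked documents serve as the data source, a smoothed TF-IDF score as the term weight, and a tiny hand-picked stopword list. The names `STOPWORDS`, `preprocess`, `rank_candidate_terms`, and `expand_query` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the survey).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is", "on", "by", "with"}


def preprocess(text):
    """Step 1: normalize text (lowercase, tokenize) and remove stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def rank_candidate_terms(query_terms, feedback_docs, corpus):
    """Step 2: weight candidate terms with a smoothed TF-IDF score and rank them."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(preprocess(doc)))  # count each term once per document
    term_freq = Counter()
    for doc in feedback_docs:
        term_freq.update(preprocess(doc))
    scores = {
        term: tf * math.log((n_docs + 1) / (doc_freq[term] + 1))
        for term, tf in term_freq.items()
        if term not in query_terms  # do not re-add terms already in the query
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def expand_query(query, feedback_docs, corpus, k=2):
    """Steps 3 and 4: select the top-k ranked terms and append them to the query."""
    query_terms = preprocess(query)
    ranked = rank_candidate_terms(query_terms, feedback_docs, corpus)
    selected = [term for term, _ in ranked[:k]]
    return query_terms + selected
```

In this sketch, `feedback_docs` stands in for whatever data source supplies candidate terms. A thesaurus-based variant would replace `rank_candidate_terms` with a synonym lookup and leave the other steps unchanged.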

Cite This Paper

Surabhi Solanki, Seema Verma, Sachin Kumar, "A Survey of Techniques for Improving Information Retrieval through Query Expansion", International Journal of Information Technology and Computer Science (IJITCS), Vol. 17, No. 2, pp. 73-81, 2025. DOI: 10.5815/ijitcs.2025.02.07
