Normalized Statistical Algorithm for Afaan Oromo Word Sense Disambiguation

Full Text (PDF, 302KB), PP.40-50

Author(s)

Abdo Ababor Abafogi 1,*

1. Department of Information Technology, College of Computing and Informatics, Wolkite University, Ethiopia

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2021.06.04

Received: 27 Aug. 2021 / Revised: 24 Sep. 2021 / Accepted: 3 Oct. 2021 / Published: 8 Dec. 2021

Index Terms

Afaan Oromo, Word Sense Disambiguation, Normalized Statistical Algorithm, Unsupervised Approach, Sense Cluster Algorithm

Abstract

Language is the main means of communication used by humans. The same word can carry different meanings depending on how it is used in a particular sentence, which makes it challenging for a computer to understand text at a human level. Word Sense Disambiguation (WSD), which aims to identify the correct sense of a given ambiguous word, is a long-standing problem in natural language processing (NLP). Because the main aim of WSD is to accurately determine the sense of a word in a particular context, it can be used for the correct labeling of words in natural language applications. In this paper, I propose a normalized statistical algorithm that performs WSD for the Afaan Oromo language despite morphological analysis. The proposed algorithm can discriminate the senses of an ambiguous word without considering window size, without predefined rules, and without using an annotated dataset for training, which reduces a key challenge for under-resourced languages. The proposed system was tested on 249 sentences and evaluated with precision, recall, and F-measure. The overall effectiveness of the system is 80.76% in F-measure, which suggests that the approach is promising for Afaan Oromo, an under-resourced language spoken in East Africa. The algorithm can be extended to semantic text similarity with little or no modification. Furthermore, the suggested future directions could improve the performance of the proposed algorithm.
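
The abstract names precision, recall, and F-measure without defining them. The following is a minimal sketch of how these scores are commonly computed for a WSD system, assuming the usual convention: precision is correct answers over attempted answers, and recall is correct answers over all test instances. The function name and the counts are hypothetical, chosen for illustration only; they are not the paper's data and do not reproduce its reported result.

```python
def precision_recall_f1(correct: int, attempted: int, total: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) for a disambiguation run.

    correct   -- instances where the predicted sense matches the gold sense
    attempted -- instances for which the system produced any sense
    total     -- all ambiguous instances in the test set
    """
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts, for illustration only (not taken from the paper).
p, r, f = precision_recall_f1(correct=8, attempted=10, total=16)
print(f"precision={p:.2f} recall={r:.2f} f_measure={f:.4f}")
```

Because the F-measure is a harmonic mean, it stays low when either precision or recall is low, so a single reported F-measure summarizes both.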

Cite This Paper

Abdo Ababor Abafogi, "Normalized Statistical Algorithm for Afaan Oromo Word Sense Disambiguation", International Journal of Intelligent Systems and Applications (IJISA), Vol.13, No.6, pp.40-50, 2021. DOI: 10.5815/ijisa.2021.06.04