Boosting Afaan Oromo Named Entity Recognition with Multiple Methods

Author(s)

Abdo Ababor Abafogi 1,*

1. Department of Information Technology, Wolkite University, Wolkite, Ethiopia

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2021.05.05

Received: 29 Jul. 2021 / Revised: 16 Aug. 2021 / Accepted: 28 Aug. 2021 / Published: 8 Oct. 2021

Index Terms

Afaan Oromo, Named Entity Recognition, Word Sense Disambiguation, NLP, Information Extraction

Abstract

Named Entity Recognition (NER) is a widely used Information Extraction (IE) technique in Natural Language Processing (NLP) and Information Retrieval (IR). It aims to predict and categorize the words of a given text into predefined classes of named entities (NEs) such as person, date/time, organization, and location. This paper boosts NER for Afaan Oromo by using multiple methods. Combining approaches such as machine learning, stored rules, and pattern matching makes the system more efficient and accurate at recognizing candidate NEs. The system takes the strongest points of each method and boosts performance by voting. A candidate NE that is detected in more than one entity category, or out of context because of word ambiguity, is penalized by Word Sense Disambiguation. Consecutive NEs with identical tags are merged into a single tag before the final output. The evaluation shows improved performance. Finally, as a future direction, the paper proposes a hybrid of the rule-based approach with unsupervised zero-resource cross-lingual NER for further improvement.
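The voting and tag-merging steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `vote` and `merge_adjacent`, the tag set (`PER`, `LOC`, `ORG`, `O`), the simple majority rule, and the example tagger outputs are all assumptions made for illustration. In particular, a tie between categories is simply left untagged here, whereas the paper resolves such ambiguity with Word Sense Disambiguation, which this sketch does not reproduce.

```python
from collections import Counter
from itertools import groupby


def vote(proposals):
    """Pick one tag for a token from the tags proposed by each method.

    "O" means "not an entity". A simple majority over the entity tags wins.
    A tie between categories returns "O" (assumption: a placeholder for the
    disambiguation step, which is not modelled here).
    """
    counts = Counter(tag for tag in proposals if tag != "O")
    if not counts:
        return "O"
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return "O"
    return ranked[0][0]


def merge_adjacent(tokens, tags):
    """Merge runs of consecutive tokens that share the same entity tag."""
    spans = []
    for tag, group in groupby(zip(tokens, tags), key=lambda pair: pair[1]):
        words = [word for word, _ in group]
        if tag == "O":
            # Non-entity tokens stay separate.
            spans.extend((word, "O") for word in words)
        else:
            spans.append((" ".join(words), tag))
    return spans


# Hypothetical outputs of three methods for the same four tokens.
tokens = ["Abdo", "Ababor", "Wolkite", "University"]
ml_tags = ["PER", "PER", "LOC", "ORG"]
rule_tags = ["PER", "PER", "ORG", "ORG"]
pattern_tags = ["O", "PER", "ORG", "O"]

final_tags = [vote(p) for p in zip(ml_tags, rule_tags, pattern_tags)]
entities = merge_adjacent(tokens, final_tags)
print(entities)
```

One limitation of merging by identical tag alone is that two distinct entities of the same class that happen to be adjacent would be joined into one span; a scheme with explicit begin/inside markers would avoid this.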

Cite This Paper

Abdo Ababor Abafogi, "Boosting Afaan Oromo Named Entity Recognition with Multiple Methods", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol. 13, No. 5, pp. 51-59, 2021. DOI: 10.5815/ijieeb.2021.05.05
