Work place: School of Science and Technology, Africa Nazarene University, Nairobi, Kenya
E-mail: eombui@anu.ac.ke
Website:
Research Interests: Natural Language Processing, Machine Learning, Artificial Intelligence
Biography
Edward Ombui is a lecturer at the School of Science and Technology, Africa Nazarene University, Kenya. He is a PhD candidate at the University of Nairobi. He holds an MSc in Applied Computer Science from the University of Nairobi and a BSc in Computer Science from Africa Nazarene University. His research interests are in Artificial Intelligence, Natural Language Processing, Machine Learning, and Machine Translation. He has published extensively in IEEE and African Academy of Languages venues, among others. His professional memberships include the Computer Society of Kenya, the Association for Computational Linguistics, IEEE, and the African Language Technology group.
By Edward Ombui, Lawrence Muchemi, Peter Wagacha
DOI: https://doi.org/10.5815/ijitcs.2021.06.03, Pub. Date: 8 Dec. 2021
This study examines the problem of hate speech identification in codeswitched text from social media using a natural language processing approach. It explores different features in training nine models and empirically evaluates how well they predict hate speech in a human-annotated dataset of ~50k tweets. The study introduces a hierarchical approach that employs Latent Dirichlet Allocation to generate topic models, which in turn help build a high-level psychosocial feature set that we abbreviate as PDC. PDC groups words of similar meaning into word families, which is significant in capturing codeswitching during the preprocessing stage for supervised learning models. The high-level PDC features are based on a hate speech annotation framework [1] that is largely informed by the duplex theory of hate [2]. Frequency-based models using the PDC features on the dataset, which comprises tweets generated during the 2012 and 2017 presidential elections in Kenya, achieve an F-score of 83% (precision: 81%, recall: 85%) in identifying hate speech. The study makes two contributions. First, it publicly shares a unique codeswitched hate speech dataset that is valuable for comparative studies. Second, it provides a methodology for building a novel PDC feature set to identify nuanced forms of hate speech, camouflaged in codeswitched data, that conventional methods could not adequately identify.
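As a quick consistency check on the reported figures, the sketch below recomputes the F-score from the stated precision and recall. It assumes the abstract's "F-score" is the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not say which variant was used, and the function name `f_score` is only illustrative.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1.0 gives the balanced F1 measure.

    Assumption: the abstract's "F-score" is F1. Other beta values weight
    recall more (beta > 1) or precision more (beta < 1).
    """
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Precision and recall as reported in the abstract.
reported = f_score(precision=0.81, recall=0.85)
print(f"F1 = {reported:.4f}")
```

Under the F1 assumption the result is about 0.83, which matches the 83% quoted in the abstract.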
By Edward Ombui, Lawrence Muchemi, Peter Wagacha
DOI: https://doi.org/10.5815/ijitcs.2021.03.03, Pub. Date: 8 Jun. 2021
Presidential campaign periods are a major trigger event for hate speech on social media in almost every country. A systematic review of previous studies indicates a shortage of publicly available annotated datasets and hardly any evidence of theoretical underpinning for the annotation schemes used for hate speech identification. This situation stifles the development of empirically useful data for research, especially in supervised machine learning. This paper describes the methodology used to develop a multidimensional hate speech framework based on the components of the duplex theory of hate [1]: distance, passion, commitment to hate, and hate as a story. An annotation scheme based on the framework was then used to annotate a random sample of ~51k tweets drawn from ~400k tweets collected during the August and October 2017 presidential campaign period in Kenya. This resulted in a gold-standard codeswitched dataset that could be used for comparative and empirical studies in supervised machine learning. Classifiers trained on this dataset could be used to provide real-time monitoring of hate speech spikes on social media and inform data-driven decision-making by relevant government security agencies.