Work place: Maseno University, Department of Computer Science, Maseno, Kenya
E-mail: jobuhuma@gmail.com
Biography
Dr. James Obuhuma is a faculty member in the Department of Computer Science, Maseno University. He holds a PhD in Computer Science from Maseno University, an MSc in Computer Science from the University of Nairobi, and a BSc in Computer Science and Technology from Maseno University. His MSc thesis focused on road traffic analysis using GPS technology, which sparked his interest in intelligent systems. This heavily influenced his PhD research, which was in the area of Intelligent Transportation Systems (ITS) with a focus on vehicle driver behaviour modeling. His research interest is in the application of machine learning in different domains, including ITS, cybersecurity and IoT. Dr. Obuhuma is one of the MSc supervisors of Cynthia Amol, whose MSc research resulted in this publication. He is also a passionate Design Thinking coach who fosters social innovation across the globe.
By Cynthia Amol, Lilian Wanzare, James Obuhuma
DOI: https://doi.org/10.5815/ijitcs.2025.01.05, Pub. Date: 8 Feb. 2025
Code-switching, the mixing of words or phrases from multiple, grammatically distinct languages, introduces semantic and syntactic complexities to sentences that complicate automated text classification. Although code-switching is common in informal text-based communication among bilingual and multilingual users of digital spaces, its use to spread misinformation remains relatively unexplored. In Kenya, for instance, code-switched Swahili-English is prevalent on social media. Our main objective in this paper was to systematically review code-switching, particularly the use of Swahili-English code-switching to spread misinformation on social media in the Kenyan context. Additionally, we aimed to pre-process a Swahili-English code-switched dataset and to develop a misinformation classification model trained on this dataset. We discuss the process we followed to develop the code-switched Swahili-English misinformation classification model. The model was trained and tested using the PolitiKweli dataset, which is the first Swahili-English code-switched dataset curated for misinformation classification. The dataset was collected from the Twitter (now X) social media platform, focusing on text posted during the electioneering period of the 2022 general elections in Kenya. The study experimented with two types of word embeddings: GloVe and FastText. FastText uses character n-gram representations that help generate meaningful vectors for rare and unseen words in the code-switched dataset. We experimented with both classical machine learning algorithms and deep learning algorithms. The Bidirectional Long Short-Term Memory (BiLSTM) network showed the best performance, with an F-score of 0.89. The model was able to classify code-switched Swahili-English political misinformation text as fake, fact or neutral. This study contributes to recent research efforts in developing language models for low-resource languages.
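The abstract's point about character n-grams can be illustrated with a minimal Python sketch. It is not the authors' code and does not use the FastText library; it only shows how splitting words into character n-grams (here of length 3 to 6, with `<` and `>` as boundary markers, which are assumptions made for this example) lets an unseen code-switched word share subword units with a known word. The function names `char_ngrams` and `ngram_overlap` and the example word "anavote" are hypothetical and chosen for illustration only.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the set of character n-grams of a word.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    produce n-grams distinct from word-internal ones.
    """
    marked = f"<{word}>"
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    return grams


def ngram_overlap(a, b):
    """Jaccard overlap between the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    # "anavote" is a hypothetical code-switched form (Swahili prefix
    # "ana-" attached to English "vote"), used only as an illustration.
    shared = char_ngrams("anavote") & char_ngrams("vote")
    print(sorted(shared))
    print(ngram_overlap("anavote", "vote"))
```

In a subword-based embedding model, a word's vector is built from the vectors of its n-grams, so a word absent from the training vocabulary can still receive a vector through n-grams it shares with seen words.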