Lilian Wanzare

Work place: Maseno University, Department of Computer Science, Maseno, Kenya

E-mail: ldwanzare@maseno.ac.ke

Biography

Dr. Lilian D. A. Wanzare is the co-founder of KenCorpus and the Chairperson of the Department of Computer Science at Maseno University. She holds a Bachelor of Science in Computer Science (University of Nairobi), a Master of Science in Computer Science (Free University of Bozen-Bolzano), a Master of Science in Language Technology (Universität des Saarlandes) and a Doctor of Philosophy in Computational Linguistics (Universität des Saarlandes). Dr. Wanzare is one of Cynthia Amol's supervisors and is skilled in Natural Language Processing, Text Mining, Knowledge Extraction, Knowledge Acquisition, Programming (Python, Java, R) and Machine Learning. Her interests and passions lie in Natural Language Processing (NLP), Natural Language Understanding (NLU), Data Science, Machine Learning, and building chatbots and personal assistants.

Author Articles
Modelling Misinformation in Swahili-English Code-switched Texts

By Cynthia Amol Lilian Wanzare James Obuhuma

DOI: https://doi.org/10.5815/ijitcs.2025.01.05, Pub. Date: 8 Feb. 2025

Code-switching, which is the mixing of words or phrases from multiple, grammatically distinct languages, introduces semantic and syntactic complexities to sentences which complicate automated text classification. Despite code-switching being a common occurrence in informal text-based communication among most bilingual or multilingual users of digital spaces, its use to spread misinformation is relatively less explored. In Kenya, for instance, the use of code-switched Swahili-English is prevalent on social media. Our main objective in this paper was to systematically review code-switching, particularly the use of Swahili-English code-switching to spread misinformation on social media in the Kenyan context. Additionally, we aimed at pre-processing a Swahili-English code-switched dataset and developing a misinformation classification model trained on this dataset. We discuss the process we took to develop the code-switched Swahili-English misinformation classification model. The model was trained and tested using the PolitiKweli dataset, which is the first Swahili-English code-switched dataset curated for misinformation classification. The dataset was collected from the Twitter (now X) social media platform, focusing on text posted during the electioneering period of the 2022 general elections in Kenya. The study experimented with two types of word embeddings: GloVe and FastText. FastText uses character n-gram representations that help generate meaningful vectors for rare and unseen words in the code-switched dataset. We experimented with both classical machine learning algorithms and deep learning algorithms. The Bidirectional Long Short-Term Memory Network (BiLSTM) algorithm showed the best performance, with an F-score of 0.89. The model was able to classify code-switched Swahili-English political misinformation text as fake, fact or neutral. This study contributes to recent research efforts in developing language models for low-resource languages.
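
The character n-gram idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the example words, and the n-gram range (3 to 6, which matches the fastText library's default `minn`/`maxn` settings) are assumptions chosen for illustration. It shows only how a word is broken into subword units, so that a spelling never seen in training still shares units with a known word.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in '<' and '>' boundary markers.

    Illustrative helper (hypothetical name); the default range is an assumption.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def shared_ngrams(a, b, n_min=3, n_max=6):
    """Return the set of character n-grams that two words have in common."""
    return set(char_ngrams(a, n_min, n_max)) & set(char_ngrams(b, n_min, n_max))


if __name__ == "__main__":
    known = "uchaguzi"     # Swahili for "election"
    unseen = "uchaguziii"  # hypothetical elongated social-media spelling
    overlap = shared_ngrams(known, unseen)
    print(f"{len(overlap)} shared n-grams, e.g. {sorted(overlap)[:5]}")
```

In a subword embedding model, a word's vector is built from the vectors of such n-grams, so the overlap above is what lets an unseen variant receive a vector close to that of the known word.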
