Work place: Dept. of Computer Science and Informatics, Faculty of Applied Sciences, Uva Wellassa University, Badulla, 90000, Sri Lanka
E-mail: jayalath@uwu.ac.lk
Research Interests: Pattern Recognition
Biography
Jayalath Ekanayake earned his BSc and MSc in Sri Lanka and his PhD in computer science at the University of Zurich, Switzerland.
He is a lecturer in Computer Science at Uva Wellassa University, Sri Lanka. His research interest is pattern recognition.
Dr. Ekanayake is an IEEE member and a life member of the Sri Lanka Association for the Advancement of Science (SLAAS).
DOI: https://doi.org/10.5815/ijitcs.2021.03.04, Pub. Date: 8 Jun. 2021
Reported bugs in software systems are classified into different severity levels before they are fixed. Bug reports are not necessarily distributed evenly across these severity levels. However, most severity prediction models developed in the literature assume that the underlying data is evenly distributed, which does not hold in all cases. Hence, the aim of this study is to develop bug classification models from unevenly distributed datasets and to test them accordingly.
To that end, the topics or keywords of developers' descriptions in bug reports are first extracted using the Rapid Automatic Keyword Extraction (RAKE) algorithm and then transformed into numerical attributes, which, combined with the severity levels, form the datasets. These datasets are used to build classification models with the Naïve Bayes, Logistic Regression, and Decision Tree Learner algorithms. Because the models are learnt from highly skewed datasets, their prediction quality is measured using the Area Under the Receiver Operating Characteristic Curve (AUC).
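A minimal sketch of the RAKE scoring idea mentioned above: text is split into candidate phrases at punctuation and stop words, each word is scored as degree/frequency, and a phrase scores the sum of its word scores. The tiny stop-word list, the delimiter set, and the binary keyword-presence encoding in the hypothetical `to_vector` helper are assumptions for illustration only; the abstract does not say how keywords were turned into numerical attributes, so this is not the study's implementation.

```python
import re
from collections import Counter, defaultdict

# Illustrative stop-word list (assumption); practical RAKE setups use a much longer list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "when", "on", "in", "of",
              "to", "and", "or", "after", "with", "for", "it", "this", "not"}


def candidate_phrases(text):
    """Split text into candidate keyword phrases at punctuation and stop words."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9_]+", fragment):
            if word in STOP_WORDS:
                # A stop word ends the current candidate phrase.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text):
    """Return (phrase, score) pairs, highest score first; score = sum of deg(w)/freq(w)."""
    phrases = candidate_phrases(text)
    freq = Counter()
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrence degree, counting the word itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def to_vector(keywords, vocabulary):
    """Hypothetical encoding: 1 if a vocabulary keyword was extracted, else 0."""
    present = {phrase for phrase, _ in keywords}
    return [1 if term in present else 0 for term in vocabulary]


if __name__ == "__main__":
    report = "Editor crashes when saving a large file. Null pointer exception in save handler."
    keywords = rake(report)
    print(keywords)
    print(to_vector(keywords, ["null pointer exception", "memory leak", "save handler"]))
```

Because a word's degree grows with the length of the phrases it appears in, RAKE tends to rank longer multi-word phrases such as "null pointer exception" above single words.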
According to the results, the Logistic Regression model achieves an AUC of 0.65, whereas the other two models reach at most 0.60. Although the datasets contain comparatively few instances of the high-severity classes, Blocking and High, the Logistic Regression model predicts these two classes with a reasonable AUC of 0.65. Hence, this project shows that models can be trained from highly skewed datasets such that their prediction quality is comparable across all classes, regardless of the number of instances representing each class. Further, this project emphasizes that models trained in imbalanced learning environments should be evaluated using appropriate metrics. This work also shows that Logistic Regression is as capable of classifying documents as Naïve Bayes, which is well known for this task.
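The point about evaluating imbalanced models with appropriate metrics can be illustrated with a small sketch. Two assumptions apply: multi-class AUC is computed one-vs-rest (the abstract does not state which scheme was used), and the labels and probabilities below are a hypothetical toy example, not the study's data. A model that always favors the majority class reaches high accuracy but only 0.5 AUC on every class.

```python
def binary_auc(scores, labels):
    """AUC as the probability a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_rows, y_true, classes):
    """Per-class AUC: each class in turn is 'positive' and all other classes 'negative'."""
    return {
        c: binary_auc([row[i] for row in prob_rows], [1 if y == c else 0 for y in y_true])
        for i, c in enumerate(classes)
    }


def predict(prob_rows, classes):
    """Pick the class with the highest predicted probability for each instance."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in prob_rows]


def accuracy(y_pred, y_true):
    return sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical skewed sample: 8 Normal reports, 1 High, 1 Blocking.
    classes = ["Blocking", "High", "Normal"]
    y_true = ["Normal"] * 8 + ["High", "Blocking"]
    # A model that ignores the input and always outputs the class priors.
    probs = [[0.1, 0.1, 0.8] for _ in range(10)]
    print("accuracy:", accuracy(predict(probs, classes), y_true))
    print("per-class AUC:", one_vs_rest_auc(probs, y_true, classes))
```

The pairwise loop is the Mann-Whitney formulation of AUC; it is quadratic in the number of instances, which is fine for illustration, while a library routine would normally be used on real datasets.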