Work place: Computer Engineering Department, Islamic University, P. O. Box 108, Gaza, Palestine
Research Interests: Artificial Intelligence
Biography
Hassan M. Dawoud received his B.Sc. degree in computer engineering from the Islamic University of Gaza in 2003, and his master's degree in computer engineering from the Islamic University of Gaza in 2014. His research interests include artificial intelligence.
By Ibrahim S. I. Abuhaiba, Hassan M. Dawoud
DOI: https://doi.org/10.5815/ijisa.2017.04.05, Pub. Date: 8 Apr. 2017
The objective of this research is to improve the classification of Arabic text documents by combining different classification algorithms. To achieve this objective, we built four models using different combination methods.
The first combined model is built using fixed combination rules. Five rules are used, and each rule is tested with different numbers of classifiers. The best classification accuracy, 95.3%, is achieved using the majority voting rule with seven classifiers; building this model takes 836 seconds.
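As a rough illustration of the majority voting rule, the sketch below combines the labels predicted by several classifiers for each document. The function names and the tie-breaking policy (first label seen wins) are assumptions made for this example, not details taken from the study.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most classifiers.

    Ties go to the tied label that appears first in `predictions`
    (an assumed policy for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def combine(per_classifier_predictions):
    """Combine predictions where row i holds classifier i's label per document."""
    # zip(*rows) regroups the labels so that each tuple holds all votes for one document.
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_predictions)]
```

For instance, `combine([["a", "b"], ["a", "c"], ["b", "c"]])` yields one combined label per document from three hypothetical classifiers.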
The second combination approach is stacking, which consists of two stages of classification. The first stage is performed by base classifiers, and the second by a meta classifier. In our experiments, we used different numbers of base classifiers and two different meta classifiers: Naïve Bayes and linear regression. Stacking achieved very high classification accuracy: 99.2% with Naïve Bayes and 99.4% with linear regression as the meta classifier. Because it involves two stages of learning, stacking took a long time to build the models: 1963 seconds with Naïve Bayes and 3718 seconds with linear regression.
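The two-stage structure of stacking can be sketched as follows. This is a toy illustration only: the keyword-based base classifiers are hypothetical, and the meta classifier here is a simple lookup table over base-classifier outputs rather than the Naïve Bayes or linear regression meta classifiers used in the experiments. In practice the meta classifier is usually trained on base predictions for held-out data, which this sketch omits.

```python
from collections import Counter, defaultdict


def train_meta(base_outputs, true_labels):
    """Stage 2: learn the most frequent true label for each tuple of base predictions."""
    table = defaultdict(Counter)
    for outputs, label in zip(base_outputs, true_labels):
        table[tuple(outputs)][label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in table.items()}


def stack_predict(base_classifiers, meta_table, document, default=None):
    """Run stage 1 (base classifiers), then map their outputs through the meta table."""
    outputs = tuple(clf(document) for clf in base_classifiers)
    return meta_table.get(outputs, default)


def by_keyword(keyword, label, otherwise):
    """Hypothetical base classifier: predicts `label` if `keyword` occurs in the text."""
    return lambda doc: label if keyword in doc else otherwise


# Toy data, invented for this example.
base = [by_keyword("goal", "sports", "politics"),
        by_keyword("vote", "politics", "sports")]
train_docs = ["goal scored", "vote counted", "goal of the vote"]
train_labels = ["sports", "politics", "politics"]

# Stage 1 outputs on the training documents become the meta classifier's inputs.
base_outputs = [[clf(doc) for clf in base] for doc in train_docs]
meta = train_meta(base_outputs, train_labels)
```

With this setup, `stack_predict(base, meta, "late goal before the vote")` resolves the disagreement between the two base classifiers using what the meta table learned from the training labels.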
The third model uses AdaBoost to boost a C4.5 classifier with different numbers of iterations. Boosting improves the classification accuracy of the C4.5 classifier: accuracy is 95.3% with 5 iterations, which takes 1175 seconds to build the model, and 99.5% with 10 iterations, which takes 1966 seconds.
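To show what one boosting iteration does, the sketch below applies the classic binary AdaBoost weight update to a set of training-example weights. It assumes the textbook two-class formulation; the abstract does not say which AdaBoost variant was used, and the function name is hypothetical.

```python
import math


def adaboost_round(weights, correct):
    """Perform one AdaBoost re-weighting step.

    weights: current example weights, summing to 1.
    correct: booleans, True where the weak classifier was right.
    Assumes the weighted error lies strictly between 0 and 1.
    """
    # Weighted error of the weak classifier on the current distribution.
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    # Vote strength of this weak classifier in the final ensemble.
    alpha = 0.5 * math.log((1 - error) / error)
    # Shrink weights of correctly classified examples, grow the misclassified ones.
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]
```

After the update, the misclassified examples carry half of the total weight, so the next weak classifier is pushed to focus on them.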
The fourth model uses bagging with a decision tree. Accuracy is 93.7% with 5 iterations, achieved in 296 seconds, and 99.4% with 10 iterations, requiring 471 seconds.
We used three datasets to test the combined models: BBC Arabic, CNN Arabic, and OSAC. The experiments were performed using the Weka and RapidMiner data mining tools on a platform with a 2.2 GHz Intel Core i3 CPU and 4 GB of RAM.
The results of all models show that combining classifiers can effectively improve the accuracy of Arabic text document classification.
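As a rough illustration of the bagging used in the fourth model, the sketch below trains one model per bootstrap sample and combines them by voting. The `train_majority` learner is a hypothetical stand-in for the decision tree, and all names and the seed parameter are assumptions for this example.

```python
import random
from collections import Counter


def bootstrap_sample(examples, rng):
    """Draw len(examples) items with replacement."""
    return [rng.choice(examples) for _ in examples]


def bagging_fit(examples, train, iterations, seed=0):
    """Train one model per bootstrap sample; `train` maps a sample to a predictor."""
    rng = random.Random(seed)
    return [train(bootstrap_sample(examples, rng)) for _ in range(iterations)]


def bagging_predict(models, document):
    """Predict by plurality vote over the bagged models."""
    votes = Counter(model(document) for model in models)
    return votes.most_common(1)[0][0]


def train_majority(sample):
    """Hypothetical stand-in learner: always predicts the sample's most common label."""
    label = Counter(lbl for _, lbl in sample).most_common(1)[0][0]
    return lambda document: label
```

Calling `bagging_fit(examples, train_majority, iterations=5)` on a list of `(document, label)` pairs mirrors the 5-iteration setting, with a real decision-tree learner substituted for `train_majority`.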