IJIEEB Vol. 9, No. 3, 8 May 2017
Automatic Text Classification, Flat Classification, Hierarchical Classification, Machine Learning, Support Vector Machine (SVM)
Advances in present-day technology enable the production of huge amounts of information. Retrieving useful information from these collections requires proper organization and structuring, and automatic text classification is a natural solution. However, current approaches focus on flat classification, in which each topic is treated as a separate class. This is inadequate when there are a large number of classes and a huge number of relevant features are needed to distinguish between them. This paper explores the use of a hierarchical structure for classifying a large, heterogeneous collection of Amharic news text. The approach uses the hierarchical topic structure to decompose the classification task into a set of simpler problems, one at each node of the classification tree. An experiment was conducted with SVM on categorical data collected from the Ethiopian News Agency (ENA) to measure the performance of hierarchical classifiers on Amharic news text. The findings show that the accuracy of flat classification decreases as the number of classes and documents (features) increases. Moreover, the accuracy of the flat classifier decreases as the size of the top-feature set increases; its peak accuracy was 68.84% when the top 3 features were used. For hierarchical classification, classifier performance increases moving down the hierarchy: the maximum accuracy achieved was 90.37% at level 3 (the last level) of the category tree. In contrast to the flat classifier, the accuracy of the hierarchical classifiers increases as the size of the top-feature set increases; the peak accuracy was 89.06% using the level-3 classifier when the top 15 features were used. Finally, the flat and hierarchical classifiers were compared on the same test data. The comparison shows that using the hierarchical structure during classification yields a significant improvement of 29.42% in exact-match precision over the flat classifier.
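The decomposition described in the abstract, one simpler classification problem at each node of the category tree, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the two-level toy category tree, and the English placeholder documents are hypothetical, and the per-node learner is a simple word-overlap classifier standing in for the SVM used in the paper. It does not reproduce the paper's data, features, or results.

```python
from collections import Counter, defaultdict


class CentroidClassifier:
    """Toy stand-in for an SVM (assumption): scores labels by word overlap."""

    def fit(self, docs, labels):
        # Accumulate word counts per label.
        self.centroids = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.centroids[label].update(doc.split())
        return self

    def predict(self, doc):
        words = Counter(doc.split())

        def score(label):
            centroid = self.centroids[label]
            overlap = sum(n * centroid[w] for w, n in words.items())
            return overlap / sum(centroid.values())

        # Sorting the labels makes tie-breaking deterministic.
        return max(sorted(self.centroids), key=score)


class HierarchicalClassifier:
    """Top-down routing: one local classifier per internal node of the tree."""

    def fit(self, docs, paths):
        # For every parent node, collect the documents passing through it,
        # labeled with the child they continue to.
        grouped = defaultdict(lambda: ([], []))
        for doc, path in zip(docs, paths):
            for depth, child in enumerate(path):
                parent = tuple(path[:depth])  # () denotes the root
                grouped[parent][0].append(doc)
                grouped[parent][1].append(child)
        self.nodes = {
            parent: CentroidClassifier().fit(node_docs, node_labels)
            for parent, (node_docs, node_labels) in grouped.items()
        }
        return self

    def predict(self, doc):
        # Start at the root and descend until a leaf is reached.
        path = ()
        while path in self.nodes:
            path += (self.nodes[path].predict(doc),)
        return path


def exact_match(predicted, actual):
    """Fraction of documents whose full predicted path equals the true path."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data; category paths run from the top level to the leaf.
train_docs = [
    "match goal league striker",
    "marathon runner race medal",
    "export import tariff market",
    "harvest farmer crop coffee",
]
train_paths = [
    ("sport", "football"),
    ("sport", "athletics"),
    ("economy", "trade"),
    ("economy", "agriculture"),
]
test_docs = ["striker goal", "crop harvest coffee", "race medal"]
test_paths = [
    ("sport", "football"),
    ("economy", "agriculture"),
    ("sport", "athletics"),
]

hierarchical = HierarchicalClassifier().fit(train_docs, train_paths)
flat = CentroidClassifier().fit(train_docs, train_paths)  # each leaf is one class

print("hierarchical:", exact_match([hierarchical.predict(d) for d in test_docs], test_paths))
print("flat:", exact_match([flat.predict(d) for d in test_docs], test_paths))
```

The flat baseline treats each full path as a single class, while the hierarchical model only ever chooses among the children of one node. The toy data is far too small to show the gap between the two that the abstract reports; the sketch is meant only to make the routing and the exact-match measure concrete.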
Alemu Kumilachew Tegegnie, Adane Nega Tarekegn, Tamir Anteneh Alemu, "A Comparative Study of Flat and Hierarchical Classification for Amharic News Text Using SVM", International Journal of Information Engineering and Electronic Business(IJIEEB), Vol.9, No.3, pp.36-42, 2017. DOI:10.5815/ijieeb.2017.03.05