International Journal of Information Technology and Computer Science (IJITCS)

ISSN: 2074-9007 (Print), ISSN: 2074-9015 (Online)

Published By: MECS Press

IJITCS Vol.8, No.11, Nov. 2016

Arabic Text Categorization Using Mixed Words

Full Text (PDF, 577KB), PP.74-81



Mahmoud Hussein, Hamdy M. Mousa, Rouhia M. Sallam

Index Terms

Arabic Text Categorization; Frequency Ratio Accumulation Method; Term and Document Frequency; Features Selection; Mixed Words


A tremendous number of Arabic text documents are available online, and that number grows every day, so categorizing these documents has become very important. This paper proposes an approach to enhance the accuracy of Arabic text categorization. It is based on a new feature representation technique that mixes a bag of words (BOW) with pairs of adjacent words in different proportions. It also introduces a new feature selection technique that depends on Term Frequency (TF), and it uses the Frequency Ratio Accumulation Method (FRAM) as the classifier. Experiments are performed with neither normalization nor stemming, with only one of them, and with both. In addition, three data sets with different categories have been collected from online Arabic documents to evaluate the proposed approach. The highest accuracy obtained is 98.61%, achieved with the use of normalization.
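
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the mixing proportions, the selection cut-offs, or the exact FRAM formula, so everything below is an assumption made for illustration: the function names, the parameters n_words and n_pairs (standing in for the "different proportions"), the toy English tokens, and the FRAM scoring, which follows one common formulation (each term's within-category relative frequency, normalized across categories, then summed over the terms of a document). It should not be read as the authors' implementation.

```python
from collections import Counter, defaultdict


def adjacent_pairs(tokens):
    """Build two-adjacent-word features from a tokenized document."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def mixed_features(tokens):
    """Single words (bag of words) plus adjacent-word pairs."""
    return list(tokens) + adjacent_pairs(tokens)


def select_by_tf(docs, n_words, n_pairs):
    """Keep the most frequent words and pairs (hypothetical cut-offs)."""
    words, pairs = Counter(), Counter()
    for tokens in docs:
        words.update(tokens)
        pairs.update(adjacent_pairs(tokens))
    keep = {t for t, _ in words.most_common(n_words)}
    keep |= {t for t, _ in pairs.most_common(n_pairs)}
    return keep


def train_fram(docs, labels, vocab):
    """Compute an assumed frequency ratio for every (category, term)."""
    freq = defaultdict(Counter)  # category -> counts of selected terms
    for tokens, label in zip(docs, labels):
        freq[label].update(t for t in mixed_features(tokens) if t in vocab)
    # Relative frequency of each term inside its category.
    rel = {}
    for cat, counts in freq.items():
        total = sum(counts.values())
        rel[cat] = {t: n / total for t, n in counts.items()} if total else {}
    # Normalize each term's relative frequency across all categories.
    ratios = {cat: {} for cat in rel}
    for term in vocab:
        denom = sum(rel[cat].get(term, 0.0) for cat in rel)
        if denom > 0:
            for cat in rel:
                ratios[cat][term] = rel[cat].get(term, 0.0) / denom
    return ratios


def classify(tokens, ratios):
    """Pick the category with the largest accumulated ratio."""
    feats = mixed_features(tokens)
    scores = {cat: sum(r.get(t, 0.0) for t in feats) for cat, r in ratios.items()}
    return max(scores, key=scores.get)


# Toy data, invented for this sketch only (not from the paper's data sets).
docs = [
    ["football", "match", "goal"],
    ["football", "team", "goal"],
    ["stock", "market", "price"],
    ["stock", "price", "rise"],
]
labels = ["sport", "sport", "economy", "economy"]

vocab = select_by_tf(docs, n_words=6, n_pairs=2)
ratios = train_fram(docs, labels, vocab)
print(classify(["football", "goal"], ratios))
```

Normalization and stemming, which the experiments toggle on and off, would be applied to the token lists before any of these functions are called; they are left out here because the abstract does not describe them.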

Cite This Paper

Mahmoud Hussein, Hamdy M. Mousa, Rouhia M. Sallam, "Arabic Text Categorization Using Mixed Words", International Journal of Information Technology and Computer Science (IJITCS), Vol.8, No.11, pp.74-81, 2016. DOI: 10.5815/ijitcs.2016.11.09


