Extraction of Root Words using Morphological Analyzer for Devanagari Script

Sharvari S. Govilkar, J. W. Bakal, Sagar R. Kulkarni

Morphological analyzer; text mining; tokenization; stop words in Devanagari; suffixes in Devanagari; stemming; removing inflections using rules


In India, more than 300 million people use Devanagari script for documentation. Among the languages written in Devanagari, Marathi and Hindi are the most widely used, serving as the primary language of Maharashtra state and an official language of India, respectively. Compared with English, Devanagari text is rich in morphemes, so lemmatization is considerably more complex than for English. Resources such as WordNet, ontology representations, and tools for parsing keywords and their parts of speech are scarce for Devanagari script, which makes information retrieval complex and time consuming. Words in Devanagari documents routinely carry suffixes, which can hinder accurate information retrieval. We propose a method for extracting root words from Devanagari documents that can be used for information retrieval, text summarization, text categorization, ontology building, and similar tasks. To this end, we design a morphological analyzer for Devanagari script. We have built a corpus of more than 3000 possible stop words and suffixes for the Marathi language. The morphological analyzer can act as a preliminary stage in developing any information retrieval application for Devanagari script. In experiments on randomly selected Marathi documents, the analyzer achieved an accuracy of up to 96%.
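The pipeline named in the keywords (tokenization, stop-word removal, rule-based suffix stripping) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word and suffix lists below are tiny hypothetical samples standing in for the 3000+ entry corpus, and the function names, the longest-suffix-first rule, and the `min_stem_len` guard are all assumptions made for illustration.

```python
import re

# Hypothetical samples only; the paper's corpus of 3000+ stop words and
# suffixes is not reproduced here.
STOP_WORDS = {"आणि", "आहे", "हा", "ही", "हे", "व"}
SUFFIXES = ["मध्ये", "साठी", "ला", "ने", "ची", "चा", "चे"]


def tokenize(text):
    """Split on whitespace, the Devanagari danda, and common punctuation."""
    return [t for t in re.split(r"[\s।,.!?;:]+", text) if t]


def strip_suffix(token, suffixes=SUFFIXES, min_stem_len=2):
    """Remove the longest matching suffix, keeping at least min_stem_len
    code points so that very short words are not reduced to nothing."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if token.endswith(suf) and len(token) - len(suf) >= min_stem_len:
            return token[: -len(suf)]
    return token


def extract_roots(text):
    """Tokenize, drop stop words, then strip one suffix per remaining token."""
    return [strip_suffix(t) for t in tokenize(text) if t not in STOP_WORDS]


if __name__ == "__main__":
    print(extract_roots("राम आणि सीता मुंबईला आहे"))
```

A sketch this simple strips at most one suffix per word and does not restore the base form when a suffix attaches to an altered (oblique) stem, so a practical analyzer would need additional rules on top of it.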

Sharvari S. Govilkar, J. W. Bakal, Sagar R. Kulkarni, "Extraction of Root Words using Morphological Analyzer for Devanagari Script", International Journal of Information Technology and Computer Science (IJITCS), Vol. 8, No. 1, pp. 33-39, 2016. DOI: 10.5815/ijitcs.2016.01.04


