Work place: Department of Computer Applications, Maulana Azad National Institute of Technology, Bhopal-462003, India
E-mail: aartikumar01@gmail.com
Research Interests: Information Systems, Information Retrieval, Information Storage Systems, Multimedia Information System
Biography
Aarti Kumar was born in Patna, India, in 1963. She received a Master's degree in Botany from Patna University, India, in 1983 and a Master's degree in Computer Applications from Indira Gandhi National Open University, India, in 2005. She was the university topper in the Bachelor of Education programme (1999) at Barkatullah University, India.
She is currently pursuing her Ph.D. in Computer Applications at Maulana Azad National Institute of Technology (MANIT), Bhopal, India. Her area of research is Cross-Language Information Retrieval, more specifically English-Hindi journalistic text reuse. She has 18 years of teaching experience and more than three years of research experience. Her published works include:
Mrs. Kumar is an Associate Member, Information Retrieval Society of India (IRSI) and a Professional Member, Association for Computing Machinery (ACM).
DOI: https://doi.org/10.5815/ijitcs.2016.08.09, Pub. Date: 8 Aug. 2016
Linking and tracking news stories that cover the same events but are written in different languages is a challenging task. In natural languages, the same information may be expressed in multiple ways, and newspapers exploit this feature to make news stories more appealing. It has been observed that the same news story is presented in different ways, both within a language and across languages, but the gist normally remains the same. This diversity of linguistic expression presents a major challenge in identifying and tracking news stories covering the same events across languages, but doing so may provide rich and valuable resources, since comparable and parallel corpora can be generated from the linked stories. For Indian languages, only limited language resources exist for Natural Language Processing and Information Retrieval tasks, and identifying comparable and parallel documents would offer a potential source for deriving bilingual dictionaries and training statistical Machine Translation systems. Paraphrasing is the most common way of reproducing news stories, and translated text is also a type of paraphrase. Before monolingual or bilingual news stories can be linked, these paraphrase types need to be identified and classified to help researchers devise techniques for solving these challenging problems. The English-Hindi language pair differs not only in script but also in grammar and vocabulary. A number of paraphrase typologies have been built from the perspective of Natural Language Processing or for specific applications but, to the best of the authors' knowledge, no typology has been reported for English-Hindi cross-language text reuse. In this paper, a typology is formulated for cross-lingual journalistic text reuse in English-Hindi. The typology reveals the levels of difficulty in English-Hindi mapping and should help in devising techniques for linking and tracking English-Hindi stories.