IJEME Vol.8, No.1, Jan. 2018

Keyphrase Extraction of News Web Pages

Chandrakala Arya, Sanjay k. Dwivedi

Index Terms

Keyphrase extraction; Lexical chain; Web News; TF*IDF; WordNet


Keyphrase extraction from news web pages is an important task for news document retrieval and summarization. Keyphrases act like index terms that capture the important information in a document, offering a concise and precise description of its content. A keyphrase is a single word or a combination of words that represents an important concept in a text document. The aim of this paper is to develop and evaluate an automatic keyphrase extraction approach for news web pages. Our approach identifies candidate keyphrases in a document and selects the candidates with the highest weight scores. The weight formula combines a feature set that includes TF*IDF, phrase distance within the document, and a WordNet-based lexical chain that represents semantic relations between words. Experimental results show that our approach performs better than contemporary keyphrase extraction approaches.
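The abstract names the features but not the exact weight formula, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes: candidates are stopword-free n-grams of up to three words; "phrase distance" is read as the relative position of a phrase's first occurrence; the three features are combined as a weighted linear sum after normalizing TF*IDF to [0, 1]; and the WordNet lexical-chain step is replaced by a hypothetical precomputed `chain_scores` dictionary. The function names, the stopword list, and the `weights` parameter are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption; any standard list would do).
STOPWORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "and", "or", "to", "is", "are",
    "was", "were", "for", "with", "by", "at", "from", "as", "that",
    "this", "it", "be", "has", "have", "its",
})


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidates(tokens, max_len=3):
    """Yield (phrase, start) for each 1..max_len-word n-gram with no stopword."""
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[i:i + n]
            # Any longer n-gram from i is also truncated or holds the stopword.
            if len(gram) < n or any(t in STOPWORDS for t in gram):
                break
            yield " ".join(gram), i


def extract_keyphrases(doc, corpus, chain_scores=None, top_k=5,
                       weights=(1.0, 1.0, 1.0)):
    """Rank the candidate phrases of `doc` by a weighted feature sum.

    corpus: documents used for IDF (should include `doc`).
    chain_scores: HYPOTHETICAL precomputed {phrase: score in [0, 1]} standing
    in for a WordNet lexical-chain step; missing phrases score 0.
    weights: assumed (tfidf, position, chain) coefficients.
    """
    chain_scores = chain_scores or {}
    tokens = tokenize(doc)
    counts, first_pos = Counter(), {}
    for phrase, start in candidates(tokens):
        counts[phrase] += 1
        first_pos.setdefault(phrase, start)
    total = sum(counts.values())

    # Document frequency of each candidate across the corpus.
    corpus_sets = [{p for p, _ in candidates(tokenize(d))} for d in corpus]
    n_docs = len(corpus)
    raw = {}
    for phrase, count in counts.items():
        df = sum(phrase in s for s in corpus_sets)
        idf = math.log((n_docs + 1) / (df + 1)) + 1  # smoothed IDF
        raw[phrase] = (count / total) * idf
    max_raw = max(raw.values(), default=1.0)

    w_tfidf, w_pos, w_chain = weights
    scored = []
    for phrase, value in raw.items():
        # Assumed "phrase distance": relative offset of the first occurrence,
        # turned into a score where earlier phrases get values closer to 1.
        position = 1 - first_pos[phrase] / len(tokens)
        score = (w_tfidf * value / max_raw + w_pos * position
                 + w_chain * chain_scores.get(phrase, 0.0))
        scored.append((score, phrase))
    scored.sort(key=lambda x: (-x[0], x[1]))
    return [phrase for _, phrase in scored[:top_k]]


if __name__ == "__main__":
    corpus = [
        "Floods hit the coastal city. Floods displaced thousands of residents.",
        "The city council approved a new budget for schools.",
        "Heavy rain caused floods across the northern region.",
    ]
    print(extract_keyphrases(corpus[0], corpus, top_k=3))
    # ['floods', 'floods hit', 'hit']
    print(extract_keyphrases(corpus[0], corpus,
                             chain_scores={"coastal city": 1.0}, top_k=3))
    # ['coastal city', 'floods', 'floods hit']
```

Normalizing TF*IDF keeps all three features on a comparable [0, 1] scale, so the assumed `weights` tuple controls their relative influence. In the toy run, position alone pushes the weak phrase "hit" into the top three, while a lexical-chain score lifts the semantically central "coastal city" to first place, which illustrates why a semantic feature can complement purely statistical ones.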

Chandrakala Arya, Sanjay k. Dwivedi, "Keyphrase Extraction of News Web Pages", International Journal of Education and Management Engineering (IJEME), Vol.8, No.1, pp.48-58, 2018. DOI: 10.5815/ijeme.2018.01.06


