Entailment and Spectral Clustering based Single and Multiple Document Summarization

Author(s)

Anand Gupta 1,*, Manpreet Kaur 1, Ahsaas Bajaj 1, Ansh Khanna 1

1. Netaji Subhas Institute of Technology, New Delhi, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2019.04.04

Received: 26 Nov. 2018 / Revised: 1 Jan. 2019 / Accepted: 13 Jan. 2019 / Published: 8 Apr. 2019

Index Terms

Automatic Summarization, Textual Entailment, Spectral Clustering, Information Retrieval, Extractive Techniques, Natural Language Processing

Abstract

Text connectedness is an important feature for content selection in text summarization methods. Recently, Textual Entailment (TE) has been successfully employed to measure sentence connectedness and thereby determine sentence salience in single-document summarization. Analog Textual Entailment and Spectral Clustering (ATESC) is one such method in the literature: it uses TE to compute inter-sentence connectedness scores, derives sentence salience from these scores, and then applies the Spectral Clustering algorithm to group sentences into segments. Finally, the most salient sentences are extracted from the most salient segments for inclusion in the summary. The method has performed well on single documents, but the authors observe that TE has never been employed for multi-document summarization. This paper therefore proposes new ATESC-based methods for that task. Experiments conducted on the DUC 2003 and 2004 datasets reveal that Textual Entailment combined with the Spectral Clustering algorithm is an effective duo for removing redundancy and generating informative summaries in multi-document summarization. Moreover, the proposed methods exhibit faster execution times.
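The pipeline outlined in the abstract (entailment-based inter-sentence connectedness scores, sentence salience, spectral segmentation, extraction from the most salient segments) can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the authors' implementation: it substitutes a simple word-overlap score for a real textual-entailment engine, relies on scikit-learn's SpectralClustering, and all function names and parameters (connectedness_matrix, summarize, n_segments, n_sentences) are hypothetical.

import numpy as np
from sklearn.cluster import SpectralClustering

def connectedness_matrix(sentences):
    # Directional connectedness: fraction of sentence j's words covered by sentence i.
    # A word-overlap proxy stands in for a real TE engine here (assumption).
    tokens = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and tokens[j]:
                scores[i, j] = len(tokens[i] & tokens[j]) / len(tokens[j])
    return scores

def summarize(sentences, n_segments=3, n_sentences=3):
    scores = connectedness_matrix(sentences)
    salience = scores.sum(axis=1)            # how strongly each sentence "entails" the rest
    affinity = (scores + scores.T) / 2.0     # symmetrize for spectral clustering
    labels = SpectralClustering(n_clusters=n_segments,
                                affinity="precomputed",
                                random_state=0).fit_predict(affinity)
    # Rank segments by their total salience, then sentences within them,
    # and keep the chosen sentences in their original document order.
    segment_score = {c: float(salience[labels == c].sum()) for c in set(labels)}
    ranked = sorted(range(len(sentences)),
                    key=lambda i: (-segment_score[labels[i]], -salience[i]))
    return [sentences[i] for i in sorted(ranked[:n_sentences])]

For multi-document input, one plausible extension of this sketch is to pool the sentences of all documents before scoring and clustering; the actual multi-document methods evaluated on DUC 2003 and 2004 are described in the full paper.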

Cite This Paper

Anand Gupta, Manpreet Kaur, Ahsaas Bajaj, Ansh Khanna, "Entailment and Spectral Clustering based Single and Multiple Document Summarization", International Journal of Intelligent Systems and Applications (IJISA), Vol.11, No.4, pp.39-51, 2019. DOI:10.5815/ijisa.2019.04.04
