Author Based Rank Vector Coordinates (ARVC) Model for Authorship Attribution

PP. 68-75

Author(s)

N V Ganapathi Raju 1,*, V. Vijaya Kumar 2, O Srinivasa Rao 3

1. GRIET, Hyderabad, Research Scholar, JNTU Kakinada, India

2. Anurag Institutions, Hyderabad, India

3. Dept of CSE, JNTUK, Kakinada.

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2016.05.06

Received: 22 Jan. 2016 / Revised: 3 Mar. 2016 / Accepted: 30 Mar. 2016 / Published: 8 May 2016

Index Terms

Threshold, Common words, Integration, ARVC model, SVD technique

Abstract

Authorship attribution is an important problem with many practical real-world applications. Authorship identification estimates the likelihood that a piece of writing was produced by a particular author by examining that author's other writings. Most research in this field uses the instance-based model. One disadvantage of this model is that it treats each document of an author separately: it produces one matrix per document, creating a large number of matrices per author, i.e. the dimensionality is very high. This paper presents authorship identification using the Author based Rank Vector Coordinates (ARVC) model. The advantage of the proposed ARVC model is that it integrates all of an author's profile documents into a single integrated profile document (IPD) and thus overcomes this disadvantage. To overcome the ambiguity created by words common to the authors, the ARVC model removes common words based on a threshold. Singular value decomposition (SVD) is applied to the IPD after the common words are removed. To reduce the overall dimension of the matrix without affecting its semantic meaning, rank-based vector coordinates are derived. Eigenvector features are derived from the ARVC model. The paper uses the cosine similarity measure for attribution and carries out authorship attribution on English poems and editorial documents.
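The profile-based part of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy texts, whitespace tokenization, and the reading of the threshold as "the fraction of authors that use a word" are all assumptions made for illustration, and the SVD and rank-vector steps are left out so that the example needs only the Python standard library.

```python
import math
from collections import Counter


def build_profiles(docs_by_author):
    """Merge all documents of each author into one term-frequency profile."""
    return {
        author: Counter(word for doc in docs for word in doc.lower().split())
        for author, docs in docs_by_author.items()
    }


def remove_common_words(profiles, threshold):
    """Drop words used by more than `threshold` (a fraction) of the authors.

    Assumption: the abstract does not define the threshold, so a simple
    author-frequency cutoff is used here.
    """
    author_count = Counter(word for profile in profiles.values() for word in profile)
    limit = threshold * len(profiles)
    common = {word for word, n in author_count.items() if n > limit}
    filtered = {
        author: Counter({w: c for w, c in profile.items() if w not in common})
        for author, profile in profiles.items()
    }
    return filtered, common


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(text, profiles, common):
    """Return the author whose filtered profile is most similar to `text`."""
    query = Counter(w for w in text.lower().split() if w not in common)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


# Hypothetical toy corpus, two authors with two documents each.
docs_by_author = {
    "A": ["the sea and the storm", "the storm over the sea"],
    "B": ["the market and the trade", "the trade in the market"],
}
profiles = build_profiles(docs_by_author)
filtered, common = remove_common_words(profiles, threshold=0.5)
predicted = attribute("the storm on the sea", filtered, common)
```

With this toy corpus, "the" and "and" occur in both profiles and are removed, so the comparison rests on the words that distinguish the two authors. In the model the abstract describes, the filtered profiles would additionally be reduced with SVD before similarities are computed.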

Cite This Paper

N V Ganapathi Raju, V Vijay Kumar, O Srinivasa Rao, "Author Based Rank Vector Coordinates (ARVC) Model for Authorship Attribution", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.8, No.5, pp.68-75, 2016. DOI: 10.5815/ijigsp.2016.05.06
