IJISA Vol. 11, No. 6, 8 Jun. 2019
Cover page and Table of Contents: PDF (size: 1147KB)
Full Text (PDF, 1147KB), PP.13-27
Views: 0 Downloads: 0
Aspect detection, big data, deep learning, latent semantic indexing, online learning, regularization, topic modeling
Due to freedom to express views, opinions, news, etc and easier method to disseminate the information to large population worldwide, social media platforms are inundated with big streaming data characterized by both short text and long normal text. Getting the glimpse of ongoing events happening over social media is quintessential from the viewpoint of understanding the trends, and for this, topic modeling is the most important step. With reference to increase in proliferation of big data streaming from social media platforms, it is crucial to perform large scale topic modeling to extract the topics dynamically in an online manner. This paper proposes an adaptive framework for dynamic topic modeling from big data using deep learning approach. Approach based on approximation of online latent semantic indexing constrained by regularization has been put forth. The model is designed using deep network of feed forward layers. The framework works in an adaptive manner in the sense that model is extracts incrementally according to streaming data and retrieves dynamic topics. In order to get the trends and evolution of topics, the framework supports temporal topic modeling, and enables to detect implicit and explicit aspects from sentences also.
Ajeet Ram Pathak, Manjusha Pandey, Siddharth Rautaray, "Adaptive Model for Dynamic and Temporal Topic Modeling from Big Data using Deep Learning Architecture", International Journal of Intelligent Systems and Applications(IJISA), Vol.11, No.6, pp.13-27, 2019. DOI:10.5815/ijisa.2019.06.02
[1]A. R. Pathak, M. Pandey, and S. Rautaray, “Construing the big data based on taxonomy, analytics and approaches,” Iran J. Comput. Sci., vol. 1, no. 4, pp. 237–259, Dec. 2018.
[2]D. M. Blei, “Probabilistic Topic Models,” Commun. ACM, vol. 55, no. 4, pp. 77–84, Apr. 2012.
[3]D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” J. Mach. Learn. Res., vol. 3, no. Jan, pp. 993–1022, 2003.
[4]T. Hofmann, “Probabilistic latent semantic analysis,” in Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, 1999, pp. 289–296.
[5]S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” J. Am. Soc. Inf. Sci., vol. 41, no. 6, pp. 391–407, 1990.
[6]X. Cheng, X. Yan, Y. Lan, and J. Guo, “Btm: Topic modeling over short texts,” IEEE Trans. Knowl. Data Eng., no. 1, p. 1, 2014.
[7]Y. Zuo, J. Zhao, and K. Xu, “Word network topic model: a simple but general solution for short and imbalanced texts,” Knowl. Inf. Syst., vol. 48, no. 2, pp. 379–398, Aug. 2016.
[8]K. Nigam, A. K. Mccallum, S. Thrun, and T. Mitchell, “Text Classification from Labeled and Unlabeled Documents using EM,” Mach. Learn., vol. 39, no. 2, pp. 103–134, May 2000.
[9]P. Xie and E. P. Xing, “Integrating document clustering and topic modeling,” arXiv Prepr. arXiv1309.6874, 2013.
[10]D. M. Blei, J. D. Lafferty, and others, “A correlated topic model of science,” Ann. Appl. Stat., vol. 1, no. 1, pp. 17–35, 2007.
[11]M. Hoffman, F. R. Bach, and D. M. Blei, “Online learning for latent dirichlet allocation,” in advances in neural information processing systems, 2010, pp. 856–864.
[12]Q. Wang, J. Xu, H. Li, and N. Craswell, “Regularized latent semantic indexing,” in Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, 2011, pp. 685–694.
[13]L. AlSumait, D. Barbará, and C. Domeniconi, “On-line lda: Adaptive topic models for mining text streams with applications to topic detection and tracking,” in Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on, 2008, pp. 3–12.
[14]Y. Wang, E. Agichtein, and M. Benzi, “TM-LDA: efficient online modeling of latent topic transitions in social media,” in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 2012, pp. 123–131.
[15]X. Li, A. Zhang, C. Li, J. Ouyang, and Y. Cai, “Exploring coherent topics by topic modeling with term weighting,” Inf. Process. Manag., 2018.
[16]K. D. Kuhn, “Using structural topic modeling to identify latent topics and trends in aviation incident reports,” Transp. Res. Part C Emerg. Technol., vol. 87, pp. 105–122, 2018.
[17]S. Brody and N. Elhadad, “An Unsupervised Aspect-sentiment Model for Online Reviews,” in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010, pp. 804–812.
[18]A. R. Pathak, M. Pandey, S. Rautaray, and K. Pawar, “Assessment of Object Detection Using Deep Convolutional Neural Networks,” in Intelligent Computing and Information and Communication, 2018, pp. 457–466.
[19]A. R. Pathak, M. Pandey, and S. Rautaray, “Deep Learning Approaches for Detecting Objects from Images: A Review,” in Progress in Computing, Analytics and Networking, 2018, pp. 491–499.
[20]A. R. Pathak, M. Pandey, and S. Rautaray, “Application of Deep Learning for Object Detection,” Procedia Comput. Sci., vol. 132, pp. 1706–1717, 2018.
[21]A. B. Dieng, C. Wang, J. Gao, and J. Paisley, “Topicrnn: A recurrent neural network with long-range semantic dependency,” arXiv Prepr. arXiv1611.01702, 2016.
[22]Y. Li, T. Liu, J. Hu, and J. Jiang, “Topical Co-Attention Networks for hashtag recommendation on microblogs,” Neurocomputing, vol. 331, pp. 356–365, 2019
[23]P. Gupta, F. Buettner, and H. Schütze, “Document informed neural autoregressive topic models,” arXiv Prepr. arXiv1808.03793, 2018
[24]K. Giannakopoulos and L. Chen, “Incremental and Adaptive Topic Detection over Social Media,” in International Conference on Database Systems for Advanced Applications, 2018, pp. 460–473
[25]Y. Zhang et al., “Does deep learning help topic extraction? A kernel k-means clustering method with word embedding,” J. Informetr., vol. 12, no. 4, pp. 1099–1117, 2018
[26]W. Gao, M. Peng, H. Wang, Y. Zhang, Q. Xie, and G. Tian, “Incorporating word embeddings into topic modeling of short text,” Knowl. Inf. Syst., pp. 1–23, 2018
[27]X. Li, Y. Wang, A. Zhang, C. Li, J. Chi, and J. Ouyang, “Filtering out the noise in short text topic modeling,” Inf. Sci. (Ny)., vol. 456, pp. 83–96, 2018
[28]H. Zhang, B. Chen, D. Guo, and M. Zhou, “WHAI: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling,” in International Conference on Learning Representations, 2018
[29]H. Abdi and L. J. Williams, “Principal component analysis,” Wiley Interdiscip. Rev. Comput. Stat., vol. 2, no. 4, pp. 433–459, 2010.
[30]H. Abdi, “Multivariate analysis,” Encycl. Res. methods Soc. Sci. Thousand Oaks Sage, pp. 699–702, 2003.
[31]H. Abdi and L. J. Williams, “Correspondence analysis,” Neil Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: Sage. 2010.