Speaker Emotion Recognition based on Speech Features and Classification Techniques

Full Text (PDF, 873KB), PP.61-77

Author(s)

J. Sirisha Devi 1,*, Srinivas Yarramalle 2, Siva Prasad Nandyala 3

1. Dept of IT, GRIET, Hyderabad, 501401, Andhra Pradesh, India

2. Dept of IT, GITAM University, Visakhapatnam, 530045, Andhra Pradesh, India

3. Dept of ECE, NIT Warangal, Warangal, 506004, Andhra Pradesh, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2014.07.08

Received: 4 Feb. 2014 / Revised: 20 Mar. 2014 / Accepted: 11 May 2014 / Published: 8 Jun. 2014

Index Terms

Emotion recognition, feature extraction, speaker recognition

Abstract

Speech processing has developed into one of the vital application areas of digital signal processing. Speaker recognition is the process of automatically identifying who is speaking based on individual characteristics contained in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and to control access to services such as voice dialing, information services, voice mail, and security control for confidential information.
This paper reviews speaker recognition and emotion recognition based on the past ten years of research. So far, research has been done on both text-independent and text-dependent speaker recognition. Many prosodic features of the speech signal convey the emotion of a speaker. A detailed study of these issues is presented in this paper.

Cite This Paper

J. Sirisha Devi, Srinivas Yarramalle, Siva Prasad Nandyala, "Speaker emotion recognition based on speech features and classification techniques", IJIGSP, vol. 6, no. 7, pp. 61-77, 2014. DOI: 10.5815/ijigsp.2014.07.08
