Speaker Identification using SVM during Oriya Speech Recognition

Full Text (PDF, 410KB), PP.28-36


Author(s)

Sanghamitra Mohanty 1,* Basanta Kumar Swain 2

1. Department of Computer Science and Application, Utkal University, Bhubaneswar, Odisha, India

2. Department of Computer Science & Engineering, Government College of Engineering, Kalahandi, Bhawanipatna, Odisha, 766002, India.

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2015.10.04

Received: 17 Apr. 2015 / Revised: 8 Jul. 2015 / Accepted: 30 Jul. 2015 / Published: 8 Sep. 2015

Index Terms

Speaker identification, speech recognition, mel-frequency cepstral coefficients, gammatone frequency cepstral coefficients, support vector machine

Abstract

In this research paper, we have developed a system that identifies users by their voices and helps them retrieve information through voice queries. The system combines two pattern recognition tasks in the speech domain: speaker identification and speech recognition. This conglomeration of the speaker identification and speech recognition tasks provides a multitude of facilities in comparison to an isolated approach. Speaker identification is achieved using SVM, whereas speech recognition is based on HMM. We have used two different types of corpora for training the system. Gammatone frequency cepstral coefficients and mel-frequency cepstral coefficients are extracted for speaker identification and speech recognition, respectively. The accuracy of the system is measured from two perspectives: the accuracy of speaker identification and the accuracy of the speech recognition task. The accuracy of speaker identification is enhanced by performing speech recognition at the initial stage of speaker identification.
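The abstract names mel-frequency cepstral coefficients as the front end for the recognition stage. As a rough illustration only (this is not the paper's actual implementation; the frame length, hop size, filter count, and coefficient count below are assumed values), the standard MFCC pipeline of pre-emphasis, windowing, power spectrum, mel filterbank, log compression, and DCT can be sketched in plain NumPy:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_filters=26, n_ceps=13):
    # Pre-emphasis boosts high frequencies before spectral analysis
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Per-frame power spectrum via the real FFT
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filterbank with centers spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filterbank energies, then DCT-II to decorrelate; keep n_ceps coefficients
    log_energy = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return log_energy @ dct.T  # shape: (n_frames, n_ceps)
```

The gammatone variant used for speaker identification differs mainly in the filterbank stage, replacing the triangular mel filters with gammatone filters that model the cochlear response.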

Cite This Paper

Sanghamitra Mohanty, Basanta Kumar Swain, "Speaker Identification using SVM during Oriya Speech Recognition", IJIGSP, vol.7, no.10, pp.28-36, 2015. DOI: 10.5815/ijigsp.2015.10.04

Reference

[1]Jurafsky, D., Martin, J.H., Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Pearson Education Asia, 2000.

[2]Keshet, J., Bengio, S., Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods, John Wiley & Sons, 2009.

[3]O'Shaughnessy, D., Speech Communications: Human and Machine, Universities Press, 2nd Edition, 2001.

[4]Campbell, J. P., "Speaker recognition: a tutorial," Proceedings of the IEEE, vol. 85, no. 9, pp. 1437-1462, Sept. 1997.

[5]Patterson, R. D., Nimmo-Smith, I., Holdsworth, J. and Rice, P. "An efficient auditory filterbank based on the Gammatone function," Appl. Psychol. Unit, Cambridge University, 1988. 

[6]Assaleh, K. T. and Mammone, R. J., "Robust cepstral features for speaker identification," In Proc. of IEEE Int. Conf. Acoust., Speech, and Signal Processing, 1994.

[7]Rose, P., Forensic Speaker Recognition. Taylor and Francis, Inc., New York, 2002.

[8]Nijhawan, G., Soni, M. K., "A New Design Approach for Speaker Recognition Using MFCC and VAD", IJIGSP, vol.5, no.9, pp.43-49, 2013. DOI: 10.5815/ijigsp.2013.09.07.

[9]Han, J., Kamber, M. and Pei, J., Data Mining Concepts and Techniques, Elsevier, Third Edition. 2007.

[10]Mohanty, S. and Swain, B. K., "Language identification using support vector machine," http://desceco.org/OCOSDA2010/proceedings/paper_43.pdf.

[11]Imen Trabelsi, Dorra Ben Ayed, Noureddine Ellouze, "Improved Frame Level Features and SVM Supervectors Approach for The Recognition of Emotional States from Speech: Application to Categorical and Dimensional States", IJIGSP, vol.5, no.9, pp.8-13, 2013. DOI: 10.5815/ijigsp.2013.09.02.

[12]Hsu, C.-W., Chang, C.-C. and Lin, C.-J., "A Practical Guide to Support Vector Classification," http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.

[13]Boser, B. E., Guyon, I. and Vapnik, V., "A training algorithm for optimal margin classifiers," In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144-152, ACM Press, 1992.

[14]Schölkopf, B. and Smola, A. J., Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, Adaptive Computation and Machine Learning series, MIT Press, Cambridge, 2002.

[15]Juang, B. H., Pattern Recognition in Speech and Language Processing, CRC Press, 2003.

[16]Shao, Y., Srinivasan, S., Jin, Z. and Wang, D., "A computational auditory scene analysis system for speech segregation and robust speech recognition," Comput. Speech Lang. Elsevier, vol. 24, no. 1, pp. 77–93, 2010.

[17]Rabiner, L. R. and Schafer, R. W., Digital Processing of Speech Signals, Pearson Education, 1st Edition, 2004.

[18]Quatieri, T.F., Discrete-Time Speech Signal Processing Principles and Practice, Pearson Education, Third Impression 2007.

[19]Mohanty, S. and Swain, B. K. "Speech Input-Output System in Indian Farming Sector," 2012 IEEE International Conference on Computational Intelligence and Computing Research.

[20]Rabiner, L. R. and Juang, B. H., Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice Hall, 1993.

[21]Samudravijaya, K., Smitha Nair, Minette D'lima, "Recognition of spoken numbers," Proc. Sixth Int. Workshop on Recent Trends in Speech, Music and Allied Signal Processing, New Delhi, 2001, pp. 1-5.

[22]The CMU Pronouncing Dictionary, http://www.speech.cs.cmu.edu/cgi-bin/cmudict.

[23]Lamere, P., et al., "Design of the CMU Sphinx-4 decoder," INTERSPEECH, 2003.