Efficient Modelling Technique based Speaker Recognition under Limited Speech Data

Pages: 41-48


Author(s)

Satyanand Singh (1,*), Abhay Kumar (2), David Raju Kolluri (3)

1. CMRIT, Department of ECE, Secunderabad, 500010, India

2. SSSUTMS, Department of CSE, Sehore, 466001, India

3. St. Peter’s Engineering College, Department of CSE, Secunderabad, 500014, India

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2016.11.06

Received: 30 Jun. 2016 / Revised: 17 Aug. 2016 / Accepted: 4 Oct. 2016 / Published: 8 Nov. 2016

Index Terms

Vector Quantization, Fuzzy c-means Vector Quantization, Fuzzy Vector Quantization2, Novel Fuzzy Vector Quantization, Objective Function

Abstract

To date, speaker-specific feature extraction and modelling techniques for automatic speaker recognition (ASR) have been designed for a sufficient amount of speech data. When speech data is limited, ASR performance degrades drastically, and ASR under limited speech data remains a highly challenging task because of the short utterances involved. The main goal of ASR is to decide which of the registered speakers an incoming speaker is. This paper compares three techniques for modelling speaker-specific extracted information: (i) Fuzzy c-means (FCM), (ii) Fuzzy Vector Quantization2 (FVQ2), and (iii) Novel Fuzzy Vector Quantization (NFVQ). Using these three modelling techniques, we developed a text-independent automatic speaker recognition system that is computationally modest and capable of recognizing a non-cooperative speaker. In this investigation, speaker recognition efficiency is compared using text-independent training and test utterances shorter than 2 s from the Texas Instruments and Massachusetts Institute of Technology (TIMIT) corpus and a self-collected database. By hiding the outliers and assigning them to their closest codebook vectors, ASR efficiency improves by 1% over the baseline; the efficiency of the proposed modelling techniques is 98.8% and 98.1% for the TIMIT and self-collected databases, respectively.
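
To make the general pipeline concrete, the sketch below trains one codebook per registered speaker with standard fuzzy c-means (FCM), which minimises the objective J_m = Σ_i Σ_k u_ik^m ||x_i − v_k||² subject to Σ_k u_ik = 1, and then identifies a test utterance as the speaker whose codebook yields the smallest average quantization distortion. This is a minimal sketch under stated assumptions, not the authors' implementation: it does not reproduce FVQ2, NFVQ, or the paper's outlier handling; feature extraction is assumed to be done already (synthetic 13-dimensional vectors stand in for MFCC frames); and the function names `fcm_codebook`, `avg_distortion`, and `identify`, the fuzzifier m = 2, and the codebook size are hypothetical choices.

```python
import numpy as np


def fcm_codebook(x, n_codewords=16, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Train a codebook with standard fuzzy c-means (hypothetical helper).

    Alternates the two FCM updates that minimise
    J_m = sum_i sum_k u_ik^m * ||x_i - v_k||^2 with rows of u summing to 1.
    x: (N, D) feature vectors, e.g. MFCC frames. Returns a (K, D) codebook.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Random initial membership matrix whose rows sum to 1.
    u = rng.random((n, n_codewords))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** m                                   # fuzzified memberships
        v = (w.T @ x) / w.sum(axis=0)[:, None]       # membership-weighted centroids
        # Squared Euclidean distance from every frame to every codeword: (N, K).
        d2 = ((x[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                      # guard against division by zero
        # u_ik = d2_ik^(-1/(m-1)) / sum_j d2_ij^(-1/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return v


def avg_distortion(x, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()


def identify(x_test, codebooks):
    """Return the registered speaker whose codebook gives the least distortion."""
    scores = {spk: avg_distortion(x_test, cb) for spk, cb in codebooks.items()}
    return min(scores, key=scores.get)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-ins for 13-dim MFCC frames of three registered speakers.
    means = {"spk_a": 0.0, "spk_b": 3.0, "spk_c": -3.0}
    train = {s: rng.normal(mu, 1.0, size=(200, 13)) for s, mu in means.items()}
    codebooks = {s: fcm_codebook(f, n_codewords=8) for s, f in train.items()}
    # A short synthetic "test utterance" drawn from speaker spk_b's distribution.
    test = rng.normal(3.0, 1.0, size=(50, 13))
    print(identify(test, codebooks))
```

Using minimum average distortion as the decision rule follows classical VQ-based speaker identification; replacing `fcm_codebook` with a different codebook trainer (for example, a fuzzy VQ variant) would leave `avg_distortion` and `identify` unchanged.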

Cite This Paper

Satyanand Singh, Abhay Kumar, David Raju Kolluri, "Efficient Modelling Technique based Speaker Recognition under Limited Speech Data," International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 8, No. 11, pp. 41-48, 2016. DOI: 10.5815/ijigsp.2016.11.06
