Workplace: Department of Information Science and Engineering, Siddaganga Institute of Technology, Tumakuru, Karnataka, India
E-mail: jayannahs@gmail.com
Website:
Research Interests: Computer Networks, Image Processing, Speech Recognition, Data Compression
Biography
H S Jayanna
He received his BE and ME degrees from Bangalore University in 1992 and 1995, respectively, and his Ph.D. from the prestigious Indian Institute of Technology Guwahati, India, in 2009. He has published a number of papers in national and international journals and conferences, apart from guiding a number of UG, PG, and research scholars. Currently, he is working as a Professor in the Department of Information Science and Engineering, Siddaganga Institute of Technology, Tumkur, Karnataka, India. His research interests are in the areas of speech processing, limited-data speaker recognition, image processing, computer networks, and computer architecture.
By Thimmaraja Yadava G, H S Jayanna
DOI: https://doi.org/10.5815/ijisa.2018.03.03, Pub. Date: 8 Mar. 2018
In this work, Language Models (LMs) and Acoustic Models (AMs) are developed using the speech recognition toolkit Kaldi for noisy and enhanced speech data to build an Automatic Speech Recognition (ASR) system for the Kannada language. The speech data used for the development of the ASR models was collected in an uncontrolled environment from farmers of different dialect regions of Karnataka state. The collected speech data is preprocessed by a proposed method for noise elimination in the degraded speech data. The proposed method is a combination of Spectral Subtraction with Voice Activity Detection (SS-VAD) and the Minimum Mean Square Error Spectrum Power estimator based on Zero Crossing (MMSE-SPZC). Word-level transcription and validation of the speech data are done using an Indic language transliteration tool (IT3 to UTF-8). The Indian Language Speech Label (ILSL12) set is used for the development of the Kannada phoneme set and lexicon. Of the transcribed and validated speech data, 75% is used for system training and 25% for testing. The LMs are generated using Kannada language resources, and the AMs are developed using Gaussian Mixture Models (GMM) and Subspace Gaussian Mixture Models (SGMM). The proposed method is studied in detail and used to enhance the degraded speech data. The Word Error Rates (WERs) of the ASR models for noisy and enhanced speech data are highlighted and discussed in this work. The developed ASR models can be used in a spoken query system to access real-time agricultural commodity prices and weather information in the Kannada language.
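To illustrate the enhancement stage described above, the sketch below implements basic magnitude-domain spectral subtraction, estimating the noise spectrum from the leading frames as a stand-in for the VAD-guided estimation the paper combines it with. This is a minimal sketch assuming NumPy; the function name, frame sizes, and the fixed noise-frame count are illustrative choices, not the authors' implementation.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=256, hop=128, noise_frames=5):
    """Basic spectral subtraction: estimate the noise magnitude spectrum
    from the first few (assumed speech-free) frames, subtract it from
    every frame's magnitude spectrum, and floor negatives at zero."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)
    # Noise estimate from leading frames (a stand-in for VAD-guided estimation)
    noise_mag = mag[:noise_frames].mean(axis=0)
    clean_mag = np.maximum(mag - noise_mag, 0.0)
    # Reconstruct with overlap-add, reusing the noisy phase
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase),
                                n=frame_len, axis=1)
    out = np.zeros(len(noisy))
    for i, f in enumerate(clean_frames):
        out[i * hop:i * hop + frame_len] += f
    return out
```

In practice the noise estimate would be updated from frames the VAD marks as non-speech, and an MMSE-based estimator (as in the paper's MMSE-SPZC stage) would replace the hard subtraction to reduce musical noise.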
DOI: https://doi.org/10.5815/ijigsp.2013.09.03, Pub. Date: 8 Jul. 2013
In this paper, we demonstrate the impact of language parameter variability on mono-, cross-, and multi-lingual speaker identification under limited-data conditions. The languages considered for the study are English, Hindi, and Kannada. The speaker-specific features are extracted using multi-taper mel-frequency cepstral coefficients (MFCC), and the speaker models are built using a Gaussian mixture model (GMM)-universal background model (UBM). Sine-weighted cepstrum estimators (SWCE) with 6 tapers are used for multi-taper MFCC feature extraction. The mono- and cross-lingual experimental results show that the performance of speaker identification trained and/or tested with the Kannada language decreases compared to the other languages. It was observed that a database free from ottakshara, arka, and anukaranavyayagalu yields good performance, almost equal to that of the other languages.
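To illustrate the multi-taper spectrum estimation that underlies multi-taper MFCCs, the sketch below generates sine tapers and averages the per-taper periodograms of a frame; averaging across tapers lowers the variance of the spectrum estimate compared with a single window. This is a minimal sketch assuming NumPy: it uses uniform taper weights rather than the cepstrum-optimized SWCE weights used in the paper, and the function names are illustrative.

```python
import numpy as np

def sine_tapers(n, k):
    """Return k sine tapers of length n, stacked as a (k, n) array.
    Taper j is sqrt(2/(n+1)) * sin(pi*j*(t+1)/(n+1))."""
    j = np.arange(1, k + 1)[:, None]   # taper index
    t = np.arange(n)[None, :]          # sample index
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * j * (t + 1) / (n + 1))

def multitaper_spectrum(frame, k=6):
    """Average the periodograms obtained with each sine taper
    (uniform weights here; SWCE uses optimized per-taper weights)."""
    tapers = sine_tapers(len(frame), k)
    tapered = tapers * frame[None, :]                      # (k, n)
    return np.mean(np.abs(np.fft.rfft(tapered, axis=1)) ** 2, axis=0)
```

In a full MFCC front end, this spectrum estimate would replace the single-window periodogram before mel filter-bank integration, log compression, and the DCT.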