Myanmar Continuous Speech Recognition System Using Convolutional Neural Network

Full Text (PDF, 577KB), PP.44-52


Author(s)

Yin Win Chit 1,*, Win Ei Hlaing 2, Myo Myo Khaing 1

1. University of Computer Studies (Lashio)

2. University of Technology (Yatanarpon Cyber City)

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2021.02.04

Received: 21 Aug. 2020 / Revised: 14 Oct. 2020 / Accepted: 27 Nov. 2020 / Published: 8 Apr. 2021

Index Terms

Automatic Speech Recognition, Convolutional Neural Network, Mel Frequency Cepstral Coefficient, Continuous Speech, Speech Segmentation.

Abstract

Translating a human speech signal into text words, also known as Automatic Speech Recognition (ASR), still faces many challenges in continuous speech recognition. A continuous speech recognition system is built from four processes: segmentation, feature extraction, classification, and recognition. Nowadays, because of frequently changing weather conditions, weather news has become a very important part of everyday life. Deaf people cannot hear weather news broadcast over radio and television channels, yet they also need access to those reports. This system is designed to classify and recognize weather news words as Myanmar text from recordings of Myanmar weather news reports. Two types of input features based on the Mel Frequency Cepstral Coefficient (MFCC) feature extraction method are used: MFCC features and MFCC feature images. These two feature types are used to train the acoustic model and are classified with Convolutional Neural Network (CNN) classifiers. In the experiments, the Word Error Rate (WER) of the entire system is 18.75% on the MFCC features and 11.2% on the MFCC feature images.
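To make the reported metric concrete, the following is a minimal sketch of how a Word Error Rate such as the 18.75% and 11.2% figures above is typically computed: a word-level Levenshtein (edit) distance between the reference transcript and the recognizer output, divided by the number of reference words. This is a standard illustration, not the authors' implementation; the example sentences are hypothetical.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + sub_cost)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution and one deletion over a 4-word reference -> WER 0.5
print(wer("rain expected in yangon", "rain expected at"))
```

A lower WER means better recognition, so the 11.2% result on MFCC feature images indicates fewer word-level errors than the 18.75% result on raw MFCC features.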

Cite This Paper

Yin Win Chit, Win Ei Hlaing, Myo Myo Khaing, "Myanmar Continuous Speech Recognition System Using Convolutional Neural Network", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.13, No.2, pp. 44-52, 2021. DOI: 10.5815/ijigsp.2021.02.04
