Workplace: Koneru Lakshmaiah Education Foundation, Guntur, Andhra Pradesh
E-mail: mkasiprasad@gmail.com
Website:
Research Interests: Artificial Intelligence, Image Compression, Image Manipulation, Image Processing, Speech Recognition
Biography
Kasiprasad Mannepalli is working as an Associate Professor in the Department of Electronics and Communication Engineering, Koneru Lakshmaiah Education Foundation (K L University, Vijayawada), Andhra Pradesh. His research interests include Speech Processing, Image Processing, and Machine Intelligence. He has published 10 papers in international journals and 6 papers in international/national conferences. He has 12 years of teaching experience.
By J. Bennilo Fernandes, Kasiprasad Mannepalli
DOI: https://doi.org/10.5815/ijmecs.2022.03.03, Pub. Date: 8 Jun. 2022
The Recurrent Neural Network (RNN) is well suited to emotional speech recognition because it exploits the continuously time-shifting nature of speech. Although RNNs give good results, and variants such as GRU, LSTM and BiLSTM address the gradient problem, overfitting still reduces their efficiency. Hence, in this paper five deep learning architectures are designed to overcome these issues using data augmentation and spatial features. The five architectures, Enhanced Deep Hierarchal LSTM & GRU (EDHLG), EDHBG, EDHGL, EDHGB and EDHGG, are built with dropout layers. The representations learned by the LSTM layer are fed as input to the GRU layer for deeper learning; this reduces the gradient problem and increases the accuracy for each emotion. To raise accuracy further, spatial features are concatenated with MFCCs. Experimental evaluation of all models on the Tamil emotional dataset yielded the best results: EDHLG achieves 93.12% accuracy, EDHGL 92.56%, EDHBG 95.42%, EDHGB 96%, and EDHGG 94%. By comparison, the average accuracy of a single LSTM layer is 74%, while BiLSTM reaches 77%. EDHGB outperforms almost all other systems, with an optimal accuracy of 94.27% and a maximum overall accuracy of 95.99%. On the Tamil emotion data, the emotional states happy, fearful, angry, sad and neutral are predicted with 100% accuracy, while disgust reaches 94% and boredom 82%. The training and evaluation times of EDHGB are 4.43 min and 0.42 min, which are lower than those of the other models. By varying the LSTM, BiLSTM and GRU layers, an extensive experimental analysis is carried out on the Tamil dataset; EDHGB proves superior to the other models and gains around 26% more efficiency than the basic LSTM and BiLSTM models.
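As a rough illustration of the hierarchical LSTM-to-GRU stacking described above, the sketch below builds a small Keras model in which an LSTM layer passes its full output sequence to a GRU layer, with dropout in between and a softmax classifier over the emotion classes. The layer sizes, dropout rates, frame count and feature dimension are hypothetical placeholders, not the configuration reported in the paper.

# Minimal sketch of an LSTM-into-GRU stack with dropout (hypothetical
# hyperparameters; not the authors' exact EDHLG/EDHGB configuration).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, GRU, Dropout, Dense

NUM_FRAMES = 100    # assumed number of time frames per utterance
FEATURE_DIM = 52    # assumed size of the concatenated MFCC + spatial feature vector
NUM_EMOTIONS = 7    # happy, fearful, angry, sad, neutral, disgust, boredom

model = Sequential([
    # The LSTM returns its full output sequence so that the GRU can learn
    # a deeper temporal representation on top of it.
    LSTM(128, return_sequences=True, input_shape=(NUM_FRAMES, FEATURE_DIM)),
    Dropout(0.3),               # dropout layer to curb overfitting
    GRU(64),                    # GRU consumes the LSTM's sequence output
    Dropout(0.3),
    Dense(NUM_EMOTIONS, activation="softmax"),  # one output per emotion class
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

Swapping the order or type of the recurrent layers (for example GRU before LSTM, or BiLSTM in place of either) yields the other EDH variants mentioned in the abstract.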