P. Mahesha

Work place: Department of Computer Science and Engineering, S.J. College of Engineering, JSS Science and Technology University, Mysuru, Karnataka, India

E-mail: maheshap@sjce.ac.in

Website: https://orcid.org/0000-0002-8711-7302

Research Interests:

Biography

Mahesha P. is an Associate Professor in the Department of Computer Science & Engineering. His areas of interest include speech signal processing, machine learning, data analytics, digital signal forensics, and web technologies. He is currently involved in clinical speech processing research. He has presented and published research papers in reputed conferences and journals, serves as a program committee member for international conferences and as a reviewer for various journals, and is on the advisory board of a couple of organizations. He is recognized as an official reviewer for two Elsevier journals: International Journal of Computer Methods and Programs in Biomedicine and International Journal of Biomedical Signal Processing and Control. He holds a Bachelor of Engineering degree from the University of Mysore and M.Tech. and Ph.D. degrees from Visvesvaraya Technological University, Belgaum.

Author Articles
Voice Comparison Using Acoustic Analysis and Generative Adversarial Network for Forensics

By Kruthika S. G, Trisiladevi C Nagavi, P. Mahesha, Abhishek Kumar

DOI: https://doi.org/10.5815/ijigsp.2025.02.07, Pub. Date: 8 Apr. 2025

Forensic Voice Comparison (FVC) is a scientific analysis used in digital forensics that examines audio recordings to determine whether they come from the same speaker or from different speakers. In this research work, the experiment proceeds in three stages: pre-processing, feature extraction, and classification. In pre-processing, a stationary noise reduction algorithm is used to remove unwanted background noise and increase the clarity of the speech. This in turn improves the overall audio quality by reducing distractions. Next, acoustic features, namely Mel Frequency Cepstral Coefficients (MFCC), are extracted from the audio signals to characterize and analyze the unique vocal patterns of different individuals. A Generative Adversarial Network (GAN) is then used to generate synthetic MFCC features and to augment the data samples. Finally, Logistic Regression (LR) is realized using UK framework as the classifier, predicting whether the result is true or false. The accuracy achieved is 62% with 3899 samples and 85% with a set of 985 samples for the Australian English datasets.

Other Articles