IJIGSP Vol. 17, No. 2, 8 Apr. 2025
PP. 106-118
Generative Adversarial Network (GAN), Acoustic Features, Digital Forensics, Mel Frequency Cepstral Coefficients (MFCC), Logistic Regression (LR), Forensic Voice Comparison (FVC)
Forensic Voice Comparison (FVC) is a scientific analysis used in digital forensics that examines audio recordings to determine whether they come from the same speaker or from different speakers. In this work, the experiment proceeds in three stages: preprocessing, feature extraction, and classification. In preprocessing, a stationary noise reduction algorithm removes unwanted background noise, which increases the clarity of the speech and improves the overall audio quality by reducing distractions. Next, Mel Frequency Cepstral Coefficients (MFCC) are extracted as acoustic features that characterize the distinctive vocal patterns of individual speakers. A Generative Adversarial Network (GAN) is then used to generate synthetic MFCC features and thereby augment the data samples. Finally, Logistic Regression (LR) is realized using the UK framework as the classifier, predicting whether the result is true or false. The accuracy achieved is 62% on a set of 3899 samples and 85% on a set of 985 samples from the Australian English datasets.
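The final classification stage can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the UK framework named above: it assumes a plain logistic regression trained by batch gradient descent, and it replaces real MFCC features with hypothetical toy vectors of 13 values drawn from two Gaussian clusters. The function names (`train_logistic_regression`, `predict`, `make_toy_data`), the learning rate, and the epoch count are all illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    n = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(true | xi) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over the batch and take one step.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return True when the modelled probability is at least 0.5."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


def make_toy_data(n_per_class=50, n_features=13, seed=0):
    """Hypothetical stand-in for MFCC-derived vectors (not real speech data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


# Train on the toy vectors and measure accuracy on the same vectors.
X, y = make_toy_data()
w, b = train_logistic_regression(X, y)
accuracy = sum(predict(w, b, xi) == bool(yi) for xi, yi in zip(X, y)) / len(y)
```

In a real system the toy vectors would be replaced by features computed from the recordings, and accuracy would be measured on held-out data rather than on the training set.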
Kruthika S. G, Trisiladevi C. Nagavi, P. Mahesha, Abhishek Kumar, "Voice Comparison Using Acoustic Analysis and Generative Adversarial Network for Forensics", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol. 17, No. 2, pp. 106-118, 2025. DOI:10.5815/ijigsp.2025.02.07