On the Use of Time–Frequency Reassignment and SVM-Based Classifier for Audio Surveillance Applications

Full Text (PDF, 1115 KB), pp. 17-25


Author(s)

Souli S. Sameh 1,*, Zied Lachiri 2

1. Signal, Image and Pattern Recognition Research Unit, ENIT / Dept. of Electrical Engineering, BP 37, 1002, Le Belvédère, Tunisia

2. Dept. of Physics and Instrumentation, INSAT / BP 676, 1080, Centre Urbain, Tunisia

* Corresponding author.

DOI: https://doi.org/10.5815/ijigsp.2014.12.03

Received: 4 Aug. 2014 / Revised: 30 Aug. 2014 / Accepted: 1 Oct. 2014 / Published: 8 Nov. 2014

Index Terms

Environmental sounds, Reassignment method, Gabor filters, SVM Multiclass, Mutual Information

Abstract

In this paper, we propose a robust approach to environmental sound spectrogram classification, intended for surveillance and security applications and based on the reassignment method and log-Gabor filters. The reassignment method is applied to the spectrogram to improve the readability of the time-frequency representation and to ensure better localization of the signal components. Our approach comprises three methods. In the first two, the reassigned spectrograms are passed through appropriate log-Gabor filter banks, and the outputs are averaged and then submitted to an optimal feature selection procedure based on a mutual information criterion. The third method follows the same steps but is applied only to three patches extracted from each reassigned spectrogram. The proposed approach is tested on a large database consisting of 1000 sounds belonging to ten classes. Recognition is performed with multiclass Support Vector Machines.
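As an illustration of the processing chain summarized above, the sketch below strings together the main stages in Python with NumPy, SciPy and scikit-learn: a time-frequency image, a log-Gabor filter bank whose outputs are averaged into a feature vector, mutual-information-based feature selection, and a multiclass SVM. It is only a minimal sketch under loose assumptions: the filter-bank, spectrogram and classifier parameters are placeholders, a plain spectrogram stands in for the reassigned spectrogram of [16], and random synthetic signals replace the ten-class sound database, so it shows the structure of the approach rather than the authors' implementation.

import numpy as np
from scipy.signal import spectrogram
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

FS = 16000  # placeholder sampling rate; the abstract does not state one

def log_gabor_bank(shape, n_scales=4, n_orients=4, f_min=0.05, sigma_ratio=0.65):
    # Small bank of 2-D log-Gabor transfer functions built directly in the
    # Fourier domain of the time-frequency image (parameter values are guesses).
    rows, cols = shape
    fy = np.fft.fftshift(np.fft.fftfreq(rows))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(cols))[None, :]
    radius = np.hypot(fx, fy)
    radius[radius == 0] = 1e-9                        # avoid log(0) at DC
    theta = np.arctan2(fy, fx)
    bank = []
    for s in range(n_scales):
        f0 = f_min * (2 ** s)                         # centre frequency per scale
        radial = np.exp(-(np.log(radius / f0) ** 2) /
                        (2 * np.log(sigma_ratio) ** 2))
        for o in range(n_orients):
            ang = o * np.pi / n_orients
            d = np.angle(np.exp(1j * (theta - ang)))  # wrapped angular distance
            spread = np.exp(-(d ** 2) / (2 * (np.pi / n_orients) ** 2))
            bank.append(radial * spread)
    return bank

def features_from_signal(x, fs=FS):
    # Time-frequency image -> log-Gabor filtering -> one averaged value per filter.
    # NOTE: a plain spectrogram is used here; the paper first applies the
    # reassignment method [16] to sharpen the representation.
    _, _, sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=128)
    img = np.log1p(sxx)
    F = np.fft.fftshift(np.fft.fft2(img))
    feats = [np.abs(np.fft.ifft2(np.fft.ifftshift(F * g))).mean()
             for g in log_gabor_bank(img.shape)]
    return np.array(feats)

# Synthetic stand-in data: 10 "classes" of 20 random signals each (the real
# experiments use a database of 1000 environmental sounds in ten classes).
rng = np.random.default_rng(0)
X = np.array([features_from_signal(rng.standard_normal(FS)) for _ in range(200)])
y = np.repeat(np.arange(10), 20)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=10),   # mutual-information feature selection
    SVC(kernel="rbf", C=10, gamma="scale"),   # multiclass SVM (one-vs-one)
)
clf.fit(X_tr, y_tr)
print("held-out accuracy on the synthetic stand-in data:", clf.score(X_te, y_te))

The same feature-extraction function could be applied to cropped spectrogram patches instead of the full image to mimic the third, patch-based method described in the abstract.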

Cite This Paper

Souli S. Sameh, Lachiri Z. Zied, "On the Use of Time–Frequency Reassignment and SVM-Based Classifier for Audio Surveillance Applications", IJIGSP, vol. 6, no. 12, pp. 17-25, 2014. DOI: 10.5815/ijigsp.2014.12.03

References

[1]S. Chu, S. Narayanan, and C. C. J. Kuo, “Environmental Sound Recognition with Time-Frequency Audio Features”, IEEE Trans. on Audio, Speech, and Language Processing, Vol. 17, No. 6, 2009, pp. 1142-1158.

[2]A. Rabaoui, M. Davy, S. Rossignol, and N. Ellouze, “Using One-Class SVMs and Wavelets for Audio Surveillance”, IEEE Transactions on Information Forensics and Security, Vol. 3, No. 4, 2008, pp. 763-775.

[3]V. Peltonen, J. Tuomi, A. Klapuri, J. Huopaniemi, and T. Sorsa, “Computational auditory scene recognition”, Int. Conf. on Acoustics, Speech, and Signal Processing, 2002, pp. 1941-1944.

[4]M. Vacher, D. Istrate, L. Besacier, J. F. Serignat, and E. Castelli, “Sound detection and classification for medical telesurvey”, In Proc. IASTED Biomedical Conf., Innsbruck, Austria, 2004, pp. 395-399.

[5]A. Dufaux, L. Besacier, M. Ansorge, and F. Pellandini, “Automatic Sound Detection and Recognition For Noisy Environment”, In Proceedings of European Signal Processing Conference (EUSIPCO), 2000, pp.1033-1036.

[6]A. Fleury, N. Noury, M. Vacher, H. Glasson and J. F. Serignat, “Sound and speech detection and classification in a Health Smart Home”, 30th IEEE Engineering in Medicine and Biology Society (EMBS), 2008, pp. 4644-4647.

[7]D. Mitrovic, M. Zeppelzauer, H. Eidenberger, “Analysis of the Data Quality of Audio Descriptions of Environmental Sounds”, Journal of Digital Information Management (JDIM), Vol.5, No.2, 2007, pp.48-54.

[8]K. El-Maleh, A. Samouelian, and P. Kabal, “Frame-level noise classification in mobile environments”, In Proc. ICASSP, 1999, pp. 237-240.

[9]D. Istrate, “Détection et reconnaissance des sons pour la surveillance médicale” [Detection and recognition of sounds for medical monitoring], PhD thesis, INPG, France, 2003.

[10]G. Yu, and J. J. Slotine, “Fast Wavelet-based Visual Classification”, In Proc. IEEE International Conference on Pattern Recognition (ICPR), 2008, pp. 1-5.

[11]G. Yu, and J. J. Slotine, “Audio Classification from Time-Frequency Texture”, In Proc. IEEE ICASSP, 2009, pp. 1677-1680.

[12]J. Dennis, H. D. Tran, and H. Li, “Spectrogram Image Feature for Sound Event Classification in Mismatched Conditions”, IEEE Signal Processing Letters, Vol. 18, No. 2, 2011, pp. 130-133.

[13]T. Ezzat, J. Bouvrie and T. Poggio, “Spectro-Temporal Analysis of Speech Using 2-D Gabor Filters”, Proc. Interspeech, 2007, pp.1-4.

[14]S. Souli, Z. Lachiri, “Environmental Sounds Classification Based on Visual Features”, CIARP, Vol. 7042, Springer, Chile, 2011, pp.459-466.

[15]K. R. Fitz and S. A. Fulop, “A unified theory of time-frequency reassignment”, Computing Research Repository (CoRR), abs/0903.3, 2009.

[16]F. Auger and P. Flandrin, “Improving the Readability of Time-Frequency and Time-Scale Representations by the Reassignment Method”, IEEE Trans. Signal Processing, Vol. 40, No. 5, 1995, pp. 1068-1089.

[17]M. Kleinschmidt, “Methods for capturing spectro-temporal modulations in automatic speech recognition”, Acta Acustica united with Acustica, Vol. 88, No. 3, 2002, pp. 416-422.

[18]M. Kleinschmidt, “Localized spectro-temporal features for automatic speech recognition”, In Proc. Eurospeech, 2003, pp.2573-2576.

[19]L. He, M. Lech, N. C. Maddage and N. Allen, “Stress Detection Using Speech Spectrograms and Sigma-pi Neuron Units”, Int. Conf. on Natural Computation, 2009, pp. 260-264.

[20]L. He, M. Lech, N. Maddage, N. Allen, “Stress and Emotion Recognition Using Log-Gabor Filter”, 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII), 2009, pp. 1-6.

[21]Z. Xinyi, Y. Jianxiao, H. Qiang, “Research of STRAIGHT Spectrogram and Difference Subspace Algorithm for Speech Recognition”, 2nd Int. Congress On Image and Signal Processing (CISP), IEEE, 2009, pp.1-4.

[22]J. Dennis, H. D. Tran, and H. Li, “Image Representation of the Subband Power Distribution for Robust Sound Classification”, Proc. INTERSPEECH, ISCA, 2011, pp. 2437-2440.

[23]F. Millioz, N. Martin, “Réallocation du spectrogramme pour la détection de frontières de motifs temps-fréquence” [Spectrogram reassignment for detecting the boundaries of time-frequency patterns], Colloque GRETSI, 2007, pp. 11-14.

[24]K. Fitz and L. Haken, “On the Use of Time-Frequency Reassignment in Additive Sound Modeling”, J. Audio Eng. Soc. (AES), Vol. 50, No. 1, 2002, pp. 879-893.

[25]F. Millioz, N. Martin, “Reassignment Vector Field for Time-Frequency Segmentation”, 14th international congress on sound and vibration, ICSV 14, 2007.

[26]E. Chassande-Mottin, “Méthodes de réallocation dans le plan temps-fréquence pour l'analyse et le traitement de signaux non stationnaires” [Reassignment methods in the time-frequency plane for the analysis and processing of non-stationary signals], PhD thesis, Cergy-Pontoise University, 1998.

[27]N. Kwak, C. Choi, “Input Feature Selection for Classification Problems”, IEEE Trans. on Neural Networks, Vol. 13, No. 1, 2002, pp. 143-159.

[28]V. N. Vapnik, “An Overview of Statistical Learning Theory”, IEEE Transactions on Neural Networks, Vol. 10, No. 5, 1999, pp. 988-999.

[29]V. Vapnik, and O. Chapelle, “Bounds on Error Expectation for Support Vector Machines”, Journal Neural Computation, MIT Press Cambridge, MA, USA, Vol.12, No.9, 2000, pp.2013-2036. 

[30]B. Scholkopf, and A. Smola, “Learning with Kernels”, MIT Press, 2001.

[31]C.-W. Hsu, C.-J. Lin, “A comparison of methods for multi-class support vector machines”, IEEE Transactions on Neural Networks, Vol. 13, No. 2, 2002, pp. 415-425.

[32]Leonardo Software website. [Online]. Available: http://www.leonardosoft.com. Santa Monica, CA 90401.

[33]Real World Computing Partnership, CD sound scene database in real acoustical environments, 2000, http://tosa.mri.co.jp/sounddb/indexe.htm.

[34]E. Sejdić, I. Djurović, and J. Jiang, “Time-frequency feature representation using energy concentration: An overview of recent advances”, Digit. Signal Process., Vol. 19, No. 1, 2009, pp. 153-183.

[35]J. Weston and C. Watkins, “Support vector machines for multi-class pattern recognition”, 7th Eur. Symp. Artificial Neural Networks, Vol. 4, No. 6, 1999, pp. 219-224.

[36]L. I. Kuncheva, “Combining Pattern Classifiers: Methods and Algorithms”, Wiley-Interscience, 2004. ISBN 0-471-21078-1.

[37]C.-W. Hsu, C.-C. Chang, C.-J. Lin, “A Practical Guide to Support Vector Classification”, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan. Available: www.csie.ntu.edu.tw/~cjlin/, 2009.

[38]J.-C. Wang, H.-P. Lee, J.-F. Wang, and C.-B. Lin, “Robust Environmental Sound Recognition for Home Automation”, IEEE Transactions on Automation Science and Engineering, Vol. 5, No. 1, 2008, pp. 25-31.

[39]S. M. Lajevardi, Z. M. Hussain, “Facial Expression Recognition Using Log-Gabor Filters and Local Binary Pattern Operators”, International Conference on Communication, Computer and Power (ICCCP'09), Muscat, February 15-18, 2009.

[40]D. Mitrovic, M. Zeppelzauer, H. Eidenberger, “Towards an Optimal Feature Set for Environmental Sound Recognition”, Technical Report TR-188-2, 2006.

[41]G. Kattmah, G. A. Azim, “Identification Based on Mutual Information and Neural Networks”, International Journal of Image, Graphics and Signal Processing, Vol. 9, 2013, pp. 50-57.