IJISA Vol. 10, No. 10, 8 Oct. 2018
PP. 63-71
Selfie sign language, Convolutional Neural Networks (CNN), Stochastic pooling, Sign language recognition (SLR), Deep learning
Extracting complex head and hand movements, along with their constantly changing shapes, for sign language recognition is considered a difficult problem in computer vision. This paper proposes recognizing Indian sign language gestures using a powerful artificial intelligence tool, convolutional neural networks (CNN). The capture method used in this work is selfie-mode continuous sign language video, which lets a hearing-impaired person operate the sign language recognition (SLR) mobile application independently. Because no datasets of mobile selfie sign language were available, we created one with five subjects performing 200 signs from 5 viewing angles under various background environments. Each sign occupies 60 frames (images) of a video. CNN training is performed with 3 different sample sizes, each consisting of multiple sets of subjects and viewing angles. The remaining 2 samples are used for testing the trained CNN. Different CNN architectures were designed and tested with our selfie sign language data to improve recognition accuracy. We achieved a 92.88% recognition rate, higher than that of the other classifier models reported on the same dataset.
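The dataset figures quoted in the abstract can be put together in a short sketch. It is illustrative only and rests on two assumptions that the abstract does not confirm: that every subject performed every sign at every viewing angle (a full cross product), and that the 3/2 train/test division is made by subject. The names `build_index` and `split_by_subject` are hypothetical helpers, not part of the authors' code.

```python
from itertools import product

# Counts taken from the abstract.
SUBJECTS = 5
SIGNS = 200
ANGLES = 5
FRAMES_PER_SIGN = 60


def build_index():
    """List one (subject, angle, sign) entry per sign video.

    Assumption: a full cross product of subjects, angles and signs.
    """
    return list(product(range(SUBJECTS), range(ANGLES), range(SIGNS)))


def split_by_subject(index, train_subjects):
    """Split entries into train and test sets by subject id.

    Assumption: the 3/2 division in the abstract is made by subject.
    """
    train = [entry for entry in index if entry[0] in train_subjects]
    test = [entry for entry in index if entry[0] not in train_subjects]
    return train, test


index = build_index()
train, test = split_by_subject(index, train_subjects={0, 1, 2})
total_frames = len(index) * FRAMES_PER_SIGN

print(len(index), len(train), len(test), total_frames)
```

Under these assumptions the dataset would hold 5 × 5 × 200 = 5,000 sign videos and 300,000 frames, with 3,000 videos for training and 2,000 for testing. If the actual recording protocol was sparser, or the division was made by viewing angle rather than by subject, the counts change accordingly.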
P.V.V. Kishore, G. Anantha Rao, E. Kiran Kumar, M. Teja Kiran Kumar, D. Anil Kumar, "Selfie Sign Language Recognition with Convolutional Neural Networks", International Journal of Intelligent Systems and Applications (IJISA), Vol. 10, No. 10, pp. 63-71, 2018. DOI: 10.5815/ijisa.2018.10.07