IJIGSP Vol. 14, No. 2, 8 Apr. 2022
PP. 47-60
Object detection, Convolutional Neural Network, deep learning techniques
Object detection is one of the most fundamental, widely used, and challenging problems in computer vision. Over the last several decades, researchers have devoted great effort to it. The task is to detect objects in images or video. At the beginning of the 21st century, a lot of work was done in this field: methods such as HOG, SIFT, and SURF perform well but cannot be used efficiently for real-time detection, which requires both speed and accuracy. In the deep learning era, the Convolutional Neural Network brought rapid change and opened a new pathway, and a lot of excellent work has been done to date, such as region-based convolutional networks, YOLO, SSD, and RetinaNet. This survey reviews research papers on popular traditional object detection methods and on currently trending deep learning-based methods. It presents the challenges, limitations, and methodologies used to detect objects, along with directions for future research.
Diwakar, Deepa Raj, "Recent Object Detection Techniques: A Survey", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.14, No.2, pp. 47-60, 2022. DOI: 10.5815/ijigsp.2022.02.05