International Journal of Intelligent Systems and Applications (IJISA)

ISSN: 2074-904X (Print), ISSN: 2074-9058 (Online)

Published By: MECS Press

IJISA Vol.11, No.4, Apr. 2019

Sky-CNN: A CNN-based Learning Approach for Skyline Scene Understanding

Full Text (PDF, 875KB), PP.14-25



Ameni Sassi, Wael Ouarda, Chokri Ben Amar, Serge Miguet

Index Terms

Convolutional Neural Network; deep learning; scene categorization; skyline; features representation; deep learned features


Skyline scenes are of scientific interest to some geographers and urbanists, yet they have not been well handled in computer vision tasks. The context of a skyline scene can be understood with approaches based on hand-crafted features combined with linear classifiers, but these have been somewhat sidelined in favor of approaches based on Convolutional Neural Networks (CNNs). In this paper, we propose a new CNN learning approach to categorize skyline scenes. The proposed model requires a pre-processing step that improves the deep-learned features and the training time. To evaluate the proposed system, we constructed the SKYLINEScene database (DB), which contains 2000 images of urban and rural landscape scenes with a skyline view. To examine the performance of our Sky-CNN system, many fair comparisons were carried out with well-known CNN architectures, using the SKYLINEScene DB for testing. Our approach shows its robustness in skyline context understanding and outperforms hand-crafted approaches based on global and local features.

Cite This Paper

Ameni Sassi, Wael Ouarda, Chokri Ben Amar, Serge Miguet, "Sky-CNN: A CNN-based Learning Approach for Skyline Scene Understanding", International Journal of Intelligent Systems and Applications (IJISA), Vol.11, No.4, pp.14-25, 2019. DOI: 10.5815/ijisa.2019.04.02

