International Journal of Information Engineering and Electronic Business (IJIEEB)

ISSN: 2074-9023 (Print), ISSN: 2074-9031 (Online)

Published By: MECS Press

IJIEEB Vol.10, No.5, Sep. 2018

A Study on Test Variable Selection and Balanced Data for Cervical Cancer Disease

Full Text (PDF, 519KB), PP.1-7



Kemal Akyol

Index Terms

Cervical cancer; the importance of test variables; random over-sampling; random under-sampling; stability selection; random forest


Cancer is a deadly disease. Cervical cancer, one of the most significant cancer types, is a malignant tumor that threatens women's lives. This study investigates the importance of test variables for cervical cancer using the Stability Selection method. The Random Under-Sampling and Random Over-Sampling methods are also applied to the dataset to balance it, and the learning model is built with the Random Forest algorithm. The experimental results show that the model based on Stability Selection, Random Over-Sampling, and Random Forest is the more successful configuration, reaching approximately 98% accuracy.
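The two balancing methods named in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the fixed seed, and the toy data are assumptions made for illustration, and only the Python standard library is used. Random over-sampling duplicates randomly chosen minority-class records until every class matches the largest one, while random under-sampling keeps a random subset of each class no larger than the smallest one.

```python
import random
from collections import Counter, defaultdict


def _group_by_label(rows, labels):
    """Collect the rows of each class into its own list (hypothetical helper)."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return groups


def random_over_sample(rows, labels, seed=0):
    """Duplicate random rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        # Draw with replacement to fill the gap to the majority-class size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_under_sample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        # Draw without replacement so no record is repeated.
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Made-up toy data: 8 negative and 2 positive records (not the study's dataset).
rows = [[i, i * 2] for i in range(10)]
labels = [0] * 8 + [1] * 2

over_rows, over_labels = random_over_sample(rows, labels)
under_rows, under_labels = random_under_sample(rows, labels)
print(Counter(over_labels))
print(Counter(under_labels))
```

In a pipeline such as the one the abstract describes, the balanced records would then be passed to the classifier. Resampling is usually applied to the training portion only, so that duplicated records do not leak into the evaluation data.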

Cite This Paper

Kemal Akyol, "A Study on Test Variable Selection and Balanced Data for Cervical Cancer Disease", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol.10, No.5, pp. 1-7, 2018. DOI: 10.5815/ijieeb.2018.05.01

