Arkadip Ray

Work place: Department of Information Technology, Government College of Engineering and Ceramic Technology, Kolkata, West Bengal, 700010, India

E-mail: arka1dip2ray3@gmail.com

Research Interests: Data Mining, Network Security

Biography

Arkadip Ray received his B.Tech degree in Computer Science & Engineering from Government College of Engineering & Ceramic Technology (GCECT), Kolkata, India, and completed his M.Tech degree in Information Technology at GCECT with a GATE scholarship. His current research interests include Digital Watermarking, Network Security, Data Mining, and Machine Learning. He has 8 research publications in international and national journals.

Author Articles
A Multi-Stage Approach Combining Feature Selection with Machine Learning Techniques for Higher Prediction Reliability and Accuracy in Cervical Cancer Diagnosis

By Avijit Kumar Chaudhuri, Arkadip Ray, Dilip K. Banerjee, Anirban Das

DOI: https://doi.org/10.5815/ijisa.2021.05.05, Pub. Date: 8 Oct. 2021

Cervical cancer is the fourth most prevalent cancer in women; in 2020 it claimed 341,831 lives and accounted for 604,127 new cases worldwide. Reducing this mortality requires early detection of the disease, which motivates research into fast, accurate, and interpretable machine learning models. Using fewer features reduces computational effort and improves interpretability. A 3-Stage Hybrid feature selection approach and a Stacked Classification model are evaluated on the cervical cancer dataset obtained from the UCI Machine Learning Repository, which has 35 features and one outcome variable. Stage-1 uses a Genetic Algorithm and Logistic Regression Architecture for Feature Selection to select twelve features that are well correlated with the class but not with one another. Stage-2 applies the same Genetic Algorithm and Logistic Regression Architecture for Feature Selection to select five features. In Stage-3, Logistic Regression (LR), Naïve Bayes (NB), Support Vector Machine (SVM), Extra Trees (ET), Random Forest (RF), and Gradient Boosting (GDB) are used with the five features to identify patients with or without cancer. The comparative analysis uses data splitting, several metrics, statistical tests, and 10-fold cross-validation. All six classifiers improve across performance measures when the number of features is reduced to five. In the 66-34 split, all of these methods except NB (the remaining five) recorded 97% accuracy with five features. The Stacked model also produced higher than 96% accuracy with five features in the 66-34 and 80-20 splits and in 10-fold cross-validation. Various performance aggregators show improved results with the reduced feature set when compared to previous studies. Finally, the suggested ensemble model shows promise, with classification performance approaching 100%.
The results were compared to those of other studies on the same dataset, and the proposed classifiers were found to be the most effective across all performance dimensions.
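
The idea of searching for a small feature subset that is well correlated with the class but not internally redundant can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract describes a Genetic Algorithm paired with Logistic Regression, whereas the sketch below swaps in a simpler correlation-based merit score as the fitness function so that it runs with the Python standard library only. The synthetic data, the function names (`make_data`, `merit`, `select_features`), and all parameter values are assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def make_data(n_rows=200, seed=1):
    """Synthetic stand-in data (assumption): 2 informative columns,
    1 near-duplicate of column 0, and 5 pure-noise columns."""
    rng = random.Random(seed)
    target = [rng.randint(0, 1) for _ in range(n_rows)]
    informative = [[t + rng.gauss(0, 0.5) for t in target] for _ in range(2)]
    redundant = [[v + rng.gauss(0, 0.05) for v in informative[0]]]
    noise = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(5)]
    return informative + redundant + noise, target


def merit(mask, r_cf, r_ff):
    """Correlation-based merit: rewards feature-class correlation and
    penalizes feature-feature correlation within the chosen subset."""
    idx = [i for i, keep in enumerate(mask) if keep]
    k = len(idx)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[i] for i in idx) / k
    pairs = [(i, j) for a, i in enumerate(idx) for j in idx[a + 1:]]
    mean_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / sqrt(k + k * (k - 1) * mean_ff)


def select_features(columns, target, pop_size=30, generations=40,
                    mutation_rate=0.05, seed=0):
    """Genetic search over boolean feature masks; returns chosen indices."""
    rng = random.Random(seed)
    n = len(columns)
    # Precompute absolute correlations once so fitness evaluation is cheap.
    r_cf = [abs(pearson(c, target)) for c in columns]
    r_ff = [[abs(pearson(a, b)) for b in columns] for a in columns]

    def fitness(mask):
        return merit(mask, r_cf, r_ff)

    population = [[rng.random() < 0.5 for _ in range(n)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:2]                 # elitism: keep the two best masks
        parents = ranked[:pop_size // 2]   # truncation selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)      # single-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with a small per-gene probability.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return [i for i, keep in enumerate(best) if keep]


columns, target = make_data()
print("selected feature indices:", select_features(columns, target))
```

In a wrapper-style variant closer to what the abstract describes, the `fitness` function would instead train and score a classifier such as logistic regression on each candidate subset, at a higher computational cost per evaluation.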

Other Articles