Keywords: Digital Footprint, Stacked Generalization, Machine Learning, Word Embedding, Stacking and Blending Ensemble, Sentiment Analysis, Neurological Disorder
Digital footprints track the online behavior of individuals as they communicate over social media platforms. In this paper, sentiment classification is carried out over online posts and tweets to detect in advance whether a person has a neurological disorder. This study proposes a Hybrid Optimized Model Ensemble STACKed (HOMESTACK) algorithm, built on the stacked generalization approach, that uses stacking and blending ensemble learning techniques. The model is evaluated on two datasets (Reddit Dataset1 and Twitter Dataset2) that contain different numbers of posts and tweets. The data are pre-processed and features are extracted to obtain cleaned text and a vector corpus. The proposed HOMESTACK algorithm is then applied to the training data using four base classifiers (Support Vector Machine, Random Forest, K-Nearest Neighbor and CatBoost) and Logistic Regression as the meta classifier. The testing data are then fed to the tuned model to compare and analyze the classification results. Stacking and blending ensemble frameworks and algorithms are also proposed in this study. Execution time is measured, and performance is evaluated in terms of Accuracy, Precision, Recall and F1-score. The experimental results show that the proposed HOMESTACK algorithm outperforms the blending ensemble and standalone machine learning classifiers on the chosen datasets.
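The stacked arrangement described above (four base classifiers feeding a Logistic Regression meta classifier) can be illustrated with a minimal sketch. This is not the HOMESTACK algorithm itself, whose optimization steps are not given in the abstract. It assumes scikit-learn's generic `StackingClassifier`, uses synthetic feature vectors in place of the embedded Reddit and Twitter posts, and swaps in `GradientBoostingClassifier` as a stand-in for CatBoost so that the example depends on a single library. All hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Assumption: synthetic vectors stand in for the word-embedding corpus.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Level-0 (base) classifiers, mirroring the four named in the abstract.
base_learners = [
    ("svm", SVC(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    # Assumption: gradient boosting used here as a stand-in for CatBoost.
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# Level-1 (meta) classifier is trained on out-of-fold base predictions (cv=5).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# The four metrics reported in the study.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(metrics)
```

In this sketch the meta classifier learns from cross-validated predictions of the base classifiers, which is what distinguishes stacking from blending, where the meta classifier is typically fitted on predictions for a single held-out validation split.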
Tejaswita Garg, Sanjay K. Gupta, "A Novel Algorithm for Stacked Generalization Approach to Predict Neurological Disorder over Digital Footprints", International Journal of Modern Education and Computer Science (IJMECS), Vol. 15, No. 5, pp. 60-73, 2023. DOI: 10.5815/ijmecs.2023.05.05