Binary vs. Multiclass Sentiment Classification for Bangla E-commerce Product Reviews: A Comparative Analysis of Machine Learning Models

Pages: 48-63

Author(s)

Shakib Sadat Shanto 1, Zishan Ahmed 1, Nisma Hossain 1, Auditi Roy 1, Akinul Islam Jony 1,*

1. Department of Computer Science, American International University Bangladesh, Kuratoli, Khilkhet, Dhaka 1229, Bangladesh

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2023.06.04

Received: 23 Mar. 2023 / Revised: 18 May 2023 / Accepted: 17 Jul. 2023 / Published: 8 Dec. 2023

Index Terms

Sentiment analysis, E-Commerce, Binary classification, Multiclass classification, Natural language processing, Feature extraction, Accuracy

Abstract

Sentiment analysis, the process of determining the emotional tone of a text, is essential for understanding user opinions and preferences. Most research on sentiment analysis has focused on reviews written in English, leaving a gap in the study of reviews written in other languages. This research addresses the understudied topic of sentiment analysis of Bangla-language product reviews. The objective is to compare the performance of machine learning models for binary and multiclass sentiment classification in Bangla in order to gain a deeper understanding of user sentiment in e-commerce product reviews. We created a dataset of approximately one thousand Bangla product reviews from the e-commerce website 'Daraz' and classified sentiments using a variety of machine learning algorithms and natural language processing (NLP) feature extraction techniques such as TF-IDF and Count Vectorizer with N-gram methods. Overall, the machine learning models performed worse on multiclass sentiment classification than on binary sentiment classification. In multiclass sentiment classification, Logistic Regression with a bigram Count Vectorizer achieved the highest accuracy of 82.64%, while in binary sentiment classification, Random Forest with a unigram TF-IDF vectorizer achieved the highest accuracy of 94.44%. Our proposed system outperforms previous multiclass sentiment classification techniques by a fine margin.
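
The feature extraction techniques named in the abstract can be illustrated with a small, dependency-free sketch. It builds unigram and bigram count features and applies one common smoothed TF-IDF weighting. The function names, the whitespace tokenization, the specific IDF formula, and the toy English reviews are all assumptions made for illustration; the abstract does not specify the authors' preprocessing or implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_vectorize(docs, n=1):
    """Bag-of-n-grams: one Counter of n-gram counts per document, plus the vocabulary."""
    # Assumption: whitespace tokenization; real Bangla text may need more care.
    counts = [Counter(ngrams(doc.split(), n)) for doc in docs]
    vocab = sorted(set().union(*counts))
    return counts, vocab


def tfidf(counts):
    """Weight raw counts by a smoothed inverse document frequency.

    Assumed variant: idf(t) = ln((1 + N) / (1 + df(t))) + 1.
    The paper's exact formula is not given in the abstract.
    """
    n_docs = len(counts)
    # Iterating a Counter yields each term once, so this counts documents per term.
    df = Counter(term for c in counts for term in c)
    return [
        {t: tf * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, tf in c.items()}
        for c in counts
    ]


# Hypothetical toy reviews (English placeholders, not taken from the paper's dataset).
reviews = [
    "good product fast delivery",
    "bad product slow delivery",
    "good quality good price",
]

unigram_counts, unigram_vocab = count_vectorize(reviews, n=1)
bigram_counts, bigram_vocab = count_vectorize(reviews, n=2)
unigram_tfidf = tfidf(unigram_counts)
```

In this sketch, terms that occur in fewer reviews (such as "fast") receive a larger TF-IDF weight than terms shared across reviews (such as "product"), while bigram features keep short word sequences like "good product" together. The resulting feature dictionaries would then be passed to a classifier such as Logistic Regression or Random Forest.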

Cite This Paper

Shakib Sadat Shanto, Zishan Ahmed, Nisma Hossain, Auditi Roy, Akinul Islam Jony, "Binary vs. Multiclass Sentiment Classification for Bangla E-commerce Product Reviews: A Comparative Analysis of Machine Learning Models", International Journal of Information Engineering and Electronic Business (IJIEEB), Vol. 15, No. 6, pp. 48-63, 2023. DOI: 10.5815/ijieeb.2023.06.04
