A Comparative Study: Best Machine Learning Algorithm for Social Media Sentiment Analysis

Manthrirathna, M.A.L.; Weerakoon, W.M.H.G.T.C.K.; Rathnayaka, R.M.K.T.

dc.contributor.author	Manthrirathna, M.A.L.
dc.contributor.author	Weerakoon, W.M.H.G.T.C.K.
dc.contributor.author	Rathnayaka, R.M.K.T.
dc.date.accessioned	2021-02-01T06:35:28Z
dc.date.available	2021-02-01T06:35:28Z
dc.date.issued	2020
dc.description.abstract	Sentiment analysis is a field of study that aims to derive the sentiment or opinion expressed in a text using natural language processing techniques. Sentiment analysis of Twitter data has many applications, including stock market price prediction and product recommendation. It can be performed with lexicon-based, machine learning-based, or hybrid approaches. K Nearest Neighbor, Support Vector Machine, Logistic Regression, Naïve Bayes, K Means Clustering, Decision Trees, and Random Forest are among the most popular machine learning algorithms. This study compares K Nearest Neighbor, Support Vector Machine, Logistic Regression, and Multinomial Naïve Bayes, each combined with the SentiWordNet lexicon, to determine which gives the best accuracy in sentiment classification of Twitter data. A data set of 1028 tweets was acquired using the Twitter Standard Search API (Application Programming Interface) and the Tweepy Python library, with the name of a popular mobile phone brand as the search term. After duplicate removal and cleaning, 570 tweets remained. These were labelled as positive, negative, or neutral using the SentiWordNet lexicon and used to train the selected machine learning algorithms: 80% of the data was used for training and 20% for testing. Word counts in the tweets were used as features. Multinomial Naïve Bayes was the best-performing algorithm, with a model accuracy of 74.56%, and K Nearest Neighbor (k=3) was the worst, with an accuracy of 54.38%. Logistic Regression and Support Vector Machine (linear kernel) achieved accuracies of 72.80% and 70.17%, respectively. These results indicate that Multinomial Naïve Bayes performs better in Twitter sentiment analysis than K Nearest Neighbor, Support Vector Machine, and Logistic Regression.
This is because the two basic assumptions of the Multinomial Naïve Bayes algorithm, feature independence and a multinomial distribution, are well satisfied by the features selected for this study. Multinomial Naïve Bayes also performs well with high-dimensional data such as tweet text. The poor performance of K Nearest Neighbor, by contrast, is attributed to that same high dimensionality: K Nearest Neighbor does not handle a large number of features well. Keywords: Sentiment analysis, Twitter, Hybrid approach, Machine learning algorithms, Comparative analysis.	en_US
dc.identifier.isbn	9789550481293
dc.identifier.uri	http://www.erepo.lib.uwu.ac.lk/bitstream/handle/123456789/5716/proceeding_oct_08-193.pdf?sequence=1&isAllowed=y
dc.language.iso	en	en_US
dc.publisher	Uva Wellassa University of Sri Lanka	en_US
dc.relation.ispartofseries	International Research Conference
dc.subject	Computer Science	en_US
dc.subject	Social Media	en_US
dc.subject	Information Science	en_US
dc.subject	Computing and Information Management	en_US
dc.title	A Comparative Study: Best Machine Learning Algorithm for Social Media Sentiment Analysis	en_US
dc.title.alternative	International Research Conference 2020	en_US
dc.type	Other	en_US
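
The classification step described in the abstract (word counts as features, Multinomial Naïve Bayes as the classifier) can be illustrated with a minimal sketch. This is not the authors' code: the toy tweets, the whitespace tokenizer, the Laplace smoothing value alpha=1.0, and the names MultinomialNaiveBayes, tokenize and accuracy are all assumptions made for illustration. The study itself labelled real tweets with SentiWordNet and used an 80/20 train/test split.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple tokenizer)."""
    return text.lower().split()


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over raw word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; 1.0 is an assumption

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {word for doc in docs for word in tokenize(doc)}
        # Per-class word counts: the "word count" features from the abstract.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        docs_per_class = Counter(labels)
        self.log_prior = {
            c: math.log(docs_per_class[c] / len(docs)) for c in self.classes
        }
        return self

    def predict(self, doc):
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for word in tokenize(doc):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                numerator = self.word_counts[c][word] + self.alpha
                denominator = self.totals[c] + self.alpha * vocab_size
                score += math.log(numerator / denominator)
            scores[c] = score
        return max(scores, key=scores.get)


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the given label."""
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(docs)


# Hypothetical toy data standing in for lexicon-labelled tweets.
train_docs = [
    "love the camera great battery",
    "great screen love it",
    "battery died terrible phone",
    "terrible screen hate it",
    "phone arrives tomorrow",
    "store opens tomorrow",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

test_docs = ["love the great battery", "terrible battery hate it", "store opens tomorrow"]
test_labels = ["positive", "negative", "neutral"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(accuracy(model, test_docs, test_labels))
```

The same accuracy function could score any of the other compared classifiers on a held-out 20% split, which is how the figures in the abstract would be put side by side.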

Files

Original bundle

Name: proceeding_oct_08-193.pdf
Size: 31.22 KB
Format: Adobe Portable Document Format

Collections

International Research Conference of UWU-2020