Smart SMS Classification for Android Operating System Using Natural Language Processing
No Thumbnail Available
Date
2020
Journal Title
Journal ISSN
Volume Title
Publisher
Uva Wellassa University of Sri Lanka
Abstract
The use of Short Message Service (SMS) is increasing as more people exchange SMS
messages very frequently due to the rapid increase of mobile phone usage and the
simplicity in sending SMS messages. However, this has led to an increase in mobile
device attacks using SMS Spam. The two main categories of SMS Messages are spam
messages and ham (legitimate) messages. Up to now, several kinds of research were done
on SMS classification but all of them are on spam filtering techniques by using various
algorithms and machine learning techniques. In this paper, we present a novel approach
that can detect and filter both spam and ham messages into a better organization under six
different predefined categories named as Primary for legitimate messages, Bank and
Finance, Social and Web, Promotions, Service Provider Messages, and Spam Messages
by using Natural Language Processing for Android Operating System. A smart messaging
application that can properly organize SMS into categories will help to identify the SMS
easily as they are classified under different tabs. Even though SMS can be identified and
categorized manually with little or no effort by people, it remains difficult for mobile
phones. A dataset is created according to the Sri Lankan context and various experiments
are performed to evaluate the performance of the SMS Classification. Initially, the
features were selected based on the behavior of messages and extracted the features from
the dataset to get the feature vectors. Naive Bayes and Support Vector Machines
algorithms were used to select the best classification algorithm. With the highest accuracy
rate, the Support Vector Machines algorithm is selected to train the model while k-Fold
cross-validation is used to perform the validation. Our proposed approach achieved a 93%
accuracy rate and the model is deployed in the Android environment and its performance
is confirmed using a proof of concept.
Keywords: SMS classification, Natural language processing, Support vector machines,
Naive bayes algorithm, Android
Description
Keywords
Computer Science, Information Science, Computing and Information Management, Telecommunication