Short Text Topic Modelling using Non-negative Matrix Factorization with Neighbourhood-based Assistance

Athukorala, W.S.; Mohotti, W.A.

dc.contributor.author	Athukorala, W.S.
dc.contributor.author	Mohotti, W.A.
dc.date.accessioned	2022-09-01T04:57:20Z
dc.date.available	2022-09-01T04:57:20Z
dc.date.issued	2021
dc.description.abstract	A massive number of short texts are generated every day in the form of tweets, news headlines, questions, and answers. Analyzing short texts is an effective way to acquire valuable insights from these online archives, with diverse applications in community detection, trend analysis, classification, and summarization. Topic modeling is a widely used technique for this purpose, as it is capable of discovering latent topics and finding relationships among terms, topics, and text documents. When discovering thematic structure in collections of texts, a large number of terms appear in the document × term matrix representation, and the associated sparseness creates issues for distance-based and density-based document similarity calculations. This phenomenon is known as distance concentration, where the differences in distance between points become negligible due to sparseness in high dimensions. Additionally, short texts are much shorter than conventional documents. As a result, short texts produce extremely sparse, high-dimensional representations, which makes it challenging to find documents that share the same topic structure. Non-negative Matrix Factorization (NMF), which is aligned with the natural non-negativity of text data, has been proposed as an effective technique that handles high-dimensional representations through lower-dimensional projection. However, this higher-to-lower dimensional projection results in information loss. This paper proposes Neighbourhood-based assistance to compensate for this loss. Neighbourhood information within documents is captured using Jaccard similarity over the term sets of the documents. We couple a symmetric document × document matrix that carries this neighbourhood information with the document × term matrix using NMF to identify the lower-order topic × document matrix. This unsupervised method learns a dense lower-order topic representation by minimizing the encoding error of the factor matrices.
We empirically evaluate the effectiveness of the method against state-of-the-art short text topic modeling methods belonging to the probabilistic and matrix factorization categories. Experimental results using three Twitter datasets show that the proposed approach is able to deal with the information loss associated with higher-dimensional matrix factorization of short text and attains high accuracy compared to relevant benchmarking methods. Keywords: Topic Modelling; Short Text; Non-negative Matrix Factorization; Neighbourhood-based Assistance	en_US
dc.identifier.isbn	978-624-5856-04-6
dc.identifier.uri	http://www.erepo.lib.uwu.ac.lk/bitstream/handle/123456789/9582/Page%20117%20-%20IRCUWU2021-310%20-Athukorala-%20Short%20Text%20Topic%20Modelling%20using%20Non-negative%20Matrix%20Factorization%20with%20Neighbourhood.pdf?sequence=1&isAllowed=y
dc.language.iso	en	en_US
dc.publisher	Uva Wellassa University of Sri Lanka	en_US
dc.subject	Computing and Information Science	en_US
dc.subject	Computer Science	en_US
dc.subject	Language	en_US
dc.subject	Education	en_US
dc.title	Short Text Topic Modelling using Non-negative Matrix Factorization with Neighbourhood-based Assistance	en_US
dc.title.alternative	International Research Conference 2021	en_US
dc.type	Other	en_US
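
The coupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives neither the objective function nor the update rules, so the code below assumes a binary document × term matrix, a shared document factor W obtained by factorizing the horizontally stacked matrix [X, alpha * S], and standard multiplicative updates for the Frobenius error. The names jaccard_matrix, coupled_nmf, and alpha are hypothetical and chosen only for illustration.

```python
import numpy as np


def jaccard_matrix(docs):
    """Return the binary document x term matrix, the document x document
    Jaccard similarity matrix over term sets, and the vocabulary."""
    vocab = sorted({t for d in docs for t in d})
    index = {t: j for j, t in enumerate(vocab)}
    B = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for t in set(d):
            B[i, index[t]] = 1.0
    inter = B @ B.T                      # |A ∩ B| for every document pair
    sizes = B.sum(axis=1)                # |A| for every document
    union = sizes[:, None] + sizes[None, :] - inter
    # Empty documents give union == 0; their similarity is defined as 0 here.
    S = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    return B, S, vocab


def coupled_nmf(X, S, k, alpha=1.0, n_iter=200, seed=0, eps=1e-9):
    """Assumed coupling: factorize [X, alpha * S] ~ W @ H with one shared
    non-negative document factor W (documents x topics)."""
    rng = np.random.default_rng(seed)
    A = np.hstack([X, alpha * S])
    W = rng.random((A.shape[0], k)) + eps
    H = rng.random((k, A.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ (H @ H.T) + eps)
    # Only the first X.shape[1] columns of H describe terms.
    return W, H[:, : X.shape[1]]


# Toy usage on four made-up short texts (not data from the paper).
docs = [
    "nmf topic model".split(),
    "topic model short text".split(),
    "football match score".split(),
    "football score goal".split(),
]
X, S, vocab = jaccard_matrix(docs)
W, H_terms = coupled_nmf(X, S, k=2)
labels = W.argmax(axis=1)  # dominant topic index per document
```

In this sketch alpha controls how strongly the neighbourhood matrix influences the shared document factor; with alpha set to 0 the procedure reduces to plain NMF on the document × term matrix.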

Files

Original bundle

Name: Page 117 - IRCUWU2021-310 -Athukorala- Short Text Topic Modelling using Non-negative Matrix Factorization with Neighbourhood.pdf
Size: 147.12 KB
Format: Adobe Portable Document Format

Collections

International Research Conference of UWU-2021