TY - JOUR
T1 - Tweets Topic Classification and Sentiment Analysis Based on Transformer-Based Language Models
AU - Mandal, Ranju
AU - Chen, Jinyan
AU - Becken, Susanne
AU - Stantic, Bela
N1 - Publisher Copyright: © 2023 The Author(s).
PY - 2023/5/1
Y1 - 2023/5/1
N2 - People provide information on their thoughts, perceptions, and activities through a wide range of channels, including social media. The wide acceptance of social media results in a vast volume of valuable data, in a variety of formats and with varying veracity. Analysis of such 'big data' allows organizations and analysts to make better and faster decisions. However, this data has to be quantified and information has to be extracted, which can be very challenging because of possible data ambiguity and complexity. To address information extraction, many analytic techniques, such as text mining, machine learning, predictive analytics, and diverse natural language processing methods, have been proposed in the literature. Recent advances in Natural Language Understanding-based techniques, more specifically transformer-based architectures, can solve sequence-to-sequence modeling tasks while handling long-range dependencies efficiently. In this work, we applied transformer-based sequence modeling to topic classification and sentiment analysis of short texts from user-posted tweets. The applicability of the models is investigated on posts from the Great Barrier Reef tweet dataset, and the findings are encouraging, providing insight that can be valuable for researchers working on the classification of large datasets as well as a large number of target classes.
AB - People provide information on their thoughts, perceptions, and activities through a wide range of channels, including social media. The wide acceptance of social media results in a vast volume of valuable data, in a variety of formats and with varying veracity. Analysis of such 'big data' allows organizations and analysts to make better and faster decisions. However, this data has to be quantified and information has to be extracted, which can be very challenging because of possible data ambiguity and complexity. To address information extraction, many analytic techniques, such as text mining, machine learning, predictive analytics, and diverse natural language processing methods, have been proposed in the literature. Recent advances in Natural Language Understanding-based techniques, more specifically transformer-based architectures, can solve sequence-to-sequence modeling tasks while handling long-range dependencies efficiently. In this work, we applied transformer-based sequence modeling to topic classification and sentiment analysis of short texts from user-posted tweets. The applicability of the models is investigated on posts from the Great Barrier Reef tweet dataset, and the findings are encouraging, providing insight that can be valuable for researchers working on the classification of large datasets as well as a large number of target classes.
KW - deep learning
KW - natural language processing
KW - target classification
KW - topic classification
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85138216315&partnerID=8YFLogxK
U2 - 10.1142/S2196888822500269
DO - 10.1142/S2196888822500269
M3 - Article
AN - SCOPUS:85138216315
SN - 2196-8896
VL - 10
SP - 117
EP - 134
JO - Vietnam Journal of Computer Science
JF - Vietnam Journal of Computer Science
IS - 2
ER -