Abstract
Affect tasks, which range from sentiment polarity classification to finer-grained sentiment strength and emotion intensity detection, have attracted increasing interest owing to the vast amount of user-generated content and the availability of advanced learning models. Word representation models have been leveraged effectively in a variety of natural language processing tasks. However, these models are not always effective in the context of social media. When dealing with social media posts in Arabic, the use of Arabic dialects needs to be considered. Although training word-level models on informal text can help identify words that convey the same meaning, these models cannot capture the full range of words used in the real world because of out-of-vocabulary (OOV) words. The inability to represent such words is one of the main limitations of word-level models. One approach to overcoming the OOV problem is to use character-level embeddings, which can learn vectors for word parts or character n-grams. This study uses a combination of character-level and word-level models to identify the most effective methods by which affective Arabic words in tweets can be represented semantically and morphologically. We evaluate our generated models and the proposed method by integrating them into a supervised learning framework used for a range of affect tasks and other related tasks. Our findings reveal that the developed models surpass the performance of state-of-the-art Arabic pre-trained word embeddings on eight datasets. In addition, our models improve on previous state-of-the-art results on Arabic emotion intensity tasks, outperforming the top systems that used advanced ensemble learning models and several additional features.
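The way character n-grams can cover OOV words may be easier to see in a small sketch. The Python below is an illustration only, not the authors' implementation: it assumes a fastText-style scheme in which a word is padded with boundary markers, split into character n-grams, and represented by the average of its n-gram vectors when no word-level vector exists. The function names (`char_ngrams`, `ngram_vector`, `embed`), the n-gram range of 3 to 5, the vector size, and the hash-seeded pseudo-random vectors (standing in for learned embeddings) are all assumptions made for the example.

```python
import hashlib
import random


def char_ngrams(word, n_min=3, n_max=5):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def ngram_vector(ngram, dim=8):
    """Deterministic pseudo-random vector; a stand-in for a learned n-gram embedding."""
    digest = hashlib.md5(ngram.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def embed(word, word_vectors, dim=8):
    """Use the word-level vector when available; otherwise average n-gram vectors."""
    if word in word_vectors:
        return word_vectors[word]
    grams = char_ngrams(word)
    if not grams:
        # Nothing to compose from (e.g. an empty string): fall back to zeros.
        return [0.0] * dim
    vectors = [ngram_vector(g, dim) for g in grams]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical word-level vocabulary containing one in-vocabulary word.
word_vectors = {"جميل": [0.5] * 8}

known = embed("جميل", word_vectors)      # found in the word-level table
unseen = embed("جمييل", word_vectors)    # elongated spelling, composed from n-grams
shared = set(char_ngrams("جميل")) & set(char_ngrams("جمييل"))
```

In this sketch the elongated spelling is absent from the word-level table, yet it still receives a vector, and it shares several n-grams with the standard spelling. In a trained model, that overlap is what would place the two spellings near each other.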
Original language | English |
---|---|
Article number | 101973 |
Number of pages | 14 |
Journal | Data and Knowledge Engineering |
Volume | 138 |
Early online date | 6 Jan 2022 |
DOIs | |
Publication status | Published - Mar 2022 |
Bibliographical note
Publisher Copyright: © 2022 Elsevier B.V.
Keywords
- Affect tasks
- Arabic tweets
- Character-level embeddings
- Word-level embeddings
ASJC Scopus subject areas
- Information Systems and Management