Enhancing contextualised language models with static character and word embeddings for emotional intensity and sentiment strength detection in Arabic tweets

Abdullah I. Alharbi; Phillip Smith; Mark Lee

doi:10.1016/j.procs.2021.05.089

Abdullah I. Alharbi*, Phillip Smith, Mark Lee

*Corresponding author for this work

Computer Science

Research output: Contribution to journal › Conference article › peer-review

Abstract

Many studies have focused on Arabic sentiment or emotion classification tasks. However, research on alternative aspects of affect, such as emotional intensity and sentiment strength tasks, has been somewhat limited. In this paper, we propose a method for enriching a contextualised language model that incorporates static character and word embeddings for emotional intensity and sentiment strength in Arabic tweets. We examine the assumption that models using static embeddings that are trained specifically on a corpus containing extensive Arabic affect-related words can boost the performance of language models. Through the development of character-level embeddings, we have found that our method is able to overcome the limitations associated with out-of-vocabulary words, which is a common problem when dealing with Arabic informal text. Given this, the method that we have developed achieves state-of-the-art results for the detection of the intensity of emotion and sentiment strength in Arabic social media.

Original language	English
Pages (from-to)	258-265
Number of pages	8
Journal	Procedia Computer Science
Volume	189
DOIs	https://doi.org/10.1016/j.procs.2021.05.089
Publication status	Published - 14 Jul 2021
Event	5th International Conference on Artificial Intelligence in Computational Linguistics, ACLing 2021 - Virtual, Online, United Arab Emirates
Duration	4 Jun 2021 → 5 Jun 2021

Keywords

Arabic Emotional Intensity
Character
Contextualised Language Models
Word Embeddings

ASJC Scopus subject areas

Control and Systems Engineering
Industrial and Manufacturing Engineering

Access to Document

10.1016/j.procs.2021.05.089
Licence: Creative Commons: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

AlharbiA2021Enhancing
Final published version, 402 KB
Licence: Creative Commons: Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)

Cite this

@article{51f2698a465f49ae9dafe6fc2d456fcd,

title = "Enhancing contextualised language models with static character and word embeddings for emotional intensity and sentiment strength detection in Arabic tweets",

abstract = "Many studies have focused on Arabic sentiment or emotion classification tasks. However, research on alternative aspects of affect, such as emotional intensity and sentiment strength tasks, has been somewhat limited. In this paper, we propose a method for enriching a contextualised language model that incorporates static character and word embeddings for emotional intensity and sentiment strength in Arabic tweets. We examine the assumption that models using static embeddings that are trained specifically on a corpus containing extensive Arabic affect-related words can boost the performance of language models. Through the development of character-level embeddings, we have found that our method is able to overcome the limitations associated with out-of-vocabulary words, which is a common problem when dealing with Arabic informal text. Given this, the method that we have developed achieves state-of-the-art results for the detection of the intensity of emotion and sentiment strength in Arabic social media.",

keywords = "Arabic Emotional Intensity, Character, Contextualised Language Models, Word Embeddings",

author = "Alharbi, {Abdullah I.} and Smith, Phillip and Lee, Mark",

year = "2021",

month = jul,

day = "14",

doi = "10.1016/j.procs.2021.05.089",

language = "English",

volume = "189",

pages = "258--265",

journal = "Procedia Computer Science",

issn = "1877-0509",

publisher = "Elsevier",

note = "5th International Conference on Artificial Intelligence in Computational Linguistics, ACLing 2021 ; Conference date: 04-06-2021 Through 05-06-2021",

}

TY - JOUR

T1 - Enhancing contextualised language models with static character and word embeddings for emotional intensity and sentiment strength detection in Arabic tweets

AU - Alharbi, Abdullah I.

AU - Smith, Phillip

AU - Lee, Mark

PY - 2021/7/14

Y1 - 2021/7/14

AB - Many studies have focused on Arabic sentiment or emotion classification tasks. However, research on alternative aspects of affect, such as emotional intensity and sentiment strength tasks, has been somewhat limited. In this paper, we propose a method for enriching a contextualised language model that incorporates static character and word embeddings for emotional intensity and sentiment strength in Arabic tweets. We examine the assumption that models using static embeddings that are trained specifically on a corpus containing extensive Arabic affect-related words can boost the performance of language models. Through the development of character-level embeddings, we have found that our method is able to overcome the limitations associated with out-of-vocabulary words, which is a common problem when dealing with Arabic informal text. Given this, the method that we have developed achieves state-of-the-art results for the detection of the intensity of emotion and sentiment strength in Arabic social media.

KW - Arabic Emotional Intensity

KW - Character

KW - Contextualised Language Models

KW - Word Embeddings

UR - http://www.scopus.com/inward/record.url?scp=85113713798&partnerID=8YFLogxK

U2 - 10.1016/j.procs.2021.05.089

DO - 10.1016/j.procs.2021.05.089

M3 - Conference article

AN - SCOPUS:85113713798

SN - 1877-0509

VL - 189

SP - 258

EP - 265

JO - Procedia Computer Science

JF - Procedia Computer Science

T2 - 5th International Conference on Artificial Intelligence in Computational Linguistics, ACLing 2021

Y2 - 4 June 2021 through 5 June 2021

ER -