Words like justice and equality are so often used disparagingly on Twitter that social media tools struggle to assess whether a tweet is positive or negative about its subject, according to a recent study from Cambridge Judge Business School.
‘Sentiment analysis’ that gauges sentiment based on the quantity and emotional context of word use is a handy tool to measure the tone of online discourse through social media.
But a new study at Cambridge Judge Business School finds that politics can create discrepancies in such analyses – because even positive political terms like ‘independence’ and ‘ethics’ can take on negative sentiment given how they are used in Twitter posts.
Study is based on 5 million tweets by people aged 18 to 78
The study, published in the journal PLOS ONE (Public Library of Science One), is based on nearly 5 million tweets by 3,573 people aged 18 to 78. It draws a link between age and whether people express positive or negative sentiment in their tweets – but finds that politically tinged terms can lead to incorrect conclusions.
Words associated with politics, like ballot, Cabinet and President, can skew the findings from sentiment analysis because older people are more likely to tweet on political subjects – and their tweets, as reflected in such analysis, seem less positive than the specific words used.
Positive words often take on negative tones on Twitter
“Even words connected to politics that seem to represent positive values – such as justice, democracy and equality – can actually result in negative sentiment because of the way those words are often used in tweets,” says study co-author David Stillwell, Professor of Computational Social Science at Cambridge Judge Business School and Academic Director of the school’s Psychometrics Centre, where the study was conducted.
“Rather than expressing positive values, people often use these words in tweets to criticise a political situation or express concern with values such as justice and democracy.”
The study examines the difference in age-related results between 2 lexicons commonly used for sentiment analysis: the Linguistic Inquiry and Word Count (LIWC) and the NRC Word-Emotion Association Lexicon (NRC).
While both methods show an increase in positive affect of tweets until age 50, such positivity drops sharply after age 50 according to LIWC but increases steadily until age 65 based on NRC. The research found that this inconsistency was “mostly due to a particular class of words: those related to politics”.
How top methods for sentiment analysis differ substantially
The 2 lexicon systems studied have some important differences: LIWC was created by linguistics experts and contains terms that represent emotions (eg, love, cry). In contrast, NRC was created by aggregating ratings from hundreds of non-experts of the associations between words (eg, homeless) and emotions.
LIWC provides around 1,300 terms signifying positive or negative affect while NRC provides nearly 5,600 terms associated with positive or negative affect.
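For readers curious how a lexicon-based approach works in practice, the following is a minimal sketch in Python. It assumes the simplest possible scoring rule (the share of words in a tweet that appear in a positive or negative word list), which may differ from how LIWC or NRC are applied in the study. The word lists and the `affect_scores` function are invented for illustration and are not taken from either lexicon.

```python
import re

# Toy word lists invented for illustration; NOT the contents of LIWC or NRC.
TOY_POSITIVE = {"love", "happy", "great", "justice"}
TOY_NEGATIVE = {"cry", "sad", "awful", "homeless"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def affect_scores(text, positive=TOY_POSITIVE, negative=TOY_NEGATIVE):
    """Return (positive share, negative share) of the tokens in `text`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, 0.0
    pos = sum(token in positive for token in tokens)
    neg = sum(token in negative for token in tokens)
    return pos / len(tokens), neg / len(tokens)


tweet = "There is no justice in this awful decision"
pos_share, neg_share = affect_scores(tweet)
print(f"positive={pos_share:.3f} negative={neg_share:.3f}")
```

In this toy example the word "justice" counts towards the positive share even though the tweet is a complaint, which is the kind of mismatch the study describes.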
Older people are more likely to discuss politics on Twitter
To confirm that political terms were causing the age-related discrepancy, the researchers identified 4 topics relevant to politics that correlated with age: politics in general (with top words such as war, world, and police); US politics (Trump, President, wall, and GOP); UK politics (Brexit, EU, Labour); and Indian politics (India, Modi, BJP, Congress).
“All of these topics were positively correlated with age, ie it seems that older people are more likely to discuss politics online, or on Twitter specifically,” the study says. Removing words identified with politics substantially decreased the NRC positive affect scores for older people, “making the model predictions from LIWC and NRC more consistent”.
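The effect of removing political words can be illustrated with a short Python sketch. It assumes the political terms are dropped from the lexicon before scoring, which is one plausible reading of the procedure rather than a confirmed description of the study's method. The word lists and the `positive_share` helper are hypothetical.

```python
import re

# Hypothetical word lists for illustration only; not taken from NRC or the study.
TOY_POSITIVE = {"democracy", "justice", "president", "lovely"}
POLITICAL_TERMS = {"democracy", "justice", "president", "ballot", "cabinet"}


def positive_share(text, positive):
    """Share of tokens in `text` that appear in the `positive` word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in positive for token in tokens) / len(tokens)


tweet = "So much for democracy and justice under this president"

# Score once with the full toy lexicon, then again with political terms removed.
before = positive_share(tweet, TOY_POSITIVE)
after = positive_share(tweet, TOY_POSITIVE - POLITICAL_TERMS)

print(f"with political terms: {before:.3f}")
print(f"without political terms: {after:.3f}")
```

Here the positive score falls once the political terms are excluded, mirroring in miniature the drop in NRC positive affect that the study reports for older users.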
Two sentiment analyses are better than one
The study thus concludes that using a single sentiment analysis lexicon might lead to unreliable conclusions, so it suggests that researchers should routinely use at least 2 lexicons in assessing positive and negative affect through such techniques.
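As a rough illustration of that recommendation, the Python sketch below scores each tweet with 2 lexicons and flags the tweets on which they disagree. The lexicons, the `net_affect` and `lexicons_disagree` functions, and the 0.05 tolerance are all assumptions made for this example, not values or methods from the study.

```python
import re

# Two toy lexicons invented for illustration; NOT the contents of LIWC or NRC.
LEXICON_A = {
    "positive": {"love", "happy"},
    "negative": {"cry", "hate"},
}
LEXICON_B = {
    "positive": {"love", "happy", "justice", "equality"},
    "negative": {"cry", "hate", "homeless"},
}


def net_affect(text, lexicon):
    """Positive minus negative word count, divided by the number of tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(token in lexicon["positive"] for token in tokens)
    neg = sum(token in lexicon["negative"] for token in tokens)
    return (pos - neg) / len(tokens)


def lexicons_disagree(text, tolerance=0.05):
    """True if the two lexicons' net affect scores differ by more than `tolerance`."""
    gap = abs(net_affect(text, LEXICON_A) - net_affect(text, LEXICON_B))
    return gap > tolerance


tweets = [
    "I love this happy little town",
    "Where is the justice and equality we were promised",
]
flagged = [tweet for tweet in tweets if lexicons_disagree(tweet)]
print(flagged)
```

Tweets that end up in `flagged` are those where relying on either lexicon alone could give a misleading picture, so they are worth inspecting by hand.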
The study in PLOS ONE – entitled “Two is Better than One: Using a Single Emotion Lexicon Can Lead to Unreliable Conclusions” – is co-authored by Gabriela Czarnek of the Institute of Psychology at Jagiellonian University in Krakow, Poland, who worked on the study while an Academic Visitor at the Psychometrics Centre at Cambridge Judge Business School, and David Stillwell, Professor of Computational Social Science at Cambridge Judge.