by Dr David Stillwell, University Lecturer in Big Data Analytics & Quantitative Social Science, Cambridge Judge Business School
Privacy campaigners this week applauded Facebook’s decision to block big UK insurance firm Admiral from using young people’s social media data to help set their car insurance premiums. But this is just the start of a debate over the use of social media information for such purposes. Setting aside privacy issues for a moment, there is a very valid social reason for using the data in this way. In fact, it could benefit countless people.
Whatever side you take on this issue, it’s important to understand the science behind Admiral’s plan and behind similar plans sure to come from companies large and small. Indeed, my research suggests that predictions made from social media data could be very accurate.
In 2015, the average Facebook user had liked 225 things, from films to politicians, as well as statements such as “I like stepping on crunchy leaves”.
My colleagues and I collected data from six million Facebook users through an opt-in survey that measured their personality and gave them feedback on their results. We then measured how well their Facebook activity could predict their personality, expressing the result as a correlation between 0 and 1. The higher the number, the more accurate the prediction.
When we used 60,000 users’ “likes” to predict their self-reported psychological traits, we found that the correlation between the predictions based on their “likes” and their personality was 0.56. To put that in perspective, a work colleague asked to predict someone’s personality achieves an accuracy of 0.27, friends 0.45, family 0.50, and even a spouse only 0.58. In other words, the computer knows you almost as well as your husband or wife – and better than almost everyone else.
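For readers who want to see what such a number means in practice, here is a minimal sketch of how a correlation between predicted and self-reported trait scores can be computed. It assumes a standard Pearson correlation, which can in principle also be negative. The function name `pearson_r` and the scores are made up for illustration; they are not the study's code or data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations, and sums of squared deviations for each sequence.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)

# Made-up illustrative scores, NOT data from the study.
self_reported = [2.0, 3.5, 4.0, 1.5, 5.0]  # what people said about themselves
predicted = [2.5, 3.0, 3.5, 2.5, 4.0]      # what a model guessed from their likes

r = pearson_r(self_reported, predicted)
print(f"correlation: {r:.2f}")
```

A value near 1 means the predictions rise and fall closely with the self-reports; a value near 0 means they tell you almost nothing.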
“Sensation seekers” (extroverts who look for new, varied, and risky experiences) are a poor car insurance risk. On Facebook, these are the people who like “white water rafting” and “bungee jumping”, and use phrases such as “chillin”, “great night” and, bizarrely, “soooooooo”.
We can be extra confident in such a system because online data is surprisingly difficult to fake. Everything that happens on Facebook is timestamped, so if on the day before you apply for car insurance you suddenly like “chess” and “reading” (predictors of introversion) – after years of talking about parties and drinking – the system can easily pick that up.
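To make the idea concrete, here is a rough sketch of how timestamps might be used to spot a sudden change of character. It is purely hypothetical: the topic set, the seven-day window, the threshold and the function names are all assumptions for illustration, not a description of any real insurer's or Facebook's system.

```python
from datetime import date, timedelta

# Hypothetical label set; the article names "chess" and "reading"
# as predictors of introversion.
INTROVERSION_TOPICS = {"chess", "reading"}

def share_introverted(topics):
    """Fraction of topics in the hypothetical introversion set (0.0 if empty)."""
    if not topics:
        return 0.0
    return sum(t in INTROVERSION_TOPICS for t in topics) / len(topics)

def looks_like_sudden_shift(likes, application_date, window_days=7, min_jump=0.5):
    """Flag a profile whose likes just before applying differ sharply from its history.

    likes: iterable of (topic, liked_on) pairs, where liked_on is a datetime.date.
    window_days and min_jump are arbitrary illustrative thresholds.
    """
    cutoff = application_date - timedelta(days=window_days)
    history = [t for t, d in likes if d < cutoff]
    recent = [t for t, d in likes if cutoff <= d <= application_date]
    if not history or not recent:
        return False  # not enough evidence either way
    return share_introverted(recent) - share_introverted(history) >= min_jump

# Made-up example: years of parties, then chess and reading the day before applying.
application = date(2016, 11, 2)
likes = [
    ("parties", date(2014, 5, 1)),
    ("drinking", date(2015, 8, 12)),
    ("bungee jumping", date(2016, 1, 20)),
    ("chess", date(2016, 11, 1)),
    ("reading", date(2016, 11, 1)),
]
flagged = looks_like_sudden_shift(likes, application)
print(flagged)
```

A real system would need far more care than this, but the point stands: because every like carries a date, a last-minute makeover leaves a visible trace.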
For the record, I think it’s a shame that, in this case, social media data has been barred from a use that could benefit young people and society. Most young people take driving seriously and many never make an insurance claim, but they have no way of distinguishing themselves from the minority whose thrill-seeking and expensive crashes increase the premiums for everyone.
Older drivers have had time to build up their no-claims bonuses. But new drivers all look the same through the lens of the traditional demographic and geographical data used to set premiums. Young people whose social media data indicates they are mature and self-controlled could have had the opportunity to prove they are worthy of a £150 discount. This would have been a nice saving given that the cheapest comprehensive insurance cover for 17-22 year olds in the UK costs £1,287 per year.
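As a quick check on the size of that saving, the arithmetic below uses only the two figures quoted above; the variable names are mine.

```python
annual_premium = 1287  # GBP, cheapest comprehensive cover for 17-22 year olds, as quoted above
discount = 150         # GBP, the discount figure quoted above

discounted_premium = annual_premium - discount
discount_share = discount / annual_premium

print(f"Premium after discount: £{discounted_premium}")
print(f"Discount as a share of the premium: {discount_share:.1%}")
```

In other words, the discount would have taken roughly 12 per cent off the cheapest available cover.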
There are plenty of ways our social media data could be used both for and against us, and that’s why we’ll see many more battles such as this one. The Admiral case could well be remembered as just the beginning of a tortuous back and forth over using digital footprints in financial modelling. Other social networks, mobile phones, store loyalty cards and the billions of sensors that are forming the so-called Internet of Things all collect data that can predict psychological traits.
There will be plenty of close calls going forward as we debate these issues of social usefulness versus privacy, but in my view this wasn’t one of them. As long as companies use our data transparently and with our consent, why not allow both parties to an insurance transaction to rely on what appears to be very accurate data?
This article was originally published in The Conversation.