Data from millions of Facebook users who used a popular personality app, including their answers to intimate questionnaires, was left exposed online for anyone to access, a New Scientist investigation has found.
Academics at the University of Cambridge distributed the data from the personality quiz app myPersonality to hundreds of researchers via a website with insufficient security provisions, which led to it being left vulnerable to access for four years. Gaining access illicitly was relatively easy.
The data was highly sensitive, revealing personal details of Facebook users, such as the results of psychological tests. It was meant to be stored and shared anonymously, however such poor precautions were taken that deanonymising would not be hard.
. . . .
Facebook suspended myPersonality from its platform on 7 April saying the app may have violated its policies due to the language used in the app and on its website to describe how data is shared.
More than 6 million people completed the tests on the myPersonality app and nearly half agreed to share data from their Facebook profiles with the project. All of this data was then scooped up and the names removed before it was put on a website to share with other researchers. The terms allow the myPersonality team to use and distribute the data “in an anonymous manner such that the information cannot be traced back to the individual user”.
To get access to the full data set people had to register as a collaborator to the project. More than 280 people from nearly 150 institutions did this, including researchers at universities and at companies like Facebook, Google, Microsoft and Yahoo.
. . . .
For the last four years, a working username and password has been available online that could be found from a single web search. Anyone who wanted access to the data set could have found the key to download it in less than a minute.
The publicly available username and password were sitting on the code-sharing website GitHub. They had been passed from a university lecturer to some students for a course project on creating a tool for processing Facebook data
The credentials gave access to the “Big Five” personality scores of 3.1 million users. These scores are used in psychology to assess people’s characteristics, such as conscientiousness, agreeableness and neuroticism. The credentials also allowed access to 22 million status updates from over 150,000 users, alongside details such as age, gender and relationship status from 4.3 million people.. . . .
Each user in the data set was given a unique ID, which tied together data such as their age, gender, location, status updates, results on the personality quiz and more. With that much information, de-anonymising the data can be done very easily. “You could re-identify someone online from a status update, gender and date,” says Dixon.