Big Questions about Big Data: Facebook, Cambridge Analytica and Wider Questions for Christians

On 10 April 2018, Facebook CEO Mark Zuckerberg testified before the United States Congress in the wake of a huge scandal in which the personal data of some 87 million Facebook users had been harvested by Cambridge Analytica and allegedly used to influence elections.

The five-hour interrogation quickly melted into a remedial class on social media, as Facebook's founder spent most of his time walking senators through Facebook's terms of service, how online advertising works and how the public interact with the platform. Anyone hoping that Facebook would be subjected to intense scrutiny was seriously disappointed: Zuckerberg faced the most powder-puff of questioning.

Amongst the beauties Zuckerberg had to field, Senator Orrin Hatch asked: "How do you sustain a business model in which users don't pay for your service?" Facebook's CEO simply answered: "Senator, we run ads." Senator Lindsey Graham asked who Facebook's biggest competitors were, and Zuckerberg mumbled names like Google, Apple, Amazon and Microsoft. It doesn't take a very deep understanding of Facebook to see that, while those companies may compete with Facebook as fellow tech giants, they all serve very different functions. Judged by function, Facebook's competitors are LinkedIn (for professional networks), Twitter and Instagram (which, incidentally, Facebook bought in 2012).

Senator Brian Schatz asked: "If I'm emailing within WhatsApp, does that ever inform your advertisers?" Where to start? Not only do you not send emails over WhatsApp, the service is also well known for being end-to-end encrypted (E2EE). The real problem with the Senator's daft question, however, is that it came pretty close to being a good one. Facebook can't use encrypted message content to serve you ads, but Facebook does own WhatsApp, and it does use WhatsApp "account information" to serve ads. European data protection law forced that practice to stop in 2016, but in other parts of the world (notably the USA) it continues, and it has never been very clear exactly what WhatsApp "account information" is being shared.

At the centre of the current scandal lies the story that in 2014 a Cambridge academic developed an app called 'thisisyourdigitallife'. The app paid users a small amount of money to take a personality quiz, but to take that quiz you had to give the app access to your Facebook profile. Using your Facebook likes and any other data it had been granted access to, the app attempted to predict your personality on the widely used OCEAN scale.

Around 270,000 people took some version of this quiz. However, because of the way the app was built on Facebook's API, users who gave 'thisisyourdigitallife' access to their own accounts also effectively granted it access to the public profile information of all their Facebook friends. If a friend's likes, relationship status, location and other information were set to "viewable by anyone", that information was fair game to be harvested, even though the friend had never used the app or taken the quiz.

Given that the average user has around 300 Facebook friends, 270,000 'quizzers' quickly gave the app makers access to a dataset covering millions of people. Facebook subsequently confirmed, on 4 April, that 87 million people had been affected.
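A quick back-of-the-envelope calculation shows how a few hundred thousand quiz-takers turn into tens of millions of profiles. The Python sketch below simply multiplies the round figures above (about 270,000 quiz-takers, about 300 friends each). It assumes every quiz-taker had exactly the average number of friends and ignores overlap between friend lists, so it only illustrates the order of magnitude; it is not how the confirmed figure was calculated.

```python
# Back-of-the-envelope estimate of the app's reach, using the article's
# round figures. Simplifying assumptions: every quiz-taker had exactly the
# average number of friends, and no two quiz-takers shared a friend.
QUIZ_TAKERS = 270_000
AVG_FRIENDS = 300

def estimated_reach(quiz_takers: int, avg_friends: int) -> int:
    """Quiz-takers themselves plus all of their friends."""
    return quiz_takers + quiz_takers * avg_friends

reach = estimated_reach(QUIZ_TAKERS, AVG_FRIENDS)
print(f"{reach:,}")  # prints "81,270,000"
```

The result, roughly 81 million, is in the same range as the 87 million Facebook later confirmed; the difference is unsurprising given that both inputs are rounded averages.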

This was controversial even when the app was built in 2014. Facebook hadn't anticipated developers using the API in this way, and an update rolled out between the end of 2014 and 2015 removed the ability to get at friends' information. Where it gets very messy is that the academic allegedly violated Facebook's API terms of service by selling that dataset of 87 million people to a political firm called Cambridge Analytica.

When Facebook found out about the data breach, the researcher, Cambridge Analytica and a former employee all certified to Facebook that the data had been deleted. It seems it was not. Instead, and to muddy the waters still further, Cambridge Analytica (CA) matched that dataset with other available records to construct personality profiles of millions of American voters. Allegedly, this refined dataset was then combined with CA's 'expertise' to help Donald Trump win the 2016 presidential election. There may also have been ties to the UK's Leave.EU campaign, one of the groups that campaigned in favour of Brexit.

For Christians, ethical concerns are quick to surface: privacy violations, deceit and duplicity in business, and the possible manipulation of election results. Investigations are ongoing in this country and the USA, and hopefully the "unfruitful works of darkness" will in due time be brought into the light (Eph 5.11-13).

But, depressing as it may be to admit, the case of Cambridge Analytica ought to be the least of our worries.

Let's put it this way. Facebook has access to all the information Cambridge Analytica bought, and much, much more besides. This data breach affected 87 million people; Facebook holds accounts for some two billion. Facebook's data is also far deeper and richer, constantly updating in real time, and much more powerful than a one-off snapshot from a 2014 quiz.

Perhaps one of the most telling comments about this scandal came in Cambridge Analytica’s statement on 2 May that they were shutting down. They wrote:

Over the past several months, Cambridge Analytica has been the subject of numerous unfounded accusations and, despite the Company’s efforts to correct the record, has been vilified for activities that are not only legal, but also widely accepted as a standard component of online advertising in both the political and commercial arenas. (my emphasis)

Cambridge Analytica claim that what they did was not only legal, but standard practice in the world of online advertising. So while CA may grab the headlines, and the public may be tempted to think that its demise means justice has been served and the scandal is over, the reality is far more complex. Every day our personal information is being bought, bundled up, broken down, sold and resold. In this context, a dataset with 87 million entries is a drop in a vast data ocean. "Big Data" deals with millions, even billions, of records at a time; it is still very much with us, and the questions that surround it have not gone away.

I wish I had some answers. Inevitably, with any fast-moving and relatively new technology, questions come easily, and I'm very definitely not the first or only one to be asking them. It may take some time, and greater minds than mine, to find good answers. But that doesn't make the questions any less important. Perhaps in such circumstances the questions become even more vital: to be treasured like the first hints of sunrise on a cold, dark morning, pointing us to what we really need in the warmth of a summer's day.

What is real?

To accurately assess the behaviour of someone online, we have to make a couple of assumptions. First, we have to assume that they are real people. Second, we have to assume they are behaving in a genuine and consistent manner. Whether in politics, business, or even the church, if one side of a debate is collecting large amounts of data in an attempt to target their message, what’s to stop the other side shifting their efforts into a disruption of the opposition’s calculations by creating fake accounts and misleading information? One camp can fund messages from fake profiles, not only to create alternative narratives and mislead the public, but with the added bonus of making their opponents' dataset far less accurate and useful. It becomes an arms race in confusion and disinformation with the only winners being the companies taking money from both sides as they fund the (very profitable) advertising.
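The "added bonus" of polluting an opponent's data can be shown with some simple arithmetic. In the hypothetical Python sketch below, the function name and all figures are invented for illustration, and real targeting models are far more sophisticated. A campaign estimates support for its message from the accounts it can see; adding fake accounts that pose as opponents drags the estimate down, even though no real person has changed their mind.

```python
# Hypothetical illustration: fake accounts distorting a simple estimate.
# The function name and all figures are invented for this example.

def apparent_support(genuine_supporters: int, genuine_accounts: int,
                     fake_opponents: int = 0) -> float:
    """Fraction of visible accounts that appear supportive, when fake
    accounts posing as opponents are mixed in with genuine ones."""
    visible_accounts = genuine_accounts + fake_opponents
    return genuine_supporters / visible_accounts

# 6,000 of 10,000 genuine accounts support the message.
before = apparent_support(6_000, 10_000)
# 2,000 fake "opponent" profiles are added; no real opinions change.
after = apparent_support(6_000, 10_000, fake_opponents=2_000)

print(f"Apparent support before: {before:.0%}, after: {after:.0%}")
# prints "Apparent support before: 60%, after: 50%"
```

The fake profiles need not be numerous to matter: here a 20% increase in visible accounts shifts the apparent level of support by ten percentage points.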

For the public, it becomes very difficult to assess what is true and real. The 2016 US election had the dubious honour of giving the world a new phrase: "fake news". And while misinformation, spin, lies and deceit have been around for ever, our new context is a dastardly marriage of dataset algorithms, advertising systems and people prepared simply to make things up if it brings in easy cash. From Pope Francis endorsing Donald Trump to the supposed murder of FBI agents involved in the Hillary Clinton email investigation, recent years have brought far too many utterly fabricated stories, and far too many people have been fooled.

Edward Bernays, a pioneer of public relations, wrote in his book Propaganda in 1928:

We are governed, our minds are molded, our tastes formed, our ideas suggested, largely by men we have never heard of. ¹

A criminal Hillary Clinton, £350 million extra per week to spend on the NHS: all of this and more was presented to us as fact or possibility. In reality, these were the creations of modern-day political digi-gurus, some of whom were allegedly involved in the illegal collection of data, disinformation, bribery and entrapment. It's no wonder that people are growing cynical about the nature of truth.

The really worrying thing is that, for many people, there is no distinction between what they might read on Facebook or the BBC News website, read in the Daily Mail or (dare I say) hear from the pulpit. The assumption is becoming that everyone has a hidden agenda and everyone is in it for themselves. So why should I listen to what you have to say?

The myth of impartiality

Into such cynicism, the lure of impartial, verifiable data becomes tantalising. The modern world doesn't want to take things on faith, or trust my opinion just because I'm Prime Minister, or a Professor with years of experience in a subject, or a Vicar. The post-modern suspicion of institutions has only grown, morphing into a distrust of all experts. But because people ultimately need answers and truth they can trust, they look elsewhere, and there is a growing assumption that what you can trust in our modern world is scientific experiment, verifiable proof, facts... and somewhere into that mix comes data.

But the great secret when it comes to data is that nothing is impartial. Just as the secular mindset comes with a context and a point of view as much as a religious one does (despite its advocates' protestations to the contrary in the public square), so datasets are far from neutral.

Extremely large datasets are not simply bigger versions of small ones; their scale brings about a step change, and new practices are emerging to handle and analyse them. According to Mayer-Schönberger and Cukier, the capabilities of a "data scientist" must include "the skills of the statistician, software programmer, infographics designer, and storyteller"² (my emphasis). And as with any storyteller, it's fair to ask what assumptions or values are held by the person handling the data, doing the analysis, or spinning us a story.

Bias in the interpretation of large datasets can come from the tools being used, but also from the analysts themselves. In a previous era, the tobacco industry was notorious for funding academic research that reached amazingly positive conclusions about smoking. But most bias is far more subtle and hard to spot. Writers on data science are beginning to observe that bias arises not only from individual data scientists but also from their context and pre-existing values, 'the air they breathe'.

Not only do we need to keep challenging the myth of neutral, impartial positions when we debate from a Christian perspective in the public square; when it comes specifically to interpreting Big Data, the Christian Church ought to have plenty to offer on a very different level. Fundamentally, it's all a question of exegesis and hermeneutics, an area in which we have 2,000 years of wisdom that could be brought to bear.³

How to protect the vulnerable through education

While there's plenty of legitimate criticism to level at the tech companies, we have to acknowledge that most of the data these companies hold and use is given willingly by users in return for access to a good or service. It's therefore legitimate to ask whether users really understand the terms of this 'deal'. Whenever projects have set out to inform users, or scandals like Cambridge Analytica have hit the headlines, it has become increasingly clear that users are not happy with the bargain.

Terms of service full of complicated legal jargon are commonplace. We often accept terms and conditions without even reading them; I've done it myself countless times. The fact that hardly anyone reads such documents was neatly captured by a joke recently seen on a church noticeboard: "Adam and Eve were only the first people not to read apple's terms and conditions".

In a world of free-to-use social media, if you're not paying for it, you are not the customer; you are the product being sold. But with terms of service as they are, and ignorance sometimes reaching even senatorial levels, studies suggest that most people do not understand the trade-off or, at least, are unaware of its implications.

Christians affirm God's concern for the weakest and most vulnerable, and in this context the wider population's lack of understanding of the implications of our online behaviour makes the vast majority of us, well, vulnerable. As with any situation in which a lack of education or knowledge keeps a person or group of people subservient, enslaved or even oppressed, the Christian Church ought to be ready to speak out about what our collective lack of understanding is allowing tech companies to get away with, and perhaps to advocate for what we should all be doing to become better informed.

But there's also a more 'classic' definition of vulnerability.

In the USA in particular, Big Data has found widespread use in fields such as policing, assessing people's suitability for a loan or a job, and targeted advertising for everything from consumer goods to university courses.

Cathy O'Neil has pointed out that such practices have the net effect of reinforcing social inequality. If a person's credit score or postcode is deemed to make them unsuitable for a job, or is used to justify higher insurance premiums, the overall result is that they are denied opportunities open to others with 'better data'. A key assumption in using such datasets is that past behaviour accurately predicts the future. For the poor and vulnerable, that means, as O'Neil puts it, "the poor are expected to remain poor forever and are treated accordingly."⁴

Ten or fifteen years ago, churches quickly recognised the danger of a disenfranchised underclass with no access to digital facilities, and were at the forefront of providing internet cafés to help people gain access to the benefits of computers. Categorising and sorting people through the application of Big Data has the potential to take social inequity to an entirely new and deeply unwelcome place.

When will tech companies stop treating new media like a black art?

As a former new media professional, I find my sense of injustice reaching boiling point when I see churches and relatively cash-strapped church organisations being asked to pay fortunes for basic websites with limited functionality. Because it often doesn't understand the processes involved or know the right questions to ask, the Church (with a big C) has frequently paid thousands of pounds more than it needed to for little more than glorified digital brochures that should have cost a few hundred quid. New media is not that complicated, but tech companies seem to have a vested interest in veiling what they really do and keeping us out of their equivalent of the Holy of Holies.

It's just a relatively small example of a much bigger issue. Tech companies, Facebook included, are usually very quick to tout their acumen and their incredible digital solutions to problems you didn't know you had. When they want to fix something or make a problem go away, they are pretty good at doing so. But when they don't feel like acting, all we hear about is how difficult such things are to do with large datasets. Coding suddenly becomes some kind of deep magic.

Nevertheless, in the wake of Cambridge Analytica, and probably also of the current court proceedings involving Martin Lewis of moneysavingexpert.com, such excuses are increasingly being challenged. And under scrutiny, they look increasingly thin.

When the tech companies lift that veil of something akin to witchcraft and treat users as mature human beings capable of understanding what they do, meaningful dialogue, solutions and legislation that work for users and tech companies alike become far more likely.

Conclusions

Big Data isn't going away. On a day-by-day, if not hour-by-hour, basis, datasets in social media and many other tech-based aspects of modern life are growing at exponential rates. Nor is Big Data necessarily a problem. It has enabled many, many positive things, and it holds great possibilities for the future.

Many of the problems I’ve highlighted are complex and may not be very easy to address. I’m well aware that I can easily pose questions and not so easily find answers. But I think the Christian Church does have an important part to play, not just within the church but beyond, in addressing the questions, offering critiques and bringing wisdom. There could be enormous potential in dialogue between data scientists and theologians, and anything we can do to help enable and encourage that discussion ought to be welcome.

1. Bernays, E. (1928), Propaganda, London: Routledge.
2. Mayer-Schönberger, V. & Cukier, K. (2013), Big Data: A Revolution That Will Transform How We Live, Work and Think, London: John Murray, p. 125.
3. Fuller, M. (2017), Big Data, Ethics and Religion: New Questions from a New Science, Religions, 8(5), 88. Available at http://www.mdpi.com/2077-1444/8/5/88/pdf [accessed 05 May 2018].
4. O'Neil, C. (2016), Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, London: Allen Lane, p. 155.

David Green

David Green is the Vicar of West Malling and Offham in the Diocese of Rochester. Prior to ordination, he was the New Media Manager at Church House Publishing, overseeing its digital output between 1999 and 2006, when Common Worship and Crockford first went online and Visual Liturgy 4 was released. He has written on the influence of projection technology on worship and continues to keep an eye on new media and the church. You can find him on Twitter at @RevDavidGreen.