Big Data and Privacy: Some Historical Perspectives

Author: Andrew Prescott (University of Glasgow)

Anxieties about the quantities of information gathered by governments and other organisations go back hundreds of years. Contemporary chronicles expressed shock at the quantity of information gathered in 1086 by the commissioners of William the Conqueror, which was summarised in Domesday Book. Domesday Book is the oldest English public record, and ever since its compilation its data has been regarded as one of the most precious possessions of the English state. With industrialisation and the growth of population, governments became increasingly interested in collecting information about the rapidly changing state of the nation. The changing format and character of censuses from 1801 onwards reflect many of these anxieties. The ability of nineteenth‐century governments to analyse and deploy this data was limited by the need for clerical sorting, but the introduction of punch cards sorted by electro‐mechanical machines for the American census in 1890 not only allowed governments to make more effective use of census data but also raised new issues about privacy. As Jon Agar has described, the availability of analogue computing technologies led to arguments in the first half of the twentieth century that the British government should create a vast national index, but these proposals were rejected on the grounds that they threatened individual liberty.

In this context, how far are our current concerns about big data justifiable? Is our situation any different from that of the chroniclers who complained about William the Conqueror? I believe there are some fundamental changes which we need to address. The first is the ubiquity of data.

For governments from the eleventh to the twentieth century, data was something gathered with enormous clerical and administrative effort, which had to be carefully curated and safeguarded. Only large organisations such as governments or railroad companies had the resources to process this precious data. Privacy was therefore something which could be easily safeguarded by wider constitutional and legislative frameworks. Now that we have reached the point, in the past few years, where data is everywhere, this framework of trust no longer applies. As a result, the types of organisations deploying data have changed. In particular, it is noticeable that the driving forces behind the development of big data methods have frequently been commercial and retail organisations: not only Google and Amazon, but also large insurance, financial and healthcare corporations. This contrasts with earlier developments, both analogue and digital, in which governments were prominent.

The Oxford English Dictionary draws a distinction between the term big data as applied to the size of datasets and big data as referring to particular computational methods, most notably predictive analytics. Predictive analytics poses very powerful social and cultural challenges, especially as more and more personal data, such as whole genome sequences, becomes cheaper and more widely available. How far can your body be covered by existing concepts of privacy? And is the likely future path of your health, career and life a matter of purely personal concern? While the rise of predictive analytics represents the most powerful intellectual challenge of big data, it is nevertheless worth comparing the potential offered by predictive analytics with discussions in the 1920s and 1930s about the linking of card indexes to create large national indexes. It appears that political culture at that time was more robustly in favour of defending individual liberty, whereas this is not now so evident. Finally, it is worth noting that the most important large data sets (censuses, tax records) have generally been about people, but big data will increasingly be about things. For example, machine tools frequently have sensors attached to them which enable the manufacturer to monitor the state of the tools remotely. This might encourage manufacturers to monitor clients' use of their products in ways that could have commercial implications. The monitoring of medical implants will raise even more complex issues. This is an area for which there is very little historical precedent, as is illustrated by a notorious case in which the US Supreme Court debated how the US Constitution would view GPS monitoring devices. Our construct of privacy is one that focuses on the human element, but perhaps we need to bring technical thinking more strongly into the discussion.

