The authors studied the impact of copyright on data mining by academic researchers. Proponents of strong copyright protection, such as many academic publishers, argue that additional income to rights holders will foster the supply of suitable data. By contrast, many universities, libraries and research organizations argue that any positive effect of strong copyright on the supply of data is dwarfed by the costs of clearing rights. Furthermore, it is often argued that much relevant data has been produced with public funding and should thus be widely available for further use.
Under current copyright law in most EU Member States, data mining of academic publications or databases is only lawful with the specific consent of all rights holders. This also holds where researchers have lawful access, say when their university has a subscription to the relevant publications and databases. In data mining projects drawing on several sources, researchers may thus need to complete separate negotiations with many rights holders to avoid legal risks. More permissive rules apply in most territories outside of the EU, including the USA. In 2013, the UK adopted a copyright exception, allowing for non-commercial data mining without prior authorization of rights holders. It is unclear whether other EU Member States should follow suit.
So far, little empirical evidence has informed the debate on copyright and data mining. The authors used the extensive database “Web of Science” to collect data on the number of research articles on data mining in the most reputable academic journals. They covered all articles from researchers in 42 large economies between 1992, when the first academic publication on this topic appeared, and 2014. The panel includes the 15 largest national economies within the EU. Altogether, 15,000 articles addressed data mining, and the annual number has been rising steeply.
The authors classified countries according to whether or not data mining is allowed without the express consent of rights holders. In many territories, legal arrangements are not straightforward, which the authors capture in two intermediate sub-categories. They then tested for significant differences between categories of countries in the share of data mining publications among all published papers. Among other factors, the paper controls for the rule of law and for countries’ total research output, and thus for the resources available to academic research and its productivity.
The main result is that researchers in EU Member States with strong copyright protection publish significantly fewer articles on data mining. The specific copyright system within the EU appears to inhibit data mining by academic researchers. In particular, researchers in major Asian economies apply data mining much more than researchers in large European economies such as France or Germany.
Furthermore, copyright law should have stronger effects where it is associated with effective enforcement. Among EU Member States with strong copyright law, the authors find that countries with weaker rule of law (e.g. Portugal, Greece and Spain) publish significantly more data mining articles as a proportion of their total research output than countries with stronger rule of law (e.g. Germany, the Netherlands and Sweden).
These results suggest that stronger copyright protection in the EU hampers data mining. Ideally, copyright would foster innovation and creativity. Instead, it seems that in major European economies a combination of strong legal provisions with a strong rule of law has the opposite effect.
Policy-makers should consider the adverse consequences of the current copyright system on the adoption of data mining. One alternative favoured by academic publishers is a clearing house of relevant copyrights to reduce the transaction costs. This option is as yet untested. Other options are exemptions or limitations to copyright for some types of research, similar to the situation in the USA or the UK.
The authors see the debate on data mining and academic research as an early skirmish in a broader battle around data mining. The digital shift generates an unprecedented abundance of data. Adequate analysis of this data will help many social and economic agents to become more productive. Who has the right to access and analyse this data is of great importance. Empirical evidence may also help develop an efficient policy framework in this new context.
Lucie Guibault is Associate Professor at the Institute for Information Law at the University of Amsterdam.
Joan-Josep Vallbé is Assistant Professor of Political Science at the University of Barcelona.
For further information, contact the study academic, Christian Handke (email: firstname.lastname@example.org), or the CREATe PR team (email: email@example.com).
New research presented at the EPIP 2015 conference will be shared on social media using the hashtag #epip2015