Book review: Big data is not a monolith

Sugimoto, Cassidy R., Ekbia, Hamid R. & Mattioli, Michael, (Eds.). Big data is not a monolith. Cambridge, MA: MIT Press, 2016. xxiv, 284 p. ISBN 978-0-262-52948-8. Pback. £24.95/$30.00; Hback. £49.95/$60.00

I have grown rather suspicious of collections of chapters in recent years: very often they appear to have been assembled with no clear guiding principle, and the variation in quality from chapter to chapter suggests little in the way of editorial control. I'm very happy to say that this is not true of this particular collection. It has obviously been guided by a clear understanding of what the end result should be, and the papers complement each other very effectively.

This is all the more commendable when one realises what a rich diversity of backgrounds the editors and the contributors bring to the venture. Sugimoto is a bibliometrician teaching and researching in Indiana University's School of Informatics and Computing, Ekbia is a professor of informatics in the same School, and Mattioli is a professor of law at Indiana. The contributors show even more diversity: there are, for example, computer scientists, lawyers, information systems specialists, media scholars, economists, and data scientists. Most are academics, but not all: for example, Mike Bailey is a research economist at Facebook and Kent Anderson is a publisher. Nor are all of professorial standing: there is a clutch of Ph.D. students among the collaborators, which is a good way of ensuring that the very latest research is being drawn upon. In other words, this is a carefully thought-out and well-executed work.

The fourteen chapters (supplemented by an introduction and conclusion by the editors) are divided into four sections: Big data and individuals, Big data and society, Big data and science, and Big data and organizations. Reviewing all fourteen chapters is not possible without writing a very long review, so, as usual, I shall dip into the collection here and there, commenting on those chapters that caught my attention.

The scope, as may be indicated by the diversity of contributors, is wide, offering, as the Series Editor says in her Introduction, 'a wide lens across a technical, economic, social, legal and policy field that is still becoming shaped, or shaping itself' (p. vii).

Fred Cate's chapter, the first in the first section, is concerned with issues of consent and data protection in the age of 'big data', defined by him as 'data sets that are not only large, but increasingly complex and granular as well...' (p. 3). As an example, he cites the Acxiom Corporation, a marketing services company, which 'engages in fifty trillion data transactions a year, almost none of which involve collecting data directly from individuals' (p. 3). However, even though data may not be collected directly from individuals, it very often includes data relating to individuals, and the problem for regulatory agencies is how to ensure that people know that data is being collected, and that the data will be protected from misuse, either by those collecting the data or by others obtaining access. It is issues of this kind that have driven concerns in the UK about the data-collection activities of the National Health Service, or, more problematically, of agencies commissioned by the Health Service to undertake data collection tasks.

The issue of consent is even more problematical, as the author notes. The sheer volume of activity, and the diversity and extra-territorial nature of many of the collectors, are such as to prevent any attempt at seeking the consent of people. The way things are developing, that would soon require corporations, market research organizations and governments to seek the consent of most of the world's population. Cate concludes, therefore, that protection should rely 'less on individual consent, and more on placing responsibility for data stewardship and liability for reasonably foreseeable harm on data users' (p. 19). Perhaps I'm too cynical, but I can't be very sanguine about that.

In the section on Big data and society, Paul Ohm and Scott Peppet consider the consequences of any piece of data about an individual leading to all other data regarding that individual, in What happens if everything reveals everything? Clearly, the consequences of this for personal privacy would be profound. If, for example, retailers such as Amazon not only had access to the data in their own systems on your buying choices, but also had access to your personal data and health records, they could target you with offers on, for example, disability aids or vitamin supplements or whatever. The consequences for privacy and data protection legislation would be equally far-reaching. The authors suggest that, realistically, this situation does not hold for the present. However, as data proliferates and as data analytics develops, it is increasingly likely to become true. To ensure that the consequences of 'big data' are beneficent is going to be a difficult task for law-makers and regulators.

In Part 3, Big data and science, the chapters cover the role of the state in relation to genomic data; ensuring the trustworthiness of data; anticipating the unintended consequences of big data (which includes ignoring the usefulness of small data-sets); and The data gold rush in higher education. Of course, it is the last of these that I choose to examine in more detail.

The title of the chapter does not seem to be entirely appropriate, since it is not about the generation and use of big data in higher education. Rather, it is concerned with the potential shortfall in the education of data specialists and with 'the emergent landscape of education in this field as institutions around the world race to get ahead of the forecasted shortfall'. In spite of the mention of 'institutions around the world', the chapter focuses on US institutions, but even so, any educator in this area will find the analysis of educational opportunities and alternatives of interest.

Part 4, Big data and organizations, covers a range of issues, from corporate responsibility for data, through the role of big data in decision making, and big data in medicine, to its use in artificial intelligence. The author of the last of these, Ryan Abbott, explores a hypothetical case of drug development, but users of Google Translate may have already experienced some of the results of combining big data and AI. It was recently announced that major improvements had been made in Google Translate by dropping the previous development strategy and moving to a new, AI-based system, which, of course, relies upon the enormous amount of data Google Translate now has available.

While this is an interesting collection of chapters, they all suffer from the same problem: their authors rarely step outside the USA in their exploration of issues and problems. This is unfortunate, since many big data sets are international in scope, and the privacy and data protection laws of countries other than the USA are surely of interest. The fundamental and theoretical issues, of course, are not bounded by territory, but a more global illustration of them would have improved the value of the collection enormously.

Professor T.D. Wilson
Editor-in-Chief
February, 2017

How to cite this review

Wilson, T.D. (2017). Review of: Sugimoto, Cassidy R., Ekbia, Hamid R. & Mattioli, Michael, (Eds.). Big data is not a monolith. Cambridge, MA: MIT Press, 2016. Information Research, 22(1), review no. R595 [Retrieved from http://informationr.net/ir/reviews/revs595.html]

Information Research is published four times a year by the University of Borås, Allégatan 1, 501 90 Borås, Sweden.
