Loukissas, Yanni Alexander. All data are local. Thinking critically in a data-driven society. Cambridge, MA: MIT Press, 2019. xix, 245 p. ISBN 978-0-262-03966-6. $30.00/£24.00.
Yanni Alexander Loukissas brings an interesting set of qualifications and experience to the writing of this book: he has a bachelor's degree in architecture from Cornell and Master's and Doctoral degrees in design and computation from MIT, and he records in the Preface that the ideas behind this book emerged while he was working as a designer of the way-finding systems for the American Wing of the Metropolitan Museum of Art in New York. One of the most telling points that he recounts is that the museum guards (who were not regarded as part of the information structure of the Museum by the curators) proved to be the most helpful in understanding the behaviour of visitors to the Museum:
the guards knew better than anyone what visitors do: how they move, where they go, and even why they get lost. The guards proved to be among the best sources of insight about the potential contexts of data use within the American Wing galleries. (p. xiv)
The first question that occurred to me, and I imagine it would occur to many, is: What do you mean, all data are local? The local is usually taken to mean the immediately relevant space for our actions. I live at a particular local address and the locality is quite tightly bounded. On the other hand, I also live in a particular city, which is a wider locality, but, nevertheless, local to me. The author, however, uses local in a different sense, that is, 'rooted in practices and policies related to their time and place' (p. 19). Even so, the discussion of local or locality is rather diffuse, since the local can have global import and impact. Local news (for example, of school massacres in a US state) may resonate in different ways in other localities, on the one hand creating impressions of daily life in the USA, and on the other motivating political action on gun control in a completely different society.
Following the introductory chapter, the main thrust of the book is conveyed through four case studies. The first is of plant data in the Arnold Arboretum, which is maintained in Boston by Harvard University. The author uses accession record data and the associated metadata to explore the concept of place, employing novel visualizations of the accession data to show the many different associations of place in the Arboretum's collection. The second case study is of the Digital Public Library of America (DPLA). The DPLA is constructed from contributions from across the USA and Loukissas uses visualization again to show the various date formats that needed to be normalised for the DPLA and also the interrelations among the data fields in the DPLA records. The third case study explores the relationship between algorithms and data, through a word-frequency analysis of the words immediately before the word 'election' in news items from CNN, the Wall Street Journal, and the Breitbart News Network. Thus, the first five words by frequency are a, an, different, general, and important. Some of the limitations are discussed, as well as the interesting facts that emerge from a detailed analysis of the data; for example, the transcription problem can generate words that are not intended, thus 'do very well, come election day' becomes 'do very welcome election day', clearly nonsensical but resulting in 'welcome' appearing in the word list. As the author says, data and algorithm 'function symbiotically in contingent local conditions that are both materially and historically grounded' (p. 121). The final case study uses the data from the Zillow real-estate Website and the author's intention is to discuss the role of the interface to data; however, the case is just as interesting from the perspective of the relationship between algorithm and data.
A feature of the site is the Zestimate, that is, an estimate of the value of the property, arrived at by algorithmic calculation from publicly available data. The question, of course, is how these estimates affect negotiations: do they favour the buyer or the seller? In common with most such situations, my guess is that they favour the seller, since the site derives its income from the real-estate agencies that subscribe to its services.
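The word-frequency analysis described in the third case study can be illustrated with a minimal sketch. This is not the author's code: the function name words_before, the tokenising rule, and the sample snippets are all assumptions made here for illustration, and the second snippet simply reproduces the transcription error quoted in the review.

```python
import re
from collections import Counter


def words_before(target, texts):
    """Count the word immediately preceding each occurrence of `target`.

    Hypothetical helper for illustration only; tokenisation is a simple
    lowercase split on runs of letters and apostrophes.
    """
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        # Pair every token with its successor and keep the predecessors of `target`.
        for prev, cur in zip(tokens, tokens[1:]):
            if cur == target:
                counts[prev] += 1
    return counts


# Invented sample snippets, not data from the book.
snippets = [
    "It was a general election like no other.",
    # Transcription error for "do very well, come election day":
    "They expect to do very welcome election day.",
    "An important election is coming; the general election follows.",
]

counts = words_before("election", snippets)
for word, n in counts.most_common():
    print(word, n)
```

In this toy input, 'welcome' enters the word list only because of the transcription error, which is the kind of locally produced artefact that the case study draws attention to.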
The general principle derived from the cases is stated in Chapter six: 'all our amassed records are no more than indexes to local knowledge', and from that principle six implications are derived, each associated with a lesson, for example:
Interfaces recontextualize data. Create interfaces that cause friction.
These lessons are then further illustrated in the chapter.
The final chapter is very brief, consisting of a little more than eight pages devoted to the idea of creating guides to open data, since, as the author notes:
what good are open data if they cannot be understood by outside audiences? (p. 192)
This is a very interesting and, I think, an important book, which everyone involved in data science should read. It is not a textbook, but certainly could be an item of additional reading in any data science course. I was a little surprised that there was no reference to Jim Dolby's The Language of Data project from the 1980s (Dolby, Clark and Rogers, 1986), which, among other things, proposed the use of faceted classification to describe the data settings with which Loukissas is concerned in this text. Sadly, Dolby died before much of the work of that project could be disseminated and this may account for the difficulty in finding relevant information on the project.
The book is beautifully produced, with many appropriate colour illustrations but, unfortunately, it uses a sans-serif font on glossy paper, which is almost tantamount to posting a "Don't read me" notice, since it generates immediate eye-strain. It's rather odd that an otherwise excellent publisher like MIT Press should choose a sans-serif font for any of its books: it seems that the 'scientific = sans-serif' trope can permeate even the best of environments. The use of semi-gloss paper also makes the book very heavy and somewhat uncomfortable to read.
Dolby, J.L., Clark, N. & Rogers, W.H. (1986). The language of data: a general theory of data. In Proceedings of the 18th Symposium on the Interface Between Statistics and Computing (pp. 96-193). Alexandria, VA: American Statistical Association.
How to cite this review
Wilson, T.D. (2019). Review of: Loukissas, Yanni Alexander. All data are local. Thinking critically in a data-driven society. Cambridge, MA: MIT Press, 2019. Information Research, 24(2), review no. R663 [Retrieved from http://www.informationr.net/ir/reviews/revs663.html]
Information Research is published four times a year by the University of Borås, Allégatan 1, 501 90 Borås, Sweden.