vol. 23 no. 2, June, 2018

Book Reviews

Kelleher, John D. and Tierney, Brendan. Data science. Cambridge, MA: MIT Press, 2018. xiv, 264 p. (The MIT Press Essential Knowledge Series). ISBN 978-0-262-53543-4. £11.95/$15.95.

Data science is a relatively new term: when one searches for its occurrence in the titles of papers in Web of Science, it appears to be a 21st-century formulation, taking off from about 2013. The start date and the volume of papers are rather uncertain, partly because the search process of Web of Science does not distinguish between a title like "Data science: an action plan for expanding the technical areas of the field of statistics" and one like "Reanalysis of the data - science at its best and always informative". As a result, the number of papers is over-estimated.
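The over-counting can be illustrated with a small sketch. It assumes, purely for illustration, that the title search ignores punctuation when matching a phrase; this is a guess at the behaviour, not a documented description of Web of Science. The function names loose_match and strict_match are hypothetical.

```python
import re

# The two titles quoted in the review.
TITLES = [
    "Data science: an action plan for expanding the technical areas of the field of statistics",
    "Reanalysis of the data - science at its best and always informative",
]

def loose_match(title: str) -> bool:
    """Punctuation-insensitive phrase match (ASSUMED search behaviour)."""
    # Keep only alphanumeric word tokens, so "data - science" becomes
    # the adjacent tokens "data", "science".
    words = re.findall(r"[a-z0-9]+", title.lower())
    return any(a == "data" and b == "science" for a, b in zip(words, words[1:]))

def strict_match(title: str) -> bool:
    """Require 'data' and 'science' to be separated by whitespace only."""
    return re.search(r"\bdata\s+science\b", title.lower()) is not None

for title in TITLES:
    print(loose_match(title), strict_match(title), title[:40])
```

Under this assumption the loose match accepts both titles, while the stricter pattern accepts only the first, which is the kind of difference that would inflate a simple count.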

Of course, the analysis of data is not a particularly modern phenomenon: as the authors of this text point out, writing appears to have been invented in about 3,200 BC specifically for record keeping, and the first national census was carried out in Egypt in 3,000 BC. Someone must have been analysing the data collected at these times, if only to ensure proper tax collection.

Methods of statistical analysis have also been developed from the 17th century onwards, but the driver for the emergence of data science appears to have been the massive increase in the amount of data now available for analysis—referred to as 'big data' (or even Big Data). At the same time, developments in data mining and machine learning have led to new ways of dealing with big data and discovering patterns within the data that would not be easily discovered by humans.

The authors of this excellent introductory text define data science as encompassing:

"a set of principles, problem definitions, algorithms, and processes for extracting non-obvious and useful patterns from large data sets... Data science... also takes up other challenges such as the capturing, cleaning, and transforming of unstructured social media and web data; the use of big data technologies to store and process big, unstructured data sets; and questions related to data ethics and regulation."

They then proceed to deal with all of these elements, with chapters focusing on the nature of data and data sets; machine learning; the technologies of data science, with particular attention to the open-source Hadoop data processing system; standard data science tasks, such as clustering to identify, for example, categories of customers; and privacy and ethics.

The text is accessibly written and could constitute the basis for an introductory course on data science, as well as being an excellent introduction for the lay person interested in current developments in computing.

Professor T.D. Wilson
May 2018

How to cite this review

Wilson, T.D. (2018). Review of: Kelleher, John D. and Tierney, Brendan. Data science. Cambridge, MA: MIT Press, 2018. Information Research, 23(2), review no. R630 [Retrieved from http://informationr.net/ir/reviews/revs630.html]

Information Research is published four times a year by the University of Borås, Allégatan 1, 501 90 Borås, Sweden.