vol. 22 no. 4, December, 2017

Book Reviews

Holmes, Dawn E. Big data. A very short introduction. Oxford: Oxford University Press, 2017. xvi, [4], 125, [7] p. ISBN 978-0-19-877957-5. £7.99 $11.95.

"Big data" is one of today's hot topics: put the term into Google and it comes back with seven and a half million hits, and, in Google Scholar, 233,000. Add business and Scholar returns 105,000; add health and it is 113,000, and with government, it is 66,700. Keep on additing application areas and you soon get a view of where the interest lies. But what is it? Well, there seems to be some confusion over definitions. The MIT Technology Review had an article on the subject, drawing attention to work by Ward and Barker (University of St. Andrews) which attempted to find a definiton by reviewing those found in the literature and proposed by some of the big technology companies. They came up with:

Big data is a term describing the storage and analysis of large and or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.

The author of this 'Short introduction' has a lengthier definition, or perhaps explanation would be a better term to apply to this quotation:

'Big data' is now used to refer not just to the total amount of data generated and stored electronically, but also to specific data sets that are large in both size and complexity, with which new algorithmic techniques are required in order to extract useful information from them.

This comes after an introduction to the subject involving the use of simple data analysis techniques by Athenian forces during the war with Sparta, to escape from the city of Platea. Holmes points out that archaeological finds have pushed back the date for the emergence of recorded counting of things to approximately 35,000 BC, setting the analysis of digital big data firmly in a historical context.

Following the introduction, there are chapters on storing and analysing 'big data', and then four chapters on areas in which 'big data' may be of significance. First, applications in healthcare are discussed, noting, for example, that Google's much publicised flu prediction algorithm over-predicted the number of flu cases in the US in 2011/12 and 2012/13 and that, similarly, the World Health Organization's attempt to predict the number of Ebola virus cases in Africa also failed. In part, these failures can be attributed to the lack of relevant historical data against which to test algorithms and, in part, to the problem of gathering or analysing the data in appropriate ways. More success has been achieved with the Human Genome Project, and with the use of IBM's Watson in relation to medical research and health care. The author notes that, in the UK, the National Health Service aims to make patient data accessible by smartphone by 2018; given the Service's failure to deliver on previous IT projects, one may be inclined to take this suggestion less than seriously.

Maintaining an historical perspective, Holmes takes the use of big data in business back to the use of the early computers by J. Lyons and Co., whose Corner House cafés disappeared, sadly, from the scene many years ago. The company was a pioneer, not only in using computers for business purposes, but in building and selling what were the first 'electronic office' computers. (That side of the business was sold to English Electric, later taken over by ICL, Ltd., and, ultimately, by Fujitsu.) Today, of course, companies like Google, Amazon, Netflix, Facebook and Twitter generate terabytes of data and mine it for all kinds of purposes, from book recommendations to targeted advertising. The tools for the analysis of the 'big data' under their control will become more and more sophisticated as machine learning and AI develop new techniques.

The third of these four chapters deals with security issues and the Snowden and Wikileaks cases. This is, perhaps, the least successful of the chapters in the book, since very little is said about the techniques for avoiding security lapses, apart from the use of encryption.

Finally, the author looks at 'big data' and society, considering concepts such as smart cars, smart homes and smart cities, and concluding that: 'Big data is power. Its potential for good is enormous. How we prevent its abuse is up to us.'

This is a very useful introduction to the subject for the lay person: the specialist will probably not learn anything new, but the book is not designed for the specialist. It would serve very well as an introductory text for general IT awareness courses. My one criticism is of the series generally: the font-size is too small. I understand why, given the small size of the book (11 cm x 17.4 cm), but a short introduction does not need to be a small book and the ordinary paperback size would have allowed a larger font-size. The designer has done his or her best, providing considerable leading between the lines, but, unless you have 20/20 vision, reading is tiring.

Professor T.D. Wilson
Editor in Chief
December, 2017

How to cite this review

Wilson, T.D. (2017). Review of: Big data. A very short introduction. Information Research, 22(4), review no. R614 [Retrieved from http://informationr.net/ir/reviews/revs614.html]

Information Research is published four times a year by the University of Borås, Allégatan 1, 501 90 Borås, Sweden.