BOOK AND SOFTWARE REVIEWS


Russell, Matthew A. Mining the social Web. 2nd ed. Sebastopol, CA: O'Reilly, 2014. xxiv, 421 p. ISBN 978-1-449-36761-9. £34.50/$44.99


The 'social Web' consists of all the social media sites: the well-known ones, such as Twitter and Facebook; the less familiar, such as LinkedIn (which is used mainly by professional people); and those familiar to teenagers with particular interests, such as DeviantArt and ReverbNation. For this book, however, the social Web is restricted to the better-known sources, particularly Facebook, Twitter, LinkedIn and Google+. The aim of the book is to teach you how to develop Python programs to derive information from such sites. The author suggests that the kinds of questions we might like to ask include:

Who knows whom, and which people are common to their social networks?
How frequently are people communicating with one another?
Which social network connections generate the most value for a particular niche?
How does geography affect your social connections in an online world?
Who are the most influential/popular people in a social network?
What are people chatting about (and is it valuable)?
What are people interested in based upon the human language that they use in a digital world?
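To give a flavour of the first of these questions, here is a minimal Python sketch (not taken from the book) that finds the people common to two users' networks by set intersection. The function name and the friend lists are invented for illustration; in practice such lists would first have to be retrieved through a site's API.

```python
def common_connections(network_a, network_b):
    """Return the people who appear in both networks, sorted for stable output."""
    return sorted(set(network_a) & set(network_b))

# Hypothetical, hand-made friend lists standing in for data fetched from an API.
alice_friends = ["bob", "carol", "dave", "erin"]
frank_friends = ["carol", "erin", "grace"]

print(common_connections(alice_friends, frank_friends))  # ['carol', 'erin']
```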

The book is in three parts: Part I. A guided tour of the social Web; Part II. Twitter cookbook; and Part III. Appendices. Each chapter in Part I is devoted to a specific social network, and each has a similar structure. Thus Chapter 1, on mining Twitter, first gives an overview of the network and considers the reasons for its popularity, then provides guidance on how to access Twitter's API (assuming that you have a Twitter account) so that you can issue requests to the API and get back answers to the kinds of questions above. The chapter provides the appropriate Python programs, and a virtual machine is supplied through which you can use IPython Notebook to develop and run them. Following the API guidance, Python programs are presented for analysing the tweets you may have extracted through a request to Twitter's API. The chapter ends with some final remarks, recommended exercises and a list of online resources. Part II of the book, the 'Twitter cookbook', 'features more than two dozen bite-sized recipes for mining Twitter data, including the discovery of trending topics'.
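The kind of tweet analysis described for Chapter 1 can be illustrated, in much simplified form, by counting the most frequent hashtags in a handful of tweets. The sketch below is not the book's code: it assumes the tweet texts have already been retrieved (here they are invented strings), the function `top_hashtags` is a hypothetical helper, and it uses only the Python standard library rather than any Twitter client.

```python
import re
from collections import Counter

# Invented sample tweets standing in for results retrieved via Twitter's API.
tweets = [
    "Reading about #datamining and #python tonight",
    "New notebook for #python users",
    "Trending now: #DataMining #python",
]

# A hashtag here is '#' followed by word characters (a simplifying assumption).
HASHTAG = re.compile(r"#(\w+)")

def top_hashtags(texts, n=3):
    """Count hashtags case-insensitively across texts; return the n most common."""
    counts = Counter(tag.lower() for text in texts for tag in HASHTAG.findall(text))
    return counts.most_common(n)

print(top_hashtags(tweets))  # [('python', 3), ('datamining', 2)]
```

Lower-casing the tags means that '#DataMining' and '#datamining' are counted as the same topic, which is usually what a frequency analysis of this kind wants.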

The remaining chapters in Part I deal in a similar fashion with Facebook; LinkedIn; mining Web pages in general, using natural language processing; mining mailboxes; mining the social coding site GitHub; and mining the semantically marked-up Web. The last of these is perhaps the shortest of the chapters, which is understandable given the embryonic nature of the semantic Web.

The book is heavily populated with links to various sources and, as the author remarks in the Preface, is ideally suited to the e-book format, in which it is also available. For the print reader, however, the code for all of the Python routines can be found at the GitHub site. The virtual machine referred to earlier is also available at GitHub.

This is not a volume for the faint-hearted: you will need a high degree of enthusiasm for the subject and a willingness to spend some time setting up your own system to run the virtual machine and to implement the Python notebooks. Nevertheless, the book would make a good class text for a course on data mining, or on the development of analytical techniques using social network data. One curious omission, however, is that the word 'ethics' does not appear in the index, and the author appears not to be concerned about the ethical issues that might arise from data mining in social networks.

Professor Tom Wilson
Editor-in-Chief
August, 2012