BOOK AND SOFTWARE REVIEWS


Shroff, Gautam. The intelligent Web. Search, smart algorithms and big data. Oxford: Oxford University Press, 2013. xxiii, 295 p. ISBN 978-0-19-964671-5. £20.00/$29.95


The title is perhaps a little optimistic, since so little is still known about human intelligence, but this is an excellent introduction to those aspects of artificial intelligence (within the weak AI paradigm) that are having an effect on the delivery of Web services and that may have even bigger effects in the future. In the acknowledgements the author notes his debt to his mother, who read the chapters as they were produced and thereby enabled him to ensure that the book is accessible to someone who knows nothing about computer science. And this is true: the book can be read by any reasonably intelligent person, who will gain an understanding of what is going on under the weak AI umbrella. Indeed, a welcome feature is just how well the author writes: there is the occasional use of the peculiarly Indian-English 'as per', which is always a giveaway to the origins of the writer and is rarely used in standard English, but otherwise the standard of writing is superior to that found in many computer science texts.

Each chapter in the book has a single-word title: Look, Listen, Learn, Connect, Predict, and Correct. We might regard these as the keywords that describe the content of the chapter, but if we look closely we can see that they also describe the sequence in which human intelligence may acquire understanding. We observe and listen to the world around us, we make connections and predict outcomes and we make corrections to our behaviour in the light of how accurate our predictions have been and the direction in which they have been wrong. For Web services to function as if they were intelligent (in some way analogous to human intelligence), it is clear that the same set of functions must be simulated.

Chapter 1, Look, is devoted mainly to search engines and, particularly, to Google and the extent to which it now gathers data about those who use the search engine (i.e., 'looking' at us). Google's search engine is merely the front end to its advertising business and the more it knows about us, the more easily it can target specific advertisements at us when we search. It can recognize, for example, by fairly simplistic means, whether we are searching to buy something or simply searching to discover information about something; so, if I enter a product type and the word 'price' in the search box, I am likely to find ads on the search output giving me links to firms offering the product and indicating a range of prices. If I add 'UK' to the search strategy, I am likely to find that links to sites in the USA have disappeared. If I use a term like 'CMC camera' and omit the word 'price', I'm likely to be directed to informative sites, such as Wikipedia, and the ads may be more generic in character. Using AI techniques to gather and analyse more and more data about us and our searches must occupy a lot of time in Google's offices!
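
To give a flavour of how such 'fairly simplistic means' might work, here is a toy sketch in Python. It is entirely my own illustration, not Google's code, and the cue words and example queries are invented:

```python
# Toy illustration of keyword-based intent detection; the cue words are invented.
COMMERCIAL_CUES = {'price', 'buy', 'cheap', 'deal'}

def classify_query(query):
    """Label a search query as 'commercial' or 'informational' from simple cues."""
    words = set(query.lower().split())
    intent = 'commercial' if words & COMMERCIAL_CUES else 'informational'
    region = 'UK' if 'uk' in words else 'any'
    return intent, region

print(classify_query('CMC camera price UK'))   # ('commercial', 'UK')
print(classify_query('CMC camera'))            # ('informational', 'any')
```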

In Chapter 2, Listen, the keyword is used rather loosely, since 'listening' does not mean attending to voice messages but, rather, the closer analysis of activity on the Web. We get an introduction to the notion of the semantic Web and the problems of determining the meaning of language, and the techniques of natural language processing are introduced, since they underpin Google's AdWords method of auctioning ads to companies. The more effectively Google can interpret our searches, the more easily it can convince companies that money spent on advertising through Google will not be wasted. Among other things in this chapter, the author tells us about that auction method and the bidding process, through which a successful bidder does not pay the price they bid but a small increment over the next highest bidder. This is a process almost guaranteed to encourage high bidding, because the highest bidder knows that s/he will not have to pay what was bid.
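
The pricing rule the author describes amounts to a second-price auction. A minimal sketch in Python, with invented bid values and an invented one-penny increment (an illustration of the principle only, not Google's actual AdWords implementation), might look like this:

```python
# Toy second-price auction: the winner pays just above the runner-up's bid,
# never their own bid. Bid values and the increment are invented for illustration.

def run_auction(bids, increment=0.01):
    """bids: dict of bidder -> bid amount. Returns (winner, price paid)."""
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, top_bid = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    price = min(top_bid, round(runner_up + increment, 2))
    return winner, price

print(run_auction({'acme': 2.50, 'bidco': 1.80, 'cheapo': 0.90}))
# -> ('acme', 1.81): the winner's own bid of 2.50 is never charged
```

Because the price is set by the next bidder down rather than by one's own bid, bidding high carries little risk, which is the incentive effect the author points to.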

Learning, of course, is key in the human animal's attainment of intelligence and Chapter 3 is concerned mainly with what is known about human learning. Presumably we have to know a great deal more about how humans learn before we can even begin to create a machine that can learn. There are some technologies, such as neural nets, for example, that mimic how the human brain is thought to function and recently a project was funded to build a machine that replicates, as far as possible, the structure of neurones, etc., in the human brain. I read recently that the fibres that connect the elements of the brain, if stretched out in one continuous thread, would go four times round the Earth, so I imagine the designers will be able to do little more than replicate a fraction of the brain's capacity. It also seems that different parts of the brain are associated with different functions, such as vision, hearing, artistic creativity, etc., and figuring out how to simulate, say, the rational processing of text, perceived with the visual cortex, may be quite a challenge.

The brain, as noted earlier, is all connections and Chapter 4, Connect, is largely about attempts to simulate those connections through the logical apparatus used in the digital computer. I was a little surprised that the author gives no attention to pattern recognition in this chapter, since others have noted that this is one of the critical differences between machine activity and human brain activity. In chess, for example, it is believed that chess masters function mainly on their ability to recognize and interpret the patterns they see on the chess board, whereas the digital chess computer, Deep Blue, which beat the human chess champion Garry Kasparov in 1997, did so because of its enormous data processing power, reviewing, within the time allowed between moves, millions of possible outcomes of a move and its consequences. Kasparov, on the other hand, would be processing much less data and, in all probability, would see emerging patterns on the board and would be able to select a move that gave him the most advantageous pattern as a result.
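
The brute-force style of play can be made concrete with a depth-limited minimax sketch. This is only an illustrative toy, assuming placeholder `moves` and `evaluate` functions, and is not a description of Deep Blue's actual program, which relied on specialised hardware and far more elaborate evaluation:

```python
# Minimal depth-limited minimax sketch; `moves(state)` and `evaluate(state)` are
# placeholders for a real game's move generator and position-scoring function.

def minimax(state, depth, maximizing, moves, evaluate):
    successors = moves(state)
    if depth == 0 or not successors:
        return evaluate(state)          # score the position heuristically
    scores = [minimax(s, depth - 1, not maximizing, moves, evaluate)
              for s in successors]
    return max(scores) if maximizing else min(scores)

# Trivial demonstration: a 'game' whose states are numbers, each move adds 1 or 2,
# and the evaluation is simply the number itself (purely to show the mechanics).
print(minimax(0, 3, True, lambda s: [s + 1, s + 2], lambda s: s))
```

The point of the contrast is that the machine's strength comes from scoring vast numbers of such positions, whereas the human master prunes the tree almost entirely by recognising familiar patterns.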

The author presents prediction (Chapter 5) as a fundamental human capability. We can see, for example, that the ability to predict with incredible degrees of accuracy is central to the success of tennis players, and indeed of participants in virtually every other ball game. In tennis, when player A strikes the ball, player B, working at instantaneous mental processing speeds, is able to predict to within a fine tolerance exactly where the ball is going to hit the court surface and at what angle it will rise from the surface. Even a moderate player will have the basic ability to make this kind of prediction. The ability appears to be innate and capable of development in most people; but computers do not function as people do, no talents are innate, and their predictions must be based on the statistical analysis of the data presented to them. The notion of 'big data', which has emerged in recent years, has drawn attention to the need for rapid analysis, to turn the masses of data into information that can be assimilated by the human mind. The revelations about the activities of the NSA in the USA and GCHQ in the UK, which have devised methods for stripping the metadata from billions of messages on the Web and analysing them to detect terrorist activity, are a case in point. The author makes the point that the computational capacity of the Web, as a whole, is now sufficient to enable prediction; we must wait and see whether such capability is put to any more socially useful activity than predicting for companies what we are likely to buy.

In Chapter 6, correction is introduced by reference to Google's self-driving car and the use of feedback in managing the control systems of the vehicle. From here we move on to flocking behaviour in birds, the eight queens chessboard puzzle and optimization. Little is actually said in this chapter on the application of these ideas to the Web, but I imagine that Google's self-driving car has access to Google Maps online and that optimization algorithms play a part in route planning.
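
Of the examples mentioned, the eight queens puzzle is the easiest to make concrete. A standard backtracking solution, sketched here in Python (my own illustration; the book presents no code), shows the kind of systematic try-and-correct behaviour the chapter is concerned with:

```python
# Classic backtracking solver for the N-queens puzzle (N=8 gives the eight queens).
def solve_queens(n=8, placed=()):
    """Return one placement as a tuple of column positions, one per row, or None."""
    row = len(placed)
    if row == n:
        return placed
    for col in range(n):
        # A new queen is safe if no earlier queen shares its column or a diagonal.
        if all(col != c and abs(col - c) != row - r for r, c in enumerate(placed)):
            solution = solve_queens(n, placed + (col,))
            if solution is not None:
                return solution
    return None

print(solve_queens())  # e.g. (0, 4, 7, 5, 2, 6, 1, 3)
```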

Overall, this is an excellent introduction to a variety of disciplines that are exploring the capabilities of the Web and the vast amounts of data and information now available. It will be a useful text for any beginning course on artificial intelligence, Web science, Internet studies or information science. It will also be of interest to any lay person seeking to become acquainted with the further reaches of the Web.

Professor Tom Wilson
Editor-in-Chief
February, 2014