Antoniou, Grigoris and Van Harmelen, Frank. A semantic web primer. Second edition. Cambridge, MA: The MIT Press, 2008. xxi, 264 pp. ISBN: 978-0-262-01242-3. $42.00
This is the second edition of the very popular primer that introduces the semantic Web. Explaining what the semantic Web is involves not only contrasting it with the familiar 'everyday' Web, but also sorting out the various visions of the semantic Web suggested over the last decade. Explaining how it works involves relating a thicket of angle-bracket technology to semantics and logic processing. This book is an excellent introduction to both of these explanations.
Much of the first chapter, as well as the last chapter, is devoted to the important task of explaining what the semantic Web may be. Chapter one sketches 'The Semantic Web Vision', while chapter eight exposes four popular fallacies about the semantic Web. Chapter one distinguishes the semantic Web from the everyday Web, sketches the idea of mechanically harvesting meaningful information from the Web, explains the important role of metadata, suggests the use of logic in the acquisition of knowledge, and provides a pyramidal conceptual framework with Uniform Resource Identifiers at the bottom and 'Trust' in the knowledge harvested from the semantic Web at the top. The last chapter dismisses fallacies such as the fear that a single, top-down meaning will be imposed on the semantic Web, and the supposed necessity of retrofitting all extant Web pages with metadata. It suggests that repurposing many existing ontologies and employing natural language processing and machine learning techniques can serve as levers for creating a semantic Web.
All of this may or may not happen. The reason that so much space must be devoted to defining the semantic Web is that it remains a vision still being born. Thus, while this book is a primer on the semantic Web, it is also a snapshot of the continuing conceptual and technical development of the semantic Web.
The remaining six chapters are devoted to a description of how the semantic Web may work. They are further subdivided into three chapters introducing the principal semantic Web technologies, followed by three chapters introducing logic, real-world applications and ontology engineering. As the heart of the book, these six chapters represent a conventional vision of the semantic Web: searchers would be aided by ontologies constructed with OWL (Web Ontology Language) to harvest the knowledge embedded in Web resources as RDF (Resource Description Framework). If your destination is OWL, then the starting point of your voyage would be XML (Extensible Markup Language). A chapter is devoted to each of these technologies.
The tutorial treatment of these three technologies is one of the real virtues of the book, but it also reveals the danger of a broad but shallow treatment. For example, the structural details of XML are presented so that one can understand elements such as its prolog, processing instructions and well-formed tags. Both DTDs and XML Schema are discussed. But if the novice reader wished to know what the acronym 'DTD' meant, he would discover that this primer lacks a glossary. A careful reading of page 28 would inform him that DTD stands for 'Document Type Definition', but the principal presentation of DTDs on page 34 does not. The book's single index entry for 'DTD' points the reader to page 34. This suggests that, while the book bills itself as a 'primer', a certain threshold of technological awareness is necessary for its successful absorption. The casual or naïve reader might never find out what DTD stands for, and would therefore need a fairly high tolerance for techno-speak to remain comfortable. Usefully, chapters end with a summary of points, suggested readings, and exercises and classroom projects. The limitations of a broad but shallow presentation of technology, however, could easily be overcome by employing this primer as a classroom textbook and supplementing it with lectures, discussion and examples.
The RDF Schema section of chapter three reveals the authors' core interest in the application of ontologies to the semantic Web. This section begins with the question 'How do we describe a particular domain?', which motivates the remainder of the book. Two example OWL ontologies are given: an African Wildlife ontology and a Printer ontology. OWL is so central to the authors' conception of the semantic Web that the book includes an appendix giving these two ontologies in a more readable abstract OWL syntax. Chapter seven brings us full circle on the creation of ontologies by stating that 'there is no correct ontology of a specific domain' (p. 226). It also provides a student project that a small team of students could complete in several weeks.
Chapter six is particularly interesting because it sketches a series of semantic Web applications, and therefore provides a snapshot of semantic Web development circa 2007. One example is an academic publisher whose information repository is organized 'vertically' (e.g., each academic journal is a deep information silo), but who wishes to harvest information 'horizontally' across multiple silos. Screenshots make these examples realistic.
Any area of rapid conceptual and technological advance challenges the long-term relevance of a paper book. For example, while the imprint date of this second edition is 2008, the W3C (World Wide Web Consortium) RDFa Primer working draft of 20 June 2008 describes RDFa (i.e., using the attributes of XHTML to host RDF content) as 'bridging the human and data Webs'. If RDFa is successful, it would represent a considerable shift in both the conception and the mechanics of the semantic Web presented here.
Also missing is any reference to the current development of Linked Open Data (LOD), which shifts the unit of analysis on the semantic Web from documents to data. Starting points for a possible third edition of this primer might be Tim Berners-Lee's description of a 'Web of data' and 'How to publish linked data on the Web' by Tom Heath and others, published 27 October 2008. The LOD Website estimates that as of October 2007 openly available datasets hosted over two thousand million RDF triples (two billion in U.S. parlance), interlinked by about three million RDF links. Perhaps only time will reveal the ultimate configuration of the semantic Web.
Terrence A. Brooks