Chu, Heting. Information representation and retrieval in the digital age. Medford, NJ: Information Today for the American Society for Information Science and Technology, 2003. xiv, 248 p. ISBN 1573871729. (ASIST monograph series). $44.50.
This monograph, written by Heting Chu, Ph.D. at College of Information and Computer Science, Long Island University, aims to give a comprehensive overview of the research field that is commonly known as Information Storage and Retrieval, or Information Retrieval (IR) for short. The author attempts to use an even more adequate nomenclature by calling it Information Representation and Retrieval (IRR), which points to the fact that this research field not only encompasses the problems surrounding physical storage of information, but also how it should be represented to facilitate retrieval. The addition "...in the Digital Age" seems redundant, since the kind of information retrieval described by the author is computer-aided, in other words "digital". It could also be discussed what constitutes the digital age. The presence of computers per se? The presence of a sufficiently high number of computers at work and in people's homes? The appearance of the Internet? This designation is as unclear as the buzz phrase "information society".
Compared to another overview, Modern Information Retrieval (1999) by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, which has become one of the standard IR textbooks, the presentation style is markedly less detailed and formal. The occasionally personal and quite unacademic writing style gives the impression that the author strives to be "reader-friendly" and that this work is intended for a broader audience than undergraduates in computer science. Indeed, the structure of the book hints that the target audience is people working in the field of library and information science. This initiative, to bridge the technical core of IR and its practical use in a non-technical fashion, is much needed, because several introductions to the field indulge in mathematical and technical expressions, which alienates many potential readers.
The first chapters are devoted to a historical overview of the research field, followed by a presentation of the traditional methods of knowledge representation and organization: indexing, classification (here called categorization), and summarization. This is followed by a discussion of the nature and use of metadata, the problems of full-text as opposed to controlled language in IR, and the challenges of representing multimedia, i.e., information formats other than text. The author goes on to discuss linguistic issues in full-text IR (such as homonymy and synonymy) and various strategies for query formulation. Only after this "operational presentation", spanning six chapters, do we get a brief presentation of the major IR models, the theoretical foundation of the field. The development of the topics, from the concrete hands-on techniques of information representation and searching to the theoretical underpinnings, strongly indicates a focus on practice rather than theory, and an orientation towards the profession of the information specialist rather than the researcher in the field. The treatment of the IR models is superficial, bordering on over-simplification, but may still give the novice a picture of the mechanisms underlying the information systems. Unfortunately, the lack of formal descriptions leaves the definition of the concept of a model somewhat hanging in the air, which makes it unclear how the cognitive model and the natural language model relate to the "major" models (also called "classical" models), that is, whether they are models of the same kind or not. Some strange formulations in various places in the book add to the impression of superficiality, for instance that Boolean operators (AND, OR, NOT) are used to express relations between the query terms (p. 104), that the vector space model is unable to mimic Boolean operations (which is simply incorrect), that there exists only one probabilistic model (called the "probability model"), and that the measure 'precision' indicates the "discriminating ability of an IR system" (p. 70), rather than evaluating the signal-to-noise level of a search result. I would recommend that this monograph be accompanied by a more formal presentation, for instance the one given in Modern Information Retrieval, to equip the reader with a more comprehensive understanding of the theoretical aspects of IR.
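To make the remark about precision concrete, here is a minimal sketch in Python of the standard set-based definitions of precision and recall. The function names and the document identifiers are illustrative assumptions and are not taken from Chu's book. Precision is the share of the retrieved documents that are relevant, which is why it reads naturally as a signal-to-noise measure of a single search result.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0  # convention chosen here for an empty result list
    return len(retrieved & set(relevant)) / len(retrieved)


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # convention chosen here when nothing is relevant
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical search result: four documents returned, two of them relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3", "d7"}

print(precision(retrieved_docs, relevant_docs))  # 2 relevant out of 4 retrieved
print(recall(retrieved_docs, relevant_docs))     # 2 found out of 3 relevant
```

In this sketch the two irrelevant hits (d2 and d4) are the "noise" in the result, and they lower precision without affecting recall.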
As an introduction to the field I think Chu's book works well. The reader gets a concrete and balanced picture of the "handicraft" of organizing and finding information, an understanding that these processes can be studied from both a system and a user perspective, and that information exists in several media such as text, image and sound, each having its own set of problems related to representation and retrieval. However, the brevity of some chapters gives the impression of an unfinished work and leads to some glaring simplifications of the material at hand. This is particularly evident in the chapter on the major IR models as well as the final chapter about IRR and artificial intelligence (AI). The treatment of AI in the last section of the book is mainly devoted to natural language processing (NLP) and intelligent agents. Since the author emphasizes the "digital age" in the title, one could have expected much more from a chapter like this, for instance a presentation of the techniques of digitalization, pattern classification (e.g., using neural networks), as well as deduction and decision making. A thorough presentation of the Semantic Web is also lacking, and the placement of the short description that does exist in the book is not obvious (it is part of the chapter on IRR and AI).
The concluding discussion is also quite shallow, which can be exemplified by the following quote (p. 234):
How far can AI go in information representation and retrieval? The answer to the question is yet to be found, chiefly because it is not a trivial task to build and program a machine that can successfully imitate the intellectual activities of human beings. This should be the most difficult nut to crack in AI research.
My thoughts on this book can thus be summarized: a noteworthy attempt to create a technically stripped-down presentation of IR that may be useful to give an initial orientation to the field (especially to students who are less than enthusiastic about mathematical formulae littering the presentation), but too superficial to give a thorough understanding of the mechanisms underlying the retrieval processes. Also, some material that could be expected from a work that is dated 2003 and aims to be comprehensive is conspicuously missing.