Hersh, William. Information retrieval: a health and biomedical perspective (2nd ed.) New York, NY: Springer, 2003. 517 p. ISBN 0387955224 £89.95.
The updated edition of William Hersh's 'Information Retrieval' provides a thorough and extremely valuable description of the theory and context of information retrieval research in health and biomedicine. The book assumes no prior knowledge of the theory and concepts of information retrieval, nor any understanding of the fundamental principles and requirements of health and biomedical information, and provides a detailed account of these for those who may not be familiar with the field. It is helpful, therefore, to health and medical researchers and practitioners who have no formal training in information retrieval (IR) and also to experienced researchers in IR who are interested in applying their knowledge in what is a highly diverse and complex field.
The book is divided into three parts that describe the relevant basic concepts (Part 1), the current state of the art (Part 2), and the direction of future research (Part 3) in information retrieval in this area. The first part begins with a definition of terms relevant to IR, followed by a description of models and resources to which they are applied in the domain of health and medicine. The book has been updated to take account of the development of the Internet and World-Wide Web (WWW), and a valuable introduction to these is provided at the end of the first chapter. The following chapter provides a general introduction to information and theories of information science before focussing on scientific and health information. The description of the latter is particularly informative in that it describes the peer-review process and discusses the limitations of primary and secondary health and medical literature. The inclusion of sections on the quality of information on the WWW, on information needs in relation to health care and on the rise of evidence-based medicine (EBM) makes this an invaluable chapter for both novices and experts, not least because it is very readable and likely to stimulate ideas for discussion and research. The final chapter in this section, describing the various systems for evaluating the performance of IR systems, is a useful source of reference for researchers, with examples illustrating the application of the systems in health and biomedicine. The text of this first section is very readable and accessible for students interested in understanding the application of IR in health and biomedicine.
Following this comprehensive introduction to the concepts and theory of IR and health information, Part 2 provides a thorough account of current IR applications in health and biomedicine, with a large amount of material on Web-based systems and interfaces. Chapter 4, on Content, provides an introduction to bibliographic databases, and describes Medline and other National Library of Medicine databases, as well as other health and medicine-related bibliographic databases and full-text electronic publications, and their access through the Web. An introduction to and account of Indexing are provided in the following chapter, together with a description of controlled vocabularies, including the MeSH (Medical Subject Headings) vocabulary. A detailed description of manual and automated indexing systems is provided, including systems used for the Web. Chapter 6, on Retrieval, provides a useful introduction to the principles of searching, including exact- and partial-match searching, which will be helpful to those without a background in IR. The section on Searching Interfaces is illustrated using Medline as an example, although full-text searching interfaces and Web search engines are also described in some detail. Chapter 7, the final chapter of this section, describes different metrics for evaluating IR systems and commences with a discussion of the relatively small amount of evaluative research that has been conducted on the Web and the need to consider older IR systems as well as the Web-based systems. The chapter describes usage frequency, types of usage and user satisfaction studies, illustrated with examples of evaluation studies in health and medicine. The extensive section describing systems-oriented evaluations of bibliographic, full-text and Web-based systems is followed by an equally extensive section on user-oriented evaluation measures.
The chapter concludes with interesting and informative sections on current understanding of the factors affecting the success and failure of IR systems, their impact on users in health and medicine, and a summary of what is currently known about IR systems in this area. This provides a suitable backdrop for the final, and longest, part of the book describing the research approaches to IR.
The four chapters comprising Part 3 describe research methods used in IR and their application in general areas, with rather less emphasis on their application in health and medicine. The section is necessarily technical in places and certainly not for the faint-hearted, with detailed descriptions of complex research approaches. Different techniques for evaluating lexical-statistical systems of IR are described in Chapter 8, as well as their application, with particular reference to the TREC (Text Retrieval Conference) initiative and the results of evaluations. Chapter 9 concentrates on IR research for linguistic systems, describes the use of natural language processing for IR and provides helpful examples of its application in health and medicine. Alternative approaches, such as morphological analysis, word-sense disambiguation and semantic approaches, are described in some detail, and the final section provides a helpful introduction to cross-language retrieval and question answering, again with appropriate examples to illustrate these approaches. Approaches to developing better systems for users are described in Chapter 10, with detailed descriptions of research aimed at improving content, indexing, retrieval, devices and digital libraries. These sections commence with a description of previous approaches to these areas where appropriate, as well as examples from medicine and health. The final chapter in the book outlines ways in which information is extracted from information sources and describes techniques and research aimed at processing clinical information, such as MedLEE (Medical Language Extraction and Encoding), Symtext and Snomed.
Overall, this is a very useful IR text for health, medical and IR researchers, as well as a useful introduction to IR for students, with plenty of detailed examples of the application of IR. Having a book that you can dip into as needed, and which does not require the reader to have read all of the preceding chapters, is very helpful. It is likely to become a useful reference for many in academic and clinical research and teaching, as well as a welcome addition to reading lists within information science. It is a pity that the price is likely to prevent its use as a course text.
The book has a Web site, where the author posts corrections and updates.
Peter A. Bath