Appendix 7. Papers submitted to journals
Appendix 7.3 - Information seeking and mediated searching. Part 3. Successive searching
Our project has investigated the processes of mediated information retrieval (IR) searching during human information-seeking processes, in order to characterize the progressive changes and shifts that occur during an information-seeking process. This has included information seekers' situational contexts; information problems; uncertainty reduction; cognitive styles; and cognitive and affective states. We have also sought to characterize related changes over time, and to examine changes and differences in information seekers' relevance judgments and criteria. Few studies have investigated these issues. The research has involved observational, longitudinal data collection in the U.S. and U.K. Three questionnaires were used for pre- and post-search interviews: a reference interview questionnaire, an information seeker post-search questionnaire and a search intermediary post-search questionnaire. In addition, the Sheffield team employed a fourth set of instruments in a follow-up interview some two months after the search. Related search episodes, conducted with a professional search intermediary using the Dialog Information Service, were audio-taped and search transaction logs were recorded. The findings are presented in four parts. Part 1 presents the background, theoretical framework, models, and research design used during the research. Part 2 is devoted to results related to uncertainty. Part 3 provides results related to successive searching. Part 4 reports findings related to cognitive styles, individual differences, age and gender. Further papers will discuss findings from this complex research project.
As the Web, digital libraries and IR systems become more widely used for information access by many people worldwide, we need to learn more about users' interactions with IR technologies during their information-seeking behavior. The study reported in this paper seeks to contribute to the investigation of human information-seeking behavior in interactive environments and to improve our understanding of users' behavior when seeking information from IR systems. In other words, users' interactions with IR systems are studied within the context of their information-seeking behaviors and the context of their search for information (Spink, 1996, 1999; Spink, Bateman & Greisdorf, 1999; Vakkari, 1999; Wilson, 1999). This paper provides data on the characteristics of mediated successive online searches conducted by intermediaries using the Dialog online information service for 8 information-seekers who were present during the online search interaction. The reasons for successive searches, the frequency of successive searches, and the characteristics of successive searches are identified. This study extends previous research showing that information-seekers often conduct successive mediated or unmediated searches over time on the same or an evolving information problem (Spink, 1996; Spink, Bateman & Greisdorf, 1999).
Successive search episodes then become units for observation and analysis. A search episode is a user interaction with either a single or multiple digital information systems, e.g., CD-ROM databases, online databases, digital libraries, Web search engines, or OPACs. Successive episodes are separated by an interval (hours, days, weeks or months) during which the user evaluates the previous search episode before embarking upon a new one. The modeling of users in successive searches is then successive user modeling.
Spink, Wilson, Ford, Foster and Ellis (2002) propose a theoretical framework for understanding IR interactions within an information-seeking context. The theoretical model depicts a user's situated actions within IR interactions over time. Time is represented in four categories: (1) interaction time, (2) successive searching time, (3) information-seeking time, and (4) problem-solving time. Successive searching currently receives little, if any, support from present IR interfaces and procedures, or from Web search engines. Largely, IR systems are built following a single-search paradigm, i.e., they are designed and operate on the assumption that every search is unrelated to any previous or future searches by the user on the same or an evolving topic. Some systems (such as Dialog or Lexis/Nexis) support saving searches for successive searching. However, research to improve support features for successive searching is in its formative stage.
Lin and Belkin (2000) propose a multi-dimensional conceptual model (MISE) for successive searching, including episodes, information-seeking processes, the information problem and the problematic situation. Based on work by Schutz and Luckmann (1973), they propose reasons for the renewal of information-seeking episodes, or successive searches, including:
Different factors lead to the initiation, sustaining, halting and re-engagement of information seeking and searching processes. This line of research is significant because it goes beyond the single-search approach generally adopted by IR researchers. The limits of the single-search approach are exposed by recent research showing that information-seekers with a broader problem-at-hand often seek information in stages over extended periods and use a variety of information resources. As time progresses, information-seekers often search the same or different IR systems for answers to the same or an evolving problem-at-hand. As they learn or progress in their work, as they clarify a problem and/or question, or as their situational context changes, users come back to various IR systems for further related searches. The process of repeated, related searches over time in relation to a given, possibly evolving, information problem (including changes or shifts in beliefs, cognitive, affective, and/or situational states) is called successive searching.
The studies reported in this paper are part of an ongoing project that seeks to investigate successive searching (Spink, Wilson, Ellis & Ford, 1998; Spink, Wilson, Ford, Foster & Ellis, 2002). The next section of the paper provides theoretical background and discusses research in IR interaction, information seeking and Web studies.
Successive Searching Studies
Recent studies highlight the weakness of research based on the single search approach and the need for studies that classify and categorize successive searching behavior. Studies show many IR system users conduct successive searches when seeking information related to a particular information problem.
Some early studies, exploring other issues related to IR systems interaction, noted that users were conducting more than one online search on a topic.
These studies, in the early 1990s, identified a phenomenon that had not previously been explored in IR research. The common approach in IR research at that time was to examine only one search conducted by an information seeker.
Successive Searching Studies
As the 1990s progressed, and information-seeking studies and models developed, specific studies were conducted to investigate the successive searching behavior of IR system users and, later, of Web users.
IR Systems Interaction
In the context of Web searching, Spink, Bateman and Jansen (1999) found that one-third of respondents to an interactive survey of Excite users were first-time users, conducting their first Excite search on their current topic; two-thirds reported a pattern of successive Excite searches on their current topic; many reported more than five Excite searches on their topic; and 38 reported conducting more than 20 searches on their topic. Those who were beginning their information seeking reported mostly single searches.
Modeling the Successive Searching Process
Following the growth in successive searching studies, an NSF-funded study by Amanda Spink (Spink, Wilson, Ellis & Ford, 1998) began the deeper exploration of the successive searching process reported in this paper. In addition to this study, other researchers began to conduct studies related to modeling successive searching.
In summary, a growing number of studies have begun to identify the characteristics of successive searches and to model the successive searching process in end-user and mediated search contexts. Previous studies by Vakkari et al. required study subjects to conduct three searches during an information-seeking process. The study discussed below collected data on mediated searches conducted at the request of an information seeker, in order to further explore and model the successive searching process.
The research questions we addressed during this study were:
Such research is the logical next step toward further modeling the successive searching process and improving IR and Web system design, interface design, and user education.
Three search intermediaries were recruited from graduate students at the University of North Texas School of Library and Information Sciences between the Fall 1998 and Spring 1999 semesters. Each intermediary had been trained in the Dialog Information Service. Each intermediary then worked with volunteer information-seekers, conducting as many Dialog searches as necessary to help them resolve their information problem. Many of the information-seekers were students, graduate students or staff.
The data collected during our research included: (i) search transaction logs, (ii) numerical data and responses to the questionnaires, and (iii) retrieved texts and the relevance judgments made about them. The research was conducted over eighteen months in the U.S. Clients were classified by broad discipline: humanities; 'pure' social sciences, such as economics, political science and sociology; applied social sciences, such as social welfare and social administration; pure science; medicine; and engineering. The numbers of humanities and medical clients were rather small, so the former were incorporated into the pure social sciences group and the latter into the pure science group. This gave four discipline categories.
Pre-Search Interview: In this first interview, a detailed description of the participant's problem was obtained, together with responses to interview questions and to a questionnaire covering, for example, problem stage, Kuhlthau's stages, feelings about the progress of the work, other information-seeking activities, and uncertainty.
Online Search and Post-Search Interview: During the search, computer logs were kept, together with audiotapes of the interaction between the information seeker and the search intermediary. After the search, the participants completed another questionnaire on aspects of the search and, again, on their certainty/uncertainty with regard to different stages of problem resolution. The search intermediary also completed a search assessment instrument.
Three questionnaires were used to record aspects of the search context that cannot be captured in transaction logs: an information seeker pre-search questionnaire (the reference interview), an information seeker post-search questionnaire, and a search intermediary post-search questionnaire. The aim of the pre- and post-search questionnaires was to capture the information seeker's state in a number of areas before and after their search. This allowed the measurement of changes or shifts experienced by information seekers as a result of their search.
We provide results related to the frequency of successive searches, the reasons for successive searches and characteristics of successive searching.
Frequency of Successive Searches
Table 1 lists the search topics, search frequency and reasons for the successive searches conducted for 8 information-seekers. Search topics ranged across the physical sciences, social sciences, humanities and medical issues.
The data in Table 1 show that a total of 18 mediated searches were conducted, including:
Successive searches were generally spaced over time, with some information seekers requesting a second or third search within a week and others within a month. Many reasons were identified for requesting successive searches.
Reasons for Successive Searches
Table 2 takes the data from Table 1 to show the frequency of reasons cited for successive searches.
The major reason reported by the intermediaries for conducting successive searches was the information seeker's need to refine or extend the first search, based either on their evaluation of the previous search results or on changes in their information problem, including the need to search different databases or use different search terms to find more information. In some cases, there were multiple reasons for conducting more than one search.
Refine and Enhance Search Using Results From a Previous Search
All but one information seeker requested a successive search to refine or enhance the results from the previous search. This may include the use of new search terms.
Information Seeker Requested More Information
In six cases information seekers requested another search to seek more information. This reason was often related to the need to refine or enhance the results from the previous search.
Search Different Databases
In four cases information seekers requested a search using the same or modified search terms on different databases.
Refine the Search - Too Much Data Retrieved in a Previous Search
In three cases the need to refine the next search arose because the first search retrieved too much data for the information seeker to evaluate easily.
Refine the Search Due to Increased Problem Complexity from Previous Search Results
In two cases the information seeker reported that the results of the previous search increased the complexity of their information problem and necessitated another search.
First Search Only Exploratory
In one case an information seeker reported that they regarded the first search as exploratory and they wanted a more refined successive search.
In four cases information seekers requested successive searches either to:
Refining and enhancing previous search results relates to Lin and Belkin's (2000) process of "problem transmuting" or the modification of an information problem that necessitates a successive search. Further characteristics of successive searches were identified.
Characteristics of Successive Searches
Some characteristics of the successive searches were investigated, including: information-seeking stage of the information-seeker, sources of the search terms, and changes in the search terms and databases searched.
Table 3 provides data on the number of searches requested by the 8 information-seekers.
Number of Searches
All information seekers requested a second search on their topic, but only two information seekers requested a third search. Information seeker six requested three searches. The second search refined the first search strategy with new search terms and databases. A third search used new terms selected from the second search results to refine the search strategy. A similar situation emerged with information seeker seven.
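As a consistency check on these counts, assuming each of the 8 information seekers had exactly one first and one second search, the eight first searches, eight second searches and the two third searches requested by information seekers six and seven sum to the 18 mediated searches reported in Table 1, a mean of 2.25 searches per information seeker:

```latex
N = \underbrace{8}_{\text{first}} + \underbrace{8}_{\text{second}} + \underbrace{2}_{\text{third}} = 18,
\qquad
\bar{n} = \frac{N}{8} = \frac{18}{8} = 2.25
```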
Items Retrieved & Search Cycles
The mean number of terms used in second searches was significantly higher than for first searches, and also higher than for third searches. Second searches were characterized by more items retrieved and more search cycles than first searches. Third searches were the longest in terms of search cycles, but retrieved fewer items. First searches were often exploratory, and their results were used to identify new search terms, new areas and new databases. Even though second searches were described as "refining searches", they were more extensive and more interactive, with more search cycles.
Table 4 provides results related to the search terms used during successive searches.
The mean number of search terms per search (with overlap) did not change significantly between first, second and third searches. With no overlap, the number of search terms used in second searches was significantly lower than for first searches. Second searches were characterized by more refined search terms within more search cycles and more items retrieved. The mediated search situation is not very different from the end-user search situation in this respect: Vakkari et al. (2000) found that search terms became more specific over end-users' successive searches.
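The distinction between counting terms "with overlap" and "with no overlap" can be made concrete with a short sketch. It assumes, since the study does not define the counts precisely, that "with overlap" counts every distinct term used in a search, while "no overlap" counts only terms not used in any earlier search by the same information seeker. The function name `term_counts` and all term lists are hypothetical illustrations, not the study's data or procedure.

```python
# Sketch: count search terms per successive search, "with overlap" and "no overlap".
# ASSUMPTION (not defined in the study): "with overlap" counts every distinct term
# in a search; "no overlap" counts only terms not used in any earlier search by the
# same information seeker. All term lists below are hypothetical.


def term_counts(searches: list[list[str]]) -> list[tuple[int, int]]:
    """Return (with_overlap, no_overlap) term counts for each search, in order."""
    seen: set[str] = set()
    counts: list[tuple[int, int]] = []
    for terms in searches:
        distinct = {t.lower() for t in terms}  # normalise case, drop repeats
        new_terms = distinct - seen            # terms absent from earlier searches
        counts.append((len(distinct), len(new_terms)))
        seen |= distinct
    return counts


# Hypothetical successive searches by one information seeker.
searches = [
    ["aging", "memory", "exercise", "cognition"],                # first search
    ["memory", "aerobic exercise", "older adults", "cognition"],  # second search
    ["hippocampus", "walking"],                                  # third search
]

for i, (with_overlap, no_overlap) in enumerate(term_counts(searches), start=1):
    print(f"Search {i}: {with_overlap} terms with overlap, {no_overlap} new terms")
```

In this invented example the second search uses as many terms as the first (four) but only two of them are new, which is the kind of pattern that lets the "with overlap" count stay flat while the "no overlap" count falls.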
Changed Search Terms Over Successive Searches
Table 5 shows how the number of search terms changed over successive searches.
Overall, there was limited overlap in the search terms used in first and second searches. About one in five search terms appeared in both searches, reflecting major changes in search terms between first and second searches. In third searches, there was no overlap with the search terms used in second searches. Information seeker six identified a number of terms from the previous results that were used in the third search.
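A statement such as "about one in five search terms appeared in both searches" depends on the denominator chosen. The sketch below assumes, as one possible reading rather than the study's stated method, that overlap is the number of distinct terms used in both searches divided by the number of distinct terms used in either (the Jaccard coefficient). The function `term_overlap` and the term lists are hypothetical.

```python
# Sketch: proportion of search terms shared by two successive searches.
# ASSUMPTION (the study does not define the denominator): overlap is the number of
# distinct terms used in both searches divided by the number of distinct terms used
# in either search (Jaccard coefficient). Term lists are hypothetical.


def term_overlap(first: list[str], second: list[str]) -> float:
    """Return |A & B| / |A | B| over distinct, case-normalised terms."""
    a = {t.lower() for t in first}
    b = {t.lower() for t in second}
    union = a | b
    if not union:
        return 0.0  # two empty searches share no terms
    return len(a & b) / len(union)


first = ["uncertainty", "information seeking", "students"]
second = ["students", "anxiety", "library use"]
third = ["self-efficacy", "undergraduates"]

print(f"first vs second: {term_overlap(first, second):.2f}")  # 1 shared of 5 distinct
print(f"second vs third: {term_overlap(second, third):.2f}")  # no shared terms
```

Dividing by the terms of only one search instead would give a different figure for the same data, so any reanalysis should fix the denominator explicitly.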
Sequential Order of Search Terms Use Classified by Source
Table 6 provides the sequential order of search term use, with terms classified using the source taxonomy developed by Spink and Saracevic (1997):
Interestingly, none of the search terms were sourced from the search intermediary. All first searches, and most searches overall, included terms drawn largely from the question statement and the conversation between the information seeker and search intermediary. Spink and Saracevic (1997) also found that most search terms in mediated searching came from these sources. In two searches, a thesaurus was used to identify search terms. In two searches, term relevance feedback was used, predominantly during second and third searches. Information seeker six identified many terms from the results of the second search that were used during the third search.
Information-seekers were the main source of search terms during the successive search process, although intermediaries did contribute search terms in more than 50% of searches. Spink and Saracevic (1997) also found that information-seekers were the major and most effective source of search terms during mediated online searching. Table 6 shows that most successive searches involved changes in search terms from the initial search.
Interestingly, in nearly one third of cases, successive searches used the same search terms, with no additions or deletions of terms from previous searches. The remaining successive searches, in which terms did change, support previous studies (Spink, 1996; Robertson & Hancock-Beaulieu, 1992) of OPAC end-users who frequently changed their search terms between successive searches. Spink, Bateman and Jansen (1999) also found a similar result with end-users conducting successive searches of the Web.
Table 7 shows how search operators were used across successive searches.
More search operators were used during first searches than during later searches. The only exception was the NOT operator, which was used primarily during second searches as part of the refining process. The mediated search situation is clearly different from the end-user search situation in this respect: Vakkari et al. (2000) found that end-users used more search operators in their later searches. Search intermediaries are distinguished by their training in Boolean searching and seem more likely to use Boolean operators from the initial search onward.
Table 8 and Table 9 provide a summary of the commands analysis.
During mediated searches, the second search was characterized by greater use of commands and tactics, except for the select command. The mediated search situation is similar to the end-user search situation in this respect: Vakkari et al. (2000) found that end-users used more commands and tactics in their later searches.
Table 10 shows the number of successive mediated searches that involved a change in databases.
In six cases the databases used changed over successive searches. Interestingly, the same databases were often searched repeatedly (with either the same or different search terms) over successive searches. Successive searches often included a change in both search terms and databases. The data show that successive searches involved changes, refinements or extensions of the initial search. How the changes evolved depended upon the nature of the information problem, the information-seeking stage, and the changes the information-seeker experienced as a result of previous searches.
Information-Seeking and Problem Solving
Information-seekers were asked to indicate their problem-solving and information-seeking stages before and after each search (Table 11).
The data show that information seekers often experienced shifts in their information-seeking and problem-solving stages during and between successive searches. For example, after Search 1, most information seekers reported being in the same problem-solving stage as before their search. At the same time, more information seekers reported a change in their information-seeking stage as a result of the first mediated search. Some information seekers reported shifting back to a previous stage, or "problem rolls back" (Lin & Belkin, 2000). Previous studies (Spink, 1996; Spink, forthcoming; Spink & Wilson, 1999) show that information seekers experience shifts and changes in their information-seeking and problem-solving processes due to their interactions with IR systems and subsequent changes in their information problems.
Spink, Greisdorf and Bateman (1999) also identify relevance judgments as an important element in successive searching.
Table 12 shows data concerning the relevance of the retrieved items during successive searches.
An analysis of the data in Table 12 shows that:
The results of the study reported in this paper extend previous research by identifying additional characteristics of successive mediated searches. On average, information-seekers requested two searches, and some requested three mediated searches. Successive searches often involved a refinement or extension of previous searches, with new databases searched or search terms changed, as the information-seekers' understanding and evaluation of results evolved from one successive search to the next. Some successive searches involved no change in databases.
Theoretically, interactive IR models and studies of IR system use should take account of the reality that users iterate not only their queries but also their searches over time. The integration of interactive IR models with human information behavior models presented in Spink (1999) can be extended by adding successive searching processes.
In practice, information-seekers and intermediaries should be trained to understand that many information problems are not resolved with a single IR system search. The picture is more complex. On average, information-seekers may need to conduct more than one search, possibly two or three, during their information-seeking process just to bring focus and clarity to their information needs. Information-seeker and intermediary training will need to be modified to account for this reality.
Conclusion & Further Research
This study highlights many issues in interactive information retrieval that need further research. We need to examine more characteristics and processes associated with successive searching. Does the stage an information seeker has reached in the information-seeking process, or the time elapsed, affect the number of successive searches undertaken? Further research is also required to examine more deeply the factors that distinguish information problems that a single search satisfies from those that lead to successive searches. Currently, most IR systems and interfaces do not greatly assist users during successive search episodes.
We thank the referees for their useful comments on the paper.
This is a draft of a paper published in Journal of the American Society for Information Science and Technology, Volume 53, No. 9, 2002, 716-727
David Ellis is now Professor, University of Wales, Aberystwyth
Uncertainty in information seeking, by Professor Tom Wilson, Dr. David Ellis, Nigel Ford, and Allen Foster
Library and Information Commission Research Report 59
ISBN 1 902394 31 3 ISSN 1466-2949
Grant number LIC/RE/019