Vol. 11 No. 2, January 2007
Much research on Internet search behaviour has been conducted on samples that were convenient to the researcher, such as undergraduate or graduate students at research universities (Hargittai & Hinant 2006). This limited focus brings into question the applicability of any findings to the wider community of Internet users.
One way of overcoming this difficulty is to study the transaction logs of search engines (e.g., Broder 2002; Jansen et al. 1998, 2000, 2005; Selberg & Etzioni 1995; Spink et al. 1998). However, as Wang et al. (2000) note: 'The cognitive and holistic approaches of studying user behaviours require that researchers observe the "real" process as it happens, not merely the outcomes of a process'.
In this paper, we report on the early stages of a study that aims to observe and describe Internet searches by members of the general public. The paper serves three main functions: it presents a preliminary qualitative analysis of data gathered in the early stages of the project; it identifies a number of key factors that appear to be emerging; and it uses these factors to suggest a range of search dimensions that may be of value in measuring and comparing searches.
The project employs a combination of qualitative and quantitative approaches. Searches are observed and are subjected to detailed qualitative study. The results of this analysis are used to identify patterns of search behaviour that may be amenable to quantitative analysis. This paper presents the results of the first stage, based on a qualitative analysis of an initial small sample. Other stages of the project will entail large-scale quantitative testing using as wide a range of people as possible to ensure that such patterns are representative of search behaviour in general and are not an artefact arising from a narrow sample.
In selecting our sample, we have aimed to recruit volunteers who represent the demography of Sheffield as a whole. By the end of the project, it is anticipated that one hundred volunteers will have completed 400 to 500 observed and recorded searches.
This paper reports on findings of an initial qualitative analysis based on thirty-nine searches by nine volunteers (five women, four men). The volunteers ranged in age from 28 to 77 years (mean = 45). Length of Internet experience averaged about three years.
One of the aims of the project is to observe as wide a range of search behaviour as possible. In each session therefore, some time is spent watching volunteers search for topics of their choosing and some is spent watching them perform set tasks. Volunteers are asked to explain their actions and describe their thought processes.
Sessions are recorded using My Screen Recorder (Deskshare 2006), which creates a record of the volunteer's comments and actions. A back-up record of keystrokes and Websites is also created using SpectorPro.
Volunteers are contacted beforehand to confirm arrangements. When contacted, they are asked to recall occasions when they have tried to find information using a search engine, but have encountered difficulties. They are asked to repeat these searches. After completing self-selected tasks, searchers are given two to three set searches (depending on the time available).
A task is deemed to have been completed either when a volunteer feels that a satisfactory answer has been obtained or when he or she wishes to stop searching. In all cases, to reduce pressure on volunteers, the researchers stress that the decision to end a search is the volunteer's and that he or she should not feel obliged to answer the questions.
The set tasks (Table 1) were carefully worded to avoid prompting volunteers with terms that could be of use in formulating a search.
|Closed||Open||Simple||Multi-stage||Implicit||Explicit||No. times search performed|
|Heads||1) What was written on Neville Chamberlain's piece of paper (see article below).||X||X||X||6|
|2) You have won a trip to Saga. Can you find out anything interesting about the place?||X
|Tails||1) You've received a postcard from friends who say they are abroad, visiting somewhere called Map. Where are they?||X||X||X||3|
|2) There are many opportunities to win things on the Internet. Find some that may be of interest to you.||X||X||X||2|
|Additional||3) Find the postcode of the tallest British building outside of London.||XX
(stages 1 & 2)
The tasks were designed to enable the effect of a number of factors to be taken into consideration. These are as follows:
Open vs. closed searches: open searches are those for which many answers may be found. Closed searches, by contrast, have a clear 'right' answer (Marchionini 1989).
Simple vs. multi-stage searches: as noted in the introduction, according to the evidence of transaction logs, the majority of searches are simple and consist of a query comprising a small number of search terms, with no subsequent modification to the query. To ensure that volunteers occasionally have to engage in more complex search behaviour, some of the set problems require them to carry out searches which need more than one piece of information. To complete such an exercise, a volunteer has to carry out a chain of linked searches, which may entail a combination of open and closed searches.
Implicit vs. explicit domain knowledge: one criticism of much search research is that, where queries are imposed (i.e., they emanate from a source other than the searcher (Gross 1999)), there are implicit assumptions made about the searchers’ domain knowledge (Madden et al. 2006). In this study, many of the searches carried out were ones that the volunteers suggested themselves, so no such assumptions are necessary. Those that were imposed were designed to tax the searchers in the event of their chosen searches proving to be simple. All the imposed searches (with one exception) assume only basic knowledge. For the exception, the required knowledge is made explicit by presenting volunteers with the necessary information in a short paragraph (Appendix), from which adequate search terms can be derived.
After searchers have completed the tasks that they bring with them, they are assigned two of the tasks shown in Table 1. A coin is tossed to decide which ones. If the same search tasks are selected more than three times in a row, future volunteers are given the alternative until balance is restored.
If there is sufficient time available and if the volunteer is willing, an additional search is carried out.
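The coin-toss allocation described above can be sketched in code. This is a minimal illustration only, not the procedure as the researchers implemented it: the class name `TaskAssigner`, the `toss` parameter, and in particular the reading of 'until balance is restored' as 'until the alternative set has been assigned as often as the over-selected one' are all assumptions made for the example.

```python
import random
from typing import Callable, List, Optional

OTHER = {"heads": "tails", "tails": "heads"}


class TaskAssigner:
    """Hypothetical sketch of the coin-toss allocation of set tasks."""

    def __init__(self, toss: Optional[Callable[[], str]] = None, max_run: int = 3):
        # toss() returns "heads" or "tails"; a fair coin by default.
        self.toss = toss or (lambda: random.choice(["heads", "tails"]))
        self.max_run = max_run              # "more than three times in a row"
        self.history: List[str] = []        # task set given to each volunteer
        self.forced: Optional[str] = None   # set being imposed to restore balance
        self.run_side: Optional[str] = None
        self.run_len = 0                    # length of the current run of tosses

    def next_set(self) -> str:
        # Assumption: balance is restored once the forced set has been
        # assigned at least as often as the other set.
        if self.forced is not None:
            if self.history.count(self.forced) >= self.history.count(OTHER[self.forced]):
                self.forced = None

        if self.forced is not None:
            choice = self.forced
        else:
            choice = self.toss()
            if choice == self.run_side:
                self.run_len += 1
            else:
                self.run_side, self.run_len = choice, 1
            # After a run longer than max_run, later volunteers get the alternative.
            if self.run_len > self.max_run:
                self.forced = OTHER[choice]
                self.run_side, self.run_len = None, 0

        self.history.append(choice)
        return choice
```

With a coin that always lands heads, the first four volunteers would receive the 'Heads' tasks, the next four the 'Tails' tasks, and the coin would then be tossed again.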
One of the aims of the project is the identification of patterns and structures within searches. In order to achieve this aim, it is necessary to identify elements of a search that can be quantified. Some dimensions are introduced and discussed below. However, to focus solely on quantifiable elements of a search would result in much useful qualitative data being ignored. The analysis presented below therefore draws on both qualitative and quantitative data.
Transcriptions were made of the recordings and quotations were coded. These provided an insight into the level of understanding that volunteers had of their chosen search engine and of the functions they were using. They also helped to illuminate the process by which volunteers selected the Web sites they wished to examine more closely.
Most of the quantitative data presented below are based on the number of mouse-clicks made in each search. These statistics were easy to obtain from the screen recordings and were used to calculate search length and search depth (discussed below as Search dimensions).
Preliminary impressions from the recordings, however, suggested that the length of time spent perusing results could prove to be a key factor in whether or not a search was successful. Recordings were therefore re-analysed and times between clicks were noted. Occasionally, these timings were adjusted. Volunteers would often be struck by a thought relating to their search practice and stop searching in order to discuss it. Their remarks, while interesting and often relevant to the project, were digressions from the search being timed. They were therefore measured and deducted from the overall time between clicks. A delaying digression was deemed to have occurred if:
Time was only deducted if both these conditions applied.
Google was the preferred choice for thirty-three of the thirty-nine searches. All volunteers used it at least once; but three used more than one search engine within a search (Table 3). One searcher attempted the same search on four different search engines.
Previous research (e.g., Bilal 2002; Madden et al. 2006) had suggested that, when asked to select a search topic, most people would choose an open one. This proved to be the case: seventeen of the thirty-nine searches reported on here were selected by the volunteers (Table 2). Of these seventeen, thirteen were open and four were closed. By contrast, the majority of search exercises set by the researchers are closed (Table 1).
|1||1*: Details of the film 'Summer Storm'.||X|
|2: Why will 'Natasha Kaplinsky' not be on Breakfast TV for the next six months?||X|
|2||1: The 'Sheffield Property Shop' Website.||X|
|2: Information about Council Housing in Nottingham.||X|
|3: French property news.||X|
|3||1: The policy of the UK Kennel Club on white boxer dogs.||X|
|4||1: Driver for a Canonscan 300 scanner.||X|
|2: Free audiobook downloads.||X|
|5||1: Information about the 'Sheffield Pals' battalion in WWI.||X|
|6||1: Tropical fish tanks.||X|
|2: Addresses of Bed & Breakfast establishments in York.||X|
|7||1: Timetables for Vietnamese trains.||X|
|2: Information about Voluntary teaching abroad.||X|
|8||1: Atomic Rooster.||X|
|2: Charlie Poole.||X|
|9||1: Directions for getting to Lincoln Cemetery.||X|
|2: Complaints about the 'DVD giveaway' from the Daily Express newspaper.||X|
|*Search number. See Table 3 for details of strategies employed to complete these searches.|
|Category of search|
|Syntactic modification||Semantic modification
|1||3*(Chamberlain)||1, 2, 4 (Saga), 5 (Tower)||5 (Tower)|
|2||1, 2, 3, 4 (Chamberlain),
|3||1, 2, 4 (Prize)||3 (Map)|
|4||4 (Saga)||1, 2||3 (Map)||3 (Map)|
|5||1, 2 (Chamberlain),
3 (Saga), 4 (Tower)
|1, 2 (Chamberlain)||1|
|6||1, 3 (Chamberlain),||2||4 (Prize)|
|7||2, 3 (Chamberlain), 4 (Saga)||1,5 (Tower)|
|8||1, 2||3 (Chamberlain)|
|9||1, 4 (Saga)||2, 3 (Chamberlain)|
|No. of searches
However, many of the searches that began as open became closed. This happened for one of two reasons.
Most of the volunteers' own searches did not need refining. Although they were asked to carry out searches with which they had previously had difficulties, in general they encountered few problems when trying to repeat the searches. Those who had difficulties did so for two main reasons. Some were confused by sponsored links. Often, however, searchers would enter a large number of terms into the search box, with the result that the search was too specific and no useful sites were retrieved. One of the more extreme examples of this was volunteer 4 (Table 2), who began his search for online audio books by entering into Lycos the phrase: 'where can i find Audio books that i can download to my IPOD/pc?'
This was most clearly illustrated by the searches for the city Saga and for accompanying information. Only three of the six volunteers searching for 'Saga' were successful in locating the city. All three used Google and the search terms they entered were (Volunteer no. is given in parentheses. See Tables 2 and 3):
Having found a suitable link to a site that dealt with the city, these volunteers then stopped using Google and continued their search from the chosen site.
Two of the three unsuccessful volunteers searched for:
The third began with Saga. However, unlike the other three volunteers who began with short searches, he examined the results for seven seconds then modified his search to saga 'location'. He ended his search (unsuccessfully) with:
The fastest of the three searchers who located Saga spent fifteen seconds examining the results. These three were also the only volunteers to succeed in finding the contents of Neville Chamberlain's piece of paper.
Of the four people searching for the tallest UK tower outside of London, two used Google's 'NOT' operator (i.e., '-') and a third expressed a desire for such a function whilst talking through her choice of search terms:
One of the two users of '-' had used it in earlier searches and had also used 'OR'. None of the other volunteers used any Boolean operators, though use of quotation marks was common. However, it was clear that much of this usage was done with little or no understanding of the effects. This is most clearly illustrated by a search for:
Not surprisingly, no Web sites were retrieved.
Observations of the thirty-nine searches discussed in this paper suggest a number of measures that may be of value in describing and comparing searches in future (Table 5). These are based, in part, on similar measures used by Nichols et al. (2004).
The most obvious measurement of a search is the number of clicks that the searcher makes before he or she either concludes or abandons the search (Table 5).
Clicks made using the initial search engine of choice are shown in a separate column from clicks made on other sites, making it possible to calculate the proportion of each search that would be recorded on the search engine's transaction log.
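As a minimal sketch of how this proportion could be computed from click counts, the following assumes that search length is recorded as two integers per search. The class name `SearchLength` and its field names are illustrative and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class SearchLength:
    """Click counts for one search (names are illustrative assumptions)."""

    on_engine: int   # clicks made on the initial search engine ("On")
    off_engine: int  # clicks made on other sites ("Off")

    @property
    def total(self) -> int:
        # Total search length: On + Off.
        return self.on_engine + self.off_engine

    @property
    def proportion_on(self) -> float:
        # Share of the search that a transaction log of the
        # search engine would capture: On / (On + Off).
        if self.total == 0:
            raise ValueError("a search with no clicks has no defined proportion")
        return self.on_engine / self.total
```

For instance, a hypothetical search with three clicks on the search engine and nine elsewhere has a total length of twelve, of which a quarter would appear in the engine's log.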
Maximum depth is defined as the largest number of clicks made on a page found using a search engine. For example, Volunteer no. 7, looking for information about Saga City, quickly retrieved the City's official Web site. She used nine clicks (depth = 9) to navigate within and from this Website, without returning to the search engine. By contrast, when she attempted to discover the postcode of Britain's tallest building outside of London, she examined several sites, one after another, but did not consider any to be helpful. She therefore returned frequently to the search engine. Her search depth on this exercise was just one.
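One way to make this definition operational is sketched below. It assumes that each click in a recording has been labelled 'on' (made on the search engine) or 'off' (made on another site), and that the click which returns the searcher to the engine is counted as an 'off' click; both the labelling scheme and the function name `max_depth` are assumptions made for illustration.

```python
from itertools import groupby
from typing import Iterable


def max_depth(clicks: Iterable[str]) -> int:
    """Longest run of consecutive 'off' clicks in a labelled click sequence."""
    # groupby yields one group per run of identical labels; only runs of
    # 'off' clicks count towards depth.
    runs = (
        sum(1 for _ in run)
        for label, run in groupby(clicks)
        if label == "off"
    )
    # A search with no off-engine clicks has depth 0.
    return max(runs, default=0)
```

Under these assumptions, one engine click followed by nine uninterrupted clicks elsewhere gives a depth of nine, whereas a searcher who alternates between the engine and retrieved sites has a depth of one.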
The time that searchers spent studying Web pages (including pages of retrieved results) differed considerably. At one extreme, searchers would enter a search term, spend a few seconds scanning the resulting hits, then modify the search term. Their searches therefore had a lot of steps, but were carried out quickly. At the other extreme were searchers who would slowly and carefully study each page, before taking any action.
Search intensity is defined as the mean length of time spent studying retrieved material before a decision is made either to continue searching (indicated by a mouse click), or to conclude the search.
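The calculation can be sketched as follows, combining the definition above with the deduction of digression time described earlier. The function name `search_intensity`, the per-interval list of digression times, and the use of the sample standard deviation are assumptions; the paper does not state which form of standard deviation is reported.

```python
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple


def search_intensity(
    dwell_times: Sequence[float],
    digressions: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return (mean, s.d.) of the adjusted times between clicks, in seconds."""
    if not dwell_times:
        raise ValueError("at least one interval between clicks is required")
    if digressions is None:
        digressions = [0.0] * len(dwell_times)
    if len(digressions) != len(dwell_times):
        raise ValueError("one digression value is needed per interval")

    # Deduct time spent on digressions from each interval between clicks.
    adjusted = [t - d for t, d in zip(dwell_times, digressions)]
    if any(a < 0 for a in adjusted):
        raise ValueError("a digression cannot exceed the interval it occurred in")

    # Assumption: sample standard deviation; undefined for a single
    # interval, so 0.0 is returned in that case.
    spread = stdev(adjusted) if len(adjusted) > 1 else 0.0
    return mean(adjusted), spread
```

A searcher with adjusted intervals of 10, 20 and 30 seconds would have an intensity of 20 seconds (s.d. 10) under this sketch.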
As noted in the introduction, a common way of studying searches by the general public has been through the analysis of transaction log data. Such studies clearly have value and have, for example, revealed patterns in the use of Boolean operators (Whittle et al. 2006). However, observations made to date in this study suggest that such analyses will tend to underestimate the complexity of searches for three reasons, all of which can be inferred from the observations made above.
|Volunteer||Initial search||Related concept|
|1||tallest building uk outside london||hilton deansgate Manchester|
|4||Where in the world is a place called Map||microsoft world encl ( sic ) (=encyclopaedia)|
|5||Sheffield batallion ( sic )||Richard Sparlling ( sic )|
|neville chamberlin I have in my hand ( sic )||munich pact|
|Volunteer||Search no.||Length on search engine (On)||Length off search engine (Off)||On/(On+Off)*||Depth||Intensity: Mean (s.d.)|
|*Proportion of search carried out on the search engine. On + Off = total length of search.|
Many of the studies referred to earlier, which report on the analysis of transaction logs, stress the brevity of searches. However, as Jansen (2000) noted in a study involving five search engines, there is often little advantage in using complex queries. Indeed, our study so far appears to endorse this and also suggests that they may, on occasion, be counter-productive. Amongst the searches observed for this study, one common reason for failure has been the use of too many search terms. Some of this was due to inappropriate use of Boolean operators of the kind noted by Jansen et al. (1998, 2000). In other searches, however, users were far too prescriptive, which caused the search engine to exclude many potentially useful sites.
The tools developed to search electronic databases prior to the advent of the World Wide Web searched well-ordered collections of documents, in which considerable human effort had been employed to remove ambiguities and where minimum standards were in place to ensure the appropriateness of included material. A Web search engine has a more difficult task, because many of the search terms that might be used have a wide range of meanings. If a search term has a meaning which is particularly common within the Web-using community, it will be reflected in the ranking of the hits. Often, however, more than one meaning is popular and this diversity is apparent within the first page of hits.
Observations to date suggest that, rather than beginning a search with a complex enquiry, a more effective strategy is to begin with a simple search, but then to scrutinize the results closely, so that the range of meanings can be fully appreciated. In other words, it is better to search with intensity rather than at length.
From the findings of this study so far, it seems that the best search strategy is often a combination of simplicity and scrutiny. Those volunteers who entered a few search terms but then carefully studied the results appeared to be more successful than those who attempted to be prescriptive and entered a long series of terms. It was also interesting to note the extent to which searching took place away from search engines. It is hardly surprising that studies of Internet searching have focussed on the use of search engines. However, it is clear from the findings of this study to date that, while search engines have an important role to play in information seeking on the Internet, the major part of a search often takes place elsewhere.
The authors would like to acknowledge the support of the Arts and Humanities Research Council, which funds this project. We would also like to thank staff at Sheffield's City Learning Centres, the Forum 100 and Sheffield University's Institute for Lifelong Learning who have put us in touch with potential volunteers. Finally, we would like to express our gratitude to our volunteers for their time.
What was written on Neville Chamberlain's piece of paper (see article below)
The Right Honourable Arthur Neville Chamberlain (18 March 1869-9 November 1940) was Prime Minister of the United Kingdom from 1937-1940. Chamberlain is perhaps one of the most ill-regarded British Prime Ministers of the 20th century, largely due to his policy of appeasement towards Nazi Germany regarding the abandonment of Czechoslovakia to Hitler at Munich in 1938.
Chamberlain on his return from Munich, waves the infamous piece of paper containing the resolution to commit to peaceful methods signed by both Hitler and himself. He said:
'My good friends, for the second time in our history, a British Prime Minister has returned from Germany bringing peace with honour. I believe it is peace for our time.'
Extract from Wikipedia
© the authors, 2006.
Last updated: 16 December, 2006