vol. 13 no. 4, December, 2008

Analysis of the similarity of the responses of Web search engines to user queries: a user perspective

J.V. Rodríguez Jr., F.J. Martínez and J.V. Rodríguez.
Department of Information and Documentation, University of Murcia. Campus of Espinardo - Murcia, Spain

Abstract

Introduction. We report an investigation into the similarity of the responses of Web search engines. We discuss the coincidence of documents and their position in the response, in order to develop a measure closer to the user's context by incorporating several aspects and reflections established within this field of study.
Method. We introduce a method of calculation based on a slightly adapted cosine function. Initially, calculations were performed manually; to simplify the process, we have implemented a meta-searcher.
Analysis. We applied a first set of thirty questions to these systems during different periods of time. Secondly, we examined the similarity of the documents returned in the top ten, twenty and thirty positions. Subsequently, we conducted another round of searches (with fewer terms in the equations) to verify whether it affected the average values of similarity.
Conclusions. The estimated similarity is low, between 15% and 18%, and it has not changed significantly over the past five years. The number of terms used in the searches does not affect these averages.

Introduction

Web information retrieval systems have been evaluated almost from the origin and implementation of the Web. This should not surprise us because the need for evaluation has always accompanied the development of information retrieval systems. A broad set of qualitative and quantitative aspects has been evaluated. We can group them into four fields of interest: (i) performance, usually based on effectiveness, as we can see in Oppenheim (2000) and Martínez & Rodríguez (2003), with a supplementary set of works that avoid the use of the precision ratio to measure effectiveness because it is affected by human subjectivity (Hersh et al. 1995; Landoni and Bell 2000); (ii) the proposal of Berners-Lee et al. (2001) and the World Wide Web Consortium for a semantic Web based upon an extensive use of metadata, which has been evaluated by Zhang and Dimitroff (2005a; 2005b); (iii) how users search the Web (Spink et al. 1998; Spink and Jansen 2004; Jansen and Spink 2006); and (iv) the system-user interaction (Schlichting and Nilsen 1997; Johnson et al. 2001; Marchionini 2004).

Leaving aside the nature of these engines, what could be the origin of this interest? We suppose that it may be due to questions like 'Dear Professor, what is the best Web retrieval system?' The professor might answer, 'Probably the answer depends on the point of view of the persons who make the analysis'. If they rate effectiveness as more important, it is logical that they will consider measures such as time of response, precision, recall and overlap. To improve precision, they could measure the influence of metadata, a matter not sufficiently resolved today. Perhaps the researchers will prefer to know the trends of Web usage, and then analyse Web queries, the use of Boolean operators, Web query reformulation, the use of relevance feedback and the viewing of results. Finally, if they consider the interaction between systems and people, they will focus their study on interface design and other usability aspects that could affect information retrieval.

While there is great diversity among these lines of work, we think that it is possible to integrate some of them in order to add a user's perspective to the traditional evaluation of performance. More concretely, we propose to incorporate data about the position of a document in the search output into a measure of the overlap between Web retrieval systems, with the goal of transforming a simple measure of the coincidence of documents in a response into a measure of the overlap of documents in the response that is useful for users. The aim is to combine different viewpoints and so bring added value to the ratios that have been used so far.

It is a fact that Web retrieval systems index large collections of documents. It is also true that the similarity between these indexes is very small. It has been traditionally assumed that this similarity is approximately 15%. If this is correct (this kind of claim is usually not supported by any study), it would mean an enormous diversity in the composition of these indexes which, in some way, most people would see when performing the same search in different systems (which, incidentally, is becoming less frequent). Nevertheless, another factor intervenes in the diversity of the reply: the ranking algorithm that each system uses to order the response. Each system implements a different algorithm; consequently, the level of divergence of the response the users can see is even greater.

Which of the levels of similarity (or divergence) is the most interesting for our analysis? From the point of view of the end user, we would choose that of the given response. From the point of view of the administrator of a Web retrieval system, it would also be interesting to analyse the composition of the indexes, but for the users of the Web this is less important. We should not forget that it is very difficult to have access to all of these collections in order to analyse them. We can only find data on this concordance in the works of Losjland (2000a; 2000b) and Martínez (2002), besides the data on the overlap of Web retrieval systems which Notess (2002) took into account for the Search Engine Showdown blog (data not updated) and, more recently, Jansen and Spink (2006). So, it is difficult to find more sources of updated information on this subject.

For this reason, we find it interesting to develop a method that enables us to analyse this factor periodically and to apply it to the responses of the most used Web retrieval systems in the current World Wide Web.

Our original proposal

Our initial research belonged to the field of performance assessment. The most quoted projects are those related to effectiveness, particularly the work of Chu and Rosenthal (1996), which analyses more features. We should also mention the contributions of Leighton and Srivastava (1999), and finally, the work of Gordon and Pathak (1999), who analysed more systems than anyone else. One of the first experiments developed in Martínez (2002) determined the level of similarity in the responses of six Web retrieval systems (Google, Altavista, All the Web, Terra, MSN, and Wisenut). Thirty queries (in Spanish) were carried out in these systems and the level of similarity was analysed among the first ten, twenty and thirty documents of the response. In order to determine this similarity, and inspired by Lojsland (2000a), we used the 'cosine' similarity function (Salton & Harman, 1983) with the necessary adjustments to the context of the experiment, since this function was created in order to determine the similarity of two one-dimensional vectors and, in this case, we had bi-dimensional vectors: the Web search engine and its reply to a given question.

Why the limit of thirty documents, which is obviously a very small portion of the possible response? Spink and Jansen (2004) give us the justification:

'…from 1996 to 1999, for more than 70 percent of the time, a user only viewed the top ten results. On average, users viewed 2.35 pages of results (where one page equals ten hits). Over half the users did not access results beyond the first page. Jansen et al. (1998) found that more than three in four users did not go beyond viewing two pages. By 2001, only roughly one-third of users looked beyond the second page of Web sites retrieved'.

We can find a similar decision in Tang and Sun (2003), where the authors, basing their work on that of Jansen et al. (1998):

'…decided to collect only the top 20 links among the thousands retrieved in light of previous studies showing that 80 percent of users view only the first two pages of results'.

Jansen and Spink (2006) affirm that “the first result page represents the top results that an engine found for a given query and therefore is a barometer for the most relevant results an engine has to offer”. It seems clear that thirty is the maximum number that most users are willing to consult. This is the useful response; the rest of the documents are ignored. In practice, it is possible that we are setting this limit too high.

As discussed above, we introduced several changes in the 'cosine' similarity function to adapt it to the context of the experiment, since this function was created in order to determine the similarity of two one-dimensional vectors and, in this case, we had bi-dimensional vectors: the Web retrieval system and the reply to a given question. Obviously, it was necessary to resize the results before using this function, but this was not the only change introduced into the calculation. We also consider the position of documents in the response, favouring those that appear in the top positions. Thus, the weight of each resulting document, which depends on its position in the response vector of the Web retrieval system (known as the 'relevance factor'), was taken into account. This idea is based on the 'first 20 full precision' measure introduced by Leighton and Srivastava (1999), which gave added value to the ability to place relevant documents within the first twenty delivered in response to the user. That measure captured, at the same time, accuracy and the capacity to show the relevant documents before the irrelevant ones (something very important for the user). If inactive and duplicate documents are regarded as irrelevant, the search engines that keep their indexes up to date are favoured. This factor enabled us to assess not only the coincidence of the documents in the response (that is, the overlap), but also the similarity of the useful response (the documents the user actually reads), taking into account the order of these documents in the response.

Before calculating the similarity, we have to resolve an additional question: the type of operators to be used in the query formulation. We used (1) Boolean operators (more concretely, the 'AND' operator) and (2) the 'AND' operator combined with the 'exact phrase'. The results obtained were very similar. Noting further that the use of Boolean operators is growing on the Web (Spink and Jansen 2004), we consider it appropriate to experiment only with such operators.

An example.

To exemplify the method, suppose that a search has been carried out in the Web retrieval system A and in the Web retrieval system B. The results are represented in the following Table 1 (the coinciding URLs are in bold). The column on the right shows the weight given to each URL (in bold) depending on its position (the relevance factor):

**Table 1:** **Example of calculation of the cosine function adapted to our experiment**
Web retrieval system A	Web retrieval system B	Weight
http://www.first.com	http://www.first.com	1
http://www.second.com	http://www.two.com	0.99
http://www.third.com	http://three.com	0.98
http://www.fourth.com	http://www.second.com	0.97
http://www.fifth.com	http://www.fifth.com	0.96
http://www.sixth.com	http://www.sixth.com	0.95
http://www.seventh.com	http://www.third.com	0.94
http://www.eighth.com	http://www.eight.com	0.93

As previously mentioned, there is a distribution of elements with two characteristics: URL and 'weight' or relevance factor of the objective. In order to determine the similarity it is necessary to reduce the distribution achieved to a common n-dimensional space where 'n' is the number of coinciding URLs for each of the pair of vectors of the results plus the number of the relevant URLs found by each separate search engine. In order to do this, the initial vectors of the results become two individual vectors composed of the values of the relevance factor presented by each URL in the original search engines. This transformation of the vector space leads to Table 2 which represents the vector of global result and the vectors V (Web retrieval system A) and V (Web retrieval system B):

**Table 2:** **Transformation of the initial vectors of our experiment.**
Result vector	V(Web retrieval system A)	V(Web retrieval system B)
http://www.first.com	1	1
http://www.second.com	0.99	0.97
http://www.third.com	0.98	0.94
http://www.fourth.com	0.97	0
http://www.fifth.com	0.96	0.96
http://www.sixth.com	0.95	0.95
http://www.seventh.com	0.94	0
http://www.eighth.com	0.93	0
http://www.two.com	0	0.99
http://three.com	0	0.98
http://www.eight.com	0	0.93

Now we can calculate the cosine function with these two new vectors:

**Table 3: Final calculation of the similarity function**
Result vector	V(A)= V(Web retrieval system A)	V(B)= V(Web retrieval system B)	V(A)•V(B) (scalar product)	[V(A)]²	[V(B)]²
http://www.first.com	1	1	1	1	1
http://www.second.com	0.99	0.97	0.96	0.98	0.94
http://www.third.com	0.98	0.94	0.92	0.96	0.88
http://www.fourth.com	0.97	0	0	0.94	0
http://www.fifth.com	0.96	0.96	0.92	0.92	0.92
http://www.sixth.com	0.95	0.95	0.90	0.90	0.90
http://www.seventh.com	0.94	0	0	0.88	0
http://www.eighth.com	0.93	0	0	0.86	0
http://www.two.com	0	0.99	0	0	0.98
http://three.com	0	0.98	0	0	0.96
http://www.eight.com	0	0.93	0	0	0.86
Total			4.7	7.44	7.44

The result is obtained by dividing the scalar product, 4.70, by the square root of the product of the two sums of squares (√(7.44 × 7.44) = 7.44), which gives a value of 0.63. This means that the systems A and B coincide in 63% for this search.
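The calculation of Tables 1 to 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the code of our meta-searcher: the function names are hypothetical, and the weighting rule (1 for the first position, decreasing by 0.01 per position) is an assumption inferred from the weights shown in Table 1.

```python
import math

def position_weights(urls, step=0.01):
    """Map each URL to its relevance factor: 1 for the first position,
    then 1 - step, 1 - 2*step, ... (rule assumed from Table 1)."""
    weights = {}
    for rank, url in enumerate(urls):
        # Keep the weight of the first (best) occurrence of a duplicated URL.
        weights.setdefault(url, 1.0 - step * rank)
    return weights

def weighted_cosine(urls_a, urls_b, step=0.01):
    """Cosine of the two position-weighted vectors built over the union
    of the URLs returned by both systems (missing URLs weigh 0)."""
    wa = position_weights(urls_a, step)
    wb = position_weights(urls_b, step)
    # Only coinciding URLs contribute to the scalar product.
    dot = sum(wa[u] * wb[u] for u in set(wa) & set(wb))
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# The two responses of Table 1.
system_a = ["http://www.first.com", "http://www.second.com",
            "http://www.third.com", "http://www.fourth.com",
            "http://www.fifth.com", "http://www.sixth.com",
            "http://www.seventh.com", "http://www.eighth.com"]
system_b = ["http://www.first.com", "http://www.two.com",
            "http://three.com", "http://www.second.com",
            "http://www.fifth.com", "http://www.sixth.com",
            "http://www.third.com", "http://www.eight.com"]

similarity = weighted_cosine(system_a, system_b)
print(round(similarity, 2))  # 0.63, as in Table 3
```

Because the weights depend on position, two responses containing exactly the same documents in a different order obtain a similarity below 1, which is the behaviour the relevance factor is intended to produce.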

Our original results.

Applying this method, the obtained results of our original experiment are shown in Table 4.

**Table 4: Average similarities obtained in our experiment of 2002**
Pos	AW-GO	AW-MS	AW-TE	AW-WI	AV-AW	AV-GO	AV-MS	AV-TE	AV-WI	GO-MS	GO-TE	GO-WI	MS-TE	MS-WI	TE-WI
1-10	0.21	0.10	0.12	0.16	0.13	0.10	0.05	0.05	0.12	0.15	0.17	0.15	0.3	0.13	0.15
11-20	0.22	0.10	0.14	0.16	0.14	0.12	0.06	0.06	0.11	0.17	0.17	0.18	0.3	0.15	0.16
21-30	0.23	0.12	0.15	0.17	0.14	0.13	0.07	0.07	0.12	0.19	0.20	0.20	0.3	0.17	0.16
Pos: set of analysed documents. AW: All the Web. GO: Google. MS: Microsoft Network. TE: Terra Networks. WI: Wisenut. AV: Altavista.

The greatest similarities, of about 30%, were obtained by two second-level Web retrieval systems (in terms of number of documents indexed and proportion of Web searchers): Terra and MSN. At that time, these systems shared the same search technology (Inktomi) and had a high proportion of indexed documents in common, which explains this high level of similarity in comparison to the rest. The average value of the obtained similarity for the first ten and twenty documents was 0.15, and 0.16 for the first thirty documents. These values confirmed the general idea of a coincidence level of 15%.

What has happened in the World Wide Web in recent years?

So many changes have happened in the Web in the last five years that many people, such as O'Reilly (2005), say that we are in the age of Web 2.0. In the field of Web retrieval systems, many developments have also occurred, which we can summarise as follows:

The overlap of the search engines has been poorly evaluated. The data from the Search Engine Showdown blog (Notess 2005), mentioned above, are not updated and we have only the data from Jansen and Spink (2006). The main analysts, such as Nielsen/Netratings, Comscore or searchenginewatch.com, offer a different kind of information. It is difficult to locate updated data about the similarity of the response of Web retrieval systems.
An important set of new developments has occurred in Web retrieval systems. Through mergers and takeovers, Yahoo! has integrated the Altavista and All the Web technology and it became a search engine in February 2004. Microsoft has replaced MSN with the portal live.com. These two systems, together with Google, compete over the size of their collections of indexed documents, which already surpass twenty billion documents.
The users basically use three Web retrieval systems: Google (58%), Yahoo! (18%) and live.com (12%), and exceptionally America Online (4.5%) and Ask (3%). The data vary slightly from one source of information to another (Nielsen/Netratings (2007) and Comscore (2008)), and they vary more depending on the part of the planet under analysis (for example, in Europe, Google reaches much higher percentages, around 75% of the searches). The most important fact is that these five systems concentrate 90% of the daily searches of the Web. The importance of the rest is residual.

Bearing this information in mind, we can extrapolate the obtained results in 2002 by allocating the results of MSN to live.com, and the best values of similarity reached by All the Web or Altavista with each engine to Yahoo!, which is the system which, to a degree, has replaced them. Finally, the engines that people do not use will be removed because they represent only ten percent of searches carried out routinely by Web users. As a result, we have the adapted Table 5.

**Table 5: Average similarities obtained in our experiment of 2002**
Pos/Sim	Yahoo! - Google	Yahoo! - live.com	Google - live.com
1-10	0.21	0.10	0.15
11-20	0.22	0.10	0.17
21-30	0.23	0.12	0.19
Pos: set of analysed documents. Sim: similarity
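The extrapolation rule described above (MSN stands for live.com, and Yahoo! takes the best value reached by All the Web or Altavista with each engine) can be expressed as a short sketch. The dictionary layout and the function name are hypothetical; the values are copied from the relevant columns of Table 4.

```python
# Columns of Table 4 needed for the extrapolation (2002 experiment).
TABLE_4 = {
    "1-10":  {"AW-GO": 0.21, "AW-MS": 0.10, "AV-GO": 0.10, "AV-MS": 0.05, "GO-MS": 0.15},
    "11-20": {"AW-GO": 0.22, "AW-MS": 0.10, "AV-GO": 0.12, "AV-MS": 0.06, "GO-MS": 0.17},
    "21-30": {"AW-GO": 0.23, "AW-MS": 0.12, "AV-GO": 0.13, "AV-MS": 0.07, "GO-MS": 0.19},
}

def adapt_row(row):
    """Map a 2002 row onto the three systems studied in 2007."""
    return {
        # Yahoo! inherits the best value of All the Web or Altavista.
        "Yahoo! - Google": max(row["AW-GO"], row["AV-GO"]),
        "Yahoo! - live.com": max(row["AW-MS"], row["AV-MS"]),
        # live.com inherits the values of MSN.
        "Google - live.com": row["GO-MS"],
    }

table_5 = {positions: adapt_row(row) for positions, row in TABLE_4.items()}
for positions, row in table_5.items():
    print(positions, row)
```

In all three rows the maximum comes from All the Web, so the Yahoo! columns of Table 5 coincide with the AW-GO and AW-MS columns of Table 4.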

September 2007 (our second experiment).

Since it is necessary to have updated data and it is not possible to continue doing this type of analysis manually, we decided to develop a meta-searcher, which could perform the search equations of the systems that are the objective of our study and, also, which could automatically determine the similarity of the obtained results. Consequently, we simplify this kind of experiment and are able to update the results.

The implementation of the meta-searcher.

Our development is based on taking the application programming interfaces (APIs) that each search engine provides and integrating them within one system. Through the analysis of each of these modules, we confirm that, on the way towards supremacy in the market for information retrieval on the Web, the main retrieval systems establish their own rules. Some, such as Google, seriously restrict the number of results obtained (until the beginning of December 2007 it offered eight documents, but now it offers 32 results), or they force the use of proprietary programming languages, as in the case of live.com. Currently, Yahoo! is the search engine which enables the least restricted use of its indexes. As explained previously, the new Google API restricted the number of results obtained for each query to eight. This was a constraint in the first phase of our study, carried out in September 2007: it was only possible to take eight samples from each engine for each query. Nevertheless, we do not think that this factor devalues the results obtained at the end of the study since, in any case, we would have achieved a comparison of almost all of the first page of the results given by each engine. The next step in the process requires the implementation of the code needed to interrogate each engine and to retrieve the result sets given by each, in order to create a database of samples, which will be studied and analysed.

In view of the diversity of the APIs given by each system, it is necessary, as far as possible, to establish a programming criterion in which to place the core of the meta-searcher and, from this, to lay out the rest of the parts of this tool. The end result will be a tool in which the code is partitioned depending on its functionality. In short, we have the following:

Navigation core based on the PHP and JavaScript programming languages, and on HTML code.
Search engine. We differentiate the fragments for each search engine. Google: carries out the communication through JavaScript and using AJAX technology. Yahoo!: uses communication through REST queries using PHP code. live.com: has a Web Service which can be accessed through a SOAP client written in PHP.
Database interface. It is written in PHP, and it is responsible for the communication with the MySQL database.

Figure 1: The interface of our meta-searcher operating with Google

The database of our meta-searcher allows the results of each search to be stored. It compiles the URL, title, description, ranking, and query-engine relation of each result. It also has a table for storing the statistics of each query.
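As an illustration of this storage layer, the following sketch shows one possible layout. It is not our actual schema: the table and column names are hypothetical, and Python's built-in `sqlite3` module is used instead of PHP and MySQL only so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: one table for queries, one for the ranked results
# of each engine, and one for the similarity statistics of each query.
SCHEMA = """
CREATE TABLE search_query (
    id    INTEGER PRIMARY KEY,
    terms TEXT NOT NULL UNIQUE
);
CREATE TABLE search_result (
    id          INTEGER PRIMARY KEY,
    query_id    INTEGER NOT NULL REFERENCES search_query(id),
    engine      TEXT NOT NULL,
    position    INTEGER NOT NULL,
    url         TEXT NOT NULL,
    title       TEXT,
    description TEXT,
    UNIQUE (query_id, engine, position)
);
CREATE TABLE query_stats (
    query_id   INTEGER NOT NULL REFERENCES search_query(id),
    engine_a   TEXT NOT NULL,
    engine_b   TEXT NOT NULL,
    similarity REAL NOT NULL,
    PRIMARY KEY (query_id, engine_a, engine_b)
);
"""

def create_store(path=":memory:"):
    """Open the database and create the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def save_results(conn, terms, engine, hits):
    """Store the ranked hits (url, title, description) of one engine."""
    row = conn.execute(
        "SELECT id FROM search_query WHERE terms = ?", (terms,)).fetchone()
    if row is None:
        query_id = conn.execute(
            "INSERT INTO search_query (terms) VALUES (?)", (terms,)).lastrowid
    else:
        query_id = row[0]
    conn.executemany(
        "INSERT INTO search_result "
        "(query_id, engine, position, url, title, description) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [(query_id, engine, position, url, title, description)
         for position, (url, title, description) in enumerate(hits, start=1)])
    conn.commit()
    return query_id

def ranked_urls(conn, query_id, engine):
    """Return the URLs of one engine for one query, in ranking order."""
    rows = conn.execute(
        "SELECT url FROM search_result "
        "WHERE query_id = ? AND engine = ? ORDER BY position",
        (query_id, engine))
    return [url for (url,) in rows]
```

Keeping the position of every result is what makes it possible to recompute the position-weighted similarity later, for any pair of engines and any cut-off.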

The repetition of the experiment.

In order to test whether the current situation differs from 2002, the experiment was repeated by entering in the meta-searcher the same queries used at that time:

**Table 6: Set of queries used in the experiments**
Turismo rural en la Sierra del Segura	Alquiler de apartamentos en Málaga
Historia del Camino de Santiago	Curso a distancia de Programación en PHP
Principio de incertidumbre de Heisenberg	Diseño de sistemas multimedia para el aprendizaje
Academias de idiomas en Valencia	Discurso del Método de Descartes
Diseño accesible a páginas Web	Recetas de cocina y dieta mediterránea
Teoría de la Evolución de Darwin	Semana Santa en Murcia
Bibliografía de Miguel de Unamuno	Estrategias de Representación del Conocimiento
Galerías de Arte en Murcia	Empresas de fabricación de calzado en Alicante
Influencia de la televisión en los niños	Apuntes de Sistemas Digitales
Apuntes de Estadística Descriptiva	Modelos pedagógicos para la educación a distancia
Principio de Conservación de la Energía	Librerías de antiguo en España
Apuntes de Historia del Arte Barroco	Temario de Oposiciones de Matemáticas en Secundaria
Recopilación de Legislación en Derecho Civil	Historia de la ciudad de Ceuta
Compra-Venta de automóviles de segunda mano en Madrid	Evaluación de la calidad de la enseñanza universitaria
Literatura Española en el Siglo de Oro	Plan de Estudios de Licenciado en Comunicación Audiovisual

The similarity results of the first eight documents for each system in this second experiment (September 2007) are shown in Table 7.

**Table 7: Average similarities obtained in our second experiment in 2007**
Pos/Sim	Yahoo! - Google	Yahoo! - live.com	Google - live.com
1-8	0.17	0.15	0.11
Pos: set of analysed documents. Sim: similarity

Assuming that there may be a slight difference between these results and those we would have obtained had we been able to determine the similarity of the first ten documents instead of the first eight, we can compare these results with those obtained in 2002, and we can see that the average similarity values in the response decrease very little (from 15% to 14%). This means that the similarity between the first results of each system is stable. This is surprising, since the size of the indexes has increased considerably and consequently there are many more documents on any subject, so coincidence should be much less likely (or so we thought). Even if ten or so documents are a very small sample with which to evaluate the similarity of the reply, we cannot ignore the large number of Web users who only read the first page of the system output. Within this context, the coincidence value gains much more significance and importance.

The present.

We have previously commented that, at the end of 2007, Google announced that its API had enlarged the maximum number of documents obtained with each query from eight to thirty-two. This obviously helped our experiment and we introduced a set of changes in the meta-searcher. This was not as easy as we expected, because Google presented the results in the form of four pages of eight results. This made the analysis of the response more difficult, since the other APIs give the results in a single list. Nevertheless, we managed to make the necessary changes. Currently, the meta-searcher can determine the similarity values up to the first thirty-two documents obtained by the three systems under analysis. In this case, the searches were made in two languages, English and Spanish, in order to verify whether the similarity is influenced by the language. The following results were obtained by repeating the experiment with the new limit:

**Table 8: Average similarities obtained with the limit of thirty-two documents per reply**
Pos/Sim	Yahoo! - Google	Yahoo! - live.com	Google - live.com
1-32 (Spanish)	0.19	0.21	0.14
1-32 (English)	0.15	0.11	0.10
Pos: set of analysed documents. Sim: similarity

In comparison to the results obtained in 2002, there are variations in the similarity engine by engine but not in the average value, which is 0.18 in both cases. This value is slightly higher than when only the first eight documents of the response are analysed. The current experiments seem to indicate that Google and live.com offer the most different responses, although in fact the differences are still small. The behaviour of the questions made in English is very similar although the average values are lower (0.12). We think that it is logical because the space in which to locate documents is broader (in the Web context there are more documents written in English than in Spanish).
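The averages quoted in this comparison can be checked with a short sketch that takes the mean of the three pairwise values of each row. The labels are ours, and we assume here that the 2002 average of 0.18 refers to the 21-30 row of Table 5.

```python
from statistics import mean

# Pairwise similarities in the order Yahoo!-Google, Yahoo!-live.com,
# Google-live.com, copied from Tables 5, 7 and 8.
PAIR_SIMILARITIES = {
    "2002, positions 1-10 (Table 5)": [0.21, 0.10, 0.15],
    "2002, positions 21-30 (Table 5)": [0.23, 0.12, 0.19],
    "2007, positions 1-8 (Table 7)": [0.17, 0.15, 0.11],
    "positions 1-32, Spanish (Table 8)": [0.19, 0.21, 0.14],
    "positions 1-32, English (Table 8)": [0.15, 0.11, 0.10],
}

averages = {label: round(mean(values), 2)
            for label, values in PAIR_SIMILARITIES.items()}

for label, value in averages.items():
    print(f"{label}: {value:.2f}")
```

Under that assumption, the sketch gives 0.15 and 0.18 for 2002, 0.14 for the first eight documents in 2007, and 0.18 (Spanish) and 0.12 (English) for the limit of thirty-two documents.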

Complementing the experiment with 'short queries'.

One of the experiments most often repeated by researchers on information searching is to determine the average length of the searches. Spink and Jansen (2004) give more information about it, noting that the average is small (between one and two words per search). Our previous searches were intended to simulate the behaviour of students interested in some subject ('Heisenberg Principle', 'Einstein Theory of Relativity', 'History of Ceuta', etc.) or the behaviour of an ordinary person who wishes to spend a holiday in Málaga or to buy a second-hand car. The truth is that, as our average search length exceeds the normal one, it does not reflect the general user's behaviour.

It would be interesting to verify if the average values of similarity vary with shorter searches. In order to replicate the behaviour of current Web users, we extracted the terms for the new queries from the 2006 Top 10 US Search Ranking (Kopytoff 2007), choosing the ten most commonly used terms in each of the three Web retrieval systems, forming a new group of thirty queries closer to the Web context and their users.

**Table 9: Terms most searched in USA during the year 2006.**
Google	Yahoo!	live.com
Bebo	Britney Spears	MySpace
MySpace	WWE	Dictionary
World Cup	Shakira	Games
Metacafe	Jessica Simpson	Cars
Radioblog	Paris Hilton	Food
Wikipedia	American Idol	Song lyrics
Video	Beyonce Knowles	Poems
Rebelde	Chris Brown	New York
Mininova	Pamela Anderson	Baby names
Wiki	Lindsay Lohan	Music

In the same vein, we only calculated the similarity of the first ten results of each Web retrieval system. As the next table shows, the average similarity values in this new experiment barely change with respect to the previous ones (0.14 against 0.18). A novelty is that null similarity values appear in some cases (for example, the query 'American Idol' has no document in common between the Yahoo! and Windows Live responses); this did not occur frequently in the previous searches, which used longer search formulations.

**Table 10: Average similarities obtained with 'short queries'**
Query	Yahoo! - Google	Yahoo! - live.com	Google - live.com
American Idol	0.2590	0.0000	0.2604
Baby names	0.0000	0.0000	0.5060
Bebo	0.1262	0.3800	0.1328
Beyonce Knowles	0.2512	0.1315	0.1288
Britney Spears	0.1315	0.1235	0.3773
Cars	0.2383	0.0000	0.0000
Chris Brown	0.0000	0.0000	0.0000
David Beckham	0.2643	0.3827	0.1328
Dictionary	0.1249	0.2563	0.1301
Food	0.1301	0.1173	0.0000
Games	0.2446	0.0000	0.0000
Jessica Simpson	0.1315	0.0000	0.2578
Lindsay Lohan	0.2576	0.1315	0.1315
Metacafe	0.0000	0.0000	0.2604
Mininova	0.0000	0.0000	0.1342
Music	0.0000	0.2434	0.0000
MySpace	0.1315	0.1315	0.1288
New York	0.0000	0.0000	0.0000
Pamela Anderson	0.2525	0.0000	0.1315
Paris Hilton	0.1342	0.2590	0.2564
Poems	0.0000	0.0000	0.1211
Radioblog	0.2510	0.3786	0.0000
Rebelde	0.3787	0.0000	0.0000
Shakira	0.3699	0.1211	0.3827
Song lyrics	0.0000	0.0000	0.2526
Video	0.0000	0.0000	0.0000
Wiki	0.3892	0.1315	0.2500
Wikipedia	0.2656	0.3945	0.3906
World Cup	0.1274	0.1342	0.1185
WWE	0.0000	0.1302	0.2656
Averages	0.1486	0.1149	0.1583

The immediate future

We have conceived two lines of work for the immediate future. The first, more focused on measuring performance effectiveness, intends to overcome the limit of thirty-two documents in each analysed query, reaching a total of 100 documents. The second line is closer to the user's behaviour: we shall try to incorporate a set of parameters related to other emerging patterns in Web search, such as Web query reformulations, the distribution of search terms and the use of relevance feedback. If we are able to incorporate some of these aspects in our work, we will be able to show the real Web context. We can also repeat these experiments using only one language, instead of both Spanish and English as reported here: the results of the two experiments on search query length could be of interest when the two languages are compared.

Another objective is to extend the number of Web retrieval systems studied. Initially we considered incorporating the API of Ask (Antezeta 2007) or of any of its associated systems. However, after a subsequent study, we have opted for a comprehensive redesign of our meta-searcher, because we can only use a small number of available APIs and they are very limited by their owners. So, we are working on the initial implementation of a new version of our meta-searcher, more complete and powerful, and more independent of the design of the Web retrieval systems.

This extension of the reach of our meta-searcher has another objective: to determine the distance between the reply of a Web retrieval system to an individual question and the ideal reply for this information need. This reply would be based on the semantics of the replies given by each of the Web retrieval systems, using singular value decomposition techniques with an approach similar to the automatic assignment of submitted manuscripts to reviewers proposed by Dumais and Nielsen (1992). We would need to make use of a wider and stronger information base, so that we may increase the number of analysed documents. Increasing the number of sources will also be essential.

Conclusions

We think it is possible to approach the performance evaluation of Web retrieval systems from the perspective of information searching, incorporating several aspects of Web users' trends and habits into the design of an evaluation methodology. The indexes of the Web retrieval systems are very different. Each system seems to have indexed different spaces in the Web, with very little overlap, mainly in the first documents of the replies (which is unlikely to lead to user satisfaction). This overlap is slightly lower for documents written in English than for those written in Spanish. Certainly, the different criteria for implementing the ranking algorithms contribute to this. Our study confirms that there is little similarity between the responses of these systems. Of the analysed Web retrieval systems, Google and live.com have the least overlap, and for an exhaustive information search it is necessary to employ several Web retrieval systems at the same time.

The number of search terms does not introduce significant differences in the similarity of the reply; it is not a decisive factor for search engines. It may be that a set of thirty documents is too small for comparison, given the magnitude of the indexes of the Web retrieval systems. These conclusions need to be confirmed by extending the number of documents in the analysed sample. Having more information and a consistent basis for calculation (in terms of the number of documents obtained and the sources of information analysed) can help us to create an ideal response for a given information request. If we include in the calculation factors that are close to the users, we will be calculating the ideal response for a user. Undoubtedly, this would be a significant achievement.

Acknowledgements

This work and our stay in Vilnius would not have been possible without the invaluable help of Tom Wilson and Elena Maceviciute. We offer them our sincere thanks.

References

Antezeta Internet Marketing. (2007). Decrypting Ask's Web search API. Milan: Antezeta Internet Marketing. Retrieved 9 December, 2007 from http://www.antezeta.com/ask/decrypting-ask-web-search-api.html (Archived by WebCite® at http://www.webcitation.org/5cy0sz7R2)
Berners-Lee, T., Hendler, J. & Lassila, O. (2001). The semantic Web. Scientific American, 284(5), 34-56. Retrieved 9 December, 2008 from http://www.sciam.com/article.cfm?id=the-semantic-web (Archived by WebCite® at http://www.webcitation.org/5cy10NNFR)
Burns, E. (2007). U.S. search engine rankings, September 2007. New York, NY: Incisive Media. Retrieved 9 December, 2008 from http://searchenginewatch.com/showPage.html?page=3627654 (Archived by WebCite® at http://www.webcitation.org/5cy5AWkOw)
comScore. (2008). comScore data center: measuring the digital world. Reston, VA: comScore. Retrieved 9 December, 2008 from http://www.comscore.com/press/data.asp (Archived by WebCite® at http://www.webcitation.org/5cy1C60y1)
Chu, H. & Rosenthal, M. (1996). Search engines for the World Wide Web: a comparative study and evaluation methodology. In Global Complexity: Information, Chaos and Control: ASIS 1996 Annual Meeting - October 19 - 24 1996. Electronic Proceedings. Silver Spring, MD: American Society for Information Science. Retrieved 9 December, 2008, from http://www.asis.org/annual-96/ElectronicProceedings/chu.html (Archived by WebCite® at http://www.webcitation.org/5cy1i3cuD)
Dumais, S. & Nielsen, J. (1992). Automating the assignment of submitted manuscripts to reviewers. In Nicholas Belkin, Peter Ingwersen & Annelise Mark Pejtersen, (Eds.) Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval. (pp. 233-244). New York, NY: ACM Press.
Hersh, W.R. et al. (1995). Towards new measures of Information Retrieval Evaluation. In Edward A. Fox, Peter Ingwersen & Raya Fidel, (Eds.) Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval 1995, Seattle, Washington, United States. (pp. 164-170). New York, NY: ACM Press.
Gordon, M. & Pathak, P. (1999). Finding information on the World Wide Web: the retrieval effectiveness of search engines. Information Processing and Management, 35(2), 141-180.
Jansen, B.J., Spink, A., Bateman, J. & Saracevic, T. (1998). Real life information retrieval: a study of user queries on the Web. SIGIR Forum, 32(1), 5-17.
Jansen, B. & Spink, A. (2006). How are we searching the World Wide Web? A comparison of nine search engine transaction logs. Information Processing and Management, 42(1), 248-263.
Johnson, F.C., Griffiths, J.R. & Hartley, R.J. (2001). DEVISE: a framework for the evaluation of Internet search engines. Manchester, U.K.: Manchester Metropolitan University, Centre for Research in Library and Information Management. (Library and information commission research report 100) Retrieved April 20, 2008 from http://bit.ly/iD8X (Archived by WebCite® at http://www.webcitation.org/5cy2Seqv7)
Kopytoff, V. (2006, December 25). Year's top search terms. San Francisco Chronicle, Retrieved 9 December, 2008 from http://bit.ly/rjo9 (Archived by WebCite® at http://www.webcitation.org/5cy2bvkQZ)
Landoni, M. & Bell, S. (2000). Information retrieval techniques for evaluating search engines: a critical overview. Aslib Proceedings, 52(3), 124-129.
Leighton, H.V. & Srivastava, J. (1999). First 20 precision among World Wide Web search services (search engines). Journal of the American Society for Information Science, 50(10), 870-881.
Ljosland, M. (2000a). Evaluation of Web search engines and the search for better ranking algorithms. Paper presented at the SIGIR99 Workshop on Evaluation of Web Retrieval, Aug 19, 1999. Retrieved 9 December, 2008, from http://www.aitel.hist.no/~mildrid/dring/paper/SIGIR.html (Archived by WebCite® at http://www.webcitation.org/5cy36Ff3n)
Ljosland, M. (2000b). Evaluation of twenty Web search engines on ten rare words ranking algorithms. Retrieved 9 December, 2008, from http://www.aitel.hist.no/~mildrid/dring/paper/Comp20.doc (Archived by WebCite® at http://www.webcitation.org/5cy3Ehu14)
Marchionini, G. (2004). From information retrieval to information interaction. In Sharon McDonald & John Tait, (Eds.) Advances in information retrieval. (pp. 1-11). New York, NY: Springer-Verlag. Retrieved 9 December, 2008 from http://ils.unc.edu/~march/ECIR.pdf (Archived by WebCite® at http://www.webcitation.org/5cy3fQ5xA)
Martínez Méndez, F.J. (2002). Propuesta y desarrollo de un modelo para la evaluación de la recuperación de información en Internet. Unpublished doctoral dissertation, Universidad de Murcia, Spain. Retrieved 9 December, 2008 from http://www.cervantesvirtual.com/FichaObra.html?Ref=10010&ext=pdf&portal=0 (Archived by WebCite® at http://www.webcitation.org/5cy3y6WDb)
Martínez Méndez, F. J. & Rodríguez Muñoz, J.V. (2003). Síntesis y crítica de las evaluaciones de la efectividad de los motores de búsqueda en la Web. Information Research, 8(2), paper 148. Retrieved Jan 18, 2008, from http://InformationR.net/ir/8-2/paper148.html (Archived by WebCite® at http://www.webcitation.org/5cy4BQPKY)
Nielsen/NetRatings. (2007). Nielsen online reports topline U.S. data for November 2007. Retrieved Jan 7, 2008, from http://www.nielsen-online.com/pr/pr_071210.pdf (Archived by WebCite® at http://www.webcitation.org/5cy4OCJKd)
Notess, G.R. (2005). Search engine statistics. Retrieved 9 December, 2008 from http://www.searchengineshowdown.com/statistics/ (Archived by WebCite® at http://www.webcitation.org/5cy4UWxLZ)
Oppenheim, C., Morris, A., McKnight, C. & Lowley, S. (2000). The evaluation of Web search engines. Journal of Documentation, 56(2), 190-211.
O'Reilly. (2005). What is Web 2.0? Design patterns and business models for the next generation of software. Retrieved Jan 10, 2008, from http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-Web-20.html (Archived by WebCite® at http://www.webcitation.org/5cy4j7tYQ)
Salton, G. & McGill, M.J. (1983). Introduction to modern information retrieval. New York, NY: McGraw-Hill.
Schlichting, C. & Nilsen, E. (1997). Signal detection analysis of Web search engines. Redmond, WA: Microsoft Corporation. Retrieved 9 December, 2008, from http://www.microsoft.com/usability/Webconf/schlichting/schlichting.htm (Archived by WebCite® at http://www.webcitation.org/5cy4sTIWL)
Spink, A., Bateman, J. & Jansen, B.J. (1998). Searching heterogeneous collections on the Web: behaviour of Excite users. Information Research, 4(2). Retrieved 9 December, 2008, from http://informationr.net/ir/4-2/paper53.html (Archived by WebCite® at http://www.webcitation.org/5cy5GqqKc)
Spink, A. & Jansen, B.J. (2004). A study of Web search trends. Webology, 1(2). Retrieved 9 December, 2008, from http://www.Webology.ir/2004/v1n2/a4.html (Archived by WebCite® at http://www.webcitation.org/5cy5XxXY3)
Tang, M. & Sun, Y. (2003). Evaluation of Web-based search engines using user-effort measures. LIBRES, 13(2). Retrieved 9 December, 2008, from http://libres.curtin.edu.au/libres13n2/tang.htm (Archived by WebCite® at http://www.webcitation.org/5cy5dSGI3)
Zhang, J. & Dimitroff, A. (2005a). The impact of Webpage content characteristics on Webpage visibility in search engine results (Part I). Information Processing and Management, 41(3), 661-690.
Zhang, J. & Dimitroff, A. (2005b). The impact of Webpage content characteristics on Webpage visibility in search engine results (Part II). Information Processing and Management, 41(3), 691-715.
