Information Research, Vol. 8 No. 4, July 2003
This paper reports results of a user study conducted in the UK to evaluate the digital information services and projects of the Joint Information Systems Committee's (JISC) Information Environment (IE), formerly known as the Distributed National Electronic Resource (DNER), from an end-user perspective. The study was undertaken as part of the EDNER project (Formative evaluation of the DNER), a three-year project funded by the JISC. Test criteria for the user study draw upon Quality Attributes which were first posited by Garvin in 1987 and subsequently applied to information services by Brophy in 1998. They have been further modified for this context.
The Joint Information Systems Committee (JISC) Information Environment (IE) (a development from the DNER, the Distributed National Electronic Resource; http://www.jisc.ac.uk/dner/development/IEstrategy.html) is intended to help users in the UK academic sector maximize the value of electronic information resources by developing a coherent environment out of the confusing array of systems and services currently available.
The EDNER Project (Formative Evaluation of the DNER, http://www.cerlim.ac.uk/edner) is funded to undertake evaluation of the developing IE over the full three years of the JISC 5/99 Learning & Teaching and Infrastructure Programme, that is, from 2000 to 2003. The EDNER Project is led by the Centre for Research in Library & Information Management (CERLIM) at the Manchester Metropolitan University with the Centre for Studies in Advanced Learning Technologies (CSALT) at Lancaster University as a partner. This paper reports on work in progress and some of the initial findings of the evaluation team.
Most previous evaluation activity funded by the JISC has either been internal or summative in nature, such as the JUBILEE and JUSTEIS work reported later in this section, and as such the funding of the EDNER project as a formative evaluation over the full three years of the 5/99 Learning and Teaching and Infrastructure Programme is unusual. The EDNER project's evaluation is intended to work with the programme to analyse its ongoing development in addition to its outcomes and to feed back to the programme development team observations and recommendations that enable the programme to be steered whilst it is ongoing. As a result EDNER operates at programme level and is not concerned with evaluation of individual projects, although information from the projects provides vital data for EDNER, particularly in relation to user evaluations of IE services. An issue of formative evaluation is that it may seem to be overly critical of development activity. This is because it is seeking to identify areas where advances or improvements can be made and therefore it would be inappropriate to name any of the services evaluated during the user testing in public documents. Most of the services being evaluated have developed further since this research was undertaken and results reported. Instead detailed reports are submitted primarily to the JISC Development Office which in turn is informing development of the projects and programme.
Evaluation research traditionally makes use of a variety of research methods (Robson, 2002) and the EDNER project has implemented many different research strategies, including:
To this end results of the formative evaluation are continually fed back into the work of those involved with the development of the IE (including IE development projects, government agencies, publishers and commercial hardware/software developers) and members of the Higher Education community who are using it to help with learning and teaching (notably teachers and resource managers).
Other evaluation work recently undertaken for the JISC includes the JUBILEE and JUSTEIS projects. A Joint Information Systems Committee Circular (1999) sought to develop a framework that would complement the work already undertaken by JISC through its Technical Advisory Unit (TAU) and Monitoring and Advisory Unit (MAU). The framework specifically focuses on the development of a longitudinal profile of the use of electronic information services and the development of an understanding of the triggers of and barriers to use (Rowley, 2001). The JUSTEIS project (JISC Usage Survey Trends: Trends in Electronic Information Service) was contracted to survey trends in electronic information service usage and the JUBILEE project (JISC User Behaviour in Information Seeking: Longitudinal Evaluation of EIS) to undertake a longitudinal study of electronic information service use.
JUBILEE and JUSTEIS found that undergraduate students mainly use electronic information systems for academic purposes connected to assessment, although some leisure use was reported, and use of search engines predominated over all other types of electronic information systems. Research postgraduates' pattern of use differed from that of taught postgraduates, with some of the postgraduates using JISC-negotiated services and specialist electronic information systems more than undergraduates. Use of electronic journals by both academic staff and postgraduates was relatively infrequent. Patterns of use of electronic information systems varied among subject disciplines, and academic staff were found to exert a greater influence over undergraduate and postgraduate use of electronic information systems than library staff. In addition, friends, colleagues and fellow students were also influential. Different models of information skills provision and support were found in the different institutions and different disciplines participating in these studies, and Rowley and her colleagues suggest that patterns of use of electronic information systems become habitual.
Evaluation of comparative systems has a long tradition of improving the state of the art in information retrieval (IR) technology. The criterion for the evaluation of performance effectiveness has largely been based on the overall goal of a retrieval system, that is, the retrieval of relevant documents and suppression of non-relevant items. Such evaluations adopt the Cranfield experimental model based on relevance, a value judgement on retrieved items, to calculate recall and precision. These dual measures are then presented together where recall is a measure of effectiveness in retrieving all the sought information and precision assesses the accuracy of the search.
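The two measures described above can be illustrated with a minimal sketch. It assumes that relevance judgements are binary and are available as a set of document identifiers; the function name and the example identifiers are hypothetical and are not taken from any of the studies cited.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant to the query (binary judgements assumed)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    # Precision: share of the retrieved documents that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of the relevant documents that were retrieved.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents retrieved, 5 judged relevant, 3 in common.
p, r = precision_recall(["d1", "d2", "d3", "d7"], ["d1", "d2", "d3", "d4", "d5"])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Note that recall requires knowing every relevant document in the collection, which is feasible only in a controlled test collection.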
Many and varied criticisms and concerns have been levelled at the validity and reliability of a Cranfield approach to IR evaluation (e.g., Ellis, 1984). The core concern centres on the compromise necessary in the definition of relevance for such experimentation. That is, it is necessary to assume that relevance judgements can be made independently and that a user will assess each document without being affected by their understanding of any previous items read. In addition, there is a basic assumption that relevance can ignore the many situational and psychological variables that in the real world affect it (Large et al., 1999). Despite such concerns this approach to IR evaluation has become the traditional approach of retrieval testing, embodied more recently by TREC (Text REtrieval Conferences).
However, the appropriateness of this traditional model is also being questioned when used in the Internet environment, the major limitation being that Internet search engines and the Internet do not provide for a controlled environment. As a result many studies in this area use the precision measure only (Leighton and Srivastava, 1999; Clarke and Willett, 1997). Further difficulty is observed in the lack of standardisation of the criteria for relevance judgements (Tomaiuolo and Packer, 1996; Chu and Rosenthal, 1996). Furthermore, there is a requirement that queries are kept constant across search engines, which results in test queries of the most basic form that may not reflect real use.
Jansen et al. (2000: 208) posit other concerns, primarily that whilst 'Internet search engines are based on IR principles, Internet searching is very different from IR searching as traditionally practised and researched in online databases, CD-ROMs and OPACs'. They base their argument on findings of transaction logs of Excite users, where they reported that there was a low use of advanced searching techniques and that users frequently do not browse results much beyond the first page. In addition, they found that the mean number of queries per user was 2.8, with a number (not specified) of users going on to modify their original query and view subsequent results. The actual queries themselves were short in comparison to searches on regular IR systems; on average a query contained only 2.21 terms. Further to this, Jansen (2000) ran analyses that compared query results with use of advanced techniques, on the one hand, to results without on the other, and found that on average only 2.7 new results were retrieved. From this he posits, 'use of complex queries is not worth the trouble. Based on their conduct, it appears that most Web searchers do not think it worth the trouble either'. He also points out that the behaviour of Web searchers follows the principle of least effort (Zipf, 1949). This has also been recorded by Marchionini (1992: 156) who stated, "humans will seek the path of least cognitive resistance."
Further discussions on the study of student searching behaviour can be seen in Griffiths and Brophy (2002).
The aim of the EDNER user testing was twofold: 1) to develop an understanding of users' searching behaviour when looking for information to satisfy an academic query and, 2) to establish student perceptions of the IE by asking them to assess the quality of IE services according to a range of defined criteria (Quality Attributes, see section 3.1). This was achieved by undertaking two searching days, the first to assess how students locate information (results of which are reported by Griffiths and Brophy, 2002) and the second to identify user evaluations of IE services (reported here).
Test searches were designed (one for each of the fifteen services to be used by the participants) so that they would be of sufficient complexity to challenge the user without being impossible for them to answer. Participants were recruited via Manchester Metropolitan University's Student Union Job Shop and twenty-seven students from a wide course range participated. Each student was paid for his or her participation. One third of the sample consisted of students from the Department of Information and Communications studying for an Information and Library Management degree, while the remaining two thirds of the sample were studying a wide variety of other subjects. The students were at various stages of their courses. No restrictions were placed on them having computer, searching or Internet experience. Testing was conducted in a controlled environment based within the Department of Information and Communications.
Each participant searched fifteen academic electronic information services, thirteen of which were services that formed part of the Information Environment and as such were partially or wholly funded by JISC. Two additional services from commercial sectors were included, one of which was a search engine. Participants were asked to search for information to satisfy the query for each individual service and to comment upon their searching experience via a post-search questionnaire, for example, their satisfaction regarding the relevance of the documents retrieved and satisfaction with the currency of information retrieved. Data gathered via the questionnaires were analysed in two ways: 1) quantitative data were analysed using SPSS (Statistical Package for the Social Sciences), and 2) open response question data were analysed using qualitative techniques.
It should be stressed that this study focused entirely on user-centred evaluation. EDNER is also concerned with expert evaluation, but this aspect of the work will be reported elsewhere.
Garvin's quality attributes have been modified and applied to information services by Brophy (1998). Garvin (1987) originally identified eight attributes that can be used to evaluate the quality of services, and with some changes of emphasis, one significant change of concept and the introduction of two additional attributes (Currency and Usability) they apply well to information and library services. The following describes the quality attributes as further modified for the context of evaluation of Information Environment services.
Performance is concerned with establishing confirmation that a service meets its most basic requirement. These are the primary operating features of the product or service. In Garvin's original formulation, 'performance' related to measurable aspects of a product such as the fuel consumption of a car. However, the concept is fundamentally concerned with identifying basic aspects of a product or service, which users will expect to be present and on which they will make an immediate judgement. One way of thinking about this (following Kano, 2003) is to consider which aspects are capable of causing immediate dissatisfaction but whose presence is almost taken for granted. Sometimes these will be variables, sometimes absolutes. A good example might be the ability of an electronic information service to retrieve a set of documents that matched a user's query: this is fundamental and without it the service is unlikely to be considered further by users. The most basic quality question is then 'Does the service retrieve a list of relevant documents?'. In this study the performance attribute was measured using the criteria 'Are you satisfied that the required information was retrieved' and 'Are you satisfied with the ranking order of retrieved items?' and is primarily concerned with eliciting information about the user's relevance assessment of the items retrieved.
With Conformance the question is whether the product or service meets the agreed standard. This may be a national or international standard or a locally-determined service standard. The standards themselves, however they are devised, must of course relate to customer requirements. For information services there are obvious conformance questions around the utilisation of standards and protocols such as XML, RDF, Dublin Core, OAI, Z39.50 etc. Many conformance questions can only be answered by expert analysts since users are unlikely to have either the expertise or the access needed to make technical or service-wide assessments; as such, participants in this study did not evaluate this attribute.
Features are the secondary operating attributes, which add to a product or service in the user's eyes but are not essential to it, although they may provide an essential marketing edge. It is not always easy to distinguish performance characteristics from features, especially as what is essential to one customer may be an optional extra to another, and there is a tendency for features to become performance attributes over time; inclusion of images into full text databases is an example of a feature developing in this way. The attribute was measured by asking participants which features appealed to them most on each individual service and by identifying which search option/s they used to perform their searches. The qualitative data collected regarding features rated most highly by participants will be summarised in the Results section.
Users place high value on the Reliability of a product or service. For products this usually means that they perform as expected (or better). For electronic information services a major issue is usually availability of the service. Therefore, broken links, unreliability and slow speed of response can have a detrimental effect on a user's perception of a service. Users were asked if they found any dead links whilst searching each service and, if so, whether these dead links impacted on their judgement of the service. Participants were also asked if they were satisfied with the speed of response of the service, a measure which has previously been reported as being important to users by Ding and Marchionini (1997) who stated that 'response time is becoming a very important issue for many users'.
Garvin uses the term Durability, defined as 'the amount of use the product will provide before it deteriorates to the point where replacement or discard is preferable to repair'. In the case of electronic information services this will relate to the sustainability of the service over a period of time. In simple terms, will the service still be in existence in three or five years? This is more likely to be assessed by experts in the field than by end users (although they may have useful contributions on the assessment of the attribute based on comparisons with similar services), and as such was not evaluated during this testing.
For most users of electronic information services an important issue is the Currency of information, that is, how up to date the information provided is when it is retrieved.
Serviceability relates to when things go wrong and is concerned with questions such as 'How easy will it then be to put things right', 'How quickly can they be repaired?', 'How much inconvenience will be caused to the user, and how much cost?' For users of an electronic information service this may translate to the level of help available to them during the search and at the point of need. The availability of instructions and prompts throughout, context-sensitive help and usefulness of help were measured in order to assess responses to this attribute.
Whilst Aesthetics and Image is a highly subjective area, it is of prime importance to users. In electronic environments it brings in the whole debate about what constitutes good design. In a Web environment the design of the home page may be the basis for user selection of services and this may have little to do with actual functionality. A range of criteria was used to measure user responses to this attribute, these being satisfaction with the interface and presentation of features, familiarity with the interface or elements of the interface, and how easy it was to understand what retrieved items were about from the hit list.
Perceived Quality is one of the most interesting of attributes because it recognises that all users make their judgments on incomplete information. They do not carry out detailed surveys of hit rates or examine the rival systems' performance in retrieving a systematic sample of records. Most users do not read the service's mission statement or service standards and do their best to bypass the instructions pages. Yet users will quickly come to a judgement about the service based on the reputation of the service among their colleagues and acquaintances, their preconceptions and their instant reactions to it. Perceived Quality in this study related to the user's view of the service as a whole and the information retrieved from it. This was measured twice, before using the service during the test (pre-perceived quality, where participants were aware of the service prior to testing) and after using the service (post-perceived quality).
The addition of Usability as an attribute is important in any user-centred evaluation. User-centred models are much more helpful when personal preferences and requirements are factored in and as such participants were asked how user friendly the service was, how easy it was to remember what the features/commands meant and how to use them, how satisfied they were with the input query facility and how satisfied they were with how to modify their query.
Figures 1, 2 and 3 represent results of three of the services evaluated during testing, showing the responses for all of the quality attributes measured during this research. Pre- and post-testing results are also given for Perceived Quality (user's perception of the quality of the service and the information retrieved).
The final measure presented here is Overall Assessment. On completion of each task, participants were asked to give an overall rating indicating their satisfaction for each service, taking into account the results retrieved, ease of retrieval, satisfaction with the interface, number of errors made during searching, response time etc. The use of a single overall assessment measure against which other measures can be compared or correlated has been used in other studies in this field (e.g. Su 1992, Johnson et al. 2001, 2003).
All criteria were measured on a Likert type scale apart from those criteria to which a 'Yes' or 'No' answer was recorded. All responses were analysed using SPSS and are presented here as percentages.
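As an illustration of how Likert-type and Yes/No responses might be reduced to the percentages reported here, the following is a minimal sketch. The paper does not state the number of scale points or which points were counted as 'satisfied', so the 5-point coding and the choice of the top two points are assumptions; the analysis itself was carried out in SPSS, not with this code, and the response values below are invented.

```python
# Assumption: 5-point scale where 4 and 5 count as "satisfied".
SATISFIED_POINTS = {4, 5}


def percent_satisfied(responses):
    """Percentage of non-missing Likert responses in the satisfied categories."""
    valid = [r for r in responses if r is not None]  # drop unanswered items
    if not valid:
        return None
    satisfied = sum(1 for r in valid if r in SATISFIED_POINTS)
    return round(100 * satisfied / len(valid))


def percent_yes(responses):
    """Percentage of 'Yes' answers among valid Yes/No responses."""
    valid = [r for r in responses if r in ("Yes", "No")]
    if not valid:
        return None
    return round(100 * valid.count("Yes") / len(valid))


# Invented responses for one criterion on one service.
likert = [5, 4, 4, 3, 2, None, 5, 4]
dead_links = ["Yes", "No", "No", "No", "No"]
print(percent_satisfied(likert))  # 71
print(percent_yes(dead_links))    # 20
```

Whether missing answers are excluded from the denominator, as here, changes the resulting percentage and is a further assumption.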
Service A shows very high levels of satisfaction (92%) that the required information was retrieved and high levels of satisfaction with the ranking of information retrieved (76%). All participants found information through clicking on links as opposed to searching via a simple or advanced option. Few dead links (20% of respondents) were found and none of the participants viewed the service as unreliable. Very high levels of satisfaction with speed of response were recorded (96%). Users were largely satisfied with the currency of the information (64%).
The majority of participants were satisfied that instructions and prompts provided were helpful (68%) and none used the Help facility. A large number of participants were satisfied with the interface (80%), despite half of them (52%) not being familiar with the interface and most were able to understand items from the hit list retrieved (72%). In addition, 84% felt that the interface was user friendly and 96% responded that it was easy to remember what features meant and how to use them. Lower levels of satisfaction were recorded on facilities to input and modify queries (50% and 17%), which is hardly surprising given that all participants used the click-on-link facility to navigate to information rather than a simple or advanced search.
Of the participants who were aware of the service prior to testing 67% perceived the quality of the service and the information contained within it as being of high quality. Following use this rating had risen to 87%. In addition, with regard to their overall assessment of the service and the information they retrieved 88% of students reported that they were satisfied.
In the case of Service B, levels of satisfaction that the required information was retrieved were high at 80%, but satisfaction with ranking, at 46%, was much lower than for Service A. The majority of users navigated to information by clicking on links (76%), with only 12% using the simple search option and 12% using a combination of techniques. A small number of participants (12%) found dead links and of these 4% felt the service was unreliable as a result. Satisfaction with speed of response was very high at 96%. Low levels of satisfaction were recorded on the Currency attribute (35%).
Instructions and prompts were found to be helpful by 35% of participants and the Help facility was not used in any of the cases. Sixty-nine percent reported that they were familiar with the interface, or elements of the interface, and 62% of participants were satisfied with it. Sixty-five percent felt that the hit list was understandable. On the Usability attribute 65% of students felt that the service was user friendly and 77% found features and commands easy to remember and use. Fifty percent of participants reported that they were satisfied with the facility to input their query and only 31% were satisfied with the facility to modify their query. As with Service A, the preferred method of obtaining information was by navigating via click on link as opposed to engaging with a search option.
Perceived Quality, pre- and post-searching, remained static at 69% and overall satisfaction was recorded at 58%.
Service C recorded very high levels of satisfaction on the Performance attribute, with 100% of participants expressing satisfaction that the required information was retrieved and 72% reporting satisfaction with ranking order. The largest group of participants (48%) used a simple search, with very few participants using advanced search (4%). Some participants navigated to information by clicking on links (28%) and 20% used a combination of techniques. No dead links were found and none of the participants felt that the service was unreliable. All participants were satisfied with the speed of response and 78% were satisfied that the information was current.
Almost 69% of participants felt that instructions and prompts were helpful. The Help facility was used in very few cases (4%), but where it was it was found to be of little use. Very high levels of satisfaction were recorded on both the Aesthetics and Usability attributes, with particularly high responses to satisfaction with the interface (96%) and user friendliness (96%). Sixty-nine percent responded that they were familiar with the interface, or elements of it and 77% felt that the hit list was understandable. Eighty-nine percent of the students found the features of the interface easy to remember and 80% were satisfied with the facility to input their query. The facility to modify queries obviously posed some problems to participants, with only 39% satisfied.
Students' satisfaction pre-searching was recorded at 50%; this rose dramatically to 96% post-searching, and overall satisfaction was also reported at 96%.
Table 1 summarises the results for the three Services and highlights those attributes for which participants recorded a particularly high satisfaction rating of 80% or more, indicated by bold-face numbers.
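|Attribute|Service A|Service B|Service C|
|---|---|---|---|
|Required information retrieved|**92%**|**80%**|**100%**|
|Satisfaction with ranking|76%|46%|72%|
|Preferred search option|Click on links (100%)|Click on links (76%)|Simple search (48%)|
|Dead links found|20%|12%|0%|
|Dead links = unreliable|0%|4%|0%|
|Satisfaction with speed of response|**96%**|**96%**|**100%**|
|Instructions and prompts helpful|68%|35%|69%|
|Satisfaction with interface|**80%**|62%|**96%**|
|Familiar with interface|52% not familiar|69%|69%|
|Hit list understandable|72%|65%|77%|
|Easy to remember features|**96%**|77%|**89%**|
|Satisfaction - inputting query|50%|50%|**80%**|
|Satisfaction - modifying query|17%|31%|39%|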
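The highlighting rule used in Table 1 (a satisfaction rating of 80% or more) can be expressed as a short sketch. The percentages below are those reported in the text for a few of the satisfaction criteria; the dictionary layout and function name are illustrative assumptions only.

```python
HIGH_SATISFACTION = 80  # threshold stated for Table 1 (80% or more)
SERVICES = ("A", "B", "C")

# Percentages as reported in the text, in the order Service A, B, C.
results = {
    "Required information retrieved": (92, 80, 100),
    "Satisfaction with ranking": (76, 46, 72),
    "Satisfaction with speed of response": (96, 96, 100),
    "Satisfaction with interface": (80, 62, 96),
}


def highlighted(results, threshold=HIGH_SATISFACTION):
    """Map each criterion to the services that reach the threshold."""
    return {
        criterion: [s for s, value in zip(SERVICES, values) if value >= threshold]
        for criterion, values in results.items()
    }


for criterion, services in highlighted(results).items():
    print(f"{criterion}: {', '.join(services) or 'none'}")
```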
The Features attribute received the highest number of comments from participants, indicating the importance of this attribute to the user. Responses were divided between comments about the search option(s) available and comments about features which users particularly liked and which added value to their experience of the system. Features liked by users included 3D models of settlements, guided tours around geographical areas, animated gifs, interactive maps, a moving image gateway, case studies, e-mailing records to oneself, map search, customising features (found especially useful for current awareness and conference listings) and streaming audio. It will be interesting to see how many of these Features become Performance attributes over time.
One of the main aims of the Information Environment is to provide a managed quality resource for staff and students in higher and further education. During discussions with various stakeholders involved with the development of the system it became clear that common definitions of what is meant by quality electronic resources could not be assumed. Therefore, during testing, participants were asked to indicate what quality meant to them in terms of information available through electronic services, but they were not asked to relate their responses to any one particular service. Four criteria were presented to them, with which they could either agree or disagree. They were also asked to add any additional criteria that were important to them. Table 2 presents their responses.
Additional criteria listed were: 1) links to related areas; 2) understanding language used; 3) resources relevant; 4) speed of response; 5) resources useful; 6) resources valuable; 7) clear information; 8) source; 9) accessible; 10) timeliness; 11) presentation and, 12) references.
These results indicate that participants are confused about the meaning of quality when it comes to assessing academic resources. Viewed in the light of the findings of Cmor and Lippold (2001), who stated that students will give the same academic weight to discussion list comments as peer-reviewed journal articles, it would seem that students are poor evaluators of the quality of academic online resources. The original premise of the Perceived Quality attribute is that users make their judgments about a service on incomplete information and that they will come to this judgment based on its reputation among their colleagues and acquaintances and their preconceptions and instant reactions to it. If the notion of quality conveys so many different meanings to students it poses something of a challenge to the academic community in encouraging students to understand and use quality-assured electronic resources. It is also apparent that, from a methodological perspective, further work is needed to explore the meaning of Perceived Quality and the interpretation of user responses to this area of enquiry. Fundamentally different understandings of information quality could otherwise lead to questionable conclusions being drawn by researchers and service providers.
In a previous article reporting more results of this study Griffiths and Brophy (2002) stated that students either have little awareness of alternative ways of finding information to the search engine route or have tried other methods and still prefer to use Google (a situation we now refer to as the Googling phenomenon). Further to this, even when students are able to locate information it is not always easy for them to do so (even when using Google), and, with a third of participants failing to find information, user awareness, training and education need to be improved. Further work needs to be done to equip students with the awareness and skills to use a much wider range of academic information resources and services.
The services evaluated in this study form part of the developing Information Environment and as such consist mainly of electronic databases containing information on a variety of academic subjects, all of which are directly aimed at staff and students in higher and further education. The responses of the students to the services under consideration varied a great deal but through the use of the Quality Attributes approach it is possible to begin to identify criteria that enhance or hinder students' searching experiences.
Thus, it can be seen that users' perceptions of quality are driven by factors other than just the performance of a system. Looking at the results from Service C, users expressed an increase in post-search Perceived Quality from 50% to 96%, a dramatic increase in satisfaction level. This may be due to the high levels of satisfaction across all of the Attributes, particularly Performance, Reliability, Aesthetics and Usability and resulted in an Overall Assessment of satisfaction of 96%.
In the case of Service A, users expressed an increase in post-search Perceived Quality, from 67% to 87%. The Performance, Reliability, Aesthetics and Usability Attributes all scored very highly and Overall Assessment scored at 88%.
In the case of Service B, Perceived Quality pre- and post-searching remained static at 69%, despite high levels of satisfaction recorded on the Performance and Reliability Attributes. The Overall Assessment of satisfaction level was also comparatively low for Service B at 58%.
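The pre- and post-search comparison above can be set out as simple arithmetic. This sketch uses the Perceived Quality and Overall Assessment percentages reported in the text; expressing the change in percentage points is a presentational choice made here, not a measure used in the study.

```python
# (pre-search, post-search) Perceived Quality, in percent, as reported in the text.
perceived_quality = {"A": (67, 87), "B": (69, 69), "C": (50, 96)}
# Overall Assessment of satisfaction, in percent, as reported in the text.
overall_assessment = {"A": 88, "B": 58, "C": 96}


def change_in_points(pre, post):
    """Change in Perceived Quality, in percentage points."""
    return post - pre


for service, (pre, post) in perceived_quality.items():
    delta = change_in_points(pre, post)
    print(
        f"Service {service}: {pre}% -> {post}% "
        f"({delta:+d} points), overall {overall_assessment[service]}%"
    )
```

Laid out this way, the service with no change in Perceived Quality (Service B) is also the one with the lowest Overall Assessment.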
It is also interesting to note that satisfaction ratings of instructions and prompts (Serviceability Attribute) on Services A and C were comparatively high (68% and 69% respectively), but only 35% of participants were satisfied with the instructions and prompts on Service B. The availability and usefulness of instructions and prompts has been found to influence the success of searching (Griffiths, 1996) and the low level of satisfaction with this attribute, coupled with the reluctance to use Help, may have had an impact on participants' satisfaction with this service.
The students in this study do not see Performance as the prime factor in their evaluations of the services examined in this research. The results reported above, coupled with the many qualitative comments made about the Features, Usability and Aesthetics Attributes, confirm that other criteria matter at least as much as this more traditional measure of system effectiveness. They particularly valued good visuals, clean uncluttered screens and interesting extra features, such as 3D maps and interactive images. They demonstrated a preoccupation with search engines, preferring these as search tools even when using academic electronic resources. They also frequently commented on search input boxes, even when they found the information by navigating to it by clicking on links. There were many comments made about speed of response, for example, 'quick', 'fast', 'straight to info'. This is an interesting criterion that has been shown to be important by other studies (Johnson et al., 2001, 2003). Users seem to be adopting a 'get in, get information and get out' approach. This will, however, be different for those services offering tutorial instruction.
Critically, students are confused about the concept of quality, with a variety of criteria being put forward by the users participating in this study. It would be interesting to see if differences in subjects studied impact on these differing definitions, and whether different subject disciplines have alternative quality requirements and standards.
The usefulness of Quality Attributes as evaluation criteria is that they allow investigation of how a user's perception of a service changes pre- and post-searching, and show that whilst preconceived notions of a service can be negative it is possible to change these perceptions if the service performs well across a number of the Attributes. If pre-search perceptions do not alter it is possible to identify which aspects of a service need to be improved by examining those Attributes that users have scored lower. In addition, because these results demonstrate that measures other than Performance play an important role in student evaluation it is vital that service developers, providers and educators understand that a range of Attributes affects service evaluation. This will be essential if we are to develop electronic resources that become truly embedded into the core of higher and further education. From a methodological standpoint, while the use of a technique to 'unpack' the concept of quality has proved useful, it is apparent that further work is needed to relate users' own definitions of quality in the context of information objects and information services to the modified Quality Attributes set.
Griffiths, J.R. (2003) "Evaluation of the JISC Information Environment: student perceptions of services" Information Research, 8(4), paper no. 160 [Available at: http://informationr.net/ir/8-4/paper160.html]
© the author, 2003.