published quarterly by the university of borås, sweden

vol. 22 no. 1, March, 2017

Proceedings of the Ninth International Conference on Conceptions of Library and Information Science, Uppsala, Sweden, June 27-29, 2016

Mediation machines: how principles from traditional knowledge organization have evolved into digital mediation systems

Kim Tallerås and Nils Pharo

Introduction. We discuss digital information systems’ ability to mediate cultural resources. Mediation techniques embedded in search and recommendation systems are compared with the activities developed for mediating cultural heritage in libraries, archives and museums, the so-called LAM-institutions.
Method. Digital mediation systems are examined in light of theories and techniques from knowledge organization, exemplified with implementations of such theories and techniques in public libraries.
Analysis. Our analysis sheds light on similarities between the digital mediation in recommendation systems and libraries’ mediation of culture, but also reveals some important differences.
Results. We find that digital mediation systems follow many principles and techniques of traditional knowledge organization, such as those related to classification and metadata. Further, they mimic the librarian who knows her users, her knowledge organization systems and her collection. An important challenge is the mechanical rationality embedded in the computation of recommendations, which may limit the user’s exposure to material that is of interest to the user but that the system deems irrelevant.
Conclusion. Digital mediation systems have implemented traditional theories and techniques of knowledge organization, and they can be interpreted as “mediators” in a LAM context. However, their mechanical approach to information behaviour risks adapting inconsistently to users’ emotional needs and facilitating serendipitous discoveries poorly.

Mediation is an essential task in all kinds of libraries, archives and museums, so-called LAM-institutions, that hold a collection of resources to be made available to an audience. In library schools librarians are trained to become skilled intermediaries, able to analyse informational or recreational needs and to connect those needs to relevant resources. Meanwhile, Google claims that their “mission is to organize the world’s information and make it universally accessible and useful.” Their vast suite of search systems, databases and other services has helped them come far in realizing this goal. Does this mean that Google is a mediator, performing the same kind of mediation as LAM-institutions? Although the users’ interaction with Google is “faceless” and based on algorithms, these algorithms use knowledge about human information behavior and needs as a starting point. At the same time companies like Amazon and Netflix use sophisticated algorithms to create tailored recommendations of books and movies for their customers. They can thus be considered digital intermediaries in a communication of filtered and targeted information, from machines to users.

However, systems for “faceless” mediation are not a new phenomenon, not even in LAM-institutions. Different systems and techniques have been used to guide users to relevant documents, from analogue lists of books in antiquity and card files in the twentieth century to contemporary online catalogues. This is the tradition of knowledge organization upon which the digital systems mentioned above partly build. In this paper, we aim to discuss knowledge organizational aspects of digital mediation of cultural products. We start by showing how analogue principles of knowledge organization are implemented in physical libraries. Thereafter we discuss the concept of mediation in light of system-based processes. In the third part, we present digital retrieval and recommendation systems before we discuss some limitations of “mediation machines”.

In the paper, we take mediation of culture in the public library as a starting point and provide some initial examples from this domain of practice. The term “kulturformidling” is used in the Scandinavian library and information science (LIS) literature to denote what we have translated into “mediation of culture” (Grøn, 2010; Tveit, 2004). The concepts are under continuous debate and no English terms directly cover the activities described by the Scandinavian term. Central to common definitions is the intermediary as a person helping users to find books or other cultural products that satisfy their informational or recreational needs.

Mediation in libraries

In a traditional public library, fiction is organized alphabetically on the shelves according to the authors’ last names. Fiction in Norwegian is placed separately from fiction in other languages. The scientific literature is organized according to subject using the Dewey Decimal Classification (DDC). Thus books that the librarians have assessed as being about birds are placed under the DDC number 598, whereas “To Kill a Mockingbird” by Harper Lee is placed alphabetically under “Lee” in the section of English fiction.

The book on birds will be surrounded by other books on birds, animals and natural history whereas Lee’s books will all (or both) be placed together. These examples of providing order by shelving reflect how principles of mediation are embedded in the knowledge organization principles used in physical libraries. Such principles can be general, such as the alphabetical systems, or they may be based on well-established classification standards like the DDC.

In addition to offering direct access to the physical documents, libraries provide searchable access points to their collections via metadata, “data about data”, collected in catalogues. The catalogue has had many forms; organized chronologically in book form; as cardboard cards filed under titles, subjects and author names; and currently in digital form available via the Internet, facilitating access via all metadata recorded for the documents.

Within literary science, different kinds of “meta” information are often referred to as “paratexts”. The Danish library and information scientist Jack Andersen (2002) discusses how metadata in the form of bibliographic records can be interpreted in light of Genette’s (1997) paratext concept, and the consequences of such an interpretation for information retrieval and reading:

For instance, the initial relevance judgments happen when a user is confronted with the bibliographic record. What decisions a user makes as to its relevance are based on the paratextual elements present. That way the bibliographic record affects the reading activity of the user. (Andersen, 2002, p. 59)

The bibliographic record, continues Andersen, viewed as a text, does not provide mere “access” to the document, but is “rather, a matter of indicating what kind of intellectual content is to be expected” (Andersen, 2002, p. 59). Reading starts in the bibliographic record, which guides the document selection and subsequently the reading activity. Thus, metadata not only facilitate document retrieval, but also represent an adjustment and an initial mediation of the documents’ content.

Cataloguing principles as formulated by Cutter (1876) and the International Federation of Library Associations and Institutions (2009) (IFLA) concretely express such adjustments, stating e.g. that the catalogue shall enable the user:

The principles are developed to ease library users’ access to the documents, e.g. by letting the users choose books based on their format (point 4.3). In addition, the principles recommend concrete adaptation of the content, stating that metadata should couple all resources that belong to a particular “work” (point 4.1.2).

Point 4.5 describes a type of navigation across documents which was difficult in a card-based catalogue but has become much easier now that documents are digitized and made available with the help of web technology. An experiment at Oslo public library illustrates how literature mediation has gained from digital technology. Metadata representing works by a selection of important Norwegian authors were analyzed in order to see how well they fit the functional requirements for bibliographic records, the so-called FRBR-model, specified by IFLA (International Federation of Library Associations and Institutions Study Group on the Functional Requirements for Bibliographic Records, 1998). A central part of this model, the so-called group 1-entities, represents different document “conditions”. A “work” represents the intellectual or artistic creation (e.g. Shakespeare’s Macbeth), an “expression” is the form the work takes when it is realized (e.g. the newest Norwegian translation of Macbeth), a “manifestation” is the physical embodiment of the expression (e.g. a pocket book edition), and an “item” is one single copy of the manifestation. The analyzed metadata, created following old cataloguing rules developed for the card catalogue, had a manifestation focus. Queries against the metadata resulted in very complex result lists from the OPAC (Online Public Access Catalog). A query on the author Knut Hamsun returned 585 hits, separately listing, e.g., all editions of the same works, all parts of collected works and all translations. In a physical shelf-based system, this way of organizing knowledge was a necessity. After some restructuring of the data, querying the same data set in an experimental system reduced the result list to 40 genuine Hamsun works (Westrum, Rekkavik, and Tallerås, 2012). The interface facilitated user navigation between different translations and editions of the works. Some studies have indicated that the FRBR-model reflects the users’ mental models of the bibliographic universe (e.g. Pisanski and Žumer, 2010).
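
The effect of collapsing manifestation-level hits into works can be illustrated with a minimal sketch. The records, field names and the function group_by_work below are hypothetical and invented for illustration; they are not taken from the Oslo public library data or from the experimental system, which may well work differently.

```python
from collections import defaultdict

# Hypothetical manifestation-level records. The field names and values are
# illustrative assumptions, not actual catalogue data.
records = [
    {"author": "Hamsun, Knut", "work": "Sult", "language": "nob", "edition": "first edition"},
    {"author": "Hamsun, Knut", "work": "Sult", "language": "eng", "edition": "translation"},
    {"author": "Hamsun, Knut", "work": "Sult", "language": "nob", "edition": "pocket book"},
    {"author": "Hamsun, Knut", "work": "Pan", "language": "nob", "edition": "first edition"},
]


def group_by_work(records):
    """Collapse manifestation-level hits into one entry per (author, work)."""
    works = defaultdict(list)
    for record in records:
        works[(record["author"], record["work"])].append(record)
    return dict(works)


works = group_by_work(records)
for (author, title), manifestations in works.items():
    languages = sorted({m["language"] for m in manifestations})
    print(f"{author}: {title} ({len(manifestations)} manifestations, "
          f"languages: {', '.join(languages)})")
```

In this toy example four hits are reduced to two works, while the individual editions and translations remain reachable from each work entry.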

After these experiments, Oslo public library decided to transfer its bibliographic metadata to a system based on new standards and technologies, since the standards currently in use do not meet the needs of literature mediation in a large public library. These standards, the Anglo-American Cataloguing Rules and Machine-Readable Cataloging (MARC), were developed at the very beginning of digitization in the late 1960s. Both standards were, however, constrained by the leading knowledge organization technology of the time: the card catalogue.

In addition to having fallen behind technologically, the knowledge organization systems have been criticized from philosophical and societal perspectives, raising questions such as: How did classification systems end up with their particular classes? In what worldview have classification systems been developed? How do the classes in a system influence its use? Although DDC has been revised several times since it was first released, the basic classes developed by Melvil Dewey in the late 19th century are still used.

Radford (2003) uses library classification as an example of what Michel Foucault calls “discursive formation”:

Consider the choices made by a cataloger when allocating books to a subject heading, a call number, and a particular place on the library shelf. How does the cataloger do this task? What is the nature of the preexisting subjects (discursive formations) to which a new book can be assigned a place? What are the rules by which a book is assigned to Philosophy and not to History or Language? (Radford, 2003, p. 4)

According to Radford, the classification system, considered as an intermediary in a mediation situation, is based on rules with a discursive potential. Like librarians in a physical library, the classification system conveys one out of several potential world-views. Such discursive formations characterize not only analogue systems for knowledge organization but all kinds of mediation systems based on rules and principles of categorization, including systems developed in a digital context.

“There is no shelf”

In a digital library there are no shelves. The straitjacket requiring that a book can physically stand in only one place is off. Files that contain texts, images, sound and video are retrieved directly or via the metadata describing them. A search for the author Neil Gaiman will also retrieve works Gaiman co-created with other artists, e.g. Terry Pratchett, as well as documents mentioning Gaiman. Books about Norway and World War II are found when both terms are combined in a query. Some systems will know that this is an interest area of yours and will “mediate” it as a result of the simple query “Norway”. The limitations of shelving are replaced by an infinite number of orders of succession.

In principle there are no limitations on what kind of (meta)data can be used to retrieve documents and information. A user may be interested in audio books in Swahili recorded with female voices, and if the information system stores and indexes metadata representing such characteristics, it will be simple to retrieve matching documents. The same user may also be interested in books that are liked by students in sociology. This is another type of information that systems have started to collect and which can be used in a mediation process. Automatic indexing of whole documents can make all terms in a document potential retrieval endpoints. This has an enormous potential for retrieval, but at the same time raises many challenges for literature mediation. Will a user be interested in being presented with all texts containing the term “Gaiman”?
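
Retrieval on arbitrary metadata can be sketched in a few lines. The documents, the field names (format, language, narrator_gender) and the function filter_by_metadata below are hypothetical assumptions made for illustration; real systems would use an index rather than a linear scan.

```python
# Hypothetical metadata records; field names and values are assumptions.
documents = [
    {"title": "Title A", "format": "audiobook", "language": "swa", "narrator_gender": "female"},
    {"title": "Title B", "format": "audiobook", "language": "swa", "narrator_gender": "male"},
    {"title": "Title C", "format": "print", "language": "swa", "narrator_gender": None},
    {"title": "Title D", "format": "audiobook", "language": "nob", "narrator_gender": "female"},
]


def filter_by_metadata(documents, **criteria):
    """Return the documents whose metadata match every given field/value pair."""
    return [
        doc for doc in documents
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


matches = filter_by_metadata(
    documents, format="audiobook", language="swa", narrator_gender="female"
)
print([doc["title"] for doc in matches])
```

Any characteristic that is recorded and indexed can serve as a criterion in the same way, which is the sense in which the digital collection has no single shelf order.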

The flexibility and new possibilities offered by digitization are overwhelming. To retrieve and mediate digital collections, a whole new set of techniques has been developed that partly builds upon the analogue techniques described above and partly is based on analysis of context and user preferences.

Quality assessment in knowledge organization and information retrieval

The purpose of systems for knowledge organization and information retrieval (IR) is to ensure that their users find “documents” (including books, images, music, video, archival records and other media used for representing ideas and knowledge) that may help them solve a task, satisfy an information need or satisfy a need for recreation.

In order to evaluate how well IR systems work, the measures recall and precision are commonly used (Baeza-Yates, 2011). Recall is defined as the number of relevant documents in a retrieved set of documents divided by the number of relevant documents in the collection. Precision is the number of relevant documents in the retrieved set divided by the total number of retrieved documents. Typically, precision and recall measures are used in experimental evaluation processes following the procedure of the so-called Cranfield experiments. The goal of these experiments was to measure the efficiency of indexing systems (Sparck Jones and van Rijsbergen, 1976). Originally, the indexing systems were different types of classification systems or other manual systems. Today the same evaluation model is used for measuring the efficiency of algorithms used for search engines.
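
The two measures follow directly from their definitions. The sketch below assumes that relevance is binary and that documents are identified by simple labels; the document identifiers and the function name precision_recall are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query with binary relevance.

    retrieved: identifiers of the documents the system returned
    relevant:  identifiers of all relevant documents in the collection
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents retrieved, two of them relevant; three relevant in total.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this example half of the retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67).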

Criticism raised against the Cranfield model targets its point of departure in an “objective” assessment of relevance. The critics claim that relevance is individual and context dependent, making it a “fluent” measure of quality which changes over time. Defenders of the Cranfield model, on the other hand, claim that it is a good tool for securing consistent comparison of different systems, since they are compared under equal conditions with controlled variables.

In the late 1960s a counter movement to the system oriented paradigm that Cranfield represented emerged. The American pioneer of information science Robert S. Taylor (1968) pointed out that the information seeker does not necessarily choose optimal strategies when trying to solve his or her needs, although such optimal behaviour is an important assumption behind the recall/precision based evaluation methods. Taylor refers to an empirical study conducted by Victor Rosenberg (1966) to support the claim that “’ease of access’ to an information system is more significant than ‘amount or quality of information’ retrievable” (Taylor, 1968, p. 181). In other words, it is not necessary for the information seeker to invest lots of time and effort to find the “perfect document” as long as she is able to find “good enough” answers. Some years later Nick Belkin developed a “cognitive viewpoint”, pointing out that it is unreasonable to equate information needs with document content:

“[t]he assumption that expression of information need and document text are functionally equivalent also seems unlikely, except in the special case in which the user is able to specify that which is needed as a coherent or defined information structure. A document, after all, is supposed to be a statement of what its author knows about a topic, and is thus assumed to be a coherent statement of a particular state of knowledge. The expression of an information need, on the other hand, is in general a statement of what the user does not know” (Oddy, Belkin, and Brooks, 1982, p. 64)

Marcia Bates was, with her “berrypicking” model (Bates, 1989), among the first to develop an alternative model of user-system interaction. In her model she emphasizes that several types of search behavior may satisfy the user’s information needs. Not all of these can be evaluated using precision and recall. A user may, e.g., browse different potential sources, pick a little bit of information from each source, look at reference lists, and get some ideas from colleagues while continuously reformulating her information need, dependent on what is found. The “berrypicking” metaphor is based on such shifts between “berry patches”. It is a good model to explain how users construct their information needs through iterative processes.

When evaluating how well a system mediates cultural resources, user-centered approaches such as Bates’ model are invaluable. As we shall see, technological development has made it possible to develop more sophisticated IR systems that take user models into account. One example of such systems is recommendation systems. We shall discuss how recommendation systems build upon user knowledge and problematize the challenges of such systems.

Recommendation systems

In full-text IR systems, such as Google, the distribution of terms in the documents has played the most important part in indexing algorithms. The term frequency–inverse document frequency (tf-idf) weight (Sparck Jones, 1972) was developed to reflect how important a term is for representing a document. Other components have been included in the retrieval algorithms, but most of these have been document centric. This also includes Google’s PageRank (Brin and Page, 1998), which was inspired by citation networks and, put very simply, gives weight to web pages depending on their number of incoming links. Relevance feedback (Salton and Buckley, 1990) represents an attempt at implementing user preferences in the retrieval process. Users assess the relevance, explicitly or implicitly, of the retrieved documents and the system uses content in the relevant documents to retrieve documents similar to these. Pseudo-relevance feedback (Efthimiadis, 1996) is particularly interesting, since it is based on the assumption that the highest ranked documents in the retrieved list are relevant and thus the system automatically uses these documents’ content to retrieve the final result set. Web based IR systems also often use “cookies” to collect data about the user in order to build profiles to tailor and personalize query results.
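
The idea behind tf-idf weighting can be shown with a small sketch. It assumes one common variant, raw term frequency multiplied by the natural logarithm of the number of documents divided by the document frequency; many other variants exist, and this is not a description of how any particular search engine computes its weights. The toy corpus and the function name tf_idf are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dictionary per tokenized document.

    Assumed variant: weight = tf * ln(N / df), where tf is the raw count of
    the term in the document, N the number of documents and df the number of
    documents containing the term.
    """
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in corpus:
        counts = Counter(tokens)
        weights.append(
            {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in counts.items()}
        )
    return weights


corpus = [
    "birds of norway".split(),
    "birds and other animals".split(),
    "norway in world war two".split(),
]
weights = tf_idf(corpus)
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

Terms that occur in few documents receive higher weights than terms spread across the collection. The sketch also shows why stop-word handling matters: in this tiny corpus the word “of” is rare and therefore scores highest.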

Recommendation systems try to predict what documents the users are interested in. Companies like Amazon and Netflix have been at the forefront in developing recommendation techniques, but such techniques are also used by non-commercial services. At Oslo public library, the service Aktive hyller (Active shelves) uses elements based on this technology when it recommends related books based on a patron’s current selection. The service collects rating data from three different sources (Goodreads, Bokelskere, and NoveList) and suggests books that are assessed as similar in topic and genre. The concept “recommendation system” is used to describe everything from simple top 10-lists based on general consumption frequency (“the 10 most read news articles”) to personalized recommendations based on complex forms of social profiling and network analysis.

Basically there are two types of recommendation systems, content-based filtering and collaborative filtering systems (Ricci, Rokach, and Shapira, 2011). Content-based filtering systems base their recommendations on comparing the characteristics of the documents’ content, e.g., their genre, topicality and format. A user who has previously liked crime books in audio format where the action is located in Oslo will probably be interested in other books with the same characteristics. Content-based filtering may be based on traditional knowledge organization techniques such as cataloguing and classification. The content must be described in a precise, consistent and exhaustive way to facilitate the best possible filtering. These descriptions may be created by experts, but in some cases users will be co-creators of the metadata, e.g. when adding uncontrolled keywords (‘tags’). Oslo public library’s experiments with “FRBRizing” their collection, which we described above, make it possible to accumulate recommendations on the work level and reduce the so-called cold-start problem (Schein, Popescul, Ungar, and Pennock, 2002), which is caused by having too few recommendations per item.
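
A minimal sketch of content-based filtering follows, assuming that each document is described by a set of metadata features and that similarity is measured with the Jaccard coefficient. Both choices are assumptions made for illustration, as are the candidate titles and the function recommend_by_content; production systems typically use richer representations.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Features of a book the user liked: a crime audiobook set in Oslo.
liked_features = {"crime", "audiobook", "oslo"}

# Hypothetical candidate documents and their metadata features.
candidates = {
    "Book B": {"crime", "audiobook", "bergen"},
    "Book C": {"romance", "audiobook", "paris"},
    "Book D": {"crime", "audiobook", "oslo"},
}


def recommend_by_content(liked_features, candidates, top_n=2):
    """Rank candidates by feature overlap with what the user liked."""
    ranked = sorted(
        candidates,
        key=lambda title: jaccard(liked_features, candidates[title]),
        reverse=True,
    )
    return ranked[:top_n]


print(recommend_by_content(liked_features, candidates))
```

The quality of such recommendations depends entirely on how precisely and consistently the features were assigned, which is where cataloguing and classification practice enters.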

Collaborative filtering uses characteristics of the user and the user’s digital “neighborhood” with other users. Data used in collaborative filtering systems can be “self-exposure” in the form of purchases, library loans, ratings, reviews, wish lists and other forms of assessments. In addition, systems may register demographic data, such as age and gender, and implicit data from system logs that register clicks, navigation patterns and consumption techniques (e.g. from e-book readers or streaming services). These data are used for user profiling, and the idea is that users who have similar profiles have an overlapping taste in books, music, films etc.
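
The neighbourhood idea can be sketched as user-based collaborative filtering. The sketch assumes explicit ratings on a 1 to 5 scale, cosine similarity between users, and a similarity-weighted average as the prediction; these are common textbook choices, not a description of any particular commercial system. The users, ratings and function names are invented.

```python
import math

# Hypothetical explicit ratings (1-5) given by three users.
ratings = {
    "anna": {"Sult": 5, "Pan": 4, "Macbeth": 1},
    "bjorn": {"Sult": 5, "Pan": 5, "Coraline": 4},
    "clara": {"Macbeth": 5, "Coraline": 2, "Sult": 1},
}


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings):
    """Predict a rating as the similarity-weighted mean of neighbours' ratings."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        similarity = cosine(ratings[user], their_ratings)
        numerator += similarity * their_ratings[item]
        denominator += similarity
    return numerator / denominator if denominator else None


prediction = predict("anna", "Coraline", ratings)
print(f"Predicted rating: {prediction:.2f}")
```

Because the hypothetical user anna rates like bjorn and unlike clara, the prediction for “Coraline” lands closer to bjorn’s rating of 4 than to clara’s rating of 2.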

Often the different techniques are combined in hybrid recommendation systems. User A, who has a certain profile, may give a book with specific formal and literary characteristics a high rating. In this way, the book increases its coupling with other books with similar characteristics that other users, with profiles that are similar to User A, have rated highly. The combined approach is another way of reducing cold-start problems for new items.

Recommendation systems may be evaluated using techniques similar to those used in experimental IR, i.e. the Cranfield model. When Netflix organized a competition to improve their recommendation system, the goal was to improve the accuracy of their own “Cinematch” algorithm by more than 10%. Suggested algorithms used a training set of Netflix data consisting of 100 million user ratings given by 480 000 users on 17 700 films. In the competition data set (about 2.8 million user-film pairs) ratings were removed, and the goal was to use the training set to recreate or predict the ratings of the films in the competition set. This is parallel to the Cranfield test collection method where queries are matched against relevance-assessed documents. The “best” algorithms are those best at retrieving the documents assessed relevant for the queries (Sparck Jones and van Rijsbergen, 1976). There are, however, also attempts at involving users more directly in evaluating recommendation systems (Shani and Gunawardana, 2011).
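
Hold-out evaluation of this kind can be sketched as follows. The Netflix Prize scored submissions by root mean squared error (RMSE) between predicted and withheld ratings; the text above speaks only of “accuracy”, so the use of RMSE here is our addition. All ratings and predictions below are invented toy numbers.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and withheld ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Withheld ratings, hidden from the algorithms during training.
held_out = [4, 3, 5, 2]

# A baseline that always predicts the same value, and a candidate algorithm.
baseline_predictions = [3.5, 3.5, 3.5, 3.5]
candidate_predictions = [4.2, 3.1, 4.4, 2.5]

baseline_error = rmse(baseline_predictions, held_out)
candidate_error = rmse(candidate_predictions, held_out)
improvement = 1 - candidate_error / baseline_error
print(f"baseline RMSE = {baseline_error:.3f}, candidate RMSE = {candidate_error:.3f}")
print(f"relative improvement = {improvement:.1%}")
```

As in the Cranfield setting, the withheld ratings play the role of fixed relevance judgements, so competing algorithms are compared under identical conditions.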

Digital mediation of literature with recommendation systems is a promising idea. In many ways, the recommendation systems mimic the librarian who knows her user and collection. By interviewing the user and knowing their borrowing history and how other patrons have used the library, the librarian comes up with suggestions. The automation of such processes and the digital library’s lack of shelves raise some issues. Of these, the meeting between the rationalizations embedded in computer algorithms and the users’ various forms of needs is among the most important. Related to this, we will also discuss how recommendation systems probably decrease the chances of serendipity.

As mentioned above, users may often be satisfied with answers that are good enough when searching for information. Denise Agosto (2002) discussed the concept “satisficing”, originally coined by Herbert Simon, and how it relates to information searching and Web-based decision making. Of particular interest is Agosto’s reference to Kuhlthau’s (1991) work on the “information seeking process” and how this is not purely a cognitive process, but also has an affective dimension. Nick Belkin points out that “there has been almost no serious research effort in understanding the role of affect in the information seeking situation in general and the IR situation in particular, nor in IR system design.” (Belkin, 2008, pp. 50–51). IR algorithms are less capable of implementing emotional than cognitive aspects. This is particularly evident when the algorithms are used not only for solving informational needs, but also to satisfy users’ needs for affection and recreation. We do not think that emotion retrieval (ER) will take place separately from IR systems, since users will also express their emotional needs with terms that can be matched with an IR algorithm. Thus, ER may be performed with IR algorithms in combination with other techniques. It is probably possible to adjust weights in recommendation systems so that they better take into account user preferences, e.g. in the form of “likes”. In his doctoral thesis Moshfeghi (2012) tested “emotion information” in two collaborative filtering systems and found that they perform better when taking emotion features into account than when only rating information is considered. Considerable work is necessary to meet users’ affective and recreational needs.

The “rationality” of IR systems and recommendation systems may also affect serendipity, i.e. finding something by chance. In a physical library, the user is exposed to shelves of books, magazines, posters on the wall and many other “irrelevant” information sources that may influence him or her. Björneborn (2008) identified ten factors in the library that may be a source of serendipity, including “explorability” and “browsing”. Elaine Toms suggests four approaches to research in order to facilitate serendipity in IR:

  1. Role of chance or ‘blind luck’: implemented via a random information node generator.
  2. Pasteur principle (“chance favours the prepared mind”): implemented via a user profile.
  3. Anomalies and exceptions: partially implemented via poor similarity measures.
  4. Reasoning by analogy: implementation is unknown at the moment. (Toms, 2000)
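
The first of these approaches can be illustrated with a minimal sketch in which a few randomly chosen items from outside the ranked list are appended to it. This is our own reading of a “random information node generator”, not Toms’ implementation; the function name with_serendipity and the item labels are hypothetical.

```python
import random


def with_serendipity(ranked, catalogue, n_random=1, rng=None):
    """Append randomly drawn items from outside the ranked list.

    ranked:    items the system considers relevant, best first
    catalogue: all items in the collection
    rng:       optional random.Random instance, useful for reproducibility
    """
    rng = rng or random.Random()
    outside = [item for item in catalogue if item not in ranked]
    extras = rng.sample(outside, min(n_random, len(outside)))
    return list(ranked) + extras


catalogue = ["a", "b", "c", "d", "e"]
ranked = ["a", "b"]
result = with_serendipity(ranked, catalogue, n_random=1, rng=random.Random(0))
print(result)
```

The ranked recommendations are left untouched, while the appended item plays the role of the unrelated poster or neighbouring shelf that a visitor might stumble upon in a physical library.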

André, Schraefel, Teevan and Dumais (2009) point out that serendipity consists of two different aspects, the first being “its accidental nature and the delight and surprise of something unexpected (e.g., the synthesis of copper phthalocyanine)”, whereas the second is “the breakthrough or discovery made by drawing an unexpected connection – the sagacity (e.g., using copper phthalocyanine as dye)”. The focus of system designers, they claim, has been to try to facilitate the former, whereas the latter has been ignored. Therefore they argue for a more holistic picture of serendipity and offer several suggestions on paths to follow, including the support of domain expertise, the creation of common language models and the facilitation of networks.

Summary and conclusions

We have shown how mediation of culture has been embedded in knowledge organization systems, from their analogue beginnings up to the rather sophisticated recommendation systems of today. In her book on mediation of literature (“litteraturformidling”) Åse Kristine Tveit states that to “index is to mediate” (Tveit, 2004, p. 17) (our translation). However, she draws a distinction between this kind of “technical” mediation and a more personal mediation, which requires a direct initiative from the intermediary. The distinction is seemingly in contrast with our description of analogue and digital systems for mediation. One could perhaps argue that knowledge organization represents a second-order mediation (inspired by the terminology of Weinberger (2007)), extending mere (first-order) accessibility of material with systematized metadata. However, one could also argue that metadata, interpreted as paratexts, facilitate direct (third-order) mediation by guiding cultural consumption. Today, when search and recommendation systems have connected typical LAM-metadata to user data and “mined” them algorithmically, with customized recommendations as a result, they come very close to replicating the mediation performed by flesh-and-blood librarians. Thus, we argue that such “mediation machines” do facilitate direct interaction between cultural products and their potential users, and that they can be interpreted as mediators of culture in line with the modern practices of LAM-institutions.

This adaptation is not free of challenges. Modern information systems, or “mediation machines”, have the capability to accurately match users’ information needs. Such systems, however, risk becoming too “rational”, failing to respond to users’ emotional needs and to facilitate serendipitous discoveries. Although attempts have been made to address these problems, the ideas are mainly theoretical. Implementing emotion retrieval and serendipity-sensitive retrieval has proven to be difficult.

It should also be noted that the motivation of Google and other commercial vendors of digital mediation services differs a lot from the purposes served by LAM-institutions. The latter have specific social responsibilities, are often funded by public money and are regulated by laws. This stands in contrast to the commercial business models of the former. The two types of services we have compared may thus have very different understandings of mediation as a concept, and further of how mediation techniques should be realized. This would be an interesting topic for further investigation.

About the authors

Kim Tallerås is a research fellow at the Department of Archivistics, Library and Information Science at Oslo and Akershus University College of Applied Sciences. He can be contacted at: kim.talleras@hioa.no.
Nils Pharo is Professor at the Department of Archivistics, Library and Information Science at Oslo and Akershus University College of Applied Sciences and can be contacted at nils.pharo@hioa.no.


How to cite this paper

Tallerås, K. & Pharo, N. (2017). Mediation machines: how principles from traditional knowledge organization have evolved into digital mediation systems. Information Research, 22(1), CoLIS paper 1654. Retrieved from http://InformationR.net/ir/22-1/colis/colis1654.html (Archived by WebCite® at http://www.webcitation.org/6oTPowyn2)
