published quarterly by the university of borås, sweden

vol. 22 no. 1, March, 2017



'Just Google it' – the scope of freely available information sources for doctoral thesis writing


Vincas Grigas, Simona Juzėnienė and Jonė Veličkaitė.


Introduction. Recent developments in the field of scientific information resource provision lead us to the key research question, namely, what is the coverage of freely available information sources when writing doctoral theses, and whether the academic library can assume the leading role as a direct intermediator for information users.
Method. Citation analysis of doctoral theses was conducted in the summer of 2015. A total of thirty-nine theses (with 6,998 references) defended at Vilnius University at the end of 2014 were selected (30 per cent of all defended theses). Theses were randomly chosen from different research fields: the humanities, social sciences, biomedical sciences, technological sciences, and physical sciences.
Analysis. The research team was tasked with identifying whether certain resources could be found in the eCatalogue of an academic library, its subscribed databases, freely available online (through Google or Google Scholar), or whether the resources from the library’s subscribed databases are identical to those which are freely available. The data gathering process included such resource categories as journal papers, printed and electronic books or book chapters, and other documents (legal reports, conference papers, newspaper articles, Websites, theses, etc.).
Conclusions. Library collections and subscribed databases could cover up to 80 per cent of all information resources used in doctoral theses. Among the most significant findings to emerge from this study is the fact that on average more than half (57 per cent) of all utilised information resources were freely available or were accessed without library support. We may presume that the library as a direct intermediator for information users is potentially important and irreplaceable only in four out of ten attempts of PhD students to seek information.

Introduction

The emergence of Web search engines changed the way scientific information is searched for (Ortega, 2014). Library-owned search engines, as well as database search engines, are no longer the first choice for information users in searching for scholarly literature (Cothran, 2011; Jamali and Asadi, 2010; Rowlands et al., 2008; Sapa and Krakowska, 2014). Google has the highest impact on Web searches as it is the most visited Website globally (Alexa, 2016), whereas Google Scholar indexes the highest amount of scholarly literature globally (Khabsa and Giles, 2014). Google and Google Scholar index full texts or metadata of all kinds of scholarly literature across an array of publishing formats.

Google Scholar is growing in size year by year. As of August 2010, Google Scholar could contain 86 million documents (Aguillo, 2012); in English only, as of January 2013, it contained 99.3 million documents (Khabsa and Giles, 2014); as of December 2013, 109.3 million documents (Ortega, 2014); and as of May 2014, 111.15 million documents (Orduna-Malea, Ayllón, Martín-Martín and Delgado López-Cózar, 2015).

Google and Google Scholar aid in finding the two most often used types of scholarly literature, namely peer-reviewed journals and so-called grey literature. Grey literature covers documents which are not formally published by academic publishers, but can be important in systematic and evidence-based reviews. It includes various kinds of reports, working papers, white papers, evaluations, government documents, theses, conference proceedings, pre-prints, post-prints, newsletters, and laboratory research books. A full list of document types featured in grey literature is offered on the GreyNet International Website (GreyNet International, 2016). In most cases, grey literature that is on the Web is freely available. Grey literature plays an important role in the communication of scholarly information as it is available and accessible on a large scale owing to widespread scholarly social networks and institutional repositories.

The Web of Science database is limited in its ability to represent the full extent of grey literature because of its restricted scope; therefore, Google Scholar and Google reveal the use of more up-to-date information available on the Web to a larger extent than citations in Web of Science do (Hutton, 2009, p. 11). It has been found that Google Scholar results contain moderate amounts of grey literature, with the majority of its instances presented on page eighty on average. It has also been ascertained that when searched for specifically, most of the literature identified using Web of Science could also be found using Google Scholar (Haddaway, Collins, Coughlin and Kirk, 2015).

There are arguments with regard to the quality of the pre-print versions which are freely available on the Web (Klein, Broadwell, Farb and Grappone, 2016). Comparison of the published scientific journal papers with their pre-print versions revealed that generally there were few changes in the content of the scientific papers as compared to their pre-print versions.

An important aspect of Google Scholar is that almost 50 per cent of its content is available off-campus (Khabsa and Giles, 2014). Another study revealed that out of sixty-four thousand highly cited documents in Google Scholar approximately 40 per cent can be accessed freely (Martín-Martín, Orduña-Malea, Ayllón, Delgado López-Cózar and López-Cózar, 2014). A study published in 2015 suggests that 61.1 per cent of full-text scientific papers found with Google Scholar were freely available off-campus (Laakso and Lindman, 2015). The latest research revealed that approximately 60 per cent of all published scientific papers were found to have an open-access copy available.

The number of journals offering unrestricted access to their content has also been growing. For instance, analysis of delayed open access journal papers showed that 77.8 per cent of these papers became open access within twelve months from publication, with 85.4 per cent becoming available within twenty-four months (Laakso and Björk, 2013). Björk, Laakso, Welling and Paetau (2014) established that a synthesis of previous studies indicated that green open access coverage of all published journal papers was approximately 12 per cent, with substantial disciplinary variation. Another study in this field, carried out by White (2014), suggests that approximately 30 per cent of papers are freely accessible in their year of publication, rising to nearly 40 per cent in the following years, and repositories are responsible for approximately 50 per cent of freely available papers. As of April 2014, more than 50 per cent of the scientific papers published in 2007, 2008, 2009, 2010, 2011, and 2012 could be downloaded for free on the Web (Archambault et al., 2014). The latest research revealed that approximately 60 per cent of all published papers have an open access copy available (Laakso and Lindman, 2016). This may be a positive result of the increasing interest in promoting open access to scientific publications and research data, which results in growing free of charge access to scientific publications for any user (Archambault et al., 2014; European Commission, 2016).

The increasing use of Web-based search engines and widespread freely available full-text literature on the Web are indications of increased possibilities to have access to scholarly literature without using library subscribed databases or library local collections. Recent developments in the provision of scientific information resources lead us to the key research question, namely what is the coverage of the freely available information sources when writing doctoral theses, and whether the academic library can assume the leading role as a direct intermediator for information users.

Specifically in this exploratory pilot study we seek to address the following questions:
RQ1. How important are library information resources in compiling material for doctoral theses?
RQ2. What types of information sources are most often used when writing doctoral theses?
RQ3. What are the potential ways of getting certain information resources cited in doctoral theses?

Literature review

Many scholars hold the view that libraries act as an intermediator between information resources and information users and the key role of the library is to serve the users’ needs (Brophy, 2000; Miežinienė and Prokopčik, 2000; Wilson, 1998). As suggested by the Generic Library Model (Brophy, 2000) (model is reproduced in Figure 1), the library is viewed as an intermediator between the user and information resources which are potentially available to that user, as expressed through Information Use and Access Processes (centre green rimmed box). The intermediation may be defined as a process where the library enables particular users (User Population) through a User Interface to gain access to the required information (Information Population) through a Source Interface.


Figure 1: Generic library model (Source: Brophy, 2000).

This paper offers a critical examination of the assumption that the library may serve as an intermediator for information users in the process of information use and access. According to Brophy (2000), a much debated question is whether the library and information services, in any recognisable form, will be in demand in the new millennium and whether access to information resources will remain among the reasons for visiting libraries, as the preceding studies suggest that information and communication technologies have a significant impact on the library and the information sector.

With the role of the academic library defined, let us move on to the discussion of whether changes in the access to freely accessible full-text information resources may challenge the role of the library as an intermediator, because use of information has undergone certain changes.

Van Noorden (2014) analysed more than 3,500 survey responses from ninety-five different countries. His research suggests that more than 60 per cent of researchers representing science and engineering, and more than 70 per cent of those representing social sciences and the humanities, regularly visit Google Scholar, and more than 90 per cent of them are aware of it. Another study implemented by Bøyum and Aabø (2015) concludes that Google Scholar is perceived as a highly convenient instrument and has been extensively used among business PhD students in Norway. Google and Google Scholar help people find information across the Internet and are free of charge. As a result, the latter is widely used when searching for scholarly information, whereas Google is often seen as a starting point and it is quite usual for PhD students to end their search on Google as well (Connaway, White, Lanclos and Le Cornu, 2013; Jamali and Asadi, 2010). The research suggests that science and technology students are more likely to use Google Scholar than their peers representing the humanities and social sciences (Wu and Chen, 2014). Moreover, it has been established that the scientific community has been widely using social media to obtain scientific papers directly from colleagues (Kjellberg, Haider and Sundin, 2016; Laakso and Lindman, 2016).

A study conducted by Vezzosi (2009) suggests that PhD students rely heavily on the Web for their research work, whereas their use of the library is limited to document delivery and interlibrary loan. As for PhD students’ information practice, Carpenter (2012) has identified that it is typical of them to be satisfied with the abstract where they cannot get the full-text scientific paper. Interestingly, Gullbekk, Rullestad and Carme Torras Calvo (2013) observed that PhD students indicate easy access to full-text scientific papers as the most important aspect when choosing information resources, that they are keen on using freely available electronic full-text information sources (they use Google a lot), and that they have reduced their utilisation of printed information sources. Another interesting finding suggests that PhD students cite conference proceedings and journal papers more often than the faculty does (Larivière, Sugimoto and Bergeron, 2013); for example, PhD students are apt to cite a large variety of formats, including conference papers, technical reports, and government documents (Condic, 2015). The said types of documents are more easily accessible using Google Scholar than subscribed databases, for instance, Web of Science (Khabsa and Giles, 2014).

Taken together, the above results support the idea that the printed collections and subscribed databases of the academic library are gradually decreasing in their importance for information users because more and more full-text information resources may be found on the Web using generic search engines such as Google and Google Scholar. They also support the idea that library collections and subscribed databases could potentially be replaced by freely available full-text information sources accessed through generic search engines.

For decades, collection development and management was among the key roles of the academic library. Today, however, the situation is far from being stable, as the infosphere is undergoing rapid changes, thus reshaping our traditional ways of information behaviour (Floridi, 2014). This point of view is supported by Delaney and Bates (2015), who write that increased competition from other information providers, such as Google and Amazon, decline in the use of the Online Public Access Catalogue, and changes in user activities and in people's engagement and interaction with the library and its resources, are but a few potential challenges to the academic library.

Librarians have started looking for new ways to act. Petraitytė (2013) showed that the academic library is actively searching for its place in the chain of scientific communication and information, and that its future scenarios are being discussed. In her study, Petraitytė (2014) highlights that a number of authors dwell on the significance of strategic partnership and cooperation; describes the role of the academic library as a proactive disseminator of innovation within the mother institution; positions the academic library as the leader in the usage and application of information technologies at a university; and discusses the functions of publishing, scientific data curation and dissemination assumed by the academic library.

The said roles of the academic library are rather new and not all members of university staff accept them. Petraitytė (2013) points out that the traditional point of view on the library’s role as an information source provider is still viable. Recent developments in scientific communication have heightened the need for the research which would disclose whether researchers have the option of obtaining the necessary information sources without using library collections or library subscribed databases. There are no published data on how many of the freely accessible full-text information sources PhD students could potentially use in their main written assignment without availing themselves of their university library services.

With a view to answering the above-posed question, the authors resolved to implement an exploratory study of information resources utilised when writing doctoral theses. The authors of this pilot study have opted for citation analysis to discover how PhD students could successfully write their theses without using any library services. Citation analysis is a well-established approach in social sciences.

In recent years, two different approaches have been employed for citation analysis: a) to measure the use of library collections (Enger, 2009; Feyereisen and Spoiden, 2009; Kumar and Dora, 2011; Tonta and Al, 2006); and b) to assess citation habits (Echezona, Okafor and Ukwoma, 2011; Emerson, 2015; Kaczor, 2014; Keogh, 2012; Kuruppu and Moore, 2008; Sudhier and Kumar, 2010). Our idea was to use citation analysis to evaluate how useful freely available full-text information sources can be when writing PhD theses. The said measuring could help us determine to what extent the academic library may be important to PhD students as an information resource provider, and to collect further evidence on how strong the role of the academic library as an intermediator could be.

Method

With a view to addressing the research questions, citation analysis of thirty-nine doctoral theses (30 per cent of all theses defended at Vilnius University at the end of 2014) was conducted in the summer of 2015. These theses were randomly selected from different fields and branches.
Social sciences. Twelve out of the thirty-nine defended theses were selected for the research representing seven different branches: management (two theses), political science (two), communication and information (one), law (one), economics (two), sociology (two), and psychology (two).
Biomedical sciences. Ten out of the thirty-five defended theses were selected for the research, representing four different branches: biophysics (two), botany (one), medicine (five), and biology (two).
Technological sciences. Two out of the six defended theses were selected for the research, representing one branch: computer engineering (two).
Physical sciences. Ten out of the thirty-four defended theses were selected for the research, representing six different branches: mathematics (one), chemistry (two), biochemistry (two), physical geography (two), informatics (one), and physics (two).
The humanities. Five out of the sixteen defended theses were selected for the research, representing two different branches: philosophy (two), and philology (three).
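
The sample sizes listed above can be checked against the stated 30 per cent share. The following is a minimal sketch in Python; the variable and function names are illustrative and not part of the original study, and only the counts are taken from the list above.

```python
# (selected, defended) thesis counts per field, copied from the list above.
SAMPLE = {
    "Social sciences": (12, 39),
    "Biomedical sciences": (10, 35),
    "Technological sciences": (2, 6),
    "Physical sciences": (10, 34),
    "Humanities": (5, 16),
}


def sampling_summary(sample):
    """Return total selected, total defended, and the overall share in per cent."""
    selected = sum(sel for sel, _ in sample.values())
    defended = sum(dfd for _, dfd in sample.values())
    return selected, defended, 100 * selected / defended


if __name__ == "__main__":
    selected, defended, share = sampling_summary(SAMPLE)
    print(f"{selected} of {defended} theses selected ({share:.1f} per cent)")
    # Per-field shares show how evenly the fields were sampled.
    for field, (sel, dfd) in SAMPLE.items():
        print(f"{field}: {100 * sel / dfd:.2f} per cent")
```

The field counts add up to thirty-nine selected theses out of 130 defended, which matches the 30 per cent reported in the text.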

A total of 6,998 bibliographical references were collected. Thesis reference lists were used to identify the cited resources. Every item from the lists was subjected to dual analysis, on-campus and off-campus, to establish the quantity of utilised freely available resources. Twelve criteria, which fall into two groups, were employed to analyse each item on the lists.

Part one. The research team tried to identify whether certain resources can be found in the library’s eCatalogue (i.e. whether the library is in possession of those particular resources), in the library’s subscribed databases (for example, journal papers, e-books), or whether these resources in full text can freely be found online off-campus by merely using Google or Google Scholar.

Part two. The data gathering process also included identifying resource categories such as peer-reviewed papers, printed and electronic books or book chapters, reports and studies, conference papers, newspapers and other not peer-reviewed papers, Websites, theses (postgraduate degree theses, including master’s and doctorates), and other (any search record that could not be categorised according to the above classification).

Descriptive statistical analysis for qualitative variables was employed (percentage was calculated). The calculation procedure was as follows:

It should be noted that the Potential ways of accessing information sources section of the Results contains duplicate counts, as identical information sources were available in the library eCatalogue, subscribed databases and on the Web. This means that one and the same information source could at the same time appear in all three categories.
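
The effect of such overlap on the percentages can be illustrated with a minimal Python sketch. The `Reference` record, the function name and the four example records are hypothetical and serve only as an illustration; they are not data from the study. The sketch assumes that each access route is expressed as a share of all references, so the shares need not add up to 100.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """Hypothetical record of where one cited source could be obtained."""
    in_catalogue: bool
    in_databases: bool
    free_online: bool


def route_percentages(refs):
    """Share of references reachable by each route, in per cent of all references."""
    total = len(refs)

    def pct(count):
        return round(100 * count / total, 2)

    return {
        "eCatalogue": pct(sum(r.in_catalogue for r in refs)),
        "subscribed databases": pct(sum(r.in_databases for r in refs)),
        "freely available": pct(sum(r.free_online for r in refs)),
        # "unknown" covers references not found by any of the three routes.
        "unknown": pct(sum(
            not (r.in_catalogue or r.in_databases or r.free_online) for r in refs
        )),
    }


# Hypothetical example records, not data from the study.
HYPOTHETICAL_REFS = [
    Reference(True, False, False),
    Reference(False, True, True),
    Reference(True, True, True),    # counted under all three routes
    Reference(False, False, False),
]

if __name__ == "__main__":
    print(route_percentages(HYPOTHETICAL_REFS))
```

In this illustration the four shares sum to more than 100 per cent, because a reference that is available by several routes is counted once under each of them.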

The statistical analysis was performed using SPSS software (version 21). The Shapiro–Wilk statistical test was employed to evaluate the normality of data, whereas the differences in the means of the independent groups were analysed applying the Kruskal–Wallis H test and the one-way ANOVA method.
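
For readers unfamiliar with the Kruskal–Wallis H test, the statistic can be sketched in a few lines of Python. This is an illustration only, not the SPSS procedure used in the study: it computes the uncorrected H statistic (statistical packages typically also apply a correction for ties), the function names are our own, and the example values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Uncorrected H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)."""
    pooled = [v for group in groups for v in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)


if __name__ == "__main__":
    # Hypothetical shares of one source type in three groups of theses.
    print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

The resulting H value is compared with a chi-squared distribution with (number of groups minus one) degrees of freedom to obtain the significance value.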

Results and discussion

Note. Values that were not whole numbers are reported as decimals. For the sake of a consistent description of research results, all numbers are given with decimal fractions.

Types of information sources used

The Shapiro–Wilk test was employed to assess the normality of data. Data on the types of information were not normally distributed among all types of information sources – significance value of the Shapiro–Wilk Test was lower than 0.05. A nonparametric test, the Kruskal–Wallis H test, was used to find out if there were statistically significant differences between the types of information sources. The Kruskal–Wallis H test revealed significant differences between fields and utilised information sources. It supports the findings of Fry, Spezi, Probets and Creaser (2015) and Jamali and Nicholas (2008) that different disciplines may potentially exhibit different information users’ behaviour.

The Mean Rank numbers resemble the results as provided in the percentage format, therefore the latter form will be used for a more convenient reflection of results. Significant differences in information sources (significance value lower than 0.05) are observed in peer-reviewed papers, e-books, printed books, reports, studies, and conference papers.

As indicated in Table 1, the most popular types of information resources are scholarly electronic journals – 49.81 per cent (a very small number of all used journals were printed ones), printed books – 26.46 per cent, and other not peer-reviewed periodicals – 6.05 per cent. Less popular source types are as follows: e-books – 4.61 per cent, conference papers – 3.99 per cent, Websites – 1.88 per cent, reports or studies – 1.42 per cent, and theses – 1.09 per cent. The term Other covers sources such as legal documents, maps, companies’ reports, blogs, etc., which make up 4.69 per cent of all information used in all theses. These numbers correlate with previous research findings (Larivière et al., 2013) where it was established that other theses were also the least cited information resource in the process of thesis writing. Moreover, there is a significant difference between electronic and printed resources which suggests that the electronic format is more common in most research fields.


Table 1: Distribution by information source types (percentage)

| Scientific field | Peer-reviewed papers | e-books | Printed books | Reports, studies | Conference papers | Newspapers and other papers | Websites | Theses | Other |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Social sciences | 42.68 | 3.15 | 28.47 | 3.79 | 1.97 | 9.14 | 2.58 | 1.02 | 7.20 |
| Humanities | 14.77 | 8.51 | 66.93 | 0.16 | 0.96 | 4.17 | 0.32 | 1.61 | 2.57 |
| Technological sciences | 33.47 | 9.92 | 17.77 | 2.07 | 14.05 | 9.09 | 3.31 | 1.65 | 8.68 |
| Physical sciences | 69.89 | 0.53 | 15.45 | 0.85 | 2.70 | 4.60 | 2.50 | 0.92 | 2.56 |
| Biomedical sciences | 88.22 | 0.96 | 3.69 | 0.23 | 0.27 | 3.23 | 0.68 | 0.27 | 2.46 |
| Arithmetic mean | 49.81 | 4.61 | 26.46 | 1.42 | 3.99 | 6.05 | 1.88 | 1.09 | 4.69 |
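
The Arithmetic mean row of Table 1 can be reproduced from the five field rows. The following minimal Python sketch assumes that the mean is the unweighted average across the five fields; the variable and function names are illustrative, and the figures are copied from Table 1.

```python
# Percentages per field, in the column order of Table 1: peer-reviewed papers,
# e-books, printed books, reports/studies, conference papers,
# newspapers and other papers, Websites, theses, other.
TABLE_1 = {
    "Social sciences":        [42.68, 3.15, 28.47, 3.79, 1.97, 9.14, 2.58, 1.02, 7.20],
    "Humanities":             [14.77, 8.51, 66.93, 0.16, 0.96, 4.17, 0.32, 1.61, 2.57],
    "Technological sciences": [33.47, 9.92, 17.77, 2.07, 14.05, 9.09, 3.31, 1.65, 8.68],
    "Physical sciences":      [69.89, 0.53, 15.45, 0.85, 2.70, 4.60, 2.50, 0.92, 2.56],
    "Biomedical sciences":    [88.22, 0.96, 3.69, 0.23, 0.27, 3.23, 0.68, 0.27, 2.46],
}

# The "Arithmetic mean" row as printed in Table 1.
REPORTED_MEAN = [49.81, 4.61, 26.46, 1.42, 3.99, 6.05, 1.88, 1.09, 4.69]


def column_means(table):
    """Unweighted mean of each column across the fields, rounded to two decimals."""
    rows = list(table.values())
    return [round(sum(column) / len(rows), 2) for column in zip(*rows)]


if __name__ == "__main__":
    print(column_means(TABLE_1))
    # Each field row should add up to roughly 100 per cent (rounding aside).
    for field, row in TABLE_1.items():
        print(f"{field}: {sum(row):.2f}")
```

Because the mean is unweighted, each field contributes equally regardless of how many references its theses contained.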

Use of peer-reviewed papers

As indicated in Figure 2, our research helped establish that peer-reviewed papers were the most often used type of information, which correlates with findings presented in other studies (Carpenter, 2012; Niu, Hemminger, Brown, Powers and Tennant, 2010; Rowlands et al., 2008; Tenopir et al., 2010; Tenopir, King, Christian and Volentine, 2015; Tenopir, King, Edwards and Wu, 2009). PhD students of biomedical and physical sciences are substantial users of peer-reviewed journals: 88.22 and 69.89 per cent, respectively, of all information utilised in their theses was derived from electronic journals.

This reflects the work of Nicholas, Clark, Rowlands and Jamali (2009), which suggests that researchers of life sciences are among the most frequent users of journal papers. The percentage of electronic journals utilised by students representing the field of social sciences makes up 42.68 per cent. They are followed by representatives of technological sciences with 33.47 per cent and the humanities with 14.77 per cent.

The percentage pertaining to electronic journals suggests that students of technological sciences rely on books more than those of biomedical or physical sciences. Multiplicity of resource types used in this scientific field may be accountable for the fact. The available data suggest that PhD students representing technological sciences gather necessary information from conference papers (14.05%), Websites (3.31%), and reference theses (1.65%) more actively than those who opted for other sciences, which suggests that students of technological sciences are likely to make use of more diverse resources, from various sources, than their peers.

There is nothing surprising in these numbers as they are consistent with earlier and current research from around the world. King and Montgomery (2002), scientists at Drexel University, Philadelphia, conducted a study aimed at finding out who read more electronic journals – the faculty community or PhD students. The gathered data helped establish that average reading per person and average time spent on reading publications were almost 20 per cent higher among PhD students. Moreover, PhD students not only read more electronic journals, but also cited them more in their papers than faculty members (Larivière et al., 2013). A recent study corroborated the above-described statistics, which manifest a steady increase in the numbers. The research disclosed that the most active electronic journal readers are PhD students from all disciplines (exclusive of the humanities) (Mohammadi, Thelwall, Haustein and Larivière, 2015). They read almost four times more than representatives of any other academic group.

Figure 2: Use of peer-reviewed papers (percentage)

Use of books and e-books

Figure 3 exhibits the findings of our research, which indicate that books and e-books are highly prevalent among PhD students of the humanities, while for other disciplines the percentages are lower. These results correlate with those of the research implemented by Brown and Swan (2007). Among the limitations of books as a type of information source is their format, as users tend to show considerable preference for electronic full-text offerings (Brown and Swan, 2007; Tenopir et al., 2010, 2015).

Figure 3: Use of books and e-books (percentage)

It should be noted, however, that the use of e-books is still relatively insignificant. We can presume that the key problem resulting in the meagre use of e-books is related to the insufficient quantity of high quality texts published in this format. Several issues pertaining to the publishing of academic e-books were highlighted by the librarians of Alabama University (Walters, 2013). They were tasked with drawing up the mandatory literature list for medical students consisting exclusively of e-books. Apparently that was impossible because most of the required books were not issued in any electronic format. In the paper, Walters (2013) also highlights other problems related to the supply of and demand for academic e-books: there is an apparent shortage of high quality academic literature from the publishers’ side and most e-books are published from three to eight months later than their paper versions, which is too long, given the rate of issue and quantities of new scientific production in some of the research fields. Hence, we can assume that this is the reason why the role of e-books in thesis writing is so insignificant in such fields as physical (0.53%) and biomedical (0.96%) sciences. Today, the most active users of e-books are PhD students of technological sciences. This correlates with their use of printed books, which they use more frequently than students of physical sciences and over four times as frequently as PhD students of biomedical sciences. In the field of technological sciences the difference between the utilisation of printed books and e-books is also the smallest – printed books are used only about twice as often. To appreciate how small this difference is, we can compare it with that of physical sciences, where printed books are used about twenty-nine times as often as e-books.
The gathered data suggest that books are actively used in technological sciences, which is labelled as one of the fastest evolving sciences; however, finding out the precise reasons why e-books in particular are used so much more in this field than in other fields demands further analysis.
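
The ratios quoted in this section follow directly from the Table 1 percentages. A minimal Python sketch of the arithmetic is given below; the dictionary and function names are illustrative, and only the percentages are taken from Table 1.

```python
# Printed book and e-book percentages from Table 1.
PRINTED_BOOKS = {"Technological": 17.77, "Physical": 15.45, "Biomedical": 3.69}
E_BOOKS = {"Technological": 9.92, "Physical": 0.53, "Biomedical": 0.96}


def ratio(numerator, denominator):
    """How many times larger one percentage is than another, to two decimals."""
    return round(numerator / denominator, 2)


if __name__ == "__main__":
    # Printed books versus e-books within one field.
    print(ratio(PRINTED_BOOKS["Technological"], E_BOOKS["Technological"]))
    print(ratio(PRINTED_BOOKS["Physical"], E_BOOKS["Physical"]))
    # Printed books in technological versus biomedical sciences.
    print(ratio(PRINTED_BOOKS["Technological"], PRINTED_BOOKS["Biomedical"]))
```

The three ratios correspond to the "two times", "twenty-nine times" and "over four times" figures in the text above.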

Potential ways of accessing information sources

Three potential ways of accessing information resources have been subjected to analysis: library eCatalogues, subscribed databases, and freely available full-text information resources. The assessment of the normality of data was based on the Shapiro–Wilk test. The test revealed that the groups of library eCatalogue (Sig. 0.000) and unknown sources (Sig. 0.001) were not normally distributed (significance values lower than 0.05), whereas subscribed databases (Sig. 0.318) and freely available sources (Sig. 0.540) were normally distributed. For further analysis of the results, both non-parametric and parametric methods were employed through the application of the Kruskal–Wallis H test and the one-way ANOVA test. The authors’ decision to resort to the parametric test, namely the one-way ANOVA test, for further analysis was determined by the fact that the results of the analysis based on this test were more consistent. The parametric test was employed to determine whether there were any statistically significant differences in the potential way of getting access to information sources.
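
The one-way ANOVA compares the variation between group means with the variation within groups. The following Python sketch computes the F statistic only; it is an illustration rather than the SPSS procedure used in the study, the function name is our own, and the example values are hypothetical.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (v - mean) ** 2
        for group, mean in zip(groups, group_means)
        for v in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


if __name__ == "__main__":
    # Hypothetical percentages for three groups of theses.
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

The F value is then compared with an F distribution with (groups minus one) and (observations minus groups) degrees of freedom to obtain the significance value reported as Sig.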

Statistically significant differences in the potential ways of accessing information sources were determined for the eCatalogue (Sig. 0.000), subscribed databases (Sig. 0.000) and freely available sources (Sig. 0.001); the difference for unknown sources (Sig. 0.077) was above the 0.05 threshold. These results suggest a significant difference between the use of printed and electronic resources. With a view to analysing the disciplines in which the differences were most prominent, the post hoc comparison was conducted by means of the Tukey HSD test. The test indicated that though the differences among social, technological, physical, and biomedical sciences were insignificant, the differences between each of these fields and the humanities were significant. Therefore, PhD students representing the humanities are among the most active users of the library’s printed sources. Analysis of electronic resources, however, revealed opposite results – students of all fields were actively using electronic resources provided by the library with insignificant differences, with the exception of those representing the humanities – the latter being the least active group.

Figure 4: Potential ways of accessing information sources (percentage)

Over 14 per cent of information sources used by PhD students could potentially be found in the library eCatalogue. As indicated in Figure 4, PhD students representing the humanities were able to find the biggest share of needed information in the library eCatalogue (41.16%), as compared to students of other research fields – technological (6.81%), physical (6.51%) and biomedical sciences (3.05%). This correlates with the type of most popular information resources in each field – representatives of the humanities mostly use paper books, which means that they are more likely to find necessary books in library stacks searching for them through the eCatalogue, while PhD students of biomedical science are apt to find more relevant information in electronic format, therefore the percentage of paper books and eCatalogue usage here is notably smaller. Physical and technological sciences are renowned for the constant update of information which makes it a challenge for libraries to follow.

Over 27 per cent of information sources used by PhD students (mostly peer-reviewed papers) could potentially be found through Vilnius University Library subscribed databases. PhD students of biomedical sciences could potentially find almost half (43.16%) of the necessary information in subscribed databases; the situation was similar for those representing the physical sciences (40.61%). In contrast, for the humanities, the main source of information was printed books (41.16%) searched for through the eCatalogue.

Over 35 per cent of information sources used by PhD students could potentially be found free off-campus. Free off-campus access was potentially available to almost 57 per cent of all information sources used by PhD students of technological sciences and more than 40 per cent of those utilised by PhD students of biomedical sciences. In the latter case it almost equals the share of information sources which they could potentially access using subscribed databases.

On average more than half (57%) of all utilised information resources were freely available or could be accessed without using the collections of the home library (unknown potential ways of accessing information sources). It is approximately 40 per cent of the higher quality information resources (excluding Websites and articles from newspapers or blogs) used.

Our research revealed that PhD students of the humanities are the most frequent users of books; thus subscribed databases are not of high importance to them, and freely available information sources relevant for them are rather sparse, making up only 18 per cent of all used information sources. There are far fewer freely available scholarly books than peer-reviewed papers. As Montgomery (2013) pointed out, in the open access debate little attention has been paid to the role that open access might play in helping a deeply inefficient system of publishing scholarly books. The Directory of Open Access Books initiative is taking its first difficult steps, and numerous unsolved problems remain (Whitford, 2014).

A mere 25 per cent of all necessary sources for PhD students of physical sciences could potentially be found freely available. A closer inspection of the electronic journals utilised by PhD students of physical sciences in the selected theses revealed that most of the journals had high impact factors or were in the top 25 per cent of journal titles in a given subject listed in Thomson Reuters’ Journal Citation Reports. Most of these journals and their articles can be found in highly protected and expensive databases, therefore they are not as easily available as free open access resources.

Percentage of freely available information sources identical to those found in subscribed databases.

The smallest rate of resource overlapping between resources provided by Vilnius University Library and freely available resources was detected in the field of physical sciences. As it has already been mentioned, the reason behind this fact is that PhD students of physical sciences were keener on using papers from journals with high impact factors or the top 25 per cent of the journal titles in a given subject listed in Thomson Reuters’ JCR.

Figure 5. Percentage of freely available information sources identical to those found in subscribed databases


The situation in the field of social sciences is the opposite: data featured in Figure 5 suggest that the overlapping of information resources is 69 per cent. If we assume that doctoral students start their search using the Google search engine, there is a 69 per cent chance that they will find what they need without turning their attention to a library eCatalogue or subscribed databases. A few examples of earlier research conducted in Lithuania before 2010 indicate that almost 85 per cent of doctoral students from various scientific fields use Google to find full-text papers (Tautkevičienė, Duobinienė, Kretavičienė, Krivienė and Petrauskienė, 2010). Interestingly, Catalano (2013) discovered that more doctoral students of social sciences than any other study programme make use of library resources.

The peculiar fact suggested by these data is that 58 and 50 per cent of freely available sources have identical content with the subscribed databases in the fields of technological and biomedical sciences (respectively). Overall, research results revealed that the overlapping between Vilnius University Library resources and freely available resources is as high as 47 per cent.
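The overlap figures above can be read as a simple set calculation. The sketch below assumes that the percentage is taken over the sources found in subscribed databases, which is how the conclusions phrase it; the function name and the source identifiers are hypothetical and serve only as illustration.

```python
def overlap_percentage(subscribed, freely_available):
    """Share (per cent) of subscribed-database sources that are also freely available."""
    subscribed = set(subscribed)
    if not subscribed:
        return 0.0
    shared = subscribed & set(freely_available)
    return 100.0 * len(shared) / len(subscribed)


# Hypothetical identifiers for sources cited in one thesis; NOT the study's data
subscribed_ids = {"paper-01", "paper-02", "paper-03", "paper-04"}
free_ids = {"paper-02", "paper-03", "report-09"}

print(overlap_percentage(subscribed_ids, free_ids))  # 50.0
```

Under this reading, a result of 69 per cent for the social sciences would mean that roughly two out of three cited sources held in subscribed databases could also be reached through a free search.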

However, as indicated in Figure 4, a considerable number of referenced resources remained unknown – 22 per cent on average. The types of resources falling under the term unknown were analysed separately for each subject field. Enquiry into all subject fields did not disclose a clear way of potential access to peer-reviewed papers and printed books, thus suggesting that the full text of these resources was not available in the university library eCatalogue, in subscribed databases or freely online. Hence we can assume that students had to use other libraries, purchase the resources or use social contacts to obtain the needed information. In-depth analysis of the unknown resources in each field revealed that their distribution mirrored that of the most used types of resources. For example, in biomedical sciences the most popular type of information was peer-reviewed papers (88%), which resulted in these papers making up the biggest share of the unknown category (79%). A similar situation is observable in physical sciences: peer-reviewed papers were the most frequently used (67%) and made up most (44%) of the resources in the unknown category. The social sciences and the humanities manifest an analogous situation with printed books. The only exception is technological sciences, where students used more peer-reviewed papers but more printed books than papers were listed under the category where the way of potential access is unknown: 52 per cent of all the unknown resources were printed books and less than 18 per cent were peer-reviewed papers. Reports and studies made up the smallest percentage of unknown resources, ranging from 0 to 1.7 per cent throughout all the subject fields.

The authors assume that the reasons behind this fact could be as follows: PhD students use information resources during their internships or use resources provided by libraries in other countries; they use interlibrary loan services offered by libraries; at times paper abstracts are sufficient to review the examined field and full-text access is optional. The research data suggest that, combining freely available and unknown sources (Figure 6), PhD students of all fields could access more than half of the necessary information without using the collections of the home library. In the case of technological sciences the percentage is as high as 75 per cent.


Figure 6: Percentage of freely available information sources and sources whose way of accessing is unknown (no free access and no access through library subscribed databases or eCatalogues)

Conclusions, limitations and further research

This exploratory study was aimed at answering research questions regarding the significance of freely accessible information resources to PhD students when writing their doctoral theses. Moreover, the authors were concerned about interpreting given results to determine whether academic libraries and their collections are relevant for PhD students when writing doctoral theses and whether the academic library can assume the leading role as a direct intermediator for information users.

Readdressing the key research question posed at the beginning of this study, it is now possible to state that library collections and subscribed databases could potentially cover up to 41 per cent of all information resources used in doctoral theses. One of the more significant findings to emerge from this study is that on average more than half (57%) of all utilised information resources were freely available or could be accessed without using the collections of the home library. Approximately 40 per cent of the higher quality information resources (excluding Websites and articles from newspapers or blogs) relied on when writing theses could be potentially accessed without using the collections of the home library. We may presume that the library as a direct intermediator for information users is potentially important and is irreplaceable only in four out of ten attempts when PhD students seek information.
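The 41 and 57 per cent figures follow from the average shares reported in the results. The short check below uses the rounded values quoted in the text ("over 14", "over 27" and "over 35" per cent, and 22 per cent on average for unknown sources); the variable names are illustrative only, and because the inputs are rounded down the total falls slightly short of 100.

```python
# Rounded average shares quoted in the text (per cent of all cited sources)
average_shares = {
    "eCatalogue": 14,
    "subscribed_databases": 27,
    "freely_available": 35,
    "unknown": 22,
}

# Sources potentially reachable through the home library
library_routes = average_shares["eCatalogue"] + average_shares["subscribed_databases"]

# Sources freely available or obtained by some route outside the home library
non_library_routes = average_shares["freely_available"] + average_shares["unknown"]

print(library_routes)      # 41
print(non_library_routes)  # 57
```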

Below are the findings with regard to each of the questions:

RQ1. How important are library information resources in compiling material for doctoral theses? On average the printed collections of Vilnius University Library can potentially meet not more than one sixth of the PhD students’ information needs. Only PhD students representing the humanities find printed library collections an important source of information (41% of all used information sources). The research strongly suggests that printed library collections are the least important information source for PhD students. It should be noted, however, that these results are very local and depend heavily on that particular library, therefore further analysis would be helpful in finding out how Google Books, open access books and other freely available sources overlap with in-house collections and with information utilised when writing doctoral theses.

We can conclude that electronic journals are the most popular information source in most research fields. However, on average electronic journals from subscribed databases meet only one quarter of PhD students’ total information needs. This also depends strongly on the number of databases to which a particular library subscribes. On the other hand, not all content of subscribed databases is irreplaceable, as part of it overlaps with freely accessible information sources (we discovered that on average the overlap reaches 47 per cent). With this in mind, it is important to establish the quantity of information sources provided by subscribed databases that could be accessed freely off-campus, in order to get a broader picture of how vital library information resources are for doctoral students.

RQ2. What types of information sources are most often used when writing doctoral theses? Research results suggest the importance of peer-reviewed papers (almost 50% on average). Usage of books (e-books and printed books) on average made up 30 per cent of all information resources. The remaining almost 20 per cent of the used information resources typically are not collected by the library and in most cases are freely available. They include Websites, conference proceedings, newspaper and blog articles, maps, statistical data from statistical information departments, etc. The above-listed results indicate that potentially the library, as an information source provider, could meet almost 80 per cent of all information needs, if it had sufficient funds to procure all necessary books and peer-reviewed journals which require subscription.

RQ3. What are the potential ways of getting certain information resources cited in doctoral theses? Analysis of the potential ways of information collection revealed that on average more than half (57%) of all utilised information resources are freely available or could be accessed without using the collections of the home library. It should be noted that some of the freely available resources are not key literature for thesis writing, as approximately 10 per cent of the used resources were Websites and articles from newspapers or blogs. Thus, a conclusion can be drawn that approximately 40 per cent of higher quality information resources used in thesis writing could be potentially accessed without using the collections of the home library. This holds for all research fields. Another important aspect is that about half (47%) of the content of subscribed databases is identical to that which is freely available.

As to the validity of the results, it has to be emphasised that citation analysis was carried out within half a year after the theses had been defended. It is possible that some of the used information resources (e.g. publisher's versions) were not freely available during the thesis writing period owing to embargo periods, but were freely available when our research was carried out. On the other hand, we should take into account the amount of grey literature which is freely accessible before the publisher's version becomes freely available after the embargo period. It is hard to determine whether doctoral students used the publisher's version of a paper or its pre-print or post-print, since in their theses reference was made to the publisher's version. Some researchers insist that grey literature could be an important source of information, as in a scientific community it is a common practice to self-archive members' papers (most often the revised manuscript instead of the PDF formatted by the journal, i.e. the publisher's version) and to upload them to accounts at ResearchGate.net or Academia.edu. As Sitek and Bertelmann (2014) suggest, grey literature has always played a role in scholarly communication. Haines, Light, O’Malley and Delwiche (2010) have found that science researchers rely a lot on a network of peers who can be treated as information sources. Another important aspect is that more than 70 per cent of all information resources utilised in theses were published two years before the theses were defended. Thus, the conclusion may be drawn that the validity of the given results is satisfactory for an exploratory pilot study.

The research was not aimed at grasping the full situation as to how PhD students seek information. We tried to find the possible ways of accessing information used in doctoral theses and to measure how important the library could be as an information resource provider to PhD students.

In future, we intend to carry out similar research covering bachelor and master degree theses citation analysis. This will provide further insights into the role of the academic library as an intermediator and will help us understand the importance of freely available information resources for students.

Acknowledgements

The authors would like to thank the whole team involved in data collection. Big thanks go to our colleagues from Vilnius University Library: Eglė Valiuškevičiūtė, Indra Andrijauskaitė, Elena Gadišauskaitė, and Ieva Danieliūtė. The authors would also like to thank the editor-in-chief Prof. Tom Wilson and deputy editor Prof. Elena Macevičiūtė for their feedback on earlier drafts of this paper, as well as the regional editor and the anonymous reviewers for their constructive comments and suggestions. We really appreciate the help you gave us to become better writers of scientific papers.

About the authors

Vincas Grigas is a lecturer at the Institute of Library and Information Science of the Faculty of Communication, Vilnius University. He is the Head of User Service Department at Vilnius University Library. Vincas’ research interests include information seeking and retrieval and evidence based practice. He holds M.S. and PhD degrees in communication and information sciences. He can be contacted at vincas.grigas@mb.vu.lt
Simona Juzėnienė is a lecturer at the Institute of Library and Information Science of the Faculty of Communication, Vilnius University. She is the Deputy Head of User Service Department at Vilnius University Library. Simona’s research interests include library roles and discourse analysis. She holds M.S. and PhD degrees in communication and information sciences. She can be contacted at simona.juzeniene@mb.vu.lt
Jonė Veličkaitė is an oriental studies subject librarian at Vilnius University Library. Her research interests include information seeking and retrieval. She holds a B.S. degree in communication and information sciences. Jonė Veličkaitė can be contacted at velickaite.light@gmail.com

References

How to cite this paper

Grigas, V., Juzėnienė, S. & Veličkaitė, J. (2017). 'Just Google it' – the scope of freely available information sources for doctoral thesis writing. Information Research, 22(1), paper 738. Retrieved from http://InformationR.net/ir/22-1/paper738.html (Archived by WebCite® at http://www.webcitation.org/6oGbvQyHa)
