vol. 21 no. 2, June, 2016


Twitter conversation patterns related to research papers


Gustaf Nelhans and David Gunnarsson Lorentzen


Abstract
Introduction. This paper deals with which academic texts and datasets are referred to and discussed on Twitter. We used digital object identifiers as references to these items.
Method. We streamed tweets from the Twitter application programming interface including the strings "dx" and "doi" while simultaneously streaming tweets posted by and to the authors of the tweets captured. By doing so we were able to capture tweets referring to a digital object as well as the replies to these tweets.
Analysis. The captured tweets were analysed both quantitatively and qualitatively: 1) bibliometric analyses were made on the digital object identifiers, 2) the thirty most mentioned and retweeted of these were analysed and 3) the conversations with at least ten tweets were analysed using content analysis.
Results. Research from the natural sciences was most prominent, as was research published in open access journals. Different types of conversations relating to the digital objects were found, both when looking at them qualitatively and when looking at their visual structure in terms of nodes and arcs. The conversations involved academics but were not always academic in nature.
Conclusions. Digital object identifiers were mainly referred to for self-promotion, as conversation starters or as arguments in discussions.


Background

The Web and especially social media have increasingly become an arena for communication amongst researchers, complementing the journal article as well as social interaction such as seminars or conference meetings. The notion of using this information as indicators of research output and metrics has increasingly gained scholarly interest (Haustein, Sugimoto and Larivière, 2015). Altmetrics, sometimes referred to as social media metrics (Haustein, Sugimoto and Larivière, 2015), are then used as a means of measuring research impact based on online metrics as an alternative to citation analysis. It has been suggested that, with altmetric counts still at a low level and their positive correlation with citations weak, the indicators are at best a complement to citations (Costas, Zahedi and Wouters, 2015). Blog citations, however, were found to be an alternative source to citations (Shema, Bar-Ilan and Thelwall, 2014).

One source for altmetric data is Twitter (e.g., Hammarfelt, 2014; Haustein et al., 2014; Thelwall, Haustein, Larivière and Sugimoto, 2013). Tweets are the most common data source for the commercial Altmetric.com service (Costas, Zahedi and Wouters, 2015). Twitter mentions are generally regarded as having low value as an impact measure since tweets are easily manipulated. Moreover, one study has found large volumes of tweets created by automated bots (Haustein et al., 2016). There is also a known bias in Twitter data as it is commonly used for marketing by publishers and authors, rather than for communication about the actual research. The correlation between Twitter mentions and citations has been found to be non-existent (e.g., Bornmann, 2015b). In this paper, we shift the focus slightly to also include conversations as units of analysis. Gonzales (2014) viewed tweets including a conference hashtag as a conversation. Contrary to this view, we define conversations as chains of tweets that are linked together through replies to previous tweets. The rationale for focusing on conversations rather than on single tweets in an altmetric study is that an interaction is a sign of interest and of possible communication between users. Tweets for marketing or spam are seldom replied to and, therefore, are expected to be much less common in a study focussing on conversations. The length and characteristics of Twitter conversations with possible bifurcations and dead ends are arguably more relevant aspects to measure in terms of impact than the sheer number of singular tweets. This is an empirical question that will be pursued using an exploratory approach in this study.

Previous related research on scholarly activity has been performed in different ways. Holmberg and Thelwall (2014) tracked the activities of a list of predefined scholars from different disciplines. A related study was made by Haustein, Bowman, Holmberg, Peters and Larivière (2014) who focused on the behaviour of thirty-seven astrophysicists. A negative correlation between tweets per day and number of publications was found, and there was no correlation between retweet and citation rates in either direction (i.e., retweets made and citations made, and retweets received and citations received). Using the same dataset, Holmberg, Bowman, Haustein and Peters (2014) focused on the conversational connections of the studied users. Among their findings, the most relevant to us is that there was little interaction in terms of directing messages to other users (@mention). Thus tweeting behaviour was more about information sharing than having conversations.

References to entities related to academia have previously been studied by Orduña-Malea, Torres-Salinas and Delgado López-Cózar (2015), who performed a link analysis of tweet references to a sample of 200 university Web sites and found a correlation between tweet links and Web links. Finally, Thelwall, Tsou, Weingart, Holmberg and Haustein (2013) used content analysis of tweets linking to journals and digital libraries. By far the most common type of tweet included the title or a brief description of the article, and positive or negative comments were rare. The authors conceded a limitation in that the discussion of the articles was not captured by their method and suggested 'a deeper future analysis might be able to assess the extent to which this occurs'. This study fills this gap, while it also includes bibliographic and bibliometric analyses of academic Web sources such as articles and datasets referred to by digital object identifier.

Aim and research questions

Aim

The aim of this study is to gain an understanding of the characteristics of Twitter conversations about objects of research such as research papers identified by digital object identifier references. The digital object identifier is a reasonable identifier for research publications and is suitable for the exploratory approach used here. Since Twitter is an interactive media platform, we argue that its relevance for social media metrics should be valued not by the number of tweets. Instead, its merits should be evaluated by its prospects of capturing relevant correspondence about the research that is mentioned. The analysis will be done using both network analysis tools to study how the conversations develop and content analysis of a selection of actual conversations with regard to content and style of communication. While hashtag or keyword based studies of scholarly activity on Twitter have been published previously, this is the first paper to investigate Twitter conversation in altmetrics data.

Research questions

Our overarching questions are:

RQ1: Which types of academic source are most often mentioned?

RQ2: Which types of academic source are most often retweeted?

We then investigate the threaded conversations with at least ten tweets from these questions:

RQ3: Which disciplines and topics are the articles referred to representing in the selected conversations?

Finally, we investigate fourteen chosen threads using content analysis:

RQ4: What are the characteristics of conversational threads emanating from a reference to an article?

RQ5: How are research papers referred to in a threaded conversation?

Methods

Data collection

We used the Twitter streaming API to filter tweets containing the strings 'dx' and 'doi' or including an embedded dx.doi.org URL. A drawback with traditional Twitter research is that when hashtags or keywords are used for data collection, the replies to these tweets that do not match the search criteria are missing. This unknown conversation has been labelled follow-on tweets or follow-on communication (e.g., Bruns, 2012; Bruns and Moe, 2013), and is so far under-researched (e.g., Bruns and Moe, 2013; Lorentzen and Nolin, in press). In order to capture the conversation around these tweets we simultaneously filtered the stream for replies to tweets in our dataset by tracking the most active participants in the conversation during the current week, using a method similar to the one used by Lorentzen and Nolin (in press). This means we tracked both Twitter users posting digital object identifier tweets and users replying to tweets in the dataset. As in Lorentzen and Nolin (in press), we identified tweets replying to tweets not captured and queried the API for the missing tweets using the endpoint GET statuses/lookup. Data were collected during April 2015.

The Twitter API returns a subset of the tweets matching the search criteria and their associated metadata. Of this metadata, two entities are of particular interest to us: the URL and the reply field. A URL embedded in the tweet is represented by three versions; one shortened (e.g., http://t.com/H2QYbr6SkU), one for display (e.g., dx.doi.org/10.1037/xge000) and one expanded (e.g., http://dx.doi.org/10.1037/xge0000043). A reply is denoted by an ID of a tweet a given tweet is replying to. This reference to another tweet is used to build conversational threads from the data, here defined as two or more tweets connected through the reply metadata field. A thread is thus comprised of a start tweet and a chain or tree of replies.
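The thread definition above (two or more tweets connected through the reply field) can be illustrated with a minimal sketch. It is not the collection software used in the study; the dictionary keys 'id' and 'reply_to' are hypothetical stand-ins for the tweet ID and the reply metadata field, and the function name is our own.

```python
from collections import defaultdict


def build_threads(tweets):
    """Group tweets into threads by following the reply field.

    `tweets` is a list of dicts with the hypothetical keys 'id' and
    'reply_to' (ID of the tweet replied to, or None for a start tweet).
    Returns {start_tweet_id: [ids of all tweets in the thread]} and keeps
    only threads with two or more tweets.
    """
    parent = {t["id"]: t.get("reply_to") for t in tweets}

    def root_of(tweet_id):
        # Walk up the reply chain until the parent is unknown or missing.
        seen = set()
        while parent.get(tweet_id) in parent and tweet_id not in seen:
            seen.add(tweet_id)
            tweet_id = parent[tweet_id]
        return tweet_id

    threads = defaultdict(list)
    for t in tweets:
        threads[root_of(t["id"])].append(t["id"])
    return {root: ids for root, ids in threads.items() if len(ids) >= 2}


sample = [
    {"id": 1, "reply_to": None},  # start tweet
    {"id": 2, "reply_to": 1},     # reply to 1
    {"id": 3, "reply_to": 2},     # reply to 2 (chain)
    {"id": 4, "reply_to": 1},     # second reply to 1 (tree branch)
    {"id": 5, "reply_to": None},  # tweet without replies
    {"id": 6, "reply_to": 99},    # reply to a tweet that was not captured
]
threads = build_threads(sample)
```

In this sketch a reply whose parent was not captured (tweet 6) forms no thread on its own, which corresponds to the situation where missing tweets have to be retrieved separately before the thread can be completed.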

From the text of the tweets we can derive whether a tweet is a retweet of another tweet or not. Unfortunately, not all retweets can be identified as the API only denotes manual retweets as retweets and not button retweets (e.g., Bruns and Moe, 2013). The manual retweet typically includes the original message but with 'RT', 'MT', or 'via' added to the user name of the original tweet author. The first of these is by far the most common in this dataset, with only sixteen instances of 'MT' and two of 'via' found, compared to 7,173 tweets starting with 'RT'. A button retweet is made by clicking the retweet button and by doing so copying the original tweet. Given that the manual retweet is arguably the more conversational option as it is possible to edit while retweeting (e.g., Highfield et al., 2013), we here focus on this type of retweet. Similar to Haustein et al. (2014), we identified tweets starting with 'RT' as retweets.
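The retweet rule described above (tweets starting with 'RT') could be sketched as follows. The regular expression and the function names are illustrative assumptions, not the authors' code; in particular, the case-sensitive match and the word boundary after 'RT' are our own choices.

```python
import re

# Assumption: 'RT' must be the first token of the tweet text (case-sensitive),
# so that words merely beginning with the letters 'RT' are not counted.
RT_PREFIX = re.compile(r"^RT\b")


def is_manual_retweet(text):
    """Return True if the tweet text starts with the token 'RT'."""
    return bool(RT_PREFIX.match(text))


def count_manual_retweets(texts):
    """Count the tweets in `texts` that are identified as manual retweets."""
    return sum(is_manual_retweet(t) for t in texts)


examples = [
    "RT @user: New paper out http://dx.doi.org/10.1037/xge0000043",
    "Interesting read http://dx.doi.org/10.1037/xge0000043 via @user",
    "RTFM before asking",
]
n_retweets = count_manual_retweets(examples)
```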

In the analysis, we ranked all the digital object identifier references based on the number of retweets and the number of mentions. The most retweeted and mentioned papers and the conversations around these were analysed qualitatively using a combination of visualisation techniques and content analysis.

Data analysis

Data were analysed in three steps. In the first, a bibliographic/bibliometric (Sections 4.1 and 4.2) study of the tweeted articles with a digital object identifier identified in Web of Science (WoS) was performed. From the total dataset of 15,731 individual tweets, all digital object identifier references were identified. A total of 4,499 unique and valid identifiers were found. These URLs were matched to their source publications in WoS using the advanced interface and searching using the DO field tag. The searches had the following form:

DO=(10.1371/journal.pbio.0040105) or
DO=(10.1371/journal.pbio.0050018) ...

Search parameters used were:

Indexes=SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, Timespan=All years.

Searches were performed on the 10th September, 2015.
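The search strings shown above could be assembled programmatically from a list of identifiers. The following is only a sketch of that step: the function names and the batch size are hypothetical, and the source does not state how many identifiers were combined per search.

```python
def build_do_query(dois):
    """Join identifiers into one advanced-search string using the DO field tag."""
    return " or ".join(f"DO=({doi})" for doi in dois)


def batched_queries(dois, batch_size=50):
    """Yield one query string per batch of identifiers.

    `batch_size` is a hypothetical parameter; any limit on query length
    would have to be checked against the search interface itself.
    """
    for start in range(0, len(dois), batch_size):
        yield build_do_query(dois[start:start + batch_size])


query = build_do_query([
    "10.1371/journal.pbio.0040105",
    "10.1371/journal.pbio.0050018",
])
```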

In the second step, the thirty most mentioned and retweeted digital object identifier references were analysed, including any type of publication and not just research papers. The problem of the retweet button outlined above entails that mentioned references not starting with 'RT' and thus not recognised as retweets could include button retweets. Here, we noted the title, research area and source of the publication. Finally, in the third step, a network and content analysis of the collected set of tweets and follow-on conversations was made. In this step, we analysed the threads with at least ten tweets. Only threads including at least one tweet referring to a digital object identifier were chosen. In total, twenty-nine threads matched these criteria. For content analysis, a final purposive sample of fourteen threads was made based on number of participants, volume of tweets, velocity, structure (tree, chain or hybrid) and length (in time).

As correlations between citation counts and social media activity are difficult to interpret, Bornmann (2015a) suggested content analysis of references to research articles. To investigate how the referenced papers were talked about, we used a grounded-theory-influenced, qualitative content analysis (Glaser and Strauss, 1967; White and Marsh, 2006). Different aspects of the comments made in the tweets, as well as the conversation structure, were coded using a bottom-up approach. Whenever an interesting aspect was identified, this was marked down for the thread and, correspondingly, a legend of different codes was constructed. This was done in an iterative process, so that codes that were invented later in the process were applied to earlier threads upon occurrence. Additionally, codes were harmonised so that passages with similar interpretation were brought under the same code, if applicable. Not every tweet message was coded; instead, relevant passages that were found appropriate for the description of the threads were used for coding. In the end, the codes were sorted typologically based on what kinds of code had been found.

The data collection method captured some non follow-on tweets not including valid digital object identifier strings. There were a few examples of digital object identifier related tweets not using an identifier URL such as '[username] cite the kindle DX type and include DOI number or where was downloaded from' and '[username] Smith, B. (2010). My Life. (Kindle Dx Version). DOI number'. Some tweets written in other languages also included both strings as part of the tweet or usernames mentioned. In our analysis, we used only tweets with valid digital object identifier strings. All tweets regardless of language were used for the digital object identifier collection in the bibliometric part of the study, while only conversations predominantly having English language content were used for the qualitative part.
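The distinction between tweets that merely contain the strings 'dx' and 'doi' and tweets with a valid identifier can be illustrated with a small sketch. The pattern below (prefix '10.', a registrant code of four to nine digits, a slash and a suffix) is a commonly used heuristic and an assumption on our part; the source does not specify how validity was checked.

```python
import re

# Assumed heuristic: a DOI following 'doi.org/' in an expanded URL.
DOI_URL = re.compile(r"doi\.org/(10\.\d{4,9}/[^\s?#]+)", re.IGNORECASE)


def extract_doi(text):
    """Return the DOI found in an expanded URL or tweet text, or None."""
    match = DOI_URL.search(text)
    if match is None:
        return None
    # Strip punctuation that often trails a URL inside running text.
    return match.group(1).rstrip(".,;)")


doi = extract_doi("http://dx.doi.org/10.1037/xge0000043")
no_doi = extract_doi("cite the kindle DX type and include DOI number")
```

Under this heuristic the second example, taken from the tweets quoted above, contains both search strings but yields no identifier and would be excluded from the analysis.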

Results

As noted in Table 1, 4,549 unique digital object identifier strings were found in the tweets. Of these, 4,499 were valid identifier strings searchable in Web of Science. Of these unique identifiers, a total of 2,992 papers were identified in Web of Science and their entries were harvested for the bibliometric analysis below. Digital object identifiers not matched in Web of Science could belong to papers in journals not indexed by Web of Science, be misspelled, or pertain to data sets or other material that is not indexed in the citation database. These 2,992 papers were published within 814 different journals. The total number of authors was 17,603. Although most of the URLs were digital object identifier URLs, some other sources were also linked to. Most of these were references to other tweets (37), nature.com (9) and YouTube (5). Below we give an aggregated overview of the publications referenced by digital object identifier and zoom in on the most mentioned and retweeted articles in the set. Finally, we focus on a selection of conversations found in the collected Twitter material.


Table 1: Data collection. Number of tweets collected and additional methods of creating the set of tweeted DOI URLs.
Tweets | 15,731
Tweets matching 'dx AND doi' | 13,242
Tweets with a digital object identifier URL | 12,967
Follow-on tweets | 1,039
Tweets added through gap filling | 1,450
Directed tweets | 2,758
Retweets | 7,173
Non follow-on tweets not referencing digital object identifier | 73
Unique digital object identifier URLs | 4,549
Unique non-digital-object-identifier URLs | 442

Bibliographic data on tweeted articles

All tweets were collected during a single month and from the publication year of the papers that were referenced in the tweets it is shown that the majority of the tweeted articles in the set are very recent (Table 2). In fact, 80 per cent of the articles were published in the same year as the collection was made, and an additional ten per cent was published the year before. This is in stark contrast with the citation impact of the articles, which is highest about ten years back (around 2005 with a mean citation rate of about 20 citations per year, not counting individual outliers in years with only singular articles tweeted in the set).


Table 2: Publication years, number of papers published, citations received, and mean citation rate of articles per year (base year is 2015). (* To calculate mean citation score for this year, an age value of 1 was used to avoid division by zero).
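Publication year | No. of papers | Mean citations/article | Mean citations/year
1963 | 1 | 1,135.0 | 21.8
1980 | 1 | 0 | 0
1983 | 1 | 2,487.0 | 77.7
1985 | 1 | 13 | 0.4
1989 | 1 | 20 | 0.8
1990 | 1 | 18 | 0.7
1991 | 1 | 168 | 7
1992 | 1 | 133 | 5.8
1994 | 1 | 330 | 15.7
1996 | 3 | 153.3 | 8.1
1998 | 2 | 60.5 | 3.6
2000 | 2 | 636.5 | 42.4
2001 | 2 | 2.5 | 0.2
2003 | 3 | 10 | 0.8
2004 | 4 | 217.5 | 19.8
2005 | 10 | 209.3 | 20.9
2006 | 12 | 136.3 | 15.1
2007 | 6 | 49.7 | 6.2
2008 | 11 | 49.7 | 7.1
2009 | 21 | 20.9 | 3.5
2010 | 26 | 66.2 | 13.2
2011 | 45 | 51.1 | 12.8
2012 | 73 | 18.6 | 6.2
2013 | 97 | 16.7 | 8.4
2014 | 275 | 4.2 | 4.2
2015 | 2,391 | 0.6 | 0.6*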
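The last column of Table 2 follows from the caption: mean citations per article divided by the age of the publication relative to the base year 2015, with an age of 1 used for 2015 itself. A minimal sketch of this arithmetic, with function and parameter names of our own choosing:

```python
BASE_YEAR = 2015  # base year stated in the caption of Table 2


def mean_citations_per_year(pub_year, mean_citations_per_article,
                            base_year=BASE_YEAR):
    """Divide mean citations per article by publication age in years.

    For papers published in the base year an age of 1 is used,
    which avoids division by zero.
    """
    age = max(base_year - pub_year, 1)
    return mean_citations_per_article / age


rate_1963 = round(mean_citations_per_year(1963, 1135.0), 1)
rate_2005 = round(mean_citations_per_year(2005, 209.3), 1)
rate_2015 = round(mean_citations_per_year(2015, 0.6), 1)
```

Applied to the rows for 1963, 2005 and 2015, this gives 21.8, 20.9 and 0.6 respectively, matching the tabulated values.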

In Tables 3 and 4 we compare tweet numbers with citation impact based on journals with the largest number of individually cited papers within the set. In Table 3 we can see the names of the most tweeted journals, while in Table 4 we note that journals with many publications are not among the most cited (on average). To a certain degree this is because a lot of the material is very newly published and, therefore, has not yet attracted many citations. However, since this bias is (at least in principle) systematic, meaning that all journals have the same disadvantage, one could argue that journals with many tweeted papers do not necessarily have a high impact status. Similar to the findings of Shema et al. (2014), journals such as various PLOS titles, Nature, Cell, Lancet and Journal of the American Chemical Society stand out here. Our findings indicate a dominance by the natural sciences, which stands in some contrast to the findings of Costas et al. (2015). Their Twitter counts were heavily dominated by biomedical and health sciences, with mathematics and computer science, natural science and engineering among the least Twitter mentioned disciplines.


Table 3: Tweeted journal titles ranked by number of tweets to published articles. Recs. are the total number of articles for each journal. Total Citation Score (TCS) for all published articles in the journal at the time of data collection is indicated as well as Mean Citation Score (Mean CS = TCS/Recs). The rank number for each journal when sorted by Mean CS (as given in Table 4) is given in the last column.
Rank Pub. | Journal | Recs | TCS | Mean CS | Rank Cit.
1 | PLOS One | 546 | 1,241 | 2.3 | 22
2 | Physical Review Letters | 98 | 1,592 | 16.2 | 7
3 | Physical Review E | 96 | 87 | 0.9 | 31
4 | Nature | 61 | 916 | 15 | 8
5 | Nature Communications | 54 | 63 | 1.2 | 29
6 | PLOS Computational Biology | 54 | 488 | 9 | 11
7 | PLOS Genetics | 51 | 137 | 2.7 | 20
8 | PLOS Medicine | 41 | 3,133 | 76.4 | 2
9 | PLOS Biology | 38 | 627 | 16.5 | 6
10 | IEEE Transactions on Computational Intelligence and AI in Games | 36 | 110 | 3.1 | 19
11 | BMJ | 33 | 108 | 3.3 | 18
12 | Cell | 31 | 937 | 30.2 | 3
13 | Scientific Reports | 30 | 18 | 0.6 | 35
14 | Lancet | 29 | 839 | 28.9 | 4
15 | Modern Pathology | 29 | 16 | 0.6 | 36
16 | ZooKeys | 29 | 68 | 2.3 | 21
17 | Comunicar | 28 | 22 | 0.8 | 32
18 | eLife | 28 | 45 | 1.6 | 24
19 | PLOS Neglected Tropical Diseases | 28 | 40 | 1.4 | 27
20 | PLOS Pathogens | 28 | 262 | 9.4 | 10
21 | Journal of the American Chemical Society | 26 | 41 | 1.6 | 25
22 | Proc. of the National Academy of Sciences of the United States of America | 25 | 361 | 14.4 | 9
23 | Neuron | 24 | 96 | 4 | 16
24 | Parasite | 24 | 50 | 2.1 | 23
25 | Social Science & Medicine | 23 | 7 | 0.3 | 39
26 | Current Biology | 21 | 393 | 18.7 | 5
27 | Nature Genetics | 21 | 174 | 8.3 | 12
28 | Angewandte Chemie-International Edition | 18 | 80 | 4.4 | 15
29 | Trends in Ecology & Evolution | 18 | 124 | 6.9 | 13
30 | Computers in Human Behavior | 15 | 6 | 0.4 | 38
31 | Nature Biotechnology | 15 | 22 | 1.5 | 26
32 | Nano Letters | 14 | 6 | 0.4 | 37
33 | Science | 14 | 2,630 | 187.9 | 1
34 | Nature Medicine | 13 | 71 | 5.5 | 14
35 | Organic Letters | 13 | 10 | 0.8 | 33


Table 4: Tweeted journal titles ranked by article mean citation rate. The Rank Pub. column indicates tweet mention ranking. Note that only journals with at least 10 mentions are presented in this table.
Rank Cit. | Journal | Recs | TCS | Mean CS | Rank Pub.
1 | Science | 14 | 2,630 | 187.9 | 33
2 | PLOS Medicine | 41 | 3,133 | 76.4 | 8
3 | Cell | 31 | 937 | 30.2 | 12
4 | Lancet | 29 | 839 | 28.9 | 14
5 | Current Biology | 21 | 393 | 18.7 | 26
6 | PLOS Biology | 38 | 627 | 16.5 | 9
7 | Physical Review Letters | 98 | 1,592 | 16.2 | 2
8 | Nature | 61 | 916 | 15 | 4
9 | Proc. of the National Academy of Sciences of the United States of America | 25 | 361 | 14.4 | 22
10 | PLOS Pathogens | 28 | 262 | 9.4 | 20
11 | PLOS Computational Biology | 54 | 488 | 9 | 6
12 | Nature Genetics | 21 | 174 | 8.3 | 27
13 | Trends in Ecology & Evolution | 18 | 124 | 6.9 | 29
14 | Nature Medicine | 13 | 71 | 5.5 | 34
15 | Angewandte Chemie-International Edition | 18 | 80 | 4.4 | 28
16 | Neuron | 24 | 96 | 4 | 23
17 | Proc. of the Royal Society B-Biological Sciences | 10 | 34 | 3.4 | 43
18 | BMJ | 33 | 108 | 3.3 | 11
19 | IEEE Transactions on Computational Intelligence and AI in Games | 36 | 110 | 3.1 | 10
20 | PLOS Genetics | 51 | 137 | 2.7 | 7
21 | ZooKeys | 29 | 68 | 2.3 | 16
22 | PLOS One | 546 | 1,241 | 2.3 | 1
23 | Parasite | 24 | 50 | 2.1 | 24
24 | eLife | 28 | 45 | 1.6 | 18
25 | Journal of the American Chemical Society | 26 | 41 | 1.6 | 21
26 | Nature Biotechnology | 15 | 22 | 1.5 | 31
27 | PLOS Neglected Tropical Diseases | 28 | 40 | 1.4 | 19
28 | ACS Nano | 10 | 13 | 1.3 | 40
29 | Nature Communications | 54 | 63 | 1.2 | 5
30 | Blood | 12 | 11 | 0.9 | 37
31 | Physical Review E | 96 | 87 | 0.9 | 3
32 | Comunicar | 28 | 22 | 0.8 | 17
33 | Organic Letters | 13 | 10 | 0.8 | 35
34 | Macromolecules | 12 | 9 | 0.8 | 38
35 | Scientific Reports | 30 | 18 | 0.6 | 13

Most articles tweeted were in the form of peer reviewed journal articles or reviews, together with some editorial material (Table 5). It is also worth noting that most tweeted articles are published by authors from renowned U.S. and British universities, such as Harvard, Oxford and University College London (UCL). Another striking feature is that Rhodes University (South Africa, ranked in QS as between #501-550) is among the five institutions with most tweeted articles in the set (Table 6). The dominance of the English-speaking world is also seen in the countries of the authors of the tweeted articles in the set (Table 7). Most authors come from English-speaking countries (US, UK, AU, CA), while the main European countries and China are also found at the top.


Table 5: Document type.
Document Type | Recs
Article | 2,608
Review | 163
Editorial material | 155
News item | 25
Letter | 22
Correction | 5
Article; proceedings paper | 4
Book review | 4
Proceedings paper | 3
Article; book chapter | 2
Biographical-item | 1


Table 6: Institutions (>29 articles).
Institution | Recs
Harvard University | 120
University of Oxford | 77
Unknown | 67
University College London | 64
Rhodes University | 63
University of Cambridge | 61
University of Sydney | 50
Chinese Academy of Sciences | 46
University of California Berkeley | 45
Stanford University | 44
University of Pennsylvania | 44
University of Michigan | 43
University of Toronto | 43
Centre National de la Recherche Scientifique | 40
Imperial College of Science Technology & Medicine | 40
University of Washington | 38
University of Edinburgh | 37
University of Alberta | 34
Massachusetts Institute of Technology | 33
University of Copenhagen | 33
University of Manchester | 33
Yale University | 32
Cornell University | 31
University of Bristol | 31
University of Tokyo | 31
University of Illinois | 30


Table 7: Country, No. publications and mean citation score (>75 authors).
Country | Recs | Mean GCS
USA | 1,254 | 10.2
UK | 615 | 9.8
Germany | 268 | 4.6
Canada | 236 | 4.5
Australia | 213 | 4.2
France | 199 | 6.1
Peoples Republic of China | 185 | 1.6
Spain | 168 | 2.7
Netherlands | 153 | 4.4
Japan | 148 | 2.7
Italy | 115 | 3.9
Switzerland | 113 | 5.3
South Africa | 97 | 0.9
Unknown | 94 | 17.8
Sweden | 85 | 3.2
Belgium | 81 | 5

Bibliometric analyses of the tweeted articles

To get an overview of what the tweeted papers actually are about, some aggregated information is given in an analysis of the bibliographical coupling of the tweeted articles, as well as from co-word analyses of terms used in the tweeted articles selected in this study. This will serve as a basis for understanding the differences between the actual conversations and the referents of these Twitter conversations. Are the tweets generally about what the tweeted papers are about?

Bibliographic coupling

Bibliographic coupling is based on the quantitative analysis of the literature within the set, the so-called research front of the research found in the downloaded papers (Persson, 1994; Åström, 2007). Bibliographic coupling in a set of papers can be measured at a number of levels. At the basic level, the article, it calculates the degree of relatedness of all article pairs in the set based on the number of cited references the two articles in each pair share (Kessler, 1963). Here, bibliographic coupling was performed at the aggregated level and the results visualised in a citation map for visual inspection and interpretation. Data were calculated at the journal level, meaning that coupling between journal titles depended on the number of shared references in articles published in each journal. In this way, we found out which journals were closely related in terms of the subject area their articles belonged to, based on which digital object identifiers were mentioned in tweets during the month of data collection.
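The principle of journal-level coupling described above can be sketched in a few lines. This is an illustration of the counting idea only, under the assumption that the journal-level strength is the sum of shared cited references over all article pairs from two different journals; it is not the calculation performed by VOSviewer, whose normalisation is not reproduced here. All names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def article_coupling(refs_a, refs_b):
    """Number of cited references shared by two articles (Kessler, 1963)."""
    return len(set(refs_a) & set(refs_b))


def journal_coupling(articles):
    """Aggregate article-level coupling to the journal level.

    `articles` is a list of (journal_title, cited_references) tuples.
    Returns a Counter mapping a sorted journal pair to its summed
    coupling strength; pairs within the same journal are skipped.
    """
    strength = Counter()
    for (j1, r1), (j2, r2) in combinations(articles, 2):
        if j1 == j2:
            continue
        shared = article_coupling(r1, r2)
        if shared:
            strength[tuple(sorted((j1, j2)))] += shared
    return strength


toy_articles = [
    ("Journal A", ["ref1", "ref2", "ref3"]),
    ("Journal B", ["ref2", "ref3", "ref4"]),
    ("Journal B", ["ref1"]),
    ("Journal C", ["ref9"]),
]
coupling = journal_coupling(toy_articles)
```

In the toy data, Journal A and Journal B obtain a coupling strength of 3 (two shared references with the first Journal B article and one with the second), while Journal C is not coupled to either.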

In Figure 1, bibliographic coupling of journals is visualised using VOSviewer, a software tool for calculation and clustering of bibliographic analyses (van Eck and Waltman, 2014). The results can be visualised in different views depending on the purpose. Here, journal titles of the 2,992 papers identified in Web of Science are shown based on the journal level bibliographic coupling. This view features a heat map of journal titles that could analogously be interpreted as a topographic map. Location of a title implies location in two-dimensional space, and the closer two titles are to each other, the closer the reference lists in articles published in each journal have been found to be. The coloration resembles a map of densities, where red is found in the densest peaks in terms of bibliographic coupling strength, whereas yellow, green and blue signify gradually lower density. PLOS One stands out as the most prominent journal, due to the number of articles mentioned by way of the digital object identifier in the study. Again, this is similar to the journals standing out in the blogosphere (Shema et al., 2014). It is also clear that the PLOS One papers are oriented quite evenly to journals in many different subjects, as would be expected, since it is a general purpose journal. To some degree, the surroundings of the journal are filled with journals that publish material that is more accessible to the public than would be expected from a bibliometric selection based on academic citations. Inter- and multidisciplinary research such as social medicine, public health, ecology/environmental science, as well as specific titles in the social sciences, such as Scientometrics and Futures, stand out. In the mid-section of the map more research in biochemistry and medicine stands out, while physics and chemistry are found in the rightmost part of the map. These latter groups are research areas that would be found in a traditional citation analysis performed within a cross section of all published research.
The results relate quite well to the findings of Costas et al. (2015), that biomedical and life sciences, as well as life and earth science dominate, while mathematics and computer science is less well represented within Twitter mentions. One thing that stands out in the map is that many titles allude to up-to-date research. Based on titles having such terms as current, emergent, advances, trends, and, again, futures we could postulate that research mentioned on Twitter is more forward-looking than historical in nature in comparison with traditional academic referencing.

Figure 1: Bibliographic coupling of sources (Of 814 source publications - journals and conference proceedings - 263 have at least 2 articles).

Term based analyses of keywords and topic terms

Another way to identify what literature is mentioned in the tweets is to read the articles. Here, instead, we will employ quantitative methods of text analysis to analyse the aggregated content of the literature, sometimes described as distant reading (e.g., Moretti, 2013). Here we will employ two different techniques for aggregating information to describe what literature is mentioned in the digital object identifier set. First, we will use author-generated keywords from the Web of Science set of 2,992 articles that were identified and view them based on their co-occurrence at the article level. The pre-processing of data was done using the Science of Science (Sci2) Tool (Sci2 Team, 2009) and the visualization was then prepared in Gephi, using the Force Atlas 2 algorithm (Bastian, Heymann and Jacomy, 2009). Second, a similar visualisation based on title and abstract of the 2,992 articles will be performed. This time, so called noun phrases, meaning phrases of nouns, pronouns or combinations of the two word classes are constructed and subsequently mapped at article level in a visualisation in VOSviewer (van Eck and Waltman, 2014). Additionally the phrases will be clustered according to how sets of topics could be delineated.

Keywords based analysis (Gephi)

In Figure 2 we introduce a keyword map based on the keywords added in the papers in the group of 2,992 articles. Keywords found at least two times are shown and the maximum number of occurrences is 457. Node size is proportional to occurrence and links indicate that the pair has been found in one single article. As in the journal map above, we note a wide selection of terms and concepts describing the articles. Some terms are found closer to each other and by viewing them one could identify clusters that are meaningful. The interpretation starts from the top, where we find concepts relating to health, medicine and the life sciences. Interestingly enough, neither these areas, nor basic science such as physics and chemistry stand out very well. Instead, social science topics in a broad sense are the most prominent. These range from information technology topics bordering on mathematics, through social medicine, to information science, media studies, education and gender studies. This is even more prominent in Figure 3, where the social science and social issues section of the map is enlarged. In the centre of this part, social networks is found close to Internet and collaborative learning, and below, social media borders keywords such as emotion, content analysis, MOOC and higher education. A circularity of the Twitter-academia complex must be noted here: to a certain degree, research articles mentioned in tweets are expressly focused on social media, a correspondence that could not be called a coincidence.
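The construction of such a keyword map (node size from occurrence, links from co-occurrence within single articles, and a minimum of two occurrences) can be sketched as below. This is not the Sci2 or Gephi processing used for Figure 2; the function name, the lower-casing of keywords and the toy data are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def keyword_network(articles_keywords, min_occurrence=2):
    """Build node weights and co-occurrence links from author keywords.

    `articles_keywords` is a list with one keyword list per article.
    Keywords are lower-cased (an assumption) and counted once per article.
    Returns (nodes, edges): nodes maps keyword -> number of articles,
    edges maps a sorted keyword pair -> number of articles containing both.
    """
    per_article = [{k.lower() for k in kws} for kws in articles_keywords]

    occurrences = Counter()
    for kws in per_article:
        occurrences.update(kws)

    kept = {k for k, n in occurrences.items() if n >= min_occurrence}

    edges = Counter()
    for kws in per_article:
        edges.update(combinations(sorted(kws & kept), 2))

    nodes = {k: occurrences[k] for k in kept}
    return nodes, edges


toy_keywords = [
    ["Social media", "Twitter"],
    ["social media", "MOOC"],
    ["Twitter", "social media"],
    ["ecology"],
]
nodes, edges = keyword_network(toy_keywords)
```

The resulting node and edge weights could then be exported to a network visualisation tool, where a layout algorithm determines the placement of the keywords.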

Figure 2: Author-based keywords (WoS).

Figure 3: Keywords, zoomed in

VOSviewer term based analyses

The following term based analyses are based on noun phrases constructed by the algorithm from words in the title and abstract fields of the articles found in Web of Science. Of 72,807 terms, 1,147 are found ten or more times in the set. Of these, 688 terms (60%) were selected based on a term frequency-inverse document frequency (tf-idf) score. This means that terms commonly found throughout the set are omitted (such as paper, study, e-service, etc.). In VOSviewer, a density visualisation was performed using the proprietary mapping and clustering algorithms introduced by its authors, which in essence map terms closer to each other if they often occur together and at the same time cluster them based on a similar closeness calculation (van Eck and Waltman, 2014), yielding a term based map (Figure 4). Word length was set to 30 letters; the longest phrase consists of 27 letters and the mean number of letters was 9.0 (std. dev. = 4.2). Nine clusters were identified in the set. The visualisation can be interpreted visually, with the help of a list of frequently occurring terms in each cluster (Table 8). Starting with the red cluster (1), quite technical terms such as structure, state and property are found, indicating terminology used to describe research. Next, the green cluster (2) signifies the actual research process in a clinical medical setting, with terms like patient, treatment, participant and outcome. The dark green cluster (6) next to that is also in the medical sciences, but this time rather about social medicine and epidemiology, with terms like death, report, child but also hiv, Africa and recommendation. The blue cluster (3) indicates lab work, predominantly in cancer and gene research, while the adjacent grey cluster (7) focuses on biochemistry, with terms like sequence and infection, but also dna, genome, sequencing and antibiotic.
Next, the yellow cluster (4) is biological with terms like species, evolution, plant and ecosystem, while the purple cluster (5) sits firmly in survey methodology and the social sciences with terms such as practice, survey and science, and further below policy, education and learning. Lastly, there are two unidentified clusters (8 and 9) with single terms, schizophrenia and higher level.
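
As an illustration only, the term selection step described above (keep terms found at least ten times, then retain the 60% with the highest tf-idf style score) could be sketched as follows. This is a minimal Python sketch and not VOSviewer's actual implementation: the function name `select_terms`, the scoring formula (total occurrences weighted by the logarithm of inverse document frequency) and the toy input are all assumptions made for the example.

```python
import math
from collections import Counter


def select_terms(docs, min_occurrences=10, keep_share=0.6):
    """Return the highest-scoring share of sufficiently frequent terms.

    docs: one list of extracted terms per article (hypothetical input format).
    """
    occurrences = Counter(term for doc in docs for term in doc)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    n_docs = len(docs)

    # Step 1: keep only terms found at least min_occurrences times.
    frequent = [t for t, count in occurrences.items() if count >= min_occurrences]

    # Step 2: assumed tf-idf style score; terms spread over all documents score 0.
    scores = {t: occurrences[t] * math.log(n_docs / doc_freq[t]) for t in frequent}

    # Step 3: retain the top share of terms (ties broken alphabetically).
    n_keep = round(len(frequent) * keep_share)
    ranked = sorted(frequent, key=lambda t: (-scores[t], t))
    return ranked[:n_keep]


# Toy data: "study" occurs in every document and is therefore dropped.
toy_docs = [["study", "gene"], ["study", "cell"], ["study", "gene"], ["study"]]
print(select_terms(toy_docs, min_occurrences=2, keep_share=0.5))
```

With the figures reported here, retaining 60% of the 1,147 frequent terms gives round(1147 * 0.6) = 688 terms, which matches the number of terms mapped.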

To a certain degree, the term based clustering based on title and abstract shows that a rather high share of the research found in the DOI-mentioned articles is in biochemistry and medical research, while the social sciences do not turn out so strongly here. This is probably because there is a high yield of applied research, mainly focusing on health, while basic research is to some degree found in biochemistry, but also in ecology, zoology and evolutionary biology.

Figure 4: Term visualisation. 688 terms occurring at least ten times were mapped and cluster resolution was set to 1.40, to generate nine distinct clusters.

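Table 8: Weight and cluster designation for most commonly identified terms within DOI-mentioned article title and abstracts.

| Cluster 1 | n | Cluster 2 | n | Cluster 3 | n | Cluster 4 | n |
|---|---|---|---|---|---|---|---|
| structure | 324 | year | 311 | mechanism | 299 | species | 337 |
| state | 195 | conclusion | 276 | cell | 292 | evolution | 136 |
| property | 183 | patient | 270 | gene | 214 | diversity | 127 |
| application | 175 | treatment | 222 | expression | 184 | composition | 95 |
| dynamic | 174 | risk | 212 | protein | 176 | origin | 84 |
| formation | 166 | age | 198 | pathway | 141 | plant | 84 |
| theory | 141 | participant | 165 | cancer | 109 | trait | 72 |
| complex | 115 | outcome | 153 | regulation | 101 | ecosystem | 70 |
| material | 114 | background | 145 | mouse | 99 | female | 69 |
| solution | 105 | day | 120 | mutation | 92 | organism | 66 |

| Cluster 5 | n | Cluster 6 | n | Cluster 7 | n |
|---|---|---|---|---|---|
| article | 126 | death | 92 | sequence | 132 |
| practice | 118 | report | 84 | infection | 123 |
| person | 113 | child | 83 | dna | 68 |
| survey | 108 | efficacy | 70 | genome | 63 |
| country | 89 | drug | 69 | sequencing | 63 |
| science | 82 | dose | 55 | strain | 63 |
| access | 77 | administration | 40 | virus | 57 |
| experience | 77 | africa | 40 | pathogen | 51 |
| policy | 68 | recommendation | 40 | bacterium | 50 |
| education | 64 | min | 39 | pcr | 31 |

| Cluster 8 | n | Cluster 9 | n |
|---|---|---|---|
| schizophrenia | 11 | higher level | 19 |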

Twitter mentions and retweets

Detailed information including publication titles, discipline, source and number of mentions and citations is given in Tables 14 and 15 in the Appendix. There are larger figures for retweets than for digital object identifier mentions (Figure 5), which is probably a result of retweeting requiring less effort than posting an original tweet. There are a couple of similarities with the results from the analyses in 4.1 and 4.2. Biology and medicine are the most mentioned research areas, and articles from the sciences are more often mentioned than those from the social sciences, even though there are examples of mentioned papers from political science, research and education and library and information science (Figure 6). The most prominent journals in this respect are PLOS journals, where PLOS One dominates with thirteen mentioned articles (Figure 7). Nature has five mentions. The retweet set was even more dominated by the sciences, with biology and medicine being the most prominent areas (Figure 8). The domination by PLOS journals was not as evident in the retweet set, with seven PLOS One articles and six articles from other PLOS journals (Figure 9). Most of the mentioned and retweeted items were from 2015 (28 and 27, respectively), so it seems that recently published research is more likely to be mentioned or retweeted, even though there were examples of older articles in this set. One difference between the top lists is that references to Figshare were included among the most retweeted digital object identifiers. The most striking difference, however, is that by far the most retweeted item, "Crickets are not a free lunch: protein capture from scalable organic side-streams via high-density populations of Acheta domesticus", retweeted 1,014 times, is not included among the top thirty mentioned items.
There was some overlap between the top digital object identifiers, with eight items present in both sets, but overall these findings indicate that what is most often retweeted is not necessarily what is most often mentioned.
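
The comparison of the two top lists amounts to a set intersection, which could be sketched as below. This is a hedged Python illustration, not the procedure used in the study: the function names and the identifiers in the toy data (of the form `10.1000/...`) are hypothetical.

```python
def top_n(counts, n=30):
    """Return the n identifiers with the highest counts (ties broken alphabetically)."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [doi for doi, _ in ranked[:n]]


def shared_top_items(mention_counts, retweet_counts, n=30):
    """Identifiers present in both the top-n mentioned and the top-n retweeted lists."""
    return set(top_n(mention_counts, n)) & set(top_n(retweet_counts, n))


# Hypothetical counts, for illustration only.
mentions = {"10.1000/a": 5, "10.1000/b": 3, "10.1000/c": 1}
retweets = {"10.1000/c": 9, "10.1000/a": 4, "10.1000/d": 2}
print(shared_top_items(mentions, retweets, n=2))
```

In the toy data only one identifier appears in both top-two lists, even though the most retweeted item is absent from the top mentions, which mirrors the pattern described above on a small scale.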

Figure 5: Histograms of the thirty most often mentioned (blue) and retweeted (red) digital object identifiers.

Figure 6: Disciplines of thirty most often mentioned digital object identifiers.

Figure 7: Sources of thirty most often mentioned digital object identifiers.

Figure 8: Disciplines of thirty most often retweeted digital object identifiers.

Figure 9: Sources of thirty most often retweeted digital object identifiers.

Conversation threads

Twitter conversation kinds

In this study, twenty-nine conversation threads with at least ten interacting tweets were selected for an in-depth analysis of structure and contents. Table 9 shows descriptive statistics for these threads. The longest thread consisted of fifty-six tweets and the median number of tweets was fifteen. The conversations included between one and eleven participants, with a median of four. The other metrics are the velocity of the conversation, indicating the number of tweets per hour, which ranged between 0.0014 and 51 with a median of 1.2, and the length of the conversation, which ranged between 0.5 and 21,900 hours with a median of 14.5.


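Table 9: General statistics about Twitter conversations.

| | Mean | Median | Std.dev | Min | Max |
|---|---|---|---|---|---|
| Volume | 18.97 | 15 | 10.78 | 10 | 56 |
| Participants | 4.28 | 4 | 1.89 | 1 | 11 |
| Velocity | 5.84 | 1.2 | 10.98 | 0.0014 | 51 |
| Length | 806.66 | 14.5 | 3,989.14 | 0.5 | 21,900 |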
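
To make the four measures concrete, the following minimal Python sketch computes them for a single thread. It assumes that length is the time in hours between the first and last tweet and that velocity is volume divided by length; the function name `thread_metrics`, the input format and the example data are hypothetical.

```python
from datetime import datetime


def thread_metrics(tweets):
    """Summarise one thread given (user, timestamp) pairs, one pair per tweet."""
    times = sorted(timestamp for _, timestamp in tweets)
    volume = len(tweets)
    participants = len({user for user, _ in tweets})
    # Length: hours between the first and the last tweet of the thread.
    length_hours = (times[-1] - times[0]).total_seconds() / 3600
    # Velocity: tweets per hour (assumed here to be volume / length).
    velocity = volume / length_hours if length_hours > 0 else float("inf")
    return {
        "volume": volume,
        "participants": participants,
        "length": length_hours,
        "velocity": velocity,
    }


# Hypothetical thread: three tweets by two users over two hours.
example = [
    ("user_a", datetime(2015, 4, 1, 10, 0)),
    ("user_b", datetime(2015, 4, 1, 11, 0)),
    ("user_a", datetime(2015, 4, 1, 12, 0)),
]
print(thread_metrics(example))
```

Under the same assumption, thread 1 in Table 10 (ten tweets over fifty-two hours) gives 10 / 52 ≈ 0.19 tweets per hour, in line with the reported velocity of 0.2.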

Aggregated characteristics of a Twitter conversation pertain to various forms of interaction. As shown in Figure 10, the node structure of a Twitter conversation could take on many different forms.

In analysing the aggregated features of the Twitter conversations, one could describe the visual appearances of the interactions between users. An interaction could take the form of a chain, in which each (or most) of the follow-on tweets follows consecutively on the previous one in a long line. Alternatively, the discussion can have a star shape, in which a central tweet is approached in many different conversations. Hybrid conversations could be described as connections where the temporality of the chain is found, but where many different follow-on discussions are started later in the conversation that do not interact with the original tweet. Within these segmented conversations each new conversation thread that is started could be labelled a bifurcation. Among the twenty-nine threads that were analysed, eleven had chain form, one was shaped as a star and seventeen were hybrid with bifurcations along the line of conversation.
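
The three shapes can be illustrated with a small rule-based classifier over the reply structure. This Python sketch is a deliberate simplification and an assumption on our part: it applies strict rules (a chain has no tweet with more than one reply, a star has all replies pointing at the first tweet), whereas the classification in this study was made by visual inspection and tolerates exceptions ("each (or most)"). The function name and input format are hypothetical.

```python
from collections import Counter


def classify_thread(reply_to):
    """Classify a thread as 'chain', 'star' or 'hybrid'.

    reply_to maps each tweet id to the id of the tweet it replies to;
    the first tweet of the thread maps to None.
    """
    roots = [tweet for tweet, parent in reply_to.items() if parent is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root tweet")
    root = roots[0]
    parents = [parent for parent in reply_to.values() if parent is not None]
    replies_per_tweet = Counter(parents)

    # Chain: no tweet receives more than one reply, so there is no bifurcation.
    if all(count <= 1 for count in replies_per_tweet.values()):
        return "chain"
    # Star: every reply addresses the original tweet directly.
    if all(parent == root for parent in parents):
        return "star"
    # Otherwise the thread branches somewhere along the line of conversation.
    return "hybrid"


print(classify_thread({1: None, 2: 1, 3: 2, 4: 3}))
print(classify_thread({1: None, 2: 1, 3: 1, 4: 1}))
print(classify_thread({1: None, 2: 1, 3: 2, 4: 2, 5: 4}))
```

In the third example tweet 2 receives two replies, which corresponds to a bifurcation in the sense used above.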

Figure 10: Aggregated features of Twitter conversations.

Of the twenty-nine Twitter threads that were chosen, fourteen were analysed in depth, in terms of the actual conversations that were related to the digital object identifier reference.

Twitter conversations: description

In Tables 10-13, the fourteen threads chosen for qualitative analysis are described. For each thread, indicated by an ID, the thread type and the measures mentioned above are given.


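Table 10: Description of threads, part I.

| ID | Type | Velocity | Length | Volume | Participants |
|---|---|---|---|---|---|
| 1 | Star | 0.2 | 52 | 10 | 7 |
| 2 | Hybrid | 0.8 | 23 | 19 | 4 |
| 3 | Chain | 0.7 | 25 | 18 | 5 |
| 4 | Hybrid | 1.9 | 29 | 56 | 4 |
| 5 | Hybrid | 1.1 | 22.5 | 25 | 5 |
| 6 | Hybrid | 0.02 | 840 | 19 | 5 |
| 7 | Hybrid | 2.1 | 7.5 | 16 | 5 |
| 8 | Chain | 1.2 | 11 | 14 | 1 |
| 9 | Chain | 0.1 | 100 | 11 | 6 |
| 10 | Chain | 0.2 | 53 | 12 | 3 |
| 11 | Hybrid | 0.0014 | 21,900 | 29 | 4 |
| 12 | Hybrid | 30 | 0.5 | 14 | 2 |
| 13 | Chain | 1.2 | 10 | 12 | 3 |
| 14 | Hybrid | 12.2 | 1 | 11 | 3 |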

Table 11 gives the title and source journal for the first article indicated with a digital object identifier reference (if more than one is given in the conversation) in each conversation (the ID number corresponds to the ID in the other tables in this set). As noted, many conversations were based on articles from PLOS journals. In a count of the fifty most tweeted sources within the set (not shown here), half of the titles belonged to the PLOS consortium, and it is almost exclusively journal articles from the Public Library of Science (PLOS) that are found in more than singular numbers. The spread of research areas is mainly focused on the (bio-) sciences, while policy and education studies are found singularly within the Twitter conversations. Another relevant observation is that a large part of these publications are open access publications. This could perhaps be a sign of researcher vanity: publishing accessible articles and self-tweeting?


Table 11: Description of threads, part II. Article title and source journal for the first DOI in each conversation.

| ID | Source | Title |
|---|---|---|
| 1 | PLOS One | Plastic accumulation in the Mediterranean Sea |
| 2 | PLOS Computational Biology | Speeding up ecological and evolutionary computations in R; Essentials of high performance computing for biologists |
| 3 | PLOS One | The repertoire of Archaea cultivated from severe periodontitis |
| 4 | The New England Journal of Medicine | Cell-free DNA analysis for non-invasive examination of trisomy |
| 5 | PLOS One | Abnormalities of AMPK activation and glucose uptake in cultured skeletal muscle cells from individuals with chronic fatigue syndrome |
| 6 | The Lancet | Efficacy of paracetamol for acute low-back pain: a double-blind, randomised controlled trial |
| 7 | Journal of Clinical Investigation | Quantification of mutant huntingtin protein in cerebrospinal fluid from Huntington's disease patients |
| 8 | Figshare | Database - medical education - Part I - The double standard test. (version 1.0) |
| 9 | PLOS One | Fgf21 impairs adipocyte insulin sensitivity in mice fed a low-carbohydrate, high-fat ketogenic diet |
| 10 | Applied machine intelligence and informatics (SAMI), 2011 IEEE 9th International Symposium on | Identification of carnatic raagas using hidden markov models |
| 11 | Radiography | A taxonomy of anatomical and pathological entities to support commenting on radiographs (preliminary clinical evaluation) |
| 12 | City: analysis of urban trends, culture, theory, policy, action | Response: Building a better theory of the urban: A response to 'Towards a new epistemology of the urban?' |
| 13 | City: analysis of urban trends, culture, theory, policy, action | Towards a new epistemology of the urban? |
| 14 | Neuroimage | Sensible decoding |

Content analysis of Twitter conversations

Coding was done bottom-up as described in the method section and four distinct types were established. The types correspond to contents of singular tweets, meta (or communication), conversation (at the aggregated level) and non-academic. These types will be used to describe, in a conceptual manner, the Twitter conversations that were chosen. The fourteen Twitter conversations that were analysed are included in Table 12. Each row includes the text of the full start tweet or, if it is the title of the digital object identifier-referenced article, the label [Title], which can be looked up in Table 11. It also includes the interpreted topic of the conversation and the codes that were used to describe the conversation. In Table 13, the themes and the codes that were developed for the interpretation in the qualitative content analysis are described. Below, we present an interpretation of the conversations by thematically illustrating our findings.


Table 12: Description of threads, part III. *[Title] means that the first tweet consisted of the title of the digital object identifier-referenced article in question.

| ID | Start tweet | Topic | Codes |
|---|---|---|---|
| 1 | [Title]* | Fisheries | Tech, Ti, Co |
| 2 | [Title] | Computation in R | Cr, AffCr, PoImg, Arg, MQ, MA, IC |
| 3 | #PLOSONE: [Title] | Periodontitis | Cr, TP, Sarc |
| 4 | An invasive blood test may be a better way to diagnose Downs syndrome in fetuses at 10-14 weeks of pregnancy. | Down's syndrome, blood test | Cr, Sarc, Ti, Th, #, Arg, Fur |
| 5 | [Title] | Chronic fatigue syndrome | Mon, Aff, Coll |
| 6 | Non-opioid ED analgesia. #AAEM15 #FOAMed | Analgetics, Physical therapy | Ti, Aff, Coll, # |
| 7 | We detected mutant huntingtin protein, the cause of Huntington's disease, in cerebrospinal fluid for the first time | Huntington's disease | Ti, Coll, Aff, MQ, MA, TP |
| 8 | The [username] and their school of [username] also have #big-pharma ties ( e.g., [username] ) | Big pharma | Arg, # |
| 9 | Low-carb diet impairs insulin sensitivity in mice. But they don't tell it's casein-based. | LCHF diet | Ti, Cr, AffCr |
| 10 | Have you ever wondered how apps like Shazam magically detect songs in a very short time? The key is Parson's code | Music detection | Gen, Mon, TA |
| 11 | hey bro! Don't forget gmail | Radiography | Gen, GQ, TA, Aff, MQ, MA, Me, # |
| 12 | *long-ass whistle* read from "I am..." all way down to "...replicable city." damn. | Urban studies, theory | Rant, Me, Quot, Aff |
| 13 | It's not clear to me, reading Brenner and Schmid, why we even need an urban theory, ~if~ the urban condition is so planetary and total. | Urban studies, epistemology | Quot, Rant, Me, Cr |
| 14 | Interesting commentary - what is MVPA orientation decoding in fMRI actually measuring? | Neuroimaging | Att, MQ, Arg, Sarc |


Table 13: Codes, explanations and examples.

| Type | Code | Explanation | Example |
|---|---|---|---|
| Tech | Th | Technical issues | e.g., "Link doesn't work" |
| Content | Ti | Title retweet | Title is retweeted in full or in parts. No other user input |
| | Quot | Quote from article | e.g., "'authors could tone down the hyperdrive on their academic prose, so as not to burn the retinas of those less attuned' lol" (note the "lol" comment after the quote) |
| | DX | DOI as example | e.g., "I want to try periodicity measurement on images as in http://t.co/FSzsvXHWwb" |
| | PoImg | Pointing out specific part in the article | e.g., https://twitter.com/jaimedash/status/583757221412151296/photo/1 |
| | Co | Comment on result | Either in own words or by quoting a short passage of the text: e.g., "between 1.000 and 3.000 tonnes!" |
| | GQ | General question | e.g., "I would be interested in your opinion of this" |
| | MQ | Methodological questions | e.g., "What software do you use to read the files?", "Do you have a comparison re..." |
| | MA | Methodological answer | Answer to the question, e.g., "we have sens/spec/acc for each of the..." |
| | Cr | Critical comments (topical) | e.g., "I find it odd that..." |
| | Fur | Discussion is going further than the content of the article | In an article on blood work for test of Down's syndrome: discussant starts discussion about insurance policies not related to the original conversation or DOI. |
| Communication (or: meta) | Aff | Affirming result or heads up | e.g., "Great Article!" |
| | Coll | Collegial discussion, no arguments | e.g., "Very promising to see them get results in this area" |
| | AffCr | Affirming critical comment | e.g., "Good point. I had not thought about that" |
| | IC | Invite colleague in discussion | e.g., "[username] to the rescue?" |
| Description of conversation | *Arg | Argumentation between responders | Back-and-forth |
| | *Mon | Monologue | A number of tweets from the same author |
| | *TS | Turned scientific | Discussion turns scientific with the introduction of a DOI in a comment. |
| | *Pros | Discussion turned prosaic | Goes from academic to generic, e.g., in the conversation about "hubby just went to the dentist today. Dare I read this?" |
| Non academic | Gen | General tweet (no DOI in first) | First tweet not academic in nature. |
| | Sarc | Sarcasm | e.g., "#facepalm" Note the hashtag. |
| | Me | Meta | Commenting stance, e.g., "being nosey" |
| | Rant | Ranting on contents | "'Please, fellow geographers, leave the boxes to the sociologists, who absolutely worship them.' this gets raw lol." |

Content

The biggest category is content, where aspects of what was said in the various tweets were coded. The codes are sorted by the level of complexity, so that a tweet only containing the title (Ti) is the simplest form. Here, no other user input is made other than the inclusion of a digital object identifier reference to the material in question. A quote from an article, followed by an identifier reference is another simple kind of tweet. Sometimes the quote is followed by an exclamation mark or internet slang such as 'lol' (laugh out loud).

Other kinds of content based tweets are tweets that use a digital object identifier-referenced article as an example, such as 'I want to try periodicity measurement on images as in [DOI]' (DX), or that refer to a specific part of a digital object identifier-referenced article, such as an image (PoImg). Further content based tweets comment on the content of a digital object identifier-referenced article (Co), either in the author's own words or by quoting a short passage and then giving a verbal comment.

Questions posed by participants could be formed in a general fashion (GQ), such as 'I would be interested in your opinion of this' or, more frequently, based on methodological aspects (MQ), e.g., 'What software do you use to read the files?' or 'Do you have a comparison re...'. Methodological answers (MA) to those questions are often given in a direct form, e.g., 'we have sens/spec/acc for each of the...'. Lastly, critical thoughts (Cr) are sometimes posed, e.g., 'I find it odd that...'. Sometimes these are supported by another participant who responds with an affirming critical comment (AffCr), e.g., 'Good point. I had not thought about that'.

Meta level

At the meta level, we find statements such as affirming comments or simply giving credit without focusing on the contents (Aff), e.g., 'Great Article!' or 'Thanks for sharing your article. It was an interesting read'. Other tweets address some of the content in a collegial manner (Coll), e.g., 'Very promising to see them get results in this area'. Another meta action, invoked once, is calling for help from another Twitter user to solve a question (IC), e.g., '[username] to the rescue?'. Some people do not ask for help but find seemingly obvious information by themselves and add a hashtag (#) to inform about it, e.g., '#letmegooglethatforyou'. Technical issues (Tech) are also incorporated in this category, although only one instance of such a mention was found: 'Link doesn't work'.

Conversation

The style of the conversations at the aggregate level could be distinguished by looking at the style of the whole conversation. There are monologues (Mon), including a long stream of tweets from the same participant, sometimes written in parts (1/3, 2/3, 3/3) and either including quotes from the article in question, or a longer argument that does not fit in one single tweet. Another style is the argument (Arg), where two or more participants argue back and forth on a topic related to the research in the article whose digital object identifier is mentioned.

Another distinguishing feature of the conversations is that a conversation can switch between being academic in character and non-academic, or prosaic. In the first case, the conversation turns academic with the introduction of a digital object identifier in a comment (TA), while in the other (TP), the discussion turns prosaic. In an example from the conversation on periodontitis, one participant responds to the original tweet: 'hubby just went to the dentist today. Dare I read this?' The conversation never returns to the academic realm after this. On the other hand, one Twitter conversation in the set goes further (Fur) than the content of the article; in relation to an article on blood work for test of Down's syndrome, the conversation turns to a discussion about insurance policies not related to the original conversation or the digital object identifier.

Non academic

The last kind of Twitter conversation practices that were identified could be labelled non-academic, since they do not involve any academic (or in some cases intellectual) content. First, there are general tweets (Gen), not relating to academic content in any way. Such conversations could turn academic (TA), but do not start as such. An example start tweet in the collection is 'hey bro! Don't forget gmail', and another is 'Have you ever wondered how apps like Shazam magically detect songs in a very short time? The key is Parson's code', although the second indicates that there is some kind of answer to the generic question. Other indications of non-academic content are sarcastic comments (Sarc), e.g., '#facepalm', or the tweet about '#letmegooglethatforyou' above. Note the hashtag in these examples. There are also comments at the meta level (Me), where a participant comments on his/her stance towards the issue, by adding 'being nosey'. Last, we find statements that are interpreted as ranting (Rant): "'Please, fellow geographers, leave the boxes to the sociologists, who absolutely worship them.' this gets raw lol."

In summary, the conversation style between participants was very varied and one could find both formal communication and tweets consisting almost entirely of Internet slang. Writing in the tweets is often very reflexive in the sense that authors seem to have taken on an internet persona, sometimes using slang and emoticons to add feeling to their words. In a few rare instances communication is almost cordially polite, e.g.

[username] do you mind if I ask, is the CSF DNA neural in origin?

The answer, though, is short:

[username] with* haemoglobin.

This still elicits a polite note of appreciation from the first participant:

[username] Thanks for the clarification - much appreciated

Following up on Thelwall et al. (2013), we see that there is conversation beyond the digital object identifier references, but it does not seem to give deep insights into the reactions to the content referred to. Some tweets in the conversations could also be related to the more practical issues such as Wi-Fi passwords and dinner plans found by Gonzales (2014).

Conclusions

In this paper, we have demonstrated and exemplified a method for collecting conversations related to research articles and other academic sources. We first outlined the journals, disciplines and topics referred to, then looked at the most often mentioned and retweeted scientific content and finally analysed fourteen conversational threads emanating from a digital object identifier reference. The first observation made was that the tweets referring to an identifier did generate some follow-on conversation. By looking at tweets in relation to the conversation, they can be analysed in their contexts. Such an analysis could be used as an extension to current altmetric methods. In the set of tweets, 2,992 unique papers were referred to. There was quite a wide range of topics or research areas referred to, both with mentions and retweets, but there was also a clear emphasis on the natural sciences, with social science being less visible. In related research, similar findings were made by Costas et al. (2015) and Shema et al. (2014). However, while biomedical and health sciences dominated the Twitter usage in the former study, our results are more diverse discipline-wise, with more references to computer science. This might be a consequence of aggregating the analysis to the most referenced source titles, which would not show large numbers of sources with few referenced papers each.

In a small qualitative analysis of Twitter conversations such as the one performed in this study, it is not easy to assess how representative the selected sample is of the whole population. Nevertheless, differences between the most mentioned digital object identifiers and the most retweeted identifiers were found that potentially have consequences for how to use Twitter data to measure impact. Eight identifiers were shared between the top mentions and the top retweets while only one frequently mentioned identifier was also discussed in one thread. Additionally, two frequently retweeted identifiers were discussed in threads. This point is potentially important as the Twitter indicator used by the service Altmetric.com, for example, only considers the number of users tweeting or retweeting a publication (Costas et al., 2015). Here, it was not the most retweeted items that were most often mentioned and it was not the most mentioned or retweeted identifiers that were most talked about in a conversation context. Therefore we propose that, to be truly meaningful as a representation of Twitter activity around scholarly work, an impact metric should be extended to include measures of visibility, spreadability and the ability to spark discussion.

While the analysed conversations were selected purposively, some care was taken to maintain the ratio of research areas from the set of twenty-nine threads that were found with at least ten tweets. Biology and health sciences were in the broad sense dominant, while certain data science, mathematical and social sciences also stand out. The conversations were very varied, ranging from serious discussion of concepts and methodology, to prosaic opinionated pieces with value-laden language, often amplified by the use of punctuation, emoticons and internet slang.

In our qualitative content analysis, we found that references to digital object identifier URLs were mainly used for promoting a paper, as conversation starters or as arguments in a discussion. It would be wise to separate the different usage types from each other in an analysis. By doing this we could for example analyse how maps based on non self-promoting references change over time. The self-promoting tweets are of interest as well, for example in an analysis of how or if they have an impact on the success of an article. Our findings suggest that for judging scholarly impact, Twitter data and conversations should be used with caution but that they might have potential for gauging social impact. Hence, we propose that some research on Twitter activity in relation to research can be shifted towards analysing how the public reacts to scientific reports, or what the public seems to find interesting. Collecting tweets referring to identifiers makes it possible to analyse some reaction to the published research. Collecting follow-on conversation makes it possible to analyse the reaction to the tweets referring to the research, thus painting a richer picture of how Twitter users react to research. While this particular study found few examples of extensive or substantial conversations or discussions around the identifiers, further research over longer time periods than one month is needed to confirm whether such communication exists on Twitter and of what relevance these discussions are. It would also be useful to search for other types of reference than digital object identifiers, for example references to popular source titles, especially if the research question regards the public interest.

Finally, from the large share of titles from PLOS, we could also presume that there is a high open access share in the mentioned articles, although this was not investigated here. Neither was the sender of each tweet investigated, which means that we are not able to discern any motives for posting the tweet that originated the conversation.

Acknowledgements

The authors wish to thank the anonymous referees for their comments, which were helpful for improving the quality of the paper.

About the authors

Dr. Gustaf Nelhans is Senior lecturer at the Swedish School of Library and Information Science (SSLIS) at University of Borås, SE-501 90 Borås, Sweden. His research focuses on the performativity of scientometric indicators as well as on the theory, methodology and research policy aspects of the scholarly publication in scientific practice using a science and technology studies (STS) perspective. Presently his focus of interest has shifted towards other forms of impact measure such as professional impact, i.e., in clinical guidelines.
David Gunnarsson Lorentzen is a PhD Student at the Swedish School of Library and Information Science, University of Borås, SE-501 90 Borås, Sweden. His main research interests concern social media studies with a focus on method development. He received his Master's degree in Library and Information Science from University of Borås. He can be contacted at: david.gunnarsson_lorentzen@hb.se.

References
  • Åström, F. (2007). Changes in the LIS research front: time-sliced cocitation analyses of LIS journal articles, 1990-2004. Journal of the American Society for Information Science and Technology, 58(7), 947-957.
  • Bastian, M., Heymann, S. & Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. In Proceedings of the Third International AAAI Conference on Weblogs and Social Media. (pp. 361-362). Palo Alto, CA: AAAI. Retrieved from http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154/1009 [Unable to archive.]
  • Bornmann, L. (2015a). What do altmetrics counts mean? A plea for content analyses. figshare.com. Retrieved from https://ndownloader.figshare.com/files/2206911 (Archived by WebCite® at http://www.webcitation.org/6hg5a8Yn7)
  • Bornmann, L. (2015b). Alternative metrics in scientometrics: a meta-analysis of research into three altmetrics. Scientometrics, 103(3), 1123-1144.
  • Bruns, A. (2012). How long is a tweet? Mapping dynamic conversation networks on Twitter using Gawk and Gephi. Information, Communication & Society, 15(9), 1323-1351.
  • Bruns, A. & Moe, H. (2013). Structural layers of communication on Twitter. In K. Weller, A. Bruns, J. Burgess, M. Mahrt & C. Puschmann (eds.), Twitter and Society, (pp. 15-28). New York, NY: Peter Lang.
  • Costas, R., Zahedi, Z. & Wouters, P. (2015). Do "altmetrics" correlate with citations? Extensive comparison of altmetric indicators with citations from a multidisciplinary perspective. Journal of the Association for Information Science and Technology, 66(10), 2003-2019.
  • Glaser, B.G. & Strauss, A.L. (1967). The discovery of grounded theory: strategies for qualitative research. New Brunswick, NJ: Aldine Transaction.
  • Gonzales, L. (2014). An analysis of Twitter conversations at academic conferences. In Proceedings of the 32nd ACM International Conference on The Design of Communication CD-ROM (p. 4). New York, NY: ACM.
  • Hammarfelt, B. (2014). Using altmetrics for assessing research impact in the humanities. Scientometrics, 101(2), 1419-1430.
  • Haustein, S., Bowman, T. D., Holmberg, K., Peters, I. & Larivière, V. (2014). Astrophysicists on Twitter: an in-depth analysis of tweeting and scientific publication behavior. Aslib Journal of Information Management, 66(3), 279-296.
  • Haustein, S., Bowman, T. D., Holmberg, K., Tsou, A., Sugimoto, C. R. & Larivière, V. (2016). Tweets as impact indicators: examining the implications of automated "bot" accounts on Twitter. Journal of the Association for Information Science and Technology, 67(1), 232-238.
  • Haustein, S., Peters, I., Bar-Ilan, J., Priem, J., Shema, H. & Terliesner, J. (2014). Coverage and adoption of altmetrics sources in the bibliometric community. Scientometrics, 101(2), 1145-1163.
  • Haustein, S., Sugimoto, C., & Larivière, V. (2015). Guest editorial: social media in scholarly communication. Aslib Journal of Information Management, 67(3).
  • Highfield, T., Harrington, S., & Bruns, A. (2013). Twitter as a technology for audiencing and fandom: the #Eurovision phenomenon. Information, Communication & Society, 16(3), 315-339.
  • Holmberg, K., Bowman, T. D., Haustein, S., & Peters, I. (2014). Astrophysicists' conversational connections on Twitter. PLOS One, 9(8), 13.
  • Holmberg, K., & Thelwall, M. (2014). Disciplinary differences in twitter scholarly communication. Scientometrics, 101(2), 1027-1042.
  • Kessler, M. M. (1963). Bibliographic coupling between scientific papers. American Documentation, 14(1), 10-25.
  • Lorentzen, D.G., & Nolin, J. (in press). Approaching completeness: capturing a hashtagged Twitter conversation and its follow-on conversation. Social Science Computer Review. Retrieved from http://www.adm.hb.se/~dgu/papers/Approaching%20Completeness%20post-print.pdf (Archived by WebCite® at http://www.webcitation.org/6h6MeXKEh).
  • Moretti, F. (2013). Distant reading. London: Verso.
  • Orduña-Malea, E., Torres-Salinas, D., & Delgado López-Cózar, E. (2015). Hyperlinks embedded in twitter as a proxy for total external in-links to international university websites. Journal of the Association for Information Science and Technology, 66(7), 1447-1462.
  • Persson, O. (1994). The intellectual base and research fronts of JASIS 1986-1990. Journal of the American Society for Information Science, 45(1), 31-38.
  • Sci2 Team. (2009). Science of Science (Sci2) Tool. Bloomington, IN: Indiana University and SciTech Strategies. Retrieved from https://sci2.cns.iu.edu (Archived by WebCite® at http://www.webcitation.org/6hbYrjA1s).
  • Shema, H., Bar-Ilan, J. & Thelwall, M. (2014). Do blog citations correlate with a higher number of future citations? Research blogs as a potential source for alternative metrics. Journal of the Association for Information Science and Technology, 65(5), 1018-1027.
  • Thelwall, M., Haustein, S., Larivière, V., & Sugimoto, C. R. (2013). Do altmetrics work? Twitter and ten other social Web services. PLOS One, 8(5), 7.
  • Thelwall, M., Tsou, A., Weingart, S., Holmberg, K. & Haustein, S. (2013). Tweeting links to academic articles. Cybermetrics, 17(1). Retrieved from http://cybermetrics.cindoc.csic.es/articles/v17i1p1.pdf (Archived by WebCite® at http://www.webcitation.org/6hbZ0tcoi).
  • van Eck, N.J. & Waltman, L. (2014). Visualizing bibliometric networks. In Y. Ding, R. Rousseau & D. Wolfram (Eds.), Measuring scholarly impact: methods and practice (pp. 285-320). Berlin: Springer.
  • White, M. D., & Marsh, E. E. (2006). Content analysis: a flexible methodology. Library Trends, 55(1), 22-45.
How to cite this paper

Nelhans, G. & Lorentzen, D.G. (2016). Twitter conversation patterns related to research papers. Information Research, 21(2), paper SM2. Retrieved from http://InformationR.net/ir/21-2/SM2.html (Archived by WebCite® at http://www.webcitation.org/6hn1QAh41).



Appendices

Appendix I: The thirty most mentioned and retweeted items


Table 14: The 30 most often mentioned items. ME=mentions, RT=retweets. * Items among the 30 most often retweeted.
| Title | Discipline | Source | Year | ME | RT | ME rank | RT rank |
|---|---|---|---|---|---|---|---|
| Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm* | Biology | PLOS Biology | 2015 | 114 | 78 | 1 | 6 |
| How to Get All Trials Reported: Audit, Better Data, and Individual Accountability* | Medicine | PLOS Medicine | 2015 | 38 | 42 | 2 | 10 |
| Ten Simple Rules to Win a Nobel Prize | Biology | PLOS Computational Biology | 2015 | 33 | 3 | 3 | 421 |
| Data science: Industry allure | Data science | Nature | 2015 | 25 | 12 | 4 | 67 |
| Real-Time Visualization of Joint Cavitation | Medicine | PLOS One | 2015 | 17 | 1 | 5 | 910 |
| Association between an Internet-Based Measure of Area Racism and Black Mortality | Medicine | PLOS One | 2015 | 16 | 3 | 6 | 421 |
| Plastic Accumulation in the Mediterranean Sea* | Climatology | PLOS One | 2015 | 15 | 72 | 7 | 7 |
| The Rise of Partisanship and Super-Cooperators in the U.S. House of Representatives | Political Science | PLOS One | 2015 | 15 | 6 | 7 | 188 |
| Heritability of Attractiveness to Mosquitoes* | Biology | PLOS One | 2015 | 14 | 38 | 9 | 12 |
| Ten Simple Rules for Effective Online Outreach | Research and education | PLOS Computational Biology | 2015 | 14 | 7 | 9 | 158 |
| Rationale for WHO's New Position Calling for Prompt Reporting and Public Disclosure of Interventional Clinical Trial Results* | Medicine | PLOS Medicine | 2015 | 12 | 102 | 11 | 3 |
| Disrupting the subscription journals' business model for the necessary large-scale transformation to open access* | Library and information science | Public repository | 2015 | 12 | 90 | 11 | 5 |
| The future of the postdoc | Research and education | Nature | 2015 | 12 | 3 | 11 | 421 |
| Testing Theories of American Politics: Elites, Interest Groups, and Average Citizens | Political Science | Perspectives on Politics | 2014 | 12 | 0 | 11 | N/A |
| Open Labware: 3-D Printing Your Own Lab Equipment | Biology | PLOS Biology | 2015 | 11 | 15 | 15 | 44 |
| A Sunken Ship of the Desert at the River Danube in Tulln, Austria | Biology | PLOS One | 2015 | 11 | 15 | 15 | 44 |
| Colour As a Signal for Entraining the Mammalian Circadian Clock | Biology | PLOS Biology | 2015 | 10 | 4 | 17 | 295 |
| Evidence for Sexual Dimorphism in the Plated Dinosaur Stegosaurus mjosi (Ornithischia, Stegosauria) from the Morrison Formation (Upper Jurassic) of Western USA* | Paleontology | PLOS One | 2015 | 9 | 135 | 18 | 2 |
| Ten Simple (Empirical) Rules for Writing Science | Research and education | PLOS Computational Biology | 2015 | 9 | 8 | 18 | 133 |
| Faster Increases in Human Life Expectancy Could Lead to Slower Population Aging | Medicine | PLOS One | 2015 | 9 | 6 | 18 | 188 |
| Lewontin's Paradox Resolved? In Larger Populations, Stronger Selection Erases More Diversity | Genetics | PLOS Biology | 2015 | 9 | 5 | 18 | 239 |
| Contributions of Incidence and Persistence to the Prevalence of Childhood Obesity during the Emerging Epidemic in Denmark | Medicine | PLOS One | 2012 | 9 | 0 | 18 | N/A |
| Statistics: P values are just the tip of the iceberg | Statistics | Nature | 2015 | 9 | 0 | 18 | N/A |
| Glaciology: Climatology on thin ice* | Climatology | Nature | 2015 | 8 | 67 | 24 | 8 |
| Are Cranial Biomechanical Simulation Data Linked to Known Diets in Extant Taxa? A Method for Applying Diet-Biomechanics Linkage Models to Infer Feeding Capability of Extinct Species | Ecology | PLOS One | 2015 | 8 | 20 | 24 | 31 |
| Characterizing Social Media Metrics of Scholarly Papers: The Effect of Document Properties and Collaboration Patterns | Library and information science | PLOS One | 2015 | 8 | 20 | 24 | 31 |
| Glowing Seashells: Diversity of Fossilized Coloration Patterns on Coral Reef-Associated Cone Snail (Gastropoda: Conidae) Shells from the Neogene of the Dominican Republic | Paleontology | PLOS One | 2015 | 8 | 15 | 24 | 44 |
| The Quantitative Methods Boot Camp: Teaching Quantitative Thinking and Computing Skills to Graduate Students in the Life Sciences | Research and education | PLOS Computational Biology | 2015 | 8 | 8 | 24 | 133 |
| Early Modern Humans and Morphological Variation in Southeast Asia: Fossil Evidence from Tam Pa Ling, Laos | Paleontology | PLOS One | 2015 | 7 | 11 | 29 | 82 |
| In vivo genome editing using Staphylococcus aureus Cas9 | Genetics | Nature | 2015 | 7 | 1 | 29 | 910 |

Table 15: The 30 most often retweeted items. ME=mentions, RT=retweets. * Items among the 30 most often mentioned.
| Title | Discipline | Source | Year | ME | RT | ME rank | RT rank |
|---|---|---|---|---|---|---|---|
| Crickets Are Not a Free Lunch: Protein Capture from Scalable Organic Side-Streams via High-Density Populations of Acheta domesticus | Biology | PLOS One | 2015 | 5 | 1,014 | 52 | 1 |
| Evidence for Sexual Dimorphism in the Plated Dinosaur Stegosaurus mjosi (Ornithischia, Stegosauria) from the Morrison Formation (Upper Jurassic) of Western USA* | Paleontology | PLOS One | 2015 | 9 | 135 | 18 | 2 |
| Rationale for WHO's New Position Calling for Prompt Reporting and Public Disclosure of Interventional Clinical Trial Results* | Medicine | PLOS Medicine | 2015 | 12 | 102 | 11 | 3 |
| Klout Score-ranking of the top 15 science stars of Twitter | N/A | Figshare | 2015 | 0 | 96 | N/A | 4 |
| Disrupting the subscription journals' business model for the necessary large-scale transformation to open access* | Library and information science | Public repository | 2015 | 12 | 90 | 11 | 5 |
| Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm* | Biology | PLOS Biology | 2015 | 114 | 78 | 1 | 6 |
| Plastic Accumulation in the Mediterranean Sea* | Climatology | PLOS One | 2015 | 15 | 72 | 7 | 7 |
| Glaciology: Climatology on thin ice* | Climatology | Nature | 2015 | 8 | 67 | 24 | 8 |
| Reliability and sensitivity of a simple isometric posterior lower limb muscle test in professional football players | Medicine | Journal of Sports Sciences | 2015 | 2 | 52 | 252 | 9 |
| How to Get All Trials Reported: Audit, Better Data, and Individual Accountability* | Medicine | PLOS Medicine | 2015 | 38 | 42 | 2 | 10 |
| GOBLET: The Global Organisation for Bioinformatics Learning, Education and Training | Bioinformatics | PLOS Computational Biology | 2015 | 0 | 42 | N/A | 10 |
| Heritability of Attractiveness to Mosquitoes* | Biology | PLOS One | 2015 | 14 | 38 | 9 | 12 |
| Emerging ethical threats to client privacy in cloud communication and data storage | Psychology | Professional Psychology-Research and Practice | 2015 | 1 | 38 | 677 | 12 |
| What Are Priorities for Deprescribing for Elderly Patients? Capturing the Voice of Practitioners: A Modified Delphi Process | Medicine | PLOS One | 2015 | 2 | 37 | 252 | 14 |
| Quantification of mutant huntingtin protein in cerebrospinal fluid from Huntington's disease patients | Medicine | Journal of Clinical Investigation | 2015 | 3 | 31 | 132 | 15 |
| General Relationship of Global Topology, Local Dynamics, and Directionality in Large-Scale Brain Networks | Biology | PLOS Computational Biology | 2015 | 3 | 29 | 132 | 16 |
| Extinction risk and conservation of the world's sharks and rays | Ecology | eLife | 2015 | 2 | 27 | 252 | 17 |
| Systematic imaging reveals features and changing localization of mRNAs in Drosophila development | Biology | eLife | 2015 | 2 | 26 | 252 | 18 |
| First Record of Invasive Lionfish (Pterois volitans) for the Brazilian Coast | Ecology | PLOS One | 2015 | 5 | 25 | 52 | 19 |
| A new semionotiform actinopterygian fish from the Mesozoic of Spain and its phylogenetic implications | Paleontology | Journal of Systematic Palaeontology | 2015 | 1 | 25 | 677 | 19 |
| Austrian Science Fund (FWF) Publication Cost Data 2014 | N/A | Figshare | 2015 | 5 | 24 | 52 | 21 |
| The Worst of Both Worlds: How U.S. and U.K. Models are Influencing Australian Education | Education | EPAA/ APEE | 2015 | 5 | 23 | 52 | 22 |
| Speeding Up Ecological and Evolutionary Computations in R; Essentials of High Performance Computing for Biologists | Biology | PLOS Computational Biology | 2014 | 5 | 23 | 52 | 22 |
| Development and Evolution of Dentition Pattern and Tooth Order in the Skates And Rays (Batoidea; Chondrichthyes) | Biology | PLOS One | 2015 | 5 | 22 | 52 | 24 |
| Qualities of knowledge brokers: reflections from practice | Policy | Evidence & Policy | 2015 | 4 | 22 | 87 | 24 |
| Exploring population size changes using SNP frequency spectra | Genetics | Nature Genetics | 2015 | 3 | 22 | 132 | 24 |
| Worldwide access to treatment for end-stage kidney disease: a systematic review | Medicine | The Lancet | 2015 | 3 | 22 | 132 | 24 |
| Isotopic and structural constraints on the location of the Main Central thrust in the Annapurna Range, central Nepal Himalaya | Geology | Geological Society of America Bulletin | 2013 | 1 | 22 | 677 | 24 |
| Interactions among multiple invasive animals | Ecology | Ecology | 2015 | 1 | 21 | 677 | 29 |
| Positional demands of professional rugby | Medicine | European Journal of Sport Science | 2015 | 1 | 21 | 677 | 29 |