Cancer information seeking in social question and answer services: identifying health-related topics in cancer questions on Yahoo! Answers
Cancer is the second leading cause of death in the U.S. Approximately 14.5 million Americans have a history of cancer, and about 1.7 million new cases were expected to be diagnosed in 2015 (American Cancer Society, 2015). As a serious and complex disease, cancer leads to profound psychological, social, and financial changes in life. To cope with this illness, nearly half of all Americans (48.7%) have looked for cancer-related information. Percentages were higher for those who have been affected by cancer (63.1% of cancer survivors and 54.6% of those with family histories) (Hesse, Arora, Burke and Rutten, 2008). Cancer patients look for information on a wide range of topics, including treatment, diagnosis and prognosis, rehabilitation, interpersonal relationships, and financial and legal information (Adams, Boulton and Watson, 2009; Yetisgen-Yildiz and Pratt, 2006), from many different sources, such as health professionals, the Internet, mass media, family and friends, and health organizations (Hesse et al., 2008; Yetisgen-Yildiz and Pratt, 2006).
In recent years, online cancer support groups have become an important information source for cancer patients. These cancer-specific communities exist in various forms, such as mailing lists, bulletin boards, web-based discussion forums and social networking sites (Eysenbach, Powell, Englesakis, Rizo and Stern, 2004). In these communities, patients and caregivers seek and provide information, express personal opinions, share personal experiences, and offer encouragement (Blank, Schmidt, Vangsness, Monteiro and Santagata, 2010; Ginossar, 2008). The practical and experiential information from peers, such as how to deal with day-to-day challenges and how to face the consequences of being ill, is highly valued, as it makes people feel better informed, better prepared to manage their illness, and less lonely (Nambisan, 2011; Rozmovits and Ziebland, 2004; van Uden-Kraan et al., 2008). Among the various forms of communities, social question and answer services, comparatively new social media platforms for health information, have received little attention. In particular, little is known about them as a source of cancer information.
The purpose of the current study was to investigate the cancer-related topics that users of these services are interested in and discuss. Social question and answer sites are community-based online services through which users ask and answer questions of one another about a wide range of topics in everyday life, including various health-related topics. We chose to study this venue due to the great variety and the high volume of cancer-related topics on it. The research questions we investigated were:
- What are the health topics that users discuss and share in questions about cancer in social question and answer sites?
- How frequently have health topics been discussed in cancer questions on such sites?
- What are the information contexts of the health topics that we could observe from cancer questions on such sites?
Previous studies have investigated health information seeking on social question and answer sites. For example, Kim, Pinkerton, and Ganesh (2011) analysed questions relating to Influenza A Virus (H1N1) and answers on Yahoo! Answers and identified important topic categories pertaining to H1N1 information such as flu-specific terms, medical and non-medical concerns, and sources of information. Studies also found that major types of information needs that consumers presented in questions on social question and answer sites included facts, explanations, advice, personal stories and emotional support (Westbrook, 2015; Zhang, 2013). Furthermore, earlier studies revealed that questions on these sites also include information that helps contextualize requests. For example, when asking about eating disorders, users provided personal narratives centred on past experiences and effects (Oh, He, Jeng, Mattern and Bowler, 2013). Zhang (2010) further noted that consumers also provide contextual information, such as demographic, social and environmental information, in order to elicit relevant answers. These previous studies have generated valuable insights into ways in which individual users formulate and express health-related questions, but fell short in revealing possible patterns at the population level because of the small data sets that were used (Oh et al., 2013; Westbrook, 2015; Zhang, 2013).
In the current study, however, we observed user expressions pertaining to cancer from a holistic point of view by collecting and analysing a large number of cancer-related questions posted on a social question and answer site (about 80,000 questions). Our focus was to examine how users convey cancer-related questions, to identify relevant topics that they discuss, and to understand users' cancer-related questioning behaviour. The terms people use in their questions could represent the health topics of concern to them. Therefore, we first extracted and analysed the terms from cancer questions to identify the topics. We then counted the frequencies of questions associated with the terms to observe the common issues and concerns across the health topics. We further analysed the health topics by classifying them into several categories of information contexts in order to understand users' cognitive, affective, social or other status when seeking and sharing cancer information on these sites.
Health information consumers' questioning behaviour
Acquiring information, or information seeking, in online health communities is often achieved by asking peers questions (Rubenstein, 2015; Wildemuth, de Bliek, Friedman and Miya, 1994). White (2000) reported that about 23% of the messages posted on a colon cancer list contained questions. Ginossar (2008) reported that 17% of the messages on a lung cancer forum and 19% of the messages on a chronic lymphocytic leukaemia forum were questions.
Consumers' questioning behaviour in online health communities has been conceptualised and analysed from several different perspectives. First, it is viewed as a linguistic behaviour and researchers focus on identifying the linguistic characteristics of questions. For example, Smith, Stavri, and Chapman (2002) found that the majority of consumer vocabularies describing the clinical findings and features in questions that consumers submitted to a cancer information service matched with controlled health care terminologies. Slaughter, Soergel, and Rindflesch (2006) characterised the semantic relationships in consumer questions on a medical Website based on the relationship classes and structure of the Unified Medical Language System semantic network. Other linguistic characteristics, such as the length of questioning messages (Zhang, 2010) and the number of sentences contained in questions (Slaughter et al., 2006) were also examined.
Secondly, consumers' questioning behaviour is analysed in relation to the function of expected answers to a question. For example, White (2000) analysed questions on a colon cancer listserv using Graesser's taxonomy of questions (Graesser, McMahen and Johnson, 1994), and found that there were eighteen types of questions, out of the twenty outlined in the taxonomy. Among them, verification questions accounted for nearly half of the questions, followed by directives, concept completion questions, and judgmental questions.
Third, questions are conceptualized as manifestations of consumer health information needs, enacted to fill gaps in an individual's state of knowledge (Belkin, Oddy and Brooks, 1982). Subsequently, consumers' questioning behaviour is viewed as a manifestation of information needs and a form of information seeking. For example, Weinberg and Schmale (1996) found that questions on a breast cancer community website mainly asked about how peers were doing and if they had similar experiences. Klemm, Reppert, and Visich (1998) found that colorectal cancer patients sought information about treatments, alternative therapies, medications, and new drug delivery systems in an online community. White (2000) found that half of the questions in a colon cancer electronic list were medical-related; the subjects of particular concern were medications, diagnosis, treatments, epidemiology, prognosis, and diet.
Contexts for health information searching
Some studies went further to view questions as manifestations of consumers' current states of knowledge (Belkin et al., 1982) that can help contextualize their specific requests. White (2000) pointed out that the majority of cancer questions were accompanied by some type of contexts and rarely asked for information in complete isolation. Consumers established contexts by referring to previous messages, by describing reasons or purposes, or by telling a story. Gooden and Winefield (2007) observed that some people were unaware of exactly what information they needed, so that they provided incidental information to prompt others to discover potentially helpful questions. Following this line of thinking, Zhang (2010) recently examined health-related questions in social question and answer services and found that, in order to elicit relevant information from peers, consumers provided a variety of contextual information. Building on this analysis as well as the existing information behaviour models (Saracevic, 1997; Wilson, 1999), she further developed a layered model to describe how consumers contextualized their inquiries to acquire personally relevant information (Zhang, 2013).
In this study, we adopted Zhang's (2013) layered model of context for health information searching in order to help frame our exploration of how users convey their cancer information needs in questions. We chose Zhang's model for two reasons. First, it provides a comprehensive and systematic view of how users contextualize their health-related requests. Most widely used health information seeking models primarily focus on the health information searching process (Freimuth, Stein and Kean, 1989; Lenz, 1984) or antecedents to information seeking behaviour (e.g., use of a certain channel for health information (Johnson, 1997)), with little attention to users' articulation of health information needs. Secondly, it was developed by examining health questions on social question and answer sites, in particular, and by analysing user expressions in depth. In this model, five layers of contextual factors were identified that consumers used to contextualise their requests when asking health questions on such sites, specifically,
- the demographic layer refers to the demographic information of patients, including age, sex, ethnicity, geographic location, an individual's height and weight.
- the cognitive layer refers to consumers' cognitive structures that motivate and enable them to search for information and make relevance judgments. It includes consumers' perceived topics of interest, types of information needed, and their cognitive abilities to articulate their needs.
- the affective layer refers to consumers' intentions, such as beliefs, motivations, and feelings. It includes four major emotional motives: reducing uncertainty, buffering stressful feelings, clearing suspicions, and avoiding embarrassment.
- the situational layer refers to consumers' understanding of their health conditions that give rise to their questions, including the current health problems and the current stage in the illness trajectory.
- the social and environmental layer refers to the social environment in which an individual's health problem is situated, such as one's social networks, perceived social norms, and information channels.
We used the model as a baseline for classifying the health and other topics identified from our data set and then to further develop and revise the model.
A total of 81,434 questions posted from 2009 to 2014 were randomly collected from the cancer category in Yahoo! Answers, using its application programming interface. Yahoo! Answers is one of the most frequently used social question and answer services with approximately 4.2 million visitors a month as of June 2016 (Quantcast, 2016). In our dataset, cancer questions vary in length, ranging from 2 to 1,691 words (mean = 84.1). Long questions often contain detailed descriptions of personal health situations (see Figure 1).
Text mining is a machine-supported process of discovering new patterns of knowledge and information from unstructured textual data by utilizing various techniques and algorithms for information retrieval, information extraction, and natural language processing (Feldman & Dagan, 1995; Hotho, Nürnberger and Paaß, 2005). Text mining software, IBM® SPSS® Modeler Premium (SPSS Modeler), was used to extract terms from the 81,434 cancer questions.
Text mining often consists of three steps: (1) text pre-processing, (2) text representation, and (3) knowledge discovery (Hu and Liu, 2012). We carried out text mining following these steps. First, the SPSS Modeler was used to pre-process the questions by removing stop words, combining different forms of a word (stemming), and cleaning up synonyms, acronyms, and typos in cancer questions. The American-English Dictionary and Medical Subject Headings were used to identify both generic and medical terms in questions.
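The pre-processing step described above can be illustrated with a minimal Python sketch. It is not the SPSS Modeler procedure used in the study: the stop-word list, the synonym map, and the `naive_stem` helper are hypothetical stand-ins chosen only to show the order of operations (tokenize, drop stop words, merge word forms, normalize synonyms and acronyms).

```python
import re

# Illustrative resources only; the study relied on SPSS Modeler dictionaries,
# not on these hand-made lists.
STOP_WORDS = {"i", "a", "the", "is", "my", "have", "and", "of", "in", "it", "on"}
SYNONYMS = {"chemo": "chemotherapy", "dr": "doctor", "tumor": "tumour"}


def naive_stem(token: str) -> str:
    """Strip a trailing plural 's' as a crude stand-in for real stemming."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(question: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, then stem and map synonyms."""
    tokens = re.findall(r"[a-z]+", question.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    stemmed = [naive_stem(t) for t in kept]
    return [SYNONYMS.get(t, t) for t in stemmed]


print(preprocess("My Dr found lumps and I have chemo on Monday"))
```

A production pipeline would replace the toy stemmer and word lists with a proper stemmer and a curated medical vocabulary; the sketch only makes the sequence of steps concrete.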
Second, the SPSS Modeler extracted terms from the pre-processed text for text representation. If a term appeared multiple times in a question, it was counted only once in our analysis; thus, the number of unique questions per term was counted. SPSS Modeler enables extraction of up to 5,000 concepts from a data set by default. We used z-scores to find an appropriate cut-off point to represent the common health topics discussed in questions. A z-score cut-off of 0 resulted in a total of 751 terms, each of which appeared in more than 189 questions.
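The counting rule and the z-score cut-off can be sketched as follows. This is an illustration under stated assumptions rather than the SPSS Modeler computation: the function names are hypothetical, and the population standard deviation is used because the source does not say which variant was applied. With a cut-off of 0 the choice makes no difference, since a z-score above 0 simply means the term's question count exceeds the mean count.

```python
from collections import Counter
from statistics import mean, pstdev


def document_frequency(questions: list[list[str]]) -> Counter:
    """Count, for each term, the number of questions that contain it."""
    counts: Counter = Counter()
    for tokens in questions:
        counts.update(set(tokens))  # set(): repeats within a question count once
    return counts


def terms_above_cutoff(counts: Counter, z_cutoff: float = 0.0) -> dict[str, int]:
    """Keep terms whose question count has a z-score above z_cutoff."""
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {}  # all terms equally frequent; no term stands out
    return {t: n for t, n in counts.items() if (n - mu) / sigma > z_cutoff}


# Toy data: three tokenized questions.
questions = [
    ["pain", "pain", "lump"],
    ["pain", "breast"],
    ["pain", "lump", "doctor"],
]
counts = document_frequency(questions)
print(terms_above_cutoff(counts))
```

In the toy data the mean question count is 1.75, so only pain (3 questions) and lump (2 questions) are retained.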
Third, the 751 terms and a set of associated questions were reviewed to identify the contextual layers related to cancer information seeking in the questions. The five layers in Zhang's (2013) model served as the top-level categories. In the meantime, the definition of each layer as well as subcategories under some layers evolved. Also at this stage, the terms that were too general to identify contexts (i.e., people, person, health, place, life) were removed from further analysis. As a result, 420 specific terms were identified; they appeared in 79,125 questions that represent askers' cancer-related concerns.
All three authors participated in classifying the terms into categories and subcategories at the third stage of analysis, and each of them independently coded a random sample of 118 terms (about 16% of the total number of terms used for the manual review). The intercoder reliability among the three coders, measured using Cohen's (1960) κ, reached 0.81. According to Landis and Koch's (1977) scale, a κ value between 0.81 and 1.00 indicates almost perfect agreement (fair: 0.21–0.40; moderate: 0.41–0.60; substantial: 0.61–0.80).
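For readers unfamiliar with the statistic, the sketch below computes Cohen's κ as (observed agreement − chance agreement) / (1 − chance agreement). Cohen's κ is defined for two coders, and the source does not state how the three coders' results were combined; averaging the three pairwise values, as `mean_pairwise_kappa` does, is one common approach and is an assumption here, as are the function names and the toy labels.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two coders who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("coders must label the same non-empty set of items")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of each coder's marginal label proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(coders: list[list[str]]) -> float:
    """Average kappa over all coder pairs (an assumed way to combine three coders)."""
    pairs = list(combinations(coders, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy labels for four terms from two hypothetical coders.
coder_1 = ["cognitive", "cognitive", "social", "affective"]
coder_2 = ["cognitive", "cognitive", "social", "social"]
print(round(cohen_kappa(coder_1, coder_2), 2))
```

Here the coders agree on three of four terms (0.75) against a chance agreement of 0.375, which gives κ = 0.6.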
Our analyses resulted in six layers of context to account for consumers' questioning behaviour. The six layers are:
- the demographic layer includes demographic information of consumers such as sex and age.
- the cognitive layer refers to consumers' representations of their current medical conditions and the information needs that result from their conditions.
- the affective layer refers to expressions or statements related to consumers' emotional status in relation to their cancer situations.
- the situational layer refers to consumers' habitual ways of living, or lifestyles, and the environments in which they are situated.
- the social layer consists of social relationships that askers identified in their questions.
- the technical layer indicates the information resources and social supports that consumers access to seek health information.
Table 1 shows an overview of the six layers and subcategories of each layer. The percentages of the terms and questions were calculated based on the total number of terms (n= 420) and the questions (n= 79,125), respectively.
|Contextual layers||Term categories||Terms (n)||Terms (%)||Questions (n)||Questions (%)|
|Cognitive layer||Diseases and conditions||33||7.9%||56,491||71.4%|
| ||Body parts and body systems||85||20.2%||43,324||54.8%|
|Affective layer||Negative feelings||17||4.0%||9,890||12.5%|
|Situational layer||Habitual or addictive behaviour||5||1.4%||8,146||10.3%|
| ||Sexual behaviour, pregnancy, and birth||6||1.4%||2,522||3.2%|
|Social layer||Relationships with family members||18||4.3%||23,235||29.4%|
| ||Relationships with health care providers/services||14||3.3%||21,319||26.9%|
| ||Relationships with acquaintances||2||0.5%||5,172||6.5%|
|Technical layer||Social supports||17||4.0%||31,791||40.2%|
Information related to the cognitive layer appeared in 95.4% of the questions, followed by information related to the social, technical, and situational layers. Demographic and affective information appeared in relatively fewer questions. As for the number of terms, the cognitive layer contained the greatest number (267 terms), with one-third being about symptoms or body locations/systems. The situational layer contained the second greatest number of terms (49 terms), followed by the social layer (37 terms) and the technical layer (34 terms).
Figure 2 compares the distributions of terms and questions across contextual layers. The cognitive layer contains the greatest number of both terms and questions. The social and technical layers each account for less than 10% of the terms (social: 8.8%; technical: 8.1%), but these terms appeared in almost half of the questions in the data set (social: 48.1%; technical: 45.2%).
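The layer-level figures are consistent with each question being counted once per layer, however many of that layer's terms it contains. For example, the social-layer subcategory counts in Table 1 sum to more than the layer total of 38,082 questions, and the layer percentages sum to more than 100% because a question can belong to several layers. A minimal sketch of this aggregation, with hypothetical names and toy data, is given below; it is an interpretation of the reported numbers, not the authors' documented procedure.

```python
def layer_coverage(
    term_questions: dict[str, set[int]],
    term_layer: dict[str, str],
    total_questions: int,
) -> dict[str, tuple[int, int, float]]:
    """Return {layer: (number of terms, number of questions, share of questions)}."""
    questions_by_layer: dict[str, set[int]] = {}
    terms_by_layer: dict[str, int] = {}
    for term, question_ids in term_questions.items():
        layer = term_layer[term]
        # Union of question ids: a question counts once per layer.
        questions_by_layer.setdefault(layer, set()).update(question_ids)
        terms_by_layer[layer] = terms_by_layer.get(layer, 0) + 1
    return {
        layer: (terms_by_layer[layer], len(ids), len(ids) / total_questions)
        for layer, ids in questions_by_layer.items()
    }


# Toy data: question ids in which each term occurs, and each term's layer.
term_questions = {"pain": {1, 2, 3}, "lump": {1, 4}, "doctor": {2, 4}, "mother": {2}}
term_layer = {"pain": "cognitive", "lump": "cognitive", "doctor": "social", "mother": "social"}
coverage = layer_coverage(term_questions, term_layer, total_questions=5)
print(coverage)
```

In the toy data the cognitive layer covers four of five questions and the social layer two of five, so the shares sum to 1.2, mirroring the overlap seen in the reported percentages.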
The details of the terms and the layers are explained in the rest of this section. The numbers of terms assigned to each layer ranged widely, from 13 to 267. Thus, the top five most frequently occurring terms in each term category or sub-category are presented in the paper. The rest of the terms in each layer/table are available from the project website (http://socialqa.cci.fsu.edu/cancer/).
Thirteen terms, primarily concerning age groups and sex, appeared in a total of 13,744 (17.3%) questions (see Table 2). The percentage in Table 2 was calculated based on the total number of questions in the demographic layer (n = 13,744).
Askers may or may not specify the demographic groups of interest in questions. Our analysis was based on askers' descriptions in questions. Askers specified their interest in information pertaining to sex using terms including male, female, man, woman, guy, or lady. Their interest in information pertaining to age groups was specified using terms including child, teenager, young adult or young people, and adult. Some askers also provided information about ethnicity, such as Asian (in 99 questions) and Hispanic (in 28 questions). These terms, however, were not included in the analysis due to their low frequencies of occurrence.
A total of 267 terms were identified from 75,488 questions (95.4%). Due to a high volume of terms and questions in this category, the terms pertaining to specific diseases and conditions, body parts and systems, symptoms, treatments, and tests were further classified into sub-categories.
Specific diseases and conditions
Thirty-three terms indicating cancer or other conditions were identified from 56,491 questions. Among them, 26 were cancer diseases (mentioned in 55,171 questions (73.1%)) and seven were other conditions (mentioned in 4,465 questions (5.9%)). Table 3 shows the top five most frequently occurring cancer-related conditions, excluding the general terms, cancer and tumour. The percentage in Table 3 was calculated based on the total number of questions in the cognitive layer (n = 75,488).
|human papilloma virus||527||0.7%|
|sexually transmitted diseases||302||0.4%|
When mentioning specific cancers, askers were attempting to specify the type of cancer that they wanted to know about. Other conditions were often mentioned as an alternative diagnosis to cancer, an accompanying disease to cancer, or a condition that may lead to cancer.
Body location and systems
Askers specified body location or body systems to indicate the sites from which the cancer emanates or from where the symptoms are shown; 85 terms were identified from 43,324 questions (54.8%) and categorised based on the body location or system schema adapted from MedlinePlus (U.S. National Library of Medicine, 2015) (See Table 4). The percentages in Table 4 were calculated based on the total number of questions in the cognitive layer (n = 75,488).
|Heads and necks||neck||3,772||5.0%|
|Bones, joints, and muscles||arm||2,191||2.9%|
|Lungs and breathing||lung||2,252||3.0%|
|Immune system||lymph nodes||4,263||5.6%|
|bone tissue marrow||510||0.7%|
|Cells and genes||cell||1,390||1.8%|
|Skin, hair, and nails||skin||2,807||3.7%|
|Endocrine system||thyroid gland||1,238||1.6%|
|Brain and nerves||brain||2,215||2.9%|
|Brain and nerves||heart||1,123||1.5%|
A variety of body parts and systems were mentioned in questions. Terms related to reproductive systems were the most frequently observed, followed by many other body parts and systems shown in Table 4 (the categories are ordered by the number of terms assigned to each category). For example, askers often observed lumps or a bumpy texture in breasts, necks, and armpits. They also mentioned immune systems, endocrine systems, and cells and genes, for which they may not have directly observed physical changes or felt pain, but which related to their medical histories of having certain conditions in the past, e.g., a change in blood test results.
The greatest number of terms (107) was classified into the symptoms category, which shows a variety of symptoms that cause askers to be suspicious of having cancer. These terms appeared in 37,641 (47.6%) questions. The general term symptom appeared in 6,828 questions. The remaining 106 terms were classified into two categories depending on whether they are visually observable or can only be felt by the patient (See Table 5). The percentage in Table 5 was calculated based on the total number of questions in the cognitive layer (n = 75,488).
|Visual or physical changes||lump, mass||8,857||11.7%|
Askers discussed their conditions when noticing a change in their bodies, such as lumps, masses, bumps, moles, and bleeding (haemorrhage or spotting), or any changes in colour (e.g., red, dark) or size (e.g., swelling) of these existing problems. Major symptoms that can only be felt by the patients included pain, headache, pressure, fever, and burning. In addition, askers mentioned fatigue, nausea, changes in appetite, respiration problems, diarrhoea, back pain, dizziness, hearing problems, and anaemia.
Twenty-three treatment-related terms were identified from 22,369 (28.3%) questions. The term treatment was in 7,495 questions (9.9%). The remaining terms were grouped into three sub-categories, therapy, procedure, and medicine (See Table 6). The percentage in Table 6 was calculated based on the total number of questions in the cognitive layer (n = 75,488).
Chemotherapy appeared the most frequently, followed by radiation therapy. Many surgical procedures were discussed, with the term surgery appearing the most frequently. Antibiotics and shots were mentioned in order to ask about medicines people were taking or willing to take before or after their cancer treatments. Askers were also concerned about side effects and dosages of medications.
A total of 19 test-related terms were identified from 13,901 (17.6%) questions. The general terms test and examination appeared in 4,632 (6.1%) questions. The remaining terms were classified into diagnostic/monitoring tests and screening tests (see Table 7). The percentage in Table 7 was calculated based on the total number of questions in the cognitive layer (n = 75,488).
|Diagnostic/ monitoring tests||scanning||2,969||3.9%|
|cbc (complete blood count)||409||0.5%|
The most frequently occurring terms were common diagnostic or monitoring tests, including scanning, biopsy, haematological tests, ultrasound, and x-ray. Askers posted questions about what a test is, what it entails, or what would follow; or they simply provided tests as a piece of background information to help readers interpret their questions. Among screening tests, the most frequently mentioned was mammography. Askers primarily sought help with interpreting test results.
Twenty terms describing negative or positive feelings were identified from a total of 11,275 questions (see Table 8). The percentage in Table 8 was calculated based on the total number of questions in the affective layer (n = 11,275).
Most feelings were negative, with worry, freaking, and anxiety topping the list. One term, paranoid, was worth noting: it suggests that questioners acknowledged that their concerns might not be rational. Nevertheless, positive feelings also appeared, with love appearing the most frequently.
Forty-nine terms related to this layer were identified from 23,234 questions (29.4%). These terms were classified into five categories (see Table 9). The percentage in Table 9 was calculated based on the total number of questions in the situational layer (n = 23,234).
|Habitual or addictive behaviour||smoking||5,975||25.7%|
|Sexual behaviour, pregnancy and birth||pregnancy||949||4.1%|
Three health-related activities, smoking, eating, and drinking, appeared in the greatest number of questions in this layer. Terms, including money, health insurance, and jobs, topped the list related to individuals' social and economic situations, illustrating askers' financial concerns. In terms of sexual behaviour, most of the top five terms were related to pregnancy and child birth. In terms of environmental exposure, the term vaccine mostly referred to human papilloma virus (cervical cancer) vaccines. Many debatable issues pertaining to getting such vaccines were posted. The terms sun and tanning were often associated with skin cancer.
Thirty-seven terms designating users' social relationships were identified from 38,082 questions (48.1%). The social relationships primarily were family members and health care providers (see Table 10). The percentage in Table 10 was calculated based on the total number of questions in the social layer (n = 38,082).
|health care providers/services||doctor||16,819||44.2%|
In some cases, askers were concerned about family members' health conditions as care-givers. In others, family members were referred to in order to help explain the askers' own conditions. Askers mentioned health care providers, for example, to ask if they need to see doctors. In some cases, askers posted diagnoses from doctors to elicit second opinions.
Thirty-four terms related to this layer were identified from 35,754 questions (45.2%) (see Table 11). The percentage in Table 11 was calculated based on the total number of questions in the technical layer (n = 35,754).
Terms in the social supports category mostly indicated users' information goals or their expectations about answers from peers in the community. Terms that topped the list were mostly general requests such as help, need, question, and answer. The most frequently occurring terms related to information sources were research, web, internet, report, and news. Information sources were often provided as a piece of background information to set the stage for further questions.
An exploratory approach was taken in the current study to examine consumers' topics of interest and concerns about cancer by extracting and analysing terms that they used to express their needs in cancer questions. Findings demonstrated that, to seek personally-relevant cancer-related information on social question and answer sites, askers disclose multiple layers of personal information, including demographic, cognitive, affective, social, situational and technical information, to contextualise their requests. Our results confirmed earlier observations of consumers' health-related questions posted in online health communities (Weinberg and Schmale, 1996; Wilson, 1999; Zhang, 2010) in the sense that a variety of topics pertaining to health and other associated issues in life are discussed in questions.
Additionally, our analysis contributed to a comprehensive understanding of cancer information needs by revealing the most commonly appearing factors in each contextual layer. In the demographic layer, sex and age are the most frequently mentioned factors, indicating that askers believe that they could obtain more pertinent information by providing this information. This phenomenon is consistent with scientific research that both factors are important indicators of the risk of having certain types of cancers (Claus, Risch and Thompson, 1990; Harris, Zang, Anderson and Wynder, 1993).
The cognitive layer shows askers' representations of their current medical situations and their information needs. The diversity of the information needs of cancer patients and care givers confirmed results from prior studies (Rutten, Arora, Bakos, Aziz and Rowland, 2005; Rutten, Squiers and Treiman, 2006). Our study further revealed that askers often mentioned cancer together with cold, infection, human papilloma virus, sexually transmitted diseases, and diabetes, indicating that people may associate these conditions with a high risk of developing cancer or a higher probability of co-occurrence with a specific type of cancer. Many of these beliefs are consistent with scientific knowledge; it is well known that human papilloma virus and sexually transmitted diseases could be causes of cancers related to reproductive systems, such as cervical cancer and vulvar cancer (Centers for Disease Control and Prevention, 2015).
In the affective layer, feelings expressed in questions were mostly negative, represented by worry, freaking, and anxiety. We brought special attention to one term, paranoid. The term suggests the cyberchondria phenomenon, where people's health concerns are escalated irrationally over the course of information searching (White and Horvitz, 2009). We also found positive feelings, represented by love, trust, and fun. This may be explained by the fact that many of the questions were asked by care-givers and they tended to express their affection for their loved ones in questions.
The situational layer includes users' health behaviour and socio-economic situations. Health behaviour is one of the important topics in cancer information seeking (Shim, Kelly and Hornik, 2006), as major health behaviour and lifestyle changes can prevent certain types of cancer (Anand et al., 2008). Our study further revealed that the askers mentioned smoking, drinking, sex, drug use, vaccines, as well as sun exposure and tanning. Additionally, askers mentioned money and health insurance, indicating their struggle with or interest in financial-related issues, which has also been revealed in previous studies (Rutten et al., 2005).
In the social layer, terms represented family, friends, and acquaintances. They could be cancer patients, care-givers, or someone sharing an asker's family medical history. Their appearance vividly suggests that cancer is not a personal issue; rather, it affects an individual's social ties, particularly families, in many different ways. Cancer patients search for information about medical systems, primarily looking for whether or not a physician has sufficient experience or qualifications (Rutten et al., 2005). Our findings also indicate that askers post questions after consulting with their doctors, for various reasons; they may not have had enough time to fully discuss their concerns with doctors, or they may not trust their doctors (White and Horvitz, 2009).
The technical layer contains information sources and information-related social supports. Two terms web and Internet appeared frequently, indicating their status as major sources of cancer-related information. Earlier studies on questions in social question and answer environments indicated that askers requested a variety of information, including facts, explanations, advice, personal stories and emotional support (Westbrook, 2015; Zhang, 2013). The current study corroborated these findings by revealing that askers used terms, such as research, experience, idea, advice, and opinion to express their needs.
Our study has several implications. Theoretically, consumers' health information needs presented by natural language were systematically examined and framed within the contexts of health information behaviour. Findings, in one way, confirmed the layered model of contexts in health information searching by disclosing various topics that askers seek and share in questions. We adopted Zhang's model to guide our data analysis because it is specifically concerned with consumers' information needs in the context of health problems. Based on our empirical analysis, we revised the model, which specified the most common health topics in each layer in the case of cancer information seeking. It also highlighted the intense discussions about social and technical issues, in addition to the cognitive aspects of information seeking in health questions. Correspondingly, we reorganized the layers in the original model and assigned new definitions to some of them (see Figure 3).
Specifically, the demographic layer focuses on the two most frequently discussed topics in cancer-related questions, sex and age. Zhang's model specifies three cognitive contextual factors: perceived topics of interest, types of information, and consumers' cognitive abilities to articulate their needs. The current model blended the three factors and emphasized the medical aspect of users' representations of their conditions (i.e., diseases, symptoms, treatments, and tests). Non-medical information associated with health behaviour and lifestyles was moved to the situational layer. The social and environmental layer was broken down into two layers: the social layer mainly focuses on the relationships with other people, and the technical layer includes types and channels of information and social supports askers seek and share in questions. In the future, the revised model could be applied to analyse the context of health information needs discussed in other types of online communities or social media.
The major methodological implication of the study is that text mining, facilitated by appropriate theoretical lenses, could be an effective way to help understand information seeking behaviour at the population level. Several studies utilized the text mining method to examine users' information needs, but they mainly used cluster analysis to identify the most common topics appearing in online health communities (Chen, 2012; Kim et al., 2011). In our study, going beyond identifying the most common themes, we further analysed the extracted terms based on Zhang's (2013) layered model of context for health information searching. This model-based approach is fruitful. It allowed us to more systematically examine users' cancer-related questioning behaviour in the social question and answer context. This approach could be adopted to examine questions and answers regarding other topics in such contexts or other types of social media.
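As a rough illustration of the model-based step described above, the following Python sketch assigns terms found in a question to layers of the model and counts them. It is a minimal sketch under stated assumptions, not the pipeline used in this study: the lexicon `LAYER_TERMS` and the function `layer_counts` are hypothetical names, the lexicon holds only a few example terms quoted in the discussion (the social-layer entries are assumed examples, and the ambiguous term sex is left out), and matching is by exact lowercase token with no stemming.

```python
import re
from collections import Counter

# Hypothetical lexicon: a handful of example terms taken from the discussion,
# keyed by layer. The study's full list of 420 terms is not reproduced here.
LAYER_TERMS = {
    "demographic": {"age"},
    "cognitive": {"cold", "infection", "diabetes"},
    "affective": {"worry", "freaking", "anxiety", "paranoid", "love", "trust", "fun"},
    "situational": {"smoking", "drinking", "tanning", "money", "insurance"},
    "social": {"family", "friends"},  # assumed examples of social-layer terms
    "technical": {"web", "internet", "research", "experience", "idea", "advice", "opinion"},
}


def layer_counts(question: str) -> Counter:
    """Count how many tokens of a question fall into each layer."""
    # Lowercase the text and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", question.lower())
    counts = Counter()
    for token in tokens:
        for layer, terms in LAYER_TERMS.items():
            if token in terms:
                counts[layer] += 1
    return counts


if __name__ == "__main__":
    example = "I am paranoid after reading the web; does smoking cause cancer? Any advice?"
    for layer, count in layer_counts(example).items():
        print(layer, count)
```

Summing such counts over a whole collection of questions would give a population-level profile of which layers askers draw on, which is the kind of view the model-based approach is meant to provide.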
Several practical implications can be drawn from this study. First, the findings of this study could be useful for health care providers, especially physicians, to better understand their patients' concerns regarding cancer. They could learn about what kinds of symptoms cause their patients to believe that they may have cancer, what makes their patients hesitate to have tests or treatments, and what situational factors cause their patients to believe they may have cancer. Health care providers could develop materials for health promotion and education, for example, to deliver information that addresses consumers' concerns in questions. Second, the results highlighted the cyberchondria phenomenon related to health anxieties and the escalation of such anxieties in the social question and answer environment. Prior studies found that cancer patients' stresses and anxiety were mitigated when they received personalised messages (Mayer et al., 2007). Social question and answer sites could be an ideal environment for seeking or receiving personalised answers due to their human computation nature; nevertheless, there is still room for developers to think about how to provide users with more personalised answers. For example, systems can provide certain metadata to describe answerers' age ranges or sex to help contextualise the answers. Our findings also could inform the design of general health information search systems. Specifically, an information search system could enable users to specify their cancer information inquiries according to various demographic, cognitive, affective, social, situational, and technical parameters to receive more tailored search results.
This study has a few limitations. First, the current study took a descriptive approach to analyse data and it may not reveal the latent and potential relationships among the topics in health questions. The current study was, however, useful to facilitate rich data gathering and analysis and could be used to develop follow-up studies, examining the distributions and applications of the topics in health information seeking and sharing. Second, the text mining technique that we used could be improved. For example, we could not infer the meaning of numbers and thus we were not able to determine whether a particular number refers to weight, height, age, or cancer stage. Also, unlike queries, which indicate specific subjects searched, our text mining techniques only allow us to identify what is being expressed; they lack the ability to differentiate between what is being requested and what constitutes background information. Third, both the original and the augmented layered model of contexts were developed based on questions collected from Yahoo! Answers. Yahoo! Answers is one of the most frequently visited social question and answer sites, but may not represent users' questioning behaviour in all such sites.
The current study analysed a large number of cancer-related questions using the text mining method coupled with a manual review of a subset of questions. The analysis identified 420 terms distributed across six layers, including demographic, cognitive, affective, situational, social, and technical layers. These terms represent topics or issues that askers were concerned about the most and characterize people's question asking in social question and answer environments. The important findings include:
- Users provided different layers of information, including demographic, cognitive, affective, situational, social, and technical information to help contextualize their information needs expressed in social question and answer sites.
- The most commonly mentioned demographic attributes were sex and age.
- Askers often mentioned cancer together with conditions including cold, infection, human papilloma virus, sexually transmitted diseases and diabetes. They also often mentioned their health behaviours, most commonly smoking, drinking, and eating, indicating that askers have a comparatively holistic view of cancer as a disease.
- The most expressed emotions were negative, represented by the terms worry, freaking, and anxiety. At the same time, askers recognised that they may experience health anxieties and irrational escalation of anxieties while asking questions in the social question and answer environment.
In future studies, we will further develop our text mining techniques by adopting semantic approaches with which to analyse the messages embedded in health questions, and possibly the answers as well, to examine the exchange of information and social support between askers and answerers.
About the authors
Sanghee Oh is an Assistant Professor in the School of Information at Florida State University. She obtained her PhD in Information and Library Science from the University of North Carolina at Chapel Hill, and her Master of Library and Information Science from the University of California at Los Angeles. Her areas of research interest are health information behaviour, health informatics, social informatics, social media use, human-computer interaction, and digital libraries. She can be contacted at firstname.lastname@example.org.
Yan Zhang is an Associate Professor in the School of Information at the University of Texas at Austin. She received her PhD from the University of North Carolina at Chapel Hill. Her research focus is information behaviour with emphases on consumer health information search behaviour and consumer health information system design. She can be contacted at email@example.com.
Min Sook Park is a doctoral candidate in the School of Information at Florida State University. She also received her Master's in Library and Information Science from Florida State University. Her research interests lie at the intersection of information behaviour, data science, social informatics, and information organization. She can be contacted at firstname.lastname@example.org.