Results 1 - 15 of 15
1.
Stud Health Technol Inform ; 310: 659-663, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269891

ABSTRACT

Electronic Nicotine Delivery Systems (ENDS) use has increased substantially in the United States since 2010. To date, there is limited evidence regarding the nature and extent of ENDS documentation in the clinical note. In this work we investigate the effectiveness of different approaches to identify a patient's documented ENDS use. We report on the development and validation of a natural language processing system to identify patients with explicit documentation of ENDS using a large national cohort of patients at the United States Department of Veterans Affairs.


Subjects
Electronic Nicotine Delivery Systems , Vaping , United States , Humans , Natural Language Processing , Documentation , United States Department of Veterans Affairs
2.
JCO Clin Cancer Inform ; 7: e2200131, 2023 01.
Article in English | MEDLINE | ID: mdl-36753686

ABSTRACT

PURPOSE: Histopathologic features are critical for studying risk factors of colorectal polyps, but remain deeply embedded within unstructured pathology reports, requiring costly and time-consuming manual abstraction for research. In this study, we developed and evaluated a natural language processing (NLP) pipeline to automatically extract histopathologic features of colorectal polyps from pathology reports, with an emphasis on individual polyp size. These data were then linked with structured electronic health record (EHR) data, creating an analysis-ready epidemiologic data set. METHODS: We obtained 24,584 pathology reports from colonoscopies performed at the University of Utah's Gastroenterology Clinic. Two investigators annotated 350 reports to determine inter-rater agreement, develop an annotation scheme, and create a reference standard for performance evaluation. The pipeline was then developed, and performance was compared against the reference for extracting polyp location, histology, size, shape, dysplasia, and the number of polyps. Finally, the pipeline was applied to 24,225 unseen reports and NLP-extracted data were linked with structured EHR data. RESULTS: Across all features, our pipeline achieved a precision of 98.9%, a recall of 98.0%, and an F1-score of 98.4%. In patients with polyps, the pipeline correctly extracted 95.6% of sizes, 97.2% of polyp locations, 97.8% of histology, 98.3% of shapes, and 98.3% of dysplasia levels. When applied to unseen data, the pipeline classified 12,889 patients as having polyps, 4,907 patients without polyps, and extracted the features of 28,387 polyps. Tubular adenomas were the most common subtype (55.9%), 8.1% of polyps were advanced adenomas, and the mean polyp size was 0.57 (±0.4) cm. CONCLUSION: Our pipeline extracted histopathologic features of colorectal polyps from colonoscopy pathology reports, most notably individual polyp sizes, with considerable accuracy. 
This study demonstrates the utility of NLP for extracting polyp features and linking these data with EHR data to create an epidemiologic data set to study colorectal polyp risk factors and outcomes.
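
As a quick arithmetic check on the metrics reported above: F1 is the harmonic mean of precision and recall, so the reported F1-score can be reproduced from the reported precision (98.9%) and recall (98.0%). The following is a minimal sketch; the function name `f1_score` is illustrative and is not taken from the study's pipeline.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (98.9% and 98.0%).
reported_f1 = f1_score(0.989, 0.980)
print(f"{reported_f1:.1%}")  # prints 98.4%
```

The result agrees with the F1-score of 98.4% stated in the abstract.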


Subjects
Adenoma , Colonic Polyps , Colorectal Neoplasms , Humans , Colonic Polyps/diagnosis , Colonic Polyps/epidemiology , Colonic Polyps/pathology , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/epidemiology , Colorectal Neoplasms/pathology , Natural Language Processing , Adenoma/diagnosis , Adenoma/epidemiology , Adenoma/pathology , Epidemiologic Studies , Hyperplasia
3.
J Med Internet Res ; 25: e36667, 2023 02 27.
Article in English | MEDLINE | ID: mdl-36848191

ABSTRACT

BACKGROUND: The use and acceptance of medicinal cannabis is on the rise across the globe. To support the interests of public health, evidence relating to its use, effects, and safety is required to match this community demand. Web-based user-generated data are often used by researchers and public health organizations for the investigation of consumer perceptions, market forces, population behaviors, and for pharmacoepidemiology. OBJECTIVE: In this review, we aimed to summarize the findings of studies that have used user-generated text as a data source to study medicinal cannabis or the use of cannabis as medicine. Our objectives were to categorize the insights provided by social media research on cannabis as medicine and describe the role of social media for consumers using medicinal cannabis. METHODS: The inclusion criteria for this review were primary research studies and reviews that reported on the analysis of web-based user-generated content on cannabis as medicine. The MEDLINE, Scopus, Web of Science, and Embase databases were searched from January 1974 to April 2022. RESULTS: We examined 42 studies published in English and found that consumers value their ability to exchange experiences on the web and tend to rely on web-based information sources. Cannabis discussions have portrayed the substance as a safe and natural medicine to help with many health conditions including cancer, sleep disorders, chronic pain, opioid use disorders, headaches, asthma, bowel disease, anxiety, depression, and posttraumatic stress disorder. These discussions provide a rich resource for researchers to investigate medicinal cannabis-related consumer sentiment and experiences, including the opportunity to monitor cannabis effects and adverse events, given the anecdotal and often biased nature of the information is properly accounted for. 
CONCLUSIONS: The extensive web-based presence of the cannabis industry coupled with the conversational nature of social media discourse results in rich but potentially biased information that is often not well-supported by scientific evidence. This review summarizes what social media is saying about the medicinal use of cannabis and discusses the challenges faced by health governance agencies and professionals to make use of web-based resources to both learn from medicinal cannabis users and provide factual, timely, and reliable evidence-based health information to consumers.


Subjects
Cannabis , Medical Marijuana , Social Media , Humans , Medical Marijuana/therapeutic use , Public Opinion , Public Health
4.
Drug Alcohol Depend ; 228: 109016, 2021 11 01.
Article in English | MEDLINE | ID: mdl-34560332

ABSTRACT

INTRODUCTION: The relationship between cannabis, tobacco, and vaping devices is both rapidly changing and poorly understood, with consumers rapidly shifting between use of all three product types. Given this dynamic and evolving landscape, there is an urgent need to monitor and better understand co-use, dual-use, and transition patterns between these products. This study describes work that utilizes social media - in this case, Reddit - in conjunction with automated Natural Language Processing (NLP) methods to better understand cannabis, tobacco, and vaping device product usage patterns. METHODS: We collected Reddit data from the period 2013-2018, sourced from eight popular, high-volume Reddit communities (subreddits) related to the three product categories. We then manually annotated (coded) a set of 2640 Reddit posts and trained a machine learning-based NLP algorithm to automatically identify and disambiguate between cannabis or tobacco mentions (both smoking and vaping) in Reddit posts. This classifier was then applied to all data derived from the eight subreddits, 767,788 posts in total. RESULTS: The NLP algorithm achieved an overall moderate performance (overall F-score of 0.77). When applied to our large corpus of Reddit posts, we discovered that over 10% of posts in the smoking cessation subreddit r/stopsmoking were classified as referring to vaping nicotine, and that only 2% of posts from the subreddits r/electronic_cigarette and r/vaping were classified as referring to smoking (tobacco) cessation. CONCLUSIONS: This study presents the results of applying an NLP algorithm designed to identify and distinguish between cannabis and tobacco mentions (both smoking and vaping) in Reddit posts, hence contributing to our currently limited understanding of co-use, dual-use, and transition patterns between these products.


Subjects
Cannabis , Electronic Nicotine Delivery Systems , Social Media , Tobacco Products , Vaping , Humans , Natural Language Processing , Prevalence , Nicotiana
5.
AMIA Annu Symp Proc ; 2021: 343-351, 2021.
Article in English | MEDLINE | ID: mdl-35308940

ABSTRACT

Use of Electronic Nicotine Delivery Systems (ENDS, colloquially known as "electronic cigarettes") has increased substantially in the United States in the decade since 2010. However, relatively little is currently known about how ENDS use is documented in clinical notes. In this study, we describe the development of an annotation scheme (and associated annotated corpus) consisting of 4,351 ENDS mentions derived from Department of Veterans Affairs clinical notes during the period 2010-2020. Analysis of our corpus provides important insights into ENDS documentation practices at the VA, in addition to providing a resource for the future development and validation of Natural Language Processing algorithms capable of reliably identifying ENDS-use status.


Subjects
Electronic Nicotine Delivery Systems , Vaping , Veterans , Documentation , Humans , Natural Language Processing , United States
6.
Front Public Health ; 9: 738513, 2021.
Article in English | MEDLINE | ID: mdl-35071153

ABSTRACT

Background: Perceptions of tobacco, cannabis, and electronic nicotine delivery systems (ENDS) are continually evolving in the United States. Exploring these characteristics through user generated text sources may provide novel insights into product use behavior that are challenging to identify using survey-based methods. The objective of this study was to compare the topics frequently discussed among Reddit members in cannabis, tobacco, and ENDS-specific subreddits. Methods: We collected 643,070 posts on the social media site Reddit between January 2013 and December 2018. We developed and validated an annotation scheme, achieving a high level of agreement among annotators. We then manually coded a subset of 2,630 posts for their content with relation to experiences and use of the three products of interest, and further developed word cloud representations of the words contained in these posts. Finally, we applied Latent Dirichlet Allocation (LDA) topic modeling to the 643,070 posts to identify emerging themes related to cannabis, tobacco, and ENDS products being discussed on Reddit. Results: Our manual annotation process yielded 2,148 (81.6%) posts that contained a mention(s) of either cannabis, tobacco, or ENDS with 1,537 (71.5%) of these posts mentioning cannabis, 421 (19.5%) mentioning ENDS, and 264 (12.2%) mentioning tobacco. In cannabis-specific subreddits, personal experiences with cannabis, cannabis legislation, health effects of cannabis use, methods and forms of cannabis, and the cultivation of cannabis were commonly discussed topics. The discussion in tobacco-specific subreddits often focused on the discussion of brands and types of combustible tobacco, as well as smoking cessation experiences and advice. In ENDS-specific subreddits, topics often included ENDS accessories and parts, flavors and nicotine solutions, procurement of ENDS, and the use of ENDS for smoking cessation. 
Conclusion: Our findings highlight the posting and participation patterns of Reddit members in cannabis, tobacco, and ENDS-specific subreddits and provide novel insights into aspects of personal use regarding these products. These findings complement epidemiologic study designs and highlight the potential of using specific subreddits to explore personal experiences with cannabis, ENDS, and tobacco products.


Subjects
Cannabis , Tobacco Products , Vaping , Humans , Natural Language Processing , Nicotiana , United States
7.
JMIR Public Health Surveill ; 6(3): e19975, 2020 09 02.
Article in English | MEDLINE | ID: mdl-32876579

ABSTRACT

BACKGROUND: Increases in electronic nicotine delivery system (ENDS) use among high school students from 2017 to 2019 appear to be associated with the increasing popularity of the ENDS device JUUL. OBJECTIVE: We employed a content analysis approach in conjunction with natural language processing methods using Twitter data to understand salient themes regarding JUUL use on Twitter, sentiment towards JUUL, and underage JUUL use. METHODS: Between July 2018 and August 2019, 11,556 unique tweets containing a JUUL-related keyword were collected. We manually annotated 4000 tweets for JUUL-related themes of use and sentiment. We used 3 machine learning algorithms to classify positive and negative JUUL sentiments as well as underage JUUL mentions. RESULTS: Of the annotated tweets, 78.80% (3152/4000) contained a specific mention of JUUL. Only 1.43% (45/3152) of tweets mentioned using JUUL as a method of smoking cessation, and only 6.85% (216/3152) of tweets mentioned the potential health effects of JUUL use. Of the machine learning methods used, the random forest classifier was the best performing algorithm among all 3 classification tasks (ie, positive sentiment, negative sentiment, and underage JUUL mentions). CONCLUSIONS: Our findings suggest that a vast majority of Twitter users are not using JUUL to aid in smoking cessation nor do they mention the potential health benefits or detriments of JUUL use. Using machine learning algorithms to identify tweets containing underage JUUL mentions can support the timely surveillance of JUUL habits and opinions, further assisting youth-targeted public health intervention strategies.


Subjects
Adolescent Behavior/psychology , Electronic Nicotine Delivery Systems/standards , Social Media/instrumentation , Adolescent , Electronic Nicotine Delivery Systems/statistics & numerical data , Female , Humans , Machine Learning/statistics & numerical data , Male , Natural Language Processing , Social Media/statistics & numerical data
8.
Addiction ; 115(9): 1777-1785, 2020 09.
Article in English | MEDLINE | ID: mdl-32107817

ABSTRACT

BACKGROUND AND AIMS: Sustained psychosocial support via online social groups may help former tobacco users maintain abstinence. This study aims to examine the effectiveness of participating in a WhatsApp social group for long-term smoking cessation. DESIGN: Two-arm, open-labelled, pragmatic, individually randomized controlled trial. SETTING: All participants are service users of smoking cessation clinics, and all interventions are delivered via mobile phones. PARTICIPANTS: Participants included 1008 adult quitters who self-report no tobacco use in the past 3-30 days. INTERVENTIONS: The intervention group (n = 504) will join a WhatsApp social group to receive standardized and theory-based reminders of smoking relapse prevention and participate in discussion with other WhatsApp group members using their own mobile phones. All social groups will be led by counselors or specialist nurse practitioners. The control group (n = 504) will receive similar reminders via short messages to their own mobile phones but will not interact with other participants. The intervention duration for both groups is 8 weeks. Both groups will receive a booklet at baseline about how to prevent smoking relapse. MEASUREMENTS: The primary outcome is biochemically validated tobacco abstinence at 12 months after consent. COMMENTS: The findings will provide evidence concerning the utility of operating online social group discussion for prevention of smoking relapse and sustaining long-term abstinence.


Subjects
Mobile Applications , Secondary Prevention/methods , Smoking Cessation/methods , Adult , Cell Phone , Female , Health Behavior , Humans , Male , Pamphlets , Psychosocial Support Systems , Smoking Prevention/methods , Text Messaging
9.
Yearb Med Inform ; 28(1): 208-217, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31419834

ABSTRACT

OBJECTIVE: We present a narrative review of recent work on the utilisation of Natural Language Processing (NLP) for the analysis of social media (including online health communities) specifically for public health applications. METHODS: We conducted a literature review of NLP research that utilised social media or online consumer-generated text for public health applications, focussing on the years 2016 to 2018. Papers were identified in several ways, including PubMed searches and the inspection of recent conference proceedings from the Association of Computational Linguistics (ACL), the Conference on Human Factors in Computing Systems (CHI), and the International AAAI (Association for the Advancement of Artificial Intelligence) Conference on Web and Social Media (ICWSM). Popular data sources included Twitter, Reddit, various online health communities, and Facebook. RESULTS: In the recent past, communicable diseases (e.g., influenza, dengue) have been the focus of much social media-based NLP health research. However, mental health and substance use and abuse (including the use of tobacco, alcohol, marijuana, and opioids) have been the subject of an increasing volume of research in the 2016 - 2018 period. Associated with this trend, the use of lexicon-based methods remains popular given the availability of psychologically validated lexical resources suitable for mental health and substance abuse research. Finally, we found that in the period under review "modern" machine learning methods (i.e. deep neural-network-based methods), while increasing in popularity, remain less widely used than "classical" machine learning methods.


Subjects
Health Services Research/methods , Natural Language Processing , Patient Generated Health Data , Social Media , Bibliometrics , Humans , Public Health/ethics , Public Health Surveillance/methods
11.
Tob Use Insights ; 11: 1179173X18782879, 2018.
Article in English | MEDLINE | ID: mdl-30046257

ABSTRACT

BACKGROUND: In this article, we present qualitative work designed to explore physicians' attitudes toward and knowledge of electronic cigarettes (or Electronic Nicotine Delivery Systems-ENDS), particularly focusing on personal attitudes held by physicians regarding ENDS use, physician beliefs regarding the relative safety of ENDS, attitudes regarding the efficacy of ENDS as a smoking cessation tool, and how physicians document ENDS use in the electronic health record (EHR). METHODS: We completed a total of 17 semistructured qualitative interviews with physicians in 4 different outpatient clinic locations. Clinics were selected with the goal of reaching patient panels across a diversity of socioeconomic and local geographic locations. RESULTS: The findings from our qualitative analysis suggest that physicians feel uninformed about the long-term health risks of ENDS and believe that they lack the critical medical knowledge required for discussing ENDS with their patients who smoke. Although physician responses did not endorse the view that ENDS use is a safer alternative to combustible tobacco use, approximately one-third of our physician sample did not hold strong objections to ENDS usage. Physicians placed varying degrees of importance on the issue of ENDS documentation practices. DISCUSSION: Three overarching themes were revealed from our analysis. These themes included (1) physicians' attitudes regarding the use of ENDS for smoking cessation, (2) physicians' guidance and advisement to patients in the use of ENDS for smoking cessation, and (3) current practices of clinical documentation of ENDS use in an EHR. Our qualitative results indicate that physicians in our study rarely screen patients for ENDS use, even for those patients who are both documented smokers and recipients of physician-led tobacco cessation counseling. 
However, most physicians agreed that the prospect of creating a structured data field specifically for the documentation of ENDS use within the EHR would result in the likelihood of increased screening and documentation of ENDS use patterns.

12.
JMIR Public Health Surveill ; 3(1): e1, 2017 Jan 06.
Article in English | MEDLINE | ID: mdl-28062390

ABSTRACT

BACKGROUND: The popularity and use of electronic cigarettes (e-cigarettes) has increased across all demographic groups in recent years. However, little is currently known about the readability of health information and advice aimed at the general public regarding the use of e-cigarettes. OBJECTIVE: The objective of our study was to examine the readability of publicly available health information as well as advice on e-cigarettes. We compared information and advice available from US government agencies, nongovernment organizations, English speaking government agencies outside the United States, and for-profit entities. METHODS: A systematic search for health information and advice on e-cigarettes was conducted using search engines. We manually verified search results and converted to plain text for analysis. We then assessed readability of the collected documents using 4 readability metrics followed by pairwise comparisons of groups with adjustment for multiple comparisons. RESULTS: A total of 54 documents were collected for this study. All 4 readability metrics indicate that all information and advice on e-cigarette use is written at a level higher than that recommended for the general public by National Institutes of Health (NIH) communication guidelines. However, health information and advice written by for-profit entities, many of which were promoting e-cigarettes, were significantly easier to read. CONCLUSIONS: A substantial proportion of potential and current e-cigarette users are likely to have difficulty in fully comprehending Web-based health information regarding e-cigarettes, potentially hindering effective health-seeking behaviors. To comply with NIH communication guidelines, government entities and nongovernment organizations would benefit from improving the readability of e-cigarettes information and advice.
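
The abstract above does not name the 4 readability metrics that were used. Purely as an illustration of how such a metric is computed, the sketch below implements the widely used Flesch-Kincaid Grade Level formula from raw counts. The choice of this metric, the function name `flesch_kincaid_grade`, and the example counts are assumptions made here and are not taken from the study.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level computed from word, sentence, and syllable counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for a short passage (not data from the study).
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(round(grade, 2))  # prints 9.91
```

A higher grade level indicates text that is harder to read; longer sentences and more syllables per word both raise the score.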

13.
AMIA Annu Symp Proc ; 2017: 1362-1371, 2017.
Article in English | MEDLINE | ID: mdl-29854205

ABSTRACT

We use Reddit to demonstrate social media's potential for public health applications. First, we employ a lexicon-based approach to track the prevalence of keywords indicating public interest in Ebola, electronic cigarettes, influenza, and marijuana. Second, to better understand public reactions, we use the Latent Dirichlet Allocation algorithm to identify either the general themes or the motivations for extreme changes in the volume of discussion over time. We observe that discussions related to Ebola and influenza, infectious diseases of public health interest, surged when the first case of Ebola was diagnosed and a new strain of H1N1 influenza virus was confirmed in the United States. We also observed that discussions of a controversial health topic like marijuana increased with the announcement of a major change in United States federal policy. Discussions of electronic cigarettes highlighted opportunities for better health education. Lastly, we discuss the implications of our findings for utilizing Reddit data for public health applications.


Subjects
Data Mining/methods , Population Surveillance/methods , Public Health Informatics , Social Media , Cannabis , Electronic Nicotine Delivery Systems , Hemorrhagic Fever, Ebola , Humans , Influenza A Virus, H1N1 Subtype , Influenza, Human , Motivation , United States
14.
J Med Internet Res ; 17(9): e220, 2015 Sep 29.
Article in English | MEDLINE | ID: mdl-26420469

ABSTRACT

BACKGROUND: The rise in popularity of electronic cigarettes (e-cigarettes) and hookah over recent years has been accompanied by some confusion and uncertainty regarding the development of an appropriate regulatory response towards these emerging products. Mining online discussion content can lead to insights into people's experiences, which can in turn further our knowledge of how to address potential health implications. In this work, we take a novel approach to understanding the use and appeal of these emerging products by applying text mining techniques to compare consumer experiences across discussion forums. OBJECTIVE: This study examined content from the websites Vapor Talk, Hookah Forum, and Reddit to understand people's experiences with different tobacco products. Our investigation involves three parts. First, we identified contextual factors that inform our understanding of tobacco use behaviors, such as setting, time, social relationships, and sensory experience, and compared the forums to identify the ones where content on these factors is most common. Second, we compared how the tobacco use experience differs with combustible cigarettes and e-cigarettes. Third, we investigated differences between e-cigarette and hookah use. METHODS: In the first part of our study, we employed a lexicon-based extraction approach to estimate prevalence of contextual factors, and then we generated a heat map based on these estimates to compare the forums. In the second and third parts of the study, we employed a text mining technique called topic modeling to identify important topics and then developed a visualization, Topic Bars, to compare topic coverage across forums. RESULTS: In the first part of the study, we identified two forums, Vapor Talk Health & Safety and the Stopsmoking subreddit, where discussion concerning contextual factors was particularly common. 
The second part showed that the discussion in Vapor Talk Health & Safety focused on symptoms and comparisons of combustible cigarettes and e-cigarettes, and the Stopsmoking subreddit focused on psychological aspects of quitting. Last, we examined the discussion content on Vapor Talk and Hookah Forum. Prominent topics included equipment, technique, experiential elements of use, and the buying and selling of equipment. CONCLUSIONS: This study has three main contributions. Discussion forums differ in the extent to which their content may help us understand behaviors with potential health implications. Identifying dimensions of interest and using a heat map visualization to compare across forums can be helpful for identifying forums with the greatest density of health information. Additionally, our work has shown that the quitting experience can potentially be very different depending on whether or not e-cigarettes are used. Finally, e-cigarette and hookah forums are similar in that members represent a "hobbyist culture" that actively engages in information exchange. These differences have important implications for both tobacco regulation and smoking cessation intervention design.


Subjects
Data Mining/methods , Electronic Nicotine Delivery Systems/statistics & numerical data , Internet , Smoking/epidemiology , Datasets as Topic , Humans , Prevalence , Safety , Nicotiana , Tobacco Products/statistics & numerical data
15.
J Med Internet Res ; 15(8): e174, 2013 Aug 29.
Article in English | MEDLINE | ID: mdl-23989137

ABSTRACT

BACKGROUND: Social media platforms such as Twitter are rapidly becoming key resources for public health surveillance applications, yet little is known about Twitter users' levels of informedness and sentiment toward tobacco, especially with regard to the emerging tobacco control challenges posed by hookah and electronic cigarettes. OBJECTIVE: To develop a content and sentiment analysis of tobacco-related Twitter posts and build machine learning classifiers to detect tobacco-relevant posts and sentiment towards tobacco, with a particular focus on new and emerging products like hookah and electronic cigarettes. METHODS: We collected 7362 tobacco-related Twitter posts at 15-day intervals from December 2011 to July 2012. Each tweet was manually classified using a triaxial scheme, capturing genre, theme, and sentiment. Using the collected data, machine-learning classifiers were trained to detect tobacco-related vs irrelevant tweets as well as positive vs negative sentiment, using Naïve Bayes, k-nearest neighbors, and Support Vector Machine (SVM) algorithms. Finally, phi contingency coefficients were computed between each of the categories to discover emergent patterns. RESULTS: The most prevalent genres were first- and second-hand experience and opinion, and the most frequent themes were hookah, cessation, and pleasure. Sentiment toward tobacco was overall more positive (1939/4215, 46% of tweets) than negative (1349/4215, 32%) or neutral among tweets mentioning it, even excluding the 9% of tweets categorized as marketing. Three separate metrics converged to support an emergent distinction between, on one hand, hookah and electronic cigarettes corresponding to positive sentiment, and on the other hand, traditional tobacco products and more general references corresponding to negative sentiment. 
These metrics included correlations between categories in the annotation scheme (phi[hookah-positive]=0.39; phi[e-cigs-positive]=0.19); correlations between search keywords and sentiment (χ²(4)=414.50, P<.001, Cramér's V=0.36), and the most discriminating unigram features for positive and negative sentiment ranked by log odds ratio in the machine learning component of the study. In the automated classification tasks, SVMs using a relatively small number of unigram features (500) achieved best performance in discriminating tobacco-related from unrelated tweets (F score=0.85). CONCLUSIONS: Novel insights available through Twitter for tobacco surveillance are attested through the high prevalence of positive sentiment. This positive sentiment is correlated in complex ways with social image, personal experience, and recently popular products such as hookah and electronic cigarettes. Several apparent perceptual disconnects between these products and their health effects suggest opportunities for tobacco control education. Finally, machine classification of tobacco-related posts shows a promising edge over strictly keyword-based approaches, yielding an improved signal-to-noise ratio in Twitter data and paving the way for automated tobacco surveillance applications.
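
The phi coefficients reported above measure association between two binary annotations (for example, "tweet mentions hookah" and "tweet is positive"). The sketch below shows the standard computation from a 2x2 contingency table. The function name `phi_coefficient` and the example counts are hypothetical; the study's actual contingency tables are not given in the abstract.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient for a 2x2 table of two binary variables X and Y.

    a = X and Y both present, b = X only, c = Y only, d = neither.
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        return 0.0  # a row or column is empty, so no association can be measured
    return (a * d - b * c) / denominator


# Hypothetical counts, chosen only to illustrate the calculation.
example = phi_coefficient(a=30, b=10, c=20, d=40)
print(round(example, 2))  # prints 0.41
```

Values near 0 indicate no association, and values near 1 or -1 indicate strong positive or negative association.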


Subjects
Internet , Nicotiana , Smoking , Humans