1.
Nucleic Acids Res ; 51(W1): W78-W82, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37194699

ABSTRACT

Access to computationally based visualization tools to navigate chemical space has become more important due to the increasing size and diversity of publicly accessible databases, associated compendiums of high-throughput screening (HTS) results, and other descriptor and effects data. However, application of these techniques requires advanced programming skills that are beyond the capabilities of many stakeholders. Here we report the development of the second version of the ChemMaps.com webserver (https://sandbox.ntp.niehs.nih.gov/chemmaps/) focused on environmental chemical space. The chemical space of ChemMaps.com v2.0, released in 2022, now includes approximately one million environmental chemicals from the EPA Distributed Structure-Searchable Toxicity (DSSTox) inventory. ChemMaps.com v2.0 incorporates mapping of HTS assay data from the U.S. federal Tox21 research collaboration program, which includes results from around 2000 assays tested on up to 10 000 chemicals. As a case example, we showcased chemical space navigation for perfluorooctanoic acid (PFOA), part of the per- and polyfluoroalkyl substances (PFAS), a chemical family of significant concern for its potential effects on human health and the environment.


Subject(s)
Databases, Chemical , High-Throughput Screening Assays , Software , Environment
2.
J Med Internet Res ; 25: e36667, 2023 02 27.
Article in English | MEDLINE | ID: mdl-36848191

ABSTRACT

BACKGROUND: The use and acceptance of medicinal cannabis is on the rise across the globe. To support the interests of public health, evidence relating to its use, effects, and safety is required to match this community demand. Web-based user-generated data are often used by researchers and public health organizations for the investigation of consumer perceptions, market forces, population behaviors, and for pharmacoepidemiology. OBJECTIVE: In this review, we aimed to summarize the findings of studies that have used user-generated text as a data source to study medicinal cannabis or the use of cannabis as medicine. Our objectives were to categorize the insights provided by social media research on cannabis as medicine and describe the role of social media for consumers using medicinal cannabis. METHODS: The inclusion criteria for this review were primary research studies and reviews that reported on the analysis of web-based user-generated content on cannabis as medicine. The MEDLINE, Scopus, Web of Science, and Embase databases were searched from January 1974 to April 2022. RESULTS: We examined 42 studies published in English and found that consumers value their ability to exchange experiences on the web and tend to rely on web-based information sources. Cannabis discussions have portrayed the substance as a safe and natural medicine to help with many health conditions including cancer, sleep disorders, chronic pain, opioid use disorders, headaches, asthma, bowel disease, anxiety, depression, and posttraumatic stress disorder. These discussions provide a rich resource for researchers to investigate medicinal cannabis-related consumer sentiment and experiences, including the opportunity to monitor cannabis effects and adverse events, provided that the anecdotal and often biased nature of the information is properly accounted for.
CONCLUSIONS: The extensive web-based presence of the cannabis industry coupled with the conversational nature of social media discourse results in rich but potentially biased information that is often not well-supported by scientific evidence. This review summarizes what social media is saying about the medicinal use of cannabis and discusses the challenges faced by health governance agencies and professionals to make use of web-based resources to both learn from medicinal cannabis users and provide factual, timely, and reliable evidence-based health information to consumers.


Subject(s)
Cannabis , Medical Marijuana , Social Media , Humans , Medical Marijuana/therapeutic use , Public Opinion , Public Health
3.
Nucleic Acids Res ; 48(W1): W586-W590, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32421835

ABSTRACT

High-throughput screening (HTS) research programs for drug development or chemical hazard assessment are designed to screen thousands of molecules across hundreds of biological targets or pathways. Most HTS platforms use fluorescence and luminescence technologies, representing more than 70% of the assays in the US Tox21 research consortium. These technologies are subject to interfering signals, largely explained by chemicals interacting with the light spectrum. This phenomenon results in 5-10% false positive results, depending on the chemical library used. Here, we present the InterPred webserver (version 1.0), a platform to predict such interference chemicals based on the first large-scale chemical screening effort to directly characterize chemical-assay interference, using assays in the Tox21 portfolio specifically designed to measure autofluorescence and luciferase inhibition. InterPred combines 17 quantitative structure-activity relationship (QSAR) models built using optimized machine learning techniques and allows users to predict the probability that a new chemical will interfere with different combinations of cellular and technology conditions. InterPred models have been applied to the entire Distributed Structure-Searchable Toxicity (DSSTox) Database (∼800,000 chemicals). The InterPred webserver is available at https://sandbox.ntp.niehs.nih.gov/interferences/.


Subject(s)
High-Throughput Screening Assays , Software , Artifacts , Fluorescence , Internet , Machine Learning , Pharmaceutical Preparations/chemistry , Quantitative Structure-Activity Relationship , Workflow
4.
BMC Pediatr ; 22(1): 167, 2022 03 31.
Article in English | MEDLINE | ID: mdl-35361157

ABSTRACT

BACKGROUND & OBJECTIVES: This study aims to explore and elucidate parents' experience of newborn screening [NBS], with the overarching goal of identifying desiderata for the development of informatics-based educational and health management resources. METHODS: We conducted four focus groups and four one-on-one qualitative interviews with a total of 35 participants between March and September 2020. Participants were grouped into three types: parents who had received true positive newborn screening results; parents who had received false positive results; and soon-to-be parents who had no direct experience of the screening process. Interview data were subjected to analysis using an inductive, constant comparison approach. RESULTS: Results are divided into five sections: (1) experiences related to the process of receiving NBS results and prior knowledge of the NBS program; (2) approaches to the management of a child's medical data; (3) sources of additional informational and emotional support; (4) barriers faced by parents navigating the health system; and (5) recommendations and suggestions for new parents experiencing the NBS process. CONCLUSION: Our analysis revealed a wide range of experiences of, and attitudes towards the newborn screening program and the wider newborn screening system. While parents' view of the screening process was - on the whole - positive, some participants reported experiencing substantial frustration, particularly related to how results are initially communicated and difficulties in accessing reliable, timely information. This frustration with current information management and education resources indicates a role for informatics-based approaches in addressing parents' information needs.


Subject(s)
Neonatal Screening , Parents , Child , Focus Groups , Humans , Infant, Newborn , Neonatal Screening/psychology , Pain , Parents/psychology , Qualitative Research
5.
J Med Internet Res ; 24(11): e35974, 2022 11 16.
Article in English | MEDLINE | ID: mdl-36383417

ABSTRACT

BACKGROUND: Medicinal cannabis is increasingly being used for a variety of physical and mental health conditions. Social media and web-based health platforms provide valuable, real-time, and cost-effective surveillance resources for gleaning insights regarding individuals who use cannabis for medicinal purposes. This is particularly important considering that the evidence for the optimal use of medicinal cannabis is still emerging. Despite the web-based marketing of medicinal cannabis to consumers, currently, there is no robust regulatory framework to measure clinical health benefits or individual experiences of adverse events. In a previous study, we conducted a systematic scoping review of studies that contained themes of the medicinal use of cannabis and used data from social media and search engine results. This study analyzed the methodological approaches and limitations of these studies. OBJECTIVE: We aimed to examine research approaches and study methodologies that use web-based user-generated text to study the use of cannabis as a medicine. METHODS: We searched MEDLINE, Scopus, Web of Science, and Embase databases for primary studies in the English language from January 1974 to April 2022. Studies were included if they aimed to understand web-based user-generated text related to health conditions where cannabis is used as a medicine or where health was mentioned in general cannabis-related conversations. RESULTS: We included 42 articles in this review. In these articles, Twitter was used 3 times more than other computer-generated sources, including Reddit, web-based forums, GoFundMe, YouTube, and Google Trends. Analytical methods included sentiment assessment, thematic analysis (manual and automatic), social network analysis, and geographic analysis. CONCLUSIONS: This study is the first to review techniques used by research on consumer-generated text for understanding cannabis as a medicine. 
It is increasingly evident that consumer-generated data offer opportunities for a greater understanding of individual behavior and population health outcomes. However, research using these data has some limitations that include difficulties in establishing sample representativeness and a lack of methodological best practices. To address these limitations, deidentified annotated data sources should be made publicly available, researchers should determine the origins of posts (organizations, bots, power users, or ordinary individuals), and powerful analytical techniques should be used.


Subject(s)
Cannabis , Medical Marijuana , Medicine , Mental Disorders , Social Media , Humans , Medical Marijuana/therapeutic use
6.
J Sch Nurs ; 38(1): 74-83, 2022 Feb.
Article in English | MEDLINE | ID: mdl-33944636

ABSTRACT

School nurses are the most accessible health care providers for many young people including adolescents and young adults. Early identification of depression results in improved outcomes, but little information is available comprehensively describing depressive symptoms specific to this population. The aim of this study was to develop a taxonomy of depressive symptoms that were manifested and described by young people based on a scoping review and content analysis. Twenty-five journal articles that included narrative descriptions of depressive symptoms in young people were included. A total of 60 depressive symptoms were identified and categorized into five dimensions: behavioral (n = 8), cognitive (n = 14), emotional (n = 15), interpersonal (n = 13), and somatic (n = 10). This comprehensive depression symptom taxonomy can help school nurses to identify young people who may experience depression and will support future research to better screen for depression.


Subject(s)
Depression , Adolescent , Humans , Young Adult
7.
J Biomed Inform ; 90: 103091, 2019 02.
Article in English | MEDLINE | ID: mdl-30611893

ABSTRACT

The "Psychiatric Treatment Adverse Reactions" (PsyTAR) corpus is an annotated corpus that has been developed using patients' narrative data for psychiatric medications, particularly SSRIs (selective serotonin reuptake inhibitors) and SNRIs (serotonin norepinephrine reuptake inhibitors). This corpus consists of three main components: sentence classification, entity identification, and entity normalization. We split the review posts into sentences and labeled them for presence of adverse drug reactions (ADRs) (2168 sentences), withdrawal symptoms (WDs) (438 sentences), sign/symptoms/illness (SSIs) (789 sentences), drug indications (DIs) (517), drug effectiveness (EF) (1087 sentences), and drug ineffectiveness (INF) (337 sentences). In the entity identification phase, we identified and extracted ADRs (4813 mentions), WDs (590 mentions), SSIs (1219 mentions), and DIs (792). In the entity normalization phase, we mapped the identified entities to the corresponding concepts in both UMLS (918 unique concepts) and SNOMED CT (755 unique concepts). Four annotators double coded the sentences and the span of identified entities by strictly following guidelines developed for this study. We used the PsyTAR sentence classification component to automatically train a range of supervised machine learning classifiers to identify text segments with mentions of ADRs, WDs, DIs, SSIs, EF, and INF. SVM classifiers had the highest performance, with an F-score of 0.90. We also measured the performance of cTAKES (clinical Text Analysis and Knowledge Extraction System) in identifying patients' expressions of ADRs and WDs with and without adding the PsyTAR dictionary to the core dictionary of cTAKES. Augmenting the cTAKES dictionary with PsyTAR improved the F-score of cTAKES by 25%.
The findings imply that PsyTAR has significant implications for text mining algorithms that aim to identify information about adverse drug events and drug effectiveness from patients' narrative data, by linking the patients' expressions of adverse drug events to standard medical vocabularies. The corpus is publicly available at Zolnoori et al. [30].


Subject(s)
Adverse Drug Reaction Reporting Systems , Selective Serotonin Reuptake Inhibitors/adverse effects , Serotonin and Noradrenaline Reuptake Inhibitors/adverse effects , Algorithms , Data Collection , Data Mining , Humans , Pharmacovigilance , Systematized Nomenclature of Medicine , Unified Medical Language System
8.
J Med Internet Res ; 20(4): e121, 2018 04 10.
Article in English | MEDLINE | ID: mdl-29636316

ABSTRACT

BACKGROUND: Mental disorders such as depression, bipolar disorder, and schizophrenia are common, incapacitating, and have the potential to be fatal. Despite the prevalence and gravity of mental disorders, our knowledge concerning everyday challenges associated with them is relatively limited. One of the most studied deficits related to everyday challenges is language impairment, yet we do not know how mental disorders can impact common forms of written communication, for example, social media. OBJECTIVE: The aims of this study were to investigate written communication challenges manifest in online mental health communities focusing on depression, bipolar disorder, and schizophrenia, as well as the impact of participating in these online mental health communities on written communication. As the control, we selected three online health communities focusing on positive emotion, exercising, and weight management. METHODS: We examined lexical diversity and readability, both important features for measuring the quality of writing. We used four well-established readability metrics that consider word frequencies and syntactic complexity to measure writers' written communication ability. We then measured the lexical diversity by calculating the percentage of unique words in posts. To compare lexical diversity and readability among communities, we first applied pairwise independent sample t tests, followed by P value adjustments using the prespecified Hommel procedure to adjust for multiple comparison. To measure the changes, we applied linear least squares regression to the readability and lexical diversity scores against the interaction sequence for each member, followed by pairwise independent sample t tests and P value adjustments. Given the large sample of members, we also report effect sizes and 95% CIs for the pairwise comparisons. 
RESULTS: On average, members of depression, bipolar disorder, and schizophrenia communities showed indications of difficulty expressing their ideas compared with three other online health communities. Our results also suggest that participating in these platforms has the potential to improve members' written communication. For example, members of all three mental health communities showed statistically significant improvement in both lexical diversity and readability compared with members of the online health community (OHC) focusing on positive emotion. CONCLUSIONS: We provide new insights into the written communication challenges faced by individuals suffering from depression, bipolar disorder, and schizophrenia. A comparison with three other online health communities suggests that written communication in mental health communities is significantly more difficult to read, while also consisting of a significantly less diverse lexicon. We contribute practical suggestions for utilizing our findings in Web-based communication settings to enhance members' communicative experience. We consider these findings to be an important step toward understanding and addressing everyday written communication challenges among individuals suffering from mental disorders.
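
Illustrative sketch (not the authors' code): the abstract describes lexical diversity as the percentage of unique words in a post, and change over time as a linear least-squares fit of scores against each member's interaction sequence. The Python below shows one way those two quantities could be computed. The function names, the simple regular-expression tokenizer, and the use of post positions 1, 2, 3, ... as the interaction sequence are assumptions made for illustration; the readability metrics and the Hommel adjustment are not covered.

```python
import re


def lexical_diversity(post: str) -> float:
    """Percentage of unique words in a post (0.0 for an empty post).

    Tokenization here is a simple lowercase regex split, which is an
    assumption; the study's own tokenizer is not described in the abstract.
    """
    words = re.findall(r"[a-z']+", post.lower())
    if not words:
        return 0.0
    return 100.0 * len(set(words)) / len(words)


def least_squares_slope(scores: list[float]) -> float:
    """Slope of an ordinary least-squares line fitted to scores against
    their position (1, 2, 3, ...) in a member's interaction sequence."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two posts to fit a line")
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical posts from one member, in posting order.
    posts = ["the cat the dog", "a cat saw a dog", "one cat saw two dogs"]
    scores = [lexical_diversity(p) for p in posts]
    print(scores, least_squares_slope(scores))
```

A positive slope under this sketch would correspond to lexical diversity increasing across a member's successive posts.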


Subject(s)
Mental Health/trends , Social Media/instrumentation , Text Messaging/instrumentation , Communication , Comprehension , Female , Humans , Male
9.
J Med Internet Res ; 19(3): e71, 2017 03 20.
Article in English | MEDLINE | ID: mdl-28320692

ABSTRACT

BACKGROUND: Major depression is a serious challenge at both the individual and population levels. Although online health communities have shown the potential to reduce the symptoms of depression, emotional contagion theory suggests that negative emotion can spread within a community, and prolonged interactions with other depressed individuals have the potential to worsen the symptoms of depression. OBJECTIVE: The goals of our study were to investigate longitudinal changes in psychological states that are manifested through linguistic changes in depression community members who are interacting with other depressed individuals. METHODS: We examined emotion-related language usage using the Linguistic Inquiry and Word Count (LIWC) program for each member of a depression community from Reddit. To measure the changes, we applied linear least-squares regression to the LIWC scores against the interaction sequence for each member. We measured the differences in linguistic changes against three online health communities focusing on positive emotion, diabetes, and irritable bowel syndrome. RESULTS: On average, members of an online depression community showed improvement in 9 of 10 prespecified linguistic dimensions: "positive emotion," "negative emotion," "anxiety," "anger," "sadness," "first person singular," "negation," "swear words," and "death." Moreover, these members improved either significantly or at least as much as members of other online health communities. CONCLUSIONS: We provide new insights into the impact of prolonged participation in an online depression community and highlight the positive emotion change in members. The findings of this study should be interpreted with caution, because participating in an online depression community is not the sole factor for improvement or worsening of depressive symptoms.
Still, the consistent statistical results, including comparative analyses with different communities, could indicate that the emotion-related language usage of depression community members is improving either significantly or at least as much as that of members of other online communities. On the basis of these findings, we contribute practical suggestions for designing online depression communities to enhance psychosocial benefit gains for members. We consider these results to be an important step toward a better understanding of the impact of prolonged participation in an online depression community, in addition to providing insights into the long-term psychosocial well-being of members.


Subject(s)
Depression/psychology , Depressive Disorder, Major/psychology , Internet , Social Media , Adult , Community Networks , Female , Humans , Linguistics , Longitudinal Studies , Male , Mental Health , Social Networking
10.
J Med Internet Res ; 19(2): e48, 2017 02 28.
Article in English | MEDLINE | ID: mdl-28246066

ABSTRACT

BACKGROUND: With a lifetime prevalence of 16.2%, major depressive disorder is the fifth biggest contributor to the disease burden in the United States. OBJECTIVE: The aim of this study, building on previous work qualitatively analyzing depression-related Twitter data, was to describe the development of a comprehensive annotation scheme (ie, coding scheme) for manually annotating Twitter data with Diagnostic and Statistical Manual of Mental Disorders, Edition 5 (DSM 5) major depressive symptoms (eg, depressed mood, weight change, psychomotor agitation, or retardation) and Diagnostic and Statistical Manual of Mental Disorders, Edition IV (DSM-IV) psychosocial stressors (eg, educational problems, problems with primary support group, housing problems). METHODS: Using this annotation scheme, we developed an annotated corpus, Depressive Symptom and Psychosocial Stressors Acquired Depression, the SAD corpus, consisting of 9300 tweets randomly sampled from the Twitter application programming interface (API) using depression-related keywords (eg, depressed, gloomy, grief). An analysis of our annotated corpus yielded several key results. RESULTS: First, 72.09% (6829/9473) of tweets containing relevant keywords were nonindicative of depressive symptoms (eg, "we're in for a new economic depression"). Second, the most prevalent symptoms in our dataset were depressed mood and fatigue or loss of energy. Third, less than 2% of tweets contained more than one depression related category (eg, diminished ability to think or concentrate, depressed mood). Finally, we found very high positive correlations between some depression-related symptoms in our annotated dataset (eg, fatigue or loss of energy and educational problems; educational problems and diminished ability to think). 
CONCLUSIONS: We successfully developed an annotation scheme and an annotated corpus, the SAD corpus, consisting of 9300 tweets randomly selected from the Twitter application programming interface using depression-related keywords. Our analyses suggest that keyword queries alone might not be suitable for public health monitoring because context can change the meaning of a keyword in a statement. However, postprocessing approaches could be useful for reducing the noise and improving the signal needed to detect depression symptoms using social media.


Subject(s)
Depression/diagnosis , Depressive Disorder, Major/diagnosis , Internet/statistics & numerical data , Social Media/statistics & numerical data , Stress, Psychological/diagnosis , Depression/epidemiology , Depressive Disorder, Major/epidemiology , Humans , Machine Learning , Psychology , Stress, Psychological/epidemiology
12.
BMC Med Ethics ; 17: 22, 2016 Apr 14.
Article in English | MEDLINE | ID: mdl-27080238

ABSTRACT

BACKGROUND: Recently, significant research effort has focused on using Twitter (and other social media) to investigate mental health at the population-level. While there has been influential work in developing ethical guidelines for Internet discussion forum-based research in public health, there is currently limited work focused on addressing ethical problems in Twitter-based public health research, and less still that considers these issues from users' own perspectives. In this work, we aim to investigate public attitudes towards utilizing public domain Twitter data for population-level mental health monitoring using a qualitative methodology. METHODS: The study explores user perspectives in a series of five, 2-h focus group interviews. Following a semi-structured protocol, 26 Twitter users with and without a diagnosed history of depression discussed general Twitter use, along with privacy expectations, and ethical issues in using social media for health monitoring, with a particular focus on mental health monitoring. Transcripts were then transcribed, redacted, and coded using a constant comparative approach. RESULTS: While participants expressed a wide range of opinions, there was an overall trend towards a relatively positive view of using public domain Twitter data as a resource for population level mental health monitoring, provided that results are appropriately aggregated. Results are divided into five sections: (1) a profile of respondents' Twitter use patterns and use variability; (2) users' privacy expectations, including expectations regarding data reach and permanence; (3) attitudes towards social media based population-level health monitoring in general, and attitudes towards mental health monitoring in particular; (4) attitudes towards individual versus population-level health monitoring; and (5) users' own recommendations for the appropriate regulation of population-level mental health monitoring. 
CONCLUSIONS: Focus group data reveal a wide range of attitudes towards the use of public-domain social media "big data" in population health research, from enthusiasm, through acceptance, to opposition. Study results highlight new perspectives in the discussion of ethical use of public data, particularly with respect to consent, privacy, and oversight.


Subject(s)
Attitude , Depression , Mental Health , Population Surveillance/methods , Privacy , Social Media/ethics , Adult , Female , Focus Groups , Humans , Male , Middle Aged , Young Adult
13.
J Biomed Inform ; 58: 280-287, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26556646

ABSTRACT

Self-reported patient data has been shown to be a valuable knowledge source for post-market pharmacovigilance. In this paper we propose using the popular micro-blogging service Twitter to gather evidence about adverse drug reactions (ADRs) after first identifying micro-blog messages (also known as "tweets") that report first-hand experience. To achieve this goal we explore machine learning with data crowdsourced from lay annotators. With the help of lay annotators recruited from CrowdFlower we manually annotated 1548 tweets containing keywords related to two kinds of drugs: SSRIs (eg, Paroxetine) and cognitive enhancers (eg, Ritalin). Our results show that inter-annotator agreement (Fleiss' kappa) for crowdsourcing ranks in moderate agreement with a pair of experienced annotators (Spearman's Rho=0.471). We utilized the gold standard annotations from CrowdFlower for automatically training a range of supervised machine learning models to recognize first-hand experience. F-Score values are reported for 6 of these techniques, with the Bayesian Generalized Linear Model being the best (F-Score=0.64 and Informedness=0.43) when combined with a selected set of features obtained by using information gain criteria.
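
Illustrative sketch (not taken from the paper): the abstract reports both an F-Score and Informedness for a binary classifier. Assuming the standard definitions (F1 as the harmonic mean of precision and recall; informedness as sensitivity plus specificity minus 1), both can be derived from a confusion matrix as in the Python below. The function name and the example counts are hypothetical and do not reproduce the study's data.

```python
def f1_and_informedness(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (F1, informedness) for a binary confusion matrix.

    F1 = 2 * precision * recall / (precision + recall)
    informedness = recall (sensitivity) + specificity - 1
    Assumes every denominator is non-zero, i.e. both classes occur and
    at least one positive prediction was made.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    informedness = recall + specificity - 1
    return f1, informedness


if __name__ == "__main__":
    # Hypothetical counts for a "first-hand experience" tweet classifier.
    f1, informedness = f1_and_informedness(tp=8, fp=2, fn=2, tn=8)
    print(f"F1={f1:.2f}, informedness={informedness:.2f}")
```

Unlike the F-Score, informedness takes true negatives into account, which may be why the two are reported side by side.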


Subject(s)
Crowdsourcing , Drug Prescriptions , Social Media , Humans
14.
J Med Internet Res ; 17(9): e220, 2015 Sep 29.
Article in English | MEDLINE | ID: mdl-26420469

ABSTRACT

BACKGROUND: The rise in popularity of electronic cigarettes (e-cigarettes) and hookah over recent years has been accompanied by some confusion and uncertainty regarding the development of an appropriate regulatory response towards these emerging products. Mining online discussion content can lead to insights into people's experiences, which can in turn further our knowledge of how to address potential health implications. In this work, we take a novel approach to understanding the use and appeal of these emerging products by applying text mining techniques to compare consumer experiences across discussion forums. OBJECTIVE: This study examined content from the websites Vapor Talk, Hookah Forum, and Reddit to understand people's experiences with different tobacco products. Our investigation involves three parts. First, we identified contextual factors that inform our understanding of tobacco use behaviors, such as setting, time, social relationships, and sensory experience, and compared the forums to identify the ones where content on these factors is most common. Second, we compared how the tobacco use experience differs with combustible cigarettes and e-cigarettes. Third, we investigated differences between e-cigarette and hookah use. METHODS: In the first part of our study, we employed a lexicon-based extraction approach to estimate prevalence of contextual factors, and then we generated a heat map based on these estimates to compare the forums. In the second and third parts of the study, we employed a text mining technique called topic modeling to identify important topics and then developed a visualization, Topic Bars, to compare topic coverage across forums. RESULTS: In the first part of the study, we identified two forums, Vapor Talk Health & Safety and the Stopsmoking subreddit, where discussion concerning contextual factors was particularly common. 
The second part showed that the discussion in Vapor Talk Health & Safety focused on symptoms and comparisons of combustible cigarettes and e-cigarettes, and the Stopsmoking subreddit focused on psychological aspects of quitting. Last, we examined the discussion content on Vapor Talk and Hookah Forum. Prominent topics included equipment, technique, experiential elements of use, and the buying and selling of equipment. CONCLUSIONS: This study has three main contributions. Discussion forums differ in the extent to which their content may help us understand behaviors with potential health implications. Identifying dimensions of interest and using a heat map visualization to compare across forums can be helpful for identifying forums with the greatest density of health information. Additionally, our work has shown that the quitting experience can potentially be very different depending on whether or not e-cigarettes are used. Finally, e-cigarette and hookah forums are similar in that members represent a "hobbyist culture" that actively engages in information exchange. These differences have important implications for both tobacco regulation and smoking cessation intervention design.


Subject(s)
Data Mining/methods , Electronic Nicotine Delivery Systems/statistics & numerical data , Internet , Smoking/epidemiology , Datasets as Topic , Humans , Prevalence , Safety , Nicotiana , Tobacco Products/statistics & numerical data
15.
J Med Internet Res ; 16(12): e290, 2014 Dec 22.
Article in English | MEDLINE | ID: mdl-25533619

ABSTRACT

BACKGROUND: The rise of social media and microblogging platforms in recent years, in conjunction with the development of techniques for the processing and analysis of "big data", has provided significant opportunities for public health surveillance using user-generated content. However, relatively little attention has been focused on developing ethically appropriate approaches to working with these new data sources. OBJECTIVE: Based on a review of the literature, this study seeks to develop a taxonomy of public health surveillance-related ethical concepts that emerge when using Twitter data, with a view to: (1) explicitly identifying a set of potential ethical issues and concerns that may arise when researchers work with Twitter data, and (2) providing a starting point for the formation of a set of best practices for public health surveillance through the development of an empirically derived taxonomy of ethical concepts. METHODS: We searched Medline, Compendex, PsycINFO, and the Philosopher's Index using a set of keywords selected to identify Twitter-related research papers that reference ethical concepts. Our initial set of queries identified 342 references across the four bibliographic databases. We screened titles and abstracts of these references using our inclusion/exclusion criteria, eliminating duplicates and unavailable papers, until 49 references remained. We then read the full text of these 49 articles and discarded 36, resulting in a final inclusion set of 13 articles. Ethical concepts were then identified in each of these 13 articles. Finally, based on a close reading of the text, a taxonomy of ethical concepts was constructed based on ethical concepts discovered in the papers. 
RESULTS: From these 13 articles, we iteratively generated a taxonomy of ethical concepts consisting of 10 top-level categories: privacy, informed consent, ethical theory, institutional review board (IRB)/regulation, traditional research vs Twitter research, geographical information, researcher lurking, economic value of personal information, medical exceptionalism, and benefit of identifying socially harmful medical conditions. CONCLUSIONS: Based on a review of the literature, we present a provisional taxonomy of public health surveillance-related ethical concepts that emerge when using Twitter data.


Subject(s)
Classification , Ethics , Internet/ethics , Public Health Surveillance/methods , Social Media/ethics , Databases, Bibliographic , Humans , Information Storage and Retrieval/ethics , Information Storage and Retrieval/methods , MEDLINE/ethics
16.
Stud Health Technol Inform ; 310: 579-583, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269875

ABSTRACT

The reliable identification of skin and soft tissue infections (SSTIs) from electronic health records is important for a number of applications, including quality improvement, clinical guideline construction, and epidemiological analysis. However, in the United States, SSTI types (e.g., whether an infection is purulent or non-purulent) are not captured reliably in structured clinical data. In this work, we trained and evaluated a rule-based clinical natural language processing system using 6,576 manually annotated clinical notes derived from the United States Veterans Health Administration (VA), with the goal of automatically extracting and classifying SSTI subtypes from clinical notes. The trained system achieved performance metrics ranging from 0.39 to 0.80 for mention-level classification and from 0.49 to 0.98 for document-level classification.
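The distinction between mention-level and document-level classification can be illustrated with a minimal sketch. The keyword lists, the negation cues, and the function names `extract_mentions` and `classify_document` are hypothetical and chosen only for illustration; the abstract does not describe the rules used by the published system.

```python
import re

# Hypothetical keyword rules; the published system's rules are not given in the abstract.
RULES = {
    "purulent": re.compile(r"\b(?:abscess|pus|furuncle|carbuncle)\b", re.I),
    "non_purulent": re.compile(r"\b(?:cellulitis|erysipelas)\b", re.I),
}
# Hypothetical negation cues, checked only in the text preceding a mention.
NEGATION = re.compile(r"\b(?:no|denies|without)\b", re.I)


def extract_mentions(note):
    """Mention level: return (term, label, negated) for every keyword hit."""
    mentions = []
    # Split into sentences so a negation cue only affects its own sentence.
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        for label, pattern in RULES.items():
            for match in pattern.finditer(sentence):
                negated = bool(NEGATION.search(sentence[:match.start()]))
                mentions.append((match.group(0).lower(), label, negated))
    return mentions


def classify_document(note):
    """Document level: the set of labels with at least one non-negated mention."""
    return {label for _, label, negated in extract_mentions(note) if not negated}


note = "Left leg cellulitis noted. No abscess or pus."
print(extract_mentions(note))
print(classify_document(note))  # {'non_purulent'}
```

In this sketch a document label is simply the union of its affirmed mention labels, which is one of several possible aggregation choices.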


Subject(s)
Soft Tissue Infections , United States , Humans , Soft Tissue Infections/diagnosis , Skin , Benchmarking , Electronic Health Records , Natural Language Processing
17.
Stud Health Technol Inform ; 310: 659-663, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269891

ABSTRACT

Electronic Nicotine Delivery Systems (ENDS) use has increased substantially in the United States since 2010. To date, there is limited evidence regarding the nature and extent of ENDS documentation in the clinical note. In this work we investigate the effectiveness of different approaches to identify a patient's documented ENDS use. We report on the development and validation of a natural language processing system to identify patients with explicit documentation of ENDS using a large national cohort of patients at the United States Department of Veterans Affairs.


Subject(s)
Electronic Nicotine Delivery Systems , Vaping , United States , Humans , Natural Language Processing , Documentation , United States Department of Veterans Affairs
18.
J Cheminform ; 16(1): 19, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38378618

ABSTRACT

The rapid increase of publicly available chemical structures and associated experimental data presents a valuable opportunity to build robust QSAR models for applications in different fields. However, a common concern is the quality of both the chemical structure information and the associated experimental data. This is especially true when those data are collected from multiple sources, as chemical substance mappings can contain many duplicate structures and molecular inconsistencies. Such issues can impact the resulting molecular descriptors and their mappings to experimental data and, subsequently, the quality of the derived models in terms of accuracy, repeatability, and reliability. Herein we describe the development of an automated workflow to standardize chemical structures according to a set of standard rules and generate two- and/or three-dimensional "QSAR-ready" forms prior to the calculation of molecular descriptors. The workflow was designed in the KNIME workflow environment and consists of three high-level steps. First, a structure encoding is read. Second, the resulting in-memory representation is cross-referenced with any existing identifiers for consistency. Finally, the structure is standardized using a series of operations including desalting, stripping of stereochemistry (for two-dimensional structures), standardization of tautomers and nitro groups, valence correction, neutralization when possible, and then removal of duplicates. This workflow was initially developed to support collaborative QSAR modeling projects to ensure consistency of the results from the different participants. It was then updated and generalized for other modeling applications. This included modification of the "QSAR-ready" workflow to generate "MS-ready structures" to support the generation of substance mappings and searches for software applications related to non-targeted analysis mass spectrometry. 
Both QSAR and MS-ready workflows are freely available in KNIME, via standalone versions on GitHub, and as docker container resources for the scientific community. Scientific contribution: This work pioneers an automated workflow in KNIME, systematically standardizing chemical structures to ensure their readiness for QSAR modeling and broader scientific applications. By addressing data quality concerns through desalting, stereochemistry stripping, and normalization, it optimizes molecular descriptors' accuracy and reliability. The freely available resources in KNIME, GitHub, and docker containers democratize access, benefiting collaborative research and advancing diverse modeling endeavors in chemistry and mass spectrometry.
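As a rough illustration of three of the standardization operations named in the abstract (desalting, stereochemistry stripping, and duplicate removal), the following minimal sketch works directly on SMILES strings in plain Python. It is not the KNIME workflow: the function names are hypothetical, desalting is approximated by keeping the longest dot-separated component, and duplicates are detected by exact string match rather than by canonicalization. Tautomer and nitro-group standardization, valence correction, and neutralization are not attempted and would need a cheminformatics toolkit.

```python
def desalt(smiles):
    """Keep the largest dot-separated component (crude: longest string wins)."""
    return max(smiles.split("."), key=len)


def strip_stereo(smiles):
    """Drop SMILES stereo markers: '@' (tetrahedral) and '/' or '\\' (double bond)."""
    return smiles.replace("@", "").replace("/", "").replace("\\", "")


def standardize(smiles_list):
    """Apply desalting and stereo stripping, then remove exact-string duplicates."""
    seen = set()
    result = []
    for smiles in smiles_list:
        standardized = strip_stereo(desalt(smiles))
        if standardized not in seen:  # assumption: identical strings are identical structures
            seen.add(standardized)
            result.append(standardized)
    return result


structures = [
    "CC(=O)[O-].[Na+]",    # sodium acetate: the counter-ion is dropped
    "C[C@H](N)C(=O)O",     # one alanine enantiomer
    "C[C@@H](N)C(=O)O",    # the other enantiomer: a duplicate once stereo is stripped
    "F/C=C/F",             # trans-1,2-difluoroethene
]
print(standardize(structures))
# ['CC(=O)[O-]', 'C[CH](N)C(=O)O', 'FC=CF']
```

The acetate remains charged in the output because neutralization is outside the scope of this sketch.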

19.
J Biomed Inform ; 46(4): 734-43, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23602781

ABSTRACT

A major goal of Natural Language Processing in the public health informatics domain is the automatic extraction and encoding of data stored in free text patient records. This extracted data can then be utilized by computerized systems to perform syndromic surveillance. In particular, the chief complaint--a short string that describes a patient's symptoms--has come to be a vital resource for syndromic surveillance in the North American context due to its near ubiquity. This paper reviews fifteen systems in North America--at the city, county, state and federal level--that use chief complaints for syndromic surveillance.


Subject(s)
Population Surveillance , Humans , North America , Syndrome
20.
J Med Internet Res ; 15(8): e174, 2013 Aug 29.
Article in English | MEDLINE | ID: mdl-23989137

ABSTRACT

BACKGROUND: Social media platforms such as Twitter are rapidly becoming key resources for public health surveillance applications, yet little is known about Twitter users' levels of informedness and sentiment toward tobacco, especially with regard to the emerging tobacco control challenges posed by hookah and electronic cigarettes. OBJECTIVE: To develop a content and sentiment analysis of tobacco-related Twitter posts and build machine learning classifiers to detect tobacco-relevant posts and sentiment towards tobacco, with a particular focus on new and emerging products like hookah and electronic cigarettes. METHODS: We collected 7362 tobacco-related Twitter posts at 15-day intervals from December 2011 to July 2012. Each tweet was manually classified using a triaxial scheme, capturing genre, theme, and sentiment. Using the collected data, machine-learning classifiers were trained to detect tobacco-related vs irrelevant tweets as well as positive vs negative sentiment, using Naïve Bayes, k-nearest neighbors, and Support Vector Machine (SVM) algorithms. Finally, phi contingency coefficients were computed between each of the categories to discover emergent patterns. RESULTS: The most prevalent genres were first- and second-hand experience and opinion, and the most frequent themes were hookah, cessation, and pleasure. Sentiment toward tobacco was overall more positive (1939/4215, 46% of tweets) than negative (1349/4215, 32%) or neutral among tweets mentioning it, even excluding the 9% of tweets categorized as marketing. Three separate metrics converged to support an emergent distinction between, on one hand, hookah and electronic cigarettes corresponding to positive sentiment, and on the other hand, traditional tobacco products and more general references corresponding to negative sentiment. 
These metrics included correlations between categories in the annotation scheme (phi(hookah-positive)=0.39; phi(e-cigs-positive)=0.19); correlations between search keywords and sentiment (χ²(4)=414.50, P<.001, Cramér's V=0.36); and the most discriminating unigram features for positive and negative sentiment ranked by log odds ratio in the machine learning component of the study. In the automated classification tasks, SVMs using a relatively small number of unigram features (500) achieved best performance in discriminating tobacco-related from unrelated tweets (F score=0.85). CONCLUSIONS: Novel insights available through Twitter for tobacco surveillance are attested through the high prevalence of positive sentiment. This positive sentiment is correlated in complex ways with social image, personal experience, and recently popular products such as hookah and electronic cigarettes. Several apparent perceptual disconnects between these products and their health effects suggest opportunities for tobacco control education. Finally, machine classification of tobacco-related posts shows a promising edge over strictly keyword-based approaches, yielding an improved signal-to-noise ratio in Twitter data and paving the way for automated tobacco surveillance applications.
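For readers unfamiliar with the association measures reported above, the sketch below shows the standard formulas for the phi coefficient of a 2x2 contingency table and for Cramér's V computed from a chi-square statistic. The counts in the usage example are hypothetical; the abstract does not report the underlying contingency tables or sample sizes, so the published values are not reproduced here.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for the 2x2 table [[a, b], [c, d]] (rows: category present/absent)."""
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V from a chi-square statistic on an n_rows x n_cols table of n items."""
    return sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# Hypothetical counts: 30 tweets in both categories, 10 in only the first,
# 10 in only the second, and 30 in neither.
print(phi_coefficient(30, 10, 10, 30))   # 0.5
# For a 2x2 table, chi-square equals n * phi**2, so V matches |phi|.
print(cramers_v(20.0, 80, 2, 2))         # 0.5
```

Both measures range up to 1 for a perfect association; phi also carries a sign, while Cramér's V does not.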


Subject(s)
Internet , Nicotiana , Smoking , Humans