Search | BVS Search Portal

Improving Diabetes-Related Biomedical Literature Exploration in the Clinical Decision-making Process via Interactive Classification and Topic Discovery: Methodology Development Study.

Ahne, Adrian; Fagherazzi, Guy; Tannier, Xavier; Czernichow, Thomas; Orchard, Francisco.

J Med Internet Res ; 24(1): e27434, 2022 01 18.

Article in English | MEDLINE | ID: mdl-35040795

ABSTRACT

BACKGROUND: The amount of available textual health data, such as scientific and biomedical literature, is constantly growing, making it increasingly challenging for health professionals to summarize these data properly and practice evidence-based clinical decision-making. Moreover, exploring unstructured health text data is challenging for professionals without computer science knowledge because of limited time, resources, and skills. Current tools to explore text data lack ease of use, require high computational effort, and make it difficult to incorporate domain knowledge and focus on topics of interest. OBJECTIVE: We developed a methodology to explore and target topics of interest via an interactive user interface for health professionals with limited computer science knowledge. We aim to reach near state-of-the-art performance while reducing memory consumption, increasing scalability, and minimizing user interaction effort to improve the clinical decision-making process. Performance was evaluated on diabetes-related abstracts from PubMed. METHODS: The methodology consists of 4 parts: (1) a novel interpretable hierarchical clustering of documents in which each node is defined by headwords (the words that best represent the documents in the node), (2) an efficient classification system to target topics, (3) minimized user interaction effort through active learning, and (4) a visual user interface. We evaluated our approach on 50,911 diabetes-related abstracts annotated with the hierarchical Medical Subject Headings (MeSH) structure, in which each MeSH code is a unique identifier for a topic. Hierarchical clustering performance was compared against the implementation in the machine learning library scikit-learn. On a subset of 2000 randomly chosen diabetes abstracts, our active learning strategy was compared against 3 other strategies: random selection of training instances, uncertainty sampling (which chooses the instances about which the model is most uncertain), and an expected gradient length strategy based on convolutional neural networks (CNNs). RESULTS: For hierarchical clustering, we achieved an F1 score of 0.73 compared with 0.76 for scikit-learn. For active learning, after 200 training samples chosen by each strategy, the weighted F1 score across all MeSH codes was a satisfactory 0.62 with our approach, 0.61 with uncertainty sampling, 0.63 with the CNN, and 0.45 with random selection. Moreover, our methodology showed constantly low memory use as the number of documents increased. CONCLUSIONS: We proposed an easy-to-use tool that lets health professionals with limited computer science knowledge combine their domain knowledge with topic exploration and target specific topics of interest while improving transparency. Furthermore, our approach is memory efficient and highly parallelizable, making it well suited to large (Big Data) data sets. Health professionals can use this approach to gain deep insights into the biomedical literature and ultimately improve evidence-based clinical decision-making.
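Uncertainty sampling, one of the baselines compared in this abstract, asks the user to label the documents the current classifier is least sure about. The following is a minimal sketch, assuming a single-label probabilistic classifier that outputs per-class probabilities and using the least-confidence criterion (1 minus the highest class probability). It simplifies the multi-label MeSH setting, the function name `least_confident` and the probabilities are hypothetical, and it is not the authors' implementation.

```python
from typing import List, Sequence


def least_confident(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Pick the k unlabeled instances the model is least sure about.

    probs[i] holds the predicted class probabilities for instance i.
    Uncertainty is scored with the least-confidence criterion,
    1 - max(probs[i]); a higher score means a less confident prediction.
    """
    scored = [(1.0 - max(p), i) for i, p in enumerate(probs)]
    # Most uncertain first; ties keep the lower index first.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]


# Hypothetical classifier outputs for four unlabeled abstracts (two classes).
pool = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
print(least_confident(pool, 2))  # [3, 1]
```

In an active learning loop, the selected abstracts would be shown to the user for labeling, the classifier retrained, and the remaining pool rescored; the study compared strategies after 200 training samples had been chosen this way.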

Subject(s)

Diabetes Mellitus, Medical Subject Headings, Clinical Decision-Making, Diabetes Mellitus/therapy, Humans, Neural Networks, Computer, PubMed

Epitweetr: Early warning of public health threats using Twitter data.

Espinosa, Laura; Wijermans, Ariana; Orchard, Francisco; Höhle, Michael; Czernichow, Thomas; Coletti, Pietro; Hermans, Lisa; Faes, Christel; Kissling, Esther; Mollet, Thomas.

Euro Surveill ; 27(39)2022 09.

Article in English | MEDLINE | ID: mdl-36177867

ABSTRACT

Background: The European Centre for Disease Prevention and Control (ECDC) systematically collates information from various sources to rapidly detect early public health threats. The lack of a freely available, customisable and automated early warning tool using data from Twitter prompted the ECDC to develop epitweetr, which collects, geolocates and aggregates tweets, generating signals and email alerts. Aim: This study aims to compare the performance of epitweetr with manual monitoring of tweets for the early detection of public health threats. Methods: We calculated the general and specific positive predictive value (PPV) of signals generated by epitweetr between 19 October and 30 November 2020. Sensitivity, specificity, timeliness, and the accuracy and performance of the tweet geolocation and signal detection algorithms were compared between epitweetr and manual monitoring of 1,200 tweets. Results: The epitweetr geolocation algorithm had an accuracy of 30.1% at national and 25.9% at subnational level. The signal detection algorithm had a general PPV of 3.0% and a specific PPV of 74.6%. Compared with manual monitoring, epitweetr had greater sensitivity (78.6% vs 47.9%) and lower PPV (74.6% vs 97.9%). The median difference in validation time for the 16 events detected by both epitweetr and manual monitoring was -48.6 hours (IQR: -102.8 to -23.7). Conclusion: Epitweetr has shown sufficient performance as an early warning tool for public health threats using Twitter data. Since epitweetr is a free, open-source tool with configurable settings and a strong automated component, it is expected to increase in usability and usefulness to public health experts.
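The sensitivity and PPV figures in this abstract follow the standard definitions based on true positives (TP), false negatives (FN), and false positives (FP). The sketch below shows those definitions; the function names are hypothetical and the counts are illustrative only, not data from the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of true events that were detected: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no true events")
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Share of generated signals that were true events: TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("no signals")
    return tp / (tp + fp)


# Illustrative counts only, not data from the study.
print(sensitivity(tp=8, fn=2))  # 0.8
print(ppv(tp=8, fp=12))  # 0.4
```

The two measures use different denominators, which is why a tool that raises more signals tends to catch a larger share of true events (higher sensitivity) while a larger fraction of its signals turn out not to be events (lower PPV).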

Subject(s)

Public Health, Social Media, Algorithms, Data Collection, Humans

PANDEM-Source, a tool to collect or generate surveillance indicators for pandemic management: a use case with COVID-19 data.

Orchard, Francisco; Clain, Charline; Madie, William; Hayes, Jessica S; Connolly, Máire A; Sevin, Etienne; Sentís, Alexis.

Front Public Health ; 12: 1295117, 2024.

Article in English | MEDLINE | ID: mdl-38572005

ABSTRACT

Introduction: PANDEM-Source (PS) is a tool to collect and integrate openly available public health-related data from heterogeneous data sources to support the surveillance of infectious diseases for pandemic management. The tool may also be used for pandemic preparedness by generating surveillance data for training purposes. It was developed as part of the EU-funded Horizon 2020 PANDEM-2 project during the COVID-19 pandemic through close collaboration in a consortium of 19 partners, including six European public health agencies, one hospital, and three first responder organizations. This manuscript describes the features and design of PS to disseminate its characteristics and capabilities and strengthen pandemic preparedness and response. Methods: A requirement-gathering process with EU pandemic managers in the consortium was performed to identify and prioritize a list of variables and indicators useful for surveillance and pandemic management. Using the COVID-19 pandemic as a use case, we developed PS to feed all the data needed for display in the PANDEM-2 dashboard. Results: PS routinely monitors, collects, and standardizes data from open or restricted heterogeneous data sources (users can upload their own data). It supports indicators and health resource-related data from traditional data sources reported by national and international agencies, as well as indicators from non-traditional data sources such as social and mass media, participatory surveillance, and seroprevalence studies. The tool can also calculate indicators and produce data for training purposes by generating synthetic data from a minimal set of indicators to simulate pandemic scenarios. PS is currently set up for COVID-19 surveillance at the European level but can be adapted to other diseases, threats, and regions.
Conclusion: Given the lessons learnt during the COVID-19 pandemic, it is important to keep building capacity to monitor potential threats and to develop tools that facilitate training in all the aspects needed to manage future pandemics. PS is open source, and its design provides the flexibility to collect heterogeneous data from open data sources or to upload end users' own data and customize surveillance indicators. PS is easily adaptable to future threats or different training scenarios. Together, these features make PS a unique and valuable tool for pandemic management.

Subject(s)

COVID-19, Humans, COVID-19/epidemiology, Pandemics, Seroepidemiologic Studies, Public Health

Extraction of Explicit and Implicit Cause-Effect Relationships in Patient-Reported Diabetes-Related Tweets From 2017 to 2021: Deep Learning Approach.

Ahne, Adrian; Khetan, Vivek; Tannier, Xavier; Rizvi, Md Imbesat Hassan; Czernichow, Thomas; Orchard, Francisco; Bour, Charline; Fano, Andrew; Fagherazzi, Guy.

JMIR Med Inform ; 10(7): e37201, 2022 Jul 19.

Article in English | MEDLINE | ID: mdl-35852829

ABSTRACT

BACKGROUND: Intervening in and preventing diabetes distress requires an understanding of its causes, in particular from the patient's perspective. Social media data provide direct access to how patients see and understand their disease and can consequently reveal the causes of diabetes distress. OBJECTIVE: Leveraging machine learning methods, we aim to extract both explicit and implicit cause-effect relationships in patient-reported diabetes-related tweets and provide a methodology to better understand the opinions, feelings, and observations shared within the diabetes online community from a causality perspective. METHODS: More than 30 million diabetes-related tweets in English were collected between April 2017 and January 2021. Deep learning and natural language processing methods were applied to focus on tweets with personal and emotional content. A cause-effect tweet data set was manually labeled and used to train (1) a fine-tuned BERTweet model to detect causal sentences containing a causal relation and (2) a conditional random field model with Bidirectional Encoder Representations from Transformers (BERT)-based features to extract possible cause-effect associations. Causes and effects were clustered in a semisupervised approach and visualized in an interactive cause-effect network. RESULTS: Causal sentences were detected with a recall of 68% in an imbalanced data set. A conditional random field model with BERT-based features outperformed a fine-tuned BERT model for cause-effect detection, with a macro recall of 68%. This yielded 96,676 sentences with cause-effect relationships. "Diabetes" was identified as the central cluster, followed by "death" and "insulin." Insulin pricing-related causes were frequently associated with death.
CONCLUSIONS: A novel methodology was developed to detect causal sentences and identify both explicit and implicit, single- and multiword causes and their corresponding effects, as expressed in diabetes-related tweets, leveraging BERT-based architectures and visualized as a cause-effect network. Causal associations extracted from real-life, patient-reported outcomes in social media data provide a useful complementary source of information in diabetes research.

Insulin pricing and other major diabetes-related concerns in the USA: a study of 46 407 tweets between 2017 and 2019.

Ahne, Adrian; Orchard, Francisco; Tannier, Xavier; Perchoux, Camille; Balkau, Beverley; Pagoto, Sherry; Harding, Jessica Lee; Czernichow, Thomas; Fagherazzi, Guy.

BMJ Open Diabetes Res Care ; 8(1)2020 06.

Article in English | MEDLINE | ID: mdl-32503810

ABSTRACT

INTRODUCTION: Little research has been done to systematically evaluate the concerns of people living with diabetes through social media, which has been a powerful tool for social change and for better understanding perceptions around health-related issues. This study aims to identify key diabetes-related concerns in the USA and the primary emotions associated with those concerns using information shared on Twitter. RESEARCH DESIGN AND METHODS: A total of 11.7 million diabetes-related tweets in English were collected between April 2017 and July 2019. Machine learning methods were used to filter tweets with personal content, to geolocate them (to the USA), and to identify clusters of tweets with emotional elements. A sentiment analysis was then applied to each cluster. RESULTS: We identified 46 407 tweets with emotional elements in the USA, from which 30 clusters were identified. Five clusters (18% of tweets) were related to insulin pricing, with both positive emotions (joy, love) referring to advocacy for affordable insulin and sadness related to frustration over insulin prices; 5 clusters (12% of tweets) were related to solidarity and support, with mostly joy and love expressed. The most negative topics (10% of tweets) were related to diabetes distress (24% sadness, 27% anger, 21% fear elements), diabetic and insulin shock (45% anger, 46% fear), and comorbidities (40% sadness). CONCLUSIONS: Using social media data, we were able to describe key diabetes-related concerns and their associated emotions. More specifically, we were able to highlight the real-world concerns of insulin pricing and its negative impact on mood. Such data can be a useful addition to current measures that inform public decision-making around topics of concern and burden among people with diabetes.

Subject(s)

Diabetes Mellitus, Social Media, Costs and Cost Analysis, Diabetes Mellitus/drug therapy, Diabetes Mellitus/epidemiology, Emotions, Humans, Insulin, United States/epidemiology