Results 1 - 20 of 48
1.
J Biomed Inform ; 142: 104343, 2023 06.
Article in English | MEDLINE | ID: mdl-36935011

ABSTRACT

Clinical documentation in electronic health records contains crucial narratives and details about patients and their care. Natural language processing (NLP) can unlock the information conveyed in clinical notes and reports, and thus plays a critical role in real-world studies. The NLP Working Group at the Observational Health Data Sciences and Informatics (OHDSI) consortium was established to develop methods and tools to promote the use of textual data and NLP in real-world observational studies. In this paper, we describe a framework for representing and utilizing textual data in real-world evidence generation, including representations of information from clinical text in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), the workflow and tools that were developed to extract, transform and load (ETL) data from clinical notes into tables in OMOP CDM, as well as current applications and specific use cases of the proposed OHDSI NLP solution at large consortia and individual institutions with English textual data. Challenges faced and lessons learned during the process are also discussed to provide valuable insights for researchers who are planning to implement NLP solutions in real-world studies.


Subjects
Data Science , Medical Informatics , Humans , Electronic Health Records , Natural Language Processing , Narration
2.
Rheumatology (Oxford) ; 60(7): 3222-3234, 2021 07 01.
Article in English | MEDLINE | ID: mdl-33367863

ABSTRACT

OBJECTIVES: Concern has been raised in the rheumatology community regarding recent regulatory warnings that HCQ used in the coronavirus disease 2019 pandemic could cause acute psychiatric events. We aimed to study whether there is risk of incident depression, suicidal ideation or psychosis associated with HCQ as used for RA. METHODS: We performed a new-user cohort study using claims and electronic medical records from 10 sources and 3 countries (Germany, UK and USA). RA patients ≥18 years of age and initiating HCQ were compared with those initiating SSZ (active comparator) and followed up in the short (30 days) and long term (on treatment). Study outcomes included depression, suicide/suicidal ideation and hospitalization for psychosis. Propensity score stratification and calibration using negative control outcomes were used to address confounding. Cox models were fitted to estimate database-specific calibrated hazard ratios (HRs), with estimates pooled where I² < 40%. RESULTS: A total of 918 144 and 290 383 users of HCQ and SSZ, respectively, were included. No consistent risk of psychiatric events was observed with short-term HCQ (compared with SSZ) use, with meta-analytic HRs of 0.96 (95% CI 0.79, 1.16) for depression, 0.94 (95% CI 0.49, 1.77) for suicide/suicidal ideation and 1.03 (95% CI 0.66, 1.60) for psychosis. No consistent long-term risk was seen, with meta-analytic HRs of 0.94 (95% CI 0.71, 1.26) for depression, 0.77 (95% CI 0.56, 1.07) for suicide/suicidal ideation and 0.99 (95% CI 0.72, 1.35) for psychosis. CONCLUSION: HCQ as used to treat RA does not appear to increase the risk of depression, suicide/suicidal ideation or psychosis compared with SSZ. No effects were seen in the short or long term. Use at a higher dose or for different indications needs further investigation. TRIAL REGISTRATION: Registered with EU PAS (reference no. EUPAS34497; http://www.encepp.eu/encepp/viewResource.htm?id=34498).
The full study protocol and analysis source code can be found at https://github.com/ohdsi-studies/Covid19EstimationHydroxychloroquine2.


Subjects
Antirheumatic Agents/adverse effects , COVID-19 Drug Treatment , Depression/chemically induced , Depression/epidemiology , Hydroxychloroquine/adverse effects , Psychoses, Substance-Induced/epidemiology , Psychoses, Substance-Induced/etiology , Suicidal Ideation , Suicide/statistics & numerical data , Adolescent , Adult , Aged , Antirheumatic Agents/therapeutic use , Arthritis, Rheumatoid/drug therapy , Cohort Studies , Female , Germany , Humans , Hydroxychloroquine/therapeutic use , Male , Middle Aged , Risk Assessment , United Kingdom , United States , Young Adult
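
Editorial sketch: the abstract above pools database-specific hazard ratios only where the I² heterogeneity statistic is below 40%. The Python below is a minimal illustration of that idea, not the study's code. It assumes a simple fixed-effect inverse-variance pooling (the study's own pooling model is not stated in the abstract), and the three hazard ratios are hypothetical values invented for the example.

```python
import math

def log_hr_and_se(hr, ci_low, ci_high):
    """Recover log(HR) and its standard error from a 95% confidence interval."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    return math.log(hr), se

def pool_fixed_effect(estimates):
    """Inverse-variance pooling of (log_hr, se) pairs.

    Returns the pooled HR, its 95% CI, and the I^2 statistic in percent.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # Cochran's Q, then I^2 = max(0, (Q - df) / Q)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    ci = (math.exp(pooled - 1.96 * pooled_se), math.exp(pooled + 1.96 * pooled_se))
    return math.exp(pooled), ci, i2

# Hypothetical database-specific estimates: (HR, lower 95% CI, upper 95% CI)
databases = [(0.95, 0.70, 1.29), (1.02, 0.80, 1.30), (0.90, 0.60, 1.35)]
estimates = [log_hr_and_se(*row) for row in databases]
hr, ci, i2 = pool_fixed_effect(estimates)

# Report a pooled estimate only when heterogeneity is below the 40% threshold
if i2 < 40:
    print(f"pooled HR = {hr:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), I2 = {i2:.0f}%")
else:
    print(f"I2 = {i2:.0f}%: too heterogeneous to pool")
```

Working on the log scale keeps the confidence intervals symmetric, which is why the standard error is recovered from the log of the interval bounds.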
3.
BMC Infect Dis ; 21(1): 432, 2021 May 07.
Article in English | MEDLINE | ID: mdl-33962563

ABSTRACT

BACKGROUND: Low testing rates and delays in reporting hinder the estimation of the mortality burden associated with the COVID-19 pandemic. During a public health emergency, estimating all-cause excess deaths above an expected level of death can provide a more reliable picture of the mortality burden. Here, we aim to estimate the absolute and relative mortality impact of the COVID-19 pandemic in Mexico. METHODS: We obtained weekly mortality time series due to all causes for Mexico, and by gender, and geographic region from 2015 to 2020. We also compiled surveillance data on COVID-19 cases and deaths to assess the timing and intensity of the pandemic and assembled weekly series of the proportion of tweets about 'death' from Mexico to assess the correlation between people's media interaction about 'death' and the rise in pandemic deaths. We estimated all-cause excess mortality rates and mortality rate ratio increase over baseline by fitting Serfling regression models and forecasted the total excess deaths for Mexico for the first 4 weeks of 2021 using the generalized logistic growth model. RESULTS: We estimated the all-cause excess mortality rate associated with the COVID-19 pandemic in Mexico in 2020 at 26.10 per 10,000 population, which corresponds to 333,538 excess deaths. Males had about 2-fold higher excess mortality rate (33.99) compared to females (18.53). Mexico City reported the highest excess death rate (63.54) and RR (2.09) compared to the rest of the country (excess rate = 23.25, RR = 1.62). While COVID-19 deaths accounted for only 38.64% of total excess deaths in Mexico, our forecast estimates that Mexico has accumulated a total of ~ 61,610 [95% PI: 60,003, 63,216] excess deaths in the first 4 weeks of 2021. Proportion of tweets was significantly correlated with the excess mortality (ρ = 0.508 [95% CI: 0.245, 0.701], p-value = 0.0004). CONCLUSION: The COVID-19 pandemic has heavily affected Mexico.
The lab-confirmed COVID-19 deaths accounted for only 38.64% of total all cause excess deaths (333,538) in Mexico in 2020. This reflects either the effect of low testing rates in Mexico, or the surge in number of deaths due to other causes during the pandemic. A model-based forecast indicates that an average of 61,610 excess deaths have occurred in January 2021.


Subjects
COVID-19/mortality , COVID-19/epidemiology , Cities/epidemiology , Female , Humans , Male , Mexico/epidemiology , Social Media
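
Editorial sketch: the abstract above estimates excess mortality by fitting Serfling regression models to pre-pandemic weekly deaths. The Python below illustrates the general technique under stated assumptions: a baseline with an intercept, a linear trend and one annual harmonic, fitted by ordinary least squares. The exact model terms used in the study are not given in the abstract, and the weekly counts here are synthetic.

```python
import math

def design_row(week):
    """Serfling terms: intercept, linear trend, one annual harmonic (52-week period)."""
    angle = 2 * math.pi * week / 52.0
    return [1.0, float(week), math.sin(angle), math.cos(angle)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_baseline(weeks, deaths):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    rows = [design_row(w) for w in weeks]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, deaths)) for i in range(k)]
    return solve(xtx, xty)

def expected_deaths(beta, week):
    """Baseline prediction for a given week."""
    return sum(b * x for b, x in zip(beta, design_row(week)))

def synthetic_baseline(week):
    """Synthetic seasonal mortality used only for this illustration."""
    return 1000 + 0.5 * week + 100 * math.cos(2 * math.pi * week / 52.0)

# Five synthetic baseline years, then one year with 300 extra deaths per week
baseline_weeks = list(range(260))
baseline_deaths = [synthetic_baseline(w) for w in baseline_weeks]
pandemic_weeks = list(range(260, 312))
observed = [synthetic_baseline(w) + 300 for w in pandemic_weeks]

beta = fit_baseline(baseline_weeks, baseline_deaths)
# Excess deaths = observed minus the baseline extrapolated into the pandemic year
excess = sum(obs - expected_deaths(beta, w) for w, obs in zip(pandemic_weeks, observed))
print(round(excess))
```

Because the synthetic series is noise-free and follows the fitted form exactly, the estimated excess recovers the injected surge; with real data the baseline fit would also yield prediction intervals, which this sketch omits.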
4.
J Biomed Inform ; 120: 103844, 2021 08.
Article in English | MEDLINE | ID: mdl-34153432

ABSTRACT

The rapid evolution of the COVID-19 pandemic has underscored the need to quickly disseminate the latest clinical knowledge during a public-health emergency. One surprisingly effective platform for healthcare professionals (HCPs) to share knowledge and experiences from the front lines has been social media (for example, the "#medtwitter" community on Twitter). However, identifying clinically-relevant content in social media without manual labeling is a challenge because of the sheer volume of irrelevant data. We present an unsupervised, iterative approach to mine clinically relevant information from social media data, which begins by heuristically filtering for HCP-authored texts and incorporates topic modeling and concept extraction with MetaMap. This approach identifies granular topics and tweets with high clinical relevance from a set of about 52 million COVID-19-related tweets from January to mid-June 2020. We also show that because the technique does not require manual labeling, it can be used to identify emerging topics on a week-to-week basis. Our method can aid in future public-health emergencies by facilitating knowledge transfer among healthcare workers in a rapidly-changing information environment, and by providing an efficient and unsupervised way of highlighting potential areas for clinical research.


Subjects
COVID-19 , Social Media , Humans , Information Storage and Retrieval , Pandemics , SARS-CoV-2
5.
J Med Internet Res ; 23(8): e28716, 2021 08 25.
Article in English | MEDLINE | ID: mdl-34227996

ABSTRACT

BACKGROUND: News media coverage of antimask protests, COVID-19 conspiracies, and pandemic politicization has overemphasized extreme views but has done little to represent views of the general public. Investigating the public's response to various pandemic restrictions can provide a more balanced assessment of current views, allowing policy makers to craft better public health messages in anticipation of poor reactions to controversial restrictions. OBJECTIVE: Using data from social media, this infoveillance study aims to understand the changes in public opinion associated with the implementation of COVID-19 restrictions (eg, business and school closures, regional lockdown differences, and additional public health restrictions, such as social distancing and masking). METHODS: COVID-19-related tweets in Ontario (n=1,150,362) were collected based on keywords between March 12 and October 31, 2020. Sentiment scores were calculated using the VADER (Valence Aware Dictionary and Sentiment Reasoner) algorithm for each tweet to represent its negative to positive emotion. Public health restrictions were identified using government and news media websites. Dynamic regression models with autoregressive integrated moving average errors were used to examine the association between public health restrictions and changes in public opinion over time (ie, collective attention, aggregate positive sentiment, and level of disagreement), controlling for the effects of confounders (ie, daily COVID-19 case counts, holidays, and COVID-19-related official updates). RESULTS: In addition to expected direct effects (eg, business closures led to decreased positive sentiment and increased disagreements), the impact of restrictions on public opinion was contextually driven. For example, the negative sentiment associated with business closures was reduced with higher COVID-19 case counts. 
While school closures and other restrictions (eg, masking, social distancing, and travel restrictions) generated increased collective attention, they did not have an effect on aggregate sentiment or the level of disagreement (ie, sentiment polarization). Partial (ie, region-targeted) lockdowns were associated with better public response (ie, higher number of tweets with net positive sentiment and lower levels of disagreement) compared to province-wide lockdowns. CONCLUSIONS: Our study demonstrates the feasibility of a rapid and flexible method of evaluating the public response to pandemic restrictions using near real-time social media data. This information can help public health practitioners and policy makers anticipate public response to future pandemic restrictions and ensure adequate resources are dedicated to addressing increases in negative sentiment and levels of disagreement in the face of scientifically informed, but controversial, restrictions.


Subjects
COVID-19 , Social Media , Communicable Disease Control , Humans , Ontario , SARS-CoV-2
6.
Proc Natl Acad Sci U S A ; 113(27): 7329-36, 2016 07 05.
Article in English | MEDLINE | ID: mdl-27274072

ABSTRACT

Observational research promises to complement experimental research by providing large, diverse populations that would be infeasible for an experiment. Observational research can test its own clinical hypotheses, and observational studies also can contribute to the design of experiments and inform the generalizability of experimental research. Understanding the diversity of populations and the variance in care is one component. In this study, the Observational Health Data Sciences and Informatics (OHDSI) collaboration created an international data network with 11 data sources from four countries, including electronic health records and administrative claims data on 250 million patients. All data were mapped to common data standards, patient privacy was maintained by using a distributed model, and results were aggregated centrally. Treatment pathways were elucidated for type 2 diabetes mellitus, hypertension, and depression. The pathways revealed that the world is moving toward more consistent therapy over time across diseases and across locations, but significant heterogeneity remains among sources, pointing to challenges in generalizing clinical trial results. Diabetes favored a single first-line medication, metformin, to a much greater extent than hypertension or depression. About 10% of diabetes and depression patients and almost 25% of hypertension patients followed a treatment pathway that was unique within the cohort. Aside from factors such as sample size and underlying population (academic medical center versus general population), electronic health records data and administrative claims data revealed similar results. Large-scale international observational research is feasible.


Subjects
Practice Patterns, Physicians'/statistics & numerical data , Antidepressive Agents/therapeutic use , Antihypertensive Agents/therapeutic use , Databases, Factual , Depression/drug therapy , Diabetes Mellitus, Type 2/drug therapy , Electronic Health Records , Humans , Hypertension/drug therapy , Hypoglycemic Agents/therapeutic use , Internationality , Medical Informatics
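
Editorial sketch: the abstract above reports the share of patients whose treatment pathway was unique within the cohort. The Python below shows one plausible way to derive such a figure from drug exposure records. It assumes a pathway is the ordered sequence of distinct drugs by first exposure date; the study's actual pathway definition is not given in the abstract, and the patients and records are invented.

```python
from collections import Counter

def treatment_pathways(exposures):
    """Map each patient to the ordered tuple of distinct drugs, by first exposure date."""
    first_seen = {}
    for patient, drug, start in exposures:
        key = (patient, drug)
        # Keep only the earliest start date per patient and drug
        if key not in first_seen or start < first_seen[key]:
            first_seen[key] = start
    by_patient = {}
    for (patient, drug), start in first_seen.items():
        by_patient.setdefault(patient, []).append((start, drug))
    return {p: tuple(d for _, d in sorted(items)) for p, items in by_patient.items()}

def share_with_unique_pathway(pathways):
    """Fraction of patients whose pathway is followed by nobody else in the cohort."""
    counts = Counter(pathways.values())
    unique = sum(1 for path in pathways.values() if counts[path] == 1)
    return unique / len(pathways)

# Hypothetical exposure records: (patient id, drug, ISO start date)
exposures = [
    ("p1", "metformin", "2012-01-05"),
    ("p1", "glipizide", "2013-03-01"),
    ("p2", "metformin", "2011-07-19"),
    ("p3", "metformin", "2014-02-11"),
    ("p3", "metformin", "2014-08-11"),  # repeat exposure, does not extend the pathway
    ("p4", "metformin", "2010-05-02"),
    ("p4", "insulin", "2012-09-30"),
]

pathways = treatment_pathways(exposures)
unique_share = share_with_unique_pathway(pathways)
print(pathways["p1"], unique_share)
```

ISO-formatted dates sort correctly as plain strings, which keeps the example free of date parsing.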
7.
Epilepsia ; 58(8): e101-e106, 2017 08.
Article in English | MEDLINE | ID: mdl-28681416

ABSTRACT

Recent adverse event reports have raised the question of increased angioedema risk associated with exposure to levetiracetam. To help address this question, the Observational Health Data Sciences and Informatics research network conducted a retrospective observational new-user cohort study of seizure patients exposed to levetiracetam (n = 276,665) across 10 databases. With phenytoin users (n = 74,682) as a comparator group, propensity score-matching was conducted and hazard ratios computed for angioedema events by per-protocol and intent-to-treat analyses. Angioedema events were rare in both the levetiracetam and phenytoin groups (54 vs. 71 in per-protocol and 248 vs. 435 in intent-to-treat). No significant increase in angioedema risk with levetiracetam was seen in any individual database (hazard ratios ranging from 0.43 to 1.31). Meta-analysis showed a summary hazard ratio of 0.72 (95% confidence interval [CI] 0.39-1.31) and 0.64 (95% CI 0.52-0.79) for the per-protocol and intent-to-treat analyses, respectively. The results suggest that levetiracetam has the same or lower risk for angioedema than phenytoin, which does not currently carry a labeled warning for angioedema. Further studies are warranted to evaluate angioedema risk across all antiepileptic drugs.


Subjects
Angioedema/chemically induced , Angioedema/epidemiology , Epilepsy/drug therapy , Phenytoin/adverse effects , Piracetam/analogs & derivatives , Community Networks/statistics & numerical data , Databases, Factual/statistics & numerical data , Female , Humans , Levetiracetam , Male , Piracetam/adverse effects
9.
J Am Med Inform Assoc ; 31(4): 991-996, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38218723

ABSTRACT

OBJECTIVE: The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. In this paper, we present the annotated corpora, a technical summary of participants' systems, and the performance results. METHODS: The eighth iteration of the #SMM4H shared tasks was hosted at the AMIA 2023 Annual Symposium and consisted of 5 tasks that represented various social media platforms (Twitter and Reddit), languages (English and Spanish), methods (binary classification, multi-class classification, extraction, and normalization), and topics (COVID-19, therapies, social anxiety disorder, and adverse drug events). RESULTS: In total, 29 teams registered, representing 17 countries. In general, the top-performing systems used deep neural network architectures based on pre-trained transformer models. In particular, the top-performing systems for the classification tasks were based on single models that were pre-trained on social media corpora. CONCLUSION: To facilitate future work, the datasets (a total of 61 353 posts) will remain available by request, and the CodaLab sites will remain active for a post-evaluation phase.


Subjects
Social Media , Humans , Data Mining/methods , Machine Learning , Natural Language Processing , Neural Networks, Computer
10.
JAMIA Open ; 6(2): ooad043, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37397506

ABSTRACT

Objective: Biases within probabilistic electronic phenotyping algorithms are largely unexplored. In this work, we characterize differences in subgroup performance of phenotyping algorithms for Alzheimer's disease and related dementias (ADRD) in older adults. Materials and methods: We created an experimental framework to characterize the performance of probabilistic phenotyping algorithms under different racial distributions allowing us to identify which algorithms may have differential performance, by how much, and under what conditions. We relied on rule-based phenotype definitions as reference to evaluate probabilistic phenotype algorithms created using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation framework. Results: We demonstrate that some algorithms have performance variations anywhere from 3% to 30% for different populations, even when not using race as an input variable. We show that while performance differences in subgroups are not present for all phenotypes, they do affect some phenotypes and groups more disproportionately than others. Discussion: Our analysis establishes the need for a robust evaluation framework for subgroup differences. The underlying patient populations for the algorithms showing subgroup performance differences have great variance between model features when compared with the phenotypes with little to no differences. Conclusion: We have created a framework to identify systematic differences in the performance of probabilistic phenotyping algorithms specifically in the context of ADRD as a use case. Differences in subgroup performance of probabilistic phenotyping algorithms are not widespread nor do they occur consistently. This highlights the great need for careful ongoing monitoring to evaluate, measure, and try to mitigate such differences.

11.
AMIA Annu Symp Proc ; 2023: 834-843, 2023.
Article in English | MEDLINE | ID: mdl-38222429

ABSTRACT

The types of clinical notes in electronic health records (EHRs) are diverse, and standardizing them would help ensure unified data retrieval, exchange, and integration. The LOINC Document Ontology (DO) is a subset of LOINC that is created specifically for naming and describing clinical documents. Despite efforts to promote and improve this ontology, how to deploy it efficiently in real-world clinical settings has yet to be explored. In this study we evaluated the utility of LOINC DO by mapping clinical note titles collected from five institutions to the LOINC DO and classifying the mapping into three classes based on semantic similarity between note titles and LOINC DO codes. Additionally, we developed a standardization pipeline that automatically maps clinical note titles from multiple sites to suitable LOINC DO codes, without accessing the content of clinical notes. The pipeline can be initialized with different large language models, and we compared their performance. The results showed that our automated pipeline achieved an accuracy of 0.90. By comparing the manual and automated mapping results, we analyzed the coverage of LOINC DO in describing multi-site clinical note titles and summarized the potential scope for extension.


Subjects
Electronic Health Records , Logical Observation Identifiers Names and Codes , Humans , Information Storage and Retrieval , Semantics
12.
medRxiv ; 2023 Nov 08.
Article in English | MEDLINE | ID: mdl-37986776

ABSTRACT

The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. The eighth iteration of the #SMM4H shared tasks was hosted at the AMIA 2023 Annual Symposium and consisted of five tasks that represented various social media platforms (Twitter and Reddit), languages (English and Spanish), methods (binary classification, multi-class classification, extraction, and normalization), and topics (COVID-19, therapies, social anxiety disorder, and adverse drug events). In total, 29 teams registered, representing 18 countries. In this paper, we present the annotated corpora, a technical summary of the systems, and the performance results. In general, the top-performing systems used deep neural network architectures based on pre-trained transformer models. In particular, the top-performing systems for the classification tasks were based on single models that were pre-trained on social media corpora. To facilitate future work, the datasets (a total of 61,353 posts) will remain available by request, and the CodaLab sites will remain active for a post-evaluation phase.

13.
Database (Oxford) ; 2023, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36734300

ABSTRACT

This study presents the outcomes of the shared task competition BioCreative VII (Task 3) focusing on the extraction of medication names from a Twitter user's publicly available tweets (the user's 'timeline'). In general, detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health. Thus, finding those tweets in a user's timeline that mention specific health-related concepts such as medications requires addressing extreme imbalance. Task 3 called for detecting tweets in a user's timeline that mention a medication name and, for each detected mention, extracting its span. The organizers made available a corpus consisting of 182 049 tweets publicly posted by 212 Twitter users with all medication mentions manually annotated. The corpus exhibits the natural distribution of positive tweets, with only 442 tweets (0.2%) mentioning a medication. This task was an opportunity for participants to evaluate methods that are robust to class imbalance beyond the simple lexical match. A total of 65 teams registered, and 16 teams submitted a system run. This study summarizes the corpus created by the organizers and the approaches taken by the participating teams for this challenge. The corpus is freely available at https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/. The methods and the results of the competing systems are analyzed with a focus on the approaches taken for learning from class-imbalanced data.


Subjects
Data Mining , Natural Language Processing , Humans , Data Mining/methods
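
Editorial sketch: the abstract above stresses extreme class imbalance, with 442 positive tweets out of 182 049. The Python below shows why accuracy is uninformative at that ratio and why precision, recall and F1 are the usual alternatives. The corpus counts come from the abstract; the "hypothetical system" figures are invented, and scoring is at tweet level for simplicity, whereas the task itself also required extracting mention spans.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

TOTAL_TWEETS = 182_049   # corpus size reported in the abstract
POSITIVE_TWEETS = 442    # tweets mentioning a medication, reported in the abstract

# A trivial classifier that labels every tweet "no medication mention"
accuracy_all_negative = (TOTAL_TWEETS - POSITIVE_TWEETS) / TOTAL_TWEETS
_, _, f1_all_negative = precision_recall_f1(tp=0, fp=0, fn=POSITIVE_TWEETS)

# A hypothetical system: finds 300 of the 442 positives, with 100 false alarms
precision, recall, f1 = precision_recall_f1(tp=300, fp=100, fn=142)

print(f"all-negative baseline: accuracy = {accuracy_all_negative:.4f}, F1 = {f1_all_negative:.2f}")
print(f"hypothetical system: precision = {precision:.2f}, recall = {recall:.2f}, F1 = {f1:.2f}")
```

The all-negative baseline scores above 99.7% accuracy while finding nothing, which is the reason imbalanced tasks of this kind are ranked on F1 for the positive class.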
14.
J Am Med Inform Assoc ; 30(5): 859-868, 2023 04 19.
Article in English | MEDLINE | ID: mdl-36826399

ABSTRACT

OBJECTIVE: Observational studies can impact patient care but must be robust and reproducible. Nonreproducibility is primarily caused by unclear reporting of design choices and analytic procedures. This study aimed to: (1) assess how the study logic described in an observational study could be interpreted by independent researchers and (2) quantify the impact of interpretations' variability on patient characteristics. MATERIALS AND METHODS: Nine teams of highly qualified researchers reproduced a cohort from a study by Albogami et al. The teams were provided the clinical codes and access to the tools to create cohort definitions such that the only variable part was their logic choices. We executed teams' cohort definitions against the database and compared the number of subjects, patient overlap, and patient characteristics. RESULTS: On average, the teams' interpretations fully aligned with the master implementation in 4 out of 10 inclusion criteria with at least 4 deviations per team. Cohorts' size varied from one-third of the master cohort size to 10 times the cohort size (2159-63 619 subjects compared to 6196 subjects). Median agreement was 9.4% (interquartile range 15.3-16.2%). The teams' cohorts significantly differed from the master implementation by at least 2 baseline characteristics, and most of the teams differed by at least 5. CONCLUSIONS: Independent research teams attempting to reproduce the study based on its free-text description alone produce different implementations that vary in the population size and composition. Sharing analytical code supported by a common data model and open-source tools allows reproducing a study unambiguously thereby preserving initial design choices.


Subjects
Research Personnel , Humans , Databases, Factual
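
Editorial sketch: the abstract above compares each team's reproduced cohort with the master cohort by size and patient overlap. The Python below illustrates one common overlap measure, the Jaccard index. This is an assumption: the abstract does not say how "agreement" was computed, and the patient identifiers are invented.

```python
def jaccard_agreement(master, reproduced):
    """Jaccard index: patients in both cohorts divided by patients in either cohort."""
    master, reproduced = set(master), set(reproduced)
    union = master | reproduced
    return len(master & reproduced) / len(union) if union else 1.0

def size_ratio(master, reproduced):
    """Reproduced cohort size relative to the master cohort size."""
    return len(set(reproduced)) / len(set(master))

# Hypothetical patient identifiers
master_cohort = {1, 2, 3, 4, 5, 6}
team_cohort = {4, 5, 6, 7, 8, 9}

agreement = jaccard_agreement(master_cohort, team_cohort)
ratio = size_ratio(master_cohort, team_cohort)
print(f"agreement = {agreement:.3f}, size ratio = {ratio:.2f}")
```

Two cohorts of identical size can still agree poorly, as here, which is why overlap is reported alongside the subject counts.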
15.
NPJ Digit Med ; 6(1): 89, 2023 May 19.
Article in English | MEDLINE | ID: mdl-37208468

ABSTRACT

Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Using OMOP2OBO, we produced mappings for 92,367 conditions, 8611 drug ingredients, and 10,673 measurement results, which covered 68-99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. By aligning OMOP vocabularies to OBO ontologies our algorithm presents new opportunities to advance EHR-based deep phenotyping.

16.
PLoS Negl Trop Dis ; 16(3): e0010228, 2022 03.
Article in English | MEDLINE | ID: mdl-35245285

ABSTRACT

Colombia announced the first case of severe acute respiratory syndrome coronavirus 2 on March 6, 2020. Since then, the country has reported a total of 5,002,387 cases and 127,258 deaths as of October 31, 2021. The aggressive transmission dynamics of SARS-CoV-2 motivate an investigation of COVID-19 at the national and regional levels in Colombia. We utilize the case incidence and mortality data to estimate the transmission potential and generate short-term forecasts of the COVID-19 pandemic to inform public health policies using previously validated mathematical models. The analysis is augmented by the examination of geographic heterogeneity of COVID-19 at the departmental level along with the investigation of mobility and social media trends. Overall, the national and regional reproduction numbers show sustained disease transmission during the early phase of the pandemic, exhibiting sub-exponential growth dynamics. The most recent estimates of the reproduction number, by contrast, indicate disease containment, with Rt<1.0 as of October 31, 2021. On the forecasting front, the sub-epidemic model performs best at capturing the 30-day ahead COVID-19 trajectory compared to the Richards and generalized logistic growth models. The spatial variability in the incidence rate patterns across different departments can be grouped into four distinct clusters. As the case incidence surged in July 2020, an increase in mobility patterns was also observed. By contrast, a spike in the number of tweets referring to the stay-at-home orders was observed in November 2020, when the case incidence had already plateaued, indicating pandemic fatigue in the country.


Subjects
COVID-19 , Pandemics , COVID-19/epidemiology , Colombia/epidemiology , Forecasting , Humans , SARS-CoV-2
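
Editorial sketch: the abstract above refers to sub-exponential growth and to the generalized logistic growth model as one of the forecasting models compared. The Python below integrates the commonly used form dC/dt = r C^p (1 - C/K), where C is cumulative cases, r the growth rate, p the deceleration-of-growth parameter (p < 1 gives sub-exponential early growth) and K the final epidemic size. The parameter values are hypothetical, and the forward Euler scheme is chosen for brevity rather than taken from the paper.

```python
def generalized_logistic_growth(c0, r, p, k, days, dt=0.01):
    """Integrate dC/dt = r * C**p * (1 - C/K) with forward Euler.

    Returns the cumulative case count at day 0, 1, ..., days.
    """
    steps_per_day = round(1 / dt)
    c = float(c0)
    trajectory = [c]
    for _ in range(days):
        for _ in range(steps_per_day):
            c += dt * r * c ** p * (1 - c / k)
        trajectory.append(c)
    return trajectory

# Hypothetical parameters: p = 0.8 gives sub-exponential early growth
cumulative = generalized_logistic_growth(c0=10, r=0.8, p=0.8, k=50_000, days=120)

# Daily incidence is the day-over-day difference of the cumulative curve
daily_incidence = [b - a for a, b in zip(cumulative, cumulative[1:])]
peak_day = daily_incidence.index(max(daily_incidence)) + 1
print(f"final size = {cumulative[-1]:.0f}, incidence peaks around day {peak_day}")
```

Setting p = 1 recovers the classical logistic model, so the same function can be used to compare exponential and sub-exponential early growth.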
17.
Clin Epidemiol ; 14: 369-384, 2022.
Article in English | MEDLINE | ID: mdl-35345821

ABSTRACT

Purpose: Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) Characterizing Health Associated Risks and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisation and analysis of COVID-19 RWD. Patients and Methods: We conducted a descriptive retrospective database study using a federated network of data partners in the United States, Europe (the Netherlands, Spain, the UK, Germany, France and Italy) and Asia (South Korea and China). The study protocol and analytical package were released on 11th June 2020 and are iteratively updated via GitHub. We identified three non-mutually exclusive cohorts of 4,537,153 individuals with a clinical COVID-19 diagnosis or positive test, 886,193 hospitalized with COVID-19, and 113,627 hospitalized with COVID-19 requiring intensive services. Results: We aggregated over 22,000 unique characteristics describing patients with COVID-19. All comorbidities, symptoms, medications, and outcomes are described by cohort in aggregate counts and are readily available online. Globally, we observed similarities in the USA and Europe: more women diagnosed than men but more men hospitalized than women, most diagnosed cases between 25 and 60 years of age versus most hospitalized cases between 60 and 80 years of age. South Korea differed with more women than men hospitalized. Common comorbidities included type 2 diabetes, hypertension, chronic kidney disease and heart disease. Common presenting symptoms were dyspnea, cough and fever. Symptom data availability was more common in hospitalized cohorts than diagnosed. Conclusion: We constructed a global, multi-centre view to describe trends in COVID-19 progression, management and evolution over time. 
By characterising baseline variability in patients and geography, our work provides critical context that may otherwise be misconstrued as data quality issues. This is important as we perform studies on adverse events of special interest in COVID-19 vaccine surveillance.

18.
Neural Comput Appl ; : 1-9, 2021 Oct 29.
Article in English | MEDLINE | ID: mdl-34728902

ABSTRACT

Twitter has been a remarkable resource for research in pharmacovigilance in the last decade. Traditionally, rule- or lexicon-based methods have been utilized for automatically extracting drug tweets for human annotation. The process of human annotation to create labeled sets for machine learning models is laborious, time-consuming and not scalable. In this work, we demonstrate the feasibility of applying weak supervision (noisy labeling) to select drug data, and build machine learning models using large amounts of noisy labeled data instead of limited gold standard labeled sets. Our results demonstrate that models built with large amounts of noisy data achieve performance similar to that of models trained on limited gold standard datasets, hence demonstrating that weak supervision helps reduce the need to rely on manual annotation, allowing more data to be easily labeled and useful for downstream machine learning applications, in this case drug mention identification.

19.
Genomics Inform ; 19(3): e21, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34638168

ABSTRACT

The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the coronavirus disease 2019 (COVID-19) pandemic, researchers have turned to more non-traditional sources of clinical data to characterize the disease in near-real time, study the societal implications of interventions, as well as the sequelae that recovered COVID-19 cases present. However, manually curated social media datasets are difficult to come by due to the expensive costs of manual annotation and the efforts needed to identify the correct texts. When datasets are available, they are usually very small and their annotations don't generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best-practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several SpaCy-based annotation frameworks against a manually annotated gold-standard dataset. Selecting the best method to use for automatic annotation, we then annotated 120 million tweets and released them publicly for future downstream usage within the biomedical domain.

20.
ArXiv ; 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-34341767

ABSTRACT

The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the COVID-19 pandemic, researchers have turned to more nontraditional sources of clinical data to characterize the disease in near real-time, study the societal implications of interventions, as well as the sequelae that recovered COVID-19 cases present (Long-COVID). However, manually curated social media datasets are difficult to come by due to the expensive costs of manual annotation and the efforts needed to identify the correct texts. When datasets are available, they are usually very small and their annotations do not generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several SpaCy-based annotation frameworks against a manually annotated gold-standard dataset. Selecting the best method to use for automatic annotation, we then annotated 120 million tweets and released them publicly for future downstream usage within the biomedical domain.
