Results 1 - 20 of 29,793
1.
Cell; 184(8): 2068-2083.e11, 2021 Apr 15.
Article in English | MEDLINE | ID: mdl-33861964

ABSTRACT

Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture disease burdens and environmental factors impacting specific sub-populations. Here, we propose a framework for repurposing data from electronic health records (EHRs) in concert with genomic data to explore the demographic ties that can impact disease burdens. Using data from a diverse biobank in New York City, we identified 17 communities sharing recent genetic ancestry. We observed 1,177 health outcomes that were statistically associated with a specific group and demonstrated significant differences in the segregation of genetic variants contributing to Mendelian diseases. We also demonstrated that fine-scale population structure can impact the prediction of complex disease risk within groups. This work reinforces the utility of linking genomic data to EHRs and provides a framework toward fine-scale monitoring of population health.


Subjects
Ethnicity/genetics , Population Health , Genetic Databases , Electronic Health Records , Genomics , Humans , Self Report
2.
Cell; 184(6): 1415-1419, 2021 Mar 18.
Article in English | MEDLINE | ID: mdl-33740447

ABSTRACT

Precision medicine promises improved health by accounting for individual variability in genes, environment, and lifestyle. Precision medicine will continue to transform healthcare in the coming decade as it expands in key areas: huge cohorts, artificial intelligence (AI), routine clinical genomics, phenomics and environment, and returning value across diverse populations.


Subjects
Delivery of Health Care , Precision Medicine , Artificial Intelligence , Big Data , Biomedical Research , Cultural Diversity , Electronic Health Records , Humans , Phenomics
3.
Cell; 177(1): 58-69, 2019 Mar 21.
Article in English | MEDLINE | ID: mdl-30901549

ABSTRACT

Personalized medicine has largely been enabled by the integration of genomic and other data with electronic health records (EHRs) in the United States and elsewhere. Increased EHR adoption across various clinical settings and the establishment of EHR-linked population-based biobanks provide unprecedented opportunities for the types of translational and implementation research that drive personalized medicine. We review advances in the digitization of health information and the proliferation of genomic research in health systems and provide insights into emerging paths for the widespread implementation of personalized medicine.


Subjects
Electronic Health Records/trends , Precision Medicine/methods , Precision Medicine/trends , Genetic Testing , Human Genome/genetics , Genomics/methods , Genomics/trends , Humans , United States
4.
Cell; 173(7): 1692-1704.e11, 2018 Jun 14.
Article in English | MEDLINE | ID: mdl-29779949

ABSTRACT

Heritability is essential for understanding the biological causes of disease but requires laborious patient recruitment and phenotype ascertainment. Electronic health records (EHRs) passively capture a wide range of clinically relevant data and provide a resource for studying the heritability of traits that are not typically accessible. EHRs contain next-of-kin information collected via patient emergency contact forms, but until now, these data have gone unused in research. We mined emergency contact data at three academic medical centers and identified 7.4 million familial relationships while maintaining patient privacy. Identified relationships were consistent with genetically derived relatedness. We used EHR data to compute heritability estimates for 500 disease phenotypes. Overall, estimates were consistent with the literature and between sites. Inconsistencies were indicative of limitations and opportunities unique to EHR research. These analyses provide a validation of the use of EHRs for genetics and disease research.


Subjects
Electronic Health Records , Inborn Genetic Diseases/genetics , Algorithms , Factual Databases , Family Relations , Inborn Genetic Diseases/pathology , Genotype , Humans , Pedigree , Phenotype , Heritable Quantitative Trait
5.
Nature; 627(8003): 340-346, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38374255

ABSTRACT

Comprehensively mapping the genetic basis of human disease across diverse individuals is a long-standing goal for the field of human genetics1-4. The All of Us Research Program is a longitudinal cohort study aiming to enrol a diverse group of at least one million individuals across the USA to accelerate biomedical research and improve human health5,6. Here we describe the programme's genomics data release of 245,388 clinical-grade genome sequences. This resource is unique in its diversity as 77% of participants are from communities that are historically under-represented in biomedical research and 46% are individuals from under-represented racial and ethnic minorities. All of Us identified more than 1 billion genetic variants, including more than 275 million previously unreported genetic variants, more than 3.9 million of which had coding consequences. Leveraging linkage between genomic data and the longitudinal electronic health record, we evaluated 3,724 genetic variants associated with 117 diseases and found high replication rates across both participants of European ancestry and participants of African ancestry. Summary-level data are publicly available, and individual-level data can be accessed by researchers through the All of Us Researcher Workbench using a unique data passport model with a median time from initial researcher registration to data access of 29 hours. We anticipate that this diverse dataset will advance the promise of genomic medicine for all.


Subjects
Datasets as Topic , Medical Genetics , Population Genetics , Human Genome , Genomics , Minority Groups , Racial Groups , Humans , Access to Information , Black People/genetics , Electronic Health Records , Ethnicity/genetics , European People/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Human Genome/genetics , Longitudinal Studies , Racial Groups/genetics , Reproducibility of Results , Research Personnel , Time Factors , Vulnerable Populations
6.
CA Cancer J Clin; 72(3): 287-300, 2022 May.
Article in English | MEDLINE | ID: mdl-34964981

ABSTRACT

Generating evidence on the use, effectiveness, and safety of new cancer therapies is a priority for researchers, health care providers, payers, and regulators given the rapid pace of change in cancer diagnosis and treatments. The use of real-world data (RWD) is integral to understanding the utilization patterns and outcomes of these new treatments among patients with cancer who are treated in clinical practice and community settings. An initial step in the use of RWD is careful study design to assess the suitability of an RWD source. This pivotal process can be guided by using a conceptual model that encourages predesign conceptualization. The primary types of RWD included are electronic health records, administrative claims data, cancer registries, and specialty data providers and networks. Careful consideration of each data type is necessary because they are collected for a specific purpose, capturing a set of data elements within a certain population for that purpose, and they vary by population coverage and longitudinality. In this review, the authors provide a high-level assessment of the strengths and limitations of each data category to inform data source selection appropriate to the study question. Overall, the development and accessibility of RWD sources for cancer research are rapidly increasing, and the use of these data requires careful consideration of composition and utility to assess important questions in understanding the use and effectiveness of new therapies.


Subjects
Information Storage and Retrieval , Medical Oncology , Electronic Health Records , Humans , Registries , Research Design
7.
Nature; 616(7956): 259-265, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37045921

ABSTRACT

The exceptionally rapid development of highly flexible, reusable artificial intelligence (AI) models is likely to usher in newfound capabilities in medicine. We propose a new paradigm for medical AI, which we refer to as generalist medical AI (GMAI). GMAI models will be capable of carrying out a diverse set of tasks using very little or no task-specific labelled data. Built through self-supervision on large, diverse datasets, GMAI will flexibly interpret different combinations of medical modalities, including data from imaging, electronic health records, laboratory results, genomics, graphs or medical text. Models will in turn produce expressive outputs such as free-text explanations, spoken recommendations or image annotations that demonstrate advanced medical reasoning abilities. Here we identify a set of high-impact potential applications for GMAI and lay out specific technical capabilities and training datasets necessary to enable them. We expect that GMAI-enabled applications will challenge current strategies for regulating and validating AI devices for medicine and will shift practices associated with the collection of large medical datasets.


Subjects
Artificial Intelligence , Medicine , Diagnostic Imaging , Electronic Health Records , Genomics , Datasets as Topic , Unsupervised Machine Learning , Humans
8.
Nature; 619(7969): 357-362, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37286606

ABSTRACT

Physicians make critical time-constrained decisions every day. Clinical predictive models can help physicians and administrators make decisions by forecasting clinical and operational events. Existing structured data-based clinical predictive models have limited use in everyday practice owing to complexity in data processing, as well as model development and deployment1-3. Here we show that unstructured clinical notes from the electronic health record can enable the training of clinical language models, which can be used as all-purpose clinical predictive engines with low-resistance development and deployment. Our approach leverages recent advances in natural language processing4,5 to train a large language model for medical language (NYUTron) and subsequently fine-tune it across a wide range of clinical and operational predictive tasks. We evaluated our approach within our health system for five such tasks: 30-day all-cause readmission prediction, in-hospital mortality prediction, comorbidity index prediction, length of stay prediction, and insurance denial prediction. We show that NYUTron has an area under the curve (AUC) of 78.7-94.9%, with an improvement of 5.36-14.7% in the AUC compared with traditional models. We additionally demonstrate the benefits of pretraining with clinical text, the potential for increasing generalizability to different sites through fine-tuning and the full deployment of our system in a prospective, single-arm trial. These results show the potential for using clinical language models in medicine to read alongside physicians and provide guidance at the point of care.


Subjects
Clinical Decision-Making , Electronic Health Records , Natural Language Processing , Physicians , Humans , Clinical Decision-Making/methods , Patient Readmission , Hospital Mortality , Comorbidity , Length of Stay , Insurance Coverage , Area Under Curve , Point-of-Care Systems/trends , Clinical Trials as Topic
9.
Nature; 613(7944): 519-525, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36653560

ABSTRACT

Identifying causal factors for Mendelian and common diseases is an ongoing challenge in medical genetics1. Population bottleneck events, such as those that occurred in the history of the Finnish population, enrich some homozygous variants to higher frequencies, which facilitates the identification of variants that cause diseases with recessive inheritance2,3. Here we examine the homozygous and heterozygous effects of 44,370 coding variants on 2,444 disease phenotypes using data from the nationwide electronic health records of 176,899 Finnish individuals. We find associations for homozygous genotypes across a broad spectrum of phenotypes, including known associations with retinal dystrophy and novel associations with adult-onset cataract and female infertility. Of the recessive disease associations that we identify, 13 out of 20 would have been missed by the additive model that is typically used in genome-wide association studies. We use these results to find many known Mendelian variants whose inheritance cannot be adequately described by a conventional definition of dominant or recessive. In particular, we find variants that are known to cause diseases with recessive inheritance with significant heterozygous phenotypic effects. Similarly, we find presumed benign variants with disease effects. Our results show how biobanks, particularly in founder populations, can broaden our understanding of complex dosage effects of Mendelian variants on disease.


Subjects
Alleles , Biological Specimen Banks , Disease , Animals , Female , Genome-Wide Association Study , Phenotype , Disease/genetics , Finland , Retinal Dystrophies , Cataract , Female Infertility , Recessive Genes , Heterozygote , Founder Effect , Gene Dosage , Electronic Health Records
10.
Nature; 624(7990): 138-144, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37968391

ABSTRACT

Diabetes is a leading cause of morbidity, mortality and cost of illness1,2. Health behaviours, particularly those related to nutrition and physical activity, play a key role in the development of type 2 diabetes mellitus3. Whereas behaviour change programmes (also known as lifestyle interventions or similar) have been found efficacious in controlled clinical trials4,5, there remains controversy about whether targeting health behaviours at the individual level is an effective preventive strategy for type 2 diabetes mellitus6 and doubt among clinicians that lifestyle advice and counselling provided in the routine health system can achieve improvements in health7-9. Here we show that being referred to the largest behaviour change programme for prediabetes globally (the English Diabetes Prevention Programme) is effective in improving key cardiovascular risk factors, including glycated haemoglobin (HbA1c), excess body weight and serum lipid levels. We do so by using a regression discontinuity design10, which uses the eligibility threshold in HbA1c for referral to the behaviour change programme, in electronic health data from about one-fifth of all primary care practices in England. We confirm our main finding, the improvement of HbA1c, using two other quasi-experimental approaches: difference-in-differences analysis exploiting the phased roll-out of the programme and instrumental variable estimation exploiting regional variation in programme coverage. This analysis provides causal, rather than associational, evidence that lifestyle advice and counselling implemented at scale in a national health system can achieve important health improvements.


Subjects
Type 2 Diabetes Mellitus , Health Behavior , Health Promotion , National Health Programs , Prediabetic State , Humans , Body Weight , Type 2 Diabetes Mellitus/blood , Type 2 Diabetes Mellitus/prevention & control , Electronic Health Records , England , Exercise , Glycated Hemoglobin/analysis , Health Promotion/methods , Health Promotion/standards , Life Style , Lipids/blood , National Health Programs/standards , Prediabetic State/blood , Prediabetic State/prevention & control , Primary Health Care
11.
Trends Genet; 40(5): 379-380, 2024 May.
Article in English | MEDLINE | ID: mdl-38643035

ABSTRACT

Lennon et al. recently proposed a clinical polygenic score (PGS) pipeline as part of the Electronic Medical Records and Genomics (eMERGE) network initiative. In this spotlight article we discuss the broader context for the use of PGS in preventive medicine and highlight key limitations and challenges facing their inclusion in prediction models.


Subjects
Multifactorial Inheritance , Multifactorial Inheritance/genetics , Humans , Genomics , Genetic Predisposition to Disease , Genome-Wide Association Study , Electronic Health Records , Preventive Medicine
12.
N Engl J Med; 390(13): 1196-1206, 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38598574

ABSTRACT

BACKGROUND: Despite the availability of effective therapies for patients with chronic kidney disease, type 2 diabetes, and hypertension (the kidney-dysfunction triad), the results of large-scale trials examining the implementation of guideline-directed therapy to reduce the risk of death and complications in this population are lacking. METHODS: In this open-label, cluster-randomized trial, we assigned 11,182 patients with the kidney-dysfunction triad who were being treated at 141 primary care clinics either to receive an intervention that used a personalized algorithm (based on the patient's electronic health record [EHR]) to identify patients and practice facilitators to assist providers in delivering guideline-based interventions or to receive usual care. The primary outcome was hospitalization for any cause at 1 year. Secondary outcomes included emergency department visits, readmissions, cardiovascular events, dialysis, and death. RESULTS: We assigned 71 practices (enrolling 5690 patients) to the intervention group and 70 practices (enrolling 5492 patients) to the usual-care group. The hospitalization rate at 1 year was 20.7% (95% confidence interval [CI], 19.7 to 21.8) in the intervention group and 21.1% (95% CI, 20.1 to 22.2) in the usual-care group (between-group difference, 0.4 percentage points; P = 0.58). The risks of emergency department visits, readmissions, cardiovascular events, dialysis, or death from any cause were similar in the two groups. The risk of adverse events was also similar in the trial groups, except for acute kidney injury, which was observed in more patients in the intervention group (12.7% vs. 11.3%). CONCLUSIONS: In this pragmatic trial involving patients with the triad of chronic kidney disease, type 2 diabetes, and hypertension, the use of an EHR-based algorithm and practice facilitators embedded in primary care clinics did not translate into reduced hospitalization at 1 year. 
(Funded by the National Institutes of Health and others; ICD-Pieces ClinicalTrials.gov number, NCT02587936.).


Subjects
Type 2 Diabetes Mellitus , Hospitalization , Hypertension , Chronic Renal Insufficiency , Humans , Type 2 Diabetes Mellitus/epidemiology , Type 2 Diabetes Mellitus/therapy , Hospitalization/statistics & numerical data , Hypertension/epidemiology , Hypertension/therapy , Renal Dialysis , Chronic Renal Insufficiency/epidemiology , Chronic Renal Insufficiency/therapy , Precision Medicine , Electronic Health Records , Algorithms , Primary Health Care/statistics & numerical data
13.
Nature; 592(7855): 629-633, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33828294

ABSTRACT

There is a growing focus on making clinical trials more inclusive but the design of trial eligibility criteria remains challenging1-3. Here we systematically evaluate the effect of different eligibility criteria on cancer trial populations and outcomes with real-world data using the computational framework of Trial Pathfinder. We apply Trial Pathfinder to emulate completed trials of advanced non-small-cell lung cancer using data from a nationwide database of electronic health records comprising 61,094 patients with advanced non-small-cell lung cancer. Our analyses reveal that many common criteria, including exclusions based on several laboratory values, had a minimal effect on the trial hazard ratios. When we used a data-driven approach to broaden restrictive criteria, the pool of eligible patients more than doubled on average and the hazard ratio of the overall survival decreased by an average of 0.05. This suggests that many patients who were not eligible under the original trial criteria could potentially benefit from the treatments. We further support our findings through analyses of other types of cancer and patient-safety data from diverse clinical trials. Our data-driven methodology for evaluating eligibility criteria can facilitate the design of more-inclusive trials while maintaining safeguards for patient safety.


Subjects
Artificial Intelligence , Clinical Trials as Topic/methods , Datasets as Topic , Medical Oncology , Patient Safety , Patient Selection , Non-Small-Cell Lung Carcinoma/drug therapy , Clinical Laboratory Techniques , Electronic Health Records/statistics & numerical data , Humans , Lung Neoplasms/drug therapy , Patient Safety/standards , Patient Selection/ethics , Proportional Hazards Models , Reproducibility of Results
14.
Nature; 594(7862): 259-264, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33887749

ABSTRACT

The acute clinical manifestations of COVID-19 have been well characterized1,2, but the post-acute sequelae of this disease have not been comprehensively described. Here we use the national healthcare databases of the US Department of Veterans Affairs to systematically and comprehensively identify 6-month incident sequelae (including diagnoses, medication use and laboratory abnormalities) in patients with COVID-19 who survived for at least 30 days after diagnosis. We show that beyond the first 30 days of illness, people with COVID-19 exhibit a higher risk of death and use of health resources. Our high-dimensional approach identifies incident sequelae in the respiratory system, as well as several other sequelae that include nervous system and neurocognitive disorders, mental health disorders, metabolic disorders, cardiovascular disorders, gastrointestinal disorders, malaise, fatigue, musculoskeletal pain and anaemia. We show increased incident use of several therapeutic agents, including pain medications (opioids and non-opioids) as well as antidepressant, anxiolytic, antihypertensive and oral hypoglycaemic agents, as well as evidence of laboratory abnormalities in several organ systems. Our analysis of an array of prespecified outcomes reveals a risk gradient that increases according to the severity of the acute COVID-19 infection (that is, whether patients were not hospitalized, hospitalized or admitted to intensive care). Our findings show that a substantial burden of health loss that spans pulmonary and several extrapulmonary organ systems is experienced by patients who survive after the acute phase of COVID-19. These results will help to inform health system planning and the development of multidisciplinary care strategies to reduce chronic health loss among individuals with COVID-19.


Subjects
COVID-19/complications , SARS-CoV-2/pathogenicity , COVID-19/diagnosis , COVID-19/physiopathology , COVID-19/psychology , Cohort Studies , Factual Databases , Datasets as Topic , Electronic Health Records , Female , Hospitalization/statistics & numerical data , Humans , Human Influenza/diagnosis , Human Influenza/drug therapy , Human Influenza/physiopathology , Male , Outpatients/psychology , Outpatients/statistics & numerical data , Risk , Time Factors , United States , United States Department of Veterans Affairs , Post-Acute COVID-19 Syndrome , COVID-19 Drug Treatment
15.
Annu Rev Pharmacol Toxicol; 63: 65-76, 2023 Jan 20.
Article in English | MEDLINE | ID: mdl-36662581

ABSTRACT

A long-standing recognition that information from human genetics studies has the potential to accelerate drug discovery has led to decades of research on how to leverage genetic and phenotypic information for drug discovery. Established simple and advanced statistical methods that allow the simultaneous analysis of genotype and clinical phenotype data by genome- and phenome-wide analyses, colocalization analyses with quantitative trait loci data from transcriptomics and proteomics data sets from different tissues, and Mendelian randomization are essential tools for drug development in the postgenomic era. Numerous studies have demonstrated how genomic data provide opportunities for the identification of new drug targets, the repurposing of drugs, and drug safety analyses. With an increase in the number of biobanks that enable linking in-depth omics data with rich repositories of phenotypic traits via electronic health records, more powerful ways for the evaluation and validation of drug targets will continue to expand across different disciplines of clinical research.


Subjects
Electronic Health Records , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Genomics/methods , Phenotype , Drug Discovery
16.
Am J Hum Genet; 110(7): 1021-1033, 2023 Jul 06.
Article in English | MEDLINE | ID: mdl-37343562

ABSTRACT

Two major goals of the Electronic Medical Record and Genomics (eMERGE) Network are to learn how best to return research results to patient/participants and the clinicians who care for them and also to assess the impact of placing these results in clinical care. Yet since its inception, the Network has confronted a host of challenges in achieving these goals, many of which had ethical, legal, or social implications (ELSIs) that required consideration. Here, we share impediments we encountered in recruiting participants, returning results, and assessing their impact, all of which affected our ability to achieve the goals of eMERGE, as well as the steps we took to attempt to address these obstacles. We divide the domains in which we experienced challenges into four broad categories: (1) study design, including recruitment of more diverse groups; (2) consent; (3) returning results to participants and their health care providers (HCPs); and (4) assessment of follow-up care of participants and measuring the impact of research on participants and their families. Since most phases of eMERGE have included children as well as adults, we also address the particular ELSI posed by including pediatric populations in this research. We make specific suggestions for improving translational genomic research to ensure that future projects can effectively return results and assess their impact on patient/participants and providers if the goals of genomic-informed medicine are to be achieved.


Subjects
Electronic Health Records , Genomics , Child , Adult , Humans , Genome , Translational Biomedical Research , Population Groups
17.
Nat Rev Genet; 21(8): 493-502, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32235907

ABSTRACT

Accurate prediction of disease risk based on the genetic make-up of an individual is essential for effective prevention and personalized treatment. Nevertheless, to date, individual genetic variants from genome-wide association studies have achieved only moderate prediction of disease risk. The aggregation of genetic variants under a polygenic model shows promising improvements in prediction accuracies. Increasingly, electronic health records (EHRs) are being linked to patient genetic data in biobanks, which provides new opportunities for developing and applying polygenic risk scores in the clinic, to systematically examine and evaluate patient susceptibilities to disease. However, the heterogeneous nature of EHR data brings forth many practical challenges along every step of designing and implementing risk prediction strategies. In this Review, we present the unique considerations for using genotype and phenotype data from biobank-linked EHRs for polygenic risk prediction.


Subjects
Electronic Health Records , Genetic Association Studies , Genetic Predisposition to Disease , Multifactorial Inheritance , Algorithms , Computational Biology/methods , Genome-Wide Association Study , Genomics/methods , Genotype , Humans , Phenotype , Reproducibility of Results , Risk Assessment , Risk Factors
18.
Pharmacol Rev; 75(4): 714-738, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36931724

ABSTRACT

Natural language processing (NLP) is an area of artificial intelligence that applies information technologies to process the human language, understand it to a certain degree, and use it in various applications. This area has rapidly developed in the past few years and now employs modern variants of deep neural networks to extract relevant patterns from large text corpora. The main objective of this work is to survey the recent use of NLP in the field of pharmacology. As our work shows, NLP is a highly relevant information extraction and processing approach for pharmacology. It has been used extensively, from intelligent searches through thousands of medical documents to finding traces of adversarial drug interactions in social media. We split our coverage into five categories to survey modern NLP: methodology, commonly addressed tasks, relevant textual data, knowledge bases, and useful programming libraries. We split each of the five categories into appropriate subcategories, describe their main properties and ideas, and summarize them in a tabular form. The resulting survey presents a comprehensive overview of the area, useful to practitioners and interested observers. SIGNIFICANCE STATEMENT: The main objective of this work is to survey the recent use of NLP in the field of pharmacology in order to provide a comprehensive overview of the current state in the area after the rapid developments that occurred in the past few years. The resulting survey will be useful to practitioners and interested observers in the domain.


Subjects
Artificial Intelligence , Natural Language Processing , Humans , Information Storage and Retrieval , Electronic Health Records , Records
19.
Am J Hum Genet; 109(9): 1591-1604, 2022 Sep 01.
Article in English | MEDLINE | ID: mdl-35998640

ABSTRACT

Diagnosis for rare genetic diseases often relies on phenotype-driven methods, which hinge on the accuracy and completeness of the rare disease phenotypes in the underlying annotation knowledgebase. Existing knowledgebases are often manually curated with additional annotations found in published case reports. Despite their potential, real-world data such as electronic health records (EHRs) have not been fully exploited to derive rare disease annotations. Here, we present open annotation for rare diseases (OARD), a real-world-data-derived resource with annotation for rare-disease-related phenotypes. This resource is derived from the EHRs of two academic health institutions containing more than 10 million individuals spanning wide age ranges and different disease subgroups. By leveraging ontology mapping and advanced natural-language-processing (NLP) methods, OARD automatically and efficiently extracts concepts for both rare diseases and their phenotypic traits from billing codes and lab tests as well as over 100 million clinical narratives. The rare disease prevalence derived by OARD is highly correlated with those annotated in the original rare disease knowledgebase. By performing association analysis, we identified more than 1 million novel disease-phenotype association pairs that were previously missed by human annotation, and >60% were confirmed true associations via manual review of a list of sampled pairs. Compared to the manual curated annotation, OARD is 100% data driven and its pipeline can be shared across different institutions. By supporting privacy-preserving sharing of aggregated summary statistics, such as term frequencies and disease-phenotype associations, it fills an important gap to facilitate data-driven research in the rare disease community.


Subjects
Natural Language Processing , Rare Diseases , Electronic Health Records , Humans , Phenotype , Rare Diseases/genetics
20.
Am J Hum Genet; 109(3): 433-445, 2022 Mar 03.
Article in English | MEDLINE | ID: mdl-35196515

ABSTRACT

Biobanks linked to massive, longitudinal electronic health record (EHR) data make numerous new genetic research questions feasible. One among these is the study of biomarker trajectories. For example, high blood pressure measurements over visits strongly predict stroke onset, and consistently high fasting glucose and HbA1c levels define diabetes. Recent research reveals that not only the mean level of biomarker trajectories but also their fluctuations, or within-subject (WS) variability, are risk factors for many diseases. Glycemic variation, for instance, has recently been considered an important clinical metric in diabetes management. It is crucial to identify the genetic factors that shift the mean or alter the WS variability of a biomarker trajectory. Compared to traditional cross-sectional studies, trajectory analysis utilizes more data points and captures a complete picture of the impact of time-varying factors, including medication history and lifestyle. Currently, there are no efficient tools for genome-wide association studies (GWASs) of biomarker trajectories at the biobank scale, even for just mean effects. We propose TrajGWAS, a linear mixed effect model-based method for testing genetic effects that shift the mean or alter the WS variability of a biomarker trajectory. It is scalable to biobank data with 100,000 to 1,000,000 individuals and many longitudinal measurements and robust to distributional assumptions. Simulation studies corroborate that TrajGWAS controls the type I error rate and is powerful. Analysis of eleven biomarkers measured longitudinally and extracted from UK Biobank primary care data for more than 150,000 participants with 1,800,000 observations reveals loci that significantly alter the mean or WS variability.


Subjects
Biological Specimen Banks , Genome-Wide Association Study , Biomarkers , Cross-Sectional Studies , Electronic Health Records , Humans , Longitudinal Studies