Search | VHL Regional Portal (Portal Regional da BVS)

Algorithmic Identification of Treatment-Emergent Adverse Events From Clinical Notes Using Large Language Models: A Pilot Study in Inflammatory Bowel Disease.

Silverman, Anna L; Sushil, Madhumita; Bhasuran, Balu; Ludwig, Dana; Buchanan, James; Racz, Rebecca; Parakala, Mahalakshmi; El-Kamary, Samer; Ahima, Ohenewaa; Belov, Artur; Choi, Lauren; Billings, Monisha; Li, Yan; Habal, Nadia; Liu, Qi; Tiwari, Jawahar; Butte, Atul J; Rudrapatna, Vivek A.

Clin Pharmacol Ther ; 115(6): 1391-1399, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38459719

ABSTRACT

Outpatient clinical notes are a rich source of information regarding drug safety. However, data in these notes are currently underutilized for pharmacovigilance due to methodological limitations in text mining. Large language models (LLMs) like Bidirectional Encoder Representations from Transformers (BERT) have shown progress in a range of natural language processing tasks but have not yet been evaluated on adverse event (AE) detection. We adapted a new clinical LLM, University of California - San Francisco (UCSF)-BERT, to identify serious AEs (SAEs) occurring after treatment with a non-steroid immunosuppressant for inflammatory bowel disease (IBD). We compared this model to other language models that have previously been applied to AE detection. We annotated 928 outpatient IBD notes corresponding to 928 individual patients with IBD for all SAE-associated hospitalizations occurring after treatment with a non-steroid immunosuppressant. These notes contained 703 SAEs in total, the most common of which was failure of intended efficacy. Out of eight candidate models, UCSF-BERT achieved the highest numerical performance on identifying drug-SAE pairs from this corpus (accuracy 88-92%, macro F1 61-68%), with 5-10% greater accuracy than previously published models. UCSF-BERT was significantly superior at identifying hospitalization events emergent to medication use (P < 0.01). LLMs like UCSF-BERT achieve numerically superior accuracy on the challenging task of SAE detection from clinical notes compared with prior methods. Future work is needed to adapt this methodology to improve model performance and evaluation using multicenter data and newer architectures like Generative pre-trained transformer (GPT). Our findings support the potential value of using large language models to enhance pharmacovigilance.

Subjects

Algorithms, Immunosuppressive Agents, Inflammatory Bowel Diseases, Natural Language Processing, Pharmacovigilance, Humans, Pilot Projects, Inflammatory Bowel Diseases/drug therapy, Immunosuppressive Agents/adverse effects, Data Mining/methods, Drug-Related Side Effects and Adverse Reactions/diagnosis, Adverse Drug Reaction Reporting Systems, Electronic Health Records, Female, Male, Hospitalization/statistics & numerical data
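
The abstract of this record reports accuracy of 88-92% alongside a much lower macro F1 of 61-68%. A gap of that kind is typical when one class is rare, because macro F1 averages per-class F1 scores with equal weight. The following is a minimal sketch of the two metrics on invented toy labels; the label names, counts, and helper functions are illustrative assumptions and are not taken from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced toy data: 90 notes without an SAE, 10 with one.
y_true = ["no_sae"] * 90 + ["sae"] * 10
# Hypothetical predictions: 2 false alarms, 7 missed events, 3 events caught.
y_pred = ["no_sae"] * 88 + ["sae"] * 2 + ["no_sae"] * 7 + ["sae"] * 3

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

On this toy data the accuracy is high because the majority class dominates, while the poor F1 on the rare class pulls the macro average down.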

Algorithmic identification of treatment-emergent adverse events from clinical notes using large language models: a pilot study in inflammatory bowel disease.

medRxiv ; 2023 Sep 08.

Article in English | MEDLINE | ID: mdl-37732220

ABSTRACT

Background and Aims: Outpatient clinical notes are a rich source of information regarding drug safety. However, data in these notes are currently underutilized for pharmacovigilance due to methodological limitations in text mining. Large language models (LLMs) like BERT have shown progress in a range of natural language processing tasks but have not yet been evaluated on adverse event (AE) detection. Methods: We adapted a new clinical LLM, UCSF BERT, to identify serious adverse events (SAEs) occurring after treatment with a non-steroid immunosuppressant for inflammatory bowel disease (IBD). We compared this model to other language models that have previously been applied to AE detection. Results: We annotated 928 outpatient IBD notes corresponding to 928 individual IBD patients for all SAE-associated hospitalizations occurring after treatment with a non-steroid immunosuppressant. These notes contained 703 SAEs in total, the most common of which was failure of intended efficacy. Out of 8 candidate models, UCSF BERT achieved the highest numerical performance on identifying drug-SAE pairs from this corpus (accuracy 88-92%, macro F1 61-68%), with 5-10% greater accuracy than previously published models. UCSF BERT was significantly superior at identifying hospitalization events emergent to medication use (p < 0.01). Conclusions: LLMs like UCSF BERT achieve numerically superior accuracy on the challenging task of SAE detection from clinical notes compared to prior methods. Future work is needed to adapt this methodology to improve model performance and evaluation using multi-center data and newer architectures like GPT. Our findings support the potential value of using large language models to enhance pharmacovigilance.

Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes.

Norgeot, Beau; Muenzen, Kathleen; Peterson, Thomas A; Fan, Xuancheng; Glicksberg, Benjamin S; Schenk, Gundolf; Rutenberg, Eugenia; Oskotsky, Boris; Sirota, Marina; Yazdany, Jinoos; Schmajuk, Gabriela; Ludwig, Dana; Goldstein, Theodore; Butte, Atul J.

NPJ Digit Med ; 3: 57, 2020.

Article in English | MEDLINE | ID: mdl-32337372

ABSTRACT

There is a great and growing need to ascertain what exactly is the state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical record data. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see any substantial real-world use, primarily because they have been trained on medical text corpora that were too small. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and developed a customizable open-source de-identification software called Philter ("Protected Health Information filter"). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods.

Scalable and accurate deep learning with electronic health records.

Rajkomar, Alvin; Oren, Eyal; Chen, Kai; Dai, Andrew M; Hajaj, Nissan; Hardt, Michaela; Liu, Peter J; Liu, Xiaobing; Marcus, Jake; Sun, Mimi; Sundberg, Patrik; Yee, Hector; Zhang, Kun; Zhang, Yi; Flores, Gerardo; Duggan, Gavin E; Irvine, Jamie; Le, Quoc; Litsch, Kurt; Mossin, Alexander; Tansuwan, Justin; Wang, De; Wexler, James; Wilson, Jimbo; Ludwig, Dana; Volchenboum, Samuel L; Chou, Katherine; Pearson, Michael; Madabushi, Srinivasan; Shah, Nigam H; Butte, Atul J; Howell, Michael D; Cui, Claire; Corrado, Greg S; Dean, Jeffrey.

NPJ Digit Med ; 1: 18, 2018.

Article in English | MEDLINE | ID: mdl-31304302

ABSTRACT

Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient's chart.

Using health-system-wide data to understand hepatitis B virus prophylaxis and reactivation outcomes in patients receiving rituximab.

Schmajuk, Gabriela; Tonner, Chris; Trupin, Laura; Li, Jing; Sarkar, Urmimala; Ludwig, Dana; Shiboski, Stephen; Sirota, Marina; Dudley, R Adams; Murray, Sara; Yazdany, Jinoos.

Medicine (Baltimore) ; 96(13): e6528, 2017 Mar.

Article in English | MEDLINE | ID: mdl-28353614

ABSTRACT

Hepatitis B virus (HBV) reactivation in the setting of rituximab use is a potentially fatal but preventable safety event. The rate of HBV screening and the proportion of at-risk patients who receive antiviral prophylaxis among patients initiating rituximab are unknown. We analyzed electronic health record (EHR) data from 2 health systems, a university center and a safety net health system, including diagnosis grouper codes, problem lists, medications, laboratory results, procedure codes, clinical encounter notes, and scanned documents. We identified all patients who received rituximab between 6/1/2012 and 1/1/2016. We calculated the proportion of rituximab users with inadequate screening for HBV according to the Centers for Disease Control guidelines for detecting latent HBV infection before their first rituximab infusion during the study period. We also assessed the proportion of patients with positive hepatitis B screening tests who were prescribed antiviral prophylaxis. Finally, we characterized safety failures and adverse events. We included 926 patients from the university and 132 patients from the safety net health system. Sixty-one percent of patients from the university had adequate screening for HBV compared with 90% from the safety net. Among patients at risk for reactivation based on results of HBV testing, 66% and 92% received antiviral prophylaxis at the university and safety net, respectively. We found wide variations in hepatitis B screening practices among patients receiving rituximab, resulting in unnecessary risks to patients. Interventions should be developed to improve patient safety procedures in this high-risk patient population.

Subjects

Hepatitis B/prevention & control, Immunologic Factors/adverse effects, Rituximab/adverse effects, Secondary Prevention, Adolescent, Adult, Aged, Electronic Health Records, Female, Hepatitis B/chemically induced, Humans, Male, Middle Aged, Retrospective Studies, Young Adult

Characterizing Race/Ethnicity and Genetic Ancestry for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort.

Banda, Yambazi; Kvale, Mark N; Hoffmann, Thomas J; Hesselson, Stephanie E; Ranatunga, Dilrini; Tang, Hua; Sabatti, Chiara; Croen, Lisa A; Dispensa, Brad P; Henderson, Mary; Iribarren, Carlos; Jorgenson, Eric; Kushi, Lawrence H; Ludwig, Dana; Olberg, Diane; Quesenberry, Charles P; Rowell, Sarah; Sadler, Marianne; Sakoda, Lori C; Sciortino, Stanley; Shen, Ling; Smethurst, David; Somkin, Carol P; Van Den Eeden, Stephen K; Walter, Lawrence; Whitmer, Rachel A; Kwok, Pui-Yan; Schaefer, Catherine; Risch, Neil.

Genetics ; 200(4): 1285-95, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26092716

ABSTRACT

Using genome-wide genotypes, we characterized the genetic structure of 103,006 participants in the Kaiser Permanente Northern California multi-ethnic Genetic Epidemiology Research on Adult Health and Aging Cohort and analyzed the relationship to self-reported race/ethnicity. Participants endorsed any of 23 race/ethnicity/nationality categories, which were collapsed into seven major race/ethnicity groups. By self-report the cohort is 80.8% white and 19.2% minority; 93.8% endorsed a single race/ethnicity group, while 6.2% endorsed two or more. Principal component (PC) and admixture analyses were generally consistent with prior studies. Approximately 17% of subjects had genetic ancestry from more than one continent, and 12% were genetically admixed, considering only nonadjacent geographical origins. Self-reported whites were spread on a continuum along the first two PCs, indicating extensive mixing among European nationalities. Self-identified East Asian nationalities correlated with genetic clustering, consistent with extensive endogamy. Individuals of mixed East Asian-European genetic ancestry were easily identified; we also observed a modest amount of European genetic ancestry in individuals self-identified as Filipinos. Self-reported African Americans and Latinos showed extensive European and African genetic ancestry, and Native American genetic ancestry for the latter. Among 3741 genetically identified parent-child pairs, 93% were concordant for self-reported race/ethnicity; among 2018 genetically identified full-sib pairs, 96% were concordant; the lower rate for parent-child pairs was largely due to intermarriage. The parent-child pairs revealed a trend toward increasing exogamy over time; the presence in the cohort of individuals endorsing multiple race/ethnicity categories creates interesting challenges and future opportunities for genetic epidemiologic studies.

Subjects

Aging/genetics, Ethnicity/genetics, Genomics, Health, Racial Groups/genetics, Adult, Cohort Studies, Female, Humans, Male, Middle Aged, Molecular Epidemiology, Pedigree, Principal Component Analysis

Automated Assay of Telomere Length Measurement and Informatics for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort.

Lapham, Kyle; Kvale, Mark N; Lin, Jue; Connell, Sheryl; Croen, Lisa A; Dispensa, Brad P; Fang, Lynn; Hesselson, Stephanie; Hoffmann, Thomas J; Iribarren, Carlos; Jorgenson, Eric; Kushi, Lawrence H; Ludwig, Dana; Matsuguchi, Tetsuya; McGuire, William B; Miles, Sunita; Quesenberry, Charles P; Rowell, Sarah; Sadler, Marianne; Sakoda, Lori C; Smethurst, David; Somkin, Carol P; Van Den Eeden, Stephen K; Walter, Lawrence; Whitmer, Rachel A; Kwok, Pui-Yan; Risch, Neil; Schaefer, Catherine; Blackburn, Elizabeth H.

Genetics ; 200(4): 1061-72, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26092717

ABSTRACT

The Kaiser Permanente Research Program on Genes, Environment, and Health (RPGEH) Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort includes DNA specimens extracted from saliva samples of 110,266 individuals. Because of its relationship to aging, telomere length measurement was considered an important biomarker to develop on these subjects. To assay relative telomere length (TL) on this large cohort over a short time period, we created a novel high throughput robotic system for TL analysis and informatics. Samples were run in triplicate, along with control samples, in a randomized design. As part of quality control, we determined the within-sample variability and employed thresholds for the elimination of outlying measurements. Of 106,902 samples assayed, 105,539 (98.7%) passed all quality control (QC) measures. As expected, TL in general showed a decline with age and a sex difference. While telomeres showed a negative correlation with age up to 75 years, in those older than 75 years, age positively correlated with longer telomeres, indicative of an association of longer telomeres with more years of survival in those older than 75. Furthermore, while females in general had longer telomeres than males, this difference was significant only for those older than age 50. An additional novel finding was that the variance of TL between individuals increased with age. This study establishes reliable assay and analysis methodologies for measurement of TL in large, population-based human studies. The GERA cohort represents the largest currently available such resource, linked to comprehensive electronic health and genotype data for analysis.

Subjects

Aging/genetics, Computational Biology/methods, Health, Telomere/genetics, Adult, Automation, Cohort Studies, Female, Genotype, Humans, Leukocytes, Mononuclear/metabolism, Male, Molecular Epidemiology, Sex Characteristics

Genotyping Informatics and Quality Control for 100,000 Subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort.

Kvale, Mark N; Hesselson, Stephanie; Hoffmann, Thomas J; Cao, Yang; Chan, David; Connell, Sheryl; Croen, Lisa A; Dispensa, Brad P; Eshragh, Jasmin; Finn, Andrea; Gollub, Jeremy; Iribarren, Carlos; Jorgenson, Eric; Kushi, Lawrence H; Lao, Richard; Lu, Yontao; Ludwig, Dana; Mathauda, Gurpreet K; McGuire, William B; Mei, Gangwu; Miles, Sunita; Mittman, Michael; Patil, Mohini; Quesenberry, Charles P; Ranatunga, Dilrini; Rowell, Sarah; Sadler, Marianne; Sakoda, Lori C; Shapero, Michael; Shen, Ling; Shenoy, Tanu; Smethurst, David; Somkin, Carol P; Van Den Eeden, Stephen K; Walter, Lawrence; Wan, Eunice; Webster, Teresa; Whitmer, Rachel A; Wong, Simon; Zau, Chia; Zhan, Yiping; Schaefer, Catherine; Kwok, Pui-Yan; Risch, Neil.

Genetics ; 200(4): 1051-60, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26092718

ABSTRACT

The Kaiser Permanente (KP) Research Program on Genes, Environment and Health (RPGEH), in collaboration with the University of California-San Francisco, undertook genome-wide genotyping of >100,000 subjects that constitute the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. The project, which generated >70 billion genotypes, represents the first large-scale use of the Affymetrix Axiom Genotyping Solution. Because genotyping took place over a short 14-month period, creating a near-real-time analysis pipeline for experimental assay quality control and final optimized analyses was critical. Because of the multi-ethnic nature of the cohort, four different ethnic-specific arrays were employed to enhance genome-wide coverage. All assays were performed on DNA extracted from saliva samples. To improve sample call rates and significantly increase genotype concordance, we partitioned the cohort into disjoint packages of plates with similar assay contexts. Using strict QC criteria, the overall genotyping success rate was 103,067 of 109,837 samples assayed (93.8%), with a range of 92.1-95.4% for the four different arrays. Similarly, the SNP genotyping success rate ranged from 98.1 to 99.4% across the four arrays, the variation depending mostly on how many SNPs were included as single copy vs. double copy on a particular array. The high quality and large scale of genotype data created on this cohort, in conjunction with comprehensive longitudinal data from the KP electronic health records of participants, will enable a broad range of highly powered genome-wide association studies on a diversity of traits and conditions.

Subjects

Aging/genetics, Computational Biology/methods, Genotyping Techniques/methods, Health, Adult, Cohort Studies, Female, Humans, Male, Molecular Epidemiology, Oligonucleotide Array Sequence Analysis, Polymorphism, Single Nucleotide, Quality Control
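
The overall genotyping success rate quoted in the abstract of this record (103,067 of 109,837 samples, 93.8%) can be checked with a line of arithmetic. The sketch below only recomputes that percentage from the two counts given in the abstract; the function name is a hypothetical helper, not part of the authors' pipeline.

```python
def pass_rate_percent(passed: int, assayed: int) -> float:
    """Return the share of samples passing QC, as a percentage rounded to one decimal."""
    if assayed <= 0:
        raise ValueError("assayed must be positive")
    return round(100 * passed / assayed, 1)


# Counts as reported in the abstract.
genotyping_rate = pass_rate_percent(103_067, 109_837)
print(genotyping_rate)  # 93.8
```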

Next generation genome-wide association tool: design and coverage of a high-throughput European-optimized SNP array.

Hoffmann, Thomas J; Kvale, Mark N; Hesselson, Stephanie E; Zhan, Yiping; Aquino, Christine; Cao, Yang; Cawley, Simon; Chung, Elaine; Connell, Sheryl; Eshragh, Jasmin; Ewing, Marcia; Gollub, Jeremy; Henderson, Mary; Hubbell, Earl; Iribarren, Carlos; Kaufman, Jay; Lao, Richard Z; Lu, Yontao; Ludwig, Dana; Mathauda, Gurpreet K; McGuire, William; Mei, Gangwu; Miles, Sunita; Purdy, Matthew M; Quesenberry, Charles; Ranatunga, Dilrini; Rowell, Sarah; Sadler, Marianne; Shapero, Michael H; Shen, Ling; Shenoy, Tanushree R; Smethurst, David; Van den Eeden, Stephen K; Walter, Larry; Wan, Eunice; Wearley, Reid; Webster, Teresa; Wen, Christopher C; Weng, Li; Whitmer, Rachel A; Williams, Alan; Wong, Simon C; Zau, Chia; Finn, Andrea; Schaefer, Catherine; Kwok, Pui-Yan; Risch, Neil.

Genomics ; 98(2): 79-89, 2011 Aug.

Article in English | MEDLINE | ID: mdl-21565264

ABSTRACT

The success of genome-wide association studies has paralleled the development of efficient genotyping technologies. We describe the development of a next-generation microarray based on the new highly-efficient Affymetrix Axiom genotyping technology that we are using to genotype individuals of European ancestry from the Kaiser Permanente Research Program on Genes, Environment and Health (RPGEH). The array contains 674,517 SNPs, and provides excellent genome-wide as well as gene-based and candidate-SNP coverage. Coverage was calculated using an approach based on imputation and cross validation. Preliminary results for the first 80,301 saliva-derived DNA samples from the RPGEH demonstrate very high quality genotypes, with sample success rates above 94% and over 98% of successful samples having SNP call rates exceeding 98%. At steady state, we have produced 462 million genotypes per week for each Axiom system. The new array provides a valuable addition to the repertoire of tools for large scale genome-wide association studies.

Subjects

Genome-Wide Association Study/methods, High-Throughput Screening Assays, Oligonucleotide Array Sequence Analysis/methods, Polymorphism, Single Nucleotide/genetics, White People/genetics, Humans
