
1.

Deep learning on electronic medical records identifies distinct subphenotypes of diabetic kidney disease driven by genetic variations in the Rho pathway.

Paranjpe, Ishan; Wang, Xuan; Anandakrishnan, Nanditha; Haydak, Jonathan C; Van Vleck, Tielman; DeFronzo, Stefanie; Li, Zhengzhe; Mendoza, Anthony; Liu, Ruijie; Fu, Jia; Forrest, Iain; Zhou, Weibin; Lee, Kyung; O'Hagan, Ross; Dellepiane, Sergio; Menon, Kartikeya M; Gulamali, Faris; Kamat, Samir; Gusella, Gabriele Luca; Charney, Alexander W; Hofer, Ira; Cho, Judy H; Do, Ron; Glicksberg, Benjamin S; He, John C; Nadkarni, Girish N; Azeloglu, Evren U.

medRxiv ; 2023 Sep 07.

Article in English | MEDLINE | ID: mdl-37732187

ABSTRACT

Kidney disease affects 50% of all diabetic patients; however, prediction of disease progression has been challenging due to inherent disease heterogeneity. We use deep learning to identify novel genetic signatures prognostically associated with outcomes. Using autoencoders and unsupervised clustering of electronic health record data on 1,372 diabetic kidney disease patients, we establish two clusters with differential prevalence of end-stage kidney disease. Exome-wide associations identify a novel variant in ARHGEF18, a Rho guanine exchange factor specifically expressed in glomeruli. Overexpression of ARHGEF18 in human podocytes leads to impairments in focal adhesion architecture, cytoskeletal dynamics, cellular motility, and RhoA/Rac1 activation. Mutant GEF18 is resistant to ubiquitin-mediated degradation, leading to pathologically increased protein levels. Our findings uncover the first known disease-causing genetic variant that affects protein stability of a cytoskeletal regulator through impaired degradation, a potentially novel class of expression quantitative trait loci that can be therapeutically targeted.

2.

Identifying high-impact variants and genes in exomes of Ashkenazi Jewish inflammatory bowel disease patients.

Wu, Yiming; Gettler, Kyle; Kars, Meltem Ece; Giri, Mamta; Li, Dalin; Bayrak, Cigdem Sevim; Zhang, Peng; Jain, Aayushee; Maffucci, Patrick; Sabic, Ksenija; Van Vleck, Tielman; Nadkarni, Girish; Denson, Lee A; Ostrer, Harry; Levine, Adam P; Schiff, Elena R; Segal, Anthony W; Kugathasan, Subra; Stenson, Peter D; Cooper, David N; Philip Schumm, L; Snapper, Scott; Daly, Mark J; Haritunians, Talin; Duerr, Richard H; Silverberg, Mark S; Rioux, John D; Brant, Steven R; McGovern, Dermot P B; Cho, Judy H; Itan, Yuval.

Nat Commun ; 14(1): 2256, 2023 04 20.

Article in English | MEDLINE | ID: mdl-37080976

ABSTRACT

Inflammatory bowel disease (IBD) is a group of chronic digestive tract inflammatory conditions whose genetic etiology is still poorly understood. The incidence of IBD is particularly high among Ashkenazi Jews. Here, we identify 8 novel and plausible IBD-causing genes from the exomes of 4453 genetically identified Ashkenazi Jewish IBD cases (1734) and controls (2719). Various biological pathway analyses are performed, along with bulk and single-cell RNA sequencing, to demonstrate the likely physiological relatedness of the novel genes to IBD. Importantly, we demonstrate that the rare and high impact genetic architecture of Ashkenazi Jewish adult IBD displays significant overlap with very early onset-IBD genetics. Moreover, by performing biobank phenome-wide analyses, we find that IBD genes have pleiotropic effects that involve other immune responses. Finally, we show that polygenic risk score analyses based on genome-wide high impact variants have high power to predict IBD susceptibility.

Subject(s)

Inflammatory Bowel Diseases , Jews , Adult , Humans , Jews/genetics , Exome/genetics , Inflammatory Bowel Diseases/genetics , Risk Assessment , Genetic Predisposition to Disease

3.

Natural Language Processing Basics.

Arivazhagan, Naveen; Van Vleck, Tielman T.

Clin J Am Soc Nephrol ; 18(3): 400-401, 2023 03 01.

Article in English | MEDLINE | ID: mdl-36763809

Subject(s)

Natural Language Processing

4.

Natural Language Processing in Nephrology.

Van Vleck, Tielman T; Farrell, Douglas; Chan, Lili.

Adv Chronic Kidney Dis ; 29(5): 465-471, 2022 09.

Article in English | MEDLINE | ID: mdl-36253030

ABSTRACT

Unstructured data in the electronic health records contain essential patient information. Natural language processing (NLP), teaching a computer to read, allows us to tap into these data without needing the time and effort of manual chart abstraction. The core first step for all NLP algorithms is preprocessing the text to identify the core words that differentiate the text while filtering out the noise. Traditional NLP uses a rule-based approach, applying grammatical rules to infer meaning from the text. Newer NLP approaches use machine learning/deep learning which can infer meaning without explicitly being programmed. NLP use in nephrology research has focused on identifying distinct disease processes, such as CKD, and extraction of patient-oriented outcomes such as symptoms with high sensitivity. NLP can identify patient features from clinical text associated with acute kidney injury and progression of CKD. Lastly, inclusion of features extracted using NLP improved the performance of risk-prediction models compared to models that only use structured data. Implementation of NLP algorithms has been slow, partially hindered by the lack of external validation of NLP algorithms. However, NLP allows for extraction of key patient characteristics from free text, an infrequently used resource in nephrology.

Subject(s)

Nephrology , Renal Insufficiency, Chronic , Algorithms , Electronic Health Records , Humans , Natural Language Processing , Renal Insufficiency, Chronic/therapy

5.

Identification of discriminative gene-level and protein-level features associated with pathogenic gain-of-function and loss-of-function variants.

Sevim Bayrak, Cigdem; Stein, David; Jain, Aayushee; Chaudhary, Kumardeep; Nadkarni, Girish N; Van Vleck, Tielman T; Puel, Anne; Boisson-Dupuis, Stephanie; Okada, Satoshi; Stenson, Peter D; Cooper, David N; Schlessinger, Avner; Itan, Yuval.

Am J Hum Genet ; 108(12): 2301-2318, 2021 12 02.

Article in English | MEDLINE | ID: mdl-34762822

ABSTRACT

Identifying whether a given genetic mutation results in a gene product with increased (gain-of-function; GOF) or diminished (loss-of-function; LOF) activity is an important step toward understanding disease mechanisms because they may result in markedly different clinical phenotypes. Here, we generated an extensive database of documented germline GOF and LOF pathogenic variants by employing natural language processing (NLP) on the available abstracts in the Human Gene Mutation Database. We then investigated various gene- and protein-level features of GOF and LOF variants and applied machine learning and statistical analyses to identify discriminative features. We found that GOF variants were enriched in essential genes, for autosomal-dominant inheritance, and in protein binding and interaction domains, whereas LOF variants were enriched in singleton genes, for protein-truncating variants, and in protein core regions. We developed a user-friendly web-based interface that enables the extraction of selected subsets from the GOF/LOF database by a broad set of annotated features and downloading of up-to-date versions. These results improve our understanding of how variants affect gene/protein function and may ultimately guide future treatment options.

Subject(s)

Databases, Genetic , Gain of Function Mutation , Loss of Function Mutation , Proteins/genetics , Cloud Computing , Genetic Predisposition to Disease , Genome, Human , Germ-Line Mutation , Humans , Internet-Based Intervention , Machine Learning

6.

Prognostic value of polygenic risk scores for adults with psychosis.

Landi, Isotta; Kaji, Deepak A; Cotter, Liam; Van Vleck, Tielman; Belbin, Gillian; Preuss, Michael; Loos, Ruth J F; Kenny, Eimear; Glicksberg, Benjamin S; Beckmann, Noam D; O'Reilly, Paul; Schadt, Eric E; Achtyes, Eric D; Buckley, Peter F; Lehrer, Douglas; Malaspina, Dolores P; McCarroll, Steven A; Rapaport, Mark H; Fanous, Ayman H; Pato, Michele T; Pato, Carlos N; Bigdeli, Tim B; Nadkarni, Girish N; Charney, Alexander W.

Nat Med ; 27(9): 1576-1581, 2021 09.

Article in English | MEDLINE | ID: mdl-34489608

ABSTRACT

Polygenic risk scores (PRS) summarize genetic liability to a disease at the individual level, and the aim is to use them as biomarkers of disease and poor outcomes in real-world clinical practice. To date, few studies have assessed the prognostic value of PRS relative to standards of care. Schizophrenia (SCZ), the archetypal psychotic illness, is an ideal test case for this because the predictive power of the SCZ PRS exceeds that of most other common diseases. Here, we analyzed clinical and genetic data from two multi-ethnic cohorts totaling 8,541 adults with SCZ and related psychotic disorders, to assess whether the SCZ PRS improves the prediction of poor outcomes relative to clinical features captured in a standard psychiatric interview. For all outcomes investigated, the SCZ PRS did not improve the performance of predictive models, an observation that was generally robust to divergent case ascertainment strategies and the ancestral background of the study participants.

Subject(s)

Genetic Predisposition to Disease , Multifactorial Inheritance/genetics , Psychotic Disorders/genetics , Schizophrenia/genetics , Adult , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Prognosis , Psychotic Disorders/pathology , Risk Factors , Schizophrenia/pathology

7.

Genome-wide polygenic risk score for retinopathy of type 2 diabetes.

Forrest, Iain S; Chaudhary, Kumardeep; Paranjpe, Ishan; Vy, Ha My T; Marquez-Luna, Carla; Rocheleau, Ghislain; Saha, Aparna; Chan, Lili; Van Vleck, Tielman; Loos, Ruth J F; Cho, Judy; Pasquale, Louis R; Nadkarni, Girish N; Do, Ron.

Hum Mol Genet ; 30(10): 952-960, 2021 05 29.

Article in English | MEDLINE | ID: mdl-33704450

ABSTRACT

Diabetic retinopathy (DR) is a common consequence in type 2 diabetes (T2D) and a leading cause of blindness in working-age adults. Yet, its genetic predisposition is largely unknown. Here, we examined the polygenic architecture underlying DR by deriving and assessing a genome-wide polygenic risk score (PRS) for DR. We evaluated the PRS in 6079 individuals with T2D of European, Hispanic, African and other ancestries from a large-scale multi-ethnic biobank. Main outcomes were PRS association with DR diagnosis, symptoms and complications, and time to diagnosis, and transferability to non-European ancestries. We observed that PRS was significantly associated with DR. A standard deviation increase in PRS was accompanied by an adjusted odds ratio (OR) of 1.12 [95% confidence interval (CI) 1.04-1.20; P = 0.001] for DR diagnosis. When stratified by ancestry, PRS was associated with the highest OR in European ancestry (OR = 1.22, 95% CI 1.02-1.41; P = 0.049), followed by African (OR = 1.15, 95% CI 1.03-1.28; P = 0.028) and Hispanic ancestries (OR = 1.10, 95% CI 1.00-1.10; P = 0.050). Individuals in the top PRS decile had a 1.8-fold elevated risk for DR versus the bottom decile (P = 0.002). Among individuals without DR diagnosis, the top PRS decile had more DR symptoms than the bottom decile (P = 0.008). The PRS was associated with retinal hemorrhage (OR = 1.44, 95% CI 1.03-2.02; P = 0.03) and earlier DR presentation (10% probability of DR by 4 years in the top PRS decile versus 8 years in the bottom decile). These results establish the significant polygenic underpinnings of DR and indicate the need for more diverse ancestries in biobanks to develop multi-ancestral PRS.

Subject(s)

Diabetes Mellitus, Type 2/epidemiology , Diabetic Retinopathy/epidemiology , Genetic Predisposition to Disease , Genome-Wide Association Study , Adult , Aged , Black People/genetics , Diabetes Mellitus, Type 2/complications , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/pathology , Diabetic Retinopathy/complications , Diabetic Retinopathy/genetics , Diabetic Retinopathy/pathology , Hispanic or Latino/genetics , Humans , Middle Aged , Multifactorial Inheritance/genetics , Risk Assessment , Risk Factors , White People/genetics

8.

Comparison of Approaches for Prediction of Renal Replacement Therapy-Free Survival in Patients with Acute Kidney Injury.

Pattharanitima, Pattharawin; Vaid, Akhil; Jaladanki, Suraj K; Paranjpe, Ishan; O'Hagan, Ross; Chauhan, Kinsuk; Van Vleck, Tielman T; Duffy, Aine; Chaudhary, Kumardeep; Glicksberg, Benjamin S; Neyra, Javier A; Coca, Steven G; Chan, Lili; Nadkarni, Girish N.

Blood Purif ; 50(4-5): 621-627, 2021.

Article in English | MEDLINE | ID: mdl-33631752

ABSTRACT

BACKGROUND/AIMS: Acute kidney injury (AKI) in critically ill patients is common, and continuous renal replacement therapy (CRRT) is a preferred mode of renal replacement therapy (RRT) in hemodynamically unstable patients. Prediction of clinical outcomes in patients on CRRT is challenging. We utilized several approaches to predict RRT-free survival (RRTFS) in critically ill patients with AKI requiring CRRT. METHODS: We used the Medical Information Mart for Intensive Care (MIMIC-III) database to identify patients ≥18 years old with AKI on CRRT, after excluding patients who had ESRD on chronic dialysis, and kidney transplantation. We defined RRTFS as patients who were discharged alive and did not require RRT ≥7 days prior to hospital discharge. We utilized all available biomedical data up to CRRT initiation. We evaluated 7 approaches, including logistic regression (LR), random forest (RF), support vector machine (SVM), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), multilayer perceptron (MLP), and MLP with long short-term memory (MLP + LSTM). We evaluated model performance by using area under the receiver operating characteristic (AUROC) curves. RESULTS: Out of 684 patients with AKI on CRRT, 205 (30%) patients had RRTFS. The median age of patients was 63 years and their median Simplified Acute Physiology Score (SAPS) II was 67 (interquartile range 52-84). The MLP + LSTM showed the highest AUROC (95% CI) of 0.70 (0.67-0.73), followed by MLP 0.59 (0.54-0.64), LR 0.57 (0.52-0.62), SVM 0.51 (0.46-0.56), AdaBoost 0.51 (0.46-0.55), RF 0.44 (0.39-0.48), and XGBoost 0.43 (CI 0.38-0.47). CONCLUSIONS: A MLP + LSTM model outperformed other approaches for predicting RRTFS. Performance could be further improved by incorporating other data types.
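The AUROC values reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition, assuming binary labels and made-up scores; the function name and the toy data are illustrative and are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not from the study): 1 = outcome present, 0 = absent.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(auroc(toy_labels, toy_scores))
```

Under this definition 0.5 corresponds to chance-level ranking, so a value below 0.5, such as those reported for RF and XGBoost, means the model ranked cases worse than chance on the evaluated data.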

Subject(s)

Acute Kidney Injury/therapy , Renal Replacement Therapy , Acute Kidney Injury/diagnosis , Age Factors , Aged , Critical Care , Female , Humans , Logistic Models , Machine Learning , Male , Middle Aged , Prognosis

9.

Utilization of Deep Learning for Subphenotype Identification in Sepsis-Associated Acute Kidney Injury.

Chaudhary, Kumardeep; Vaid, Akhil; Duffy, Áine; Paranjpe, Ishan; Jaladanki, Suraj; Paranjpe, Manish; Johnson, Kipp; Gokhale, Avantee; Pattharanitima, Pattharawin; Chauhan, Kinsuk; O'Hagan, Ross; Van Vleck, Tielman; Coca, Steven G; Cooper, Richard; Glicksberg, Benjamin; Bottinger, Erwin P; Chan, Lili; Nadkarni, Girish N.

Clin J Am Soc Nephrol ; 15(11): 1557-1565, 2020 11 06.

Article in English | MEDLINE | ID: mdl-33033164

ABSTRACT

BACKGROUND AND OBJECTIVES: Sepsis-associated AKI is a heterogeneous clinical entity. We aimed to agnostically identify sepsis-associated AKI subphenotypes using deep learning on routinely collected data in electronic health records. DESIGN, SETTING, PARTICIPANTS, & MEASUREMENTS: We used the Medical Information Mart for Intensive Care III database, which consists of electronic health record data from intensive care units in a tertiary care hospital in the United States. We included patients ≥18 years with sepsis who developed AKI within 48 hours of intensive care unit admission. We then used deep learning to utilize all available vital signs, laboratory measurements, and comorbidities to identify subphenotypes. Outcomes were mortality 28 days after AKI and dialysis requirement. RESULTS: We identified 4001 patients with sepsis-associated AKI. We utilized 2546 combined features for K-means clustering, identifying three subphenotypes. Subphenotype 1 had 1443 patients, and subphenotype 2 had 1898 patients, whereas subphenotype 3 had 660 patients. Subphenotype 1 had the lowest proportion of liver disease and lowest Simplified Acute Physiology Score II scores compared with subphenotypes 2 and 3. The proportions of patients with CKD were similar between subphenotypes 1 and 3 (15%) but highest in subphenotype 2 (21%). Subphenotype 1 had lower median bilirubin levels, aspartate aminotransferase, and alanine aminotransferase compared with subphenotypes 2 and 3. Patients in subphenotype 1 also had lower median lactate, lactate dehydrogenase, and white blood cell count than patients in subphenotypes 2 and 3. Subphenotype 1 also had lower creatinine and BUN than subphenotypes 2 and 3. Dialysis requirement was lowest in subphenotype 1 (4% versus 7% [subphenotype 2] versus 26% [subphenotype 3]). The mortality 28 days after AKI was lowest in subphenotype 1 (23% versus 35% [subphenotype 2] versus 49% [subphenotype 3]). 
After adjustment, the adjusted odds ratio for mortality for subphenotype 3, with subphenotype 1 as a reference, was 1.9 (95% confidence interval, 1.5 to 2.4). CONCLUSIONS: Utilizing routinely collected laboratory variables, vital signs, and comorbidities, we were able to identify three distinct subphenotypes of sepsis-associated AKI with differing outcomes.
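The clustering step named in this abstract (K-means) can be illustrated in isolation. The sketch below is plain Lloyd's algorithm on made-up two-dimensional points with fixed starting centroids; it does not reproduce the authors' deep learning feature extraction or their 2546 features, and all names and data are hypothetical.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on tuples of floats.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid nearest to points[i].
    """
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignments = [
            min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical toy data: three well-separated groups of two points each.
toy_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
toy_initial_centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
final_centroids, labels = kmeans(toy_points, toy_initial_centroids)
print(labels)
```

In practice the number of clusters and the starting centroids strongly influence the result, which is one reason subphenotype counts such as the three reported here are usually checked against clinical outcomes.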

Subject(s)

Acute Kidney Injury/classification , Acute Kidney Injury/mortality , Deep Learning , Liver Diseases/epidemiology , Sepsis/complications , Acute Kidney Injury/microbiology , Acute Kidney Injury/therapy , Aged , Alanine Transaminase/blood , Bilirubin/blood , Blood Urea Nitrogen , Comorbidity , Creatinine/blood , Databases, Factual , Electronic Health Records , Female , Glutamyl Aminopeptidase/blood , Humans , L-Lactate Dehydrogenase/blood , Lactic Acid/blood , Leukocyte Count , Male , Middle Aged , Phenotype , Prognosis , Renal Dialysis , Simplified Acute Physiology Score , United States/epidemiology

10.

Prevalence and Impact of Myocardial Injury in Patients Hospitalized With COVID-19 Infection.

Lala, Anuradha; Johnson, Kipp W; Januzzi, James L; Russak, Adam J; Paranjpe, Ishan; Richter, Felix; Zhao, Shan; Somani, Sulaiman; Van Vleck, Tielman; Vaid, Akhil; Chaudhry, Fayzan; De Freitas, Jessica K; Fayad, Zahi A; Pinney, Sean P; Levin, Matthew; Charney, Alexander; Bagiella, Emilia; Narula, Jagat; Glicksberg, Benjamin S; Nadkarni, Girish; Mancini, Donna M; Fuster, Valentin.

J Am Coll Cardiol ; 76(5): 533-546, 2020 08 04.

Article in English | MEDLINE | ID: mdl-32517963

ABSTRACT

BACKGROUND: The degree of myocardial injury, as reflected by troponin elevation, and associated outcomes among U.S. hospitalized patients with coronavirus disease-2019 (COVID-19) are unknown. OBJECTIVES: The purpose of this study was to describe the degree of myocardial injury and associated outcomes in a large hospitalized cohort with laboratory-confirmed COVID-19. METHODS: Patients with COVID-19 admitted to 1 of 5 Mount Sinai Health System hospitals in New York City between February 27, 2020, and April 12, 2020, with troponin-I (normal value <0.03 ng/ml) measured within 24 h of admission were included (n = 2,736). Demographics, medical histories, admission laboratory results, and outcomes were captured from the hospitals' electronic health records. RESULTS: The median age was 66.4 years, with 59.6% men. Cardiovascular disease (CVD), including coronary artery disease, atrial fibrillation, and heart failure, was more prevalent in patients with higher troponin concentrations, as were hypertension and diabetes. A total of 506 (18.5%) patients died during hospitalization. In all, 985 (36%) patients had elevated troponin concentrations. After adjusting for disease severity and relevant clinical factors, even small amounts of myocardial injury (e.g., troponin I >0.03 to 0.09 ng/ml; n = 455; 16.6%) were significantly associated with death (adjusted hazard ratio: 1.75; 95% CI: 1.37 to 2.24; p < 0.001) while greater amounts (e.g., troponin I >0.09 ng/ml; n = 530; 19.4%) were significantly associated with higher risk (adjusted HR: 3.03; 95% CI: 2.42 to 3.80; p < 0.001). CONCLUSIONS: Myocardial injury is prevalent among patients hospitalized with COVID-19; however, troponin concentrations were generally present at low levels. Patients with CVD are more likely to have myocardial injury than patients without CVD. Troponin elevation among patients hospitalized with COVID-19 is associated with higher risk of mortality.

Subject(s)

Cardiovascular Diseases/complications , Comorbidity , Coronavirus Infections/complications , Myocardial Infarction/complications , Myocardium/pathology , Pneumonia, Viral/complications , Troponin I/blood , Adolescent , Adult , Aged , Aged, 80 and over , COVID-19 , Cardiovascular Diseases/epidemiology , Coronavirus Infections/epidemiology , Electronic Health Records , Female , Heart Injuries/complications , Heart Injuries/epidemiology , Hospitalization , Humans , Incidence , Male , Middle Aged , Myocardial Infarction/epidemiology , New York City , Pandemics , Pneumonia, Viral/epidemiology , Prevalence , Risk Factors , Treatment Outcome , Young Adult

11.

A common variant in PNPLA3 is associated with age at diagnosis of NAFLD in patients from a multi-ethnic biobank.

Walker, Ryan W; Belbin, Gillian M; Sorokin, Elena P; Van Vleck, Tielman; Wojcik, Genevieve L; Moscati, Arden; Gignoux, Christopher R; Cho, Judy; Abul-Husn, Noura S; Nadkarni, Girish; Kenny, Eimear E; Loos, Ruth J F.

J Hepatol ; 72(6): 1070-1081, 2020 06.

Article in English | MEDLINE | ID: mdl-32145261

ABSTRACT

BACKGROUND & AIMS: The Ile148Met variant (rs738409) in the PNPLA3 gene has the largest effect on non-alcoholic fatty liver disease (NAFLD), increasing the risk of progression to severe forms of liver disease. It remains unknown if the variant plays a role in age of NAFLD onset. We aimed to determine if rs738409 affects the age of NAFLD diagnosis. METHODS: We applied a novel natural language processing (NLP) algorithm to a longitudinal electronic health records (EHR) dataset of >27,000 individuals with genetic data from a multi-ethnic biobank, defining NAFLD cases (n = 1,703) and confirming controls (n = 8,119). We conducted i) a survival analysis to determine if age at diagnosis differed by rs738409 genotype, ii) a receiver operating characteristics analysis to assess the utility of the rs738409 genotype in discriminating NAFLD cases from controls, and iii) a phenome-wide association study (PheWAS) between rs738409 and 10,095 EHR-derived disease diagnoses. RESULTS: The PNPLA3 G risk allele was associated with: i) earlier age of NAFLD diagnosis, with the strongest effect in Hispanics (hazard ratio 1.33; 95% CI 1.15-1.53; p <0.0001) among whom a NAFLD diagnosis was 15% more likely in risk allele carriers vs. non-carriers; ii) increased NAFLD risk (odds ratio 1.61; 95% CI 1.349-1.73; p <0.0001), with the strongest effect among Hispanics (odds ratio 1.43; 95% CI 1.28-1.59; p <0.0001); iii) additional liver diseases in a PheWAS (p <4.95 × 10⁻⁶) where the risk variant also associated with earlier age of diagnosis. CONCLUSION: Given the role of rs738409 in NAFLD diagnosis age, our results suggest that stratifying risk within populations known to have an enhanced risk of liver disease, such as Hispanic carriers of the rs738409 variant, would be effective in earlier identification of those who would benefit most from early NAFLD prevention and treatment strategies.
LAY SUMMARY: Despite clear associations between the PNPLA3 rs738409 variant and elevated risk of progression from non-alcoholic fatty liver disease (NAFLD) to more severe forms of liver disease, it remains unknown if PNPLA3 rs738409 plays a role in the age of NAFLD onset. Herein, we found that this risk variant is associated with an earlier age of NAFLD and other liver disease diagnoses; an observation most pronounced in Hispanic Americans. We conclude that PNPLA3 rs738409 could be used to better understand liver disease risk within vulnerable populations and identify patients that may benefit from early prevention strategies.

Subject(s)

Biological Specimen Banks , Lipase/genetics , Membrane Proteins/genetics , Non-alcoholic Fatty Liver Disease/diagnosis , Non-alcoholic Fatty Liver Disease/genetics , Polymorphism, Single Nucleotide , Adolescent , Adult , Age Factors , Aged , Aged, 80 and over , Alleles , Case-Control Studies , Child , Child, Preschool , Electronic Health Records , Female , Gene Frequency , Genetic Predisposition to Disease , Genotype , Hispanic or Latino/genetics , Humans , Infant , Infant, Newborn , Kaplan-Meier Estimate , Longitudinal Studies , Male , Middle Aged , Non-alcoholic Fatty Liver Disease/ethnology , Non-alcoholic Fatty Liver Disease/mortality , Young Adult

12.

Natural language processing of electronic health records is superior to billing codes to identify symptom burden in hemodialysis patients.

Chan, Lili; Beers, Kelly; Yau, Amy A; Chauhan, Kinsuk; Duffy, Áine; Chaudhary, Kumardeep; Debnath, Neha; Saha, Aparna; Pattharanitima, Pattharawin; Cho, Judy; Kotanko, Peter; Federman, Alex; Coca, Steven G; Van Vleck, Tielman; Nadkarni, Girish N.

Kidney Int ; 97(2): 383-392, 2020 02.

Article in English | MEDLINE | ID: mdl-31883805

ABSTRACT

Symptoms are common in patients on maintenance hemodialysis but identification is challenging. New informatics approaches including natural language processing (NLP) can be utilized to identify symptoms from narrative clinical documentation. Here we utilized NLP to identify seven patient symptoms from notes of maintenance hemodialysis patients of the BioMe Biobank and validated our findings using a separate cohort and the MIMIC-III database. NLP performance was compared for symptom detection with International Classification of Diseases (ICD)-9/10 codes and the performance of both methods was validated against manual chart review. From 1034 and 519 hemodialysis patients within BioMe and MIMIC-III databases, respectively, the most frequently identified symptoms by NLP were fatigue, pain, and nausea/vomiting. In BioMe, sensitivity for NLP (0.85-0.99) was higher than for ICD codes (0.09-0.59) for all symptoms with similar results in the BioMe validation cohort and MIMIC-III. ICD codes were significantly more specific for nausea/vomiting in BioMe and more specific for fatigue, depression, and pain in the MIMIC-III database. A majority of patients in both cohorts had four or more symptoms. Patients with more symptoms identified by NLP, ICD, and chart review had more clinical encounters. NLP had higher specificity in inpatient notes but higher sensitivity in outpatient notes and performed similarly across pain severity subgroups. Thus, NLP had higher sensitivity compared to ICD codes for identification of seven common hemodialysis-related symptoms, with comparable specificity between the two methods. Hence, NLP may be useful for the high-throughput identification of patient-centered outcomes when using electronic health records.

Subject(s)

Electronic Health Records , Natural Language Processing , Algorithms , Databases, Factual , Humans , Renal Dialysis/adverse effects

13.

Association of the V122I Hereditary Transthyretin Amyloidosis Genetic Variant With Heart Failure Among Individuals of African or Hispanic/Latino Ancestry.

Damrauer, Scott M; Chaudhary, Kumardeep; Cho, Judy H; Liang, Lusha W; Argulian, Edgar; Chan, Lili; Dobbyn, Amanda; Guerraty, Marie A; Judy, Renae; Kay, Jenna; Kember, Rachel L; Levin, Michael G; Saha, Aparna; Van Vleck, Tielman; Verma, Shefali S; Weaver, JoEllen; Abul-Husn, Noura S; Baras, Aris; Chirinos, Julio A; Drachman, Brian; Kenny, Eimear E; Loos, Ruth J F; Narula, Jagat; Overton, John; Reid, Jeffrey; Ritchie, Marylyn; Sirugo, Giorgio; Nadkarni, Girish; Rader, Daniel J; Do, Ron.

JAMA ; 322(22): 2191-2202, 2019 12 10.

Article in English | MEDLINE | ID: mdl-31821430

ABSTRACT

Importance: Hereditary transthyretin (TTR) amyloid cardiomyopathy (hATTR-CM) due to the TTR V122I variant is an autosomal-dominant disorder that causes heart failure in elderly individuals of African ancestry. The clinical associations of carrying the variant, its effect in other African ancestry populations including Hispanic/Latino individuals, and the rates of achieving a clinical diagnosis in carriers are unknown. Objective: To assess the association between the TTR V122I variant and heart failure and identify rates of hATTR-CM diagnosis among carriers with heart failure. Design, Setting, and Participants: Cross-sectional analysis of carriers and noncarriers of TTR V122I of African ancestry aged 50 years or older enrolled in the Penn Medicine Biobank between 2008 and 2017 using electronic health record data from 1996 to 2017. Case-control study in participants of African and Hispanic/Latino ancestry with and without heart failure in the Mount Sinai BioMe Biobank enrolled between 2007 and 2015 using electronic health record data from 2007 to 2018. Exposures: TTR V122I carrier status. Main Outcomes and Measures: The primary outcome was prevalent heart failure. The rate of diagnosis with hATTR-CM among TTR V122I carriers with heart failure was measured. Results: The cross-sectional cohort included 3724 individuals of African ancestry with a median age of 64 years (interquartile range, 57-71); 1755 (47%) were male, 2896 (78%) had a diagnosis of hypertension, and 753 (20%) had a history of myocardial infarction or coronary revascularization. There were 116 TTR V122I carriers (3.1%); 1121 participants (30%) had heart failure. The case-control study consisted of 2307 individuals of African ancestry and 3663 Hispanic/Latino individuals; the median age was 73 years (interquartile range, 68-80), 2271 (38%) were male, 4709 (79%) had a diagnosis of hypertension, and 1008 (17%) had a history of myocardial infarction or coronary revascularization. 
There were 1376 cases of heart failure. TTR V122I was associated with higher rates of heart failure (cross-sectional cohort: n = 51/116 TTR V122I carriers [44%], n = 1070/3608 noncarriers [30%], adjusted odds ratio, 1.7 [95% CI, 1.2-2.4], P = .006; case-control study: n = 36/1376 heart failure cases [2.6%], n = 82/4594 controls [1.8%], adjusted odds ratio, 1.8 [95% CI, 1.2-2.7], P = .008). Ten of 92 TTR V122I carriers with heart failure (11%) were diagnosed as having hATTR-CM; the median time from onset of symptoms to clinical diagnosis was 3 years. Conclusions and Relevance: Among individuals of African or Hispanic/Latino ancestry enrolled in 2 academic medical center-based biobanks, the TTR V122I genetic variant was significantly associated with heart failure.
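The carrier and noncarrier counts quoted in this abstract are enough to compute an unadjusted (crude) odds ratio, which gives a sense of scale for the adjusted estimate. The sketch below assumes only the counts printed in the abstract; the function name is illustrative, and the result is not the authors' adjusted odds ratio, which accounts for covariates that a 2x2 table cannot.

```python
def crude_odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Unadjusted odds ratio from a 2x2 table given case counts and group totals."""
    exposed_noncases = exposed_total - exposed_cases
    unexposed_noncases = unexposed_total - unexposed_cases
    # Odds of the outcome among the exposed divided by odds among the unexposed.
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Cross-sectional cohort counts quoted in the abstract:
# 51 of 116 TTR V122I carriers and 1070 of 3608 noncarriers had heart failure.
cross_sectional_or = crude_odds_ratio(51, 116, 1070, 3608)
print(round(cross_sectional_or, 2))
```

The crude value from these counts is somewhat higher than the reported adjusted odds ratio of 1.7, which is the expected kind of difference when adjustment covariates are correlated with both carrier status and heart failure.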

Subject(s)

Amyloid Neuropathies, Familial/genetics , Black or African American/genetics , Heart Failure/genetics , Hispanic or Latino/genetics , Prealbumin/genetics , Academic Medical Centers , Aged , Amyloid Neuropathies, Familial/complications , Amyloid Neuropathies, Familial/ethnology , Biological Specimen Banks , Case-Control Studies , Cross-Sectional Studies , Female , Genetic Variation , Heart Failure/ethnology , Humans , Male , Middle Aged

14.

Augmented intelligence with natural language processing applied to electronic health records for identifying patients with non-alcoholic fatty liver disease at risk for disease progression.

Van Vleck, Tielman T; Chan, Lili; Coca, Steven G; Craven, Catherine K; Do, Ron; Ellis, Stephen B; Kannry, Joseph L; Loos, Ruth J F; Bonis, Peter A; Cho, Judy; Nadkarni, Girish N.

Int J Med Inform ; 129: 334-341, 2019 09.

Article in English | MEDLINE | ID: mdl-31445275

ABSTRACT

OBJECTIVE: Electronic health record (EHR) systems contain structured data (such as diagnostic codes) and unstructured data (clinical documentation). Clinical insights can be derived from analyzing both. The use of natural language processing (NLP) algorithms to effectively analyze unstructured data has been well demonstrated. Here we examine the utility of NLP for the identification of patients with non-alcoholic fatty liver disease (NAFLD), assess patterns of disease progression, and identify gaps in care related to breakdown in communication among providers. MATERIALS AND METHODS: All clinical notes available on the 38,575 patients enrolled in the Mount Sinai BioMe cohort were loaded into the NLP system. We compared analysis of structured and unstructured EHR data using NLP, free-text search, and diagnostic codes with validation against expert adjudication. We then used the NLP findings to measure physician impression of progression from early-stage NAFLD to NASH or cirrhosis. Similarly, we used the same NLP findings to identify mentions of NAFLD in radiology reports that did not persist into clinical notes. RESULTS: Out of 38,575 patients, we identified 2,281 patients with NAFLD. From the remainder, 10,653 patients with similar data density were selected as a control group. NLP outperformed ICD and text search in both sensitivity (NLP: 0.93, ICD: 0.28, text search: 0.81) and F2 score (NLP: 0.92, ICD: 0.34, text search: 0.81). Of 2,281 NAFLD patients, 673 (29.5%) were believed to have progressed to NASH or cirrhosis. Among the 176 patients in whom NAFLD was noted prior to NASH, the average progression time was 410 days. In 619 (27.1%) NAFLD patients, the condition was documented only in radiology notes and not acknowledged in other forms of clinical documentation. Of these, 170 (28.4%) were later identified as having likely developed NASH or cirrhosis after a median 1057.3 days. DISCUSSION: NLP-based approaches were more accurate at identifying NAFLD within the EHR than ICD/text search-based approaches. Suspected NAFLD on imaging is often not acknowledged in subsequent clinical documentation. Many such patients are later found to have more advanced liver disease. Analysis of information flows demonstrated loss of key information that could have been used to help prevent the progression of early NAFLD (NAFL) to NASH or cirrhosis. CONCLUSION: For identification of NAFLD, NLP performed better than alternative selection modalities. It then facilitated analysis of knowledge flow between physicians and enabled the identification of breakdowns where key information was lost that could have slowed or prevented later disease progression.
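
The abstract above reports an F2 score alongside sensitivity. As a rough illustration only, the sketch below shows how an F-beta score is conventionally computed from precision and recall (sensitivity); with beta = 2, recall is weighted more heavily than precision. The function names and the example counts are hypothetical and are not taken from the study, which does not report precision in the abstract.

```python
def scores_from_counts(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts.

    Recall is the same quantity the abstract calls sensitivity.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta > 1 favors recall over precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


# Hypothetical counts for illustration only (not study data).
p, r = scores_from_counts(tp=8, fp=2, fn=2)
f2 = f_beta(p, r, beta=2.0)
```

Because the F2 score leans toward recall, a method with high sensitivity but modest precision can still score well, which is one plausible reason to prefer it when missed cases are costlier than false alarms.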

Subject(s)

Electronic Health Records , Natural Language Processing , Non-alcoholic Fatty Liver Disease/diagnosis , Algorithms , Cohort Studies , Disease Progression , Female , Humans , Male , Middle Aged

15.

Rate of Correction of Hypernatremia and Health Outcomes in Critically Ill Patients.

Chauhan, Kinsuk; Pattharanitima, Pattharawin; Patel, Niralee; Duffy, Aine; Saha, Aparna; Chaudhary, Kumardeep; Debnath, Neha; Van Vleck, Tielman; Chan, Lili; Nadkarni, Girish N; Coca, Steven G.

Clin J Am Soc Nephrol ; 14(5): 656-663, 2019 05 07.

Article in English | MEDLINE | ID: mdl-30948456

ABSTRACT

BACKGROUND AND OBJECTIVES: Hypernatremia is common in hospitalized, critically ill patients. Although there are no clear guidelines on sodium correction rate for hypernatremia, some studies suggest a reduction rate not to exceed 0.5 mmol/L per hour. However, the data supporting this recommendation and the optimal rate of hypernatremia correction in hospitalized adults are unclear. DESIGN, SETTING, PARTICIPANTS, & MEASUREMENTS: We assessed the association of hypernatremia correction rates with neurologic outcomes and mortality in critically ill patients with hypernatremia at admission and those who developed hypernatremia during hospitalization. We used data from the Medical Information Mart for Intensive Care-III and identified patients with hypernatremia (serum sodium level >155 mmol/L) that was present on admission (n=122) or hospital-acquired (n=327). We calculated different ranges of rapid correction rates (>0.5 mmol/L per hour overall and >8, >10, and >12 mmol/L per 24 hours) and utilized logistic regression to generate adjusted odds ratios (aOR) with 95% confidence intervals (95% CIs) to examine association with outcomes. RESULTS: We had complete data on 122 patients with severe hypernatremia on admission and 327 patients who developed hospital-acquired hypernatremia. The difference in in-hospital 30-day mortality proportion between rapid (>0.5 mmol/L per hour) and slower (≤0.5 mmol/L per hour) correction rates was not significant either in patients with hypernatremia at admission with rapid versus slow correction (25% versus 28%; P=0.80) or in patients with hospital-acquired hypernatremia with rapid versus slow correction (44% versus 40%; P=0.50). There was no difference in aOR of mortality for rapid versus slow correction in either admission (aOR, 1.3; 95% CI, 0.5 to 3.7) or hospital-acquired hypernatremia (aOR, 1.3; 95% CI, 0.8 to 2.3). Manual chart review of all suspected chronic hypernatremia patients, which included all 122 with hypernatremia at admission, 128 of the 327 with hospital-acquired hypernatremia, and an additional 28 patients with ICD-9 codes for cerebral edema, seizures and/or alteration of consciousness, did not reveal a single case of cerebral edema attributable to rapid hypernatremia correction. CONCLUSIONS: We did not find any evidence that rapid correction of hypernatremia is associated with a higher risk for mortality, seizure, alteration of consciousness, and/or cerebral edema in critically ill adult patients with either admission or hospital-acquired hypernatremia.
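
The thresholds named in the abstract above (>0.5 mmol/L per hour) can be illustrated with a small sketch. This assumes the rate is simply the fall in serum sodium divided by elapsed hours; the abstract does not state how the authors derived rates from serial measurements, so the function names, the calculation, and the example values are all hypothetical.

```python
def correction_rate(initial_na: float, final_na: float, hours: float) -> float:
    """Average fall in serum sodium, in mmol/L per hour (assumed definition)."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (initial_na - final_na) / hours


def is_rapid(rate_per_hour: float, threshold: float = 0.5) -> bool:
    """True when the rate strictly exceeds the threshold (mmol/L per hour)."""
    return rate_per_hour > threshold


# Hypothetical values for illustration only (not patient data):
# a fall from 160 to 145 mmol/L over 24 hours.
rate = correction_rate(160.0, 145.0, 24.0)
rapid = is_rapid(rate)
```

Note that a fall of exactly 12 mmol/L over 24 hours gives 0.5 mmol/L per hour, which falls in the slower (≤0.5) group under the strict inequality used here.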

Subject(s)

Critical Illness , Hypernatremia/therapy , Aged , Aged, 80 and over , Cohort Studies , Female , Hospital Mortality , Humans , Hypernatremia/complications , Hypernatremia/mortality , Male , Middle Aged , Sodium/blood

16.

Genetic identification of a common collagen disease in Puerto Ricans via identity-by-descent mapping in a health system.

Belbin, Gillian Morven; Odgis, Jacqueline; Sorokin, Elena P; Yee, Muh-Ching; Kohli, Sumita; Glicksberg, Benjamin S; Gignoux, Christopher R; Wojcik, Genevieve L; Van Vleck, Tielman; Jeff, Janina M; Linderman, Michael; Schurmann, Claudia; Ruderfer, Douglas; Cai, Xiaoqiang; Merkelson, Amanda; Justice, Anne E; Young, Kristin L; Graff, Misa; North, Kari E; Peters, Ulrike; James, Regina; Hindorff, Lucia; Kornreich, Ruth; Edelmann, Lisa; Gottesman, Omri; Stahl, Eli Ea; Cho, Judy H; Loos, Ruth Jf; Bottinger, Erwin P; Nadkarni, Girish N; Abul-Husn, Noura S; Kenny, Eimear E.

Elife ; 6, 2017 09 12.

Article in English | MEDLINE | ID: mdl-28895531

ABSTRACT

Achieving confidence in the causality of a disease locus is a complex task that often requires supporting data from both statistical genetics and clinical genomics. Here we describe a combined approach to identify and characterize a genetic disorder that leverages distantly related patients in a health system and population-scale mapping. We utilize genomic data to uncover components of distant pedigrees, in the absence of recorded pedigree information, in the multi-ethnic BioMe biobank in New York City. By linking to medical records, we discover a locus associated with both elevated genetic relatedness and extreme short stature. We link the gene, COL27A1, with a little-known genetic disease, previously thought to be rare and recessive. We demonstrate that disease manifests in both heterozygotes and homozygotes, indicating a common collagen disorder impacting up to 2% of individuals of Puerto Rican ancestry, leading to a better understanding of the continuum of complex and Mendelian disease.

Subject(s)

Collagen Diseases/epidemiology , Collagen Diseases/genetics , Fibrillar Collagens/genetics , Molecular Epidemiology , Pedigree , Adolescent , Adult , Aged , Child , Female , Genotype , Heterozygote , Hispanic or Latino , Homozygote , Humans , Male , Middle Aged , Multigene Family , Musculoskeletal Diseases/epidemiology , Musculoskeletal Diseases/genetics , New York City/epidemiology , New York City/ethnology , Whole Genome Sequencing , Young Adult

17.

Corpus-Based Problem Selection for EHR Note Summarization.

Van Vleck, Tielman T; Elhadad, Noémie.

AMIA Annu Symp Proc ; 2010: 817-21, 2010 Nov 13.

Article in English | MEDLINE | ID: mdl-21347092

ABSTRACT

Physicians have access to patient notes in volumes far greater than what is practical to read within the context of a standard clinical scenario. As a preliminary step toward being able to provide a longitudinal summary of patient history, methods are examined for the automated extraction of relevant patient problems from existing clinical notes. We explore a grounded approach to identifying important patient problems from patient history. Methods build on existing NLP and text-summarization methodologies and leverage features observed in a relevant corpus.

Subject(s)

Electronic Health Records , Physicians , Humans , Natural Language Processing

18.

Content and structure of clinical problem lists: a corpus analysis.

Van Vleck, Tielman T; Wilcox, Adam; Stetson, Peter D; Johnson, Stephen B; Elhadad, Noémie.

AMIA Annu Symp Proc ; : 753-7, 2008 Nov 06.

Article in English | MEDLINE | ID: mdl-18999284

ABSTRACT

In the interest of designing an automated high-level, longitudinal clinical summary of a patient record, we analyze traditional ways in which medical problems pertaining to the patient are summarized in the electronic health record. The patient problem list has become a commonly used proxy for a summary of patient history, and automated methods have been proposed to generate it. However, little research has been conducted on how to structure the problem list in a manner most effective for supporting clinical care. This study analyzes the structure and content of the Past Medical History (PMH) sections of a large corpus of clinical notes, as a proxy for problem lists. Findings show that when listing a patient's history, physicians convey several semantic types of information, not only problems. Furthermore, they often group related concepts in a single line of the PMH. In contrast, traditional problem lists allow only a simple enumeration of coded terms. Content analysis goes on to reiterate the value of more complex representations as well as provide valuable data and guidelines for automated generation of a clinical summary.

Subject(s)

Information Storage and Retrieval/methods , Medical History Taking/methods , Medical Records Systems, Computerized/statistics & numerical data , Medical Records, Problem-Oriented/statistics & numerical data , Natural Language Processing , Pattern Recognition, Automated/methods , Algorithms , Artificial Intelligence , Clinical Protocols , New York , Subject Headings

19.

An electronic health record based on structured narrative.

Johnson, Stephen B; Bakken, Suzanne; Dine, Daniel; Hyun, Sookyung; Mendonça, Eneida; Morrison, Frances; Bright, Tiffani; Van Vleck, Tielman; Wrenn, Jesse; Stetson, Peter.

J Am Med Inform Assoc ; 15(1): 54-64, 2008.

Article in English | MEDLINE | ID: mdl-17947628

ABSTRACT

OBJECTIVE: To develop an electronic health record that facilitates rapid capture of detailed narrative observations from clinicians, with partial structuring of narrative information for integration and reuse. DESIGN: We propose a design in which unstructured text and coded data are fused into a single model called structured narrative. Each major clinical event (e.g., encounter or procedure) is represented as a document that is marked up to identify gross structure (sections, fields, paragraphs, lists) as well as fine structure within sentences (concepts, modifiers, relationships). Marked up items are associated with standardized codes that enable linkage to other events, as well as efficient reuse of information, which can speed up data entry by clinicians. Natural language processing is used to identify fine structure, which can reduce the need for form-based entry. VALIDATION: The model is validated through an example of use by a clinician, with discussion of relevant aspects of the user interface, data structures and processing rules. DISCUSSION: The proposed model represents all patient information as documents with standardized gross structure (templates). Clinicians enter their data as free text, which is coded by natural language processing in real time making it immediately usable for other computation, such as alerts or critiques. In addition, the narrative data annotates and augments structured data with temporal relations, severity and degree modifiers, causal connections, clinical explanations and rationale. CONCLUSION: Structured narrative has potential to facilitate capture of data directly from clinicians by allowing freedom of expression, giving immediate feedback, supporting reuse of clinical information and structuring data for subsequent processing, such as quality assurance and clinical research.

Subject(s)

Medical Records Systems, Computerized , Natural Language Processing , User-Computer Interface , Documentation , Humans , Information Storage and Retrieval/methods , Medical History Taking , Software , Systems Integration , Vocabulary, Controlled

20.

Assessing data relevance for automated generation of a clinical summary.

Van Vleck, Tielman T; Stein, Daniel M; Stetson, Peter D; Johnson, Stephen B.

AMIA Annu Symp Proc ; : 761-5, 2007 Oct 11.

Article in English | MEDLINE | ID: mdl-18693939

ABSTRACT

Clinicians perform many tasks in their daily work requiring summarization of clinical data. However, as technology makes more data available, the challenges of data overload become ever more significant. As interoperable data exchange between hospitals becomes more common, there is an increased need for tools to summarize information. Our goal is to develop automated tools to aid clinical data summarization. Structured interviews were conducted with physicians to identify information from an electronic health record they considered relevant to explaining the patient's medical history. Desirable data types were systematically evaluated using qualitative and quantitative analysis to assess data categories and patterns of data use. We report here on the implications of these results for the design of automated tools for summarization of patient history.

Subject(s)

Attitude of Health Personnel , Medical Records Systems, Computerized , Documentation/standards , Electronic Data Processing , Humans , Interviews as Topic , Medical Record Linkage , Medical Records Systems, Computerized/standards

