ABSTRACT
Kidney disease affects 50% of all diabetic patients; however, prediction of disease progression has been challenging due to inherent disease heterogeneity. We use deep learning to identify novel genetic signatures prognostically associated with outcomes. Using autoencoders and unsupervised clustering of electronic health record data on 1,372 diabetic kidney disease patients, we establish two clusters with differential prevalence of end-stage kidney disease. Exome-wide associations identify a novel variant in ARHGEF18, a Rho guanine exchange factor specifically expressed in glomeruli. Overexpression of ARHGEF18 in human podocytes leads to impairments in focal adhesion architecture, cytoskeletal dynamics, cellular motility, and RhoA/Rac1 activation. Mutant GEF18 is resistant to ubiquitin-mediated degradation, leading to pathologically increased protein levels. Our findings uncover the first known disease-causing genetic variant that affects protein stability of a cytoskeletal regulator through impaired degradation, a potentially novel class of expression quantitative trait loci that can be therapeutically targeted.
ABSTRACT
Inflammatory bowel disease (IBD) is a group of chronic digestive tract inflammatory conditions whose genetic etiology is still poorly understood. The incidence of IBD is particularly high among Ashkenazi Jews. Here, we identify 8 novel and plausible IBD-causing genes from the exomes of 4453 genetically identified Ashkenazi Jewish IBD cases (1734) and controls (2719). Various biological pathway analyses are performed, along with bulk and single-cell RNA sequencing, to demonstrate the likely physiological relatedness of the novel genes to IBD. Importantly, we demonstrate that the rare and high impact genetic architecture of Ashkenazi Jewish adult IBD displays significant overlap with very early onset-IBD genetics. Moreover, by performing biobank phenome-wide analyses, we find that IBD genes have pleiotropic effects that involve other immune responses. Finally, we show that polygenic risk score analyses based on genome-wide high impact variants have high power to predict IBD susceptibility.
Subject(s)
Inflammatory Bowel Diseases , Jews , Adult , Humans , Jews/genetics , Exome/genetics , Inflammatory Bowel Diseases/genetics , Risk Assessment , Genetic Predisposition to Disease
ABSTRACT
Unstructured data in the electronic health records contain essential patient information. Natural language processing (NLP), teaching a computer to read, allows us to tap into these data without needing the time and effort of manual chart abstraction. The core first step for all NLP algorithms is preprocessing the text to identify the core words that differentiate the text while filtering out the noise. Traditional NLP uses a rule-based approach, applying grammatical rules to infer meaning from the text. Newer NLP approaches use machine learning/deep learning which can infer meaning without explicitly being programmed. NLP use in nephrology research has focused on identifying distinct disease processes, such as CKD, and extraction of patient-oriented outcomes such as symptoms with high sensitivity. NLP can identify patient features from clinical text associated with acute kidney injury and progression of CKD. Lastly, inclusion of features extracted using NLP improved the performance of risk-prediction models compared to models that only use structured data. Implementation of NLP algorithms has been slow, partially hindered by the lack of external validation of NLP algorithms. However, NLP allows for extraction of key patient characteristics from free text, an infrequently used resource in nephrology.
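The preprocessing step described above (reducing free text to the words that carry signal and dropping the noise) can be sketched in a few lines of standard-library Python. This is only an illustration: the stop-word list, the function names, and the sample note are assumptions made for this example and are not taken from any study summarized here.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stop-word list; real pipelines use much larger ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "with", "has", "is", "for", "to", "in", "on"}


def preprocess(note: str) -> List[str]:
    """Lowercase a note, split it into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", note.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_counts(note: str) -> Counter:
    """Count the content words that survive preprocessing."""
    return Counter(preprocess(note))


if __name__ == "__main__":
    sample = "The patient has fatigue and nausea; fatigue is worse on dialysis days."
    print(preprocess(sample))
    print(term_counts(sample).most_common(3))
```

A rule-based or machine-learning model would then operate on these tokens rather than on the raw note text.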
Subject(s)
Nephrology , Renal Insufficiency, Chronic , Algorithms , Electronic Health Records , Humans , Natural Language Processing , Renal Insufficiency, Chronic/therapy
ABSTRACT
Identifying whether a given genetic mutation results in a gene product with increased (gain-of-function; GOF) or diminished (loss-of-function; LOF) activity is an important step toward understanding disease mechanisms because they may result in markedly different clinical phenotypes. Here, we generated an extensive database of documented germline GOF and LOF pathogenic variants by employing natural language processing (NLP) on the available abstracts in the Human Gene Mutation Database. We then investigated various gene- and protein-level features of GOF and LOF variants and applied machine learning and statistical analyses to identify discriminative features. We found that GOF variants were enriched in essential genes, for autosomal-dominant inheritance, and in protein binding and interaction domains, whereas LOF variants were enriched in singleton genes, for protein-truncating variants, and in protein core regions. We developed a user-friendly web-based interface that enables the extraction of selected subsets from the GOF/LOF database by a broad set of annotated features and downloading of up-to-date versions. These results improve our understanding of how variants affect gene/protein function and may ultimately guide future treatment options.
Subject(s)
Databases, Genetic , Gain of Function Mutation , Loss of Function Mutation , Proteins/genetics , Cloud Computing , Genetic Predisposition to Disease , Genome, Human , Germ-Line Mutation , Humans , Internet-Based Intervention , Machine Learning
ABSTRACT
Polygenic risk scores (PRS) summarize genetic liability to a disease at the individual level, and the aim is to use them as biomarkers of disease and poor outcomes in real-world clinical practice. To date, few studies have assessed the prognostic value of PRS relative to standards of care. Schizophrenia (SCZ), the archetypal psychotic illness, is an ideal test case for this because the predictive power of the SCZ PRS exceeds that of most other common diseases. Here, we analyzed clinical and genetic data from two multi-ethnic cohorts totaling 8,541 adults with SCZ and related psychotic disorders, to assess whether the SCZ PRS improves the prediction of poor outcomes relative to clinical features captured in a standard psychiatric interview. For all outcomes investigated, the SCZ PRS did not improve the performance of predictive models, an observation that was generally robust to divergent case ascertainment strategies and the ancestral background of the study participants.
Subject(s)
Genetic Predisposition to Disease , Multifactorial Inheritance/genetics , Psychotic Disorders/genetics , Schizophrenia/genetics , Adult , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Prognosis , Psychotic Disorders/pathology , Risk Factors , Schizophrenia/pathology
ABSTRACT
Diabetic retinopathy (DR) is a common consequence in type 2 diabetes (T2D) and a leading cause of blindness in working-age adults. Yet, its genetic predisposition is largely unknown. Here, we examined the polygenic architecture underlying DR by deriving and assessing a genome-wide polygenic risk score (PRS) for DR. We evaluated the PRS in 6079 individuals with T2D of European, Hispanic, African and other ancestries from a large-scale multi-ethnic biobank. Main outcomes were PRS association with DR diagnosis, symptoms and complications, and time to diagnosis, and transferability to non-European ancestries. We observed that PRS was significantly associated with DR. A standard deviation increase in PRS was accompanied by an adjusted odds ratio (OR) of 1.12 [95% confidence interval (CI) 1.04-1.20; P = 0.001] for DR diagnosis. When stratified by ancestry, PRS was associated with the highest OR in European ancestry (OR = 1.22, 95% CI 1.02-1.41; P = 0.049), followed by African (OR = 1.15, 95% CI 1.03-1.28; P = 0.028) and Hispanic ancestries (OR = 1.10, 95% CI 1.00-1.10; P = 0.050). Individuals in the top PRS decile had a 1.8-fold elevated risk for DR versus the bottom decile (P = 0.002). Among individuals without DR diagnosis, the top PRS decile had more DR symptoms than the bottom decile (P = 0.008). The PRS was associated with retinal hemorrhage (OR = 1.44, 95% CI 1.03-2.02; P = 0.03) and earlier DR presentation (10% probability of DR by 4 years in the top PRS decile versus 8 years in the bottom decile). These results establish the significant polygenic underpinnings of DR and indicate the need for more diverse ancestries in biobanks to develop multi-ancestral PRS.
Subject(s)
Diabetes Mellitus, Type 2/epidemiology , Diabetic Retinopathy/epidemiology , Genetic Predisposition to Disease , Genome-Wide Association Study , Adult , Aged , Black People/genetics , Diabetes Mellitus, Type 2/complications , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/pathology , Diabetic Retinopathy/complications , Diabetic Retinopathy/genetics , Diabetic Retinopathy/pathology , Hispanic or Latino/genetics , Humans , Middle Aged , Multifactorial Inheritance/genetics , Risk Assessment , Risk Factors , White People/genetics
ABSTRACT
BACKGROUND/AIMS: Acute kidney injury (AKI) in critically ill patients is common, and continuous renal replacement therapy (CRRT) is a preferred mode of renal replacement therapy (RRT) in hemodynamically unstable patients. Prediction of clinical outcomes in patients on CRRT is challenging. We utilized several approaches to predict RRT-free survival (RRTFS) in critically ill patients with AKI requiring CRRT. METHODS: We used the Medical Information Mart for Intensive Care (MIMIC-III) database to identify patients ≥18 years old with AKI on CRRT, after excluding patients with ESRD on chronic dialysis and kidney transplant recipients. We defined RRTFS as being discharged alive without requiring RRT for ≥7 days prior to hospital discharge. We utilized all available biomedical data up to CRRT initiation. We evaluated 7 approaches, including logistic regression (LR), random forest (RF), support vector machine (SVM), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost), multilayer perceptron (MLP), and MLP with long short-term memory (MLP + LSTM). We evaluated model performance by using area under the receiver operating characteristic (AUROC) curves. RESULTS: Out of 684 patients with AKI on CRRT, 205 (30%) patients had RRTFS. The median age of patients was 63 years and their median Simplified Acute Physiology Score (SAPS) II was 67 (interquartile range 52-84). The MLP + LSTM showed the highest AUROC (95% CI) of 0.70 (0.67-0.73), followed by MLP 0.59 (0.54-0.64), LR 0.57 (0.52-0.62), SVM 0.51 (0.46-0.56), AdaBoost 0.51 (0.46-0.55), RF 0.44 (0.39-0.48), and XGBoost 0.43 (0.38-0.47). CONCLUSIONS: An MLP + LSTM model outperformed other approaches for predicting RRTFS. Performance could be further improved by incorporating other data types.
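For readers less familiar with the metric used to compare the models above, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, so 0.5 corresponds to chance-level ranking. The sketch below computes it by direct pair counting; the function name and the toy labels and scores are assumptions for illustration and have no connection to the MIMIC-III data.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC by pair counting: the share of (positive, negative) pairs in which
    the positive case has the higher score, with ties counted as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 1 = outcome occurred, 0 = it did not (illustrative only).
    y = [1, 0, 1, 0]
    s = [0.9, 0.1, 0.4, 0.6]
    print(auroc(y, s))
```

Pair counting is quadratic in the number of cases, which is fine for a cohort of a few hundred patients; a rank-based formula gives the same value more efficiently on large datasets.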
Subject(s)
Acute Kidney Injury/therapy , Renal Replacement Therapy , Acute Kidney Injury/diagnosis , Age Factors , Aged , Critical Care , Female , Humans , Logistic Models , Machine Learning , Male , Middle Aged , Prognosis
ABSTRACT
BACKGROUND AND OBJECTIVES: Sepsis-associated AKI is a heterogeneous clinical entity. We aimed to agnostically identify sepsis-associated AKI subphenotypes using deep learning on routinely collected data in electronic health records. DESIGN, SETTING, PARTICIPANTS, & MEASUREMENTS: We used the Medical Information Mart for Intensive Care III database, which consists of electronic health record data from intensive care units in a tertiary care hospital in the United States. We included patients ≥18 years with sepsis who developed AKI within 48 hours of intensive care unit admission. We then used deep learning to utilize all available vital signs, laboratory measurements, and comorbidities to identify subphenotypes. Outcomes were mortality 28 days after AKI and dialysis requirement. RESULTS: We identified 4001 patients with sepsis-associated AKI. We utilized 2546 combined features for K-means clustering, identifying three subphenotypes. Subphenotype 1 had 1443 patients, and subphenotype 2 had 1898 patients, whereas subphenotype 3 had 660 patients. Subphenotype 1 had the lowest proportion of liver disease and lowest Simplified Acute Physiology Score II scores compared with subphenotypes 2 and 3. The proportions of patients with CKD were similar between subphenotypes 1 and 3 (15%) but highest in subphenotype 2 (21%). Subphenotype 1 had lower median bilirubin levels, aspartate aminotransferase, and alanine aminotransferase compared with subphenotypes 2 and 3. Patients in subphenotype 1 also had lower median lactate, lactate dehydrogenase, and white blood cell count than patients in subphenotypes 2 and 3. Subphenotype 1 also had lower creatinine and BUN than subphenotypes 2 and 3. Dialysis requirement was lowest in subphenotype 1 (4% versus 7% [subphenotype 2] versus 26% [subphenotype 3]). The mortality 28 days after AKI was lowest in subphenotype 1 (23% versus 35% [subphenotype 2] versus 49% [subphenotype 3]). 
After adjustment, the odds ratio for mortality for subphenotype 3, with subphenotype 1 as the reference, was 1.9 (95% confidence interval, 1.5 to 2.4). CONCLUSIONS: Utilizing routinely collected laboratory variables, vital signs, and comorbidities, we were able to identify three distinct subphenotypes of sepsis-associated AKI with differing outcomes.
Subject(s)
Acute Kidney Injury/classification , Acute Kidney Injury/mortality , Deep Learning , Liver Diseases/epidemiology , Sepsis/complications , Acute Kidney Injury/microbiology , Acute Kidney Injury/therapy , Aged , Alanine Transaminase/blood , Bilirubin/blood , Blood Urea Nitrogen , Comorbidity , Creatinine/blood , Databases, Factual , Electronic Health Records , Female , Glutamyl Aminopeptidase/blood , Humans , L-Lactate Dehydrogenase/blood , Lactic Acid/blood , Leukocyte Count , Male , Middle Aged , Phenotype , Prognosis , Renal Dialysis , Simplified Acute Physiology Score , United States/epidemiology
ABSTRACT
BACKGROUND: The degree of myocardial injury, as reflected by troponin elevation, and associated outcomes among U.S. hospitalized patients with coronavirus disease-2019 (COVID-19) are unknown. OBJECTIVES: The purpose of this study was to describe the degree of myocardial injury and associated outcomes in a large hospitalized cohort with laboratory-confirmed COVID-19. METHODS: Patients with COVID-19 admitted to 1 of 5 Mount Sinai Health System hospitals in New York City between February 27, 2020, and April 12, 2020, with troponin-I (normal value <0.03 ng/ml) measured within 24 h of admission were included (n = 2,736). Demographics, medical histories, admission laboratory results, and outcomes were captured from the hospitals' electronic health records. RESULTS: The median age was 66.4 years, with 59.6% men. Cardiovascular disease (CVD), including coronary artery disease, atrial fibrillation, and heart failure, was more prevalent in patients with higher troponin concentrations, as were hypertension and diabetes. A total of 506 (18.5%) patients died during hospitalization. In all, 985 (36%) patients had elevated troponin concentrations. After adjusting for disease severity and relevant clinical factors, even small amounts of myocardial injury (e.g., troponin I >0.03 to 0.09 ng/ml; n = 455; 16.6%) were significantly associated with death (adjusted hazard ratio: 1.75; 95% CI: 1.37 to 2.24; p < 0.001) while greater amounts (e.g., troponin I >0.09 ng/ml; n = 530; 19.4%) were significantly associated with higher risk (adjusted HR: 3.03; 95% CI: 2.42 to 3.80; p < 0.001). CONCLUSIONS: Myocardial injury is prevalent among patients hospitalized with COVID-19; however, troponin concentrations were generally present at low levels. Patients with CVD are more likely to have myocardial injury than patients without CVD. Troponin elevation among patients hospitalized with COVID-19 is associated with higher risk of mortality.
Subject(s)
Cardiovascular Diseases/complications , Comorbidity , Coronavirus Infections/complications , Myocardial Infarction/complications , Myocardium/pathology , Pneumonia, Viral/complications , Troponin I/blood , Adolescent , Adult , Aged , Aged, 80 and over , COVID-19 , Cardiovascular Diseases/epidemiology , Coronavirus Infections/epidemiology , Electronic Health Records , Female , Heart Injuries/complications , Heart Injuries/epidemiology , Hospitalization , Humans , Incidence , Male , Middle Aged , Myocardial Infarction/epidemiology , New York City , Pandemics , Pneumonia, Viral/epidemiology , Prevalence , Risk Factors , Treatment Outcome , Young Adult
ABSTRACT
BACKGROUND & AIMS: The Ile148Met variant (rs738409) in the PNPLA3 gene has the largest effect on non-alcoholic fatty liver disease (NAFLD), increasing the risk of progression to severe forms of liver disease. It remains unknown if the variant plays a role in age of NAFLD onset. We aimed to determine if rs738409 affects the age of NAFLD diagnosis. METHODS: We applied a novel natural language processing (NLP) algorithm to a longitudinal electronic health records (EHR) dataset of >27,000 individuals with genetic data from a multi-ethnic biobank, defining NAFLD cases (n = 1,703) and confirming controls (n = 8,119). We conducted i) a survival analysis to determine if age at diagnosis differed by rs738409 genotype, ii) a receiver operating characteristics analysis to assess the utility of the rs738409 genotype in discriminating NAFLD cases from controls, and iii) a phenome-wide association study (PheWAS) between rs738409 and 10,095 EHR-derived disease diagnoses. RESULTS: The PNPLA3 G risk allele was associated with: i) earlier age of NAFLD diagnosis, with the strongest effect in Hispanics (hazard ratio 1.33; 95% CI 1.15-1.53; p <0.0001) among whom a NAFLD diagnosis was 15% more likely in risk allele carriers vs. non-carriers; ii) increased NAFLD risk (odds ratio 1.61; 95% CI 1.349-1.73; p <0.0001), with the strongest effect among Hispanics (odds ratio 1.43; 95% CI 1.28-1.59; p <0.0001); iii) additional liver diseases in a PheWAS (p <4.95 × 10⁻⁶) where the risk variant was also associated with earlier age of diagnosis. CONCLUSION: Given the role of rs738409 in NAFLD diagnosis age, our results suggest that stratifying risk within populations known to have an enhanced risk of liver disease, such as Hispanic carriers of the rs738409 variant, would be effective in earlier identification of those who would benefit most from early NAFLD prevention and treatment strategies.
LAY SUMMARY: Despite clear associations between the PNPLA3 rs738409 variant and elevated risk of progression from non-alcoholic fatty liver disease (NAFLD) to more severe forms of liver disease, it remains unknown if PNPLA3 rs738409 plays a role in the age of NAFLD onset. Herein, we found that this risk variant is associated with an earlier age of NAFLD and other liver disease diagnoses; an observation most pronounced in Hispanic Americans. We conclude that PNPLA3 rs738409 could be used to better understand liver disease risk within vulnerable populations and identify patients that may benefit from early prevention strategies.
Subject(s)
Biological Specimen Banks , Lipase/genetics , Membrane Proteins/genetics , Non-alcoholic Fatty Liver Disease/diagnosis , Non-alcoholic Fatty Liver Disease/genetics , Polymorphism, Single Nucleotide , Adolescent , Adult , Age Factors , Aged , Aged, 80 and over , Alleles , Case-Control Studies , Child , Child, Preschool , Electronic Health Records , Female , Gene Frequency , Genetic Predisposition to Disease , Genotype , Hispanic or Latino/genetics , Humans , Infant , Infant, Newborn , Kaplan-Meier Estimate , Longitudinal Studies , Male , Middle Aged , Non-alcoholic Fatty Liver Disease/ethnology , Non-alcoholic Fatty Liver Disease/mortality , Young Adult
ABSTRACT
Symptoms are common in patients on maintenance hemodialysis but identification is challenging. New informatics approaches including natural language processing (NLP) can be utilized to identify symptoms from narrative clinical documentation. Here we utilized NLP to identify seven patient symptoms from notes of maintenance hemodialysis patients of the BioMe Biobank and validated our findings using a separate cohort and the MIMIC-III database. NLP performance was compared for symptom detection with International Classification of Diseases (ICD)-9/10 codes and the performance of both methods was validated against manual chart review. From 1034 and 519 hemodialysis patients within BioMe and MIMIC-III databases, respectively, the most frequently identified symptoms by NLP were fatigue, pain, and nausea/vomiting. In BioMe, sensitivity for NLP (0.85 - 0.99) was higher than for ICD codes (0.09 - 0.59) for all symptoms with similar results in the BioMe validation cohort and MIMIC-III. ICD codes were significantly more specific for nausea/vomiting in BioMe and more specific for fatigue, depression, and pain in the MIMIC-III database. A majority of patients in both cohorts had four or more symptoms. Patients with more symptoms identified by NLP, ICD, and chart review had more clinical encounters. NLP had higher specificity in inpatient notes but higher sensitivity in outpatient notes and performed similarly across pain severity subgroups. Thus, NLP had higher sensitivity compared to ICD codes for identification of seven common hemodialysis-related symptoms, with comparable specificity between the two methods. Hence, NLP may be useful for the high-throughput identification of patient-centered outcomes when using electronic health records.
Subject(s)
Electronic Health Records , Natural Language Processing , Algorithms , Databases, Factual , Humans , Renal Dialysis/adverse effects
ABSTRACT
Importance: Hereditary transthyretin (TTR) amyloid cardiomyopathy (hATTR-CM) due to the TTR V122I variant is an autosomal-dominant disorder that causes heart failure in elderly individuals of African ancestry. The clinical associations of carrying the variant, its effect in other African ancestry populations including Hispanic/Latino individuals, and the rates of achieving a clinical diagnosis in carriers are unknown. Objective: To assess the association between the TTR V122I variant and heart failure and identify rates of hATTR-CM diagnosis among carriers with heart failure. Design, Setting, and Participants: Cross-sectional analysis of carriers and noncarriers of TTR V122I of African ancestry aged 50 years or older enrolled in the Penn Medicine Biobank between 2008 and 2017 using electronic health record data from 1996 to 2017. Case-control study in participants of African and Hispanic/Latino ancestry with and without heart failure in the Mount Sinai BioMe Biobank enrolled between 2007 and 2015 using electronic health record data from 2007 to 2018. Exposures: TTR V122I carrier status. Main Outcomes and Measures: The primary outcome was prevalent heart failure. The rate of diagnosis with hATTR-CM among TTR V122I carriers with heart failure was measured. Results: The cross-sectional cohort included 3724 individuals of African ancestry with a median age of 64 years (interquartile range, 57-71); 1755 (47%) were male, 2896 (78%) had a diagnosis of hypertension, and 753 (20%) had a history of myocardial infarction or coronary revascularization. There were 116 TTR V122I carriers (3.1%); 1121 participants (30%) had heart failure. The case-control study consisted of 2307 individuals of African ancestry and 3663 Hispanic/Latino individuals; the median age was 73 years (interquartile range, 68-80), 2271 (38%) were male, 4709 (79%) had a diagnosis of hypertension, and 1008 (17%) had a history of myocardial infarction or coronary revascularization. 
There were 1376 cases of heart failure. TTR V122I was associated with higher rates of heart failure (cross-sectional cohort: n = 51/116 TTR V122I carriers [44%], n = 1070/3608 noncarriers [30%], adjusted odds ratio, 1.7 [95% CI, 1.2-2.4], P = .006; case-control study: n = 36/1376 heart failure cases [2.6%], n = 82/4594 controls [1.8%], adjusted odds ratio, 1.8 [95% CI, 1.2-2.7], P = .008). Ten of 92 TTR V122I carriers with heart failure (11%) were diagnosed as having hATTR-CM; the median time from onset of symptoms to clinical diagnosis was 3 years. Conclusions and Relevance: Among individuals of African or Hispanic/Latino ancestry enrolled in 2 academic medical center-based biobanks, the TTR V122I genetic variant was significantly associated with heart failure.
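The carrier and noncarrier counts reported for the cross-sectional cohort are enough to compute a crude (unadjusted) odds ratio, which is a useful sanity check next to the adjusted estimate. The sketch below does that with a Wald 95% confidence interval. It is not the study's model: the covariates behind the adjusted odds ratio are not stated in the abstract, so the crude value is expected to differ somewhat from the reported 1.7, and the function name is an assumption for this example.

```python
import math
from typing import Tuple


def crude_odds_ratio(exposed_cases: int, exposed_total: int,
                     unexposed_cases: int, unexposed_total: int,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table."""
    a = exposed_cases                      # carriers with heart failure
    b = exposed_total - exposed_cases      # carriers without heart failure
    c = unexposed_cases                    # noncarriers with heart failure
    d = unexposed_total - unexposed_cases  # noncarriers without heart failure
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Counts quoted in the abstract: 51/116 carriers and 1070/3608 noncarriers had heart failure.
    or_, lo, hi = crude_odds_ratio(51, 116, 1070, 3608)
    print(f"crude OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The crude estimate ignores age, sex, and other confounders, which is why an adjusted model is what the abstract reports.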
Subject(s)
Amyloid Neuropathies, Familial/genetics , Black or African American/genetics , Heart Failure/genetics , Hispanic or Latino/genetics , Prealbumin/genetics , Academic Medical Centers , Aged , Amyloid Neuropathies, Familial/complications , Amyloid Neuropathies, Familial/ethnology , Biological Specimen Banks , Case-Control Studies , Cross-Sectional Studies , Female , Genetic Variation , Heart Failure/ethnology , Humans , Male , Middle Aged
ABSTRACT
OBJECTIVE: Electronic health record (EHR) systems contain structured data (such as diagnostic codes) and unstructured data (clinical documentation). Clinical insights can be derived from analyzing both. The use of natural language processing (NLP) algorithms to effectively analyze unstructured data has been well demonstrated. Here we examine the utility of NLP for the identification of patients with non-alcoholic fatty liver disease, assess patterns of disease progression, and identify gaps in care related to breakdown in communication among providers. MATERIALS AND METHODS: All clinical notes available on the 38,575 patients enrolled in the Mount Sinai BioMe cohort were loaded into the NLP system. We compared analysis of structured and unstructured EHR data using NLP, free-text search, and diagnostic codes with validation against expert adjudication. We then used the NLP findings to measure physician impression of progression from early-stage NAFLD to NASH or cirrhosis. Similarly, we used the same NLP findings to identify mentions of NAFLD in radiology reports that did not persist into clinical notes. RESULTS: Out of 38,575 patients, we identified 2,281 patients with NAFLD. From the remainder, 10,653 patients with similar data density were selected as a control group. NLP outperformed ICD and text search in both sensitivity (NLP: 0.93, ICD: 0.28, text search: 0.81) and F2 score (NLP: 0.92, ICD: 0.34, text search: 0.81). Of 2281 NAFLD patients, 673 (29.5%) were believed to have progressed to NASH or cirrhosis. Among 176 where NAFLD was noted prior to NASH, the average progression time was 410 days. 619 (27.1%) NAFLD patients had it documented only in radiology notes and not acknowledged in other forms of clinical documentation. Of these, 170 (28.4%) were later identified as having likely developed NASH or cirrhosis after a median 1057.3 days. 
DISCUSSION: NLP-based approaches were more accurate at identifying NAFLD within the EHR than ICD/text search-based approaches. Suspected NAFLD on imaging is often not acknowledged in subsequent clinical documentation. Many such patients are later found to have more advanced liver disease. Analysis of information flows demonstrated loss of key information that could have been used to help prevent the progression of early NAFLD (NAFL) to NASH or cirrhosis. CONCLUSION: For identification of NAFLD, NLP performed better than alternative selection modalities. It then facilitated analysis of knowledge flow between physicians and enabled the identification of breakdowns where key information was lost that could have slowed or prevented later disease progression.
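The abstract above compares methods by sensitivity and F2 score. As a reminder of how those quantities relate, the F-beta score is a weighted harmonic mean of precision and sensitivity (recall), and beta = 2 weights sensitivity more heavily than precision. The sketch below is a generic implementation; the function names and the confusion-matrix counts in the usage example are made up for illustration, since the abstract does not report precision or raw counts.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and sensitivity (recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta > 1 favors recall, beta < 1 favors precision."""
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


if __name__ == "__main__":
    # Illustrative counts only (not from the study).
    p, r = precision_recall(tp=90, fp=30, fn=10)
    print(f"precision={p:.2f} sensitivity={r:.2f} F2={f_beta(p, r):.2f}")
```

Choosing F2 over F1 fits a case-finding task, where missing a true NAFLD patient is usually considered costlier than reviewing a false positive.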
Subject(s)
Electronic Health Records , Natural Language Processing , Non-alcoholic Fatty Liver Disease/diagnosis , Algorithms , Cohort Studies , Disease Progression , Female , Humans , Male , Middle Aged
ABSTRACT
BACKGROUND AND OBJECTIVES: Hypernatremia is common in hospitalized, critically ill patients. Although there are no clear guidelines on sodium correction rate for hypernatremia, some studies suggest a reduction rate not to exceed 0.5 mmol/L per hour. However, the data supporting this recommendation and the optimal rate of hypernatremia correction in hospitalized adults are unclear. DESIGN, SETTING, PARTICIPANTS, & MEASUREMENTS: We assessed the association of hypernatremia correction rates with neurologic outcomes and mortality in critically ill patients with hypernatremia at admission and those who developed hypernatremia during hospitalization. We used data from the Medical Information Mart for Intensive Care-III and identified patients with hypernatremia (serum sodium level >155 mmol/L) that was present on admission (n=122) or hospital-acquired (n=327). We calculated different ranges of rapid correction rates (>0.5 mmol/L per hour overall and >8, >10, and >12 mmol/L per 24 hours) and utilized logistic regression to generate adjusted odds ratios (aOR) with 95% confidence intervals (95% CIs) to examine association with outcomes. RESULTS: We had complete data on 122 patients with severe hypernatremia on admission and 327 patients who developed hospital-acquired hypernatremia. The difference in in-hospital 30-day mortality proportion between rapid (>0.5 mmol/L per hour) and slower (≤0.5 mmol/L per hour) correction rates was not significant either in patients with hypernatremia at admission with rapid versus slow correction (25% versus 28%; P=0.80) or in patients with hospital-acquired hypernatremia with rapid versus slow correction (44% versus 40%; P=0.50). There was no difference in aOR of mortality for rapid versus slow correction in either admission (aOR, 1.3; 95% CI, 0.5 to 3.7) or hospital-acquired hypernatremia (aOR, 1.3; 95% CI, 0.8 to 2.3).
Manual chart review of all suspected chronic hypernatremia patients, which included all 122 with hypernatremia at admission, 128 of the 327 with hospital-acquired hypernatremia, and an additional 28 patients with ICD-9 codes for cerebral edema, seizures and/or alteration of consciousness, did not reveal a single case of cerebral edema attributable to rapid hypernatremia correction. CONCLUSIONS: We did not find any evidence that rapid correction of hypernatremia is associated with a higher risk for mortality, seizure, alteration of consciousness, and/or cerebral edema in critically ill adult patients with either admission or hospital-acquired hypernatremia.
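The rapid versus slow distinction used above reduces to simple arithmetic on serial sodium values. The sketch below applies the cutoffs quoted in the abstract (>0.5 mmol/L per hour, and >8, >10, >12 mmol/L per 24 hours). How the study chose the time window and which sodium measurements it paired is not stated, so the two-point calculation and the function names here are assumptions for illustration only.

```python
from typing import List

RAPID_HOURLY_THRESHOLD = 0.5          # mmol/L per hour, cutoff quoted in the abstract
DAILY_THRESHOLDS = (8.0, 10.0, 12.0)  # mmol/L per 24 hours, also quoted in the abstract


def correction_rate(start_na: float, end_na: float, hours: float) -> float:
    """Average fall in serum sodium (mmol/L per hour) between two measurements."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (start_na - end_na) / hours


def is_rapid(start_na: float, end_na: float, hours: float) -> bool:
    """True when the average correction rate exceeds 0.5 mmol/L per hour."""
    return correction_rate(start_na, end_na, hours) > RAPID_HOURLY_THRESHOLD


def exceeded_daily_thresholds(drop_in_24h: float) -> List[float]:
    """Return the 24-hour cutoffs that a given 24-hour sodium drop exceeds."""
    return [t for t in DAILY_THRESHOLDS if drop_in_24h > t]


if __name__ == "__main__":
    # Hypothetical patient: sodium falls from 160 to 148 mmol/L over 20 hours.
    print(correction_rate(160, 148, 20), is_rapid(160, 148, 20))
    print(exceeded_daily_thresholds(11.0))
```

Note that a rate of exactly 0.5 mmol/L per hour falls in the slower group, matching the "≤0.5" wording in the abstract.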
Subject(s)
Critical Illness , Hypernatremia/therapy , Aged , Aged, 80 and over , Cohort Studies , Female , Hospital Mortality , Humans , Hypernatremia/complications , Hypernatremia/mortality , Male , Middle Aged , Sodium/blood
ABSTRACT
Achieving confidence in the causality of a disease locus is a complex task that often requires supporting data from both statistical genetics and clinical genomics. Here we describe a combined approach to identify and characterize a genetic disorder that leverages distantly related patients in a health system and population-scale mapping. We utilize genomic data to uncover components of distant pedigrees, in the absence of recorded pedigree information, in the multi-ethnic BioMe biobank in New York City. By linking to medical records, we discover a locus associated with both elevated genetic relatedness and extreme short stature. We link the gene, COL27A1, with a little-known genetic disease, previously thought to be rare and recessive. We demonstrate that disease manifests in both heterozygotes and homozygotes, indicating a common collagen disorder impacting up to 2% of individuals of Puerto Rican ancestry, leading to a better understanding of the continuum of complex and Mendelian disease.
Subject(s)
Collagen Diseases/epidemiology , Collagen Diseases/genetics , Fibrillar Collagens/genetics , Molecular Epidemiology , Pedigree , Adolescent , Adult , Aged , Child , Female , Genotype , Heterozygote , Hispanic or Latino , Homozygote , Humans , Male , Middle Aged , Multigene Family , Musculoskeletal Diseases/epidemiology , Musculoskeletal Diseases/genetics , New York City/epidemiology , New York City/ethnology , Whole Genome Sequencing , Young Adult
ABSTRACT
Physicians have access to patient notes in volumes far greater than what is practical to read within the context of a standard clinical scenario. As a preliminary step toward being able to provide a longitudinal summary of patient history, methods are examined for the automated extraction of relevant patient problems from existing clinical notes. We explore a grounded approach to identifying important patient problems from patient history. Methods build on existing NLP and text-summarization methodologies and leverage features observed in a relevant corpus.
Subject(s)
Electronic Health Records , Physicians , Humans , Natural Language Processing
ABSTRACT
In the interest of designing an automated high-level, longitudinal clinical summary of a patient record, we analyze traditional ways in which medical problems pertaining to the patient are summarized in the electronic health record. The patient problem list has become a commonly used proxy for a summary of patient history, and automated methods have been proposed to generate it. However, little research has been conducted on how to structure the problem list in a manner most effective for supporting clinical care. This study analyzes the structure and content of the Past Medical History (PMH) sections of a large corpus of clinical notes, as a proxy for problem lists. Findings show that when listing a patient's history, physicians convey several semantic types of information, not only problems. Furthermore, they often group related concepts in a single line of the PMH. In contrast, traditional problem lists allow only a simple enumeration of coded terms. Content analysis goes on to reiterate the value of more complex representations as well as provide valuable data and guidelines for automated generation of a clinical summary.
Subject(s)
Information Storage and Retrieval/methods , Medical History Taking/methods , Medical Records Systems, Computerized/statistics & numerical data , Medical Records, Problem-Oriented/statistics & numerical data , Natural Language Processing , Pattern Recognition, Automated/methods , Algorithms , Artificial Intelligence , Clinical Protocols , New York , Subject Headings
ABSTRACT
OBJECTIVE: To develop an electronic health record that facilitates rapid capture of detailed narrative observations from clinicians, with partial structuring of narrative information for integration and reuse. DESIGN: We propose a design in which unstructured text and coded data are fused into a single model called structured narrative. Each major clinical event (e.g., encounter or procedure) is represented as a document that is marked up to identify gross structure (sections, fields, paragraphs, lists) as well as fine structure within sentences (concepts, modifiers, relationships). Marked-up items are associated with standardized codes that enable linkage to other events, as well as efficient reuse of information, which can speed up data entry by clinicians. Natural language processing is used to identify fine structure, which can reduce the need for form-based entry. VALIDATION: The model is validated through an example of use by a clinician, with discussion of relevant aspects of the user interface, data structures and processing rules. DISCUSSION: The proposed model represents all patient information as documents with standardized gross structure (templates). Clinicians enter their data as free text, which is coded by natural language processing in real time, making it immediately usable for other computation, such as alerts or critiques. In addition, the narrative data annotates and augments structured data with temporal relations, severity and degree modifiers, causal connections, clinical explanations and rationale. CONCLUSION: Structured narrative has the potential to facilitate capture of data directly from clinicians by allowing freedom of expression, giving immediate feedback, supporting reuse of clinical information and structuring data for subsequent processing, such as quality assurance and clinical research.
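One way to picture the structured narrative model described in this abstract is as a document with gross structure (sections) that carries fine-structure annotations (coded concepts with modifiers). The sketch below is an illustration under stated assumptions, not the authors' data model: the class names, the `EX:` codes, and the toy lexicon lookup standing in for natural language processing are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """Fine structure: a coded span of text inside a section."""
    text: str
    code: str                       # placeholder code, not from a real terminology
    start: int
    end: int
    modifiers: List[str] = field(default_factory=list)


@dataclass
class Section:
    """Gross structure: a titled block of free text plus its annotations."""
    title: str
    text: str
    concepts: List[Concept] = field(default_factory=list)


@dataclass
class NarrativeDocument:
    """One clinical event (e.g., an encounter) represented as a document."""
    event_type: str
    sections: List[Section]

    def find_code(self, code: str) -> List[Concept]:
        """Return every annotated concept with the given code, for reuse."""
        return [c for s in self.sections for c in s.concepts if c.code == code]


# Toy stand-ins for an NLP pipeline (assumptions for illustration only).
TOY_LEXICON = {"chest pain": "EX:0001", "cough": "EX:0002"}
TOY_MODIFIERS = ("severe", "mild")


def annotate(section: Section) -> Section:
    """Attach coded concepts to a section using a plain substring lookup."""
    lowered = section.text.lower()
    for term, code in TOY_LEXICON.items():
        start = lowered.find(term)
        if start == -1:
            continue
        # Treat the word immediately before the term as a modifier if it is known.
        preceding = lowered[:start].split()
        mods = [preceding[-1]] if preceding and preceding[-1] in TOY_MODIFIERS else []
        end = start + len(term)
        section.concepts.append(Concept(section.text[start:end], code, start, end, mods))
    return section


history = annotate(Section("History", "Patient reports severe chest pain and cough."))
doc = NarrativeDocument(event_type="encounter", sections=[history])
print(doc.find_code("EX:0001"))
```

Because the free text is kept alongside the coded spans, the clinician's wording is preserved while the codes remain available for downstream uses such as the alerts mentioned in the abstract.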
Subject(s)
Medical Records Systems, Computerized , Natural Language Processing , User-Computer Interface , Documentation , Humans , Information Storage and Retrieval/methods , Medical History Taking , Software , Systems Integration , Vocabulary, Controlled
ABSTRACT
Clinicians perform many tasks in their daily work requiring summarization of clinical data. However, as technology makes more data available, the challenges of data overload become ever more significant. As interoperable data exchange between hospitals becomes more common, there is an increased need for tools to summarize information. Our goal is to develop automated tools to aid clinical data summarization. Structured interviews were conducted with physicians to identify information from an electronic health record they considered relevant to explaining a patient's medical history. Desirable data types were systematically evaluated using qualitative and quantitative analysis to assess data categories and patterns of data use. We report here on the implications of these results for the design of automated tools for summarization of patient history.