Results 1 - 20 of 210
1.
J Biomed Inform ; 153: 104642, 2024 May.
Article in English | MEDLINE | ID: mdl-38621641

ABSTRACT

OBJECTIVE: To develop a natural language processing (NLP) package to extract social determinants of health (SDoH) from clinical narratives, examine the bias among race and gender groups, test the generalizability of extracting SDoH for different disease groups, and examine population-level extraction ratio. METHODS: We developed SDoH corpora using clinical notes identified at the University of Florida (UF) Health. We systematically compared 7 transformer-based large language models (LLMs) and developed an open-source package - SODA (i.e., SOcial DeterminAnts) to facilitate SDoH extraction from clinical narratives. We examined the performance and potential bias of SODA for different race and gender groups, tested the generalizability of SODA using two disease domains including cancer and opioid use, and explored strategies for improvement. We applied SODA to extract 19 categories of SDoH from the breast (n = 7,971), lung (n = 11,804), and colorectal cancer (n = 6,240) cohorts to assess patient-level extraction ratio and examine the differences among race and gender groups. RESULTS: We developed an SDoH corpus using 629 clinical notes of cancer patients with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH, and another cross-disease validation corpus using 200 notes from opioid use patients with 4,342 SDoH concepts/attributes. We compared 7 transformer models and the GatorTron model achieved the best mean average strict/lenient F1 scores of 0.9122 and 0.9367 for SDoH concept extraction and 0.9584 and 0.9593 for linking attributes to SDoH concepts. There is a small performance gap (∼4%) between Males and Females, but a large performance gap (>16 %) among race groups. The performance dropped when we applied the cancer SDoH model to the opioid cohort; fine-tuning using a smaller opioid SDoH corpus improved the performance. 
The extraction ratio varied in the three cancer cohorts, in which 10 SDoH could be extracted from over 70 % of cancer patients, but 9 SDoH could be extracted from less than 70 % of cancer patients. Individuals from the White and Black groups have a higher extraction ratio than other minority race groups. CONCLUSIONS: Our SODA package achieved good performance in extracting 19 categories of SDoH from clinical narratives. The SODA package with pre-trained transformer models is available at https://github.com/uf-hobi-informatics-lab/SODA_Docker.


Subjects
Narration, Natural Language Processing, Social Determinants of Health, Humans, Female, Male, Bias, Electronic Health Records, Documentation/methods, Data Mining/methods
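
The strict and lenient F1 scores reported in entry 1 can be illustrated with a minimal sketch. It assumes the common convention that a strict match requires identical span boundaries and category, while a lenient match only requires an overlapping span with the same category; the abstract does not spell out its exact matching rules, and the `Span` layout and function names here are hypothetical.

```python
from typing import List, Tuple

# Hypothetical span layout: (start offset, end offset, SDoH category).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True when two spans share a category and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f1_scores(gold: List[Span], pred: List[Span]) -> Tuple[float, float]:
    """Return (strict F1, lenient F1) for one annotated document."""

    def f1(matched_pred: int, matched_gold: int) -> float:
        precision = matched_pred / len(pred) if pred else 0.0
        recall = matched_gold / len(gold) if gold else 0.0
        total = precision + recall
        return 2 * precision * recall / total if total else 0.0

    gold_set = set(gold)
    # Strict: boundaries and category must be identical.
    strict_tp = sum(1 for p in pred if p in gold_set)
    # Lenient: any overlap with a same-category span counts.
    lenient_pred = sum(1 for p in pred if any(overlaps(p, g) for g in gold))
    lenient_gold = sum(1 for g in gold if any(overlaps(p, g) for p in pred))
    return f1(strict_tp, strict_tp), f1(lenient_pred, lenient_gold)
```

A prediction that clips the start of a gold span lowers the strict score but not the lenient one, which is why the lenient figure is the higher of each reported pair.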
2.
Psychol Med ; 53(6): 2634-2642, 2023 04.
Article in English | MEDLINE | ID: mdl-34763736

ABSTRACT

BACKGROUND: Several social determinants of health (SDoH) have been associated with the onset of major depressive disorder (MDD). However, prior studies largely focused on individual SDoH and thus less is known about the relative importance (RI) of SDoH variables, especially in older adults. Given that risk factors for MDD may differ across the lifespan, we aimed to identify the SDoH that was most strongly related to newly diagnosed MDD in a cohort of older adults. METHODS: We used self-reported health-related survey data from 41 174 older adults (50-89 years, median age = 67 years) who participated in the Mayo Clinic Biobank, and linked ICD codes for MDD in the participants' electronic health records. Participants with a history of clinically documented or self-reported MDD prior to survey completion were excluded from analysis (N = 10 938, 27%). We used Cox proportional hazards models with a gradient boosting machine approach to quantify the RI of 30 pre-selected SDoH variables on the risk of future MDD diagnosis. RESULTS: Following biobank enrollment, 2073 older participants were diagnosed with MDD during the follow-up period (median duration = 6.7 years). The most influential SDoH was perceived level of social activity (RI = 0.17). Lower level of social activity was associated with a higher risk of MDD [hazard ratio = 2.27 (95% CI 2.00-2.50) for highest v. lowest level]. CONCLUSION: Across a range of SDoH variables, perceived level of social activity is most strongly related to MDD in older adults. Monitoring changes in the level of social activity may help identify older adults at an increased risk of MDD.


Subjects
Major Depressive Disorder, Humans, Aged, Major Depressive Disorder/diagnosis, Depression, Risk Factors, Social Determinants of Health
3.
Psychol Med ; 53(15): 7368-7374, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38078748

ABSTRACT

BACKGROUND: Depression and anxiety are common and highly comorbid, and their comorbidity is associated with poorer outcomes posing clinical and public health concerns. We evaluated the polygenic contribution to comorbid depression and anxiety, and to each in isolation. METHODS: Diagnostic codes were extracted from electronic health records for four biobanks [N = 177 865 including 138 632 European (77.9%), 25 612 African (14.4%), and 13 621 Hispanic (7.7%) ancestry participants]. The outcome was a four-level variable representing the depression/anxiety diagnosis group: neither, depression-only, anxiety-only, and comorbid. Multinomial regression was used to test for association of depression and anxiety polygenic risk scores (PRSs) with the outcome while adjusting for principal components of ancestry. RESULTS: In total, 132 960 patients had neither diagnosis (74.8%), 16 092 depression-only (9.0%), 13 098 anxiety-only (7.4%), and 16 584 comorbid (9.3%). In the European meta-analysis across biobanks, both PRSs were higher in each diagnosis group compared to controls. Notably, depression-PRS (OR 1.20 per s.d. increase in PRS; 95% CI 1.18-1.23) and anxiety-PRS (OR 1.07; 95% CI 1.05-1.09) had the largest effect when the comorbid group was compared with controls. Furthermore, the depression-PRS was significantly higher in the comorbid group than the depression-only group (OR 1.09; 95% CI 1.06-1.12) and the anxiety-only group (OR 1.15; 95% CI 1.11-1.19) and was significantly higher in the depression-only group than the anxiety-only group (OR 1.06; 95% CI 1.02-1.09), showing a genetic risk gradient across the conditions and the comorbidity. CONCLUSIONS: This study suggests that depression and anxiety have partially independent genetic liabilities and the genetic vulnerabilities to depression and anxiety make distinct contributions to comorbid depression and anxiety.


Subjects
Depression, Electronic Health Records, Humans, Anxiety/epidemiology, Anxiety/genetics, Anxiety Disorders/epidemiology, Anxiety Disorders/genetics, Comorbidity, Depression/epidemiology, Depression/genetics, Multifactorial Inheritance, Risk Factors
4.
J Biomed Inform ; 144: 104442, 2023 08.
Article in English | MEDLINE | ID: mdl-37429512

ABSTRACT

OBJECTIVE: We develop a deep learning framework based on the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model using unstructured clinical notes from electronic health records (EHRs) to predict the risk of disease progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). METHODS: We identified 3657 patients diagnosed with MCI together with their progress notes from Northwestern Medicine Enterprise Data Warehouse (NMEDW) between 2000 and 2020. The progress notes no later than the first MCI diagnosis were used for the prediction. We first preprocessed the notes by deidentification, cleaning and splitting into sections, and then pre-trained a BERT model for AD (named AD-BERT) based on the publicly available Bio+Clinical BERT on the preprocessed notes. All sections of a patient were embedded into a vector representation by AD-BERT and then combined by global MaxPooling and a fully connected network to compute the probability of MCI-to-AD progression. For validation, we conducted a similar set of experiments on 2563 MCI patients identified at Weill Cornell Medicine (WCM) during the same timeframe. RESULTS: Compared with the 7 baseline models, the AD-BERT model achieved the best performance on both datasets, with Area Under receiver operating characteristic Curve (AUC) of 0.849 and F1 score of 0.440 on NMEDW dataset, and AUC of 0.883 and F1 score of 0.680 on WCM dataset. CONCLUSION: The use of EHRs for AD-related research is promising, and AD-BERT shows superior predictive performance in modeling MCI-to-AD progression prediction. Our study demonstrates the utility of pre-trained language models and clinical notes in predicting MCI-to-AD progression, which could have important implications for improving early detection and intervention for AD.


Subjects
Alzheimer Disease, Cognitive Dysfunction, Humans, Alzheimer Disease/diagnosis, Cognitive Dysfunction/diagnosis, Disease Progression
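
The pooling step described in entry 4 (section embeddings combined by global max pooling, then passed to a fully connected network) can be sketched as follows. This is a simplified illustration and not the AD-BERT implementation: it replaces the fully connected network with a single linear layer plus a sigmoid, and the function names, weights, and toy vectors are assumptions.

```python
import math
from typing import List


def global_max_pool(section_vectors: List[List[float]]) -> List[float]:
    """Element-wise maximum over all section embeddings of one patient."""
    return [max(column) for column in zip(*section_vectors)]


def progression_probability(
    section_vectors: List[List[float]],
    weights: List[float],
    bias: float,
) -> float:
    """Pool the section embeddings, then apply one linear layer and a sigmoid.

    A single layer stands in for the fully connected network (assumption).
    """
    pooled = global_max_pool(section_vectors)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Two hypothetical 2-dimensional section embeddings for one patient.
sections = [[0.1, 0.9], [0.5, 0.2]]
probability = progression_probability(sections, weights=[1.0, -1.0], bias=0.0)
```

Max pooling yields a fixed-length patient vector regardless of how many note sections a patient has, which is what lets a single classifier handle patients with very different amounts of documentation.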
5.
Int J Eat Disord ; 56(8): 1581-1592, 2023 08.
Article in English | MEDLINE | ID: mdl-37194359

ABSTRACT

OBJECTIVES: To describe and compare the association between suicidality and subsequent readmission for patients hospitalized for eating disorder treatment, within 2 years of discharge, at two large academic medical centers in two different countries. METHODS: Over an 8-year study window from January 2009 to March 2017, we identified all inpatient eating disorder admissions at Weill Cornell Medicine, New York, USA (WCM) and South London and Maudsley Foundation NHS Trust, London, UK (SLaM). To establish each patient's suicidality profile, we applied two natural language processing (NLP) algorithms, independently developed at the two institutions, and detected suicidality in clinical notes documented in the first week of admission. We calculated the odds ratios (OR) for any subsequent readmission within 2 years postdischarge and determined whether this was to another eating disorder unit, other psychiatric unit, a general medical hospital admission or emergency room attendance. RESULTS: We identified 1126 and 420 eating disorder inpatient admissions at WCM and SLaM, respectively. In the WCM cohort, evidence of above average suicidality during the first week of admission was significantly associated with an increased risk of non-eating-disorder-related psychiatric readmission (OR 3.48, 95% CI = 2.03-5.99, p < .001). A similar pattern was not observed in the SLaM cohort (OR 1.34, 95% CI = 0.75-2.37, p = .32), where there was no significant increase in risk of admission. In both cohorts, personality disorder increased the risk of any psychiatric readmission within 2 years. DISCUSSION: Patterns of increased risk of psychiatric readmission from above average suicidality detected via NLP during inpatient eating disorder admissions differed in our two patient cohorts. However, comorbid diagnoses such as personality disorder increased the risk of any psychiatric readmission across both cohorts.
PUBLIC SIGNIFICANCE: Suicidality in eating disorders is an extremely common presentation, and it is important that we further our understanding of how to identify those most at risk. This research also provides a novel study design, comparing two NLP algorithms on electronic health record data from eating disorder inpatients in the United States and United Kingdom. Studies researching both UK and US mental health patients are sparse; therefore, this study provides novel data.


Subjects
Feeding and Eating Disorders, Suicide, Humans, Patient Readmission, Electronic Health Records, Natural Language Processing, Aftercare, Patient Discharge
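
Entry 5 reports odds ratios with 95% confidence intervals. As a rough illustration of how such figures arise, the sketch below computes an unadjusted odds ratio and a Wald (Woolf) interval from a 2x2 table. The abstract does not say whether the authors used this method or a regression model, and the counts in the example are invented for illustration only.

```python
import math
from typing import Tuple


def odds_ratio_2x2(
    a: int, b: int, c: int, d: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed, readmitted        b: exposed, not readmitted
    c: unexposed, readmitted      d: unexposed, not readmitted
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts, not taken from the study.
estimate, ci_low, ci_high = odds_ratio_2x2(20, 10, 10, 20)
```

An interval that excludes 1, as in the WCM result, corresponds to a statistically significant association at the 5% level; an interval that spans 1, as in the SLaM result, does not.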
6.
BMC Health Serv Res ; 23(1): 621, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37312121

ABSTRACT

BACKGROUND: A significant number of late middle-aged adults with depression have a high illness burden resulting from chronic conditions which put them at high risk of hospitalization. Many late middle-aged adults are covered by commercial health insurance, but such insurance claims have not been used to identify the risk of hospitalization in individuals with depression. In the present study, we developed and validated a non-proprietary model to identify late middle-aged adults with depression at risk for hospitalization, using machine learning methods. METHODS: This retrospective cohort study involved 71,682 commercially insured older adults aged 55-64 years diagnosed with depression. National health insurance claims were used to capture demographics, health care utilization, and health status during the base year. Health status was captured using 70 chronic health conditions, and 46 mental health conditions. The outcomes were 1- and 2-year preventable hospitalization. For each of our two outcomes, we evaluated seven modelling approaches: four prediction models utilized logistic regression with different combinations of predictors to evaluate the relative contribution of each group of variables, and three prediction models utilized machine learning approaches - logistic regression with LASSO penalty, random forests (RF), and gradient boosting machine (GBM). RESULTS: Our predictive model for 1-year hospitalization achieved an AUC of 0.803, with a sensitivity of 72% and a specificity of 76% under the optimum threshold of 0.463, and our predictive model for 2-year hospitalization achieved an AUC of 0.793, with a sensitivity of 76% and a specificity of 71% under the optimum threshold of 0.452. For predicting both 1-year and 2-year risk of preventable hospitalization, our best performing models utilized the machine learning approach of logistic regression with LASSO penalty which outperformed more black-box machine learning models like RF and GBM. 
CONCLUSIONS: Our study demonstrates the feasibility of identifying depressed middle-aged adults at higher risk of future hospitalization due to burden of chronic illnesses using basic demographic information and diagnosis codes recorded in health insurance claims. Identifying this population may assist health care planners in developing effective screening strategies and management approaches and in efficient allocation of public healthcare resources as this population transitions to publicly funded healthcare programs, e.g., Medicare in the US.


Subjects
Depression, Medicare, United States/epidemiology, Middle Aged, Humans, Aged, Depression/diagnosis, Depression/epidemiology, Retrospective Studies, Hospitalization, Risk Assessment
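
Entry 6 quotes sensitivity and specificity "under the optimum threshold" without naming the criterion. One widely used choice is Youden's J statistic (sensitivity + specificity - 1); the sketch below assumes that criterion purely for illustration, and the labels and scores are toy values.

```python
from typing import List, Tuple


def sens_spec(
    y_true: List[int], scores: List[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive.

    Requires at least one positive and one negative label.
    """
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(y_true: List[int], scores: List[float]) -> float:
    """Observed score that maximizes sensitivity + specificity - 1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: sum(sens_spec(y_true, scores, t)) - 1)


# Toy data: 0 = not hospitalized, 1 = hospitalized.
labels = [0, 0, 1, 1]
predicted = [0.1, 0.3, 0.6, 0.8]
best = youden_threshold(labels, predicted)
```

Other criteria (for example, fixing sensitivity at a clinically required level) would give a different threshold from the same model, so reported thresholds are only comparable when the criterion is known.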
7.
Brief Bioinform ; 20(4): 1308-1321, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29304188

ABSTRACT

Recent advances in biomedical research have generated a large volume of drug-related data. To effectively handle this flood of data, many initiatives have been taken to help researchers make good use of them. As a result of these initiatives, many drug knowledge bases have been constructed. They range from simple ones with specific focuses to comprehensive ones that contain information on almost every aspect of a drug. These curated drug knowledge bases have made significant contributions to the development of efficient and effective health information technologies for better health-care service delivery. Understanding and comparing existing drug knowledge bases and how they are applied in various biomedical studies will help us recognize the state of the art and design better knowledge bases in the future. In addition, researchers can get insights on novel applications of the drug knowledge bases through a review of successful use cases. In this study, we provide a review of existing popular drug knowledge bases and their applications in drug-related studies. We discuss challenges in constructing and using drug knowledge bases as well as future research directions toward a better ecosystem of drug knowledge bases.


Subjects
Pharmaceutical Databases, Knowledge Bases, Algorithms, Computational Biology/methods, Computational Biology/trends, Data Mining, Pharmaceutical Databases/statistics & numerical data, Pharmaceutical Databases/trends, Drug Development, Drug Interactions, Drug Repositioning, Drug-Related Side Effects and Adverse Reactions, Humans, Machine Learning, Pharmacogenomic Testing, Social Media, Systems Integration
8.
Med Care ; 59: S58-S64, 2021 02 01.
Article in English | MEDLINE | ID: mdl-33438884

ABSTRACT

BACKGROUND: Suicide prevention is a public health priority, but risk factors for suicide after medical hospitalization remain understudied. This problem is critical for women, for whom suicide rates in the United States are disproportionately increasing. OBJECTIVE: To differentiate the risk of suicide attempt and self-harm following general medical hospitalization among women with depression, bipolar disorder, and chronic psychosis. METHODS: We developed a machine learning algorithm that identified risk factors of suicide attempt and self-harm after general hospitalization using electronic health record data from 1628 women in the University of California Los Angeles Integrated Clinical and Research Data Repository. To assess replicability, we applied the algorithm to a larger sample of 140,848 women in the New York City Clinical Data Research Network. RESULTS: The classification tree algorithm identified risk groups in University of California Los Angeles Integrated Clinical and Research Data Repository (area under the curve 0.73, sensitivity 73.4, specificity 84.1, accuracy 0.84), and predictor combinations characterizing key risk groups were replicated in New York City Clinical Data Research Network (area under the curve 0.71, sensitivity 83.3, specificity 82.2, and accuracy 0.84). Predictors included medical comorbidity, history of pregnancy-related mental illness, age, and history of suicide-related behavior. Women with antecedent medical illness and history of pregnancy-related mental illness were at high risk (6.9%-17.2% readmitted for suicide-related behavior), as were women below 55 years old without antecedent medical illness (4.0%-7.5% readmitted). CONCLUSIONS: Prevention of suicide attempt and self-harm among women following acute medical illness may be improved by screening for sex-specific predictors including perinatal mental health history.


Subjects
Hospitalization, Mental Disorders/psychology, Self-Injurious Behavior/psychology, Attempted Suicide/psychology, Supervised Machine Learning, Women/psychology, Adult, Aged, Algorithms, Cohort Studies, Electronic Health Records, Female, Humans, Middle Aged, Patient Readmission, Reproducibility of Results, Retrospective Studies, Risk Factors, Sensitivity and Specificity, Young Adult
9.
BMC Pregnancy Childbirth ; 21(1): 630, 2021 Sep 17.
Article in English | MEDLINE | ID: mdl-34535116

ABSTRACT

BACKGROUND: Postpartum depression is a widespread disorder, adversely affecting the well-being of mothers and their newborns. We aim to utilize machine learning for predicting risk of postpartum depression (PPD) using primary care electronic health records (EHR) data, and to evaluate the potential value of EHR-based prediction in improving the accuracy of PPD screening and in early identification of women at risk. METHODS: We analyzed EHR data of 266,544 women from the UK who gave first live birth between 2000 and 2017. We extracted a multitude of socio-demographic and medical variables and constructed a machine learning model that predicts the risk of PPD during the year following childbirth. We evaluated the model's performance using multiple validation methodologies and measured its accuracy as a stand-alone tool and as an adjunct to the standard questionnaire-based screening by Edinburgh postnatal depression scale (EPDS). RESULTS: The prevalence of PPD in the analyzed cohort was 13.4%. Combining EHR-based prediction with EPDS score increased the area under the receiver operating characteristic curve (AUC) from 0.805 to 0.844 and the sensitivity from 0.72 to 0.76, at specificity of 0.80. The AUC of the EHR-based prediction model alone varied from 0.72 to 0.74 and decreased by only 0.01-0.02 when applied as early as before the beginning of pregnancy. CONCLUSIONS: PPD risk prediction using EHR data may provide a complementary quantitative and objective tool for PPD screening, allowing earlier (pre-pregnancy) and more accurate identification of women at risk, timely interventions and potentially improved outcomes for the mother and child.


Subjects
Postpartum Depression/epidemiology, Risk Assessment/methods, Adolescent, Adult, Area Under Curve, Cohort Studies, Electronic Health Records, Female, Humans, Machine Learning, Middle Aged, Pregnancy, Risk Factors, United Kingdom/epidemiology, Young Adult
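
The AUC values compared in entry 9 have a direct probabilistic reading: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it from that definition; the function name and toy data are assumptions, and the pairwise loop is meant for illustration rather than for cohorts of this size.

```python
from typing import List


def auc(y_true: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Requires at least one positive and one negative label.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy data: one positive case is scored below one negative case.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Read this way, the reported gain from 0.805 to 0.844 means that the combined score orders a random PPD and non-PPD pair correctly about four percentage points more often than the EPDS score alone.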
10.
BMC Pregnancy Childbirth ; 21(1): 599, 2021 Sep 04.
Article in English | MEDLINE | ID: mdl-34481472

ABSTRACT

BACKGROUNDS: Risk factors related to the built environment have been associated with women's mental health and preventive care. This study sought to identify built environment factors that are associated with variations in prenatal care and subsequent pregnancy-related outcomes in an urban setting. METHODS: In a retrospective observational study, we characterized the types and frequency of prenatal care events that are associated with the various built environment factors of the patients' residing neighborhoods. In comparison to women living in higher-quality built environments, we hypothesize that women who reside in lower-quality built environments experience different patterns of clinical events that may increase the risk for adverse outcomes. Using machine learning, we performed pattern detection to characterize the variability in prenatal care concerning encounter types, clinical problems, and medication prescriptions. Structural equation modeling was used to test the associations among built environment, prenatal care variation, and pregnancy outcome. The main outcome is postpartum depression (PPD) diagnosis within 1 year following childbirth. The exposures were the quality of the built environment in the patients' residing neighborhoods. Electronic health records (EHR) data of pregnant women (n = 8,949) who had live delivery at an urban academic medical center from 2015 to 2017 were included in the study. RESULTS: We discovered prenatal care patterns that were summarized into three common types. Women who experienced the prenatal care pattern with the highest rates of PPD were more likely to reside in neighborhoods with homogeneous land use, lower walkability, lower air pollutant concentration, and lower retail floor ratios after adjusting for age, neighborhood average education level, marital status, and income inequality. CONCLUSIONS: In an urban setting, multi-purpose and walkable communities were found to be associated with a lower risk of PPD. 
Findings may inform urban design policies and provide awareness for care providers on the association of patients' residing neighborhoods and healthy pregnancy.


Subjects
Built Environment/statistics & numerical data, Postpartum Depression/epidemiology, Prenatal Care/statistics & numerical data, Residence Characteristics/statistics & numerical data, Urban Population/statistics & numerical data, Adult, Postpartum Depression/diagnosis, Female, Humans, Machine Learning, Mental Health, New York City/epidemiology, Pregnancy, Pregnancy Outcome, Pregnant Women, Retrospective Studies, Women's Health, Young Adult
11.
J Card Fail ; 26(12): 1060-1066, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32755626

ABSTRACT

BACKGROUND: There is interest in leveraging the electronic medical records (EMRs) to improve knowledge and understanding of the characteristics and outcomes of patients with ambulatory heart failure (HF). However, the diagnostic performance of International Classification of Diseases (ICD)-10 diagnosis codes from the EMRs for patients with HF and with reduced or preserved ejection fraction (HFrEF or HFpEF) in the ambulatory setting is unknown. METHODS: We examined a cohort of patients aged ≥ 18 with at least 1 outpatient encounter for HF between January 2016 and June 2018 and an echocardiogram conducted within 180 days of the outpatient encounter for HF. We defined HFrEF encounters as those with ICD-10 codes of I50.2x (systolic heart failure); and we defined HFpEF encounters as those with ICD-10 codes of I50.3x (diastolic heart failure). The referent definitions of HFrEF and HFpEF were based on echocardiograms conducted within 180 days of the ambulatory encounter for HF. RESULTS: We examined 68,952 encounters of 14,796 unique patients with HF. The diagnostic performance parameters for HFrEF (based on ICD-10 I50.2x only) depended on the left ventricular ejection fraction (LVEF) cutoff, with a sensitivity ranging from 68%-72%, specificity 63%-68%, positive predictive value 47%-63%, and negative predictive value 73%-84%. The diagnostic performance parameters for HFpEF depended on the LVEF cutoff, with sensitivity ranging from 34%-39%, specificity 92%-94%, positive predictive value 86%-93%, and negative predictive value 39%-54%. CONCLUSIONS: ICD-10 coding abstracted from the EMR for HFrEF vs HFpEF in the ambulatory setting had suboptimal diagnostic performance and, thus, should not be used alone to examine HFrEF and HFpEF in the ambulatory setting.


Subjects
Heart Failure, Electronic Health Records, Heart Failure/diagnosis, Heart Failure/epidemiology, Humans, Prognosis, Stroke Volume, Left Ventricular Function
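
The evaluation design in entry 11 (ICD-10 I50.2x codes judged against an echocardiogram referent at a chosen LVEF cutoff) can be sketched as follows. The cutoff of 40% and the strict "less than" comparison are assumptions made for the example, since the abstract only says that several cutoffs were examined; the encounter data are invented.

```python
from typing import Dict, List, Tuple


def hfref_performance(
    encounters: List[Tuple[str, float]], lvef_cutoff: float = 40.0
) -> Dict[str, float]:
    """Diagnostic performance of I50.2x codes for HFrEF.

    encounters: (ICD-10 code, LVEF in percent) per outpatient encounter.
    The echo referent labels an encounter HFrEF when LVEF < lvef_cutoff
    (assumed rule). Every denominator below must be non-zero.
    """
    tp = fp = fn = tn = 0
    for code, lvef in encounters:
        coded_hfref = code.strip().upper().startswith("I50.2")
        referent_hfref = lvef < lvef_cutoff
        if coded_hfref and referent_hfref:
            tp += 1
        elif coded_hfref:
            fp += 1
        elif referent_hfref:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Invented encounters for illustration only.
sample = [("I50.22", 30.0), ("I50.20", 55.0), ("I50.32", 35.0), ("I50.30", 60.0)]
metrics = hfref_performance(sample)
```

Because the referent label changes with the cutoff while the codes stay fixed, rerunning the same function with a different `lvef_cutoff` reproduces the kind of ranges the abstract reports for each metric.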
12.
J Biomed Inform ; 102: 103361, 2020 02.
Article in English | MEDLINE | ID: mdl-31911172

ABSTRACT

Acute Kidney Injury (AKI) is a common clinical syndrome characterized by the rapid loss of kidney excretory function, which aggravates the clinical severity of other diseases in a large number of hospitalized patients. Accurate early prediction of AKI can enable timely interventions and treatments. However, AKI is highly heterogeneous, thus identification of AKI sub-phenotypes can lead to an improved understanding of the disease pathophysiology and development of more targeted clinical interventions. This study used a memory network-based deep learning approach to discover AKI sub-phenotypes using structured and unstructured electronic health record (EHR) data of patients before AKI diagnosis. We leveraged a real-world critical care EHR corpus including 37,486 ICU stays. Our approach identified three distinct sub-phenotypes. Sub-phenotype I has an average age of 63.03±17.25 years and is characterized by mild loss of kidney excretory function (serum creatinine (SCr) 1.55±0.34 mg/dL, estimated glomerular filtration rate (eGFR) 107.65±54.98 mL/min/1.73 m²). These patients are more likely to develop stage I AKI. Sub-phenotype II has an average age of 66.81±10.43 years and is characterized by severe loss of kidney excretory function (SCr 1.96±0.49 mg/dL, eGFR 82.19±55.92 mL/min/1.73 m²). These patients are more likely to develop stage III AKI. Sub-phenotype III has an average age of 65.07±11.32 years and is characterized by moderate loss of kidney excretory function (SCr 1.69±0.32 mg/dL, eGFR 93.97±56.53 mL/min/1.73 m²); these patients are more likely to develop stage II AKI. Both SCr and eGFR differ significantly across the three sub-phenotypes according to statistical testing with post hoc analysis, and the conclusion still holds after age adjustment.


Subjects
Acute Kidney Injury, Electronic Health Records, Acute Kidney Injury/diagnosis, Aged, Creatinine, Glomerular Filtration Rate, Humans, Middle Aged, Phenotype
14.
J Biomed Inform ; 99: 103310, 2019 11.
Article in English | MEDLINE | ID: mdl-31622801

ABSTRACT

BACKGROUND: Standards-based clinical data normalization has become a key component of effective data integration and accurate phenotyping for secondary use of electronic healthcare records (EHR) data. HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard for exchanging electronic healthcare data and has been used in modeling and integrating both structured and unstructured EHR data for a variety of clinical research applications. The overall objective of this study is to develop and evaluate a FHIR-based EHR phenotyping framework for identification of patients with obesity and its multiple comorbidities from semi-structured discharge summaries leveraging a FHIR-based clinical data normalization pipeline (known as NLP2FHIR). METHODS: We implemented a multi-class and multi-label classification system based on the i2b2 Obesity Challenge task to evaluate the FHIR-based EHR phenotyping framework. Two core parts of the framework are: (a) the conversion of discharge summaries into corresponding FHIR resources - Composition, Condition, MedicationStatement, Procedure and FamilyMemberHistory using the NLP2FHIR pipeline, and (b) the implementation of four machine learning algorithms (logistic regression, support vector machine, decision tree, and random forest) to train classifiers to predict disease state of obesity and 15 comorbidities using features extracted from standard FHIR resources and terminology expansions. We used the macro- and micro-averaged precision (P), recall (R), and F1 score (F1) measures to evaluate the classifier performance. We validated the framework using a second obesity dataset extracted from the MIMIC-III database. RESULTS: Using the NLP2FHIR pipeline, 1237 clinical discharge summaries from the 2008 i2b2 obesity challenge dataset were represented as the instances of the FHIR Composition resource consisting of 5677 records with 16 unique section types. 
After the NLP processing and FHIR modeling, a set of 244,438 FHIR clinical resource instances was generated. Among the four machine learning classifiers, the random forest algorithm performed best, with F1-micro (0.9466)/F1-macro (0.7887) for intuitive classification (reflecting medical professionals' judgments) and F1-micro (0.9536)/F1-macro (0.6524) for textual classification (reflecting judgments based on explicitly reported information about diseases). The MIMIC-III obesity dataset was successfully integrated for prediction with minimal configuration of the NLP2FHIR pipeline and machine learning models. CONCLUSIONS: The study demonstrated that the FHIR-based EHR phenotyping approach could effectively identify the state of obesity and multiple comorbidities using semi-structured discharge summaries. Our FHIR-based phenotyping approach is a first concrete step towards improving the data aspect of phenotyping portability across EHR systems and enhancing interpretability of the machine learning-based phenotyping algorithms.


Subjects
Electronic Health Records/classification, Health Information Interoperability, Obesity/epidemiology, Patient Discharge, Adult, Algorithms, Body Mass Index, Comorbidity, Female, Humans, Machine Learning, Male, Phenotype, Software
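
Entry 14 reports micro-averaged F1 scores well above the macro-averaged ones. The sketch below shows how the two averages are computed from per-label counts and why they can diverge; the label names and counts are invented, and the interpretation that rare labels drive the gap is a plausible reading, not a finding stated in the abstract.

```python
from typing import Dict, Tuple

# Per-label counts: (true positives, false positives, false negatives).
Counts = Dict[str, Tuple[int, int, int]]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 expressed directly in counts: 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_macro_f1(counts: Counts) -> Tuple[float, float]:
    """Return (micro F1, macro F1) over all labels."""
    # Macro: average the per-label F1 scores, so each label weighs the same.
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts first, so frequent labels dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1_from_counts(tp, fp, fn), macro


# Invented counts: one frequent, well-classified label and one rare, hard label.
micro, macro = micro_macro_f1({"frequent": (90, 10, 10), "rare": (1, 3, 3)})
```

In this toy case the rare label pulls the macro average down to about 0.58 while the micro average stays at 0.875, the same direction of gap as in the reported results.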
15.
BMC Med Inform Decis Mak ; 19(Suppl 3): 78, 2019 04 04.
Article in English | MEDLINE | ID: mdl-30943974

ABSTRACT

BACKGROUND: This paper presents a portable phenotyping system that is capable of integrating both rule-based and statistical machine learning based approaches. METHODS: Our system utilizes UMLS to extract clinically relevant features from the unstructured text and then facilitates portability across different institutions and data systems by incorporating OHDSI's OMOP Common Data Model (CDM) to standardize necessary data elements. Our system can also store the key components of rule-based systems (e.g., regular expression matches) in the format of OMOP CDM, thus enabling the reuse, adaptation and extension of many existing rule-based clinical NLP systems. We experimented with our system on the corpus from i2b2's Obesity Challenge as a pilot study. RESULTS: Our system facilitates portable phenotyping of obesity and its 15 comorbidities based on the unstructured patient discharge summaries, while achieving a performance that often ranked among the top 10 of the challenge participants. CONCLUSION: Our system of standardization enables a consistent application of numerous rule-based and machine learning based classification techniques downstream across disparate datasets which may originate across different institutions and data systems.


Subjects
Information Storage and Retrieval, Machine Learning, Natural Language Processing, Electronic Health Records, Humans, Information Storage and Retrieval/methods, Obesity, Pilot Projects
16.
J Biomed Inform ; 87: 88-95, 2018 11.
Article in English | MEDLINE | ID: mdl-30300713

ABSTRACT

OBJECTIVE: We present a method for comparing association networks in a matched case-control design, which provides a high-level comparison of co-occurrence patterns of features after adjusting for confounding factors. We demonstrate this approach by examining the differential distribution of chronic medical conditions in patients with major depressive disorder (MDD) compared to the distribution of these conditions in their matched controls. MATERIALS AND METHODS: Newly diagnosed MDD patients were matched to controls based on their demographic characteristics, socioeconomic status, place of residence, and healthcare service utilization in the Korean National Health Insurance Service's National Sample Cohort. Differences in the networks of chronic medical conditions in newly diagnosed MDD cases treated with antidepressants, and their matched controls, were prioritized with a permutation test accounting for the false discovery rate. Sensitivity analyses for the associations between prioritized pairs of chronic medical conditions and new MDD diagnosis were performed with regression modeling. RESULTS: By comparing the association networks of chronic medical conditions in newly diagnosed depression patients and their matched controls, five pairs of such conditions were prioritized among 105 possible pairs after controlling the false discovery rate at 5%. In sensitivity analyses using regression modeling, four out of the five prioritized pairs were statistically significant for the interaction terms. CONCLUSION: Association networks in a matched case-control design can provide a high-level comparison of comorbid features after adjusting for confounding factors, thereby supplementing traditional clinical study approaches. We demonstrate the differential co-occurrence pattern of chronic medical conditions in patients with MDD and prioritize the chronic conditions that have statistically significant interactions in regression models for depression.
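
The abstract does not name the procedure used to control the false discovery rate. The Benjamini-Hochberg step-up procedure is one common choice and is sketched below in Python as an assumption, not as the authors' method. The count of 105 candidate pairs equals the number of unordered pairs among 15 items, which suggests (but the abstract does not state) that 15 conditions were compared. The p-values in the example are made up.

```python
from itertools import combinations


def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected at FDR level q (step-up rule)."""
    m = len(p_values)
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Largest rank k whose p-value is at most k * q / m.
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Number of unordered pairs among 15 hypothetical conditions.
n_pairs = len(list(combinations(range(15), 2)))

# Made-up p-values, one per candidate pair in a smaller example.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(p_values, q=0.05)
```

In the study's setting, each p-value would come from the permutation test for one pair of conditions, and the rejected indices would correspond to the prioritized pairs.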


Subjects
Antidepressive Agents/pharmacology, Comorbidity, Major Depressive Disorder/complications, Major Depressive Disorder/epidemiology, Adult, Aged, Case-Control Studies, Chronic Disease/therapy, Cohort Studies, Data Collection, Data Mining/methods, Major Depressive Disorder/diagnosis, False Positive Reactions, Female, Humans, Male, Middle Aged, Regression Analysis, Republic of Korea, Social Class
17.
Am Heart J ; 187: 70-77, 2017 May.
Article in English | MEDLINE | ID: mdl-28454810

ABSTRACT

BACKGROUND: Achieving a therapeutic international normalized ratio (INR) before hospital discharge is an important inpatient goal for patients undergoing mechanical cardiac valve replacement (MCVR). The use of clinical algorithms has reduced the time to achieve therapeutic INR (TTI) with warfarin therapy. Whether TTI prolongs length of stay (LOS) is unknown. METHODS: Patients who underwent MCVR over a consecutive 42-month period were included. Clinical data were obtained from the Society of Thoracic Surgeons Adult Cardiac Surgery database and electronic medical records. Therapeutic INR was defined as per standard guidelines. Warfarin dose was prescribed using an inpatient pharmacy-managed algorithm and computer-based dosing tool. International normalized ratio trajectory, procedural needs, and drug interactions were included in warfarin dose determination. RESULTS: There were 708 patients who underwent MCVR, of whom 159 were excluded for reasons that would preclude or interrupt warfarin use. Among the remaining 549 patients, the average LOS was 6.4 days and mean TTI was 3.5 days. Landmark analysis showed that subjects in hospital on day 4 (n=542) who achieved therapeutic INR were more likely to be discharged by day 6 compared with those who did not achieve therapeutic INR (75% vs 59%, P<.001). Multivariable proportional hazards regression with TTI as a time-dependent effect showed a strong association with discharge (P=.0096, hazard ratio 1.3) after adjustment for other significant clinical covariates. CONCLUSIONS: Time to achieve therapeutic INR is an independent predictor of LOS in patients requiring anticoagulation with warfarin after MCVR surgery. Alternative dosing and anticoagulation strategies will need to be adopted to reduce LOS in these patients.


Subjects
Anticoagulants/therapeutic use, Drug Monitoring/methods, Heart Valve Prosthesis Implantation, International Normalized Ratio, Length of Stay, Warfarin/therapeutic use, Algorithms, Female, Humans, Male, Middle Aged
18.
J Biomed Inform ; 60: 260-9, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26844760

ABSTRACT

Computerized survival prediction in healthcare, which identifies the risk of disease mortality, helps healthcare providers manage their patients effectively by providing appropriate treatment options. In this study, we propose to apply a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)) with the probabilistic loss function, to develop and validate prognostic risk models to predict 1-, 2-, and 5-year survival in heart failure (HF) using data from electronic health records (EHRs) at Mayo Clinic. CPXR(Log) constructs a pattern-aided logistic regression model defined by several patterns and corresponding local logistic regression models. One of the models generated by CPXR(Log) achieved an AUC and accuracy of 0.94 and 0.91, respectively, and significantly outperformed prognostic models reported in prior studies. Data extracted from EHRs allowed incorporation of patient co-morbidities into our models, which helped improve the performance of the CPXR(Log) models (15.9% AUC improvement), although it did not improve the accuracy of the models built by other classifiers. We also propose a probabilistic loss function to determine the large-error and small-error instances. The new loss function used in the algorithm outperformed the functions used in previous studies, improving the AUC by 1%. This study revealed that using EHR data to build prediction models can be very challenging with existing classification methods because of the high dimensionality and complexity of EHR data. The risk models developed by CPXR(Log) also reveal that HF is a highly heterogeneous disease, i.e., different subgroups of HF patients require different types of considerations with their diagnosis and treatment. 
Our risk models provided two valuable insights for application of predictive modeling techniques in biomedicine: Logistic risk models often make systematic prediction errors, and it is prudent to use subgroup based prediction models such as those given by CPXR(Log) when investigating heterogeneous diseases.
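
The idea of a model "defined by several patterns and corresponding local logistic regression models" can be pictured with the Python sketch below. It is only a rough illustration under stated assumptions: a pattern is modeled as a predicate on a patient record, each pattern carries its own logistic model, unmatched patients fall back to a baseline model, and overlapping matches are combined by a plain average. The abstract does not describe how CPXR(Log) combines matches, and every name, coefficient, and feature here is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic(weights, bias, x):
    """Probability from a logistic model given feature dict x."""
    return sigmoid(bias + sum(w * x[name] for name, w in weights.items()))


def predict(x, baseline, pattern_models):
    """Use local models for matching patterns, else the baseline model.

    baseline: (weights, bias)
    pattern_models: list of (predicate, (weights, bias)) pairs
    """
    matched = [model for pattern, model in pattern_models if pattern(x)]
    if not matched:
        return logistic(*baseline, x)
    # Assumption: average the local models of all matching patterns.
    return sum(logistic(*model, x) for model in matched) / len(matched)


# Hypothetical baseline and one subgroup-specific local model.
baseline = ({"age": 0.0}, 0.0)
pattern_models = [
    (lambda x: x["age"] > 70, ({"age": 0.0}, 2.0)),
]
risk_younger = predict({"age": 60}, baseline, pattern_models)
risk_older = predict({"age": 80}, baseline, pattern_models)
```

The sketch shows why subgroup-based models can correct systematic errors of a single logistic model: patients matching a pattern are scored by coefficients fitted to that subgroup rather than to the whole cohort.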


Subjects
Electronic Health Records, Heart Failure/diagnosis, Medical Informatics/methods, Aged, Algorithms, Area Under Curve, Cohort Studies, Comorbidity, False Positive Reactions, Female, Humans, Linear Models, Logistic Models, Male, Middle Aged, Probability, Prognosis, ROC Curve, Regression Analysis, Reproducibility of Results, Risk
19.
J Biomed Inform ; 62: 232-42, 2016 08.
Article in English | MEDLINE | ID: mdl-27392645

ABSTRACT

The Quality Data Model (QDM) is an information model developed by the National Quality Forum for representing electronic health record (EHR)-based electronic clinical quality measures (eCQMs). In conjunction with the HL7 Health Quality Measures Format (HQMF), QDM contains core elements that make it a promising model for representing EHR-driven phenotype algorithms for clinical research. However, the current QDM specification is available only as descriptive documents suitable for human readability and interpretation, but not for machine consumption. The objective of the present study is to develop and evaluate a data element repository (DER) for providing machine-readable QDM data element service APIs to support phenotype algorithm authoring and execution. We used the ISO/IEC 11179 metadata standard to capture the structure of each data element, and leveraged Semantic Web technologies to facilitate semantic representation of these metadata. We observed a number of underspecified areas in the QDM, including the lack of model constraints and pre-defined value sets. We propose a harmonization with the models developed in HL7 Fast Healthcare Interoperability Resources (FHIR) and the Clinical Information Modeling Initiative (CIMI) to enhance the QDM specification and enable the extensibility and better coverage of the DER. We also compared the DER with the existing QDM implementation utilized within the Measure Authoring Tool (MAT) to demonstrate the scalability and extensibility of our DER-based approach.


Subjects
Algorithms, Electronic Health Records, Phenotype, Biomedical Research, Factual Databases, Humans, Semantics
20.
BMC Psychiatry ; 16: 114, 2016 Apr 26.
Article in English | MEDLINE | ID: mdl-27112538

ABSTRACT

BACKGROUND: Major depressive disorder (MDD) is often comorbid with other chronic mental and physical health conditions. Although the literature widely acknowledges the association of many chronic conditions with the risk of MDD, the relative importance of these conditions on MDD risk in the presence of other conditions is not well investigated. In this study, we aimed to quantify the relative contribution of selected chronic conditions to identify the conditions most influential to MDD risk in adults and identify differences by age. METHODS: This study used electronic health record (EHR) data on patients empanelled with primary care at Mayo Clinic in June 2013. A validated EHR-based algorithm was applied to identify newly diagnosed MDD patients between 2000 and 2013. Non-MDD controls were matched 1:1 to MDD cases on birth year (±2 years), sex, and outpatient clinic visits in the same year of MDD case diagnosis. Twenty-four chronic conditions defined by Chronic Conditions Data Warehouse were ascertained in both cases and controls using diagnosis codes within 5 years of index dates (diagnosis dates for cases, and the first clinic visit dates for matched controls). For each age group (45 years or younger, between 46 and 60, and over 60 years), conditional logistic regression models were used to test the association between each condition and subsequent MDD risk, adjusting for educational attainment and obesity. The relative influence of these conditions on the risk of MDD was quantified using gradient boosting machine models. RESULTS: A total of 11,375 incident MDD cases were identified between 2000 and 2013. Most chronic conditions (except for eye conditions) were associated with risk of MDD, with different association patterns observed depending on age. Among 24 chronic conditions, the greatest relative contribution was observed for diabetes mellitus for subjects aged ≤ 60 years and rheumatoid arthritis/osteoarthritis for those over 60 years. 
CONCLUSIONS: Our results suggest that specific chronic conditions such as diabetes mellitus and rheumatoid arthritis/osteoarthritis may have greater influence than others on the risk of MDD.
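
The 1:1 matching step described in the methods can be sketched in Python as follows. This is a simplified, greedy illustration that uses only two of the stated criteria, birth year within ±2 years and same sex; the visit-year criterion is omitted, and the abstract does not say how ties or the order of matching were handled. The function name, record fields, and sample records are hypothetical.

```python
def match_controls(cases, controls, year_tolerance=2):
    """Greedily pair each case with one unused control.

    A control qualifies if it has the same sex as the case and a birth
    year within year_tolerance years. Returns {case_id: control_id}.
    """
    used = set()
    pairs = {}
    for case in cases:
        for control in controls:
            if control["id"] in used:
                continue  # each control is matched at most once (1:1)
            same_sex = control["sex"] == case["sex"]
            close_birth_year = (
                abs(control["birth_year"] - case["birth_year"]) <= year_tolerance
            )
            if same_sex and close_birth_year:
                pairs[case["id"]] = control["id"]
                used.add(control["id"])
                break
    return pairs


# Hypothetical records for illustration only.
cases = [
    {"id": "c1", "sex": "F", "birth_year": 1960},
    {"id": "c2", "sex": "M", "birth_year": 1975},
]
controls = [
    {"id": "k1", "sex": "F", "birth_year": 1965},  # outside the ±2 window
    {"id": "k2", "sex": "F", "birth_year": 1961},
    {"id": "k3", "sex": "M", "birth_year": 1973},
]
pairs = match_controls(cases, controls)
```

Matched pairs produced this way are what make a conditional logistic regression appropriate, since that model compares each case only with its own matched control.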


Subjects
Major Depressive Disorder/diagnosis, Major Depressive Disorder/epidemiology, Electronic Health Records/statistics & numerical data, Primary Health Care/standards, Adult, Aged, Chronic Disease/epidemiology, Cohort Studies, Comorbidity, Female, Humans, Logistic Models, Male, Middle Aged