ABSTRACT
BACKGROUND: Individuals of South Asian ancestry represent 23% of the global population, corresponding to 1.8 billion people, and have substantially higher risk of atherosclerotic cardiovascular disease compared with most other ethnicities. US practice guidelines now recognize South Asian ancestry as an important risk-enhancing factor. The magnitude of enhanced risk within the context of contemporary clinical care, the extent to which it is captured by existing risk estimators, and its potential mechanisms warrant additional study. METHODS: Within the UK Biobank prospective cohort study, 8124 middle-aged participants of South Asian ancestry and 449 349 participants of European ancestry who were free of atherosclerotic cardiovascular disease at the time of enrollment were examined. The relationship of ancestry to risk of incident atherosclerotic cardiovascular disease (defined as myocardial infarction, coronary revascularization, or ischemic stroke) was assessed with Cox proportional hazards regression, along with examination of a broad range of clinical, anthropometric, and lifestyle mediators. RESULTS: The mean age at study enrollment was 57 years, and 202 405 (44%) were male. Over a median follow-up of 11 years, 554 of 8124 (6.8%) individuals of South Asian ancestry experienced an atherosclerotic cardiovascular disease event compared with 19 756 of 449 349 (4.4%) individuals of European ancestry, corresponding to an adjusted hazard ratio of 2.03 (95% CI, 1.86-2.22; P<0.001). This higher relative risk was largely consistent across a range of age, sex, and clinical subgroups. Despite the >2-fold higher observed risk, the predicted 10-year risk of cardiovascular disease according to the American Heart Association/American College of Cardiology Pooled Cohort Equations and QRISK3 equations was nearly identical for individuals of South Asian and European ancestry.
Adjustment for a broad range of clinical, anthropometric, and lifestyle risk factors led to only modest attenuation of the observed hazard ratio to 1.45 (95% CI, 1.28-1.65; P<0.001). Assessment of variance explained by 18 candidate risk factors suggested greater importance of hypertension, diabetes, and central adiposity in South Asian individuals. CONCLUSIONS: Within a large prospective study, South Asian individuals had substantially higher risk of atherosclerotic cardiovascular disease compared with individuals of European ancestry, and this risk was not captured by the Pooled Cohort Equations.
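The crude event proportions quoted in the RESULTS can be reproduced from the reported counts. The sketch below does only that arithmetic; the function and variable names are illustrative and not taken from the study. The crude ratio it prints is not the adjusted hazard ratio, because it ignores follow-up time and covariates.

```python
# Event counts and group sizes as reported in the abstract.
south_asian_events, south_asian_n = 554, 8124
european_events, european_n = 19756, 449349


def cumulative_incidence(events: int, n: int) -> float:
    """Crude proportion of participants with an event during follow-up."""
    return events / n


sa_risk = cumulative_incidence(south_asian_events, south_asian_n)
eu_risk = cumulative_incidence(european_events, european_n)

# Unadjusted ratio of the two proportions (not a hazard ratio).
crude_risk_ratio = sa_risk / eu_risk

print(f"South Asian ancestry: {sa_risk:.1%}")
print(f"European ancestry:    {eu_risk:.1%}")
print(f"Crude risk ratio:     {crude_risk_ratio:.2f}")
```

The crude ratio comes out near 1.55, lower than the reported adjusted hazard ratio of 2.03, which is estimated from a Cox model rather than from raw proportions.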
Subject(s)
Asian People; Atherosclerosis/epidemiology; Atherosclerosis/etiology; Adult; Aged; Biological Specimen Banks; Disease Susceptibility; Female; Follow-Up Studies; Heart Disease Risk Factors; Humans; Male; Middle Aged; Population Surveillance; Proportional Hazards Models; Risk Assessment; Risk Factors; United Kingdom/epidemiology; United Kingdom/ethnology
ABSTRACT
The article introduces a new type of authentication technique, denoted memory-memory (M2). A core component of M2 is its ability to collect and populate a voice profile database and use it to perform the verification process. The method relies on a database that includes voice profiles in the form of audio recordings of individuals. The profiles are interconnected based on known relationships between people, so that these relationships can be used to select which voice profiles to use when testing a person's knowledge of the identity of the people in the recordings (e.g., their names, their relation to each other). Combining widely known concepts (e.g., humans are superior to computers in processing voices, and computers are superior to humans in handling data) is expected to significantly enhance existing authentication methods (e.g., passwords, biometrics-based).
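One way to picture the interconnected voice profile database is as a small graph keyed by person. The sketch below is only an illustration of that idea: the class `VoiceProfile`, the functions `select_challenges` and `check_answers`, the file names, and the selection rule are all hypothetical and are not described in the article.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VoiceProfile:
    person: str
    recording: str  # hypothetical path to an audio recording
    # Maps a related person's name to the relationship label.
    relations: dict[str, str] = field(default_factory=dict)


def select_challenges(db: dict[str, VoiceProfile], claimant: str, k: int = 2) -> list[VoiceProfile]:
    """Pick up to k profiles of people known to be related to the claimant."""
    related = sorted(name for name in db[claimant].relations if name in db)
    return [db[name] for name in related[:k]]


def check_answers(challenges: list[VoiceProfile], answers: dict[str, str]) -> bool:
    """answers maps each played recording to the name the claimant gave."""
    return bool(challenges) and all(
        answers.get(profile.recording) == profile.person for profile in challenges
    )


# Toy database (invented data).
db = {
    "alice": VoiceProfile("alice", "alice.wav", {"bob": "brother", "carol": "colleague"}),
    "bob": VoiceProfile("bob", "bob.wav", {"alice": "sister"}),
    "carol": VoiceProfile("carol", "carol.wav", {"alice": "colleague"}),
}

challenges = select_challenges(db, "alice")
accepted = check_answers(challenges, {"bob.wav": "bob", "carol.wav": "carol"})
rejected = check_answers(challenges, {"bob.wav": "carol", "carol.wav": "bob"})
print(accepted, rejected)
```

In this toy version the human does the voice recognition and the program only handles the data lookup, which mirrors the division of labour the abstract describes.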
Subject(s)
Voice; Biometry; Databases, Factual; Humans; Knowledge
ABSTRACT
To accelerate the adoption of a new method with a high potential to replace or extend an existing, presumably less accurate, medical scoring system, evaluation should begin days after the new concept is presented publicly, not years or even decades later. Metaphorically speaking, just as chameleons quickly change color to help their bodies adjust to changes in temperature or light, health-care decision makers should be able to evaluate new data-driven insights and tools more quickly and integrate the highest-performing ones into national and international care systems. Doing so is essential because it will save the lives of many individuals.
Subject(s)
Data Mining/ethics; Information Dissemination/ethics; Medical Informatics/ethics; Data Mining/trends; Humans; Medical Informatics/trends; Medical Records Systems, Computerized/ethics; Primary Health Care/ethics; Quality Assurance, Health Care/ethics
ABSTRACT
OBJECTIVES: Among adults with nonalcoholic fatty liver disease (NAFLD), 25% of deaths are attributable to cardiovascular disease (CVD). CVD risk reduction in NAFLD requires not only modification of traditional CVD risk factors but identification of risk factors unique to NAFLD. METHODS: In a NAFLD cohort, we sought to identify non-traditional risk factors associated with CVD. NAFLD was determined by a previously described algorithm and a multivariable logistic regression model determined predictors of CVD. RESULTS: Of the 8,409 individuals with NAFLD, 3,243 had CVD and 5,166 did not. On multivariable analysis, CVD among NAFLD patients was associated with traditional CVD risk factors including family history of CVD (OR 4.25, P=0.0007), hypertension (OR 2.54, P=0.0017), renal failure (OR 1.59, P=0.04), and age (OR 1.05, P<0.0001). Several non-traditional CVD risk factors including albumin, sodium, and Model for End-Stage Liver Disease (MELD) score were associated with CVD. On multivariable analysis, an increased MELD score (OR 1.10, P<0.0001) was associated with an increased risk of CVD. Albumin (OR 0.52, P<0.0001) and sodium (OR 0.96, P=0.037) were inversely associated with CVD. In addition, CVD was more common among those with a NAFLD fibrosis score >0.676 than those with a score ≤0.676 (39 vs. 20%, P<0.0001). CONCLUSIONS: CVD in NAFLD is associated with traditional CVD risk factors, as well as higher MELD scores and lower albumin and sodium levels. Individuals with evidence of advanced fibrosis were more likely to have CVD. These findings suggest that the drivers of NAFLD may also promote CVD development and progression.
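Odds ratios from a logistic model multiply on the odds scale, so a per-unit odds ratio can be rescaled to a larger increment by exponentiation. The sketch below assumes that the reported odds ratios for age and MELD score are per one-year and per one-point increases; the abstract does not state the units, so that is an assumption. The function name is illustrative.

```python
def scaled_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio for a change of `units`, given the per-unit odds ratio."""
    return or_per_unit ** units


# Counts reported in the abstract.
n_nafld, n_cvd = 8409, 3243
cvd_prevalence = n_cvd / n_nafld

# Assumption: OR 1.05 is per year of age and OR 1.10 is per MELD point.
or_age_10_years = scaled_odds_ratio(1.05, 10)
or_meld_5_points = scaled_odds_ratio(1.10, 5)

print(f"CVD prevalence in the cohort: {cvd_prevalence:.1%}")
print(f"OR for 10 additional years:   {or_age_10_years:.2f}")
print(f"OR for 5 additional points:   {or_meld_5_points:.2f}")
```

Under that assumption, an odds ratio of 1.05 per year corresponds to roughly 1.63 per decade, which is why small per-unit odds ratios on continuous predictors can still be clinically meaningful.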
Subject(s)
Cardiovascular Diseases/epidemiology; Non-alcoholic Fatty Liver Disease/complications; Adult; Aged; Algorithms; Cohort Studies; Databases, Factual; Electronic Health Records; Female; Humans; Logistic Models; Male; Middle Aged; Prevalence; Risk Factors
ABSTRACT
BACKGROUND AND AIMS: Nonalcoholic fatty liver disease (NAFLD) is the most common cause of chronic liver disease worldwide. Risk factors for NAFLD disease progression and liver-related outcomes remain incompletely understood due to the lack of computational identification methods. The present study sought to design a classification algorithm for NAFLD within the electronic medical record (EMR) for the development of large-scale longitudinal cohorts. METHODS: We implemented feature selection using logistic regression with adaptive LASSO. A training set of 620 patients was randomly selected from the Research Patient Data Registry at Partners Healthcare. To establish a true diagnosis of NAFLD, we performed chart reviews and accepted either documentation of a biopsy or a clinical diagnosis of NAFLD. Candidate model variables included laboratory measurements, diagnosis codes, and concepts extracted from medical notes. Variables with P < 0.05 were included in the multivariable analysis. RESULTS: The NAFLD classification algorithm included number of natural language mentions of NAFLD in the EMR, lifetime number of ICD-9 codes for NAFLD, and triglyceride level. This classification algorithm was superior to an algorithm using ICD-9 data alone, with an AUC of 0.85 versus 0.75 (P < 0.0001), and led to the creation of a new independent cohort of 8458 individuals with a high probability of NAFLD. CONCLUSIONS: The NAFLD classification algorithm is superior to ICD-9 billing data alone. This approach is simple to develop and deploy, and can be applied across different institutions to create EMR-based cohorts of individuals with NAFLD.
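The AUC used to compare the two algorithms can be read as the probability that a randomly chosen true case receives a higher score than a randomly chosen non-case. The sketch below computes it directly from that definition on invented toy data; it is not the study's code, and the labels and scores are hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy chart-review labels (1 = NAFLD) and classifier scores (invented).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

toy_auc = auc(toy_labels, toy_scores)
print(f"AUC on toy data: {toy_auc:.3f}")
```

The pairwise form is quadratic in the number of patients, which is fine for a 620-patient training set but would normally be replaced by a rank-based computation on larger cohorts.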
Subject(s)
Algorithms; Electronic Health Records; Natural Language Processing; Non-alcoholic Fatty Liver Disease; Adult; Aged; Alanine Transaminase/blood; Aspartate Aminotransferases/blood; Biopsy; Cohort Studies; Data Collection; Diabetes Mellitus/epidemiology; Female; Humans; International Classification of Diseases; Logistic Models; Male; Middle Aged; Non-alcoholic Fatty Liver Disease/blood; Non-alcoholic Fatty Liver Disease/epidemiology; Prevalence; Triglycerides/blood; United States/epidemiology
ABSTRACT
This paper addresses the challenge of binary relation classification in biomedical Natural Language Processing (NLP), focusing on diverse domains including gene-disease associations, compound protein interactions, and social determinants of health (SDOH). We evaluate different approaches, including fine-tuning Bidirectional Encoder Representations from Transformers (BERT) models and generative Large Language Models (LLMs), and examine their performance in zero and few-shot settings. We also introduce a novel dataset of biomedical text annotated with social and clinical entities to facilitate research into relation classification. Our results underscore the continued complexity of this task for both humans and models. BERT-based models trained on domain-specific data excelled in certain domains and achieved comparable performance and generalization power to generative LLMs in others. Despite these encouraging results, these models are still far from achieving human-level performance. We also highlight the significance of high-quality training data and domain-specific fine-tuning on the performance of all the considered models.
ABSTRACT
Chronic gastrointestinal (GI) conditions, such as inflammatory bowel diseases (IBD), offer a promising opportunity to create classification systems that can enhance the accuracy of predicting the most effective therapies and prognosis for each patient. Here, we present a novel methodology to explore disease subtypes using our open-sourced BiomedSciAI toolkit. Applying methods available in this toolkit to the UK Biobank, including subpopulation-based feature selection and multi-dimensional subset scanning, we aimed to discover unique subgroups from GI surgery cohorts. In a cohort of 12,073 patients, a subgroup of 440 IBD patients was discovered with an increased risk of a subsequent GI surgery (OR: 2.21, 95% CI [1.81-2.69]). We iteratively demonstrate the discovery process using an additional cohort (with a narrower definition of GI surgery). Our results show that the iterative process can refine the subgroup discovery process and generate novel hypotheses to investigate determinants of treatment response.
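An odds ratio with a 95% confidence interval, like the one reported for the discovered subgroup, is commonly obtained from a 2x2 table with the Wald (log odds ratio) method. The abstract does not give the underlying counts or name the method used, so the sketch below uses invented counts and an assumed method purely to illustrate the calculation.

```python
import math


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: subgroup with outcome      b: subgroup without outcome
    c: others with outcome        d: others without outcome
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Hypothetical counts, NOT taken from the study.
or_value, ci_low, ci_high = odds_ratio_with_ci(20, 80, 10, 90)
print(f"OR {or_value:.2f}, 95% CI [{ci_low:.2f}-{ci_high:.2f}]")
```

With small invented counts the interval is wide and includes 1; the much narrower interval in the abstract reflects the larger cohort.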
Subject(s)
Inflammatory Bowel Diseases; UK Biobank; Humans; Biological Specimen Banks; Inflammatory Bowel Diseases/surgery; Prognosis; Chronic Disease; Treatment Outcome
ABSTRACT
Prediction models are commonly used to estimate risk for cardiovascular diseases, to inform diagnosis and management. However, performance may vary substantially across relevant subgroups of the population. Here we investigated heterogeneity of accuracy and fairness metrics across a variety of subgroups for risk prediction of two common diseases: atrial fibrillation (AF) and atherosclerotic cardiovascular disease (ASCVD). We calculated the Cohorts for Heart and Aging Research in Genomic Epidemiology Atrial Fibrillation (CHARGE-AF) score for AF and the Pooled Cohort Equations (PCE) score for ASCVD in three large datasets: Explorys Life Sciences Dataset (Explorys, n = 21,809,334), Mass General Brigham (MGB, n = 520,868), and the UK Biobank (UKBB, n = 502,521). Our results demonstrate important performance heterogeneity across subpopulations defined by age, sex, and presence of preexisting disease, with fairly consistent patterns across both scores. For example, using CHARGE-AF, discrimination declined with increasing age, from a concordance index of 0.72 [95% CI 0.72-0.73] for the youngest (45-54 years) subgroup to 0.57 [0.56-0.58] for the oldest (85-90 years) subgroup in Explorys. Even though sex is not included in CHARGE-AF, the statistical parity difference (i.e., the difference in likelihood of being classified as high risk) was considerable between males and females within the 65-74 years subgroup, with a value of -0.33 [95% CI -0.33 to -0.33]. We also observed weak discrimination (i.e., < 0.7) and suboptimal calibration (i.e., calibration slope outside of 0.7-1.3) in large subsets of the population; for example, all individuals aged 75 years or older in Explorys (17.4%). Our findings highlight the need to characterize and quantify the behavior of clinical risk models within specific subpopulations so they can be used appropriately to facilitate more accurate, consistent, and equitable assessment of disease risk.
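The fairness and performance checks named in this abstract can be expressed in a few lines. The sketch below computes a statistical parity difference as the difference between two groups' rates of high-risk classification, and flags a subgroup using the thresholds quoted in the abstract (concordance index below 0.7, calibration slope outside 0.7-1.3). The function names, the toy data, and the choice of which group is subtracted from which are assumptions.

```python
from typing import Dict, Sequence


def statistical_parity_difference(
    high_risk_a: Sequence[int], high_risk_b: Sequence[int]
) -> float:
    """Rate of high-risk classification in group A minus the rate in group B."""
    rate_a = sum(high_risk_a) / len(high_risk_a)
    rate_b = sum(high_risk_b) / len(high_risk_b)
    return rate_a - rate_b


def flag_subgroup(c_index: float, calibration_slope: float) -> Dict[str, bool]:
    """Apply the thresholds quoted in the abstract to one subgroup."""
    return {
        "weak_discrimination": c_index < 0.7,
        "suboptimal_calibration": not (0.7 <= calibration_slope <= 1.3),
    }


# Toy indicators (1 = classified as high risk); invented data.
group_a = [1, 0, 0, 0]
group_b = [1, 1, 0, 0]
spd = statistical_parity_difference(group_a, group_b)

# The C index 0.57 is the oldest-subgroup value from the abstract;
# the slope 1.0 is a placeholder.
flags = flag_subgroup(c_index=0.57, calibration_slope=1.0)
print(spd, flags)
```

A value of 0 would mean both groups are classified as high risk at the same rate; the sign only indicates which group is flagged more often under the chosen ordering.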
Subject(s)
Atherosclerosis; Atrial Fibrillation; Cardiovascular Diseases; Atherosclerosis/epidemiology; Atrial Fibrillation/diagnosis; Atrial Fibrillation/epidemiology; Atrial Fibrillation/genetics; Cardiovascular Diseases/epidemiology; Female; Heart Disease Risk Factors; Humans; Male; Middle Aged; Risk Assessment/methods; Risk Factors
ABSTRACT
A machine-learning precision cohort treatment option (PCTO) workflow was developed to support point-of-care decision making by presenting outcomes of past treatment choices for cohorts of similar patients, based on observational data from electronic health records (EHRs). The workflow consists of (1) data extraction, (2) similarity model training, (3) precision cohort identification, and (4) treatment options analysis. The similarity model is used to dynamically create a cohort of similar patients to inform clinical decisions about an individual patient. The workflow was implemented using EHR data from a large health care provider for three highly prevalent chronic diseases: hypertension (HTN), type 2 diabetes mellitus (T2DM), and hyperlipidemia (HL). A retrospective analysis demonstrated that treatment options with better outcomes were available for a majority of cases (75%, 74%, 85% for HTN, T2DM, HL, respectively). The models for HTN and T2DM were deployed in a pilot study in which primary care physicians used them during clinic visits. A novel data-analytic workflow was developed to create patient-similarity models that dynamically generate personalized treatment insights at the point of care. By leveraging both knowledge-driven treatment guidelines and data-driven EHR data, physicians can incorporate real-world evidence in their medical decision-making process when considering treatment options for individual patients.
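Steps (3) and (4) of the workflow can be illustrated with a toy example. The sketch below substitutes plain Euclidean distance for the trained similarity model, which is a deliberate simplification and not the method described in the abstract; the patient records, feature vectors, and function names are all invented.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def precision_cohort(index_features: Sequence[float], patients: List[dict], k: int) -> List[dict]:
    """Step 3 (simplified): the k patients closest to the index patient."""
    return sorted(patients, key=lambda p: math.dist(index_features, p["features"]))[:k]


def control_rate_by_treatment(cohort: List[dict]) -> Dict[str, float]:
    """Step 4 (simplified): share of controlled patients per treatment option."""
    counts = defaultdict(lambda: [0, 0])  # treatment -> [controlled, total]
    for patient in cohort:
        counts[patient["treatment"]][0] += int(patient["controlled"])
        counts[patient["treatment"]][1] += 1
    return {t: controlled / total for t, (controlled, total) in counts.items()}


# Invented records: two normalized features, a treatment label, an outcome.
patients = [
    {"features": (0.0, 0.0), "treatment": "A", "controlled": True},
    {"features": (0.1, 0.0), "treatment": "A", "controlled": False},
    {"features": (0.0, 0.2), "treatment": "B", "controlled": True},
    {"features": (0.2, 0.1), "treatment": "B", "controlled": True},
    {"features": (5.0, 5.0), "treatment": "A", "controlled": True},
]

cohort = precision_cohort((0.0, 0.0), patients, k=4)
rates = control_rate_by_treatment(cohort)
print(rates)
```

The distant fifth patient is excluded from the cohort, so the comparison of treatment options rests only on patients who resemble the index patient.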
Subject(s)
Diabetes Mellitus, Type 2/therapy; Hyperlipidemias/therapy; Hypertension/therapy; Cohort Studies; Data Mining; Electronic Health Records; Humans; Machine Learning; Precision Medicine; Retrospective Studies; Workflow
ABSTRACT
OBJECTIVE: To present clinicians at the point-of-care with real-world data on the effectiveness of various treatment options in a precision cohort of patients closely matched to the index patient. MATERIALS AND METHODS: We developed disease-specific, machine-learning, patient-similarity models for hypertension (HTN), type II diabetes mellitus (T2DM), and hyperlipidemia (HL) using data on approximately 2.5 million patients in a large medical group practice. For each identified decision point (an encounter during which the patient's condition was not controlled), we compared the actual outcome of the treatment decision administered with the best-achieved outcome for similar patients in similar clinical situations. RESULTS: For the majority of decision points (66.8%, 59.0%, and 83.5% for HTN, T2DM, and HL, respectively), there were alternative treatment options administered to patients in the precision cohort that resulted in a significantly higher proportion of patients under control than the treatment option chosen for the index patient. The expected percentage of patients whose condition would have been controlled if the best-practice treatment option had been chosen exceeded the actual percentage, in relative terms, by 36% (65.1% vs 48.0%, HTN), 68% (37.7% vs 22.5%, T2DM), and 138% (75.3% vs 31.7%, HL). CONCLUSION: Clinical guidelines are primarily based on the results of randomized controlled trials, which apply to a homogeneous subject population. Providing the effectiveness of various treatment options used in a precision cohort of patients similar to the index patient can provide complementary information to tailor guideline recommendations for individual patients and potentially improve outcomes.
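The improvement figures in the RESULTS are relative changes, not percentage-point differences. The short check below reproduces them from the paired percentages quoted in the abstract; the function name is illustrative.

```python
def relative_improvement(expected_pct: float, actual_pct: float) -> float:
    """Relative gain of the expected control rate over the actual one, in percent."""
    return (expected_pct - actual_pct) / actual_pct * 100


# (expected %, actual %) pairs as quoted in the abstract.
reported = {
    "HTN": (65.1, 48.0),
    "T2DM": (37.7, 22.5),
    "HL": (75.3, 31.7),
}

improvements = {
    disease: round(relative_improvement(expected, actual))
    for disease, (expected, actual) in reported.items()
}
print(improvements)
```

For HTN, for example, the absolute gap is 17.1 percentage points, which is about 36% of the actual 48.0% control rate.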
Subject(s)
Decision Making, Computer-Assisted; Diabetes Mellitus, Type 2/therapy; Hyperlipidemias/therapy; Hypertension/therapy; Machine Learning; Patient Care Management/methods; Practice Guidelines as Topic; Electronic Health Records; Evidence-Based Medicine; Humans; Treatment Outcome
ABSTRACT
BACKGROUND: Atrial fibrillation (AF) is associated with increased risks of stroke and heart failure. Electronic health record (EHR)-based AF risk prediction may facilitate efficient deployment of interventions to diagnose or prevent AF altogether. METHODS: We externally validated an electronic health record AF (EHR-AF) score in IBM Explorys Life Sciences, a multi-institutional dataset containing statistically deidentified EHR data for over 21 million individuals (Explorys Dataset). We included individuals with complete AF risk data, ≥2 office visits within 2 years, and no prevalent AF. We compared EHR-AF to existing scores including CHARGE-AF (Cohorts for Heart and Aging Research in Genomic Epidemiology Atrial Fibrillation), C2HEST (coronary artery disease or chronic obstructive pulmonary disease, hypertension, elderly, systolic heart failure, thyroid disease), and CHA2DS2-VASc. We assessed association between AF risk scores and 5-year incident AF, stroke, and heart failure using Cox proportional hazards modeling, 5-year AF discrimination using C indices, and calibration of predicted AF risk to observed AF incidence. RESULTS: Of 21 825 853 individuals in the Explorys Dataset, 4 508 180 comprised the analysis (age 62.5, 56.3% female). AF risk scores were strongly associated with 5-year incident AF (hazard ratio per SD increase 1.85 using CHA2DS2-VASc to 2.88 using EHR-AF), stroke (1.61 using C2HEST to 1.92 using CHARGE-AF), and heart failure (1.91 using CHA2DS2-VASc to 2.58 using EHR-AF). EHR-AF (C index, 0.808 [95% CI, 0.807-0.809]) demonstrated favorable AF discrimination compared to CHARGE-AF (0.806 [95% CI, 0.805-0.807]), C2HEST (0.683 [95% CI, 0.682-0.684]), and CHA2DS2-VASc (0.720 [95% CI, 0.719-0.722]). Of the scores, EHR-AF demonstrated the best calibration to incident AF (calibration slope, 1.002 [95% CI, 0.997-1.007]). 
In subgroup analyses, AF discrimination using EHR-AF was lower in individuals with stroke (C index, 0.696 [95% CI, 0.692-0.700]) and heart failure (0.621 [95% CI, 0.617-0.625]). CONCLUSIONS: EHR-AF demonstrates predictive accuracy for incident AF using readily ascertained EHR data. AF risk is associated with incident stroke and heart failure. Use of such risk scores may facilitate decision support and population health management efforts focused on minimizing AF-related morbidity.
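The C index used to compare the scores measures how often, among comparable pairs of individuals, the one who develops the outcome earlier also has the higher predicted risk. The sketch below implements Harrell's version for right-censored data on invented toy values; the abstract does not state which estimator was used, so treating it as Harrell's C is an assumption.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risks: Sequence[float]
) -> float:
    """Harrell's concordance index for right-censored time-to-event data."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy follow-up data (invented): years to AF or censoring, event flag, risk score.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]

toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
print(f"C index on toy data: {toy_c:.2f}")
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is the scale on which the reported values of 0.683 to 0.808 should be read.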
Subject(s)
Atrial Fibrillation/diagnosis; Risk Assessment/methods; Stroke/epidemiology; Age Factors; Atrial Fibrillation/complications; Female; Global Health; Humans; Incidence; Male; Middle Aged; Stroke/etiology; Survival Rate/trends
ABSTRACT
The ability to identify patients who are likely to have an adverse outcome is an essential component of good clinical care. Therefore, predictive risk stratification models play an important role in clinical decision making. Determining whether a given predictive model is suitable for clinical use usually involves evaluating the model's performance on large patient datasets using standard statistical measures of success (e.g., accuracy, discriminatory ability). However, as these metrics correspond to averages over patients who have a range of different characteristics, it is difficult to discern whether an individual prediction on a given patient should be trusted using these measures alone. In this paper, we introduce a new method for identifying patient subgroups where a predictive model is expected to be poor, thereby highlighting when a given prediction is misleading and should not be trusted. The resulting "unreliability score" can be computed for any clinical risk model and is suitable in the setting of large class imbalance, a situation often encountered in healthcare settings. Using data from more than 40,000 patients in the Global Registry of Acute Coronary Events (GRACE), we demonstrate that patients with high unreliability scores form a subgroup in which the predictive model has both decreased accuracy and decreased discriminatory ability.
ABSTRACT
Aims: To assess demographic and clinical characteristics associated with clinical inertia in a real-world cohort of type 2 diabetes mellitus patients not at hemoglobin A1c goal (<7%) on metformin monotherapy. Methods: Adult (≥18 years) type 2 diabetes mellitus patients who received care at Massachusetts General Hospital/Brigham and Women's Hospital and received a new metformin prescription between 1992 and 2010 were included in the analysis. Clinical inertia was defined as two consecutive hemoglobin A1c measures ≥7%, taken ≥3 months apart, while remaining on metformin monotherapy (i.e., without add-on therapy). The association between clinical inertia and demographic and clinical characteristics was examined via logistic regression. Results: Of 2848 eligible patients, 43% did not achieve a hemoglobin A1c goal of <7% 3 months after metformin monotherapy initiation. A sub-group of 1533 patients was included in the clinical inertia analysis, of whom 36% experienced clinical inertia. Asian race was associated with an increased likelihood of clinical inertia (OR = 2.43; 95% CI = 1.48-3.96), while congestive heart failure was associated with a decreased likelihood (OR = 0.58; 95% CI = 0.32-0.98). Chronic kidney disease and cardiovascular/cerebrovascular disease had weaker associations but were directionally similar to congestive heart failure. Conclusions: Asian patients were at an increased risk of clinical inertia, whereas patients with comorbidities appeared to have their treatment more appropriately intensified. A better understanding of these factors may inform efforts to decrease the likelihood of clinical inertia.
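The definition of clinical inertia in the Methods translates naturally into a rule over a patient's hemoglobin A1c history. The sketch below is one possible reading of it: it treats "3 months" as 90 days and represents add-on therapy by a single start date, both of which are assumptions, and the function name and example dates are invented.

```python
from datetime import date
from typing import List, Optional, Tuple


def has_clinical_inertia(
    measurements: List[Tuple[date, float]],
    add_on_date: Optional[date] = None,
    goal: float = 7.0,
    min_gap_days: int = 90,  # assumption: "3 months" taken as 90 days
) -> bool:
    """True if two consecutive HbA1c values are at or above goal, far enough
    apart, and no add-on therapy started on or before the second one."""
    ordered = sorted(measurements)
    for (d1, v1), (d2, v2) in zip(ordered, ordered[1:]):
        both_above_goal = v1 >= goal and v2 >= goal
        far_enough_apart = (d2 - d1).days >= min_gap_days
        still_monotherapy = add_on_date is None or add_on_date > d2
        if both_above_goal and far_enough_apart and still_monotherapy:
            return True
    return False


# Invented example history.
history = [(date(2005, 1, 10), 7.8), (date(2005, 5, 2), 7.4)]

inertia_without_add_on = has_clinical_inertia(history)
inertia_with_add_on = has_clinical_inertia(history, add_on_date=date(2005, 3, 1))
print(inertia_without_add_on, inertia_with_add_on)
```

In the example the two readings are 112 days apart and both above goal, so the patient is flagged unless an add-on drug was started in between.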
Subject(s)
Diabetes Mellitus, Type 2; Glycated Hemoglobin/analysis; Medication Therapy Management/standards; Metformin/therapeutic use; Practice Patterns, Physicians'/standards; Diabetes Mellitus, Type 2/blood; Diabetes Mellitus, Type 2/drug therapy; Diabetes Mellitus, Type 2/epidemiology; Electronic Health Records/statistics & numerical data; Female; Health Services Needs and Demand; Humans; Hypoglycemic Agents/therapeutic use; Male; Middle Aged; Retrospective Studies; Risk Factors; United States/epidemiology
ABSTRACT
Pancreatic ductal adenocarcinoma (PDAC) exhibits a variety of phenotypes with regard to disease progression and treatment response. This variability complicates clinical decision-making despite the improvement of survival due to the recent introduction of FOLFIRINOX (FFX) and nab-paclitaxel. Questions remain as to the timing and sequence of therapies and the role of radiotherapy for unresectable PDAC. Here we developed a computational analysis platform to investigate the dynamics of growth, metastasis and treatment response to FFX, gemcitabine (GEM), and GEM+nab-paclitaxel. Our approach was informed using data of 1,089 patients treated at the Massachusetts General Hospital and validated using an independent cohort from Osaka Medical College. Our framework establishes a logistic growth pattern of PDAC and defines the Local Advancement Index (LAI), which determines the eventual primary tumor size and predicts the number of metastases. We found that a smaller LAI leads to a larger metastatic burden. Furthermore, our analyses ascertain that i) radiotherapy after induction chemotherapy improves survival in cases receiving induction FFX or with larger LAI, ii) neoadjuvant chemotherapy improves survival in cases with resectable PDAC, and iii) temporary cessations of chemotherapies do not impact overall survival, which supports the feasibility of treatment holidays for patients with FFX-associated adverse effects. Our findings inform clinical decision-making for PDAC patients and allow for the rational design of clinical strategies using FFX, GEM, GEM+nab-paclitaxel, neoadjuvant chemotherapy, and radiation.
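A logistic growth pattern means that tumor size rises roughly exponentially at first and then levels off at a plateau. The sketch below evaluates the standard logistic growth curve; the parameter names and values are invented, and the abstract does not give the model's actual parameterization or how the Local Advancement Index enters it, so no such link is modeled here.

```python
import math


def logistic_size(t: float, n0: float, r: float, k: float) -> float:
    """Standard logistic growth: size at time t.

    n0: initial size, r: growth rate, k: plateau (carrying capacity).
    """
    return k / (1 + (k - n0) / n0 * math.exp(-r * t))


# Illustrative parameters only (arbitrary units), not fitted to any data.
n0, r, k = 1.0, 0.5, 100.0

trajectory = [logistic_size(t, n0, r, k) for t in range(0, 31, 5)]
print([round(size, 1) for size in trajectory])
```

In this form the plateau `k` plays the role of an eventual tumor size, which is the quantity the abstract says the Local Advancement Index determines.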