ABSTRACT
HIV-1 broadly neutralizing antibodies (bnAbs) are difficult to induce with vaccines but are generated in ~50% of HIV-1-infected individuals. Understanding the molecular mechanisms of host control of bnAb induction is critical to vaccine design. Here, we performed a transcriptome analysis of blood mononuclear cells from 47 HIV-1-infected individuals who made bnAbs and 46 HIV-1-infected individuals who did not, and identified upregulation of RAB11FIP5, encoding a Rab effector protein associated with recycling endosomes, in the bnAb individuals. Natural killer (NK) cells had the highest differential expression of RAB11FIP5, which was associated with greater dysregulation of NK cell subsets in bnAb subjects. NK cells from bnAb individuals had a more adaptive/dysfunctional phenotype and exhibited impaired degranulation and cytokine production that correlated with RAB11FIP5 transcript levels. Moreover, RAB11FIP5 overexpression modulated NK cell function. These data suggest that NK cells and Rab11 recycling endosomal transport are involved in regulating HIV-1 bnAb development.
Subject(s)
Adaptor Proteins, Signal Transducing/immunology , Antibodies, Neutralizing/immunology , HIV Infections/immunology , AIDS Vaccines/immunology , Adaptor Proteins, Signal Transducing/genetics , Adaptor Proteins, Signal Transducing/physiology , Adult , B-Lymphocytes/immunology , Cell Line , Cohort Studies , Female , Gene Expression Profiling/methods , HIV Antibodies/immunology , HIV Infections/physiopathology , HIV-1/pathogenicity , Humans , Killer Cells, Natural/immunology , Killer Cells, Natural/physiology , Male , Middle Aged
ABSTRACT
Thyroid cancer is the most common malignant endocrine tumor. The key test to assess preoperative risk of malignancy is cytologic evaluation of fine-needle aspiration biopsies (FNABs). The evaluation findings can often be indeterminate, leading to unnecessary surgery for ultimately benign post-surgical diagnoses. We have developed a deep-learning algorithm to analyze thyroid FNAB whole-slide images (WSIs). We show, on the largest reported data set of thyroid FNAB WSIs, clinical-grade performance in the screening of determinate cases and indications for its use as an ancillary test to disambiguate indeterminate cases. The algorithm screened and definitively classified 45.1% (130/288) of the WSIs as either benign or malignant, with risk of malignancy rates of 2.7% and 94.7%, respectively. It reduced the number of indeterminate cases (N = 108) by reclassifying 21.3% (N = 23) as benign, with a resultant risk of malignancy rate of 1.8%. Similar results were reproduced using a data set of consecutive FNABs collected during an entire calendar year, achieving clinically acceptable margins of error for thyroid FNAB classification.
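The screening logic described above reduces to a pair of probability cut-offs applied to the model's malignancy score: score below the benign cut-off screens out as benign, above the malignant cut-off flags malignant, and everything in between stays indeterminate for cytopathologist review. A minimal sketch follows; the threshold values and function names are illustrative assumptions, not the study's published operating points.

```python
import numpy as np

def triage(scores, benign_cut=0.1, malignant_cut=0.9):
    """Map model malignancy scores to benign / indeterminate / malignant calls.
    The cut-offs here are placeholders, not the published operating points."""
    scores = np.asarray(scores)
    calls = np.full(len(scores), "indeterminate", dtype=object)
    calls[scores < benign_cut] = "benign"       # screened out, low risk of malignancy
    calls[scores > malignant_cut] = "malignant" # definitively flagged
    return calls

scores = np.array([0.02, 0.45, 0.97, 0.08, 0.76])
print(triage(scores))  # ['benign' 'indeterminate' 'malignant' 'benign' 'indeterminate']
```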
Subject(s)
Deep Learning , Thyroid Neoplasms , Humans , Cytology , Thyroid Neoplasms/diagnosis , Algorithms
ABSTRACT
PURPOSE: Less invasive decision support tools are desperately needed to identify occult high-risk disease in men with prostate cancer (PCa) on active surveillance (AS). For a variety of reasons, many men on AS with low- or intermediate-risk disease forgo the necessary repeat surveillance biopsies needed to identify potentially higher-risk PCa. Here, we describe the development of a blood-based immunocyte transcriptomic signature to identify men harboring occult aggressive PCa. We then validate it on a biopsy-positive population with the goals of identifying men who should not be on AS and confirming those men with indolent disease who can safely remain on AS. This model uses subtraction-normalized immunocyte transcriptomic profiles to risk-stratify men with PCa who could be candidates for AS. MATERIALS AND METHODS: Men were eligible for enrollment in the study if they were determined by their physician to have a risk profile that warranted prostate biopsy. Both the training (n = 1017) and validation (n = 1198) cohorts had blood samples drawn coincident with prostate biopsy. Purified CD2+ and CD14+ immune cells were obtained from peripheral blood mononuclear cells, and RNA was extracted and sequenced. To avoid overfitting and unnecessary complexity, a regularized regression model was built on the training cohort to predict PCa aggressiveness based on the National Comprehensive Cancer Network (NCCN) PCa guidelines. This model was then validated on an independent cohort of biopsy-positive men only, using NCCN unfavorable intermediate risk or worse as the aggressiveness outcome, identifying patients who were not appropriate for AS. RESULTS: The best final model for the AS setting was obtained by combining an immunocyte transcriptomic profile based on 2 cell types with PSA density and age, reaching an AUC of 0.73 (95% CI: 0.69-0.77). The model significantly outperforms (P < .001) PSA density as a biomarker, which has an AUC of 0.69 (95% CI: 0.65-0.73). This model yields an individualized patient risk score with 90% negative predictive value and 50% positive predictive value. CONCLUSIONS: While further validation in an intended-use cohort is needed, the immunocyte transcriptomic model offers a promising tool for risk stratification of individual patients who are being considered for AS.
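As a rough illustration of the modeling approach named above, a regularized regression over a transcriptomic profile combined with PSA density and age, here is a minimal scikit-learn sketch on synthetic data. The feature counts, penalty choice, and cohort sizes are placeholders, not the study's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, n_genes = 1000, 50                      # stand-ins for cohort size / transcript panel
X_genes = rng.normal(size=(n, n_genes))    # subtraction-normalized immunocyte profiles
psa_density = rng.lognormal(mean=-2, size=(n, 1))
age = rng.normal(65, 8, size=(n, 1))
y = rng.binomial(1, 0.3, size=n)           # 1 = NCCN unfavorable-intermediate or worse

# Combine the transcriptomic profile with PSA density and age, then fit a
# regularized (L2) logistic model to limit overfitting, as the abstract describes.
X = np.hstack([X_genes, psa_density, age])
model = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l2", C=1.0, max_iter=1000))
model.fit(X, y)
print("AUC (training data, toy example):",
      roc_auc_score(y, model.predict_proba(X)[:, 1]))
```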
Subject(s)
Prostate-Specific Antigen , Prostatic Neoplasms , Male , Humans , Leukocytes, Mononuclear/pathology , Watchful Waiting , Prostatic Neoplasms/pathology , Biopsy , Risk Assessment
ABSTRACT
RATIONALE & OBJECTIVE: The life expectancy of patients treated with maintenance hemodialysis (MHD) is heterogeneous. Knowledge of life expectancy may help focus care decisions on near-term versus long-term goals. Current tools are limited and focus on near-term mortality. Here, we develop models predicting near-term mortality and long-term survival on MHD and assess their potential utility. STUDY DESIGN: Predictive modeling study. SETTING & PARTICIPANTS: 42,351 patients contributing 997,381 patient-months over 11 years, abstracted from the electronic health record (EHR) system of midsize, nonprofit dialysis providers. NEW PREDICTORS & ESTABLISHED PREDICTORS: Demographics, laboratory results, vital signs, and service utilization data available within the dialysis EHR. OUTCOME: For each patient-month, we ascertained death within the next 6 months (ie, near-term mortality) and survival over more than 5 years during receipt of MHD or after kidney transplantation (ie, long-term survival). ANALYTICAL APPROACH: We used least absolute shrinkage and selection operator (LASSO) logistic regression and gradient-boosting machines to predict each outcome. We compared these to time-to-event models spanning both time horizons. We explored the performance of decision rules at different cut points. RESULTS: All models achieved an area under the receiver operating characteristic curve of ≥0.80 and optimal calibration metrics in the test set. The long-term survival models had significantly better performance than the near-term mortality models. The time-to-event models performed similarly to the binary models. Applying different cut points spanning from the 1st to 90th percentile of the predictions, a positive predictive value (PPV) of 54% could be achieved for near-term mortality, but with poor sensitivity of 6%. A PPV of 71% could be achieved for long-term survival with a sensitivity of 67%. LIMITATIONS: The retrospective models would need to be prospectively validated before they could be appropriately used as clinical decision aids. CONCLUSIONS: A model built with readily available clinical variables to support easy implementation can predict clinically important life expectancy thresholds and shows promise as a clinical decision support tool for patients on MHD. Predicting long-term survival has better decision rule performance than predicting near-term mortality. PLAIN-LANGUAGE SUMMARY: Clinical prediction models (CPMs) are not widely used for patients undergoing maintenance hemodialysis (MHD). Although a variety of CPMs have been reported in the literature, many were not designed to be easily implementable. We consider the performance of an implementable CPM for both near-term mortality and long-term survival for patients undergoing MHD. Both near-term and long-term models have similar predictive performance, but the long-term models have greater clinical utility. We further consider how the differential performance of predicting over different time horizons may be used to affect clinical decision making. Although predictive modeling is not regularly used for MHD patients, such tools may help promote individualized care planning and foster shared decision making.
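The cut-point analysis in the RESULTS can be made concrete with a short sketch: sweep percentile cut points over the predicted probabilities and report PPV and sensitivity at each. Everything below (the data, the score model, the percentiles) is simulated for illustration, not the study's cohort.

```python
import numpy as np

def rule_performance(p, y, percentile):
    """PPV and sensitivity of the rule `predict positive if p >= cut`,
    with the cut taken at the given percentile of the predictions."""
    cut = np.percentile(p, percentile)
    pred = p >= cut
    tp = np.sum(pred & (y == 1))
    ppv = tp / max(pred.sum(), 1)
    sens = tp / max((y == 1).sum(), 1)
    return ppv, sens

rng = np.random.default_rng(1)
y = rng.binomial(1, 0.15, size=5000)   # e.g. death within 6 months
p = np.clip(0.15 + 0.25 * (y - 0.15) + rng.normal(0, 0.1, 5000), 0, 1)  # toy scores

for pct in (50, 75, 90):               # sweep cut points as in the abstract
    ppv, sens = rule_performance(p, y, pct)
    print(f"cut at {pct}th percentile: PPV={ppv:.2f}, sensitivity={sens:.2f}")
```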
Subject(s)
Kidney Failure, Chronic , Renal Dialysis , Humans , Renal Dialysis/mortality , Renal Dialysis/methods , Male , Female , Middle Aged , Kidney Failure, Chronic/therapy , Kidney Failure, Chronic/mortality , Aged , Life Expectancy , Survival Rate/trends , Time Factors , Risk Assessment/methods , Retrospective Studies
ABSTRACT
OBJECTIVE: Telehealth has been proposed as a safe and effective alternative to in-person care for rheumatoid arthritis (RA). The purpose of this study was to evaluate factors associated with telehealth appropriateness in outpatient RA encounters. METHODS: A prospective cohort study (January 1, 2021, to August 31, 2021) was conducted using electronic health record data from outpatient RA encounters in a single academic rheumatology practice. Rheumatology providers rated the telehealth appropriateness of their own encounters using the Encounter Appropriateness Score for You (EASY) immediately following each encounter. Robust Poisson regression with generalized estimating equations (GEE) modeling was used to evaluate the association of telehealth appropriateness with patient demographics, RA clinical characteristics, comorbid noninflammatory causes of joint pain, previous and current encounter characteristics, and provider characteristics. RESULTS: During the study period, 1823 outpatient encounters with 1177 unique patients with RA received an EASY score from 25 rheumatology providers. In the final multivariate model, factors associated with increased telehealth appropriateness included higher average provider preference for telehealth in prior encounters (relative risk [RR] 1.26, 95% CI 1.21-1.31), telehealth as the current encounter modality (RR 2.27, 95% CI 1.95-2.64), and increased patient age (RR 1.05, 95% CI 1.01-1.09). Factors associated with decreased telehealth appropriateness included moderate (RR 0.81, 95% CI 0.68-0.96) and high (RR 0.57, 95% CI 0.46-0.70) RA disease activity and previous encounters conducted by telehealth (RR 0.83, 95% CI 0.73-0.95). CONCLUSION: In this study, telehealth appropriateness was most associated with provider preference, the current and previous encounter modality, and RA disease activity. Other factors, such as patient demographics, RA medications, and comorbid noninflammatory causes of joint pain, were not associated with telehealth appropriateness.
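A minimal sketch of the modeling approach named in METHODS, robust Poisson regression with GEE clustering repeated encounters within provider, using statsmodels on simulated data; the variable names, effect sizes, and cluster counts are assumptions, not the study's data. With a Poisson family and robust (sandwich) standard errors, exp(coefficient) is interpretable as a relative risk for the binary outcome.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 600
df = pd.DataFrame({
    "easy_appropriate": rng.binomial(1, 0.6, n),  # 1 = telehealth rated appropriate
    "telehealth_visit": rng.binomial(1, 0.4, n),  # current encounter modality
    "patient_age": rng.normal(60, 12, n),
    "provider_id": rng.integers(0, 25, n),        # clustering unit: 25 providers
})

# Poisson GEE with an exchangeable working correlation clusters encounters
# within provider; GEE reports robust standard errors by default.
model = smf.gee(
    "easy_appropriate ~ telehealth_visit + patient_age",
    groups="provider_id", data=df,
    family=sm.families.Poisson(), cov_struct=sm.cov_struct.Exchangeable(),
).fit()
print(np.exp(model.params))  # relative risks for each covariate
```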
Subject(s)
Arthritis, Rheumatoid , Telemedicine , Humans , Arthritis, Rheumatoid/therapy , Female , Male , Middle Aged , Prospective Studies , Aged , Adult , Outpatients , Rheumatology , Electronic Health Records , Ambulatory Care
ABSTRACT
INTRODUCTION: Risk prediction, including early disease detection, prevention, and intervention, is essential to precision medicine. However, systematic bias in risk estimation caused by heterogeneity across demographic groups can lead to inappropriate or misinformed treatment decisions. In addition, low-incidence (class-imbalanced) outcomes degrade the classification performance of many standard learning algorithms, which further exacerbates racial disparity issues. It is therefore crucial to improve the performance of statistical and machine learning models in underrepresented populations in the presence of heavy class imbalance. METHOD: To address demographic disparity in the presence of class imbalance, we develop a novel framework, Trans-Balance, by leveraging recent advances in imbalanced learning, transfer learning, and federated learning. We consider a practical setting where data from multiple sites are stored locally under privacy constraints. RESULTS: We show that the proposed Trans-Balance framework improves upon existing approaches by explicitly accounting for heterogeneity across demographic subgroups and cohorts. We demonstrate the feasibility and validity of our methods through numerical experiments and a real application to a multi-cohort study of stroke risk prediction with data from participants of four large NIH-funded cohorts. CONCLUSION: Our findings indicate that the Trans-Balance approach significantly improves predictive performance, especially in scenarios marked by severe class imbalance and demographic disparity. Given its versatility and effectiveness, Trans-Balance offers a valuable contribution to enhancing risk prediction in biomedical research and related fields.
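Trans-Balance itself is not specified in the abstract, but two of its named ingredients, imbalance-aware local fitting and federated aggregation with data kept on site, can be sketched as follows. This is an illustration of those ingredients on synthetic data, not the published algorithm; the model choice, weighting, and single aggregation round are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def local_fit(X, y):
    # Inverse-frequency class weights counter the low-incidence (class-imbalance)
    # problem at each site; class_weight="balanced" does this automatically.
    m = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
    return np.hstack([m.coef_.ravel(), m.intercept_]), len(y)

rng = np.random.default_rng(3)
sites = []
for _ in range(4):                  # four cohorts, data stored locally
    X = rng.normal(size=(500, 10))
    y = rng.binomial(1, 0.05, 500)  # rare outcome, e.g. incident stroke
    sites.append(local_fit(X, y))

# One round of federated averaging: only coefficients leave each site,
# weighted by local sample size. (A sketch of ingredients, not Trans-Balance.)
weights = np.array([n for _, n in sites], dtype=float)
global_coef = sum(w * c for (c, _), w in zip(sites, weights)) / weights.sum()
print(global_coef)
```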
Subject(s)
Algorithms , Biomedical Research , Humans , Cohort Studies , Machine Learning , Demography
ABSTRACT
OBJECTIVE: This study aimed to develop a novel approach using routinely collected electronic health record (EHR) data to improve the prediction of a rare event. We illustrate this with the example of improving early prediction of an autism diagnosis, given its low prevalence, by leveraging correlations between autism and other neurodevelopmental conditions (NDCs). METHODS: To achieve this, we introduced a conditional multi-label model by merging conditional learning and multi-label methodologies. The conditional learning approach breaks a hard task into more manageable pieces at each stage, and the multi-label approach utilizes information from related neurodevelopmental conditions to learn predictive latent features. The study involved forecasting autism diagnosis by age 5.5 years utilizing data from the first 18 months of life, and analysis of feature importance correlations to explore the alignment of the feature space across different conditions. RESULTS: Upon analysis of health records from 18,156 children, we generated a model that predicts a future autism diagnosis with moderate performance (AUROC = 0.76). The proposed conditional multi-label method significantly improves predictive performance, with an AUROC of 0.80 (p < 0.001). Further examination shows that the conditional and multi-label approaches alone each provided only marginal lift to model performance compared with a one-stage, one-label approach. We also demonstrated the generalizability and applicability of this method using simulated data with high correlation between feature vectors for different labels. CONCLUSION: Our findings underscore the effectiveness of the developed conditional multi-label model for early prediction of an autism diagnosis. The study introduces a versatile strategy applicable to prediction tasks involving limited target populations that share underlying features or etiology with related groups.
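A minimal sketch of the two ideas combined in the conditional multi-label model: a first-stage screen on an easier, less imbalanced composite label ("any NDC"), followed by a multi-label second stage on screen-positives so the related conditions can inform the rare autism label. The labels, screening threshold, and model choices below are illustrative assumptions, not the published architecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(4)
n, d = 4000, 30
X = rng.normal(size=(n, d))                      # features from first 18 months of EHR
Y = rng.binomial(1, [0.02, 0.06, 0.08], (n, 3))  # columns: autism, two related NDCs
any_ndc = Y.max(axis=1)

# Stage 1 (conditional step): screen for "any neurodevelopmental condition",
# an easier, less imbalanced task than autism alone.
stage1 = LogisticRegression(max_iter=1000).fit(X, any_ndc)

# Stage 2 (multi-label step): among screen-positives, jointly model the related
# conditions so shared structure can inform the rare autism label.
mask = stage1.predict_proba(X)[:, 1] > 0.1       # illustrative screening threshold
stage2 = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X[mask], Y[mask])

p_autism = stage2.predict_proba(X[mask])[0][:, 1]  # autism-column probabilities
print(p_autism[:5])
```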
Subject(s)
Autistic Disorder , Electronic Health Records , Humans , Autistic Disorder/diagnosis , Child, Preschool , Infant , Male , Female , Child , Algorithms
ABSTRACT
OBJECTIVE: This study aims to explore the factors associated with rheumatology providers' perceptions of telehealth utility in real-world telehealth encounters. METHODS: From September 14, 2020 to January 31, 2021, 6 providers at an academic medical center rated their telehealth visits according to perceived utility in making treatment decisions on a Telehealth Utility Score (TUS; 1 = very low utility to 5 = very high utility). Modified Poisson regression models were used to assess the association between TUS scores and encounter diagnoses, disease activity measures, and immunomodulatory therapy changes during the encounter. RESULTS: A total of 481 telehealth encounters were examined, of which 191 (39.7%) were rated as "low telehealth utility" (TUS 1-3) and 290 (60.3%) as "high telehealth utility" (TUS 4-5). Encounters with a diagnosis of inflammatory arthritis were significantly less likely to be rated as high telehealth utility (adjusted relative risk [aRR], 0.8061; p = 0.004), especially those with a concurrent noninflammatory musculoskeletal diagnosis (aRR, 0.54; p = 0.006). Other factors significantly associated with low telehealth utility included higher disease activity according to current and prior RAPID3 scores (aRR, 0.87 and 0.89, respectively; p < 0.001) and provider global scores (aRR, 0.83; p < 0.001), as well as an increase in immunomodulatory therapy (aRR, 0.70; p = 0.015). CONCLUSIONS: Provider perceptions of telehealth utility in real-world encounters are significantly associated with patient diagnoses, current and prior disease activity, and the need for changes in immunomodulatory therapy. These findings inform efforts to optimize the appropriate utilization of telehealth in rheumatology.
Subject(s)
Arthritis , Rheumatology , Telemedicine , Humans , Outpatients , Academic Medical Centers
ABSTRACT
OBJECTIVE: To implement a machine learning model that uses only the restricted data available at case creation time to predict surgical case length for multiple services at different locations. BACKGROUND: The operating room is one of the most expensive resources in a health system, estimated to cost $22 to $133 per minute and to generate about 40% of hospital revenue. Accurate prediction of surgical case length is necessary for efficient scheduling and cost-effective utilization of the operating room and other resources. METHODS: We introduced a similarity cascade to capture the complexity of cases and the surgeon's influence on case length and incorporated it into a gradient-boosting machine learning model. The model loss function was customized to improve the balance between over- and under-prediction of case length. A production pipeline was created to seamlessly deploy and implement the model across our institution. RESULTS: Prospective analysis showed that the model output was gradually adopted by the schedulers and outperformed the scheduler-predicted case length from August to December 2022. In 33,815 surgical cases across outpatient and inpatient platforms, the operational implementation produced 11.2% fewer underpredicted cases and 5.9% more cases within 20% of the actual case length compared with the schedulers, while overpredicting only 5.3% more cases. The model assisted schedulers in predicting 3.4% more cases within 20% of the actual case length and 4.3% fewer underpredicted cases. CONCLUSIONS: We created a unique framework that is leveraged every day to predict surgical case length more accurately at case posting time and could potentially be used to deploy future machine learning models.
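The customized loss balancing over- and under-prediction can be illustrated with an asymmetric squared-error objective for a gradient-boosting library. The sketch below assumes LightGBM's scikit-learn API, which accepts a callable objective returning the gradient and Hessian; the 2:1 penalty ratio and features are placeholders, not the production values.

```python
import numpy as np
import lightgbm as lgb

def asymmetric_l2(y_true, y_pred):
    """Squared error weighted 2x when the model under-predicts case length
    (under-prediction causes OR overruns, so it is penalized more here).
    The 2.0/1.0 weights are illustrative, not the study's values."""
    resid = y_pred - y_true
    weight = np.where(resid < 0, 2.0, 1.0)  # resid < 0 means under-prediction
    grad = 2.0 * weight * resid
    hess = 2.0 * weight
    return grad, hess

rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 8))          # stand-in for similarity-cascade features
y = np.exp(rng.normal(4.5, 0.5, 2000))  # right-skewed case lengths in minutes

model = lgb.LGBMRegressor(objective=asymmetric_l2, n_estimators=200)
model.fit(X, y)
pred = model.predict(X)
print("share under-predicted:", np.mean(pred < y))
```

The asymmetric weight shifts the fitted predictions upward relative to plain least squares, trading a few more overpredictions for fewer costly underpredictions, which mirrors the trade-off reported in the RESULTS.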
Subject(s)
Hospitals , Operating Rooms , Humans , Forecasting , Machine Learning
ABSTRACT
We examined the performance of deep learning models on the classification of thyroid fine-needle aspiration biopsies using microscope images captured in 2 ways: with a high-resolution scanner and with a mobile phone camera. Our training set consisted of images from 964 whole-slide images captured with a high-resolution scanner. Our test set consisted of 100 slides; 20 manually selected regions of interest (ROIs) from each slide were captured in the 2 ways mentioned above. Applying a baseline machine learning algorithm trained on scanner ROIs resulted in performance deterioration when applied to the smartphone ROIs (97.8% area under the receiver operating characteristic curve [AUC], CI = [95.4%, 100.0%] for scanner images vs 89.5% AUC, CI = [82.3%, 96.6%] for mobile images, P = .019). Preliminary analysis via histogram matching showed that the baseline model was overly sensitive to slight color variations in the images (specifically, to color differences between mobile and scanner images). Adding color augmentation during training reduces this sensitivity and narrows the performance gap between mobile and scanner images (97.6% AUC, CI = [95.0%, 100.0%] for scanner images vs 96.0% AUC, CI = [91.8%, 100.0%] for mobile images, P = .309), with both modalities on par with human pathologist performance (95.6% AUC, CI = [91.6%, 99.5%]) for malignancy prediction (P = .398 for pathologist vs scanner and P = .875 for pathologist vs mobile). For indeterminate cases (pathologist-assigned Bethesda category of 3, 4, or 5), color augmentation confers some improvement (88.3% AUC, CI = [73.7%, 100.0%] for the baseline model vs 96.2% AUC, CI = [90.9%, 100.0%] with color augmentation, P = .158). In addition, we found that our model's performance levels off after 15 ROIs, a promising indication that ROI data collection would not be time-consuming for our diagnostic system. Finally, we showed that the model makes sensible Bethesda category (TBS) predictions: the malignancy rate increases with the predicted TBS category, from 0% for predicted TBS 2 to 100% for predicted TBS 6.
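Color augmentation of the kind described, randomizing color statistics during training so the model cannot key on scanner-specific hues, might look like the following torchvision sketch; the jitter magnitudes and transform order are illustrative assumptions, not the study's settings.

```python
import numpy as np
from PIL import Image
from torchvision import transforms

# Randomizing hue/saturation/brightness at train time prevents the classifier
# from latching onto color statistics that differ between scanner and phone.
train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.3, hue=0.05),
    transforms.ToTensor(),
])

# Example: augment one synthetic 256x256 RGB ROI.
roi = Image.fromarray(
    np.uint8(np.random.default_rng(6).integers(0, 255, (256, 256, 3))))
augmented = train_tf(roi)
print(augmented.shape)  # torch.Size([3, 256, 256])
```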
Subject(s)
Cytology , Thyroid Neoplasms , Humans , Smartphone , Thyroid Neoplasms/diagnosis , Thyroid Neoplasms/pathology , Machine Learning
ABSTRACT
HCC recurrence following liver transplantation (LT) is highly morbid and occurs despite strict patient selection criteria. Individualized prediction of post-LT HCC recurrence risk remains an important need. Clinico-radiologic and pathologic data of 4981 patients with HCC undergoing LT from the US Multicenter HCC Transplant Consortium (UMHTC) were analyzed to develop a REcurrent Liver cAncer Prediction ScorE (RELAPSE). Multivariable Fine and Gray competing risk analysis and machine learning algorithms (Random Survival Forest and Classification and Regression Tree models) identified variables to model HCC recurrence. RELAPSE was externally validated in 1160 HCC LT recipients from the European Hepatocellular Cancer Liver Transplant study group. Of 4981 UMHTC patients with HCC undergoing LT, 71.9% were within Milan criteria, 16.1% were initially beyond Milan criteria (with 9.4% downstaged before LT), and 12.0% had incidental HCC on explant pathology. Overall and recurrence-free survival at 1, 3, and 5 years was 89.7%, 78.6%, and 69.8% and 86.8%, 74.9%, and 66.7%, respectively, with a 5-year incidence of HCC recurrence of 12.5% (median 16 months) and non-HCC mortality of 20.8%. A multivariable model identified maximum alpha-fetoprotein (HR = 1.35 per log-SD, 95% CI 1.22-1.50, p < 0.001), neutrophil-lymphocyte ratio (HR = 1.16 per log-SD, 95% CI 1.04-1.28, p < 0.006), pathologic maximum tumor diameter (HR = 1.53 per log-SD, 95% CI 1.35-1.73, p < 0.001), microvascular (HR = 2.37, 95% CI 1.87-2.99, p < 0.001) and macrovascular (HR = 3.38, 95% CI 2.41-4.75, p < 0.001) invasion, and tumor differentiation (moderate HR = 1.75, 95% CI 1.29-2.37, p < 0.001; poor HR = 2.62, 95% CI 1.54-3.32, p < 0.001) as independent variables predicting post-LT HCC recurrence (C-statistic = 0.78). Machine learning algorithms incorporating additional covariates improved prediction of recurrence (Random Survival Forest C-statistic = 0.81). Despite significant differences in European Hepatocellular Cancer Liver Transplant recipient radiologic, treatment, and pathologic characteristics, external validation of RELAPSE demonstrated consistent 2- and 5-year recurrence risk discrimination (AUCs 0.77 and 0.75, respectively). We developed and externally validated a RELAPSE score that accurately discriminates post-LT HCC recurrence risk and may allow for individualized post-LT surveillance, immunosuppression modification, and selection of high-risk patients for adjuvant therapies.
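For the machine learning side of RELAPSE, a Random Survival Forest can be sketched with scikit-survival (assuming that package is available); the covariates, event rates, and hyperparameters below are synthetic stand-ins, and .score() reports Harrell's C-statistic, the discrimination metric used in the abstract.

```python
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(7)
n = 500
X = np.column_stack([
    rng.lognormal(2, 1, n),      # max alpha-fetoprotein (toy values)
    rng.lognormal(0.5, 0.4, n),  # neutrophil-lymphocyte ratio
    rng.lognormal(1, 0.5, n),    # max tumor diameter (cm)
    rng.binomial(1, 0.2, n),     # microvascular invasion
])
time = rng.exponential(60, n)                  # months to recurrence/censoring
event = rng.binomial(1, 0.15, n).astype(bool)  # True = HCC recurrence observed
y = Surv.from_arrays(event=event, time=time)   # structured survival outcome

# Random Survival Forests capture nonlinearity and interactions among covariates
# that a proportional-hazards model would miss.
rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=10,
                           random_state=0).fit(X, y)
print("C-statistic (training, toy data):", round(rsf.score(X, y), 3))
```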
Subject(s)
Carcinoma, Hepatocellular , Liver Neoplasms , Liver Transplantation , Humans , Liver Transplantation/adverse effects , Risk Factors , Neoplasm Recurrence, Local/pathology , Retrospective Studies , Recurrence
ABSTRACT
Determining transcription factor binding sites (TFBSs) is critical for understanding the molecular mechanisms regulating gene expression in different biological conditions. Biological assays designed to directly map TFBSs require large sample sizes and intensive resources. As an alternative, the ATAC-seq assay is simple to conduct and provides genomic cleavage profiles that contain rich information for imputing TFBSs indirectly. Previous footprint-based tools are inherently limited by the accuracy of their bias correction algorithms and the efficiency of their feature extraction models. Here we introduce TAMC (Transcriptional factor binding prediction from ATAC-seq profile at Motif-predicted binding sites using Convolutional neural networks), a deep-learning approach for predicting motif-centric TF binding activity from paired-end ATAC-seq data. TAMC does not require bias correction during signal processing. By leveraging a one-dimensional convolutional neural network (1D-CNN) model, TAMC makes predictions based on both footprint and non-footprint features at binding sites for each TF and outperforms existing footprinting tools in TFBS prediction, particularly for ATAC-seq data with limited sequencing depth.
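A minimal PyTorch sketch of a motif-centric 1D-CNN in the spirit of TAMC follows; the window size, channel layout, and layer dimensions are assumptions for illustration, not the published architecture.

```python
import torch
import torch.nn as nn

class MotifCNN(nn.Module):
    """Toy 1D-CNN: ingests a per-base ATAC-seq cleavage profile centered on a
    motif-predicted site and emits a binding probability. Layer sizes are
    illustrative, not the published TAMC architecture."""
    def __init__(self, window=200, channels=2):  # channels: e.g. fwd/rev cut counts
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=11, padding=5), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, x):             # x: (batch, channels, window)
        return torch.sigmoid(self.net(x)).squeeze(-1)

model = MotifCNN()
profiles = torch.randn(8, 2, 200)     # 8 candidate sites, 200-bp window
print(model(profiles).shape)          # torch.Size([8])
```

Because the convolution sees the full window, the network can weigh both the footprint dip at the motif and the flanking (non-footprint) signal, which is the combination of features the abstract credits for TAMC's performance.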
Subject(s)
Chromatin Immunoprecipitation Sequencing , Deep Learning , Binding Sites/genetics , Protein Binding/genetics , Transcription Factors/metabolism
ABSTRACT
Recent work has shown that predictive models can be applied to structured electronic health record (EHR) data to stratify autism likelihood from an early age (<1 year). Integrating clinical narratives (or notes) with structured data has been shown to improve prediction performance in other clinical applications, but the added predictive value of this information in early autism prediction has not yet been explored. In this study, we aimed to enhance the performance of early autism prediction by using both structured EHR data and clinical narratives. We built models based on structured data and clinical narratives separately, and then an ensemble model that integrated both sources of data. Using data from Duke University Health System spanning 14 years, we evaluated ensemble models predicting a later autism diagnosis (by age 4 years) from data collected from ages 30 to 360 days. Our sample included 11,750 children followed to at least age 3 years (385 meeting autism diagnostic criteria). The ensemble model for autism prediction showed superior performance: at age 30 days it achieved 46.8% sensitivity (95% confidence interval, CI: 22.0%, 52.9%), 28.0% positive predictive value (PPV) at high (90%) specificity (CI: 2.0%, 33.1%), and an AUC4 (with at least 4-year follow-up for controls) of 0.769 (CI: 0.715, 0.811). Prediction by 360 days achieved 44.5% sensitivity (CI: 23.6%, 62.9%), 13.7% PPV at high (90%) specificity (CI: 9.6%, 18.9%), and an AUC4 of 0.797 (CI: 0.746, 0.840). The results show that incorporating clinical narratives in early autism prediction achieved promising accuracy by age 30 days, outperforming models based on structured data only. Furthermore, the findings suggest that additional features learned from clinician narratives might be hypothesis generating for understanding early development in autism.
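One simple way to realize the structured-plus-narratives ensemble described above is to train the two models separately and average their predicted probabilities. The sketch below does this with scikit-learn; the equal weighting, toy features, and toy notes are assumptions, since the abstract does not specify the combination rule.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins: structured EHR features and free-text clinical notes.
X_struct = np.random.default_rng(8).normal(size=(6, 5))
notes = ["well child visit, milestones on track"] * 3 + \
        ["referred for delayed speech and motor concerns"] * 3
y = np.array([0, 0, 0, 1, 1, 1])

struct_model = LogisticRegression(max_iter=1000).fit(X_struct, y)
text_model = make_pipeline(TfidfVectorizer(),
                           LogisticRegression(max_iter=1000)).fit(notes, y)

# Probability-averaging ensemble; equal weights are an assumption here.
p_ensemble = 0.5 * struct_model.predict_proba(X_struct)[:, 1] \
           + 0.5 * text_model.predict_proba(notes)[:, 1]
print(p_ensemble.round(2))
```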
Subject(s)
Autistic Disorder , Electronic Health Records , Child , Humans , Infant , Child, Preschool , Autistic Disorder/diagnosis , Predictive Value of Tests , Narration , Electronics
ABSTRACT
Importance: Stroke is the fifth-highest cause of death in the US and a leading cause of serious long-term disability, with particularly high risk in Black individuals. Quality risk prediction algorithms, free of bias, are key for comprehensive prevention strategies. Objective: To compare the performance of stroke-specific algorithms with pooled cohort equations developed for atherosclerotic cardiovascular disease for the prediction of new-onset stroke across different subgroups (race, sex, and age) and to determine the added value of novel machine learning techniques. Design, Setting, and Participants: Retrospective cohort study on combined and harmonized data from Black and White participants of the Framingham Offspring, Atherosclerosis Risk in Communities (ARIC), Multi-Ethnic Study of Atherosclerosis (MESA), and Reasons for Geographic and Racial Differences in Stroke (REGARDS) studies (1983-2019) conducted in the US. The 62,482 participants included at baseline were at least 45 years of age and free of stroke or transient ischemic attack. Exposures: Published stroke-specific algorithms from Framingham and REGARDS (based on self-reported risk factors) as well as pooled cohort equations for atherosclerotic cardiovascular disease, plus 2 newly developed machine learning algorithms. Main Outcomes and Measures: Models were designed to estimate the 10-year risk of new-onset stroke (ischemic or hemorrhagic). Discrimination concordance index (C index) and calibration ratios of expected vs observed event rates were assessed at 10 years. Analyses were conducted by race, sex, and age groups. Results: The combined study sample included 62,482 participants (median age, 61 years; 54% women; and 29% Black individuals). Discrimination C indexes were not significantly different for the 2 stroke-specific models (Framingham stroke, 0.72; 95% CI, 0.72-0.73; REGARDS self-report, 0.73; 95% CI, 0.72-0.74) vs the pooled cohort equations (0.72; 95% CI, 0.71-0.73): differences 0.01 or less (P values >.05) in the combined sample. Significant differences in discrimination were observed by race: the C indexes were 0.76 for all 3 models in White women vs 0.69 in Black women (all P values <.001), and between 0.71 and 0.72 in White men vs between 0.64 and 0.66 in Black men (all P values ≤.001). When stratified by age, model discrimination was better for younger (<60 years) vs older (≥60 years) adults for both Black and White individuals. The ratios of observed to expected 10-year stroke rates were closest to 1 for the REGARDS self-report model (1.05; 95% CI, 1.00-1.09) and indicated risk overestimation for Framingham stroke (0.86; 95% CI, 0.82-0.89) and the pooled cohort equations (0.74; 95% CI, 0.71-0.77). Performance did not significantly improve when novel machine learning algorithms were applied. Conclusions and Relevance: In this analysis of Black and White individuals without stroke or transient ischemic attack among 4 US cohorts, existing stroke-specific risk prediction models and novel machine learning techniques did not significantly improve discriminative accuracy for new-onset stroke compared with the pooled cohort equations, and the REGARDS self-report model had the best calibration. All algorithms exhibited worse discrimination in Black individuals than in White individuals, indicating the need to expand the pool of risk factors and improve modeling techniques to address observed racial disparities and improve model performance.
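The two metrics reported here, discrimination (C index) and calibration (ratio of observed to expected event counts), can each be computed in a few lines. The sketch below uses lifelines' concordance_index on simulated 10-year risks; all data are synthetic and the near-perfect calibration is by construction.

```python
import numpy as np
from lifelines.utils import concordance_index

rng = np.random.default_rng(9)
n = 2000
risk = rng.uniform(0.01, 0.30, n)            # model-predicted 10-year stroke risk
event = rng.binomial(1, risk).astype(bool)   # outcomes simulated from those risks
time = np.where(event, rng.uniform(0, 10, n), 10.0)  # event time or 10-yr censoring

# Discrimination: the C index compares the risk ordering against outcomes;
# higher risk should correspond to shorter time-to-event, hence the minus sign.
print("C index:", round(concordance_index(time, -risk, event), 3))

# Calibration: observed / expected event counts; ~1 means well calibrated,
# <1 means overestimation (as reported for Framingham stroke and the PCEs).
print("observed/expected:", round(event.sum() / risk.sum(), 2))
```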
Subject(s)
Black People , Healthcare Disparities , Prejudice , Risk Assessment , Stroke , White People , Female , Humans , Male , Middle Aged , Atherosclerosis/epidemiology , Cardiovascular Diseases/epidemiology , Ischemic Attack, Transient/epidemiology , Retrospective Studies , Stroke/diagnosis , Stroke/epidemiology , Stroke/ethnology , Risk Assessment/standards , Reproducibility of Results , Sex Factors , Age Factors , Race Factors/statistics & numerical data , Black People/statistics & numerical data , White People/statistics & numerical data , United States/epidemiology , Machine Learning/standards , Bias , Prejudice/prevention & control , Healthcare Disparities/ethnology , Healthcare Disparities/standards , Healthcare Disparities/statistics & numerical data , Computer Simulation/standards , Computer Simulation/statistics & numerical data
ABSTRACT
OBJECTIVE: To design and establish a prospective biospecimen repository that integrates multi-omics assays with clinical data to study mechanisms of controlled injury and healing. BACKGROUND: Elective surgery is an opportunity to understand both the systemic and focal responses accompanying controlled and well-characterized injury to the human body. The overarching goal of this ongoing project is to define stereotypical responses to surgical injury, with the translational purpose of identifying targetable pathways involved in healing and resilience, and variations indicative of aberrant peri-operative outcomes. METHODS: Clinical data from the electronic medical record combined with large-scale biological data sets derived from blood, urine, fecal matter, and tissue samples are collected prospectively through the peri-operative period from patients undergoing 14 surgeries chosen to represent a range of injury locations and intensities. Specimens are subjected to genomic, transcriptomic, proteomic, and metabolomic assays to describe their genetic, metabolic, immunologic, and microbiome profiles, providing a multidimensional landscape of the human response to injury. RESULTS: The highly multiplexed data generated include changes in over 28,000 mRNA transcripts, 100 plasma metabolites, 200 urine metabolites, and 400 proteins over the longitudinal course of surgery and recovery. In our initial pilot dataset, we demonstrate the feasibility of collecting high-quality multi-omic data at pre- and postoperative time points and already see evidence of physiologic perturbation between time points. CONCLUSIONS: This repository allows for longitudinal, state-of-the-art genomic, transcriptomic, proteomic, metabolomic, immunologic, and clinical data collection and provides a rich and stable infrastructure on which to fuel further biomedical discovery.
Subject(s)
Computational Biology , Proteomics , Genomics , Humans , Metabolomics , Prospective Studies , Proteomics/methods
ABSTRACT
OBJECTIVES: Sepsis causes significant mortality. However, most patients who die of sepsis do not present with severe infection, hampering efforts to deliver early, aggressive therapy. It is also known that the host gene expression response to infection precedes clinical illness. This study seeks to develop transcriptomic models to predict progression to sepsis or shock within 72 hours of hospitalization and to validate previously identified transcriptomic signatures in the prediction of 28-day mortality. DESIGN: Retrospective differential gene expression analysis and predictive modeling using RNA sequencing data. PATIENTS: Two hundred seventy-seven patients enrolled at four large academic medical centers, all with clinically adjudicated infection, were considered for inclusion in this study. MEASUREMENTS AND MAIN RESULTS: Sepsis progression was defined as an increase in Sepsis-3 category within 72 hours. Transcriptomic data were generated using RNAseq of whole blood. Least absolute shrinkage and selection operator (LASSO) modeling was used to identify predictive signatures for various measures of disease progression. Four previously identified gene signatures were tested for their ability to predict 28-day mortality. There were no significantly differentially expressed genes in 136 subjects with worsened Sepsis-3 category compared with 141 nonprogressor controls. There were 1,178 differentially expressed genes identified when sepsis progression was defined as ICU admission or 28-day mortality. A model based on these genes predicted progression with an area under the curve of 0.71. Validation of previously identified gene signatures to predict sepsis mortality revealed area under the receiver operating characteristic curve values of 0.70-0.75 and no significant difference between signatures. CONCLUSIONS: Host gene expression was unable to predict sepsis progression when defined by an increase in Sepsis-3 category, suggesting this definition is not a useful framework for transcriptomic prediction methods. However, there was a differential response when progression was defined as ICU admission or death. Validation of previously described signatures predicted 28-day mortality with insufficient accuracy to offer meaningful clinical utility.
Subject(s)
Sepsis , Humans , Retrospective Studies , ROC Curve , Hospitalization , Gene Expression , Prognosis
ABSTRACT
BACKGROUND AND AIMS: Whether glycemic control, as opposed to diabetes status alone, is associated with the severity of NAFLD remains an open question. We aimed to evaluate whether the degree of glycemic control in the years preceding liver biopsy predicts the histological severity of NASH. APPROACH AND RESULTS: Using the Duke NAFLD Clinical Database, we examined patients with biopsy-proven NAFLD/NASH (n = 713) and the association of liver injury with glycemic control as measured by hemoglobin A1c (HbA1c). The study cohort was predominantly female (59%) and White (84%), with a median (interquartile range) age of 50 (42, 58) years; 49% had diabetes (n = 348). Generalized linear regression models adjusted for age, sex, race, diabetes, body mass index, and hyperlipidemia were used to assess the association between mean HbA1c over the year preceding liver biopsy and the severity of histological features of NAFLD/NASH. Histological features were graded and staged according to the NASH Clinical Research Network system. Group-based trajectory analysis was used to examine patients with at least three HbA1c measures (n = 298) over the 5 years preceding clinically indicated liver biopsy. Higher mean HbA1c was associated with higher grades of steatosis and ballooned hepatocytes, but not lobular inflammation. Every 1% increase in mean HbA1c was associated with 15% higher odds of increased fibrosis stage (OR, 1.15; 95% CI, 1.01, 1.31). Compared with good glycemic control, moderate control was significantly associated with increased severity of ballooned hepatocytes (OR, 1.74; 95% CI, 1.01, 3.01; P = 0.048) and hepatic fibrosis (HF; OR, 4.59; 95% CI, 2.33, 9.06; P < 0.01). CONCLUSIONS: Glycemic control predicts the severity of ballooned hepatocytes and HF in NAFLD/NASH; thus, optimizing glycemic control may be a means of modifying the risk of NASH-related fibrosis progression.
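The reported effect, 15% higher odds of increased fibrosis stage per 1% higher mean HbA1c, corresponds to exp(beta) from a logistic (generalized linear) model. The sketch below simulates data with that true odds ratio and recovers it with statsmodels; the covariates and coefficients are illustrative, not the study's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(12)
n = 700
df = pd.DataFrame({
    "hba1c": rng.normal(6.5, 1.2, n),
    "age": rng.normal(50, 10, n),
    "bmi": rng.normal(33, 6, n),
})
# Simulate a fibrosis outcome with a true OR of ~1.15 per 1% HbA1c.
logit = -4 + np.log(1.15) * df.hba1c + 0.02 * df.age
df["adv_fibrosis"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

fit = smf.logit("adv_fibrosis ~ hba1c + age + bmi", data=df).fit(disp=0)
# exp(beta) is the odds ratio per 1-unit (1%) HbA1c increase; a value near 1.15
# reads as "15% higher odds of more advanced fibrosis per 1% higher HbA1c."
print(np.exp(fit.params["hba1c"]).round(2))
```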
Subject(s)
Blood Glucose/metabolism , Diabetes Mellitus/metabolism , Glycated Hemoglobin/metabolism , Hepatocytes/pathology , Liver Cirrhosis/pathology , Non-alcoholic Fatty Liver Disease/pathology , Adult , Diabetes Mellitus/drug therapy , Female , Glycemic Control , Humans , Hypoglycemic Agents/therapeutic use , Male , Middle Aged , Non-alcoholic Fatty Liver Disease/metabolism , Severity of Illness Index
ABSTRACT
PURPOSE OF REVIEW: Artificial intelligence tools are being rapidly integrated into clinical environments and may soon be incorporated into dementia diagnostic paradigms. A comprehensive review of emerging trends will allow physicians and other healthcare providers to better anticipate and understand these powerful tools. RECENT FINDINGS: Machine learning models that utilize cerebral biomarkers are demonstrably effective for dementia identification and prediction; however, cerebral biomarkers are relatively expensive and not widely available. As eye images harbor several ophthalmic biomarkers that mirror the state of the brain and can be clinically observed with routine imaging, eye-based machine learning models are an emerging area, with efficacy comparable to that of models based on cerebral biomarkers. Emerging machine learning architectures like recurrent, convolutional, and partially pretrained neural networks have proven to be promising frontiers for feature extraction and classification with ocular biomarkers. SUMMARY: Machine learning models that can accurately distinguish those with symptomatic Alzheimer's dementia from those with mild cognitive impairment and normal cognition, as well as predict progressive disease, using relatively inexpensive and accessible ocular imaging inputs are impactful tools for diagnosis and risk stratification across the Alzheimer's dementia continuum. If these machine learning models can be incorporated into clinical care, they may simplify diagnostic efforts. Recent advancements in ocular-based machine learning efforts are promising steps forward.
Subject(s)
Alzheimer Disease , Cognitive Dysfunction , Alzheimer Disease/diagnosis , Artificial Intelligence , Biomarkers , Cognitive Dysfunction/diagnosis , Humans , Machine Learning
ABSTRACT
BACKGROUND: In the early stages of the COVID-19 pandemic, our institution was interested in forecasting how long surgical patients receiving elective procedures would spend in the hospital. Initial examination of our models indicated that, due to the skewed nature of length of stay, accurate prediction was challenging, and we instead opted for a simpler classification model. In this work we perform a deeper examination of predicting in-hospital length of stay. METHODS: We used electronic health record data on length of stay from 42,209 elective surgeries. We compared different loss functions (mean squared error, mean absolute error, mean relative error), algorithms (LASSO, random forests, multilayer perceptron), and data transformations (log and truncation). We also assessed the performance of a two-stage hybrid classification-regression approach. RESULTS: Our results show that while it is possible to accurately predict short lengths of stay, predicting longer lengths of stay is extremely challenging. As such, we opt for a two-stage model that first classifies patients into long versus short lengths of stay, with a second stage that fits a regressor to those predicted to have a short length of stay. DISCUSSION: The results indicate the challenges and considerations involved in applying machine-learning methods to skewed outcomes. CONCLUSIONS: Two-stage models allow those developing clinical decision support tools to explicitly acknowledge where they can and cannot make accurate predictions.
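A minimal sketch of the two-stage hybrid described in METHODS: classify long versus short stays first, then fit the regressor only where prediction is tractable and abstain elsewhere. The threshold, models, and simulated data below are assumptions for illustration, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(10)
n = 5000
X = rng.normal(size=(n, 12))       # features available before admission
los = rng.lognormal(1.0, 0.8, n)   # right-skewed length of stay (days)
threshold = 5.0                    # illustrative "long stay" cut-off

# Stage 1: classify long vs short stays.
long_stay = (los > threshold).astype(int)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, long_stay)

# Stage 2: regress LOS only on cases classified short; the heavy tail of long
# stays is where point prediction fails, so the tool abstains there.
short_mask = clf.predict(X) == 0
reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(
    X[short_mask], los[short_mask])

pred = np.where(short_mask, reg.predict(X), np.nan)  # NaN = "long stay, no estimate"
print(np.nanmean(pred).round(2))
```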
Subject(s)
COVID-19 , Pandemics , COVID-19/epidemiology , Hospitals , Humans , Length of Stay , Machine Learning
ABSTRACT
BACKGROUND: Host gene expression has emerged as a complementary strategy to pathogen detection tests for the discrimination of bacterial and viral infection. The impact of immunocompromise on host-response tests remains unknown. We evaluated a host-response test discriminating bacterial, viral, and noninfectious conditions in immunocompromised subjects. METHODS: An 81-gene signature was measured using real-time polymerase chain reaction in subjects with immunocompromise (chemotherapy, solid-organ transplant, immunomodulatory agents, AIDS) and bacterial infection, viral infection, or noninfectious illness. A regularized logistic regression model trained in immunocompetent subjects was used to estimate the likelihood of each class in immunocompromised subjects. RESULTS: Accuracy in the 136-subject immunocompetent training cohort was 84.6% for bacterial versus nonbacterial discrimination and 80.8% for viral versus nonviral discrimination. Model validation in 134 immunocompromised subjects showed an overall accuracy of 73.9% for bacterial infection (P = .04 relative to immunocompetent subjects) and 75.4% for viral infection (P = .30). A scheme reporting results by quartile improved test utility. The highest probability quartile ruled in bacterial and viral infection with 91.4% and 84.0% specificity, respectively. The lowest probability quartile ruled out infection with 90.1% and 96.4% sensitivity for bacterial and viral infection, respectively. Performance was independent of the type or number of immunocompromising conditions. CONCLUSIONS: A host gene expression test discriminated bacterial, viral, and noninfectious etiologies at a lower overall accuracy in immunocompromised patients compared with immunocompetent patients, although this difference was only significant for bacterial infection classification. With modified interpretive criteria, a host-response strategy may offer clinically useful diagnostic information for patients with immunocompromise.
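The quartile-based interpretive scheme can be sketched directly: take the top quartile of one class's predicted probabilities as a rule-in call and the bottom quartile as a rule-out, then report the corresponding specificity and sensitivity. The simulated scores and prevalence below are assumptions, not the study's data.

```python
import numpy as np

def quartile_report(p, y):
    """Rule-in specificity (top quartile flags positive) and rule-out
    sensitivity (bottom quartile dismisses), mirroring the abstract's
    quartile-based reporting scheme."""
    q1, q3 = np.percentile(p, [25, 75])
    rule_in = p >= q3
    rule_out = p <= q1
    spec_rule_in = np.mean(~rule_in[y == 0])    # non-cases not flagged by rule-in
    sens_rule_out = np.mean(~rule_out[y == 1])  # cases not dismissed by rule-out
    return spec_rule_in, sens_rule_out

rng = np.random.default_rng(11)
y = rng.binomial(1, 0.35, 400)  # 1 = bacterial infection (toy prevalence)
p = np.clip(0.35 + 0.3 * (y - 0.35) + rng.normal(0, 0.15, 400), 0, 1)

spec, sens = quartile_report(p, y)
print(f"rule-in specificity: {spec:.2f}, rule-out sensitivity: {sens:.2f}")
```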