ABSTRACT
BACKGROUND: Type 1 diabetes (T1D) is a devastating autoimmune disease, and its rising prevalence in the United States and around the world presents a critical public health problem. While some treatment options exist for patients who have already been diagnosed, individuals who are considered at risk of developing T1D and are still in the early, asymptomatic stages of disease pathogenesis have no options for preventive intervention. This is because of the uncertainty in determining their risk level and in predicting with high confidence who will or will not progress to clinical diagnosis. Biomarkers that assess an individual's risk with high certainty could address this problem and inform decisions on early intervention, especially in children, where the burden of justifying treatment is high. Single-omics approaches (e.g., genomics, proteomics, metabolomics) have been applied to identify T1D biomarkers based on specific disturbances associated with the disease. However, reliable early biomarkers of T1D have remained elusive to date. To overcome this, we previously showed that parallel multi-omics provides a more comprehensive picture of the disease-associated disturbances and facilitates the identification of candidate T1D biomarkers. METHODS: This paper evaluated the use of machine learning (ML), combining data augmentation with supervised ML methods, to improve the identification of salient patterns in the data and, ultimately, the extraction of novel biomarker candidates from integrated parallel multi-omics datasets with a limited number of samples. We also examined different stages of data integration (early, intermediate, and late) to assess at which stage supervised parametric models can learn under conditions of high dimensionality and variation in feature counts across different omics.
In the late integration scheme, we employed a multi-view ensemble comprising individual parametric models, each trained on a single omics dataset, to address the computational challenges posed by the high dimensionality and variation in feature counts across the different yet integrated multi-omics datasets. RESULTS: The multi-view ensemble improves the prediction of case vs. control and is the most successful at flagging a larger, consistent set of associated features when compared with chance models; these features may eventually be used downstream to identify a novel composite biomarker signature of T1D risk. CONCLUSIONS: The current work demonstrates the utility of supervised ML in exploring integrated parallel multi-omics data in the ongoing quest for early T1D biomarkers, reinforcing the hope of identifying novel composite biomarker signatures of T1D risk via ML and ultimately informing early treatment decisions in the face of the escalating global incidence of this debilitating disease.
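The late integration idea described above (one model per omics view, combined only at the prediction stage) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not name the parametric models used, so a nearest-centroid classifier stands in for them, the view names and feature counts are hypothetical, and the data are synthetic.

```python
import math
import random


def fit_centroids(X, y):
    """Fit one view: return {label: centroid} over that view's features."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def case_probability(centroids, x):
    """Soft score in [0, 1]; closer to the case centroid (label 1) gives a higher score."""
    d0 = math.dist(x, centroids[0])
    d1 = math.dist(x, centroids[1])
    if d0 + d1 == 0:
        return 0.5
    return d0 / (d0 + d1)


def fit_late_integration(views, y):
    """Train one independent model per omics view (no feature concatenation)."""
    return {name: fit_centroids(X, y) for name, X in views.items()}


def predict_late_integration(models, sample_views):
    """Average the per-view scores (soft voting) and threshold at 0.5."""
    probs = [case_probability(models[name], x) for name, x in sample_views.items()]
    mean_prob = sum(probs) / len(probs)
    return (1 if mean_prob >= 0.5 else 0), mean_prob


def make_view(rng, y, n_features, shift):
    """Synthetic view: cases (label 1) are shifted by `shift` in every feature."""
    return [[rng.gauss(shift * t, 1.0) for _ in range(n_features)] for t in y]


rng = random.Random(0)
y = [0] * 10 + [1] * 10  # 0 = control, 1 = case (synthetic labels)
views = {
    # Hypothetical views with deliberately different feature counts.
    "proteomics": make_view(rng, y, 50, 1.0),
    "metabolomics": make_view(rng, y, 8, 1.0),
    "genomics": make_view(rng, y, 20, 1.0),
}
models = fit_late_integration(views, y)
first_sample = {name: X[0] for name, X in views.items()}
label, prob = predict_late_integration(models, first_sample)
print(label, round(prob, 3))
```

Because each view is modeled separately, a view with many features cannot dominate one with few; each contributes a single score to the vote. This is only one plausible reading of a multi-view ensemble, not the authors' implementation.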
ABSTRACT
Traumatic brain injury (TBI) is a leading cause of death and disability. Yet, despite immense research efforts, treatment options remain elusive. Translational failures in TBI are often attributed to the heterogeneity of the TBI population and limited methods to capture this individual variability. Advances in machine learning (ML) have the potential to further personalized treatment strategies and better inform translational research. However, the use of ML has yet to be widely assessed in pre-clinical neurotrauma research, where data are strictly limited in subject number. To better establish ML's feasibility, we utilized the fluid percussion injury (FPI) portion of the rich rat data set collected by Operation Brain Trauma Therapy (OBTT), which tested multiple pharmacological treatments. Previous work has provided confidence that both unsupervised and supervised ML techniques can uncover useful insights from this OBTT pre-clinical research data set. As a proof of concept, we aimed to better evaluate the multivariate recovery profiles afforded by the administration of nine different experimental therapies. We assessed supervised pairwise classifiers trained on a pre-processed data set that incorporated metrics from four feature groups to determine their ability to correctly identify specific drug treatments. In all but one of the possible pairwise combinations of minocycline, levetiracetam, erythropoietin, nicotinamide, and amantadine, the baseline was outperformed by one or more supervised classifiers, the exception being nicotinamide versus amantadine. Further, when the same methods were employed to assess different doses of the same treatment, the ML classifiers had greater difficulty in identifying which treatment each sample received.
Our data serve as a critical first step toward identifying optimal treatments for specific subgroups of samples that are dependent on factors such as types and severity of traumatic injuries, as well as informing the prediction of therapeutic combinations that may lead to greater treatment effects than individual therapies.
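The pairwise evaluation described above can be sketched as follows: five treatments give ten possible pairs, and each pairwise classifier is compared against a baseline. This is a minimal sketch under stated assumptions. The abstract does not specify the baseline or the classifiers, so a majority-class baseline and a leave-one-out nearest-centroid classifier are used as stand-ins, and the group size, feature count, and data are synthetic.

```python
import math
import random
from collections import Counter
from itertools import combinations

TREATMENTS = ["minocycline", "levetiracetam", "erythropoietin",
              "nicotinamide", "amantadine"]


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def loo_nearest_centroid_accuracy(X, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        train = [(x, t) for j, (x, t) in enumerate(zip(X, labels)) if j != i]
        centroids = {}
        for label in set(t for _, t in train):
            rows = [x for x, t in train if t == label]
            centroids[label] = [sum(c) / len(rows) for c in zip(*rows)]
        pred = min(centroids, key=lambda lab: math.dist(X[i], centroids[lab]))
        correct += pred == labels[i]
    return correct / len(X)


# Synthetic stand-in data: 8 hypothetical animals per treatment, 4 features each.
rng = random.Random(0)
data = {
    name: [[rng.gauss(idx, 1.0) for _ in range(4)] for _ in range(8)]
    for idx, name in enumerate(TREATMENTS)
}

results = {}
for a, b in combinations(TREATMENTS, 2):
    X = data[a] + data[b]
    labels = [a] * 8 + [b] * 8
    results[(a, b)] = (majority_baseline_accuracy(labels),
                       loo_nearest_centroid_accuracy(X, labels))

for pair, (baseline, accuracy) in results.items():
    print(pair, f"baseline={baseline:.2f}", f"classifier={accuracy:.2f}")
```

With balanced groups the majority-class baseline is 0.5, so a pair counts as separable in this sketch only when the classifier's held-out accuracy exceeds that value.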
Subjects
Traumatic Brain Injuries , Machine Learning , Recovery of Function , Translational Biomedical Research/methods , Animals , Datasets as Topic , Animal Disease Models , Rats
ABSTRACT
Traumatic brain injury (TBI) is a leading cause of death and disability, yet treatment strategies remain elusive. Advances in machine learning present exciting opportunities for developing personalized medicine and informing laboratory research. However, their feasibility has yet to be widely assessed in animal research, where data are typically limited, or in the TBI field, where each patient presents with a unique injury. The Operation Brain Trauma Therapy (OBTT) has amassed an animal dataset that spans multiple types of injury, treatment strategies, behavioral assessments, histological measures, and biomarker screenings. This paper aims to analyze these data using supervised learning techniques for the first time by partitioning the dataset into acute input metrics (i.e., 7 days post-injury) and a defined recovery outcome (i.e., memory retention). Preprocessing is then applied to transform the raw OBTT dataset, e.g., developing a class attribute by histogram binning, eliminating borderline cases, and applying principal component analysis (PCA). We find that these steps are also useful in establishing a treatment ranking; minocycline, a therapy with no significant findings in the OBTT analyses, yields the highest percentage recovery in our ranking. Furthermore, of the seven classifiers we have evaluated, Naïve Bayes achieves the best performance (67%) and yields significant improvement over our baseline model on the preprocessed dataset with borderline elimination. We also investigate the effect of testing on individual treatment groups to evaluate which groups are difficult to classify, and note the interpretive qualities of our model that can be clinically relevant.
Clinical Relevance: These studies establish methods for better analyzing multivariate functional recovery and understanding which measures affect prognosis following traumatic brain injury.
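Two of the preprocessing steps named above, deriving a class attribute by histogram binning and eliminating borderline cases, can be sketched as follows. This is a minimal sketch and not the authors' procedure: the equal-width bins, the margin rule, the function names, and the example scores are all assumptions, and the PCA step is omitted to keep the example dependency-free.

```python
def equal_width_edges(values, n_bins):
    """Interior edges of an equal-width histogram over the value range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def bin_with_borderline(values, n_bins=2, margin=0.05):
    """Assign each value a bin index as its class label.

    Values within `margin` (as a fraction of the full range) of any interior
    bin edge are treated as borderline and labelled None so they can be dropped.
    """
    edges = equal_width_edges(values, n_bins)
    span = max(values) - min(values)
    labels = []
    for v in values:
        if any(abs(v - e) <= margin * span for e in edges):
            labels.append(None)  # borderline case
        else:
            labels.append(sum(v > e for e in edges))  # number of edges below v
    return labels


# Hypothetical outcome scores (not OBTT data).
outcome_scores = [0.0, 10.0, 48.0, 52.0, 90.0, 100.0]
labels = bin_with_borderline(outcome_scores, n_bins=2, margin=0.05)

# Keep only the samples whose class is unambiguous.
kept = [(s, c) for s, c in zip(outcome_scores, labels) if c is not None]
print(labels)  # → [0, 0, None, None, 1, 1]
print(kept)
```

Dropping samples near a bin edge trades sample count for cleaner class separation, which matters when the dataset is already small; the margin would need to be chosen with that trade-off in mind.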