ABSTRACT
Adoption of high-content omic technologies in clinical studies, coupled with computational methods, has yielded an abundance of candidate biomarkers. However, translating such findings into bona fide clinical biomarkers remains challenging. To facilitate this process, we introduce Stabl, a general machine learning method that identifies a sparse, reliable set of biomarkers by integrating noise injection and a data-driven signal-to-noise threshold into multivariable predictive modeling. Evaluation of Stabl on synthetic datasets and five independent clinical studies demonstrates improved biomarker sparsity and reliability compared to commonly used sparsity-promoting regularization methods while maintaining predictive performance; it distills datasets containing 1,400-35,000 features down to 4-34 candidate biomarkers. Stabl extends to multi-omic integration tasks, enabling biological interpretation of complex predictive models, as it homes in on a shortlist of proteomic, metabolomic and cytometric events predicting labor onset, microbial biomarkers of pre-term birth and a pre-operative immune signature of post-surgical infections. Stabl is available at https://github.com/gregbellan/Stabl.
Subjects
Biomarkers, Machine Learning, Biomarkers/metabolism, Humans, Proteomics/methods, Computational Biology/methods, Metabolomics/methods, Reproducibility of Results
ABSTRACT
High-content omic technologies coupled with sparsity-promoting regularization methods (SRM) have transformed the biomarker discovery process. However, the translation of computational results into a clinical use-case scenario remains challenging. A rate-limiting step is the rigorous selection of reliable biomarker candidates among a host of biological features included in multivariate models. We propose Stabl, a machine learning framework that unifies the biomarker discovery process with multivariate predictive modeling of clinical outcomes by selecting a sparse and reliable set of biomarkers. Evaluation of Stabl on synthetic datasets and four independent clinical studies demonstrates improved biomarker sparsity and reliability compared to commonly used SRMs at similar predictive performance. Stabl readily extends to double- and triple-omics integration tasks and identifies a sparser and more reliable set of biomarkers than those selected by state-of-the-art early- and late-fusion SRMs, thereby facilitating the biological interpretation and clinical translation of complex multi-omic predictive models. The complete package for Stabl is available online at https://github.com/gregbellan/Stabl.
ABSTRACT
Oral squamous cell carcinoma (OSCC), a prevalent and aggressive neoplasm, poses a significant challenge due to poor prognosis and limited prognostic biomarkers. Leveraging highly multiplexed imaging mass cytometry, we investigated the tumor immune microenvironment (TIME) in OSCC biopsies, characterizing immune cell distribution and signaling activity at the tumor-invasive front. Our spatial subsetting approach standardized cellular populations by tissue zone, improving feature reproducibility and revealing TIME patterns accompanying loss-of-differentiation. Employing a machine-learning pipeline combining reliable feature selection with multivariable modeling, we achieved accurate histological grade classification (AUC = 0.88). Three model features correlated with clinical outcomes in an independent cohort: granulocyte MAPKAPK2 signaling at the tumor front, stromal CD4+ memory T cell size, and the distance of fibroblasts from the tumor border. This study establishes a robust modeling framework for distilling complex imaging data, uncovering sentinel characteristics of the OSCC TIME to facilitate prognostic biomarker discovery for recurrence risk stratification and immunomodulatory therapy development.
ABSTRACT
Stable epigenetic changes appear uncommon, suggesting that changes typically dissipate or are repaired. Changes that stably alter gene expression across generations presumably require particular conditions that are currently unknown. Here we report that a minimal combination of cis-regulatory sequences can support permanent RNA silencing of a single-copy transgene and its derivatives in C. elegans simply upon mating. Mating disrupts competing RNA-based mechanisms to initiate silencing that can last for >300 generations. This stable silencing requires components of the small RNA pathway and can silence homologous sequences in trans. While animals do not recover from mating-induced silencing, they often recover from and become resistant to trans silencing. Recovery is also observed in most cases when double-stranded RNA is used to silence the same coding sequence in different regulatory contexts that drive germline expression. Therefore, we propose that regulatory features can evolve to oppose permanent and potentially maladaptive responses to transient change.