Search | BVS CLAP/SMR-OPAS/OMS

Fast, accurate, and racially unbiased pan-cancer tumor-only variant calling with tabular machine learning.

McLaughlin, R Tyler; Asthana, Maansi; Di Meo, Marc; Ceccarelli, Michele; Jacob, Howard J; Masica, David L.

NPJ Precis Oncol ; 7(1): 4, 2023 Jan 07.

Article in English | MEDLINE | ID: mdl-36611079

ABSTRACT

Accurately identifying somatic mutations is essential for precision oncology and crucial for calculating tumor-mutational burden (TMB), an important predictor of response to immunotherapy. For tumor-only variant calling (i.e., when the cancer biopsy but not the patient's normal tissue sample is sequenced), accurately distinguishing somatic mutations from germline variants is a challenging problem that, when unaddressed, results in unreliable, biased, and inflated TMB estimates. Here, we apply machine learning to the task of somatic vs germline classification in tumor-only solid tumor samples using TabNet, XGBoost, and LightGBM, three machine-learning models for tabular data. We constructed a training set for supervised classification using features derived exclusively from tumor-only variant calling and drawing somatic and germline truth labels from an independent pipeline using the patient-matched normal samples. All three trained models achieved state-of-the-art performance on two holdout test datasets: a TCGA dataset including sarcoma, breast adenocarcinoma, and endometrial carcinoma samples (AUC > 94%), and a metastatic melanoma dataset (AUC > 85%). Concordance between matched-normal and tumor-only TMB improves from R^2 = 0.006 to 0.71-0.76 with the addition of a machine-learning classifier, with LightGBM performing best. Notably, these machine-learning models generalize across cancer subtypes and capture kits with a call rate of 100%. We reproduce the recent finding that tumor-only TMB estimates for Black patients are extremely inflated relative to those for white patients due to the racial biases of germline databases. We show that our approach with XGBoost and LightGBM eliminates this significant racial bias in tumor-only variant calling.
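
As a rough illustration of the concordance figure quoted above, the sketch below computes TMB as somatic mutations per megabase and an R^2 between two sets of TMB estimates. It is not the authors' pipeline. The abstract does not say how R^2 was defined, so the squared Pearson correlation is assumed here; the function names (`tmb`, `r_squared`, `filtered_tmb`), the per-variant somatic score, and the threshold are all hypothetical.

```python
def tmb(somatic_count, covered_megabases):
    """Tumor-mutational burden as somatic mutations per megabase (assumed definition)."""
    if covered_megabases <= 0:
        raise ValueError("covered_megabases must be positive")
    return somatic_count / covered_megabases


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: the abstract's R^2 is treated as a squared correlation.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either sequence is constant")
    return (sxy * sxy) / (sxx * syy)


def filtered_tmb(somatic_scores, threshold, covered_megabases):
    """TMB counting only variants whose classifier score reaches the threshold.

    somatic_scores: hypothetical per-variant probabilities of being somatic,
    as a trained classifier might emit for one tumor-only sample.
    """
    kept = sum(1 for score in somatic_scores if score >= threshold)
    return tmb(kept, covered_megabases)
```

In use, one would compute `filtered_tmb` for each tumor-only sample, compute `tmb` from the matched-normal somatic calls for the same samples, and pass the two lists to `r_squared`.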

Iterative community-driven development of a SARS-CoV-2 tissue simulator.

Getz, Michael; Wang, Yafei; An, Gary; Asthana, Maansi; Becker, Andrew; Cockrell, Chase; Collier, Nicholson; Craig, Morgan; Davis, Courtney L; Faeder, James R; Ford Versypt, Ashlee N; Mapder, Tarunendu; Gianlupi, Juliano F; Glazier, James A; Hamis, Sara; Heiland, Randy; Hillen, Thomas; Hou, Dennis; Islam, Mohammad Aminul; Jenner, Adrianne L; Kurtoglu, Furkan; Larkin, Caroline I; Liu, Bing; Macfarlane, Fiona; Maygrundter, Pablo; Morel, Penelope A; Narayanan, Aarthi; Ozik, Jonathan; Pienaar, Elsje; Rangamani, Padmini; Saglam, Ali Sinan; Shoemaker, Jason Edward; Smith, Amber M; Weaver, Jordan J A; Macklin, Paul.

bioRxiv ; 2021 Nov 10.

Article in English | MEDLINE | ID: mdl-32511322

ABSTRACT

The 2019 novel coronavirus, SARS-CoV-2, is a pathogen of critical significance to international public health. Knowledge of the interplay between molecular-scale virus-receptor interactions, single-cell viral replication, intracellular-scale viral transport, and emergent tissue-scale viral propagation is limited. Moreover, little is known about immune system-virus-tissue interactions and how these can result in low-level (asymptomatic) infections in some cases and acute respiratory distress syndrome (ARDS) in others, particularly with respect to presentation in different age groups or pre-existing inflammatory risk factors. Given the nonlinear interactions within and among each of these processes, multiscale simulation models can shed light on the emergent dynamics that lead to divergent outcomes, identify actionable "choke points" for pharmacologic interventions, screen potential therapies, and identify potential biomarkers that differentiate patient outcomes. Given the complexity of the problem and the acute need for an actionable model to guide therapy discovery and optimization, we introduce and iteratively refine a prototype of a multiscale model of SARS-CoV-2 dynamics in lung tissue. The first prototype model was built and shared internationally as open source code and an online interactive model in under 12 hours, and community domain expertise is driving regular refinements. In a sustained community effort, this consortium is integrating data and expertise across virology, immunology, mathematical biology, quantitative systems physiology, cloud and high performance computing, and other domains to accelerate our response to this critical threat to international health. More broadly, this effort is creating a reusable, modular framework for studying viral replication and immune response in tissues, which can also potentially be adapted to related problems in immunology and immunotherapy.
