ABSTRACT
Single-cell RNA and ATAC sequencing technologies enable the examination of gene expression and chromatin accessibility in individual cells, providing insights into cellular phenotypes. In cancer research, it is important to consistently analyze these states within an evolutionary context on genetic clones. Here we present CONGAS+, a Bayesian model to map single-cell RNA and ATAC profiles onto the latent space of copy number clones. CONGAS+ clusters cells into tumour subclones with similar ploidy, making it straightforward to compare their expression and chromatin profiles. The framework, implemented on GPU and tested on real and simulated data, scales seamlessly to thousands of cells, performs better than single-molecule models, and supports new multi-omics assays. In prostate cancer, lymphoma and basal cell carcinoma, CONGAS+ successfully identifies complex subclonal architectures while providing a coherent mapping between ATAC and RNA, facilitating the study of genotype-phenotype maps and their connection to genomic instability.
Subjects
DNA Copy Number Variations, RNA, RNA/genetics, Bayes Theorem, DNA Copy Number Variations/genetics, Clone Cells, High-Throughput Nucleotide Sequencing/methods, Chromatin
ABSTRACT
BACKGROUND: Longitudinal single-cell sequencing experiments of patient-derived models are increasingly employed to investigate cancer evolution. In this context, robust computational methods are needed to properly exploit the mutational profiles of single cells generated via variant calling, in order to reconstruct the evolutionary history of a tumor and characterize the impact of therapeutic strategies, such as the administration of drugs. To this end, we have recently developed the LACE framework for the Longitudinal Analysis of Cancer Evolution. RESULTS: The LACE 2.0 release, aimed at inferring longitudinal clonal trees, enhances the original framework with new key functionalities: improved data management for the preprocessing of standard variant calling data, a reworked inference engine, and a direct connection to public databases. CONCLUSIONS: All of this is accessible through a new, interactive Shiny R graphical interface that makes it possible to apply filters helpful in discriminating relevant or potential driver mutations, set up inferential parameters, and visualize the results. The software is available at: github.com/BIMIB-DISCo/LACE.
Subjects
Neoplasms, Software, Humans, Neoplasms/genetics, Clone Cells
ABSTRACT
MOTIVATION: Advances in single-cell sequencing methods have paved the way for the characterization of cellular states at unprecedented resolution, revolutionizing the investigation of complex biological systems. Yet, single-cell sequencing experiments are hindered by several technical issues, which cause output data to be noisy, impacting the reliability of downstream analyses. Therefore, a growing number of data science methods have been proposed to recover lost or corrupted information from single-cell sequencing data. To date, however, no quantitative benchmarks have been proposed to evaluate such methods. RESULTS: We present a comprehensive analysis of the state-of-the-art computational approaches for denoising and imputation of single-cell transcriptomic data, comparing their performance in different experimental scenarios. In detail, we compared 19 denoising and imputation methods, on both simulated and real-world datasets, with respect to several performance metrics related to imputation of dropout events, recovery of true expression profiles, characterization of cell similarity, identification of differentially expressed genes and computation time. The effectiveness and scalability of all methods were assessed with regard to distinct sequencing protocols, sample size and different levels of biological variability and technical noise. As a result, we identify a subset of versatile approaches exhibiting solid performance on most tests and show that certain algorithmic families prove effective on specific tasks but inefficient on others. Finally, most methods appear to benefit from the introduction of appropriate assumptions on noise distribution of biological processes.
Subjects
Gene Expression Profiling, RNA-Seq, Single-Cell Analysis, Software, Animals, Humans
ABSTRACT
MOTIVATION: Driver (epi)genomic alterations underlie the positive selection of cancer subpopulations, which promotes drug resistance and relapse. Even though substantial heterogeneity is witnessed in most cancer types, mutation accumulation patterns can be regularly found and can be exploited to reconstruct predictive models of cancer evolution. Yet, available methods cannot infer logical formulas connecting events to represent alternative evolutionary routes or convergent evolution. RESULTS: We introduce PMCE, an expressive framework that leverages mutational profiles from cross-sectional sequencing data to infer probabilistic graphical models of cancer evolution including arbitrary logical formulas, and which outperforms the state-of-the-art in terms of accuracy and robustness to noise, on simulations. The application of PMCE to 7866 samples from the TCGA database allows us to identify a highly significant correlation between the predicted evolutionary paths and the overall survival in 7 tumor types, proving that our approach can effectively stratify cancer patients into reliable risk groups. AVAILABILITY AND IMPLEMENTATION: PMCE is freely available at https://github.com/BIMIB-DISCo/PMCE, in addition to the code to replicate all the analyses presented in the manuscript. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Algorithms, Neoplasms, Humans, Prognosis, Cross-Sectional Studies, Neoplasms/genetics, Genomics
ABSTRACT
BACKGROUND: The combined effects of biological variability and measurement-related errors on cancer sequencing data remain largely unexplored. However, the spatio-temporal simulation of multi-cellular systems provides a powerful instrument to address this issue. In particular, efficient algorithmic frameworks are needed to overcome the harsh trade-off between scalability and expressivity, so as to allow one to simulate both realistic cancer evolution scenarios and the related sequencing experiments, which can then be used to benchmark downstream bioinformatics methods. RESULT: We introduce a Julia package for SPAtial Cancer Evolution (J-SPACE), which allows one to model and simulate a broad set of experimental scenarios, phenomenological rules and sequencing settings. Specifically, J-SPACE simulates the spatial dynamics of cells as a continuous-time multi-type birth-death stochastic process on an arbitrary graph, employing different rules of interaction and an optimised Gillespie algorithm. The evolutionary dynamics of genomic alterations (single-nucleotide variants and indels) are simulated either under the Infinite Sites Assumption or under several different substitution models, including one based on mutational signatures. After mimicking the spatial sampling of tumour cells, J-SPACE returns the related phylogenetic model, and allows one to generate synthetic reads from several Next-Generation Sequencing (NGS) platforms, via the ART read simulator. The results are finally returned in standard FASTA, FASTQ, SAM, ALN and Newick file formats. CONCLUSION: J-SPACE is designed to efficiently simulate the heterogeneous behaviour of a large number of cancer cells and produces a rich set of outputs. Our framework is useful to investigate the emergent spatial dynamics of cancer subpopulations, as well as to assess the impact of incomplete sampling and of experiment-specific errors. Importantly, the output of J-SPACE is designed to allow the performance assessment of downstream bioinformatics pipelines processing NGS data. J-SPACE is freely available at: https://github.com/BIMIB-DISCo/J-Space.jl
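As an illustration of the kind of process the J-SPACE abstract describes, the following is a minimal sketch of a direct-method Gillespie simulation of a birth-death process on a graph. It is not J-SPACE's code or API (J-SPACE is a Julia package): the function name, the single cell type, the constant rates and the contact-inhibition rule (a daughter cell is placed on a random empty neighbouring node, and the division is skipped if none is free) are all assumptions made for the example.

```python
import random


def gillespie_birth_death(adjacency, occupied, birth_rate, death_rate, t_max, seed=0):
    """Direct-method Gillespie simulation of a single-type birth-death
    process on a graph (illustrative sketch, not the J-SPACE implementation).

    adjacency : dict mapping node -> list of neighbouring nodes
    occupied  : iterable of nodes that initially hold a cell
    Returns a list of (time, number_of_cells) pairs.
    """
    rng = random.Random(seed)
    cells = set(occupied)
    t = 0.0
    history = [(t, len(cells))]
    while cells:
        # Every live cell can attempt a division or die.
        total_rate = len(cells) * (birth_rate + death_rate)
        if total_rate <= 0:
            break
        # The waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # All cells share the same rates, so the acting cell is uniform.
        cell = rng.choice(sorted(cells))
        if rng.random() < birth_rate / (birth_rate + death_rate):
            # Assumed rule: the daughter occupies a random empty neighbour;
            # with no free neighbour the attempted division has no effect.
            free = [n for n in adjacency[cell] if n not in cells]
            if free:
                cells.add(rng.choice(free))
        else:
            cells.remove(cell)
        history.append((t, len(cells)))
    return history


# Hypothetical example: a path graph of 5 nodes with one founder cell in the middle.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
trajectory = gillespie_birth_death(path, [2], birth_rate=1.0, death_rate=0.1, t_max=10.0)
```

A multi-type version would keep one rate pair per cell type and pick the acting cell with probability proportional to its own total rate.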
Subjects
Neoplasms, Software, Computer Simulation, High-Throughput Nucleotide Sequencing/methods, Humans, Neoplasms/genetics, Neoplasms/pathology, Phylogeny
ABSTRACT
Checkpoint inhibitors (CPIs) are routinely employed in relapsed/refractory classical Hodgkin lymphoma. Nonetheless, persistent long-term responses are uncommon, and one-third of patients are refractory. Several reports have suggested that treatment with CPIs may re-sensitize patients to chemotherapy; however, there is no consensus on the optimal chemotherapy regimen and subsequent consolidation strategy. In this retrospective study we analysed the response to rechallenge with chemotherapy after CPI failure. Furthermore, we exploratively characterized the clonal evolution profile of a small sample of patients (n = 5) by employing the CALDER approach. Among the 28 patients included in the study, 17 (71%) were primary refractory and 26 (92%) were refractory to the last chemotherapy prior to CPIs. Following rechallenge with chemotherapy, complete remission was recorded in 23 (82%) patients and partial remission in 3 (11%) patients. The tumour evolution of the patients inferred by CALDER seemingly occurred prior to the first cycle of therapy and was characterized by either linear or branching evolution patterns. Twenty-five patients proceeded to allogeneic stem cell transplantation. At a median follow-up of 21 months, median PFS and OS were not reached. In conclusion, patients who fail CPIs can be effectively rescued by salvage chemotherapy and bridged to allo-SCT/auto-SCT.
Subjects
Hematopoietic Stem Cell Transplantation, Hodgkin Disease, Antineoplastic Combined Chemotherapy Protocols/therapeutic use, Clonal Evolution, Hodgkin Disease/drug therapy, Humans, Immune Checkpoint Inhibitors, Local Neoplasm Recurrence/drug therapy, Retrospective Studies, Salvage Therapy, Treatment Outcome
ABSTRACT
BACKGROUND: The increasing availability of omics data collected from patients affected by severe pathologies, such as cancer, is fostering the development of data science methods for their analysis. INTRODUCTION: The combination of data integration and machine learning approaches can provide new powerful instruments to tackle the complexity of cancer development and deliver effective diagnostic and prognostic strategies. METHODS: We explore the possibility of exploiting the topological properties of sample-specific metabolic networks as features in a supervised classification task. Such networks are obtained by projecting transcriptomic data from RNA-seq experiments onto genome-wide metabolic models to define weighted networks modeling the overall metabolic activity of a given sample. RESULTS: We show the classification results on a labeled breast cancer dataset from the TCGA database, including 210 samples (cancer vs. normal). In particular, we investigate how the performance is affected by a threshold-based pruning of the networks by comparing Artificial Neural Networks, Support Vector Machines and Random Forests. Interestingly, the best classification performance is achieved within a small threshold range for all methods, suggesting that it might represent an effective choice to recover useful information while filtering out noise from data. Overall, the best accuracy is achieved with SVMs, which perform similarly to classifiers that use gene expression profiles as features. CONCLUSION: These findings demonstrate that the topological properties of sample-specific metabolic networks are effective in classifying cancer and normal samples, suggesting that useful information can be extracted from a relatively limited number of features.
ABSTRACT
Metabolic reprogramming is a general feature of cancer cells. Regrettably, the comprehensive quantification of metabolites in biological specimens does not promptly translate into knowledge on the utilization of metabolic pathways. By estimating fluxes across metabolic pathways, computational models hold the promise to bridge this gap between data and biological functionality. However, these models currently portray the average behavior of cell populations, masking the inherent heterogeneity that is part and parcel of tumorigenesis as much as drug resistance. To remove this limitation, we propose single-cell Flux Balance Analysis (scFBA) as a computational framework to translate single-cell transcriptomes into single-cell fluxomes. We show that the integration of single-cell RNA-seq profiles of cells derived from lung adenocarcinoma and breast cancer patients into a multi-scale stoichiometric model of a cancer cell population: 1) significantly reduces the space of feasible single-cell fluxomes; 2) allows the identification of clusters of cells with different growth rates within the population; 3) points out the possible metabolic interactions among cells via exchange of metabolites. The scFBA suite of MATLAB functions is available at https://github.com/BIMIB-DISCo/scFBA, as well as the case study datasets.
Subjects
Computational Biology/methods, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Adenocarcinoma of Lung/genetics, Algorithms, Breast Neoplasms/genetics, Computer Simulation, Female, Gene Expression Profiling/methods, Population Genetics/methods, Humans, Male, Metabolic Networks and Pathways, Neoplasms/genetics, Neoplasms/metabolism, RNA/genetics, Software, Transcriptome/genetics
ABSTRACT
BACKGROUND: A large number of algorithms are being developed to reconstruct evolutionary models of individual tumours from genome sequencing data. Most methods can analyze multiple samples collected either through bulk multi-region sequencing experiments or the sequencing of individual cancer cells. However, the same method can rarely support both data types. RESULTS: We introduce TRaIT, a computational framework to infer mutational graphs that model the accumulation of multiple types of somatic alterations driving tumour evolution. Compared to other tools, TRaIT supports multi-region and single-cell sequencing data within the same statistical framework, and delivers expressive models that capture many complex evolutionary phenomena. TRaIT improves accuracy, robustness to data-specific errors and computational complexity compared to competing methods. CONCLUSIONS: We show that the application of TRaIT to single-cell and multi-region cancer datasets can produce accurate and reliable models of single-tumour evolution, quantify the extent of intra-tumour heterogeneity and generate new testable experimental hypotheses.
Subjects
Algorithms, Neoplasms/pathology, Computational Biology/methods, Molecular Evolution, Humans, Mutation, Neoplasms/classification, Neoplasms/genetics, DNA Sequence Analysis, Single-Cell Analysis
ABSTRACT
The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing data and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the "selective advantage" relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular, and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications because it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations, and progression model inference. We demonstrate PiCnIc's ability to reproduce much of the current knowledge on colorectal cancer progression as well as to suggest novel experimentally verifiable hypotheses.
Subjects
Biological Evolution, Colorectal Neoplasms/genetics, Genetic Models, Algorithms, Humans, Machine Learning, Microsatellite Repeats
ABSTRACT
Effective stratification of cancer patients on the basis of their molecular make-up is a key open challenge. Given the altered and heterogeneous nature of cancer metabolism, we here propose to use the overall expression of central carbon metabolism as a biomarker to characterize groups of patients with important characteristics, such as response to ad-hoc therapeutic strategies and survival expectancy. To this end, we here introduce the data integration framework named Metabolic Reaction Enrichment Analysis (MaREA), which strives to characterize the metabolic deregulations that distinguish cancer phenotypes, by projecting RNA-seq data onto metabolic networks, without requiring metabolic measurements. MaREA computes a score for each network reaction, based on the expression of the set of genes encoding the associated enzyme(s). The scores are first used as features for cluster analysis and then to rank and visualize in an organized fashion the metabolic deregulations that distinguish cancer sub-types. We applied our method to recent lung and breast cancer RNA-seq datasets from The Cancer Genome Atlas and we were able to identify subgroups of patients with significant differences in survival expectancy. We show how the prognostic power of MaREA improves when an extracted and further curated core model focusing on central carbon metabolism is used rather than the genome-wide reference network. The visualization of the metabolic differences between the groups with best and worst prognosis allowed us to identify and analyze key metabolic properties related to cancer aggressiveness. Some of these properties are shared across different cancer (sub)types, e.g., the up-regulation of nucleic acid and amino acid synthesis, whereas others appear to be tumor-specific, such as the up- or down-regulation of the phosphoenolpyruvate carboxykinase reaction, which displays different patterns in distinct tumor (sub)types. These results might soon be employed to deliver highly automated diagnostic and prognostic strategies for cancer patients.
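The MaREA abstract states that each reaction receives a score derived from the expression of the genes encoding its enzyme(s), but it does not give the formula. The sketch below shows one common way such a score can be computed from a gene-protein-reaction rule. The function name, the nested-tuple rule encoding, the gene names, the expression values and the convention itself ("and" between subunits taken as the minimum, "or" between isoenzymes taken as the sum) are assumptions for illustration, not details taken from the abstract.

```python
def reaction_score(rule, expression):
    """Evaluate a gene-protein-reaction rule on one sample's expression values.

    rule is either a gene name (str) or a tuple ("and" | "or", sub_rule, ...).
    Assumed convention: "and" (all subunits required) -> minimum,
    "or" (alternative isoenzymes) -> sum.
    """
    if isinstance(rule, str):
        return expression[rule]
    op, *parts = rule
    values = [reaction_score(part, expression) for part in parts]
    if op == "and":
        return min(values)
    if op == "or":
        return sum(values)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical rule and expression values, for illustration only:
# the reaction is catalysed either by G1 alone or by the complex G2 + G3.
rule = ("or", "G1", ("and", "G2", "G3"))
sample = {"G1": 5.0, "G2": 2.0, "G3": 7.0}
score = reaction_score(rule, sample)  # 5.0 + min(2.0, 7.0) = 7.0
```

Computing this score for every reaction and every sample yields the reaction-by-sample matrix that could then feed a clustering step like the one the abstract mentions.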
Subjects
Tumor Biomarkers/metabolism, Neoplasms/genetics, Neoplasms/metabolism, RNA Sequence Analysis/methods, Transcriptome, Adenocarcinoma/diagnosis, Adenocarcinoma/metabolism, Algorithms, Biopsy, Breast Neoplasms/diagnosis, Breast Neoplasms/metabolism, Carbon/metabolism, Cluster Analysis, Gene Expression Profiling, Humans, Lung Neoplasms/diagnosis, Lung Neoplasms/metabolism, Metabolic Networks and Pathways, Automated Pattern Recognition, Prognosis
ABSTRACT
MOTIVATION: We introduce TRanslational ONCOlogy (TRONCO), an open-source R package that implements the state-of-the-art algorithms for the inference of cancer progression models from (epi)genomic mutational profiles. TRONCO can be used to extract population-level models describing the trends of accumulation of alterations in a cohort of cross-sectional samples, e.g. retrieved from publicly available databases, and individual-level models that reveal the clonal evolutionary history in single cancer patients, when multiple samples, e.g. multiple biopsies or single-cell sequencing data, are available. The resulting models can provide key hints for uncovering the evolutionary trajectories of cancer, especially for precision medicine or personalized therapy. AVAILABILITY AND IMPLEMENTATION: TRONCO is released under the GPL license, is hosted at http://bimib.disco.unimib.it/ (Software section) and archived also at bioconductor.org. CONTACT: tronco@disco.unimib.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Theoretical Models, Neoplasms/genetics, Software, Algorithms, Disease Progression, Genetic Epigenesis, Genomics, Humans, User-Computer Interface
ABSTRACT
Gene Regulatory Networks (GRNs) control many biological systems, but how such network coordination is shaped is still unknown. GRNs can be subdivided into basic connections that describe how the network members interact, e.g., co-expression, physical interaction, co-localization, genetic influence, pathways, and shared protein domains. The important regulatory mechanisms of these networks involve miRNAs. We developed an R/Bioconductor package, namely SpidermiR, which offers the end user easy access to both GRNs and miRNAs, and integrates this information with differentially expressed genes obtained from The Cancer Genome Atlas. Specifically, SpidermiR allows users to: (i) query and download GRNs and miRNAs from validated and predicted repositories; (ii) integrate miRNAs with GRNs in order to obtain miRNA-gene-gene and miRNA-protein-protein interactions, and to analyze miRNA GRNs in order to identify miRNA-gene communities; and (iii) graphically visualize the results of the analyses. These analyses can be performed through a single interface and without the need for any downloads. The full data sets are then rapidly integrated and processed locally.
Subjects
MicroRNAs/metabolism, Software, Statistics as Topic, Breast Neoplasms/genetics, Female, Humans, Male, Neoplasm Proteins/metabolism, Prostatic Neoplasms/genetics, Protein Binding
ABSTRACT
BACKGROUND: Dynamical models of gene regulatory networks (GRNs) are highly effective in describing complex biological phenomena and processes, such as cell differentiation and cancer development. Yet, the topological and functional characterization of real GRNs is often still partial and an exhaustive picture of their functioning is missing. RESULTS: We here introduce CABERNET, a Cytoscape app for the generation, simulation and analysis of Boolean models of GRNs, specifically focused on their augmentation when only a partial topological and functional characterization of the network is available. By generating large ensembles of networks in which user-defined entities and relations are added to the original core, CABERNET allows one to formulate hypotheses on the missing portions of real networks, as well as to investigate their generic properties, in the spirit of complexity science. CONCLUSIONS: CABERNET offers a series of innovative simulation and modeling functions and tools, including (but not limited to) the dynamical characterization of the gene activation patterns ruling cell types and differentiation fates, and sophisticated robustness assessments, as in the case of gene knockouts. The integration within the widely used Cytoscape framework for the visualization and analysis of biological networks makes CABERNET a new essential instrument for both the bioinformatician and the computational biologist, as well as a computational support for the experimentalist. An example application concerning the analysis of an augmented T-helper cell GRN is provided.
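To make the notions of Boolean GRN dynamics and knockout analysis concrete, here is a minimal sketch of a synchronous Boolean network simulation that finds the attractor reached from a given state. It does not reproduce CABERNET's functionality or interface (CABERNET is a Cytoscape app): the function names, the synchronous update scheme and the three-gene network are hypothetical choices for the example.

```python
def step(state, rules):
    """Synchronously update every gene using its Boolean rule."""
    return tuple(bool(rule(state)) for rule in rules)


def find_attractor(state, rules):
    """Iterate from `state` until a state repeats; return the cycle of states."""
    seen = {}
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, rules)
    # Everything from the first visit of the repeated state onwards is the cycle.
    return trajectory[seen[state]:]


# Hypothetical 3-gene network: gene 0 is repressed by gene 2,
# gene 1 copies gene 0, and gene 2 copies gene 1.
rules = [
    lambda s: not s[2],
    lambda s: s[0],
    lambda s: s[1],
]
start = (False, False, False)
wild_type = find_attractor(start, rules)

# Knockout of gene 0: its rule is replaced by the constant False.
knockout_rules = [lambda s: False] + rules[1:]
knockout = find_attractor(start, knockout_rules)
```

In this toy network the wild type oscillates through a six-state cycle, whereas the knockout collapses to the all-off fixed point, which is the kind of contrast a robustness assessment would look for.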
Subjects
Cell Differentiation/genetics, Cell Lineage/genetics, Cells/cytology, Computational Biology/methods, Gene Regulatory Networks, Genetic Models, Software, Cells/metabolism, Computer Simulation, Humans
ABSTRACT
UNLABELLED: We devise a novel inference algorithm to effectively solve the cancer progression model reconstruction problem. Our empirical analysis of the accuracy and convergence rate of our algorithm, CAncer PRogression Inference (CAPRI), shows that it outperforms the state-of-the-art algorithms addressing similar problems. MOTIVATION: Several cancer-related genomic datasets have become available (e.g. The Cancer Genome Atlas, TCGA), typically involving hundreds of patients. At present, most of these data are aggregated in a cross-sectional fashion providing all measurements at the time of diagnosis. Our goal is to infer cancer 'progression' models from such data. These models are represented as directed acyclic graphs (DAGs) of collections of 'selectivity' relations, where a mutation in a gene A 'selects' for a later mutation in a gene B. Gaining insight into the structure of such progressions has the potential to improve both the stratification of patients and personalized therapy choices. RESULTS: The CAPRI algorithm relies on a scoring method based on a probabilistic theory developed by Suppes, coupled with bootstrap and maximum likelihood inference. The resulting algorithm is efficient, achieves high accuracy and has good complexity, also in terms of convergence properties. CAPRI performs especially well in the presence of noise in the data, and with limited sample sizes. Moreover, CAPRI, in contrast to other approaches, robustly reconstructs different types of confluent trajectories despite irregularities in the data. We also report on an ongoing investigation using CAPRI to study atypical Chronic Myeloid Leukemia, in which we uncovered non-trivial selectivity relations and exclusivity patterns among key genomic events. AVAILABILITY AND IMPLEMENTATION: CAPRI is part of the TRanslational ONCOlogy R package and is freely available on the web at: http://bimib.disco.unimib.it/index.php/Tronco CONTACT: daniele.ramazzotti@disco.unimib.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
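The selectivity relations mentioned in the CAPRI abstract build on Suppes' notion of probabilistic causation, which is usually stated as two conditions: the cause precedes the effect, and the cause raises the probability of the effect. The sketch below checks both conditions on binary mutational profiles. It is only an illustration under stated assumptions: the function name and data are hypothetical, temporal priority is approximated by marginal frequency as is typical for cross-sectional data, and the bootstrap and maximum likelihood steps named in the abstract are not included.

```python
def suppes_prima_facie(samples, a, b):
    """Check Suppes' two conditions for 'a selects for b' on binary profiles.

    samples : list of dicts mapping event name -> 0/1
    Temporal priority is approximated by marginal frequency, P(a) > P(b);
    probability raising requires P(b | a) > P(b | not a).
    """
    n = len(samples)
    with_a = [s for s in samples if s[a]]
    without_a = [s for s in samples if not s[a]]
    if not with_a or not without_a:
        return False  # the conditional probabilities are undefined
    p_a = len(with_a) / n
    p_b = sum(s[b] for s in samples) / n
    p_b_given_a = sum(s[b] for s in with_a) / len(with_a)
    p_b_given_not_a = sum(s[b] for s in without_a) / len(without_a)
    return p_a > p_b and p_b_given_a > p_b_given_not_a


# Hypothetical cohort of 10 samples: A occurs in 6, B in 3, and B only with A.
cohort = (
    [{"A": 1, "B": 1}] * 3
    + [{"A": 1, "B": 0}] * 3
    + [{"A": 0, "B": 0}] * 4
)
a_selects_b = suppes_prima_facie(cohort, "A", "B")
b_selects_a = suppes_prima_facie(cohort, "B", "A")
```

Here A passes both conditions with respect to B, while the reverse direction fails the temporal-priority check, so only the edge from A to B would remain a candidate.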
Subjects
Algorithms, Computational Biology/methods, BCR-ABL Positive Chronic Myelogenous Leukemia/genetics, BCR-ABL Positive Chronic Myelogenous Leukemia/pathology, Theoretical Models, Cross-Sectional Studies, Genetic Databases, Disease Progression, Humans, Mutation/genetics, Probability, Signal Transduction
ABSTRACT
SUMMARY: The characterization of the complex phenomenon of cell differentiation is a key goal of both systems and computational biology. GeStoDifferent is a Cytoscape plugin aimed at the generation and the identification of gene regulatory networks (GRNs) describing an arbitrary stochastic cell differentiation process. The (dynamical) model adopted to describe general GRNs is that of noisy random Boolean networks (NRBNs), with a specific focus on their emergent dynamical behavior. GeStoDifferent explores the space of GRNs by filtering the NRBN instances inconsistent with a stochastic lineage differentiation tree representing the cell lineages that can be obtained by following the fate of a stem cell descendant. Matched networks can then be analyzed by Cytoscape network analysis algorithms or, for instance, used to define (multiscale) models of cellular dynamics. AVAILABILITY: Freely available at http://bimib.disco.unimib.it/index.php/Retronet#GESTODifferent or at the Cytoscape App Store http://apps.cytoscape.org/.
Subjects
Cell Differentiation/genetics, Gene Regulatory Networks, Software, Cell Lineage, Genetic Models, Stochastic Processes
ABSTRACT
Colorectal cancers (CRC) are the result of sequences of mutations which lead the intestinal tissue to develop into a carcinoma following a "progression" of observable phenotypes. The actual modeling and simulation of the key biological structures involved in this process is of interest to biologists and physicians and, at the same time, it poses significant challenges from the mathematics and computer science viewpoints. In this report we give an overview of some mathematical models for cell sorting (a basic phenomenon that underlies several dynamical processes in an organism), intestinal crypt dynamics and related problems and open questions. In particular, major attention is devoted to the survey of so-called in-lattice (or grid) models and off-lattice (off-grid) models. The current work is the groundwork for future research on semi-automated hypothesis formation and testing about the behavior of the various actors taking part in the adenoma-carcinoma progression, from regulatory processes to cell-cell signaling pathways.
Subjects
Adenoma/pathology, Carcinoma/pathology, Colorectal Neoplasms/pathology, Intestines/pathology, Biological Models, Computer Simulation, Humans, Systems Biology
ABSTRACT
Cancer patients show heterogeneous phenotypes and very different outcomes and responses even to common treatments, such as standard chemotherapy. This state of affairs has motivated the need for the comprehensive characterization of cancer phenotypes and fueled the generation of large omics datasets, comprising multiple omics data reported for the same patients, which might now allow us to start deciphering cancer heterogeneity and implement personalized therapeutic strategies. In this work, we analyzed four cancer types obtained from the latest efforts by The Cancer Genome Atlas, for which seven distinct omics data were available for each patient, in addition to curated clinical outcomes. We applied a uniform pipeline for raw data preprocessing and adopted the Cancer Integration via MultIkernel LeaRning (CIMLR) integrative clustering method to extract cancer subtypes. We then systematically review the discovered clusters for the considered cancer types, highlighting novel associations between the different omics and prognosis.
Subjects
Genomics, Neoplasms, Humans, Genomics/methods, Multiomics, Neoplasms/genetics, Genome, Cluster Analysis
ABSTRACT
Recurring sequences of genomic alterations occurring across patients can highlight repeated evolutionary processes with significant implications for predicting cancer progression. Leveraging the ever-increasing availability of cancer omics data, here we unveil cancer's evolutionary signatures tied to distinct disease outcomes, representing "favored trajectories" of acquisition of driver mutations detected in patients with similar prognosis. We present a framework named ASCETIC (Agony-baSed Cancer EvoluTion InferenCe) to extract such signatures from sequencing experiments generated by different technologies such as bulk and single-cell sequencing data. We apply ASCETIC to (i) single-cell data from 146 myeloid malignancy patients and bulk sequencing from 366 acute myeloid leukemia patients, (ii) multi-region sequencing from 100 early-stage lung cancer patients, (iii) exome/genome data from 10,000+ Pan-Cancer Atlas samples, and (iv) targeted sequencing from 25,000+ MSK-MET metastatic patients, revealing subtype-specific single-nucleotide variant signatures associated with distinct prognostic clusters. Validations on several datasets underscore the robustness and generalizability of the extracted signatures.
Subjects
Genomics, Neoplasms, Humans, Neoplasms/genetics, Exome/genetics, Patients, Technology
ABSTRACT
A key task of genomic surveillance of infectious viral diseases lies in the early detection of dangerous variants. Unexpected help to this end is provided by the analysis of deep sequencing data of viral samples, which are typically discarded after creating consensus sequences. Such analysis allows one to detect intra-host low-frequency mutations, which are a footprint of the mutational processes underlying the origination of new variants. Their timely identification may improve public-health decision-making with respect to traditional approaches exploiting consensus sequences. We present the analysis of 220,788 high-quality deep sequencing SARS-CoV-2 samples, showing that many spike and nucleocapsid mutations of interest associated with the most widely circulating variants, including Beta, Delta, and Omicron, might have been intercepted several months in advance. Furthermore, we show that a refined genomic surveillance system leveraging deep sequencing data might allow one to pinpoint emerging mutation patterns, providing automated data-driven support to virologists and epidemiologists.