ABSTRACT
BACKGROUND: Longitudinal single-cell sequencing experiments of patient-derived models are increasingly employed to investigate cancer evolution. In this context, robust computational methods are needed to properly exploit the mutational profiles of single cells generated via variant calling, in order to reconstruct the evolutionary history of a tumor and characterize the impact of therapeutic strategies, such as the administration of drugs. To this end, we have recently developed the LACE framework for the Longitudinal Analysis of Cancer Evolution. RESULTS: The LACE 2.0 release aimed at inferring longitudinal clonal trees enhances the original framework with new key functionalities: an improved data management for preprocessing of standard variant calling data, a reworked inference engine, and direct connection to public databases. CONCLUSIONS: All of this is accessible through a new and interactive Shiny R graphical interface offering the possibility to apply filters helpful in discriminating relevant or potential driver mutations, set up inferential parameters, and visualize the results. The software is available at: github.com/BIMIB-DISCo/LACE.
Subjects
Neoplasms , Software , Humans , Neoplasms/genetics , Clone Cells
ABSTRACT
MOTIVATION: Advances in single-cell sequencing methods have paved the way for the characterization of cellular states at unprecedented resolution, revolutionizing the investigation of complex biological systems. Yet, single-cell sequencing experiments are hindered by several technical issues, which cause output data to be noisy, impacting the reliability of downstream analyses. Therefore, a growing number of data science methods have been proposed to recover lost or corrupted information from single-cell sequencing data. To date, however, no quantitative benchmarks have been proposed to evaluate such methods. RESULTS: We present a comprehensive analysis of the state-of-the-art computational approaches for denoising and imputation of single-cell transcriptomic data, comparing their performance in different experimental scenarios. In detail, we compared 19 denoising and imputation methods, on both simulated and real-world datasets, with respect to several performance metrics related to imputation of dropout events, recovery of true expression profiles, characterization of cell similarity, identification of differentially expressed genes and computation time. The effectiveness and scalability of all methods were assessed with regard to distinct sequencing protocols, sample size and different levels of biological variability and technical noise. As a result, we identify a subset of versatile approaches exhibiting solid performances on most tests and show that certain algorithmic families prove effective on specific tasks but inefficient on others. Finally, most methods appear to benefit from the introduction of appropriate assumptions on noise distribution of biological processes.
Subjects
Gene Expression Profiling , RNA-Seq , Single-Cell Analysis , Software , Animals , Humans
ABSTRACT
Checkpoint inhibitors (CPIs) are routinely employed in relapsed/refractory classical Hodgkin lymphoma. Nonetheless, persistent long-term responses are uncommon, and one-third of patients are refractory. Several reports have suggested that treatment with CPIs may re-sensitize patients to chemotherapy; however, there is no consensus on the optimal chemotherapy regimen and subsequent consolidation strategy. In this retrospective study we analysed the response to rechallenge with chemotherapy after CPI failure. Furthermore, we exploratively characterized the clonal evolution profile of a small sample of patients (n = 5) by employing the CALDER approach. Among the 28 patients included in the study, 17 (71%) were primary refractory and 26 (92%) were refractory to the last chemotherapy prior to CPIs. Following rechallenge with chemotherapy, 23 (82%) patients achieved complete remission and 3 (11%) achieved partial remission. The tumour evolution of the patients inferred by CALDER seemingly occurred prior to the first cycle of therapy and was characterized by either linear or branching evolution patterns. Twenty-five patients proceeded to allogeneic stem cell transplantation. At a median follow-up of 21 months, median PFS and OS were not reached. In conclusion, patients who fail CPIs can be effectively rescued by salvage chemotherapy and bridged to allo-SCT/auto-SCT.
Subjects
Hematopoietic Stem Cell Transplantation , Hodgkin Disease , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Clonal Evolution , Hodgkin Disease/drug therapy , Humans , Immune Checkpoint Inhibitors , Neoplasm Recurrence, Local/drug therapy , Retrospective Studies , Salvage Therapy , Treatment Outcome
ABSTRACT
BACKGROUND: The increasing availability of omics data collected from patients affected by severe pathologies, such as cancer, is fostering the development of data science methods for their analysis. INTRODUCTION: The combination of data integration and machine learning approaches can provide new powerful instruments to tackle the complexity of cancer development and deliver effective diagnostic and prognostic strategies. METHODS: We explore the possibility of exploiting the topological properties of sample-specific metabolic networks as features in a supervised classification task. Such networks are obtained by projecting transcriptomic data from RNA-seq experiments on genome-wide metabolic models to define weighted networks modeling the overall metabolic activity of a given sample. RESULTS: We show the classification results on a labeled breast cancer dataset from the TCGA database, including 210 samples (cancer vs. normal). In particular, we investigate how the performance is affected by a threshold-based pruning of the networks by comparing Artificial Neural Networks, Support Vector Machines and Random Forests. Interestingly, the best classification performance is achieved within a small threshold range for all methods, suggesting that it might represent an effective choice to recover useful information while filtering out noise from data. Overall, the best accuracy is achieved with SVMs, which exhibit performances similar to those obtained when gene expression profiles are used as features. CONCLUSION: These findings demonstrate that the topological properties of sample-specific metabolic networks are effective in classifying cancer and normal samples, suggesting that useful information can be extracted from a relatively limited number of features.
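The threshold-based pruning and feature extraction described in this abstract can be pictured with a minimal sketch. It assumes that a sample-specific network is a list of weighted edges, that pruning drops edges below a weight cut-off, and that node strength (the sum of incident edge weights) serves as one simple topological feature. The abstract does not specify the pruning criterion or the features, so the function names, the threshold, and the toy weights are all hypothetical.

```python
from collections import defaultdict

def prune_network(edges, threshold):
    """Keep only edges whose weight is at least `threshold` (assumed criterion)."""
    return [(u, v, w) for u, v, w in edges if w >= threshold]

def node_strengths(edges):
    """Sum of incident edge weights per node, one possible topological feature."""
    strength = defaultdict(float)
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
    return dict(strength)

# Toy network: (node_a, node_b, weight); names and weights are illustrative only.
toy_edges = [("glc", "g6p", 0.9), ("g6p", "f6p", 0.4), ("f6p", "pyr", 0.05)]

pruned = prune_network(toy_edges, threshold=0.1)
features = node_strengths(pruned)  # feature values that a classifier could consume
```

Sweeping `threshold` over a range and retraining the classifier at each value would reproduce, in miniature, the kind of comparison the abstract reports.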
ABSTRACT
Transcripts originating from the transcriptional read-through of two adjacent, similarly oriented genes have been identified in normal and neoplastic tissues, but their functional role and the mechanisms that regulate their expression are mostly unknown. Here, we investigated whether the expression of read-through transcripts previously identified in the non-involved lung tissue of lung adenocarcinoma patients was genetically regulated. Data on genome-wide single nucleotide variant genotypes and expression levels of 10 read-through transcripts in 201 samples of lung tissue were combined to identify expression quantitative trait loci (eQTLs). Then, to identify genes whose expression levels correlated with the 10 read-through transcripts, we used whole transcriptome profiles available for 154 patients. For 8 read-through transcripts, we identified 60 eQTLs (false discovery rate <0.05), including 17 cis-eQTLs and 43 trans-eQTLs. These eQTLs did not maintain their behavior on the 'parental' genes involved in the read-through transcriptional event. The expression levels of 7 read-through transcripts were found to correlate with the expression of other genes: CHIA-PIFO and CTSC-RAB38 correlated with CHIA and RAB38, respectively, while 5 other read-through transcripts correlated with 43 unique non-parental transcripts, thus offering indications about the molecular processes in which these chimeric transcripts may be involved. We confirmed 9 eQTLs (for 4 transcripts) in the non-involved lung tissue from an independent series of 188 lung adenocarcinoma patients. Therefore, this study indicates that the expression of four read-through transcripts in normal lung tissue is under germline genetic regulation, and that this regulation is independent of that of the genes involved in the read-through event.
Subjects
Adenocarcinoma of Lung/genetics , Genetic Predisposition to Disease , Quantitative Trait Loci/genetics , Transcriptome/genetics , Adenocarcinoma of Lung/pathology , Adenocarcinoma of Lung/surgery , Adult , Aged , Aged, 80 and over , Female , Gene Expression Regulation, Neoplastic/genetics , Genome-Wide Association Study , Genotype , Germ Cells/metabolism , Germ Cells/pathology , Humans , Lung/metabolism , Lung/pathology , Male , Middle Aged , Neoplasm Proteins/genetics , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Metabolic reprogramming is a general feature of cancer cells. Regrettably, the comprehensive quantification of metabolites in biological specimens does not promptly translate into knowledge on the utilization of metabolic pathways. By estimating fluxes across metabolic pathways, computational models hold the promise to bridge this gap between data and biological functionality. These models, however, currently portray the average behavior of cell populations, masking the inherent heterogeneity that is part and parcel of tumorigenesis as much as drug resistance. To remove this limitation, we propose single-cell Flux Balance Analysis (scFBA) as a computational framework to translate single-cell transcriptomes into single-cell fluxomes. We show that the integration of single-cell RNA-seq profiles of cells derived from lung adenocarcinoma and breast cancer patients into a multi-scale stoichiometric model of a cancer cell population 1) significantly reduces the space of feasible single-cell fluxomes; 2) allows the identification of clusters of cells with different growth rates within the population; 3) points out the possible metabolic interactions among cells via exchange of metabolites. The scFBA suite of MATLAB functions is available at https://github.com/BIMIB-DISCo/scFBA, as well as the case study datasets.
Subjects
Computational Biology/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Adenocarcinoma of Lung/genetics , Algorithms , Breast Neoplasms/genetics , Computer Simulation , Female , Gene Expression Profiling/methods , Genetics, Population/methods , Humans , Male , Metabolic Networks and Pathways , Neoplasms/genetics , Neoplasms/metabolism , RNA/genetics , Software , Transcriptome/genetics
ABSTRACT
MOTIVATION: Intratumour heterogeneity poses many challenges to the treatment of cancer. Unfortunately, the transcriptional and metabolic information retrieved by currently available computational and experimental techniques portrays the average behaviour of intermixed and heterogeneous cell subpopulations within a given tumour. Emerging single-cell genomic analyses are nonetheless unable to characterize the interactions among cancer subpopulations. In this study, we propose popFBA, an extension to classic Flux Balance Analysis, to explore how metabolic heterogeneity and cooperation phenomena affect the overall growth of cancer cell populations. RESULTS: We show how clones of a metabolic network of human central carbon metabolism, sharing the same stoichiometry and capacity constraints, may follow several different metabolic paths and cooperate to maximize the growth of the total population. We also introduce a method to explore the space of possible interactions, given some constraints on plasma supply of nutrients. We illustrate how alternative nutrients in plasma supply and/or an inhomogeneous distribution of oxygen provision may affect the landscape of heterogeneous phenotypes. We finally provide a technique to identify the most proliferative cells within the heterogeneous population. AVAILABILITY AND IMPLEMENTATION: The popFBA MATLAB function and the SBML model are available at https://github.com/BIMIB-DISCo/popFBA. CONTACT: chiara.damiani@unimib.it.
Subjects
Computational Biology/methods , Metabolic Networks and Pathways , Neoplasms/metabolism , Software , Cell Proliferation , Computer Simulation , Humans , Models, Biological , Neoplasms/physiopathology
ABSTRACT
Effective stratification of cancer patients on the basis of their molecular make-up is a key open challenge. Given the altered and heterogeneous nature of cancer metabolism, we here propose to use the overall expression of central carbon metabolism as a biomarker to characterize groups of patients with important characteristics, such as response to ad-hoc therapeutic strategies and survival expectancy. To this end, we here introduce the data integration framework named Metabolic Reaction Enrichment Analysis (MaREA), which strives to characterize the metabolic deregulations that distinguish cancer phenotypes, by projecting RNA-seq data onto metabolic networks, without requiring metabolic measurements. MaREA computes a score for each network reaction, based on the expression of the set of genes encoding the associated enzyme(s). The scores are first used as features for cluster analysis and then to rank and visualize in an organized fashion the metabolic deregulations that distinguish cancer sub-types. We applied our method to recent lung and breast cancer RNA-seq datasets from The Cancer Genome Atlas and we were able to identify subgroups of patients with significant differences in survival expectancy. We show how the prognostic power of MaREA improves when an extracted and further curated core model focusing on central carbon metabolism is used rather than the genome-wide reference network. The visualization of the metabolic differences between the groups with best and worst prognosis allowed us to identify and analyze key metabolic properties related to cancer aggressiveness. Some of these properties are shared across different cancer (sub)types, e.g., the up-regulation of nucleic acid and amino acid synthesis, whereas others appear to be tumor-specific, such as the up- or down-regulation of the phosphoenolpyruvate carboxykinase reaction, which displays different patterns in distinct tumor (sub)types. These results might soon be employed to deliver highly automated diagnostic and prognostic strategies for cancer patients.
Subjects
Biomarkers, Tumor/metabolism , Neoplasms/genetics , Neoplasms/metabolism , Sequence Analysis, RNA/methods , Transcriptome , Adenocarcinoma/diagnosis , Adenocarcinoma/metabolism , Algorithms , Biopsy , Breast Neoplasms/diagnosis , Breast Neoplasms/metabolism , Carbon/metabolism , Cluster Analysis , Gene Expression Profiling , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/metabolism , Metabolic Networks and Pathways , Pattern Recognition, Automated , Prognosis
ABSTRACT
A key task of genomic surveillance of infectious viral diseases lies in the early detection of dangerous variants. Unexpected help to this end is provided by the analysis of deep sequencing data of viral samples, which are typically discarded after creating consensus sequences. Such analysis allows one to detect intra-host low-frequency mutations, which are a footprint of mutational processes underlying the origination of new variants. Their timely identification may improve public-health decision-making with respect to traditional approaches exploiting consensus sequences. We present the analysis of 220,788 high-quality deep sequencing SARS-CoV-2 samples, showing that many spike and nucleocapsid mutations of interest associated with the most widely circulating variants, including Beta, Delta, and Omicron, might have been intercepted several months in advance. Furthermore, we show that a refined genomic surveillance system leveraging deep sequencing data might allow one to pinpoint emerging mutation patterns, providing an automated data-driven support to virologists and epidemiologists.
ABSTRACT
Many large national and transnational studies have been dedicated to the analysis of the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) genome, most of which focused on missense and nonsense mutations. However, approximately 30 per cent of the SARS-CoV-2 variants are synonymous, thus changing the target codon without affecting the corresponding protein sequence. By performing a large-scale analysis of sequencing data generated from almost 400,000 SARS-CoV-2 samples, we show that silent mutations increasing the similarity of viral codons to the human ones tend to fixate in the viral genome over time. This indicates that SARS-CoV-2 codon usage is adapting to the human host, likely improving its effectiveness in using the human aminoacyl-tRNA set through the accumulation of deceitfully neutral silent mutations. One-Sentence Summary. Synonymous SARS-CoV-2 mutations related to the activity of different mutational processes may positively impact viral evolution by increasing its adaptation to the human codon usage.
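As a rough illustration of the idea of a silent mutation that moves a viral codon closer to host usage, the sketch below checks whether a codon change is synonymous and whether the new codon is more frequent in a host usage table. The abstract does not give its similarity measure, so this is a simplified stand-in. The codon table is a small, partial excerpt of the standard genetic code, and the host usage values are made-up numbers used only to make the example run.

```python
# Partial standard genetic code (RNA codons); only the entries the toy example needs.
CODON_TO_AA = {
    "GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "GAU": "Asp", "GAC": "Asp",
}

# Hypothetical relative codon usage in the host; illustrative values, not real data.
TOY_HOST_USAGE = {
    "GCU": 0.25, "GCC": 0.40, "GCA": 0.25, "GCG": 0.10,
    "GAU": 0.45, "GAC": 0.55,
}

def is_synonymous(ref_codon, alt_codon):
    """True if the codon changes but the encoded amino acid does not."""
    return ref_codon != alt_codon and CODON_TO_AA[ref_codon] == CODON_TO_AA[alt_codon]

def moves_toward_host(ref_codon, alt_codon, host_usage=TOY_HOST_USAGE):
    """True for a synonymous change whose new codon is used more often by the host."""
    return is_synonymous(ref_codon, alt_codon) and host_usage[alt_codon] > host_usage[ref_codon]

toward = moves_toward_host("GCG", "GCC")  # Ala -> Ala, rarer codon to more common codon
away = moves_toward_host("GCC", "GCG")    # Ala -> Ala, more common codon to rarer codon
```

A real analysis would use a complete codon table and measured human codon usage, and would track how the frequency of such mutations changes across sampling dates.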
ABSTRACT
We outline the features of the R package SparseSignatures and its application to determine the signatures contributing to mutation profiles of tumor samples. We describe installation details and illustrate a step-by-step approach to (1) prepare the data for signature analysis, (2) determine the optimal parameters, and (3) employ them to determine the signatures and related exposure levels in the point mutation dataset. For complete details on the use and execution of this protocol, please refer to Lal et al. (2021).
Subjects
Neoplasms , Algorithms , Humans , Mutation , Neoplasms/diagnosis
ABSTRACT
To dissect the mechanisms underlying the inflation of variants in the Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2) genome, we present a large-scale analysis of intra-host genomic diversity, which reveals that most samples exhibit heterogeneous genomic architectures, due to the interplay between host-related mutational processes and transmission dynamics. The decomposition of minor variants profiles unveils three non-overlapping mutational signatures related to nucleotide substitutions and likely ruled by APOlipoprotein B Editing Complex (APOBEC), Reactive Oxygen Species (ROS), and Adenosine Deaminase Acting on RNA (ADAR), highlighting heterogeneous host responses to SARS-CoV-2 infections. A corrected-for-signatures dN/dS analysis demonstrates that such mutational processes are affected by purifying selection, with important exceptions. In fact, several mutations appear to transit toward clonality, defining new clonal genotypes that increase the overall genomic diversity. Furthermore, the phylogenomic analysis shows the presence of homoplasies and supports the hypothesis of transmission of minor variants. This study paves the way for the integrated analysis of intra-host genomic diversity and clinical outcomes of SARS-CoV-2 infections.
ABSTRACT
We introduce VERSO, a two-step framework for the characterization of viral evolution from sequencing data of viral genomes, which is an improvement on phylogenomic approaches for consensus sequences. VERSO exploits an efficient algorithmic strategy to return robust phylogenies from clonal variant profiles, also in conditions of sampling limitations. It then leverages variant frequency patterns to characterize the intra-host genomic diversity of samples, revealing undetected infection chains and pinpointing variants likely involved in homoplasies. On simulations, VERSO outperforms state-of-the-art tools for phylogenetic inference. Notably, the application to 6,726 amplicon and RNA sequencing samples refines the estimation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution, while co-occurrence patterns of minor variants unveil undetected infection paths, which are validated with contact tracing data. Finally, the analysis of SARS-CoV-2 mutational landscape uncovers a temporal increase of overall genomic diversity and highlights variants transiting from minor to clonal state and homoplastic variants, some of which fall on the spike gene. Available at: https://github.com/BIMIB-DISCo/VERSO.
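The distinction between clonal and minor variants that this abstract relies on can be sketched with variant frequencies computed from read counts. The snippet below is only an illustration and is not taken from the VERSO code base: the function names and the two cut-offs (0.05 and 0.90) are assumptions chosen for the example.

```python
def variant_frequency(alt_reads, depth):
    """Fraction of reads supporting the variant; 0.0 when there is no coverage."""
    return alt_reads / depth if depth > 0 else 0.0

def classify_variant(alt_reads, depth, minor_min=0.05, clonal_min=0.90):
    """Label a variant by its frequency; both thresholds are hypothetical defaults."""
    vf = variant_frequency(alt_reads, depth)
    if vf >= clonal_min:
        return "clonal"
    if vf >= minor_min:
        return "minor"
    return "undetected"

# Toy read counts (alt_reads, depth) for three positions in one sample.
toy_calls = [(950, 1000), (120, 1000), (10, 1000)]
labels = [classify_variant(alt, depth) for alt, depth in toy_calls]
```

Under these assumptions, clonal labels would feed the phylogeny step and minor labels would feed the intra-host diversity step.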
ABSTRACT
We present MaREA4Galaxy, a user-friendly tool that allows a user to characterize and to graphically compare groups of samples with different transcriptional regulation of metabolism, as estimated from cross-sectional RNA-seq data. The tool is available as a plug-in for the widely used Galaxy platform for comparative genomics and bioinformatics analyses. MaREA4Galaxy combines three modules. The Expression2RAS module computes, for each reaction of a specified set, a Reaction Activity Score (RAS) as a function of the expression level of the genes encoding the associated enzyme. The MaREA (Metabolic Reaction Enrichment Analysis) module highlights significant differences in reaction activities between specified groups of samples. The Clustering module employs the previously computed RAS as a metric for unsupervised clustering of samples into distinct metabolic subgroups; it provides different clustering techniques and implements standard methods to evaluate the goodness of the results.
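A Reaction Activity Score of the kind computed by the Expression2RAS module can be sketched as a recursive evaluation of a gene-protein-reaction rule. The abstract only says the score is a function of gene expression, so the scoring convention here is an assumption: AND (enzyme subunits) takes the minimum and OR (isozymes) takes the sum. The rule encoding, gene names, and expression values are hypothetical.

```python
def reaction_activity_score(rule, expression):
    """Evaluate a gene rule against an expression dict.

    `rule` is either a gene identifier (str) or a tuple (op, [subrules])
    with op in {"and", "or"}. Assumed convention: "and" -> min, "or" -> sum.
    """
    if isinstance(rule, str):
        return expression[rule]
    op, operands = rule
    scores = [reaction_activity_score(sub, expression) for sub in operands]
    if op == "and":
        return min(scores)
    if op == "or":
        return sum(scores)
    raise ValueError(f"unknown operator: {op!r}")

# Toy expression levels and a toy rule: (G1 and G2) or G3.
toy_expression = {"G1": 5.0, "G2": 2.0, "G3": 1.0}
toy_rule = ("or", [("and", ["G1", "G2"]), "G3"])

ras = reaction_activity_score(toy_rule, toy_expression)  # min(5, 2) + 1
```

Computing one such score per reaction and per sample yields the matrix that the comparison and clustering modules would then operate on.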
ABSTRACT
One of the key challenges in current cancer research is the development of computational strategies to support clinicians in the identification of successful personalized treatments. Control theory might be an effective approach to this end, as proven by the long-established application to therapy design and testing. In this respect, we here introduce the Control Theory for Therapy Design (CT4TD) framework, which employs optimal control theory on patient-specific pharmacokinetics (PK) and pharmacodynamics (PD) models, to deliver optimized therapeutic strategies. The definition of personalized PK/PD models makes it possible to explicitly consider the physiological heterogeneity of individuals and to adapt the therapy accordingly, as opposed to standard clinical practices. CT4TD can be used in two distinct scenarios. At the time of the diagnosis, CT4TD allows one to set optimized personalized administration strategies, aimed at reaching selected target drug concentrations, while minimizing the costs in terms of toxicity and adverse effects. Moreover, if longitudinal data on patients under treatment are available, our approach allows one to adjust the ongoing therapy, by relying on simplified models of cancer population dynamics, with the goal of minimizing or controlling the tumor burden. CT4TD is highly scalable, as it employs the efficient dCRAB/RedCRAB optimization algorithm, and the results are robust, as proven by extensive tests on synthetic data. Furthermore, the theoretical framework is general, and it might be applied to any therapy for which a PK/PD model can be estimated, and for any kind of administration and cost. As a proof of principle, we present the application of CT4TD to Imatinib administration in chronic myeloid leukemia, in which we adopt a simplified model of cancer population dynamics. In particular, we show that the optimized therapeutic strategies are diversified among patients, and display improvements with respect to the current standard regimen.
ABSTRACT
Laboratory models derived from clinical samples represent a solid platform in preclinical research for drug testing and investigation of disease mechanisms. The integration of these laboratory models with their digital counterparts (i.e., predictive mathematical models) makes it possible to set up digital twins, which are essential to fully exploit their potential in facing the enormous molecular complexity of human organisms. In particular, due to the close integration of cell metabolism with all other cellular processes, any perturbation in cellular physiology is typically reflected in an altered metabolic profile of the cells. In this regard, changes in metabolism have been shown, also in our laboratory, to play a causal role in the emergence of cancer. Nevertheless, a single metabolic program does not describe the altered metabolic profile of all tumour cells, for many reasons ranging from genetic variability to the heterogeneous dependency of multiple co-existing subclones on nutrient consumption and metabolism. Currently, fluxomics approaches characterize only the overall flux distribution of the cells within a given sample, disregarding possible heterogeneous behaviors. To stratify heterogeneous cancer subpopulations, quantification of fluxes at the single-cell level is needed. To this aim, we here present a new computational framework called single-cell Flux Balance Analysis (scFBA), which aims to set up digital metabolic twins that can be better exploited within a framework that also makes use of laboratory patient cell models. In particular, scFBA integrates single-cell RNA-seq data within computational population models in order to depict a snapshot of the corresponding single-cell metabolic phenotypes at a given moment, together with an unsupervised identification of metabolic subpopulations.
Subjects
Metabolic Networks and Pathways/physiology , Metabolome/physiology , Neoplasms/metabolism , Humans , Metabolomics/methods , Single-Cell Analysis/methods , Software
ABSTRACT
Alterations in the gene expression of organs in contact with the environment may signal exposure to toxins. To identify genes in lung tissue whose expression levels are altered by cigarette smoking, we compared the transcriptomes of lung tissue between 118 ever smokers and 58 never smokers. In all cases, the tissue studied was non-involved lung tissue obtained at lobectomy from patients with lung adenocarcinoma. Of the 17,097 genes analyzed, 357 were differentially expressed between ever smokers and never smokers (FDR < 0.05), including 290 genes that were up-regulated and 67 down-regulated in ever smokers. For 85 genes, the absolute value of the fold change was ≥2. The gene with the smallest FDR was MYO1A (FDR = 6.9 × 10⁻⁴) while the gene with the largest difference between groups was FGG (fold change = 31.60). Overall, 100 of the genes identified in this study (38.6%) had previously been found to associate with smoking in at least one of four previously reported datasets of non-involved lung tissue. Seven genes (KMO, CD1A, SPINK5, TREM2, CYBB, DNASE2B, FGG) were differentially expressed between ever and never smokers in all five datasets, with concordant higher expression in ever smokers. Smoking-induced up-regulation of six of these genes was also observed in a transcription dataset from lung tissue of non-cancer patients. Among the three most significant gene networks, two are involved in immunity and inflammation and one in cell death. Overall, this study shows that the lung parenchyma transcriptome of smokers has altered gene expression and that these alterations are reproducible in different series of smokers across countries. Moreover, this study identified a seven-gene panel that reflects lung tissue exposure to cigarette smoke.
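The two selection criteria quoted in this abstract (FDR < 0.05 and an absolute fold change of at least 2) can be sketched as follows. The abstract does not name its FDR method, so the snippet assumes the Benjamini-Hochberg procedure. It also assumes that down-regulation is encoded as a negative fold change. Gene names, p-values, and fold changes are toy values, not data from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_genes(names, pvalues, fold_changes, fdr=0.05, min_abs_fc=2.0):
    """Keep genes passing both the FDR and the absolute fold-change cut-offs."""
    qvalues = benjamini_hochberg(pvalues)
    return [
        name
        for name, q, fc in zip(names, qvalues, fold_changes)
        if q < fdr and abs(fc) >= min_abs_fc
    ]

# Toy inputs; negative fold change stands for down-regulation (assumed encoding).
toy_names = ["geneA", "geneB", "geneC", "geneD"]
toy_pvalues = [0.01, 0.04, 0.03, 0.5]
toy_fold_changes = [3.0, -2.5, 1.2, 4.0]

toy_qvalues = benjamini_hochberg(toy_pvalues)
selected = select_genes(toy_names, toy_pvalues, toy_fold_changes)
```

In this toy run only `geneA` passes both filters, since `geneB` has a large fold change but an adjusted p-value just above 0.05.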