ABSTRACT
Next-Generation Sequencing (NGS) is a high-throughput technology widely applied to genome sequencing and transcriptome profiling. RNA-Seq uses NGS to reveal the identity and quantity of the RNAs in a given sample. However, it produces a huge amount of raw data that must be preprocessed with fast and effective computational methods. RNA-Seq can examine different populations of RNAs, including non-coding RNAs (ncRNAs), and in the last few years several pipelines have been developed for analyzing ncRNAs from RNA-Seq experiments. In this paper, we analyze eight recent pipelines (iSmaRT, iSRAP, miARma-Seq, Oasis 2, SPORTS1.0, sRNAnalyzer, sRNApipe, sRNA workbench) that allow the analysis not only of single specific ncRNA classes but also of multiple ncRNA classes. Our systematic performance evaluation aims to guide users in selecting the appropriate pipeline for processing each ncRNA class, focusing on three key points: (i) accuracy of ncRNA identification, (ii) accuracy of read count estimation and (iii) deployment and ease of use.
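To make criteria (i) and (ii) concrete, here is a minimal sketch of how one pipeline's output could be scored against a simulated ground truth. The metric choices (precision, recall and F1 for identification; Pearson correlation of log2(count + 1) for read count estimation) and the function names `identification_metrics` and `count_agreement` are illustrative assumptions, not the evaluation protocol used in the study.

```python
import math


def identification_metrics(predicted: set[str], truth: set[str]) -> dict[str, float]:
    """Precision, recall and F1 of the ncRNA identifiers reported by a pipeline."""
    tp = len(predicted & truth)  # ncRNAs correctly reported
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def count_agreement(estimated: dict[str, int], truth: dict[str, int]) -> float:
    """Pearson correlation between log2(count + 1) of estimated and true read counts.

    ncRNAs missing from the pipeline output are treated as having zero reads.
    Assumes neither vector is constant (otherwise the correlation is undefined).
    """
    ids = sorted(truth)
    x = [math.log2(estimated.get(i, 0) + 1) for i in ids]
    y = [math.log2(truth[i] + 1) for i in ids]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For example, with made-up identifiers, a pipeline reporting three ncRNAs of which two are among four true ones gets precision 2/3, recall 0.5 and F1 = 4/7 (about 0.571).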
Subjects
Benchmarking , RNA, Untranslated , RNA-Seq , Base Sequence , Chromosome Mapping , Gene Expression Profiling , High-Throughput Nucleotide Sequencing/methods , RNA , RNA, Untranslated/genetics , Sequence Analysis, RNA/methods , Software , Exome Sequencing
ABSTRACT
BACKGROUND: Protein kinases are enzymes that control different cellular functions. Genetic alterations often result in kinase dysregulation, making kinases a very attractive class of druggable targets in several human diseases. Approved drugs still target a very limited portion of the human 'kinome', calling for broader functional knowledge of individual and co-expressed kinase patterns in physiological and pathological settings. The development of novel, rapid and cost-effective methods for kinome screening is therefore highly desirable and could lead to the identification of novel kinase drug targets. RESULTS: In this work, we describe the development of KING-REX (KINase Gene RNA EXpression), a comprehensive, custom targeted assay panel for kinome RNA analysis by Next Generation Sequencing, coupled with a dedicated data analysis pipeline. We designed KING-REX for the gene expression analysis of 512 human kinases; for 319 kinases, paired assays and custom pipeline features allow the evaluation of 3'- and 5'-end transcript imbalances as a readout for predicting gene rearrangements. Validation tests on cell line models harboring known gene fusions showed that KING-REX assesses gene expression with accuracy comparable to whole-transcriptome analysis, and robustly detects transcript portion imbalances in rearranged kinases, even in complex RNA mixtures or degraded RNA. CONCLUSIONS: These results support the use of KING-REX as a rapid and cost-effective kinome investigation tool for kinase target identification, with applications in cancer biology and other human diseases.
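The abstract does not spell out how the 3'/5' imbalance is scored. A minimal sketch, assuming that a kinase fused as the 3' partner is transcribed from its partner's promoter, so that its 3'-end assay yields many more reads than its 5'-end assay, could use a log2 ratio with a pseudocount. The function names, the pseudocount and the cut-off of 2 (a four-fold excess) are hypothetical; a real analysis would also normalize counts across samples and assays.

```python
import math


def end_imbalance(count_5p: float, count_3p: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of 3'-end to 5'-end assay reads for one kinase (pseudocount avoids log(0))."""
    return math.log2((count_3p + pseudocount) / (count_5p + pseudocount))


def flag_rearrangements(paired_counts: dict[str, tuple[float, float]],
                        cutoff: float = 2.0) -> list[str]:
    """Kinases whose 3'/5' imbalance reaches a hypothetical cut-off (2.0 = four-fold excess of 3' reads).

    paired_counts maps kinase name -> (5'-end reads, 3'-end reads).
    """
    return sorted(
        kinase
        for kinase, (c5, c3) in paired_counts.items()
        if end_imbalance(c5, c3) >= cutoff
    )
```

With made-up counts, `flag_rearrangements({"KIN_A": (5, 400), "KIN_B": (300, 320)})` returns `["KIN_A"]`: the 3' end of KIN_A carries far more reads than its 5' end, whereas KIN_B is balanced.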
Subjects
Gene Expression Profiling/methods , Protein Kinases/genetics , Gene Fusion , High-Throughput Nucleotide Sequencing , Protein Kinases/metabolism , RNA Stability
ABSTRACT
BACKGROUND: Kinase over-expression and activation as a consequence of gene amplification or gene fusion events is a well-known mechanism of tumorigenesis. The search for novel rearrangements of kinases or other druggable genes may contribute to understanding the biology of carcinogenesis and lead to the identification of new candidate targets for drug discovery. However, this requires querying large datasets to identify rare events occurring in very small fractions (1-3%) of different tumor subtypes, a task that differs from that of conventional tools, which find genes differentially expressed between two experimental conditions. RESULTS: We propose a computational method for the automatic identification of genes that are selectively over-expressed in a very small fraction of samples within a specific tissue. The method does not require a healthy counterpart or a reference sample and can therefore also be applied to transcriptional data generated from cell lines. Our implementation accepts gene expression data from microarray experiments as well as data generated by RNA-Seq technologies. CONCLUSIONS: The method was implemented as a publicly available, user-friendly tool called KAOS (Kinase Automatic Outliers Search). The tool automatically runs iterative searches to identify extreme outliers and displays the results graphically. Filters can be applied to select the most significant outliers. The performance of the tool was evaluated on a synthetic dataset and compared with state-of-the-art tools. KAOS performs particularly well at detecting genes that are over-expressed in few samples, or when an extreme outlier stands out against a highly variable expression background.
To validate the method on real case studies, we used publicly available tumor cell line microarray data and identified both genes known to be over-expressed in specific samples and novel ones.
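The abstract does not describe the statistic KAOS uses. To illustrate the general idea (flagging samples in which a gene is extremely over-expressed relative to the rest of the same tissue cohort, without a healthy reference), the sketch below swaps in a standard median/MAD robust z-score. The cut-offs (`z_cut=5.0`, at most 3% of samples, echoing the 1-3% range above) and the function name are assumptions, not KAOS parameters.

```python
import statistics


def extreme_outliers(expression: dict[str, float], z_cut: float = 5.0,
                     max_fraction: float = 0.03) -> list[str]:
    """Samples in which one gene is extremely over-expressed within a single cohort.

    expression maps sample name -> expression value of one gene (e.g. log2 scale).
    Uses a median/MAD robust z-score, a stand-in technique rather than the KAOS algorithm.
    """
    values = list(expression.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread: robust z-scores are undefined
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation for normal data
    hits = [s for s, v in expression.items() if (v - med) / scale >= z_cut]
    # Keep only rare events: a gene that is high in many samples is not a selective outlier.
    return sorted(hits) if len(hits) <= max_fraction * len(values) else []
```

Applied gene by gene over a tissue cohort, this yields candidate sample/gene pairs. The median and MAD keep a single extreme sample from inflating the spread estimate, which a mean/standard-deviation z-score would not.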
Subjects
Computational Biology/methods , Gene Expression Profiling/methods , Neoplasms/enzymology , Neoplasms/genetics , Phosphotransferases/genetics , Algorithms , Automation/methods , Cell Line, Tumor , Gene Expression , Humans
ABSTRACT
MOTIVATION: Metabolic engineering algorithms provide the means to optimize a biological process so as to improve the production of a biotechnologically interesting molecule. It is therefore important to understand how to intervene in a metabolic pathway to obtain the best production yields. In this work, we present a computational framework that searches for optimal and robust microbial strains able to produce target molecules. Our framework performs three tasks: it evaluates the parameter sensitivity of the microbial model, searches for the optimal genetic or flux design and, finally, calculates the robustness of the microbial strains. We combine the exploration of the species, reaction, pathway and knockout parameter spaces with the Pareto-optimality principle. RESULTS: Our framework also provides theoretical and practical guidelines for design automation. A statistical cross-comparison of our new optimization procedures with currently widely used algorithms for bacteria (e.g. Escherichia coli), over multiple functions, shows good performance for a variety of biotechnological products. AVAILABILITY: http://www.dmi.unict.it/nicosia/pathDesign.html. CONTACT: nicosia@dmi.unict.it or pl219@cam.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
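The Pareto-optimality principle mentioned above can be illustrated with a minimal non-dominated filter over candidate strain designs, each scored on objectives to be maximized (here, hypothetically, biomass and target-product flux). This covers only the selection step; the design names and numbers are made up, and the framework's search over species, reactions, pathways and knockouts is not reproduced.

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if design a Pareto-dominates design b, with every objective maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(designs: dict[str, tuple[float, ...]]) -> list[str]:
    """Names of the designs that no other design dominates."""
    return sorted(
        name
        for name, objs in designs.items()
        if not any(dominates(other, objs) for o, other in designs.items() if o != name)
    )


# Hypothetical designs scored as (biomass, product flux).
designs = {"wt": (0.9, 1.0), "ko1": (0.6, 4.0), "ko2": (0.5, 3.5), "ko3": (0.3, 6.0)}
print(pareto_front(designs))  # ['ko1', 'ko3', 'wt']
```

Here ko2 is dropped because ko1 improves on it in both objectives; the three surviving designs are trade-offs between growth and production.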
Subjects
Algorithms , Computational Biology/methods , Metabolic Engineering/methods , Biotechnology/methods , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Knockout Techniques , Metabolic Networks and Pathways
ABSTRACT
Inhibition of kinase gene fusions (KGFs) has proven successful in cancer treatment and continues to be an attractive research area, owing to kinase druggability and clinical validation. Indeed, the literature and public databases report a remarkable number of KGFs as potential drug targets, often identified through in vitro characterization of tumor cell line models and also confirmed in clinical samples. However, molecular and experimental information on KGFs can be sparse and partially overlapping, suggesting the need for a dedicated KGF annotation database that conveniently condenses all the molecular details supporting targeted drug development pipelines and diagnostic approaches. Here, we describe KuNG FU (KiNase Gene FUsion), a manually curated database collecting detailed annotations on KGFs identified and experimentally validated in human cancer cell lines from multiple sources. It focuses exclusively on in-frame KGF events that retain an intact kinase domain and therefore represent potentially active driver kinase targets. To our knowledge, KuNG FU is to date the largest freely accessible, homogeneous and curated database of kinase gene fusions in cell line models.
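The two selection criteria (in-frame event, intact kinase domain) can be expressed as simple coordinate checks. The sketch below is a hypothetical illustration, not KuNG FU's curation procedure (which is manual): it assumes each candidate is described by coding-sequence (CDS) lengths counted from the start codons of the two partners, and it ignores complications such as non-coding junction sequence or alternative transcripts.

```python
from dataclasses import dataclass


@dataclass
class FusionCandidate:
    """Hypothetical record of a kinase gene fusion with the kinase as the 3' partner."""
    five_prime_cds_kept: int   # CDS nucleotides of the 5' partner retained in the fusion
    three_prime_cds_lost: int  # CDS nucleotides of the 3' kinase partner lost upstream of the breakpoint
    kinase_domain_start: int   # 1-based CDS position where the kinase domain begins in the 3' partner


def is_in_frame(f: FusionCandidate) -> bool:
    # The first retained kinase nucleotide sits at codon phase five_prime_cds_kept % 3 in the
    # fused ORF; it must match its original phase three_prime_cds_lost % 3 in the kinase gene.
    return f.five_prime_cds_kept % 3 == f.three_prime_cds_lost % 3


def keeps_kinase_domain(f: FusionCandidate) -> bool:
    # The domain is intact if the retained kinase sequence starts at or before the domain start.
    return f.three_prime_cds_lost < f.kinase_domain_start


def passes_filter(f: FusionCandidate) -> bool:
    return is_in_frame(f) and keeps_kinase_domain(f)
```

With made-up coordinates, `FusionCandidate(300, 3171, 3348)` passes both checks, while shifting the breakpoint by one nucleotide (`3172`) breaks the reading frame.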
Subjects
Databases, Genetic , Gene Fusion , Neoplasms/genetics , Protein Kinases/genetics , Cell Line, Tumor , Data Curation , Data Mining , Datasets as Topic , Humans
ABSTRACT
Chordomas are rare, slow-growing tumors with a high unmet medical need, arising in the axial skeleton from notochord remnants. The transcription factor "brachyury" is a distinctive molecular marker and a key oncogenic driver of chordomas. Receptor tyrosine kinases are also expressed, but so far kinase inhibitors have not shown clear clinical efficacy in chordoma patients. The need for effective therapies is extremely high, but the paucity of established chordoma cell lines has limited preclinical research. Here we describe the isolation of the new Chor-IN-1 cell line from a recurrent sacral chordoma and its characterization in comparison with other chordoma cell lines. Chor-IN-1 is genomically identical to the tumor of origin and shows morphological features, growth characteristics and chromosomal abnormalities typical of chordoma, with expression of brachyury and other relevant biomarkers. Chor-IN-1 gene variants, copy number alterations and kinome gene expression were analyzed in comparison with four other chordoma cell lines, generating large-scale DNA and mRNA genomic data that can be exploited to identify novel pharmacological targets and candidate predictive biomarkers of drug sensitivity in chordoma. The establishment of this new, well-characterized chordoma cell line provides a useful tool for identifying drugs active in chordoma.
Subjects
Chordoma/genetics , Genomics , Biopsy , Cell Line, Tumor , Chordoma/metabolism , Chordoma/pathology , Chromosome Aberrations , DNA Copy Number Variations , Gene Expression Regulation, Neoplastic , Genomics/methods , Humans , Immunohistochemistry , Karyotype , Male , Middle Aged
ABSTRACT
Analyzing and optimizing biological models is often identified as a research priority in biomedical engineering. An important feature of a model should be the ability to find the best conditions under which an organism should be grown in order to reach specific optimal output values chosen by the researcher. In this work, we consider a mitochondrial model analyzed with flux balance analysis. The optimal design and assessment of these models are achieved through single- and/or multi-objective optimization techniques driven by epsilon-dominance and identifiability analysis. Our optimization algorithm searches for the flux rate values that simultaneously optimize multiple cellular functions. The optimization of the metabolic network fluxes includes not only input fluxes but also internal fluxes. A relaxed Pareto dominance allows faster convergence with robust candidate solutions, regulating the granularity of the approximation of the desired Pareto front. We find that maximum ATP production is linked to a total consumption of NADH, and that reaching the maximum amount of NADH leads to an increasing demand for NADH from the external environment. Furthermore, the identifiability analysis characterizes the type and the stage of three monogenic diseases. Finally, we propose a new methodology to extend any constraint-based model using protein abundances.
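The way a relaxed (epsilon) dominance regulates the granularity of the Pareto-front approximation can be seen in a box-based epsilon-archive, one common form of the idea. This is a minimal sketch, assuming objectives to be maximized and user-chosen box sizes `eps`; the authors' exact dominance relation and archiving rules may differ. Each objective vector is mapped to a grid box of size `eps`, at most one solution is kept per box, and dominated boxes are discarded, so larger boxes give a coarser front with fewer representatives.

```python
import math


def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """Pareto dominance for maximization (also applied to integer box indices)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def box(point: tuple[float, ...], eps: tuple[float, ...]) -> tuple[int, ...]:
    """Index of the epsilon-box containing a point."""
    return tuple(math.floor(v / e) for v, e in zip(point, eps))


def epsilon_archive(points: list[tuple[float, ...]],
                    eps: tuple[float, ...]) -> list[tuple[float, ...]]:
    """Approximate Pareto front keeping at most one point per non-dominated epsilon-box."""
    best: dict[tuple[int, ...], tuple[float, ...]] = {}
    for p in points:
        b = box(p, eps)
        # Simplified within-box rule: a later point replaces the stored one only if it dominates it.
        if b not in best or dominates(p, best[b]):
            best[b] = p
    return sorted(
        p for b, p in best.items()
        if not any(dominates(other, b) for other in best if other != b)
    )
```

With the hypothetical points `[(1.0, 9.0), (1.05, 9.02), (2.0, 7.0), (3.0, 3.0), (2.5, 2.0)]`, box size 0.5 keeps three representatives, whereas box size 5.0 collapses the front to the single point `(1.05, 9.02)`.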
Subjects
Metabolic Flux Analysis , Mitochondria/metabolism , Adenosine Triphosphate/biosynthesis , Algorithms , Ketoglutarate Dehydrogenase Complex/deficiency , Mitochondrial Proteins/metabolism , Models, Biological , NAD/metabolism , Succinate Dehydrogenase/genetics
ABSTRACT
Recent advances in synthetic biology call for robust, flexible and efficient in silico optimization methodologies. We present a Pareto design approach for the bi-level optimization problem associated with the overproduction of specific metabolites in Escherichia coli. Our method efficiently explores the high-dimensional genetic manipulation space, finding a number of trade-offs between synthetic and biological objectives, thereby providing deeper biological insight into the problem and important results for industrial purposes. We demonstrate the computational capabilities of our Pareto-oriented approach by comparing it with state-of-the-art heuristics on the overproduction problems of i) 1,4-butanediol, ii) myristoyl-CoA, iii) malonyl-CoA, iv) acetate and v) succinate. We show that our algorithms gracefully adapt and scale to more complex models and to more biologically relevant simulations of the allowed genetic manipulations. The results obtained for 1,4-butanediol overproduction significantly outperform those previously obtained, in terms of both the 1,4-butanediol to biomass formation ratio and knockout cost. In particular, overproduction increases by 662.7%, from 1.425 mmol h⁻¹ gDW⁻¹ (wild type) to 10.869 mmol h⁻¹ gDW⁻¹, with a knockout cost of 6. Moreover, the Pareto-optimal designs we found in the fatty acid optimizations strictly dominate those obtained by the other methodologies, e.g., with biomass and myristoyl-CoA exportation improvements of +21.43% (0.17 h⁻¹) and +5.19% (1.62 mmol h⁻¹ gDW⁻¹), respectively. Furthermore, the CPU time required by our heuristic approach is more than halved. Finally, we implement pathway-oriented sensitivity analysis, epsilon-dominance analysis and robustness analysis to deepen our biological understanding of the problem and to improve the capabilities of the optimization algorithm.
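The +662.7% figure follows directly from the two 1,4-butanediol fluxes quoted above; a short arithmetic check (the helper name is ours):

```python
def percent_increase(baseline: float, value: float) -> float:
    """Relative change from baseline to value, in percent."""
    return (value - baseline) / baseline * 100.0


# 1,4-butanediol flux (mmol h^-1 gDW^-1): wild type vs. best design reported above.
print(round(percent_increase(1.425, 10.869), 1))  # 662.7
```

Equivalently, the best design produces about 7.6 times the wild-type flux (10.869 / 1.425).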
Subjects
Escherichia coli/metabolism , Models, Biological , Acetates/metabolism , Acyl Coenzyme A/metabolism , Butylene Glycols/metabolism , Fatty Acids/metabolism , Malonyl Coenzyme A/metabolism , Succinic Acid/metabolism , Synthetic Biology/methods
ABSTRACT
The bioenergetic activity of mitochondria can be thoroughly investigated using computational methods. In particular, we focus on ATP and NADH, the metabolites that represent energy production in the cell. We develop a computational framework to perform an exhaustive investigation at the level of species, reactions, genes and metabolic pathways. The framework integrates several methods implementing state-of-the-art algorithms for many-objective optimization, sensitivity analysis and identifiability analysis of biological systems. We use this framework to analyze three case studies related to human mitochondria and to the algal metabolism of Chlamydomonas reinhardtii, formally described with algebraic differential equations or flux balance analysis. Integrating the results of our framework applied to interacting organelles would provide a general-purpose method for assessing energy production in a biological network.
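Sensitivity analysis can take many forms; one common local variant computes normalized sensitivity coefficients S_i = (p_i / y) ∂y/∂p_i, the relative change in an output y per relative change in a parameter p_i. The sketch below estimates them by central finite differences. The toy Michaelis-Menten rate used as the model is a hypothetical stand-in for a mitochondrial or algal model, and the framework itself may use different (e.g. global) methods.

```python
from typing import Callable, Sequence


def normalized_sensitivities(model: Callable[[Sequence[float]], float],
                             params: Sequence[float],
                             rel_step: float = 1e-4) -> list[float]:
    """Local normalized sensitivities S_i = (p_i / y) * dy/dp_i via central differences."""
    y0 = model(params)
    coefficients = []
    for i, p in enumerate(params):
        h = rel_step * abs(p) if p != 0 else rel_step
        up, down = list(params), list(params)
        up[i], down[i] = p + h, p - h
        dy_dp = (model(up) - model(down)) / (2 * h)
        coefficients.append(dy_dp * p / y0)
    return coefficients


SUBSTRATE = 2.0  # hypothetical fixed substrate concentration


def rate(params: Sequence[float]) -> float:
    """Toy Michaelis-Menten rate v = Vmax * S / (Km + S), with params = (Vmax, Km)."""
    vmax, km = params
    return vmax * SUBSTRATE / (km + SUBSTRATE)


print(normalized_sensitivities(rate, [10.0, 2.0]))  # close to [1.0, -0.5]
```

For this toy model the exact values are 1 for Vmax and -Km / (Km + S) = -0.5 for Km, which the finite-difference estimates reproduce; parameters with large |S_i| are those whose uncertainty most affects the output.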
Subjects
Energy Metabolism , Metabolic Networks and Pathways , Mitochondria/metabolism , Models, Biological , Algorithms , Chlamydomonas reinhardtii/metabolism
ABSTRACT
In lower and higher eukaryotes, energy is collected or transformed in compartments, the organelles. The rich variety of size, characteristics and density of the organelles makes it difficult to build a general picture. In this paper, we use Pareto-front analysis to investigate the optimization of energy metabolism in mitochondria and chloroplasts. Using the Pareto optimality principle, we compare models of organelle metabolism on the basis of single- and multi-objective optimization, approximation techniques (Bayesian Automatic Relevance Determination), robustness and pathway sensitivity analysis. Finally, we report the first analysis of the metabolic model of the Trichomonas vaginalis hydrogenosome, an organelle found in several protozoan parasites. Our analysis shows the importance of Pareto optimality for such comparisons and for insights into the evolution of metabolism from cytoplasmic to organelle-bound, involving a model order reduction. We report that Pareto fronts provide an asymptotic analysis useful for describing the metabolism of an organism aimed at concurrently maximizing two or more metabolite concentrations.
Subjects
Energy Metabolism/physiology , Models, Biological , Organelles/metabolism , Adenosine Triphosphate/metabolism , Algorithms , Anaerobiosis , Computational Biology , Trichomonas vaginalis
ABSTRACT
In this work, we develop methodologies for analyzing and cross-comparing metabolic models. We investigate three important metabolic networks to discuss the complexity of the biological organization of organisms, modeling and system properties. We analyze these metabolic networks because of their biotechnological and basic-science importance: the photosynthetic carbon metabolism of a generic leaf, the bacterium Rhodobacter sphaeroides and the alga Chlamydomonas reinhardtii. We adopt single- and multi-objective optimization algorithms to maximize the CO₂ uptake rate and the production of metabolites of industrial or ecological interest. We work both at the level of genes (e.g., finding genetic manipulations that increase the production of one or more metabolites) and at the level of enzyme concentrations that improve CO₂ consumption. We find that R. sphaeroides can absorb up to 57.452 mmol h⁻¹ gDW⁻¹ of CO₂, while C. reinhardtii reaches a maximum of 6.7331 mmol h⁻¹ gDW⁻¹. We report that Pareto front analysis proves extremely useful for comparing different organisms and makes it possible to investigate them within the same framework. Through sensitivity and robustness analysis, our framework identifies the most sensitive and fragile components of the biological systems considered, allowing us to compare their models. We use identifiability analysis to detect functional relations among enzymes; we observe that RuBisCO, GAPDH and FBPase belong to the same functional group, as also suggested by the sensitivity analysis.
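Robustness analysis can be illustrated with one simple, commonly used definition: the fraction of random parameter perturbations under which a model output stays within a tolerance of its nominal value. The sketch below assumes uniform multiplicative perturbations of ±10% and a 5% tolerance, both hypothetical; perturbing one parameter at a time (via `which`) gives a crude way to rank the most fragile components. It is not the robustness index used by the authors.

```python
import random
from typing import Callable, Optional, Sequence


def robustness(model: Callable[[Sequence[float]], float],
               params: Sequence[float],
               which: Optional[Sequence[int]] = None,
               spread: float = 0.1,
               tol: float = 0.05,
               trials: int = 1000,
               seed: int = 0) -> float:
    """Fraction of perturbed parameter sets whose output stays within tol (relative) of nominal.

    which: indices of the parameters to perturb (default: all of them).
    """
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    indices = range(len(params)) if which is None else which
    y0 = model(params)
    within = 0
    for _ in range(trials):
        perturbed = list(params)
        for i in indices:
            perturbed[i] *= 1 + rng.uniform(-spread, spread)
        if abs(model(perturbed) - y0) <= tol * abs(y0):
            within += 1
    return within / trials
```

Comparing `robustness(model, params, which=[i])` across indices i ranks components from most fragile (lowest score) to most robust.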