ABSTRACT
MOTIVATION: Fusion genes are both useful cancer biomarkers and important drug targets. Finding relevant fusion genes is challenging because genomic instability produces a high number of passenger events. To reveal and prioritize relevant gene fusion events, we have developed the FUsionN Gene Identification toolset (FUNGI), which uses an ensemble of fusion detection algorithms together with prioritization and visualization modules. RESULTS: We applied FUNGI to an ovarian cancer dataset of 107 tumor samples from 36 patients. Ten out of 11 detected and prioritized fusion genes were validated. Many of the detected fusion genes affect the PI3K-AKT pathway, with a potential role in treatment resistance. AVAILABILITY AND IMPLEMENTATION: FUNGI and its documentation are available at https://bitbucket.org/alejandra_cervera/fungi as a standalone tool or from Anduril at https://www.anduril.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
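As a rough illustration of ensemble-based prioritization, the sketch below counts how many detection tools report each fusion and ranks fusions by that support. The tool names, the input format and the `min_support` threshold are hypothetical and are not taken from FUNGI itself.

```python
from collections import defaultdict


def rank_fusions(calls_by_tool, min_support=2):
    """Rank gene fusions by the number of tools that report them.

    calls_by_tool maps a (hypothetical) tool name to an iterable of
    (five_prime_gene, three_prime_gene) pairs. Fusions reported by fewer
    than min_support tools are dropped.
    """
    support = defaultdict(set)
    for tool, fusions in calls_by_tool.items():
        for fusion in fusions:
            support[fusion].add(tool)
    ranked = [(fusion, len(tools)) for fusion, tools in support.items()
              if len(tools) >= min_support]
    # Highest support first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy calls from three placeholder tools.
calls = {
    "tool_a": [("BCR", "ABL1"), ("GENE1", "GENE2")],
    "tool_b": [("BCR", "ABL1")],
    "tool_c": [("BCR", "ABL1"), ("GENE1", "GENE2"), ("GENE3", "GENE4")],
}
ranked = rank_fusions(calls)
print(ranked)
```

A simple vote count like this only captures agreement between tools; any real prioritization would add further evidence such as read support or annotation.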
ABSTRACT
BACKGROUND: The latest next-generation sequencing technologies have opened a new era of genomic studies, providing novel insights into multifactorial pathologies such as cancer. In particular, the detection and understanding of gene fusions have been greatly enhanced by these methods. However, gene fusion identification remains challenging for state-of-the-art algorithms: they report huge numbers of poorly overlapping candidates, and considering all reported fusions for in-lab validation would clearly overwhelm wet-lab capabilities. RESULTS: In this work we propose a novel methodological approach and tool named FuGePrior for the prioritization of gene fusions from paired-end RNA-Seq data. The proposed pipeline combines state-of-the-art tools for chimeric transcript discovery and prioritization, a series of filtering and processing steps designed in light of recent literature on gene fusions, and an analysis of the functional reliability of the gene fusion structure. CONCLUSIONS: FuGePrior performance was assessed on two publicly available paired-end RNA-Seq datasets: the first, by Edgren and colleagues, includes four breast cancer cell lines and a normal breast sample, whereas the second, by Ren and colleagues, comprises fourteen primary prostate cancer samples and their paired normal counterparts. FuGePrior reduced the number of fusions output by the chimeric transcript discovery tools by 65 to 75%, depending on the breast cancer cell line, and by 37 to 65%, depending on the prostate cancer sample. Furthermore, since both datasets come with a partial validation, we were able to assess the performance of FuGePrior in correctly prioritizing real gene fusions. Specifically, 25 out of 26 validated fusions in the breast cancer dataset were correctly labelled as reliable and biologically significant. Similarly, 2 out of 5 validated fusions in the prostate dataset were recognized as priority by FuGePrior.
Subjects
Breast Neoplasms/genetics; Prostatic Neoplasms/genetics; Recombinant Fusion Proteins/genetics; Sequence Analysis, RNA; Algorithms; Cell Line, Tumor; Databases, Genetic; Female; Genomics; Humans; MCF-7 Cells; Male; Recombinant Fusion Proteins/chemistry; Reproducibility of Results; Software
ABSTRACT
BACKGROUND: Massively parallel sequencing of transcriptomes has revealed the presence of many miRNAs and miRNA variants, named isomiRs, with a potential role in several cellular processes through their interaction with a target mRNA. Many methods and tools have recently been devised to detect and quantify miRNAs from sequencing data. However, all of them are implemented on top of general-purpose alignment methods, and thus provide poorly accurate results and no information concerning isomiRs and conserved miRNA-mRNA interaction sites. RESULTS: To overcome these limitations we present a novel algorithm named isomiR-SEA, which provides users with very accurate miRNA expression levels and precise classifications of both isomiRs and miRNA-mRNA interaction sites. Tags are mapped on the known miRNA sequences by a specialized alignment algorithm built on biological evidence concerning miRNA structure. Specifically, isomiR-SEA checks for the presence of the miRNA seed in the input tags and evaluates, during all the alignment phases, the positions of the encountered mismatches, which makes it possible to distinguish among the different isomiRs and conserved miRNA-mRNA interaction sites. CONCLUSIONS: The performance of isomiR-SEA was assessed on two public RNA-Seq datasets, showing that the implemented algorithm yields more reliable and accurate miRNA expression levels than those provided by two state-of-the-art tools used for comparison. Moreover, unlike the few methods currently available for isomiR detection, the proposed algorithm evaluates isomiRs and conserved miRNA-mRNA interaction sites already in the first alignment phases, thus avoiding any additional filtering stages potentially responsible for the loss of useful information.
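The seed check and mismatch tracking described above can be sketched in a few lines. This is a minimal illustration, not the isomiR-SEA implementation: it assumes the seed spans miRNA nucleotides 2 to 8, uses an ungapped alignment anchored on the seed, and the function names and toy sequences are hypothetical.

```python
def find_seed(tag, mirna, seed_start=1, seed_end=8):
    """Return the 0-based offset of the miRNA seed in the tag, or None.

    The seed is taken as mirna[seed_start:seed_end], i.e. nucleotides 2-8
    with the default values (an assumed convention).
    """
    seed = mirna[seed_start:seed_end]
    pos = tag.find(seed)
    return pos if pos >= 0 else None


def mismatch_positions(tag, mirna, seed_pos, seed_start=1):
    """List 1-based miRNA positions that mismatch the tag.

    The tag is aligned to the miRNA without gaps, anchored so that the
    seed found at seed_pos in the tag lines up with the miRNA seed.
    Positions outside the overlap are ignored.
    """
    shift = seed_pos - seed_start
    mismatches = []
    for i, base in enumerate(mirna):
        t = i + shift
        if 0 <= t < len(tag) and tag[t] != base:
            mismatches.append(i + 1)
    return mismatches


# Toy example: the tag differs from the miRNA only at position 20.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
tag = mirna[:19] + "A" + mirna[20:]
seed_pos = find_seed(tag, mirna)
print(seed_pos, mismatch_positions(tag, mirna, seed_pos))
```

Recording where mismatches fall, rather than only how many there are, is what allows a classifier to separate, for example, 3' end variants from changes inside the seed.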
Subjects
Algorithms; MicroRNAs/metabolism; Oligonucleotides, Antisense/metabolism; RNA, Messenger/metabolism; Base Sequence; High-Throughput Nucleotide Sequencing; Humans; Internet; MicroRNAs/antagonists & inhibitors; MicroRNAs/genetics; RNA, Messenger/genetics; Sequence Analysis, RNA; Transcriptome; User-Computer Interface
ABSTRACT
MOTIVATION: Next-generation sequencing technology allows the detection of genomic structural variations, novel genes and transcript isoforms from the analysis of high-throughput data. In this work, we propose a new framework, Bellerophontes, for the detection of fusion transcripts from short paired-end reads. It integrates splicing-driven alignment and abundance estimation analysis, producing a more accurate set of reads supporting junction discovery and also taking non-annotated transcripts into account. Bellerophontes selects putative junctions on the basis of a match to an accurate gene fusion model. RESULTS: We report the fusion genes discovered by the proposed framework on experimentally validated biological samples of chronic myelogenous leukemia (CML) and on public NCBI datasets, for which Bellerophontes is able to detect the exact junction sequence. Compared with state-of-the-art approaches, Bellerophontes detects the same experimentally validated fusions; however, it is more selective in the total number of detected fusions and provides a more accurate set of spanning reads supporting the junctions. Finally, we report the fusions involving non-annotated transcripts found in CML samples. AVAILABILITY AND IMPLEMENTATION: The Bellerophontes JAVA/Perl/Bash software implementation is free and available at http://eda.polito.it/bellerophontes/.
Subjects
Gene Fusion; RNA Splicing; Sequence Analysis, RNA/methods; Software; Algorithms; Computational Biology/methods; Humans; Leukemia, Myelogenous, Chronic, BCR-ABL Positive/genetics; RNA/genetics; Sequence Alignment
ABSTRACT
In this article, we present a computational multiscale model for the characterization of subcellular proteins. The model is encoded inside a simulation tool that builds coarse-grained (CG) force fields from atomistic simulations. Equilibrium molecular dynamics (MD) simulations are performed on an all-atom model of the actin filament. Then, using the statistical distribution of the distances between pairs of selected groups of atoms obtained from the MD simulations, the force field is parameterized using the Boltzmann inversion approach. This CG force field is further used to characterize the dynamics of the protein via Brownian dynamics simulations. This combination of methods into a single computational tool flow enables the simulation of actin filaments with lengths up to 400 nm, extending the time and length scales compared to state-of-the-art approaches. Moreover, the proposed multiscale modeling approach makes it possible to investigate the relationship between atomistic structure and changes in the overall dynamics and mechanics of the filament, and it can be easily (i) extended to the characterization of other subcellular structures and (ii) used to investigate the cellular effects of molecular alterations due to pathological conditions.
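The Boltzmann inversion step can be illustrated with a short sketch that turns a sample of pair distances into an effective potential, U(r) = -k_B T ln P(r). This is a simplified sketch rather than the authors' tool: the temperature, the bin width, the histogram-based density estimate and the absence of any Jacobian correction are all assumptions made for illustration, and the distances are synthetic.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_inversion(distances, bin_width, temperature=300.0):
    """Estimate U(r) = -k_B*T*ln P(r) from sampled distances.

    Returns a list of (bin_center, U) pairs for non-empty bins, shifted
    so that the minimum of U is zero.
    """
    counts = {}
    for d in distances:
        idx = int(d // bin_width)
        counts[idx] = counts.get(idx, 0) + 1
    total = len(distances)
    kt = K_B * temperature
    profile = []
    for idx in sorted(counts):
        density = counts[idx] / (total * bin_width)  # normalized histogram
        center = (idx + 0.5) * bin_width
        profile.append((center, -kt * math.log(density)))
    u_min = min(u for _, u in profile)
    return [(r, u - u_min) for r, u in profile]


# Synthetic distances fluctuating around 5.0 (arbitrary length units).
random.seed(0)
samples = [random.gauss(5.0, 0.1) for _ in range(10000)]
profile = boltzmann_inversion(samples, bin_width=0.05)
r_min, u_at_min = min(profile, key=lambda point: point[1])
print(r_min, u_at_min)
```

For a roughly Gaussian distance distribution the resulting profile is close to a harmonic well centered on the mean distance, which is why such profiles are commonly fitted with spring-like CG bonds.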
Subjects
Actin Cytoskeleton/chemistry; Actin Cytoskeleton/metabolism; Biomechanical Phenomena; Elastic Modulus; Molecular Dynamics Simulation
ABSTRACT
Approximately 18% of acute myeloid leukemia (AML) cases express a fusion transcript. However, few fusions are recurrent across AML and the identification of these rare chimeras is of interest to characterize AML patients. Here, we studied the transcriptome of 8 adult AML patients with poorly described chromosomal translocation(s), with the aim of identifying novel and rare fusion transcripts. We integrated RNA-sequencing data with multiple approaches including computational analysis, Sanger sequencing, fluorescence in situ hybridization and in vitro studies to assess the oncogenic potential of the ZEB2-BCL11B chimera. We detected 7 different fusions with partner genes involving transcription factors (OAZ-MAFK, ZEB2-BCL11B), tumor suppressors (SAV1-GYPB, PUF60-TYW1, CNOT2-WT1) and rearrangements associated with the loss of NF1 (CPD-PXT1, UTP6-CRLF3). Notably, ZEB2-BCL11B rearrangements co-occurred with FLT3 mutations and were associated with a poorly differentiated or mixed phenotype leukemia. Although the fusion alone did not transform murine c-Kit+ bone marrow cells, 45.4% of 14q32 non-rearranged AML cases were also BCL11B-positive, suggesting a more general and complex mechanism of leukemogenesis associated with BCL11B expression. Overall, by combining different approaches, we described rare fusion events contributing to the complexity of AML and we linked the expression of some chimeras to genomic alterations hitting known genes in AML.
ABSTRACT
Transcriptional enhancers function as docking platforms for combinations of transcription factors (TFs) to control gene expression. How enhancer sequences determine nucleosome occupancy, TF recruitment and transcriptional activation in vivo remains unclear. Using ATAC-seq across a panel of Drosophila inbred strains, we found that SNPs affecting binding sites of the TF Grainy head (Grh) causally determine the accessibility of epithelial enhancers. We show that deletion and ectopic expression of Grh cause loss and gain of DNA accessibility, respectively. However, although Grh binding is necessary for enhancer accessibility, it is insufficient to activate enhancers. Finally, we show that the human Grh homologs (GRHL1, GRHL2 and GRHL3) function similarly. We conclude that Grh binding is necessary and sufficient for the opening of epithelial enhancers but not for their activation. Our data support a model positing that complex spatiotemporal expression patterns are controlled by regulatory hierarchies in which pioneer factors, such as Grh, establish tissue-specific accessible chromatin landscapes upon which other factors can act.
Subjects
DNA-Binding Proteins/genetics; Drosophila Proteins/genetics; Nucleosomes/genetics; Transcription Factors/genetics; Animals; Animals, Genetically Modified; Binding Sites; Cell Line, Tumor; Chromatin/genetics; Drosophila melanogaster/genetics; Enhancer Elements, Genetic; Epithelial Cells; Gene Expression Regulation, Developmental; Humans; MCF-7 Cells; Polymorphism, Single Nucleotide; Transcriptional Activation
ABSTRACT
In this paper we present VDJSeq-Solver, a methodology and tool to identify clonal lymphocyte populations from paired-end RNA sequencing reads derived from the sequencing of mRNA from neoplastic cells. The tool detects the main clone that characterises the tissue of interest by recognizing the most abundant V(D)J rearrangement among those present in the sample under study. The clone sequence identified by the tool accounts for the modifications introduced by the enzymatic processes. The proposed tool overcomes the limitations of currently available methods for recognizing lymphocyte rearrangements, which work on a single sequence at a time and are not applicable to high-throughput sequencing data. In this work, VDJSeq-Solver has been applied to correctly detect the main clone and identify its sequence on five Mantle Cell Lymphoma samples; the tool has then been tested on twelve Diffuse Large B-Cell Lymphoma samples. In order to comply with the privacy, ethics and intellectual property policies of the University Hospital and the University of Verona, data is available upon request to supporto.utenti@ateneo.univr.it after signing a mandatory Materials Transfer Agreement. The VDJSeq-Solver JAVA/Perl/Bash software implementation is free and available at http://eda.polito.it/VDJSeq-Solver/.
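The idea of picking the main clone as the most abundant V(D)J rearrangement can be sketched as a simple counting step. This is only an illustration under stated assumptions: it presumes each read pair has already been assigned a (V, D, J) gene triple by some upstream step, and the function name and toy assignments are hypothetical rather than part of VDJSeq-Solver.

```python
from collections import Counter


def main_clone(rearrangements):
    """Return the most frequent (V, D, J) triple and its fraction of reads.

    rearrangements is a non-empty list with one (V, D, J) tuple per
    read pair that supports a rearrangement.
    """
    if not rearrangements:
        raise ValueError("no rearrangements supplied")
    counts = Counter(rearrangements)
    clone, n_reads = counts.most_common(1)[0]
    return clone, n_reads / len(rearrangements)


# Toy assignments: one dominant rearrangement and two minor ones.
reads = (
    [("IGHV3-21", "IGHD3-3", "IGHJ6")] * 6
    + [("IGHV4-34", "IGHD2-2", "IGHJ4")] * 3
    + [("IGHV1-69", "IGHD3-10", "IGHJ5")]
)
clone, fraction = main_clone(reads)
print(clone, fraction)
```

The fraction of supporting reads gives a rough sense of how dominant the main clone is relative to the background of other rearrangements in the sample.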
Subjects
Computer Simulation; Genes, Immunoglobulin; Lymphoma, Large B-Cell, Diffuse/diagnosis; Lymphoma, Mantle-Cell/diagnosis; Software; V(D)J Recombination/genetics; Algorithms; Clone Cells; High-Throughput Nucleotide Sequencing/methods; Humans; Lymphoma, Large B-Cell, Diffuse/genetics; Lymphoma, Mantle-Cell/genetics; Polymerase Chain Reaction; Sequence Analysis, RNA/methods
ABSTRACT
In this paper we present a methodology to evaluate the binding free energy of a miRNA:mRNA complex through molecular dynamics (MD)-thermodynamic integration (TI) simulations. We applied our method to the Caenorhabditis elegans let-7 miRNA:lin-41 mRNA complex, a validated miRNA:mRNA interaction, in order to estimate the energetic stability of the structure. To make the miRNA:mRNA simulation possible and realistic, the methodology introduces specific solutions to overcome some of the general challenges of nucleic acid simulations and binding free energy computations that have been widely discussed in previous research reports. The main features of the proposed methodology are: (1) positioning of the restraints imposed on the simulations in order to guarantee complex stability; (2) optimal sampling of the phase space to achieve satisfactory accuracy in the binding energy value; (3) determination of a suitable trade-off between computational costs and accuracy of the binding free energy computation by assessing the scalability characteristics of the parallel simulations required for the TI. The experiments carried out demonstrate that MD simulations are a viable strategy for the study of miRNA binding characteristics, opening the way to the development of new computational target prediction methods based on three-dimensional structure information.
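The core numerical step of thermodynamic integration is the integral of the ensemble-averaged derivative of the potential energy with respect to the coupling parameter, ΔG = ∫ <dU/dλ> dλ over λ from 0 to 1. The sketch below evaluates it with the trapezoidal rule. The λ grid and the averaged values are made-up numbers, the choice of quadrature is an assumption, and the thermodynamic cycle needed to turn such integrals into a binding free energy is not shown.

```python
def ti_free_energy(lambdas, dudl_means):
    """Integrate <dU/dlambda> over lambda with the trapezoidal rule.

    lambdas must be increasing; dudl_means holds the ensemble average
    of dU/dlambda obtained from the simulation run at each lambda value.
    """
    if len(lambdas) != len(dudl_means) or len(lambdas) < 2:
        raise ValueError("need at least two matching lambda windows")
    total = 0.0
    for i in range(1, len(lambdas)):
        width = lambdas[i] - lambdas[i - 1]
        if width <= 0:
            raise ValueError("lambda values must be strictly increasing")
        total += 0.5 * width * (dudl_means[i] + dudl_means[i - 1])
    return total


# Hypothetical averages (kcal/mol) from five lambda windows.
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
dudl_means = [-10.0, -9.0, -8.0, -7.0, -6.0]
delta_g = ti_free_energy(lambdas, dudl_means)
print(delta_g)
```

Each λ window is an independent simulation, so the number of windows trades accuracy of the integral against computational cost, which is the trade-off mentioned in point (3) above.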