Results 1 - 7 of 7
1.
Bioinformatics ; 37(23): 4307-4313, 2021 Dec 07.
Article in English | MEDLINE | ID: mdl-34255826

ABSTRACT

MOTIVATION: Accurate assembly of RNA-seq data is a crucial step in many analytic tasks such as gene annotation or expression studies. Despite ongoing research, traditional single-sample assembly has seen no major breakthrough. Multi-sample RNA-seq experiments provide more information than single-sample datasets and thus constitute a promising area of research. Yet this advantage is challenging to utilize due to the large number of accumulating errors. RESULTS: We present an extension to Ryuto enabling the reconstruction of consensus transcriptomes from multiple RNA-seq datasets, incorporating consensus calling at low-level features. We report stable improvements with as few as three replicates. Ryuto outperforms competing approaches, providing a better and user-adjustable sensitivity-precision trade-off. Ryuto's unique ability to utilize an (incomplete) reference for multi-sample assemblies greatly increases precision. We demonstrate benefits for differential expression analysis. Ryuto consistently improves assembly on replicates of the same tissue independent of filter settings, even when mixing conditions or time series. Consensus voting in Ryuto is especially effective at high-precision assembly, while Ryuto's conventional mode can reach higher recall. AVAILABILITY AND IMPLEMENTATION: Ryuto is available at https://github.com/studla/RYUTO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , Transcriptome , Sequence Analysis, RNA , RNA-Seq , Molecular Sequence Annotation
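
The consensus voting mentioned in the abstract above can be illustrated with a toy example. The abstract does not say which low-level features Ryuto votes on or how the vote is taken, so this is only a sketch under stated assumptions: features are taken to be splice junctions given as (start, end) pairs, and a junction is kept when it appears in at least `min_support` samples. The function name, the threshold parameter and the coordinates are hypothetical and not part of Ryuto.

```python
from collections import Counter


def consensus_junctions(samples, min_support):
    """Return junctions seen in at least `min_support` samples (assumed voting rule)."""
    counts = Counter()
    for junctions in samples:
        # Count each junction at most once per sample.
        counts.update(set(junctions))
    return sorted(j for j, c in counts.items() if c >= min_support)


# Three hypothetical replicates; coordinates are made up for illustration.
replicates = [
    {(100, 200), (300, 400)},
    {(100, 200), (300, 400), (500, 600)},
    {(100, 200), (350, 400)},
]
print(consensus_junctions(replicates, min_support=2))  # [(100, 200), (300, 400)]
```

Raising `min_support` trades recall for precision, which mirrors the user-adjustable sensitivity-precision trade-off the abstract describes, although the actual mechanism in Ryuto may differ.
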
2.
BMC Bioinformatics ; 20(1): 190, 2019 Apr 16.
Article in English | MEDLINE | ID: mdl-30991937

ABSTRACT

BACKGROUND: The rapid increase in high-throughput sequencing of RNA (RNA-seq) has led to tremendous improvements in the detection and reconstruction of both expressed coding and non-coding RNA transcripts. Yet the complete and accurate annotation of complex transcriptional output, and not only that of the human genome, has remained elusive. One of the critical bottlenecks in this endeavor is the computational reconstruction of transcript structures, due to high noise levels, technological limits, and other biases in the raw data. RESULTS: We introduce several new and improved algorithms in a novel workflow for transcript assembly and quantification. We propose an extension of the common splice graph framework that combines aspects of overlap and bin graphs and makes it possible to efficiently use both multi-splice and paired-end information to the fullest extent. Phasing information of reads is used to further resolve loci. The decomposition of read coverage patterns is modeled as a minimum-cost flow problem to account for the unavoidable non-uniformities of RNA-seq data. CONCLUSION: The performance of the resulting tool, Ryuto, compares favorably with state-of-the-art methods on both simulated and real-life datasets. Ryuto calls 1-4% more true transcripts while calling 5-35% fewer false predictions than the next best competitor.


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , RNA, Messenger , Sequence Analysis, RNA/methods , Transcriptome/genetics , Algorithms , Gene Expression Profiling/methods , Humans , RNA, Messenger/analysis , RNA, Messenger/genetics
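
The abstract above models coverage decomposition as a minimum-cost flow problem but does not state how capacities and costs are derived from read coverage. The sketch below therefore shows only the generic technique: a small successive-shortest-path solver (Bellman-Ford on the residual graph) run on a toy four-node graph. The graph, capacities and costs are invented for illustration and are not Ryuto's formulation.

```python
INF = float("inf")


def min_cost_flow(n, edges, source, sink, demand):
    """Send up to `demand` units from source to sink at minimum total cost.

    edges: list of (u, v, capacity, cost). Returns (flow_sent, total_cost).
    """
    graph = [[] for _ in range(n)]  # node -> indices of outgoing residual edges
    to, cap, cost = [], [], []
    for u, v, c, w in edges:
        # Forward edge at an even index, its reverse edge at the next odd index.
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)

    sent = total = 0
    while sent < demand:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [INF] * n
        dist[source] = 0
        prev_edge = [-1] * n
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        prev_edge[to[e]] = e
                        changed = True
            if not changed:
                break
        if dist[sink] == INF:
            break  # no augmenting path left

        # Find the bottleneck along the path, then push flow along it.
        push = demand - sent
        v = sink
        while v != source:
            e = prev_edge[v]
            push = min(push, cap[e])
            v = to[e ^ 1]  # tail of edge e is the head of its paired edge
        v = sink
        while v != source:
            e = prev_edge[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        sent += push
        total += push * dist[sink]
    return sent, total


# Toy graph: node 0 is the source, node 3 the sink (all values hypothetical).
toy_edges = [(0, 1, 2, 1), (0, 2, 1, 4), (1, 2, 1, 1), (1, 3, 1, 3), (2, 3, 2, 1)]
print(min_cost_flow(4, toy_edges, 0, 3, 2))  # (2, 7)
```

In a transcript assembler the nodes would correspond to parts of the splice graph and the costs would penalize deviation from observed coverage; how exactly that is set up is described in the paper, not in this abstract.
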
3.
J Math Biol ; 77(2): 313-341, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29260295

ABSTRACT

Clusters of paralogous genes such as the famous HOX cluster of developmental transcription factors tend to evolve by stepwise duplication of their members, often involving unequal crossing over. Gene conversion and possibly other mechanisms of concerted evolution further obfuscate the phylogenetic relationships. As a consequence, it is very difficult or even impossible to disentangle the detailed history of gene duplications in gene clusters. In this contribution we show that the expansion of gene clusters by unequal crossing over as proposed by Walter Gehring leads to distinctive patterns of genetic distances, namely a subclass of circular split systems. Furthermore, when the gene cluster was left undisturbed by genome rearrangements, the shortest Hamiltonian paths with respect to genetic distances coincide with the genomic order. This observation can be used to detect ancient genomic rearrangements of gene clusters and to distinguish gene clusters whose evolution was dominated by unequal crossing over within genes from those that expanded through other mechanisms.


Subject(s)
Models, Genetic , Multigene Family , Alcohol Dehydrogenase/genetics , Algorithms , Animals , Computer Simulation , Crossing Over, Genetic , Evolution, Molecular , Gene Duplication , Genes, Homeobox , Genome , Humans , Mathematical Concepts , Phylogeny , Recombination, Genetic
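
The observation in the abstract above, that the shortest Hamiltonian path with respect to genetic distances recovers the genomic order, can be tried out on a toy cluster. The sketch below is a brute-force search over all gene orderings, which is practical only for small clusters. The distance matrix is an assumption: it is simply the absolute difference of hypothetical positions along the chromosome, not a distance produced by the unequal crossing over model of the paper.

```python
from itertools import permutations


def shortest_hamiltonian_path(dist):
    """Return (order, length) of a shortest path that visits every gene once."""
    n = len(dist)
    best_len, best_order = float("inf"), None
    for order in permutations(range(n)):
        if order[0] > order[-1]:
            continue  # a path and its reverse have the same length; keep one
        length = sum(dist[a][b] for a, b in zip(order, order[1:]))
        if length < best_len:
            best_len, best_order = length, order
    return best_order, best_len


# Hypothetical cluster: gene i sits at position positions[i] on the chromosome.
positions = [2, 0, 3, 1]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
print(shortest_hamiltonian_path(toy_dist))  # ((1, 3, 0, 2), 3)
```

The recovered order lists the genes by increasing position, matching the genomic order. A mismatch between such a path and the known gene order is the kind of signal the abstract proposes for detecting ancient rearrangements.
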
4.
Nucleic Acids Res ; 44(D1): D38-47, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26538599

ABSTRACT

Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR, the European infrastructure for biological information, that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.


Subject(s)
Computational Biology , Registries , Data Curation , Software
5.
Algorithms Mol Biol ; 16(1): 8, 2021 Jun 01.
Article in English | MEDLINE | ID: mdl-34074310

ABSTRACT

BACKGROUND: Advances in genome sequencing over the last years have led to a fundamental paradigm shift in the field. With steadily decreasing sequencing costs, genome projects are no longer limited by the cost of raw sequencing data, but rather by computational problems associated with genome assembly. There is an urgent demand for more efficient and more accurate methods, in particular with regard to the highly complex and often very large genomes of animals and plants. Most recently, "hybrid" methods that integrate short and long read data have been devised to address this need. RESULTS: LazyB is such a hybrid genome assembler. It has been designed specifically with an emphasis on utilizing low-coverage short and long reads. LazyB starts from a bipartite overlap graph between long reads and restrictively filtered short-read unitigs. This graph is translated into a long-read overlap graph G. Instead of the more conventional approach of removing tips, bubbles, and other local features, LazyB extracts, step by step, subgraphs whose global properties approach a disjoint union of paths. First, a consistently oriented subgraph is extracted, which in a second step is reduced to a directed acyclic graph. In the next step, properties of proper interval graphs are used to extract contigs as maximum weight paths. These paths are translated into genomic sequences only in the final step. A prototype implementation of LazyB, written entirely in Python, not only yields significantly more accurate assemblies of the yeast and fruit fly genomes compared to state-of-the-art pipelines but also requires much less computational effort. CONCLUSIONS: LazyB is a new low-cost genome assembler that copes well with large genomes and low coverage. It is based on a novel approach for reducing the overlap graph to a collection of paths, thus opening new avenues for future improvements. AVAILABILITY: The LazyB prototype is available at https://github.com/TGatter/LazyB.
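
One step named in the abstract above, extracting a contig as a maximum weight path once the overlap graph has been reduced to a directed acyclic graph, can be sketched with the textbook dynamic program over a topological order. This is not taken from the LazyB code base and ignores the proper interval graph properties the abstract mentions. The read names and overlap weights are hypothetical, and edge weights are assumed to be non-negative.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def max_weight_path(edges):
    """edges: {(u, v): weight} describing a DAG. Returns (weight, path)."""
    preds = {}  # node -> set of predecessor nodes
    for u, v in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)

    best = {}  # node -> (weight of the best path ending here, previous node)
    for node in TopologicalSorter(preds).static_order():
        best[node] = (0, None)
        for p in preds[node]:
            candidate = best[p][0] + edges[(p, node)]
            if candidate > best[node][0]:
                best[node] = (candidate, p)

    # Walk back from the node where the heaviest path ends.
    end = max(best, key=lambda node: best[node][0])
    weight = best[end][0]
    path = []
    while end is not None:
        path.append(end)
        end = best[end][1]
    return weight, path[::-1]


# Hypothetical long-read overlap DAG; weights stand in for overlap support.
overlaps = {("r1", "r2"): 5, ("r2", "r3"): 4, ("r1", "r3"): 7, ("r3", "r4"): 2}
print(max_weight_path(overlaps))  # (11, ['r1', 'r2', 'r3', 'r4'])
```

The path through r2 wins over the direct r1 to r3 edge because the summed weight is larger, which is the behaviour one would want when a well-supported chain of overlaps competes with a single shortcut.
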

6.
Life (Basel) ; 11(12), 2021 Dec 10.
Article in English | MEDLINE | ID: mdl-34947908

ABSTRACT

Tunicates are the sister group of vertebrates and thus occupy a key position for investigations into vertebrate innovations as well as into the consequences of the vertebrate-specific genome duplications. Nevertheless, tunicate genomes have not been studied extensively in the past, and comparative studies of tunicate genomes have remained scarce. The carpet sea squirt Didemnum vexillum, commonly known as "sea vomit", is a colonial tunicate considered an invasive species with substantial ecological and economic risk. We report the assembly of the D. vexillum genome using a hybrid approach that combines 28.5 Gb of Illumina and 12.35 Gb of PacBio data. The new hybrid scaffolded assembly has a total size of 517.55 Mb and increases contig length about eightfold compared to the previous, Illumina-only assembly. As a consequence of an unusually high genetic diversity of the colonies and the moderate length of the PacBio reads, presumably caused by the unusually acidic milieu of the tunic, the assembly is highly fragmented (L50 = 25,284, N50 = 6539). It is sufficient, however, for comprehensive annotations of both protein-coding genes and non-coding RNAs. Despite its shortcomings, the draft assembly of the "sea vomit" genome provides a valuable resource for comparative tunicate genomics and for the study of the specific properties of colonial ascidians.
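
The fragmentation statistics quoted in the abstract above (L50 and N50) are usually computed from the list of contig or scaffold lengths. The sketch below uses the common definition, which the abstract is assumed to follow: sort lengths from longest to shortest and accumulate until half of the total assembly size is covered. N50 is the length of the sequence at which that happens and L50 is the number of sequences needed. The example lengths are made up.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no contigs given")


# Hypothetical contig lengths, total 290; half of the total is 145.
print(n50_l50([80, 70, 50, 40, 30, 20]))  # (70, 2)
```

A high L50 combined with a low N50, as reported for the D. vexillum assembly, means that many short sequences are needed to cover half of the genome.
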

7.
Mol Biotechnol ; 52(2): 123-8, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22161280

ABSTRACT

The synthesis of complete genes is becoming an increasingly popular approach in heterologous gene expression. Reasons for this are the decreasing prices and the numerous advantages in comparison to classic molecular cloning methods. Two of these advantages are the possibility to adapt the codon usage to the host organism and the option to introduce restriction enzyme target sites of choice. C.U.R.R.F. (Codon Usage regarding Restriction Finder) is a free Java®-based software program which is able to detect possible restriction sites in both coding and non-coding DNA sequences by introducing multiple silent or non-silent mutations, respectively. The deviation of an alternative sequence containing a desired restriction motif from the sequence with the optimal codon usage is considered during the search for potential restriction sites in coding DNA and mRNA sequences as well as protein sequences. C.U.R.R.F. is available at http://www.zvm.tu-dresden.de/die_tu_dresden/fakultaeten/fakultaet_mathematik_und_naturwissenschaften/fachrichtung_biologie/mikrobiologie/allgemeine_mikrobiologie/currf.


Subject(s)
Codon , DNA Restriction Enzymes/chemistry , DNA/chemistry , Software , Algorithms , Base Sequence/genetics , Computational Biology/methods , DNA/metabolism , DNA Restriction Enzymes/metabolism , User-Computer Interface
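
The idea in the abstract above of introducing a restriction site through silent mutations can be sketched as follows. This is not the C.U.R.R.F. implementation (which is Java-based) and it leaves out the codon usage scoring the abstract describes. It is a simplified illustration that writes the motif into every position of a coding sequence and keeps the positions where the translated protein stays the same. It assumes the standard genetic code, an in-frame sequence, and changes only inside the motif window. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame DNA sequence; trailing incomplete codons are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def silent_sites(cds, motif):
    """List (position, mutated_sequence, n_changes) where motif fits silently."""
    protein = translate(cds)
    hits = []
    for i in range(len(cds) - len(motif) + 1):
        mutated = cds[:i] + motif + cds[i + len(motif):]
        changes = sum(a != b for a, b in zip(cds, mutated))
        # Keep only real edits that leave the protein unchanged.
        if changes and translate(mutated) == protein:
            hits.append((i, mutated, changes))
    return hits


# Hypothetical coding sequence (Met-Gly-Ser-stop) and the BamHI motif GGATCC.
print(silent_sites("ATGGGTTCCTAA", "GGATCC"))  # [(3, 'ATGGGATCCTAA', 1)]
```

Here a single T to A change turns the glycine codon GGT into the synonymous GGA and creates the site. A fuller search would also vary the bases just outside the window and rank candidates by how far they move from the preferred codon usage.
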