1.
Bioinformatics ; 23(5): 545-54, 2007 Mar 01.
Article in English | MEDLINE | ID: mdl-17237054

ABSTRACT

MOTIVATION: Hidden Markov models (HMMs) and generalized HMMs have been successfully applied to many problems, but the standard Viterbi algorithm for computing the most probable interpretation of an input sequence (known as decoding) requires memory proportional to the length of the sequence, which can be prohibitive. Existing approaches to reducing memory usage either sacrifice optimality or trade increased running time for reduced memory. RESULTS: We developed two novel decoding algorithms, Treeterbi and Parallel Treeterbi, and implemented them in the TWINSCAN/N-SCAN gene-prediction system. The worst-case asymptotic space and time are the same as for standard Viterbi, but in practice, Treeterbi optimally decodes arbitrarily long sequences with generalized HMMs in bounded memory without increasing running time. Parallel Treeterbi uses the same ideas to split optimal decoding across processors, dividing latency to completion by approximately the number of available processors with constant average overhead per processor. Using these algorithms, we were able to optimally decode all human chromosomes with N-SCAN, which increased its accuracy relative to heuristic solutions. We also implemented Treeterbi for Pairagon, our pair-HMM-based cDNA-to-genome aligner. AVAILABILITY: The TWINSCAN/N-SCAN/PAIRAGON open source software package is available from http://genes.cse.wustl.edu.
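
For readers unfamiliar with the memory cost described in this abstract, the following is a minimal sketch of textbook Viterbi decoding for an ordinary HMM. It is not Treeterbi, whose details the abstract does not give, and it is not code from TWINSCAN/N-SCAN. The two-state model, its state names, and all probabilities are made-up assumptions chosen only for illustration. The point to notice is the `backptr` list, which gains one entry per observation and is therefore the part that grows with sequence length.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Textbook Viterbi decoding in log space; returns the best state path."""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backptr = []  # one dict per later position: memory grows with len(obs)
    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this position
            prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            new_score[s] = score[prev] + log_trans[prev][s] + log_emit[s][symbol]
            pointers[s] = prev
        score = new_score
        backptr.append(pointers)
    # Trace back from the best final state
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def logs(table):
    """Convert a dict (or dict of dicts) of probabilities to log space."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical toy model: all names and numbers are illustrative assumptions.
states = ["coding", "noncoding"]
start = {"coding": 0.6, "noncoding": 0.4}
trans = {"coding": {"coding": 0.7, "noncoding": 0.3},
         "noncoding": {"coding": 0.4, "noncoding": 0.6}}
emit = {"coding": {"high": 0.5, "mid": 0.4, "low": 0.1},
        "noncoding": {"high": 0.1, "mid": 0.3, "low": 0.6}}

path = viterbi(["high", "mid", "low"], states,
               logs(start), logs(trans), logs(emit))
print(path)  # ['coding', 'coding', 'noncoding']
```

In this sketch the score table needs only the previous column, so the backpointers are the sole structure whose size depends on the input length.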


Subject(s)
Algorithms; Genes; Genomics/methods; Chromosomes, Human; Computational Biology; DNA, Complementary/chemistry; Humans; Markov Chains; Programming Languages
2.
BMC Bioinformatics ; 4: 50, 2003 Oct 17.
Article in English | MEDLINE | ID: mdl-14565849

ABSTRACT

SUMMARY: Eval is a flexible tool for analyzing the performance of gene annotation systems. It provides summaries and graphical distributions for many descriptive statistics about any set of annotations, regardless of their source. It also compares sets of predictions to standard annotations and to one another. Input is in the standard Gene Transfer Format (GTF). Eval can be run interactively or via the command line, in which case output options include easily parsable tab-delimited files. AVAILABILITY: To obtain the module package with documentation, go to http://genes.cse.wustl.edu/ and follow links for Resources, then Software. Please contact brent@cse.wustl.edu
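
To make the GTF input mentioned in this abstract concrete, here is a small sketch that parses GTF-style lines and counts features per transcript, one example of the kind of descriptive statistic an annotation-analysis tool might report. It is not Eval's code or interface. The function names and the sample records are hypothetical, and the sketch assumes the usual nine tab-separated GTF columns with `gene_id` and `transcript_id` attributes.

```python
from collections import Counter

def parse_gtf_line(line):
    """Split one GTF line into its nine columns and an attribute dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    attributes = {}
    for item in attrs.split(";"):
        item = item.strip()
        if not item:
            continue
        # Each attribute looks like: key "value"
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return {"seqname": seqname, "source": source, "feature": feature,
            "start": int(start), "end": int(end), "strand": strand,
            "attributes": attributes}

def features_per_transcript(lines, feature="CDS"):
    """Count records of the given feature type for each transcript_id."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        record = parse_gtf_line(line)
        if record["feature"] == feature:
            counts[record["attributes"]["transcript_id"]] += 1
    return counts

# Hypothetical sample records, invented for this illustration.
sample = [
    'chr1\tdemo\tCDS\t100\t200\t.\t+\t0\tgene_id "g1"; transcript_id "g1.t1";',
    'chr1\tdemo\tCDS\t300\t450\t.\t+\t1\tgene_id "g1"; transcript_id "g1.t1";',
    'chr1\tdemo\tCDS\t900\t990\t.\t-\t0\tgene_id "g2"; transcript_id "g2.t1";',
]
counts = features_per_transcript(sample)
print(dict(counts))  # {'g1.t1': 2, 'g2.t1': 1}
```

The feature type is a parameter because different annotation sources label coding segments differently; `CDS` is only the default assumed here.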


Subject(s)
Genome; Software/classification; Computational Biology/standards; Computational Biology/statistics & numerical data; Computer Graphics/standards; Computer Graphics/statistics & numerical data
3.
Genome Res ; 13(1): 46-54, 2003 Jan.
Article in English | MEDLINE | ID: mdl-12529305

ABSTRACT

The availability of draft sequences for both the mouse and human genomes makes it possible, for the first time, to annotate whole mammalian genomes using comparative methods. TWINSCAN is a gene-prediction system that combines the methods of single-genome predictors like GENSCAN with information derived from genome comparison, thereby improving accuracy. Because TWINSCAN uses genomic sequence only, it is less biased toward highly and/or ubiquitously expressed genes than GENEWISE, GENOMESCAN, and other methods based on evidence derived from transcripts. We show that TWINSCAN improves gene prediction in human using intermediate products from various stages of the sequencing and analysis of the mouse genome, from low-redundancy, whole-genome shotgun reads to the draft assembly and the synteny map. TWINSCAN improves on the prior state of the art even when alignments from only 1X coverage of the mouse genome are available. Gene prediction accuracy improves steadily from 1X through 3X, more slowly from 3X to 4X, and relatively little thereafter. The assembly and the synteny map greatly speed the computations, however. Our human annotation using the mouse assembly is conservative, predicting only 25,622 genes, and appears to be one of the best de novo annotations of the human genome to date.


Subject(s)
Chromosome Mapping/methods; Forecasting/methods; Genes/genetics; Genome, Human; Genome; Synteny/genetics; Animals; Base Composition/genetics; Computational Biology/methods; Computational Biology/standards; Exons/genetics; GC Rich Sequence/genetics; Humans; Mice; Predictive Value of Tests
4.
Proc Natl Acad Sci U S A ; 100(3): 1140-5, 2003 Feb 04.
Article in English | MEDLINE | ID: mdl-12552088

ABSTRACT

A primary motivation for sequencing the mouse genome was to accelerate the discovery of mammalian genes by using sequence conservation between mouse and human to identify coding exons. Achieving this goal proved challenging because of the large proportion of the mouse and human genomes that is apparently conserved but does not appear to code for protein. We developed a two-stage procedure that exploits the mouse and human genome sequences to produce a set of genes with a much higher rate of experimental verification than previously reported prediction methods. RT-PCR amplification and direct sequencing applied to an initial sample of mouse predictions that do not overlap previously known genes verified the regions flanking one intron in 139 predictions, with verification rates reaching 76%. On average, the confirmed predictions show more restricted expression patterns than the mouse orthologs of known human genes, and two-thirds lack homologs in fish genomes, demonstrating the sensitivity of this dual-genome approach to hard-to-find genes. We verified 112 previously unknown homologs of known proteins, including two homeobox proteins relevant to developmental biology, an aquaporin, and a homolog of dystrophin. We estimate that transcription and splicing can be verified for >1,000 gene predictions identified by this method that do not overlap known genes. This is likely to constitute a significant fraction of the previously unknown, multiexon mammalian genes.


Subject(s)
Genome, Human; Genome; Amino Acid Sequence; Animals; Exons; Genetic Techniques; Humans; Introns; Mice; Molecular Sequence Data; Reverse Transcriptase Polymerase Chain Reaction; Sequence Analysis, DNA; Sequence Homology, Amino Acid; Tissue Distribution