Search | BVS Colombia Search Portal

Conrad: gene prediction using conditional random fields.

DeCaprio, David; Vinson, Jade P; Pearson, Matthew D; Montgomery, Philip; Doherty, Matthew; Galagan, James E.

Genome Res ; 17(9): 1389-98, 2007 Sep.

Article in English | MEDLINE | ID: mdl-17690204

ABSTRACT

We present Conrad, the first comparative gene predictor based on semi-Markov conditional random fields (SMCRFs). Unlike the best standalone gene predictors, which are based on generalized hidden Markov models (GHMMs) and trained by maximum likelihood, Conrad is discriminatively trained to maximize annotation accuracy. In addition, unlike the best annotation pipelines, which rely on heuristic and ad hoc decision rules to combine standalone gene predictors with additional information such as ESTs and protein homology, Conrad encodes all sources of information as features and treats all features equally in the training and inference algorithms. Conrad outperforms the best standalone gene predictors in cross-validation and whole chromosome testing on two fungi with vastly different gene structures. The performance improvement arises from the SMCRF's discriminative training methods and their ability to easily incorporate diverse types of information by encoding them as feature functions. On Cryptococcus neoformans, configuring Conrad to reproduce the predictions of a two-species phylo-GHMM closely matches the performance of Twinscan. Enabling discriminative training increases performance, and adding new feature functions further increases performance, achieving a level of accuracy that is unprecedented for this organism. Similar results are obtained on Aspergillus nidulans comparing Conrad versus Fgenesh. SMCRFs are a promising framework for gene prediction because of their highly modular nature, simplifying the process of designing and testing potential indicators of gene structure. Conrad's implementation of SMCRFs advances the state of the art in gene prediction in fungi and provides a robust platform for both current application and future research.

Subject(s)

Algorithms, Aspergillus nidulans/genetics, Cryptococcus neoformans/genetics, Fungal Genes, Software, Artificial Intelligence, Fungal Chromosomes, Discriminant Analysis, Likelihood Functions, Markov Chains, Reference Standards
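
The Conrad abstract above describes scoring gene structures by encoding every information source as a feature function inside a semi-Markov conditional random field. As a rough illustration of that idea only, the following Python sketch decodes the highest-scoring segmentation of a sequence, where each labelled segment is scored by a weighted sum of feature functions. It is not Conrad's implementation: the names `segment_score`, `decode`, `composition`, and `segment_count`, the two labels, and the hand-set weights are all assumptions made for this toy example, and no training step is shown.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for this sketch (not taken from Conrad).
# A feature function scores one candidate segment: (seq, start, end, label).
FeatureFn = Callable[[str, int, int, str], float]
Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def segment_score(seq: str, start: int, end: int, label: str,
                  features: Sequence[FeatureFn],
                  weights: Sequence[float]) -> float:
    """Weighted sum of all feature functions for one labelled segment."""
    return sum(w * f(seq, start, end, label) for f, w in zip(features, weights))


def decode(seq: str, labels: Sequence[str],
           features: Sequence[FeatureFn],
           weights: Sequence[float]) -> List[Segment]:
    """Semi-Markov Viterbi: return the highest-scoring segmentation of seq."""
    n = len(seq)
    neg_inf = float("-inf")
    # best[i][y]: best score for seq[:i] when the last segment has label y.
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    # back[i][y]: (start of last segment, label of the segment before it).
    back = [{y: None for y in labels} for _ in range(n + 1)]
    for end in range(1, n + 1):
        for y in labels:
            for start in range(end):
                local = segment_score(seq, start, end, y, features, weights)
                if start == 0:
                    candidates = [(local, None)]
                else:
                    # Adjacent segments must carry different labels.
                    candidates = [(best[start][p] + local, p)
                                  for p in labels if p != y]
                for score, prev in candidates:
                    if score > best[end][y]:
                        best[end][y] = score
                        back[end][y] = (start, prev)
    # Trace back from the best final label to recover the segments.
    label = max(labels, key=lambda y: best[n][y])
    segments: List[Segment] = []
    end = n
    while end > 0:
        start, prev = back[end][label]
        segments.append((start, end, label))
        end, label = start, prev
    return segments[::-1]


def composition(seq: str, start: int, end: int, label: str) -> float:
    """Toy feature: rewards GC-rich 'coding' and AT-rich 'intergenic' segments."""
    gc = sum(base in "GC" for base in seq[start:end])
    at = (end - start) - gc
    return float(gc - at if label == "coding" else at - gc)


def segment_count(seq: str, start: int, end: int, label: str) -> float:
    """Toy feature: fires once per segment so its weight can penalize splitting."""
    return 1.0


toy_segments = decode("ATATGCGCGCATAT", ["intergenic", "coding"],
                      [composition, segment_count], [1.0, -0.5])
print(toy_segments)
```

Adding a new evidence source in this sketch means appending one more function and one more weight; the decoder itself does not change, which is the modularity the abstract attributes to SMCRFs. In a real system the weights would be learned discriminatively rather than set by hand.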

An initial strategy for the systematic identification of functional elements in the human genome by low-redundancy comparative sequencing.

Margulies, Elliott H; Vinson, Jade P; Miller, Webb; Jaffe, David B; Lindblad-Toh, Kerstin; Chang, Jean L; Green, Eric D; Lander, Eric S; Mullikin, James C; Clamp, Michele.

Proc Natl Acad Sci U S A ; 102(13): 4795-800, 2005 Mar 29.

Article in English | MEDLINE | ID: mdl-15778292

ABSTRACT

With the recent completion of a high-quality sequence of the human genome, the challenge is now to understand the functional elements that it encodes. Comparative genomic analysis offers a powerful approach for finding such elements by identifying sequences that have been highly conserved during evolution. Here, we propose an initial strategy for detecting such regions by generating low-redundancy sequence from a collection of 16 eutherian mammals, beyond the 7 for which genome sequence data are already available. We show that such sequence can be accurately aligned to the human genome and used to identify most of the highly conserved regions. Although not a long-term substitute for generating high-quality genomic sequences from many mammalian species, this strategy represents a practical initial approach for rapidly annotating the most evolutionarily conserved sequences in the human genome, providing a key resource for the systematic study of human genome function.

Subject(s)

Conserved Sequence/genetics, Human Genome, Genomics/methods, Mammals/genetics, DNA Sequence Analysis/methods, Animals, Base Sequence, Computational Biology, Humans, Phylogeny, Sequence Alignment

Assembly of polymorphic genomes: algorithms and application to Ciona savignyi.

Vinson, Jade P; Jaffe, David B; O'Neill, Keith; Karlsson, Elinor K; Stange-Thomann, Nicole; Anderson, Scott; Mesirov, Jill P; Satoh, Nori; Satou, Yutaka; Nusbaum, Chad; Birren, Bruce; Galagan, James E; Lander, Eric S.

Genome Res ; 15(8): 1127-35, 2005 Aug.

Article in English | MEDLINE | ID: mdl-16077012

ABSTRACT

Whole-genome assembly is now used routinely to obtain high-quality draft sequence for the genomes of species with low levels of polymorphism. However, genome assembly remains extremely challenging for highly polymorphic species. The difficulty arises because two divergent haplotypes are sequenced together, making it difficult to distinguish alleles at the same locus from paralogs at different loci. We present here a method for assembling highly polymorphic diploid genomes that involves assembling the two haplotypes separately and then merging them to obtain a reference sequence. Our method was developed to assemble the genome of the sea squirt Ciona savignyi, which was sequenced to a depth of 12.7x from a single wild individual. By comparing finished clones of the two haplotypes we determined that the sequenced individual had an extremely high heterozygosity rate, averaging 4.6% with significant regional variation and rearrangements at all physical scales. Applied to these data, our method produced a reference assembly covering 157 Mb, with N50 contig and scaffold sizes of 47 kb and 989 kb, respectively. Alignment of ESTs indicates that 88% of loci are present at least once and 81% exactly once in the reference assembly. Our method represented loci in a single copy more reliably and achieved greater contiguity than a conventional whole-genome assembly method.

Subject(s)

Algorithms, Genome, Urochordata/genetics, Animals, Base Sequence, Molecular Cloning/methods, Diploidy, Expressed Sequence Tags, Haplotypes/genetics, Heterozygote, Molecular Sequence Data, Polymerase Chain Reaction/methods, Reproducibility of Results
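
The Ciona savignyi abstract above reports N50 contig and scaffold sizes. For readers unfamiliar with the metric, here is a minimal Python sketch of the common definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. The function name `n50` and the example lengths are illustrative assumptions, not data from the paper, and the authors may have used a slightly different convention.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 requires at least one length")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from longest to shortest until half of the total length is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the cumulative sum always reaches half")


# Made-up contig lengths (total 250, so the halfway point is 125).
example_lengths = [100, 60, 40, 30, 20]
print(n50(example_lengths))  # 100 alone is below 125; 100 + 60 reaches it
```

A larger N50 indicates that more of the assembly sits in long sequences, which is why the abstract cites it as evidence of contiguity.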
