1.
Proc Natl Acad Sci U S A ; 113(47): E7428-E7437, 2016 Nov 22.
Article in English | MEDLINE | ID: mdl-27810962

ABSTRACT

The ability to rationally manipulate the transcriptional states of cells would be of great use in medicine and bioengineering. We have developed an algorithm, NetSurgeon, which uses genome-wide gene-regulatory networks to identify interventions that force a cell toward a desired expression state. We first validated NetSurgeon extensively on existing datasets. Next, we used NetSurgeon to select transcription factor deletions aimed at improving ethanol production in Saccharomyces cerevisiae cultures that are catabolizing xylose. We reasoned that interventions that move the transcriptional state of cells using xylose toward that of cells producing large amounts of ethanol from glucose might improve xylose fermentation. Some of the interventions selected by NetSurgeon successfully promoted a fermentative transcriptional state in the absence of glucose, resulting in strains with a 2.7-fold increase in xylose import rates, a 4-fold improvement in xylose integration into central carbon metabolism, or a 1.3-fold increase in ethanol production rate. We conclude by presenting an integrated model of transcriptional regulation and metabolic flux that will enable future efforts aimed at improving xylose fermentation to prioritize functional regulators of central carbon metabolism.


Subject(s)
Gene Deletion; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae/growth & development; Transcription Factors/genetics; Algorithms; Ethanol/metabolism; Fermentation; Gene Regulatory Networks; Glucose/metabolism; Metabolic Engineering; Models, Genetic; Saccharomyces cerevisiae/genetics; Transcriptome; Xylose/metabolism
2.
Nature ; 450(7167): 203-18, 2007 Nov 08.
Article in English | MEDLINE | ID: mdl-17994087

ABSTRACT

Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.


Subject(s)
Drosophila/classification; Drosophila/genetics; Evolution, Molecular; Genes, Insect/genetics; Genome, Insect/genetics; Genomics; Phylogeny; Animals; Codon/genetics; DNA Transposable Elements/genetics; Drosophila/immunology; Drosophila/metabolism; Drosophila Proteins/genetics; Gene Order/genetics; Genome, Mitochondrial/genetics; Immunity/genetics; Multigene Family/genetics; RNA, Untranslated/genetics; Reproduction/genetics; Sequence Alignment; Sequence Analysis, DNA; Synteny/genetics
3.
Bioinformatics ; 25(13): 1587-93, 2009 Jul 01.
Article in English | MEDLINE | ID: mdl-19414532

ABSTRACT

MOTIVATION: The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics.

RESULTS: We present Pairagon, a cDNA-to-genome alignment program based on a pair hidden Markov model, as the most accurate aligner for sequences with both high and low identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator, and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner.

AVAILABILITY: Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/
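The simulation step described above (splicing reference exons into a "perfect" cDNA, then mutating the genome) can be sketched as follows. This is a minimal illustration under stated assumptions, not Pairagon's code: the function names are hypothetical, exon coordinates are assumed to be 0-based half-open intervals on the forward strand, and a simple uniform substitution model stands in for the realistic mutation simulator used in the study.

```python
import random


def splice_cdna(genome: str, exons: list[tuple[int, int]]) -> str:
    """Build a 'perfect' cDNA by concatenating exon sequences in genomic order.

    Assumption: exons are 0-based, half-open (start, end) forward-strand intervals.
    """
    return "".join(genome[start:end] for start, end in sorted(exons))


def mutate_uniform(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each base, with probability `rate`, by a different random base.

    Assumption: uniform substitutions only; the study used a more realistic simulator.
    """
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two ungapped sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Usage: a toy genome with two exons (coordinates are illustrative only).
rng = random.Random(42)
genome = "AAACCCGTAAGTTTTTTTAGGGGTTTACGT"
cdna = splice_cdna(genome, [(0, 6), (20, 26)])
mutated_genome = mutate_uniform(genome, 0.15, rng)
print(cdna, round(percent_identity(genome, mutated_genome), 1))
```

The cDNA is always spliced from the unmutated genome, so lowering the mutation rate is what moves the cDNA-to-genome identity toward 100%; aligning `cdna` back to `mutated_genome` would be the aligner's job and is not shown.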


Subject(s)
DNA, Complementary/chemistry; Genomics/methods; Sequence Alignment/methods; Animals; Base Sequence; Humans; Markov Chains; Mice; Rats
4.
Genome Biol ; 7 Suppl 1: S5.1-10, 2006.
Article in English | MEDLINE | ID: mdl-16925839

ABSTRACT

BACKGROUND: This paper describes Pairagon+N-SCAN_EST, a gene annotation pipeline that uses only native alignments. For each expressed sequence, it chooses the best genomic alignment. Systems like ENSEMBL and ExoGean rely on trans alignments, in which expressed sequences are aligned to the genomic loci of putative homologs. Trans alignments contain a high proportion of mismatches, gaps, and/or apparently unspliceable introns, compared to alignments of cDNA sequences to their native loci. The first stage of the Pairagon+N-SCAN_EST pipeline is Pairagon, a cDNA-to-genome alignment program based on a PairHMM probability model. This model relies on prior knowledge, such as the fact that introns must begin with GT, GC, or AT and end with AG or AC. It produces very precise alignments of high-quality cDNA sequences. In the genomic regions between Pairagon's cDNA alignments, the pipeline combines EST alignments with de novo gene prediction by using N-SCAN_EST. N-SCAN_EST is based on a generalized HMM probability model augmented with a phylogenetic conservation model and EST alignments. It can predict complete transcripts by extending or merging EST alignments, but it can also predict genes in regions without EST alignments. Because they are based on probability models, both Pairagon and N-SCAN_EST can be trained automatically for new genomes and data sets.

RESULTS: On the ENCODE regions of the human genome, Pairagon+N-SCAN_EST was as accurate as any other system tested in the EGASP assessment, including ENSEMBL and ExoGean.

CONCLUSION: With sufficient mRNA/EST evidence, genome annotation without trans alignments can compete successfully with systems like ENSEMBL and ExoGean, which use trans alignments.
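The splice-site prior quoted above can be written as a simple predicate. The abstract describes this knowledge as part of the PairHMM probability model rather than as a separate filter, so the sketch below only illustrates the rule itself, applied as a hard screen over candidate intron spans. The names are hypothetical, coordinates are assumed to be 0-based half-open, and only the forward strand is handled.

```python
# Donor and acceptor dinucleotides as stated in the abstract. The rule is
# applied independently to each end of the intron, exactly as quoted.
ALLOWED_DONORS = {"GT", "GC", "AT"}   # first two bases of the intron
ALLOWED_ACCEPTORS = {"AG", "AC"}      # last two bases of the intron


def intron_allowed(genome: str, start: int, end: int) -> bool:
    """Return True if genome[start:end] satisfies the donor/acceptor rule."""
    if end - start < 4:  # too short to hold both dinucleotides
        return False
    intron = genome[start:end].upper()
    return intron[:2] in ALLOWED_DONORS and intron[-2:] in ALLOWED_ACCEPTORS


def screen_candidates(genome: str, spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Keep only the candidate intron spans that pass the rule."""
    return [(s, e) for s, e in spans if intron_allowed(genome, s, e)]


# Usage: the second candidate begins with "CG" and is rejected.
genome = "AAACCCGTAAGTTTTTTTAGGGGTTT"
print(screen_candidates(genome, [(6, 20), (5, 20)]))
```

In a probabilistic aligner the same knowledge would more naturally appear as near-zero probabilities for disallowed boundaries, which is what lets the model be retrained rather than hand-edited.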


Subject(s)
Computational Biology/methods; Expressed Sequence Tags; Genomics/methods; Sequence Alignment; Software; Base Sequence; Computational Biology/standards; DNA, Complementary/analysis; Genes; Genome, Human; Genomics/standards; Humans; Models, Statistical; Open Reading Frames; Phylogeny; RNA, Messenger/analysis
5.
Genome Biol ; 7 Suppl 1: S3.1-13, 2006.
Article in English | MEDLINE | ID: mdl-16925837

ABSTRACT

BACKGROUND: This study analyzes the predictions of a number of promoter predictors on the ENCODE regions of the human genome as part of the ENCODE Genome Annotation Assessment Project (EGASP). The systems analyzed operate on various principles, and we assessed the effectiveness of the different conceptual strategies they use to match their promoter predictions to the manually annotated 5' gene ends.

RESULTS: The predictions were assessed relative to the manual HAVANA annotation of the 5' gene ends, which were used as the estimated reference transcription start sites. Allowing predictions to lie at most 1,000 nucleotides from the reference transcription start sites, the sensitivity of the predictors ranged from 32% to 56%, while the positive predictive value ranged from 79% to 93%. The average distance of predictions from the reference transcription start sites ranged from 259 to 305 nucleotides. For comparison, using transcription start site estimates from the DBTSS and H-Invitational databases as promoter predictions, we obtained a sensitivity of 58%, a positive predictive value of 92%, and an average distance from the annotated transcription start sites of 117 nucleotides. In this experiment, the best-performing promoter predictors were those that combined promoter prediction with gene prediction, mainly because the reduced promoter search space produced fewer false-positive predictions.

CONCLUSION: The main finding, now supported by comprehensive data, is that the accuracy of human promoter predictors for high-throughput annotation can be significantly improved if promoter prediction is combined with gene prediction. Based on the lessons learned in this experiment, we propose a framework for preparing the next assessment of this kind.
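One plausible reading of the three reported measures (sensitivity, positive predictive value, and average distance within a 1,000-nucleotide window) is sketched below. The exact EGASP matching protocol is not given in this abstract, so these choices are assumptions: positions are integer coordinates on a single sequence, strand is ignored, each reference TSS and each prediction is compared with its nearest counterpart, and the average distance is taken over predictions that fall inside the window. Function names are hypothetical.

```python
from bisect import bisect_left


def nearest_distance(pos: int, sorted_positions: list[int]) -> int:
    """Distance from pos to the closest value in a non-empty sorted list."""
    i = bisect_left(sorted_positions, pos)
    # The closest value is either just before or at the insertion point.
    candidates = sorted_positions[max(i - 1, 0):i + 1]
    return min(abs(p - pos) for p in candidates)


def evaluate_tss(predicted: list[int], reference: list[int], max_dist: int = 1000):
    """Return (sensitivity, ppv, mean_distance) under the assumptions above."""
    if not predicted or not reference:
        raise ValueError("need at least one prediction and one reference TSS")
    pred, ref = sorted(predicted), sorted(reference)
    # Sensitivity: share of reference TSSs with a prediction within max_dist.
    found = sum(nearest_distance(r, pred) <= max_dist for r in ref)
    # PPV: share of predictions within max_dist of some reference TSS.
    hits = [d for d in (nearest_distance(p, ref) for p in pred) if d <= max_dist]
    mean_distance = sum(hits) / len(hits) if hits else float("nan")
    return found / len(ref), len(hits) / len(pred), mean_distance


sens, ppv, mean_d = evaluate_tss([1200, 5900, 20000], [1000, 5000, 9000])
print(f"sensitivity={sens:.2f} ppv={ppv:.2f} mean distance={mean_d:.0f} nt")
# prints: sensitivity=0.67 ppv=0.67 mean distance=550 nt
```

The toy numbers also show why restricting the search space helps: the distant prediction at 20,000 lowers PPV without contributing to sensitivity, and a predictor that only searches near predicted genes would be less likely to emit it.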


Subject(s)
Computational Biology/methods; Genome, Human; Genomics/methods; Promoter Regions, Genetic; Computational Biology/standards; Databases, Genetic; Genes; Genomics/standards; Humans; RNA, Messenger/analysis; Sequence Analysis, DNA; Sequence Analysis, RNA
6.
Genome Res ; 15(5): 742-7, 2005 May.
Article in English | MEDLINE | ID: mdl-15867435

ABSTRACT

The retrainable, comparative gene predictor N-SCAN integrates multigenome modeling and 5' untranslated region (5' UTR) modeling. In this article, we evaluate N-SCAN's transcription start site (TSS) and first exon predictions both computationally and experimentally. The computational results indicate that N-SCAN is more accurate than any of the other tools we tested at predicting the TSS and the complete first exon. It is the only one of these tools that can predict complete gene structures together with 5' UTRs. Experimental evaluation shows that N-SCAN can be used to validate novel UTR introns in human gene predictions that do not overlap any RefSeq gene, and even to correct RefSeq mRNAs by adding validated UTR exons that are missing from RefSeq.


Subject(s)
5' Untranslated Regions/genetics; Computational Biology/methods; Genes/genetics; Genome, Human; Genomics/methods; Models, Genetic; Base Sequence; Evaluation Studies as Topic; Exons/genetics; Humans; Molecular Sequence Data; Sequence Analysis, DNA; Transcription Initiation Site
7.
Genome Res ; 14(11): 2330-5, 2004 Nov.
Article in English | MEDLINE | ID: mdl-15479946

ABSTRACT

The genomes of clusters of related eukaryotes are now being sequenced at an increasing rate, creating a need for accurate, low-cost annotation of exon-intron structures. In this paper, we demonstrate that reverse transcription-polymerase chain reaction (RT-PCR) and direct sequencing based on predicted gene structures satisfy this need, at least for single-celled eukaryotes. The TWINSCAN gene prediction algorithm was adapted for the fungal pathogen Cryptococcus neoformans by using a precise model of intron lengths in combination with ungapped alignments between the genome sequences of the two closely related Cryptococcus varieties. This approach resulted in approximately 60% of known genes being predicted exactly right at every coding base and splice site. When previously unannotated TWINSCAN predictions were tested by RT-PCR and direct sequencing, 75% of targets spanning two predicted introns were amplified and produced high-quality sequence. When targets spanning the complete predicted open reading frame were tested, 72% of them amplified and produced high-quality sequence. We conclude that sequencing a small number of expressed sequence tags (ESTs) to provide training data, running TWINSCAN on an entire genome, and then performing RT-PCR and direct sequencing on all of its predictions would be a cost-effective method for obtaining an experimentally verified genome annotation.
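The "predicted exactly right at every coding base and splice site" figure implies a gene-level exact-match criterion: every coding exon boundary of a known gene must be reproduced. A minimal sketch of such a measure follows. The data layout (coding exons as 0-based half-open intervals) and the function name are hypothetical, and the study's own matching procedure may differ in details such as strand handling.

```python
def exact_gene_fraction(known: dict[str, list[tuple[int, int]]],
                        predicted: list[list[tuple[int, int]]]) -> float:
    """Fraction of known genes whose coding exon structure is reproduced
    exactly by at least one predicted gene."""
    if not known:
        raise ValueError("no known genes supplied")
    # Represent each predicted gene by its sorted tuple of coding exons.
    predicted_structures = {tuple(sorted(exons)) for exons in predicted}
    exact = sum(tuple(sorted(exons)) in predicted_structures
                for exons in known.values())
    return exact / len(known)


known = {
    "geneA": [(100, 250), (310, 480)],
    "geneB": [(900, 1200)],
}
predicted = [
    [(100, 250), (310, 480)],  # matches geneA at every boundary
    [(900, 1180)],             # geneB's coding end is off by 20 bases: not exact
]
print(exact_gene_fraction(known, predicted))
```

Because a single wrong boundary disqualifies the whole gene, this is a strict measure, which is why the authors pair it with RT-PCR and direct sequencing to verify predictions experimentally.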


Subject(s)
Algorithms; Cryptococcus neoformans/genetics; Genome, Fungal; Introns/genetics; Sequence Analysis, DNA/methods; Software; Computational Biology; Predictive Value of Tests; Reverse Transcriptase Polymerase Chain Reaction; Sequence Alignment