1.
PLoS One ; 6(8): e23501, 2011.
Article in English | MEDLINE | ID: mdl-21876754

ABSTRACT

We describe a new algorithm, meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia stipitis. More than 95% of the genome is recovered, with no errors; half the assembled sequence is in contigs longer than 101 kilobases and in scaffolds longer than 269 kilobases. Incorporating fosmid ends recovers entire chromosomes. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (de Bruijn) graph of oligonucleotides with unique high quality extensions in the dataset, avoiding an explicit error correction step as used in other short-read assemblers. A novel memory-efficient hashing scheme is introduced. The resulting contigs are ordered and oriented using paired reads separated by ∼280 bp or ∼3.2 kbp, and many gaps between contigs can be closed using paired-end placements. Practical issues with the dataset are described, and prospects for assembling larger genomes are discussed.
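The idea of walking only through k-mers that have a unique, well-supported extension can be illustrated with a toy sketch. This is not the meraculous implementation: it ignores base qualities, reverse complements, leftward extension, and the memory-efficient hashing the abstract mentions. The function names `unique_extensions` and `extend_right` and the `min_count` threshold are assumptions made for illustration only.

```python
from collections import defaultdict


def unique_extensions(reads, k, min_count=2):
    """Map each k-mer to its next base when exactly one base is well supported.

    min_count is a hypothetical support threshold standing in for the
    "high quality" criterion; real assemblers also use base quality scores.
    """
    # Count how often each base follows each k-mer across all reads.
    followers = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k):
            followers[read[i:i + k]][read[i + k]] += 1

    ext = {}
    for kmer, counts in followers.items():
        supported = [base for base, n in counts.items() if n >= min_count]
        if len(supported) == 1:  # forks and unsupported k-mers are left out
            ext[kmer] = supported[0]
    return ext


def extend_right(seed, ext, k):
    """Grow a contig rightward from a seed k-mer (len(seed) must equal k)."""
    contig = seed
    seen = {seed}
    while True:
        kmer = contig[-k:]
        base = ext.get(kmer)
        if base is None:  # no unique extension: stop conservatively
            break
        nxt = kmer[1:] + base
        if nxt in seen:  # guard against cycling through a repeat
            break
        seen.add(nxt)
        contig += base
    return contig


genome = "ATGGCGTACCTGA"
# Three overlapping error-free reads seen twice, plus one read with a
# sequencing error in its last base (A read as T).
reads = [genome[0:8], genome[3:11], genome[5:13]] * 2 + ["ATGGCGTT"]
print(extend_right("ATGG", unique_extensions(reads, 4), 4))  # ATGGCGTACCTGA
```

In this toy run the erroneous base is seen only once, so it never becomes a supported extension and no separate error-correction pass is needed. If a second base were equally well supported, the walk would stop at the fork instead of guessing.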


Subject(s)
Algorithms , Genome, Fungal/genetics , Pichia/genetics , Sequence Analysis, DNA/methods , Artifacts , Bias , Chimerism , Chromosomes, Fungal/genetics , Computer Simulation , DNA Transposable Elements/genetics , Databases, Genetic , Escherichia coli/genetics , Gene Library , Genes, Mating Type, Fungal , Mutagenesis, Insertional/genetics , Reference Standards
2.
Nucleic Acids Res ; 35(21): e148, 2007.
Article in English | MEDLINE | ID: mdl-18006572

ABSTRACT

DNA resequencing arrays enable rapid acquisition of high-quality sequence data. This technology represents a promising platform for rapid high-resolution genotyping of microorganisms. Traditional array-based resequencing methods have relied on the use of specific PCR-amplified fragments from the query samples as hybridization targets. While this specificity in the target DNA population reduces the potential for artifacts caused by cross-hybridization, the subsampling of the query genome limits the sequence coverage that can be obtained and therefore reduces the technique's resolution as a genotyping method. We have developed and validated an Affymetrix Inc. GeneChip® array-based, whole-genome resequencing platform for Francisella tularensis, the causative agent of tularemia. We developed a set of bioinformatic filters that targeted systematic base-calling errors caused by cross-hybridization between the whole-genome sample and the array probes and by deletions in the sample DNA relative to the chip reference sequence. Our approach eliminated 91% of the false-positive single-nucleotide polymorphism calls identified in the SCHU S4 query sample, at the cost of 10.7% of the true positives, yielding a total base-calling accuracy of 99.992%.


Subject(s)
Computational Biology/methods , Genome, Bacterial , Genomics/methods , Oligonucleotide Array Sequence Analysis/methods , Polymorphism, Single Nucleotide , Francisella tularensis/genetics , Oligonucleotide Array Sequence Analysis/standards , Reference Standards , Reproducibility of Results
3.
RNA Biol ; 3(1): 40-8, 2006.
Article in English | MEDLINE | ID: mdl-17114936

ABSTRACT

Several recent studies indicate that mammals and other organisms produce large numbers of RNA transcripts that do not correspond to known genes. It has been suggested that these transcripts do not encode proteins, but may instead function as RNAs. However, discrimination of coding and non-coding transcripts is not straightforward, and different laboratories have used different methods, whose ability to perform this discrimination is unclear. In this study, we examine ten bioinformatic methods that assess protein-coding potential and compare their ability and congruency in the discrimination of non-coding from coding sequences, based on four underlying principles: open reading frame size, sequence similarity to known proteins or protein domains, statistical models of protein-coding sequence, and synonymous versus non-synonymous substitution rates. Despite these different approaches, the methods show broad concordance, suggesting that coding and non-coding transcripts can, in general, be reliably discriminated, and that many of the recently discovered extra-genic transcripts are indeed non-coding. Comparison of the methods indicates reasons for unreliable predictions, and approaches to increase confidence further. Conversely and surprisingly, our analyses also provide evidence that as many as approximately 10% of entries in the manually curated protein database Swiss-Prot are erroneous translations of actually non-coding transcripts.
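Of the four principles listed, open reading frame size is the simplest to illustrate. The sketch below finds the longest ATG-to-stop ORF in the three forward frames of a transcript. It is not one of the ten methods the study evaluates; the function name `longest_orf` is hypothetical, and any length cutoff applied to its result would be an assumption, since the abstract gives none.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return the longest ORF (ATG through stop codon) in the forward frames.

    Returns an empty string when no complete ORF is found. The reverse
    strand is not scanned in this simplified sketch.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


print(longest_orf("CCATGAAATTTTAGGG"))  # ATGAAATTTTAG
```

A transcript whose longest ORF is short would be flagged as a non-coding candidate under this criterion alone. The abstract's point is that such a single criterion is more trustworthy when it agrees with similarity-based, statistical, and substitution-rate methods.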


Subject(s)
Biochemistry/methods , Genetic Techniques , RNA, Messenger/chemistry , RNA, Untranslated/chemistry , Algorithms , Animals , Computational Biology , DNA, Complementary/metabolism , Data Interpretation, Statistical , Databases, Protein , Expressed Sequence Tags , Mice , Open Reading Frames , Protein Structure, Tertiary , Proteins/chemistry , RNA, Messenger/genetics , RNA, Untranslated/genetics
4.
DNA Repair (Amst) ; 1(7): 517-29, 2002 Jul 17.
Article in English | MEDLINE | ID: mdl-12509226

ABSTRACT

Prokaryotes and lower eukaryotes possess redundant activities that remove the plethora of oxidative DNA base damages produced during normal oxidative metabolism and which have been associated with cancer and aging. Thus far, only one oxidized pyrimidine-specific DNA glycosylase has been identified in humans, hNTH1. Here, we report the identification of three new putative human DNA glycosylases that are phylogenetically members of the Fpg/Nei family primarily found in the bacterial kingdom. We have characterized one of these, hNEI1, and show it to be functionally homologous to bacterial Nei; that is, its principal substrates are oxidized pyrimidines, it undergoes a lyase reaction by β,δ-elimination, and it traps a Schiff base with a substrate containing thymine glycol (Tg). Furthermore, inactivation of active site residues shown to be important in Escherichia coli Nei inactivates the human enzyme. The hNEI1 gene is located on the long arm of chromosome 15 that is frequently deleted in human cancers.


Subject(s)
DNA Repair/genetics , Endodeoxyribonucleases/genetics , Escherichia coli Proteins , Escherichia coli/genetics , Amino Acid Sequence , Animals , Cloning, Molecular , DNA-Formamidopyrimidine Glycosylase , Deoxyribonuclease (Pyrimidine Dimer) , Endodeoxyribonucleases/metabolism , Escherichia coli/metabolism , Evolution, Molecular , Humans , Hydrogen-Ion Concentration , Mice , Molecular Sequence Data , N-Glycosyl Hydrolases/genetics , Neoplasms/genetics , Sequence Alignment , Sequence Homology, Amino Acid , Sodium Chloride/metabolism