1.
BMC Res Notes ; 6: 25, 2013 Jan 22.
Article in English | MEDLINE | ID: mdl-23339526

ABSTRACT

BACKGROUND: Sequencing-by-synthesis technologies significantly improve over the Sanger method in terms of speed and cost per base. However, they still usually fail to compete in terms of read length and quality. Current high-throughput implementations of the pyrosequencing technique yield reads whose lengths approach those of the capillary electrophoresis method. A less obvious question is whether their quality is affected by platform-specific sequencing errors. RESULTS: We present an empirical study aimed at assessing the quality and characterising sequencing errors for high-throughput pyrosequencing data. We have developed a procedure for extracting sequencing error data from genome assemblies and studying their characteristics, in particular the length distribution of indel gaps and their relation to the sequence contexts where they occur. We used this procedure to analyse data from three prokaryotic genomes sequenced with the GS FLX technology. We also compared two models previously employed with success for peptide sequence alignment. CONCLUSIONS: We observed an overall very low error rate in the analysed data, with indel errors being much more abundant than substitutions. We also observed a dependence between the length of the gaps and that of the homopolymer context where they occur. As with protein alignments, a power-law model seems to approximate the indel errors more accurately, although the results are not so conclusive as to justify a departure from the commonly used affine gap penalty scheme. In either case, however, our procedure can be used to estimate more realistic error model parameters.
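The contrast between the affine and power-law gap models mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual log-odds reading, in which a gap penalty is the negative log-probability of the gap length: a geometric length distribution then gives a penalty that grows linearly with length (affine), while a power-law distribution gives one that grows logarithmically. The function names and the parameter values `q=0.1` and `gamma=2.5` are hypothetical and are not taken from the paper.

```python
import math

def geometric_gap_probs(q, max_len):
    """Truncated geometric distribution: P(k) proportional to q**(k-1), k = 1..max_len."""
    weights = [q ** (k - 1) for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def power_law_gap_probs(gamma, max_len):
    """Truncated power law: P(k) proportional to k**(-gamma), k = 1..max_len."""
    weights = [k ** (-gamma) for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def penalties(probs):
    """Gap penalty for each length, taken as the negative log-probability."""
    return [-math.log(p) for p in probs]

# Illustrative parameter values only (assumptions, not estimates from the study).
affine = penalties(geometric_gap_probs(q=0.1, max_len=10))
power = penalties(power_law_gap_probs(gamma=2.5, max_len=10))

# Affine: each extra gap position costs the same amount, -log(q).
# Power law: the cost of a length-k gap exceeds that of a length-1 gap by gamma*log(k).
for k, (a, p) in enumerate(zip(affine, power), start=1):
    print(f"k={k:2d}  affine={a:6.3f}  power-law={p:6.3f}")
```

Under these assumptions, long gaps are penalised far less heavily by the power-law model than by the affine one, which is the practical difference between the two schemes.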


Subjects
Artifacts , Genome, Bacterial , High-Throughput Nucleotide Sequencing/statistics & numerical data , Models, Statistical , Algorithms , Base Sequence , INDEL Mutation , Molecular Sequence Data , Mycoplasma hyopneumoniae/genetics , Sequence Alignment , Staphylococcus aureus/genetics , Streptococcus pneumoniae/genetics
2.
BMC Bioinformatics ; 12: 163, 2011 May 16.
Article in English | MEDLINE | ID: mdl-21672185

ABSTRACT

BACKGROUND: Over the past few years, new massively parallel DNA sequencing technologies have emerged. These platforms generate massive amounts of data per run, greatly reducing the cost of DNA sequencing. However, these techniques also raise important computational difficulties, mostly due to the huge volume of data produced, but also because of some of their specific characteristics such as read length and sequencing errors. Among the most critical problems is that of efficiently and accurately mapping reads to a reference genome in the context of re-sequencing projects. RESULTS: We present an efficient method for the local alignment of pyrosequencing reads produced by the GS FLX (454) system against a reference sequence. Our approach exploits the characteristics of the data in these re-sequencing applications and uses state-of-the-art indexing techniques combined with a flexible seed-based approach, leading to a fast and accurate algorithm which needs very little user parameterization. An evaluation performed using real and simulated data shows that our proposed method outperforms a number of mainstream tools on the quantity and quality of successful alignments, as well as on the execution time. CONCLUSIONS: The proposed methodology was implemented in a software tool called TAPyR (Tool for the Alignment of Pyrosequencing Reads), which is publicly available from http://www.tapyr.net.
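The general idea of seed-based read mapping referred to in this abstract can be sketched as follows. This is not TAPyR's algorithm: the abstract does not describe its index or seeding strategy, so the sketch swaps in a plain dictionary of exact k-mers and a simple voting step. All names, the toy sequences and the choice `k=4` are hypothetical.

```python
from collections import Counter, defaultdict

def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def candidate_offsets(read, index, k):
    """Let each exact seed hit vote for the reference position where the read would start."""
    votes = Counter()
    for j in range(len(read) - k + 1):
        for i in index.get(read[j:j + k], ()):
            votes[i - j] += 1
    return votes.most_common()

# Toy data: the read is a copy of reference[9:22] with one substituted base.
reference = "ACGTTGCAAGGCTTACGATCGGATAC"
read = "GGCTTACGTTCGG"

index = build_kmer_index(reference, k=4)
best_offset, seed_hits = candidate_offsets(read, index, k=4)[0]
print(best_offset, seed_hits)
```

In a full aligner, the best-supported candidate positions would then be verified with a local alignment that tolerates the indel errors typical of pyrosequencing reads. That step is omitted here.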


Subjects
Sequence Analysis, DNA/methods , Algorithms , Animals , Base Sequence , High-Throughput Nucleotide Sequencing , Humans , Sequence Alignment , Software
3.
BMC Genomics ; 12: 137, 2011 Mar 04.
Article in English | MEDLINE | ID: mdl-21375742

ABSTRACT

BACKGROUND: Eucalyptus species are among the most planted hardwoods in the world because of their rapid growth, adaptability and valuable wood properties. The development and integration of genomic resources into breeding practice will be increasingly important in the decades to come. Bacterial artificial chromosome (BAC) libraries are key genomic tools that enable positional cloning of important traits, synteny evaluation, and the development of genome framework physical maps for genetic linkage and genome sequencing. RESULTS: We describe the construction and characterization of two deep-coverage BAC libraries, EG_Ba and EG_Bb, obtained from nuclear DNA fragments of E. grandis (clone BRASUZ1) digested with HindIII and BstYI, respectively. Genome coverages of 17 and 15 haploid genome equivalents were estimated for EG_Ba and EG_Bb, respectively. Both libraries contained large inserts, with average sizes ranging from 135 Kb (EG_Bb) to 157 Kb (EG_Ba), and very low extra-nuclear genome contamination, providing a probability of finding a single-copy gene ≥ 99.99%. Libraries were screened for the presence of several genes of interest via hybridizations to high-density BAC filters followed by PCR validation. Five selected BAC clones were sequenced and assembled using the Roche GS FLX technology, providing the whole sequence of the E. grandis chloroplast genome and complete genomic sequences of important lignin biosynthesis genes. CONCLUSIONS: The two E. grandis BAC libraries described in this study represent an important milestone for the advancement of Eucalyptus genomics and forest tree research. These BAC resources have a highly redundant genome coverage (≥ 15×), contain large average inserts and have a very low percentage of clones with organellar DNA or empty vectors. These publicly available BAC libraries are thus suitable for a broad range of applications in genetic and genomic research in Eucalyptus and possibly in related species of Myrtaceae, including genome sequencing, gene isolation, and functional and comparative genomics. Because they have been constructed using the same tree (E. grandis BRASUZ1) whose full genome is being sequenced, they should prove instrumental for assembly and gap filling of the upcoming Eucalyptus reference genome sequence.
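The link between the reported genome coverage and the ≥ 99.99% probability of recovering a single-copy gene can be checked with a short calculation. The abstract does not say which estimator the authors used. The sketch below assumes the standard Clarke and Carbon approximation, P = 1 - exp(-c), where c is the coverage in haploid genome equivalents, and the function name is hypothetical.

```python
import math

def prob_locus_in_library(coverage):
    """Approximate probability that a given single-copy locus is present in at
    least one clone, using P = 1 - exp(-coverage) (assumed estimator)."""
    # -expm1(-c) equals 1 - exp(-c) but keeps precision when exp(-c) is tiny.
    return -math.expm1(-coverage)

# Coverage values as stated in the abstract.
for library, coverage in (("EG_Ba", 17), ("EG_Bb", 15)):
    print(f"{library}: coverage {coverage}x, P = {prob_locus_in_library(coverage):.8f}")
```

Under this assumption, both coverages give probabilities well above the 99.99% threshold quoted in the abstract.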


Subjects
Eucalyptus/genetics , Gene Library , Genome, Plant , Genomics/methods , Lignin/biosynthesis , Chromosomes, Artificial, Bacterial , DNA, Plant/genetics , Genome, Chloroplast , Lignin/genetics , Molecular Sequence Annotation , Sequence Analysis, DNA
4.
Bioinformatics ; 24(16): i160-6, 2008 Aug 15.
Article in English | MEDLINE | ID: mdl-18689819

ABSTRACT

MOTIVATION: Position weight matrices (PWMs) have become a standard for representing biological sequence motifs. Their relative simplicity has favoured the development of efficient algorithms for diverse tasks such as motif identification, sequence scanning and statistical significance evaluation. Markov chain-based models generalize the PWM model by allowing for inter-position dependencies to be considered, at the cost of substantial computational overhead, which may limit their application. RESULTS: In this article, we consider two aspects regarding the use of higher-order Markov models for biological sequence motifs, namely, the representation and the computation of P-values for motifs described by a set of occurrences. We propose an efficient representation based on the use of tries, from which empirical position-specific conditional base probabilities can be computed, and extend state-of-the-art PWM-based algorithms to allow for the computation of exact P-values for high-order Markov motif models. AVAILABILITY: The software is available in the form of a Java object-oriented library from http://www.cin.ufpe.br/~paguso/kmarkov.
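How a trie built from motif occurrences yields empirical position-specific conditional base probabilities can be shown with a minimal sketch. It is a simplification and not the paper's data structure: it conditions each base on the entire preceding prefix rather than on a fixed-order context, and it applies no pseudocounts. The class, function names and toy occurrences are hypothetical.

```python
class TrieNode:
    """Trie node storing how many occurrences pass through it."""
    def __init__(self):
        self.count = 0
        self.children = {}

def build_trie(occurrences):
    """Insert every motif occurrence, incrementing counts along its path."""
    root = TrieNode()
    for seq in occurrences:
        node = root
        node.count += 1
        for base in seq:
            node = node.children.setdefault(base, TrieNode())
            node.count += 1
    return root

def conditional_prob(root, prefix, base):
    """Empirical P(base at position len(prefix) | preceding bases == prefix)."""
    node = root
    for b in prefix:
        node = node.children.get(b)
        if node is None:
            return 0.0  # prefix never observed among the occurrences
    child = node.children.get(base)
    return child.count / node.count if child else 0.0

# Toy set of aligned motif occurrences (hypothetical data).
occurrences = ["ACGT", "ACGA", "ACTT", "TCGT"]
trie = build_trie(occurrences)

p_first_a = conditional_prob(trie, "", "A")       # 3 of 4 occurrences start with A
p_g_after_ac = conditional_prob(trie, "AC", "G")  # 2 of the 3 "AC" prefixes continue with G
print(p_first_a, p_g_after_ac)
```

A PWM would instead store one independent base distribution per position. The trie keeps the dependency on earlier positions, at the cost of more nodes.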


Subjects
Algorithms , Models, Chemical , Models, Genetic , Sequence Analysis/methods , Computer Simulation , Data Interpretation, Statistical , Markov Chains , Models, Statistical