Results 1 - 5 of 5
1.
BMC Bioinformatics ; 15 Suppl 7: S1, 2014.
Article in English | MEDLINE | ID: mdl-25079667

ABSTRACT

BACKGROUND: Drug discovery, disease detection, and personalized medicine are fast-growing areas of genomic research. With the advancement of next-generation sequencing techniques, researchers can obtain an abundance of data for many different biological assays in a short period of time. When these data are error-free, the result is a high-quality, base-pair-resolution picture of the genome. However, when the data are lossy, the heuristic algorithms currently used to align next-generation sequences lose accuracy. RESULTS: This paper describes a program, ADaM (APF DNA Mapper), that significantly increases final alignment accuracy. ADaM works by first using an existing program to align "easy" sequences and then using an algorithm with accuracy guarantees (the APF) to align the remaining sequences. For harder-to-align sequences, the resulting technique increases mapping accuracy from only 60% to over 90%.
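The two-stage strategy (a fast aligner for "easy" reads, a slower but more thorough method for the rest) can be sketched as below. This is a minimal illustration under stated assumptions, not ADaM or the APF: the fast stage is a hypothetical exact k-mer seed lookup, and the fallback is a plain exhaustive semi-global edit-distance DP standing in for an algorithm with accuracy guarantees. The names `two_stage_map`, `k`, and `max_edits` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple


def build_kmer_index(ref: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index: Dict[str, List[int]] = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)
    return index


def fast_stage(read: str, ref: str, index: Dict[str, List[int]], k: int) -> Optional[int]:
    """Heuristic stage (assumed): seed on the read's first k-mer, accept exact matches only."""
    for pos in index.get(read[:k], []):
        if ref[pos:pos + len(read)] == read:
            return pos + len(read)  # end position (exclusive) in the reference
    return None


def exhaustive_stage(read: str, ref: str) -> Tuple[int, int]:
    """Semi-global edit-distance DP: the whole read must align, but the
    reference may be entered and left anywhere. Returns (end, edit distance)."""
    m = len(read)
    col = list(range(m + 1))  # reference column 0: read[:i] costs i insertions
    best_end, best_dist = 0, col[m]
    for j in range(1, len(ref) + 1):
        new = [0] * (m + 1)  # new[0] = 0: free start anywhere in the reference
        for i in range(1, m + 1):
            new[i] = min(
                col[i - 1] + (read[i - 1] != ref[j - 1]),  # match / substitution
                col[i] + 1,                                # reference base skipped
                new[i - 1] + 1,                            # extra base in the read
            )
        col = new
        if col[m] < best_dist:
            best_end, best_dist = j, col[m]
    return best_end, best_dist


def two_stage_map(reads: List[str], ref: str, k: int = 4,
                  max_edits: int = 2) -> List[Tuple[str, int, int]]:
    """Return (stage, end position, edit distance) per read; end is -1 if unmapped."""
    index = build_kmer_index(ref, k)
    results = []
    for read in reads:
        end = fast_stage(read, ref, index, k)
        if end is not None:
            results.append(("fast", end, 0))
            continue
        end, dist = exhaustive_stage(read, ref)
        if dist <= max_edits:
            results.append(("exhaustive", end, dist))
        else:
            results.append(("unmapped", -1, dist))
    return results
```

For example, with the reference `ACGTTGCATGACCGTAAGT`, the read `TGCATG` is placed by the fast stage, while `GACAGT`, which carries one mismatch, fails the exact seed lookup and is recovered by the exhaustive stage. The point of the split is that the expensive stage only runs on the reads the cheap stage cannot place.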


Subjects
Algorithms; Artificial Intelligence; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Animals; Base Sequence; Genome; High-Throughput Nucleotide Sequencing/methods; Humans
2.
BMC Bioinformatics ; 14: 337, 2013 Nov 21.
Article in English | MEDLINE | ID: mdl-24261665

ABSTRACT

BACKGROUND: DNA methylation has been linked to many important biological phenomena. Researchers have recently begun to sequence bisulfite-treated DNA to determine its pattern of methylation. However, sequencing reads from bisulfite-converted DNA can vary significantly from the reference genome because of incomplete bisulfite conversion, genome variation, sequencing errors, and poor quality bases. Therefore, it is often difficult to align reads to the correct locations in the reference genome. Furthermore, bisulfite sequencing experiments have the additional complexity of having to estimate the DNA methylation levels within the sample. RESULTS: Here, we present a highly accurate probabilistic algorithm, which is an extension of the Genomic Next-generation Universal MAPper to accommodate bisulfite sequencing data (GNUMAP-bs), that addresses the computational problems associated with aligning bisulfite sequencing data to a reference genome. GNUMAP-bs integrates uncertainty from read and mapping qualities to help resolve the difference between poor quality bases and the ambiguity inherent in bisulfite conversion. We tested GNUMAP-bs and other commonly used bisulfite alignment methods using both simulated and real bisulfite reads and found that GNUMAP-bs and other dynamic programming methods were more accurate than the more heuristic methods. CONCLUSIONS: The GNUMAP-bs aligner is a highly accurate alignment approach for processing the data from bisulfite sequencing experiments. The GNUMAP-bs algorithm is freely available for download at: http://dna.cs.byu.edu/gnumap. The software runs on multiple threads and multiple processors to increase the alignment speed.
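Part of the ambiguity inherent in bisulfite conversion is that a read T opposite a reference C may be a genuine T, a sequencing error, or an unmethylated C converted by the treatment. The sketch below shows that asymmetric matching rule and the simple per-site methylation estimate it enables. It assumes forward-strand, ungapped alignments with known start positions and ignores base qualities; GNUMAP-bs itself is probabilistic and integrates read and mapping qualities, so this is only an illustration, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def bs_compatible(read_base: str, ref_base: str) -> bool:
    """A read T opposite a reference C may be an unmethylated, converted C."""
    return read_base == ref_base or (ref_base == "C" and read_base == "T")


def bs_mismatches(read: str, ref_window: str) -> int:
    """Count positions that bisulfite conversion cannot explain."""
    return sum(not bs_compatible(r, g) for r, g in zip(read, ref_window))


def methylation_levels(ref: str, alignments: List[Tuple[str, int]]) -> Dict[int, float]:
    """Fraction of reads reporting C (methylated) at each covered reference C.

    alignments: (read, 0-based start) pairs, assumed ungapped and forward-strand.
    Bases other than C or T at a reference C are treated as errors and skipped.
    """
    meth: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for read, start in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos < len(ref) and ref[pos] == "C" and base in "CT":
                total[pos] += 1
                if base == "C":  # unconverted C implies the site was methylated
                    meth[pos] += 1
    return {pos: meth[pos] / total[pos] for pos in sorted(total)}
```

For instance, against the reference `ACGTCAGC`, the read `ATGTTAGT` has zero bisulfite-aware mismatches even though a naive comparison would count three, which is why conversion-unaware aligners tend to misplace or discard such reads.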


Subjects
Sequence Alignment/standards; Sequence Analysis, DNA; Sulfites/chemistry; Algorithms; Artificial Intelligence; Base Sequence; Computer Simulation; DNA Methylation; Genome, Human; Humans; Probability; Software; Sulfites/standards
3.
Genome Res ; 23(10): 1721-9, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23843222

ABSTRACT

Emerging next-generation sequencing technologies have revolutionized the collection of genomic data for applications in bioforensics, biosurveillance, and clinical settings. However, to make the most of these new data, new methodology needs to be developed that can accommodate large volumes of genetic data in a computationally efficient manner. We present a statistical framework to analyze raw next-generation sequence reads from purified or mixed environmental or targeted infected tissue samples for rapid species identification and strain attribution against a robust database of known biological agents. Our method, Pathoscope, capitalizes on a Bayesian statistical framework that accommodates information on sequence quality and mapping quality and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also incorporates the possibility that multiple species can be present in the sample and considers cases when the sample species/strain is not in the reference database. Furthermore, our approach can accurately discriminate between very closely related strains of the same species with very little coverage of the genome and without the need for multiple alignment steps, extensive homology searches, or genome assembly, all of which are time-consuming and labor-intensive. We demonstrate the utility of our approach on genomic data from purified and in silico "environmental" samples from known bacterial agents impacting human health for accuracy assessment and comparison with other approaches.
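Reads that align equally well to several closely related genomes are what make strain attribution hard. A common way to turn such multi-mapping reads into posterior probabilities is an expectation-maximization (EM) loop that alternates between estimating each genome's share of the sample and each read's probability of originating from each genome. The sketch below is a simplified mixture model under stated assumptions, not Pathoscope's exact model: it takes per-read alignment likelihoods as given (hypothetical input), uses no priors, and has no term for reads from genomes outside the database.

```python
from typing import Dict, List, Tuple


def em_reassign(likelihoods: List[Dict[str, float]], max_iter: int = 200,
                tol: float = 1e-10) -> Tuple[Dict[str, float], List[Dict[str, float]]]:
    """Estimate genome proportions and per-read posteriors by EM.

    likelihoods[r][g] is an (assumed positive) likelihood that read r came from
    genome g, e.g. derived from alignment and base qualities; genomes a read
    did not align to are simply absent from its dict.
    """
    genomes = sorted({g for row in likelihoods for g in row})
    pi = {g: 1.0 / len(genomes) for g in genomes}  # uniform starting proportions
    posteriors: List[Dict[str, float]] = []
    for _ in range(max_iter):
        # E-step: posterior of each read's origin given the current proportions
        posteriors = []
        for row in likelihoods:
            weights = {g: pi[g] * lik for g, lik in row.items()}
            z = sum(weights.values())
            posteriors.append({g: w / z for g, w in weights.items()})
        # M-step: new proportion = average posterior mass assigned to each genome
        new_pi = dict.fromkeys(genomes, 0.0)
        for post in posteriors:
            for g, p in post.items():
                new_pi[g] += p
        new_pi = {g: s / len(likelihoods) for g, s in new_pi.items()}
        delta = max(abs(new_pi[g] - pi[g]) for g in genomes)
        pi = new_pi
        if delta < tol:
            break
    return pi, posteriors  # posteriors come from the final E-step
```

As an example, with three reads unique to genome A, one unique to genome B, and two that align equally well to both, the proportions converge to 0.75 for A and 0.25 for B, and each ambiguous read is assigned to A with posterior 0.75 instead of being split evenly. Reads that map uniquely thus inform how the ambiguous ones are resolved.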


Subjects
Bacteria/classification; Bacteria/genetics; Computational Biology/methods; Databases, Genetic; Genome, Bacterial; Sequence Analysis, DNA; Software; Algorithms; Bacillus anthracis/genetics; Bayes Theorem; Bioterrorism; Burkholderia mallei/genetics; Burkholderia pseudomallei/genetics; Clostridium botulinum/genetics; Escherichia coli/genetics; Escherichia coli Infections/microbiology; Europe; Francisella tularensis/genetics; Genomics; High-Throughput Nucleotide Sequencing; Humans; Species Specificity; Yersinia pestis/genetics
4.
Proc IPDPS (Conf) ; 2011: 435-443, 2011.
Article in English | MEDLINE | ID: mdl-23396612

ABSTRACT

Mapping short next-generation reads to reference genomes is an important element in SNP calling and expression studies. A major limitation to large-scale whole-genome mapping is the algorithm's large memory requirement and the long run time needed for accurate studies. Several parallel implementations have been developed to distribute memory across processors and to share the processing load equally among them. These approaches are compared with respect to their memory footprint, load balancing, and accuracy. When MPI is combined with multi-threading, linear speedup can be achieved for up to 256 processors.
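One way to split a reference genome's memory footprint across workers is to give each worker its own slice of the reference. The slices must overlap by the read length minus one so that reads spanning a boundary are not lost, and a hit should be reported only by the slice whose core range contains its start so it is not counted twice. The sketch below illustrates that bookkeeping under heavy assumptions: hypothetical names, exact matching only, equal-sized slices as the load-balancing rule, and Python threads instead of MPI processes (so no real speedup under CPython's GIL). It is not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def partition(ref_len: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Split [0, ref_len) into at most n_chunks equal-sized core ranges."""
    size = -(-ref_len // n_chunks)  # ceiling division
    return [(s, min(s + size, ref_len)) for s in range(0, ref_len, size)]


def search_chunk(ref: str, core: Tuple[int, int], reads: List[str],
                 max_len: int) -> List[Tuple[int, int]]:
    """Exact hits (read index, reference start) whose start lies in this core."""
    start, end = core
    text = ref[start:end + max_len - 1]  # overlap catches boundary-spanning reads
    hits = []
    for idx, read in enumerate(reads):
        i = text.find(read)
        while i != -1 and start + i < end:  # only report starts inside the core
            hits.append((idx, start + i))
            i = text.find(read, i + 1)
    return hits


def parallel_exact_map(ref: str, reads: List[str], n_workers: int = 4) -> List[Tuple[int, int]]:
    """Search every slice concurrently and merge the hits in sorted order."""
    max_len = max(len(r) for r in reads)
    cores = partition(len(ref), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda c: search_chunk(ref, c, reads, max_len), cores))
    return sorted(hit for part in parts for hit in part)
```

The merged result is independent of the number of workers: for the reference `ACGTACGTACGT` and reads `ACGT` and `GTAC`, splitting into three or four slices finds the same five hits as a single serial scan, including the `GTAC` occurrence that straddles a slice boundary.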

5.
Bioinformatics ; 26(1): 38-45, 2010 Jan 01.
Article in English | MEDLINE | ID: mdl-19861355

ABSTRACT

MOTIVATION: The advent of next-generation sequencing technologies has increased the accuracy and quantity of sequence data, opening the door to greater opportunities in genomic research. RESULTS: In this article, we present GNUMAP (Genomic Next-generation Universal MAPper), a program capable of overcoming two major obstacles in the mapping of reads from next-generation sequencing runs. First, we have created an algorithm that probabilistically maps reads to repeat regions in the genome on a quantitative basis. Second, we have developed a probabilistic Needleman-Wunsch algorithm which utilizes _prb.txt and _int.txt files produced in the Solexa/Illumina pipeline to improve the mapping accuracy for lower quality reads and increase the amount of usable data produced in a given experiment. AVAILABILITY: The source code for the software can be downloaded from http://dna.cs.byu.edu/gnumap.
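The idea behind a probabilistic Needleman-Wunsch alignment can be illustrated by replacing each called base with a probability distribution over A, C, G, and T (the kind of per-base information the Solexa/Illumina probability files carry) and scoring each read position by its expected substitution score. The sketch below assumes a +1/-1 match/mismatch scheme and a linear gap penalty; these values and all function names are hypothetical, and it neither parses `_prb.txt` files nor reproduces GNUMAP's actual scoring.

```python
from typing import Dict, List

Pwm = List[Dict[str, float]]  # one {base: probability} dict per read position


def pwm_from_calls(read: str, confidence: float) -> Pwm:
    """Hypothetical helper: the called base gets `confidence`, the others share the rest."""
    other = (1.0 - confidence) / 3.0
    return [{b: (confidence if b == called else other) for b in "ACGT"} for called in read]


def expected_score(probs: Dict[str, float], ref_base: str,
                   match: float = 1.0, mismatch: float = -1.0) -> float:
    """Substitution score averaged over the uncertainty in the read base."""
    return sum(p * (match if b == ref_base else mismatch) for b, p in probs.items())


def probabilistic_nw(read: Pwm, ref: str, gap: float = -2.0) -> float:
    """Global (Needleman-Wunsch) score of a probabilistic read against a reference."""
    n, m = len(read), len(ref)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + expected_score(read[i - 1], ref[j - 1]),
                F[i - 1][j] + gap,  # read position aligned to a gap
                F[i][j - 1] + gap,  # reference base aligned to a gap
            )
    return F[n][m]
```

With hard calls, the read `ACGA` scores 2.0 against `ACGT` (three matches, one mismatch). If the last position is instead recorded as an even split between A and T, that position contributes 0 rather than -1 and the score rises to 3.0, so an uncertain base costs less than a confident mismatch. This is how lower-quality reads can stay usable instead of being rejected.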


Subjects
Algorithms; Chromosome Mapping/methods; DNA/genetics; Sequence Analysis, DNA/methods; Software; Base Sequence; Data Interpretation, Statistical; Molecular Sequence Data