1.
BMC Bioinformatics ; 15 Suppl 7: S1, 2014.
Article in English | MEDLINE | ID: mdl-25079667

ABSTRACT

BACKGROUND: Drug discovery, disease detection, and personalized medicine are fast-growing areas of genomic research. With the advancement of next-generation sequencing techniques, researchers can obtain an abundance of data for many different biological assays in a short period of time. When this data is error-free, the result is a high-quality, base-pair-resolution picture of the genome. However, when the data is lossy, the heuristic algorithms currently used to align next-generation sequences cause alignment accuracy to drop. RESULTS: This paper describes a program, ADaM (APF DNA Mapper), which significantly increases final alignment accuracy. ADaM works by first using an existing program to align "easy" sequences, and then using an algorithm with accuracy guarantees (the APF) to align the remaining sequences. The final result is a technique that increases the mapping accuracy from only 60% to over 90% for harder-to-align sequences.


Subject(s)
Algorithms , Artificial Intelligence , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Animals , Base Sequence , Genome , High-Throughput Nucleotide Sequencing/methods , Humans
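
The two-stage idea in the abstract above (a fast aligner for "easy" sequences, then a slower but more thorough method for the rest) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: exact substring search stands in for the existing fast aligner, and an exhaustive edit-distance scan stands in for the second stage. It is not the APF algorithm, which the abstract does not describe, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def two_stage_map(reads, reference):
    """Map each read to a reference offset; returns {read: offset}."""
    positions = {}
    leftovers = []
    # Stage 1 (stand-in for a fast heuristic aligner): exact matches only.
    for read in reads:
        pos = reference.find(read)
        if pos >= 0:
            positions[read] = pos
        else:
            leftovers.append(read)
    # Stage 2 (stand-in for the slower, more accurate method):
    # score every same-length window and keep the best one.
    for read in leftovers:
        windows = range(len(reference) - len(read) + 1)
        positions[read] = min(
            windows,
            key=lambda p: edit_distance(read, reference[p:p + len(read)]),
        )
    return positions
```

For example, against the toy reference `ACGTTGCAAGGCTTAACCGGTA`, the read `TGCAAGG` is placed by the first stage, while `CTTAGCCGG` (one substitution away from the reference) falls through to the second stage. The exhaustive scan is quadratic per window and only practical for toy inputs.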
2.
BMC Bioinformatics ; 14: 337, 2013 Nov 21.
Article in English | MEDLINE | ID: mdl-24261665

ABSTRACT

BACKGROUND: DNA methylation has been linked to many important biological phenomena. Researchers have recently begun to sequence bisulfite-treated DNA to determine its pattern of methylation. However, sequencing reads from bisulfite-converted DNA can vary significantly from the reference genome because of incomplete bisulfite conversion, genome variation, sequencing errors, and poor-quality bases. Therefore, it is often difficult to align reads to the correct locations in the reference genome. Furthermore, bisulfite sequencing experiments have the additional complexity of having to estimate the DNA methylation levels within the sample. RESULTS: Here, we present a highly accurate probabilistic algorithm, which is an extension of the Genomic Next-generation Universal MAPper to accommodate bisulfite sequencing data (GNUMAP-bs), that addresses the computational problems associated with aligning bisulfite sequencing data to a reference genome. GNUMAP-bs integrates uncertainty from read and mapping qualities to help resolve the difference between poor-quality bases and the ambiguity inherent in bisulfite conversion. We tested GNUMAP-bs and other commonly used bisulfite alignment methods using both simulated and real bisulfite reads and found that GNUMAP-bs and other dynamic programming methods were more accurate than the more heuristic methods. CONCLUSIONS: The GNUMAP-bs aligner is a highly accurate alignment approach for processing the data from bisulfite sequencing experiments. The GNUMAP-bs algorithm is freely available for download at: http://dna.cs.byu.edu/gnumap. The software runs on multiple threads and multiple processors to increase the alignment speed.


Subject(s)
Sequence Alignment/standards , Sequence Analysis, DNA , Sulfites/chemistry , Algorithms , Artificial Intelligence , Base Sequence , Computer Simulation , DNA Methylation , Genome, Human , Humans , Probability , Software , Sulfites/standards
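
The ambiguity described in the abstract above comes from the chemistry: bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines stay C. A minimal sketch of what that means for matching and for estimating a methylation level follows. It is a deterministic simplification for forward-strand reads only, not the probabilistic GNUMAP-bs method, and the function names are hypothetical.

```python
def bs_compatible(ref_base: str, read_base: str) -> bool:
    """True if read_base could arise from ref_base after bisulfite conversion.

    Forward strand only: a reference C may appear as C (methylated)
    or as T (unmethylated and converted).
    """
    return read_base == ref_base or (ref_base == "C" and read_base == "T")


def bs_mismatches(reference: str, read: str, offset: int) -> int:
    """Count mismatches at an offset, tolerating C-to-T conversions."""
    window = reference[offset:offset + len(read)]
    return sum(not bs_compatible(r, q) for r, q in zip(window, read))


def methylation_level(c_count: int, t_count: int):
    """Fraction of reads showing C at a reference cytosine, or None if uncovered."""
    total = c_count + t_count
    return c_count / total if total else None
```

Under these assumptions, the read `ATGTTCGA` has zero bisulfite-aware mismatches against the reference `ACGTCCGA`, although a plain comparison would count two. A site covered by three C reads and one T read gets an estimated methylation level of 0.75. The sketch ignores base qualities, which is exactly the information the abstract says GNUMAP-bs uses to separate sequencing errors from conversion.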
3.
Genome Res ; 23(10): 1721-9, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23843222

ABSTRACT

Emerging next-generation sequencing technologies have revolutionized the collection of genomic data for applications in bioforensics, biosurveillance, and for use in clinical settings. However, to make the most of these new data, new methodology needs to be developed that can accommodate large volumes of genetic data in a computationally efficient manner. We present a statistical framework to analyze raw next-generation sequence reads from purified or mixed environmental or targeted infected tissue samples for rapid species identification and strain attribution against a robust database of known biological agents. Our method, Pathoscope, capitalizes on a Bayesian statistical framework that accommodates information on sequence quality and mapping quality and provides posterior probabilities of matches to a known database of target genomes. Importantly, our approach also incorporates the possibility that multiple species can be present in the sample and considers cases when the sample species/strain is not in the reference database. Furthermore, our approach can accurately discriminate between very closely related strains of the same species with very little coverage of the genome and without the need for multiple alignment steps, extensive homology searches, or genome assembly, which are time-consuming and labor-intensive steps. We demonstrate the utility of our approach on genomic data from purified and in silico "environmental" samples from known bacterial agents impacting human health for accuracy assessment and comparison with other approaches.


Subject(s)
Bacteria/classification , Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Genome, Bacterial , Sequence Analysis, DNA , Software , Algorithms , Bacillus anthracis/genetics , Bayes Theorem , Bioterrorism , Burkholderia mallei/genetics , Burkholderia pseudomallei/genetics , Clostridium botulinum/genetics , Escherichia coli/genetics , Escherichia coli Infections/microbiology , Europe , Francisella tularensis/genetics , Genomics , High-Throughput Nucleotide Sequencing , Humans , Species Specificity , Yersinia pestis/genetics
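
To make the idea of posterior probabilities over candidate genomes concrete, here is a minimal sketch of a generic mixture-model EM that reassigns ambiguously mapped reads. It is an assumption chosen for illustration, not Pathoscope's actual model, which the abstract does not spell out. The input `likelihoods[i][g]` is a hypothetical per-read, per-genome mapping likelihood, and each read is assumed to have at least one nonzero entry.

```python
def reassign_reads(likelihoods, iterations=50):
    """Estimate genome proportions and per-read posteriors by EM.

    likelihoods[i][g]: likelihood that read i came from genome g.
    Returns (proportions, responsibilities).
    """
    n_genomes = len(likelihoods[0])
    pi = [1.0 / n_genomes] * n_genomes  # start from uniform proportions
    resp = []
    for _ in range(iterations):
        # E-step: posterior probability of each genome for each read.
        resp = []
        for row in likelihoods:
            weighted = [p * l for p, l in zip(pi, row)]
            total = sum(weighted)
            resp.append([w / total for w in weighted])
        # M-step: proportions are the average responsibilities.
        pi = [sum(r[g] for r in resp) / len(resp) for g in range(n_genomes)]
    return pi, resp
```

With three reads that map only to genome 0 and one read that maps equally well to genomes 0 and 1, the ambiguous read is pulled toward genome 0 as the iterations proceed, because the uniquely mapped reads raise that genome's estimated proportion. This is the behavior that lets closely related strains be separated without extra alignment passes, in the spirit of the abstract.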
4.
Proc IPDPS (Conf) ; 2011: 435-443, 2011.
Article in English | MEDLINE | ID: mdl-23396612

ABSTRACT

Mapping short next-generation reads to reference genomes is an important element in SNP calling and expression studies. A major limitation to large-scale whole-genome mapping is the large memory requirement of the algorithm and the long run-time necessary for accurate studies. Several parallel implementations have been developed to distribute memory across different processors and to share the processing requirements equally. These approaches are compared with respect to their memory footprint, load balancing, and accuracy. When using MPI with multi-threading, linear speedup can be achieved for up to 256 processors.
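
The load-balancing idea can be sketched by splitting the reads evenly across workers and merging the results. This is a stand-in under stated assumptions: the paper uses MPI with multi-threading, whereas this sketch uses Python's standard `concurrent.futures` thread pool purely to show the partitioning, and the toy exact-match mapper and all function names are hypothetical. Python threads will not give the CPU speedup reported in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Round-robin split so part sizes differ by at most one."""
    return [items[i::n_parts] for i in range(n_parts)]


def map_chunk(reads, reference):
    """Toy mapper: exact-match offset of each read, or -1 if absent."""
    return {read: reference.find(read) for read in reads}


def parallel_map(reads, reference, n_workers=4):
    """Map reads in n_workers chunks and merge the per-chunk results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(map_chunk, chunk, reference)
            for chunk in partition(reads, n_workers)
        ]
        for future in futures:
            merged.update(future.result())
    return merged
```

Here the reads are partitioned while every worker shares the full reference. Distributing the reference itself, which is what reduces the per-processor memory footprint, would need a different split and is not shown.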

5.
Bioinformatics ; 26(1): 38-45, 2010 Jan 01.
Article in English | MEDLINE | ID: mdl-19861355

ABSTRACT

MOTIVATION: The advent of next-generation sequencing technologies has increased the accuracy and quantity of sequence data, opening the door to greater opportunities in genomic research. RESULTS: In this article, we present GNUMAP (Genomic Next-generation Universal MAPper), a program capable of overcoming two major obstacles in the mapping of reads from next-generation sequencing runs. First, we have created an algorithm that probabilistically maps reads to repeat regions in the genome on a quantitative basis. Second, we have developed a probabilistic Needleman-Wunsch algorithm which utilizes _prb.txt and _int.txt files produced in the Solexa/Illumina pipeline to improve the mapping accuracy for lower quality reads and increase the amount of usable data produced in a given experiment. AVAILABILITY: The source code for the software can be downloaded from http://dna.cs.byu.edu/gnumap.


Subject(s)
Algorithms , Chromosome Mapping/methods , DNA/genetics , Sequence Analysis, DNA/methods , Software , Base Sequence , Data Interpretation, Statistical , Molecular Sequence Data
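
One way to picture a probabilistic Needleman-Wunsch is to replace each called base with a probability vector over A, C, G, and T and to score a cell by its expected match score. The sketch below does that for a global alignment. The scoring values, the input format, and the function names are assumptions for illustration and are not taken from GNUMAP's source code.

```python
BASES = "ACGT"


def expected_score(probs, ref_base, match=1.0, mismatch=-1.0):
    """Expected substitution score for one read position.

    probs: probabilities for A, C, G, T at this read position.
    """
    return sum(
        p * (match if base == ref_base else mismatch)
        for base, p in zip(BASES, probs)
    )


def probabilistic_nw(read_probs, reference, gap=-2.0):
    """Global alignment score of a probabilistic read against a reference."""
    rows, cols = len(read_probs) + 1, len(reference) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + expected_score(
                read_probs[i - 1], reference[j - 1]
            )
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]
```

With these scores, a read whose middle base is equally likely to be C or T scores the same against `ACG` and `ATG`, whereas a hard call of C would be penalized against `ATG`. That is the sense in which per-base probabilities can rescue lower-quality reads.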