Search | Secretaria de Estado da Saúde

Langdon, W B; Lam, Brian Yee Hong.

BioData Min ; 10: 28, 2017.

Article in English | MEDLINE | ID: mdl-28785314

ABSTRACT

BACKGROUND: BarraCUDA is an open source C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. Recently its source code was optimised using "Genetic Improvement". RESULTS: The genetically improved (GI) code is up to three times faster on short paired end reads from The 1000 Genomes Project and 60% more accurate on a short BioPlanet.com GCAT alignment benchmark. GPGPU BarraCUDA running on a single K80 Tesla GPU can align short paired end nextGen sequences up to ten times faster than bwa on a 12 core server. CONCLUSIONS: The speed up was such that the GI version was adopted and has been regularly downloaded from SourceForge for more than 12 months.

Performance of genetic programming optimised Bowtie2 on genome comparison and analytic testing (GCAT) benchmarks.

Langdon, W B.

BioData Min ; 8(1): 1, 2015.

Article in English | MEDLINE | ID: mdl-25621011

ABSTRACT

BACKGROUND: Genetic studies are increasingly based on short noisy next generation scanners. Typically complete DNA sequences are assembled by matching short NextGen sequences against reference genomes. Despite considerable algorithmic gains since the turn of the millennium, matching both single ended and paired end strings to a reference remains computationally demanding. Further, tailoring Bioinformatics tools to each new task or scanner remains highly skilled and labour intensive. With this in mind, we recently demonstrated a genetic programming based automated technique which generated a version of the state-of-the-art alignment tool Bowtie2 which was considerably faster on short sequences produced by a scanner at the Broad Institute and released as part of The Thousand Genome Project. RESULTS: Bowtie2GP and the original Bowtie2 release were compared on bioplanet's GCAT synthetic benchmarks. Bowtie2GP enhancements were also applied to the latest Bowtie2 release (2.2.3, 29 May 2014) and retained both the GP and the manually introduced improvements. CONCLUSIONS: On both single ended and paired end synthetic next generation DNA sequence GCAT benchmarks Bowtie2GP runs up to 45% faster than Bowtie2. The loss in accuracy can be as little as 0.2-0.5% but up to 2.5% for longer sequences.

A survey of spatial defects in Homo Sapiens Affymetrix GeneChips.

Langdon, W B; Upton, G J G; da Silva Camargo, Renata; Harrison, Andrew P.

IEEE/ACM Trans Comput Biol Bioinform ; 7(4): 647-53, 2010.

Article in English | MEDLINE | ID: mdl-21030732

ABSTRACT

Modern biology has moved from a science of individual measurements to a science where data are collected on an industrial scale. Foremost among the new tools for biochemistry are chip arrays which, in one operation, measure hundreds of thousands or even millions of DNA sequences or RNA transcripts. While this is impressive, increasingly sophisticated analysis tools have been required to convert gene array data into gene expression levels. Despite the assumption that noise levels are low, since the number of measurements for an individual gene is small, identifying which signals are affected by noise is a priority. High-density oligonucleotide arrays (HDONAs) from NCBI GEO show that, even in the best Human GeneChips, 1/4 percent of data are affected by spatial noise. Earlier designs are noisier and spatial defects may affect more than 25 percent of probes. BioConductor R code is available as supplementary material which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.108 and via http://bioinformatics.essex.ac.uk/users/wlangdon/TCBB-2007-11-0161.tar.gz.

Subjects

Oligonucleotide Array Sequence Analysis; Sequence Analysis, DNA/methods; Data Collection; Databases, Genetic; Humans

Scaling of program fitness spaces.

Langdon, W B.

Evol Comput ; 7(4): 399-428, 1999.

Article in English | MEDLINE | ID: mdl-10578029

ABSTRACT

We investigate the distribution of fitness of programs, concentrating on those represented as parse trees and, particularly, how such distributions scale with respect to changes in the size of the programs. By using a combination of enumeration and Monte Carlo sampling on a large number of problems from three very different areas, we suggest that, in general, once some minimum size threshold has been exceeded, the distribution of performance is approximately independent of program length. We prove this for both linear programs and simple side effect free parse trees. We give the density of solutions to the parity problems in program trees which are composed of XOR building blocks. Limited experiments with programs including side effects and iteration suggest a similar result may also hold for this wider class of programs.

Subjects

Algorithms; Models, Genetic; Computer Simulation; Linear Models; Models, Statistical; Monte Carlo Method; Regression Analysis; Stochastic Processes
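
The claim in the abstract above about XOR-only trees can be illustrated with a small calculation. This is a sketch under stated assumptions, not the paper's own method: it assumes each leaf is drawn uniformly at random from the n inputs, and it counts a tree as a solution when it computes the XOR of all n inputs (the odd-parity function). Because XOR is associative and commutative, only how often each input appears matters, so the tree shape can be ignored. The function name xor_parity_density is hypothetical.

```python
from fractions import Fraction


def xor_parity_density(n_inputs: int, n_leaves: int) -> Fraction:
    """Probability that a random XOR-only tree computes parity of all inputs.

    Assumption (not from the paper): every leaf is an input chosen
    uniformly at random. The tree equals x1 XOR ... XOR xn exactly when
    every input occurs an odd number of times among the leaves.
    """
    # dist[j] = probability that exactly j inputs occur an odd number of times
    dist = [Fraction(0)] * (n_inputs + 1)
    dist[0] = Fraction(1)
    for _ in range(n_leaves):
        nxt = [Fraction(0)] * (n_inputs + 1)
        for j, p in enumerate(dist):
            if p == 0:
                continue
            if j > 0:
                # new leaf repeats an odd-count input, making its count even
                nxt[j - 1] += p * Fraction(j, n_inputs)
            if j < n_inputs:
                # new leaf hits an even-count input, making its count odd
                nxt[j + 1] += p * Fraction(n_inputs - j, n_inputs)
        dist = nxt
    return dist[n_inputs]


if __name__ == "__main__":
    # Density for 3 inputs as the number of leaves grows
    for leaves in (3, 5, 9, 17, 33):
        print(leaves, float(xor_parity_density(3, leaves)))
```

Under these assumptions the density is exactly zero when the leaf count and n differ in parity, and otherwise settles towards 2^(1-n) as the tree grows, which is in line with the abstract's observation that the distribution becomes approximately independent of program length beyond a size threshold.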

Schema theory for genetic programming with one-point crossover and point mutation.

Poli, R; Langdon, W B.

Evol Comput ; 6(3): 231-52, 1998.

Article in English | MEDLINE | ID: mdl-10021748

ABSTRACT

We review the main results obtained in the theory of schemata in genetic programming (GP), emphasizing their strengths and weaknesses. Then we propose a new, simpler definition of the concept of schema for GP, which is closer to the original concept of schema in genetic algorithms (GAs). Along with a new form of crossover, one-point crossover, and point mutation, this concept of schema has been used to derive an improved schema theorem for GP that describes the propagation of schemata from one generation to the next. We discuss this result and show that our schema theorem is the natural counterpart for GP of the schema theorem for GAs, to which it asymptotically converges.

Subjects

Algorithms; Crossing Over, Genetic; Point Mutation; Child; Fathers; Humans; Mothers
