Search | Secretaria de Estado da Saúde (State Health Department)

Fur: Find unique genomic regions for diagnostic PCR.

Haubold, Bernhard; Klötzl, Fabian; Hellberg, Lars; Thompson, Daniel; Cavalar, Markus.

Bioinformatics ; 37(15): 2081-2087, 2021 Aug 09.

Article in English | MEDLINE | ID: mdl-33515232

ABSTRACT

MOTIVATION: Unique marker sequences are highly sought after in molecular diagnostics. Nevertheless, only a few programs are available to search for marker sequences, compared with the many programs for similarity search. We therefore wrote the program Fur, for Finding Unique genomic Regions. RESULTS: Fur takes as input a sample of target sequences and a sample of closely related neighbors. It returns the regions present in all targets and absent from all neighbors. The recently published program genmap can also be used for this purpose, and we compared it to Fur. When analyzing a sample of 33 genomes representing the major phylogroups of E. coli, Fur was 40 times faster than genmap but used three times as much memory. On the other hand, genmap yielded three times as many markers, but they were less accurate when tested in silico on a sample of 237 E. coli genomes. We also designed phylogroup-specific PCR primers based on the markers proposed by genmap and Fur, and tested them by analyzing their virtual amplicons in GenBank. Finally, we used Fur to design primers specific to a Lactobacillus species, and found excellent sensitivity and specificity in vitro. AVAILABILITY AND IMPLEMENTATION: Fur sources and documentation are available from https://github.com/evolbioinf/fur. The compiled software is posted as a docker container at https://hub.docker.com/r/haubold/fox. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
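The core definition above (regions present in all targets and absent from all neighbors) can be illustrated with exact k-mers. The following is a minimal toy sketch, not Fur's algorithm: it assumes exact matching on the forward strand only, and the names `kmers` and `unique_regions` and the parameter `k` are hypothetical. It reports half-open intervals on the first target that are covered by qualifying k-mers.

```python
"""Toy illustration of marker discovery with exact k-mers (not Fur's method)."""


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers in seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_regions(targets: list[str], neighbors: list[str], k: int) -> list[tuple[int, int]]:
    """Return half-open intervals [start, end) on targets[0] covered by
    k-mers that occur in every target and in no neighbor."""
    shared = kmers(targets[0], k)
    for t in targets[1:]:
        shared &= kmers(t, k)      # keep k-mers present in all targets
    for n in neighbors:
        shared -= kmers(n, k)      # drop k-mers seen in any neighbor
    ref = targets[0]
    covered = [False] * len(ref)
    for i in range(len(ref) - k + 1):
        if ref[i:i + k] in shared:
            for j in range(i, i + k):
                covered[j] = True
    # Merge runs of covered positions into intervals.
    regions: list[tuple[int, int]] = []
    start = None
    for i, c in enumerate(covered + [False]):
        if c and start is None:
            start = i
        elif not c and start is not None:
            regions.append((start, i))
            start = None
    return regions


print(unique_regions(["ACGTTGCA", "TTACGTTGCATT"], ["ACGTAAAA"], 4))  # [(1, 8)]
```

The set-based approach keeps every neighbor's k-mers in memory at once, which is exactly the scaling issue the later version of Fur addresses.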

Phylonium: fast estimation of evolutionary distances from large samples of similar genomes.

Klötzl, Fabian; Haubold, Bernhard.

Bioinformatics ; 36(7): 2040-2046, 2020 04 01.

Article in English | MEDLINE | ID: mdl-31790149

ABSTRACT

MOTIVATION: Tracking disease outbreaks by whole-genome sequencing leads to the collection of large samples of closely related sequences. Five years ago, we published a method to accurately compute all pairwise distances for such samples by indexing each sequence. Since indexing is slow, we now ask whether it is possible to achieve similar accuracy when indexing only a single sequence. RESULTS: We have implemented this idea in the program phylonium and show that it is as accurate as its predecessor and roughly 100 times faster when applied to all 2678 Escherichia coli genomes contained in ENSEMBL. One of the best published programs for rapidly computing pairwise distances, mash, analyzes the same dataset four times faster but, with default settings, it is less accurate than phylonium. AVAILABILITY AND IMPLEMENTATION: Phylonium runs under the UNIX command line; its C++ sources and documentation are available from github.com/evolbioinf/phylonium. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genomics , Software , Algorithms , Genome , Sequence Analysis, DNA
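The idea of indexing only a single sequence can be sketched as follows: every genome is aligned once against one reference, and pairwise comparisons are then made between these projections in reference coordinates. This is a simplified sketch under stated assumptions, not phylonium's procedure: it assumes the projections are already computed as strings of reference length with `-` at uncovered positions, and the names `pair_mismatch` and `all_pairs` are hypothetical. It returns uncorrected mismatch proportions.

```python
from itertools import combinations


def pair_mismatch(a: str, b: str) -> float:
    """Mismatch proportion over reference positions covered by both
    projections; '-' marks a position not covered by a query."""
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        mismatches += x != y
    if compared == 0:
        raise ValueError("projections share no covered positions")
    return mismatches / compared


def all_pairs(projections: dict[str, str]) -> dict[tuple[str, str], float]:
    """All pairwise mismatch proportions from reference-based projections."""
    return {
        (p, q): pair_mismatch(projections[p], projections[q])
        for p, q in combinations(sorted(projections), 2)
    }


proj = {"ref": "ACGTACGTAC", "s1": "ACGTACGAAC", "s2": "ACG-ACGTTC"}
for pair, d in all_pairs(proj).items():
    print(pair, round(d, 4))
```

Only one index is needed for n genomes instead of one per genome; in this sketch, sequence shared by two queries but absent from the reference is simply ignored.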

hotspot: software to support sperm-typing for investigating recombination hotspots.

Odenthal-Hesse, Linda; Dutheil, Julien Y; Klötzl, Fabian; Haubold, Bernhard.

Bioinformatics ; 32(16): 2554-5, 2016 08 15.

Article in English | MEDLINE | ID: mdl-27153632

ABSTRACT

MOTIVATION: In many organisms, including humans, recombination clusters within recombination hotspots. The standard method for de novo detection of recombinants at hotspots is sperm typing. This relies on allele-specific PCR at single nucleotide polymorphisms. Designing allele-specific primers by hand is time-consuming. We have therefore written a package to support hotspot detection and analysis. RESULTS: hotspot consists of four programs: asp looks up SNPs and designs allele-specific primers; aso constructs allele-specific oligos for mapping recombinants; xov implements a maximum-likelihood method for estimating the crossover rate; six, finally, simulates typing data. AVAILABILITY AND IMPLEMENTATION: hotspot is written in C. Sources are freely available under the GNU General Public License from http://github.com/evolbioinf/hotspot/. CONTACT: haubold@evolbio.mpg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Recombination, Genetic , Software , Spermatozoa , Alleles , Humans , Likelihood Functions , Male
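Allele-specific PCR works because a primer whose 3' terminal base matches only one SNP allele extends efficiently only on templates carrying that allele. The sketch below shows just this placement principle; it is not asp's design procedure, whose rules (for example primer length or melting temperature criteria) are not described here. The names `allele_specific_forward`, `allele_specific_reverse` and the default `length=20` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def allele_specific_forward(seq: str, snp_pos: int, allele: str, length: int = 20) -> str:
    """Forward primer whose 3' terminal base is the chosen allele at snp_pos."""
    if snp_pos + 1 < length:
        raise ValueError("not enough upstream sequence")
    return seq[snp_pos + 1 - length:snp_pos] + allele


def allele_specific_reverse(seq: str, snp_pos: int, allele: str, length: int = 20) -> str:
    """Reverse primer whose 3' terminal base pairs with the allele at snp_pos."""
    if snp_pos + length > len(seq):
        raise ValueError("not enough downstream sequence")
    return revcomp(allele + seq[snp_pos + 1:snp_pos + length])


template = "AAACCCGGGTTT"  # hypothetical template with a G/A SNP at index 6
print(allele_specific_forward(template, 6, "G", 5))  # ACCCG
print(allele_specific_forward(template, 6, "A", 5))  # ACCCA
print(allele_specific_reverse(template, 6, "A", 5))  # AACCT
```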

andi: fast and accurate estimation of evolutionary distances between closely related genomes.

Haubold, Bernhard; Klötzl, Fabian; Pfaffelhuber, Peter.

Bioinformatics ; 31(8): 1169-75, 2015 Apr 15.

Article in English | MEDLINE | ID: mdl-25504847

ABSTRACT

MOTIVATION: A standard approach to classifying sets of genomes is to calculate their pairwise distances. This is difficult for large samples. We have therefore developed an algorithm for rapidly computing the evolutionary distances between closely related genomes. RESULTS: Our distance measure is based on ungapped local alignments that we anchor through pairs of maximal unique matches of a minimum length. These exact matches can be looked up efficiently using enhanced suffix arrays, and our implementation requires only about 1 s and 45 MB of RAM per Mbase analysed. The pairing of matches distinguishes non-homologous from homologous regions, leading to accurate distance estimation. We show this by analysing simulated data and genome samples ranging from 29 Escherichia coli/Shigella genomes to 3085 genomes of Streptococcus pneumoniae. AVAILABILITY AND IMPLEMENTATION: We have implemented the computation of anchor distances in the multithreaded UNIX command-line program andi, for ANchor DIstances. C sources and documentation are posted at http://github.com/evolbioinf/andi/. CONTACT: haubold@evolbio.mpg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Biological Evolution , Genome , Genomics/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Animals , Databases, Genetic , Humans , Phylogeny
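Once anchor pairs delimit homologous, ungapped stretches, a distance follows from counting mismatches in those stretches and correcting for multiple substitutions. The sketch below assumes the ungapped segments are already known as hypothetical `(start_a, start_b, length)` triples (finding them with enhanced suffix arrays is not shown) and assumes a Jukes-Cantor correction; the function names are illustrative, not andi's.

```python
import math


def segment_mismatches(a: str, b: str, segments: list[tuple[int, int, int]]) -> tuple[int, int]:
    """Count mismatches and compared sites over ungapped homologous segments,
    each given as (start in a, start in b, length)."""
    compared = mismatches = 0
    for sa, sb, length in segments:
        for x, y in zip(a[sa:sa + length], b[sb:sb + length]):
            compared += 1
            mismatches += x != y
    return mismatches, compared


def jukes_cantor(p: float) -> float:
    """Substitutions per site from mismatch proportion p (Jukes-Cantor)."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def anchor_distance_sketch(a: str, b: str, segments: list[tuple[int, int, int]]) -> float:
    m, n = segment_mismatches(a, b, segments)
    return jukes_cantor(m / n)


# One 10-bp segment, offset by 2 in b, containing a single mismatch.
print(round(anchor_distance_sketch("ACGTACGTAC", "GGACGTTCGTAC", [(0, 2, 10)]), 4))  # 0.1073
```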

Marker discovery in the large.

Vieira Mourato, Beatriz; Tsers, Ivan; Denker, Svenja; Klötzl, Fabian; Haubold, Bernhard.

Bioinform Adv ; 4(1): vbae113, 2024.

Article in English | MEDLINE | ID: mdl-39132289

ABSTRACT

Motivation: Markers for diagnostic polymerase chain reactions are routinely constructed by taking regions common to the genomes of a target organism and subtracting the regions found in the targets' closest relatives, their neighbors. This approach is implemented in the published package Fur, which originally required memory proportional to the number of nucleotides in the neighborhood. This does not scale well. Results: Here, we describe a new version of Fur that only requires memory proportional to the longest neighbor. In spite of its greater memory efficiency, the new Fur remains fast and is accurate. We demonstrate this by applying it to simulated sequences and comparing it to an efficient alternative. Then we use the new Fur to extract markers from 120 reference bacteria. To make this feasible, we also introduce software for automatically finding target and neighbor genomes and for assessing markers. We pick the best primers from the 10 most sequenced reference bacteria and show their excellent in silico sensitivity and specificity. Availability and implementation: Fur is available from github.com/evolbioinf/fur, in the Docker image hub.docker.com/r/beatrizvm/mapro, and in the Code Ocean capsule 10.24433/CO.7955947.v1.
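Memory proportional to the longest neighbor, rather than to the whole neighborhood, is possible when neighbors are read and subtracted one at a time. The sketch below illustrates that streaming idea with exact k-mers; it is an assumption-laden toy, not the new Fur's implementation, and the names `read_fasta` and `subtract_neighbors` are hypothetical. Only the candidate set and a single neighbor sequence are held in memory at any moment.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield one sequence at a time from FASTA-formatted lines."""
    chunks: list[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        yield "".join(chunks)


def subtract_neighbors(candidates: set[str], neighbors: Iterable[str], k: int) -> set[str]:
    """Remove every k-mer found in any neighbor, one neighbor at a time."""
    remaining = set(candidates)
    for seq in neighbors:                  # a single neighbor is in memory here
        for i in range(len(seq) - k + 1):
            remaining.discard(seq[i:i + k])
        if not remaining:                  # nothing left to test
            break
    return remaining


fasta = [">n1", "ACG", "TAA", ">n2", "GGTTTTG"]
print(subtract_neighbors({"ACGT", "CGTT", "TTTT"}, read_fasta(fasta), 4))  # {'CGTT'}
```

In practice the lines would come from an open file handle, so neighbors never need to be loaded together.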

Fast Phylogeny Reconstruction from Genomes of Closely Related Microbes.

Haubold, Bernhard; Klötzl, Fabian.

Methods Mol Biol ; 2242: 77-89, 2021.

Article in English | MEDLINE | ID: mdl-33961219

ABSTRACT

By tracking pathogen outbreaks using whole genome sequencing, medical microbiology is currently being transformed into genomic epidemiology. This change in technology is leading to the rapid accumulation of large samples of closely related genome sequences. Summarizing such samples into phylogenies can be computationally challenging. Our program andi quickly computes accurate pairwise distances between up to thousands of bacterial genomes. Working under the UNIX command line, we show how andi can be used to transform genomes to phylogenies with support values ready to be printed or integrated into documents.

Subjects

DNA, Bacterial/genetics , Escherichia coli/genetics , Genome, Bacterial , Genomics , Phylogeny , Shigella/genetics , Databases, Genetic , Research Design , Software Design , Workflow
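Turning a matrix of pairwise distances into a phylogeny is commonly done with neighbor joining. The following is a minimal, self-contained neighbor-joining sketch that writes an unrooted Newick tree; it is an illustration of the general technique, and the tree-building tool used in the chapter's own workflow is not assumed here. The function name `neighbor_joining` is hypothetical, at least three taxa are required, and branch lengths can come out negative for non-additive data.

```python
def neighbor_joining(names: list[str], d: list[list[float]]) -> str:
    """Return an unrooted neighbor-joining tree in Newick format."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    labels = list(names)
    D = [row[:] for row in d]
    while len(labels) > 3:
        n = len(labels)
        r = [sum(row) for row in D]
        # Q-criterion: join the pair minimizing (n - 2) * d(i, j) - r_i - r_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        li = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        joined = f"({labels[i]}:{li:.4g},{labels[j]}:{lj:.4g})"
        new_row = [0.5 * (D[i][k] + D[j][k] - D[i][j]) for k in range(n)]
        keep = [k for k in range(n) if k not in (i, j)]
        D = [[D[a][b] for b in keep] + [new_row[a]] for a in keep] + \
            [[new_row[b] for b in keep] + [0.0]]
        labels = [labels[k] for k in keep] + [joined]
    # Join the last three nodes at a central vertex.
    a, b, c = labels
    la = (D[0][1] + D[0][2] - D[1][2]) / 2
    lb = (D[0][1] + D[1][2] - D[0][2]) / 2
    lc = (D[0][2] + D[1][2] - D[0][1]) / 2
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"


taxa = ["A", "B", "C", "D"]
dist = [[0, 3, 5, 6], [3, 0, 6, 7], [5, 6, 0, 7], [6, 7, 7, 0]]
print(neighbor_joining(taxa, dist))  # (C:3,D:4,(A:1,B:2):1);
```

For an additive matrix like the example, neighbor joining recovers the generating tree and its branch lengths exactly.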

Support Values for Genome Phylogenies.

Klötzl, Fabian; Haubold, Bernhard.

Life (Basel) ; 6(1)2016 Mar 07.

Article in English | MEDLINE | ID: mdl-26959064

ABSTRACT

We have recently developed a distance metric for efficiently estimating the number of substitutions per site between unaligned genome sequences. These substitution rates are called "anchor distances" and can be used for phylogeny reconstruction. Most phylogenies come with bootstrap support values, which are computed by resampling, with replacement, the columns of homologous residues in the original alignment. Unfortunately, this method cannot be applied to anchor distances, as they are based on approximate pairwise local alignments rather than the full multiple sequence alignment necessary for the classical bootstrap. We explore two alternatives, pairwise bootstrap and quartet analysis, which we compare to the classical bootstrap. With simulated sequences and 53 human primate mitochondrial genomes, pairwise bootstrap gives better results than quartet analysis. However, when applied to 29 E. coli genomes, quartet analysis comes closer to the classical bootstrap.
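A pairwise bootstrap can be pictured as resampling the homologous sites of each pairwise comparison separately, instead of the columns of one multiple alignment. The sketch below assumes that reading and represents one pair's homologous sites as a list of mismatch flags; the names `pairwise_bootstrap` and `jukes_cantor`, the fixed `seed`, and the Jukes-Cantor correction are all assumptions for illustration, not the paper's exact procedure.

```python
import math
import random


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor substitutions per site; infinite once saturated."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def pairwise_bootstrap(mismatch: list[bool], replicates: int, seed: int = 1) -> list[float]:
    """Resample one pair's homologous sites with replacement and return
    one corrected distance per replicate."""
    rng = random.Random(seed)
    n = len(mismatch)
    distances = []
    for _ in range(replicates):
        sample = rng.choices(mismatch, k=n)   # draw n sites with replacement
        distances.append(jukes_cantor(sum(sample) / n))
    return distances


sites = [True] * 5 + [False] * 45   # 50 homologous sites, 5 mismatches
reps = pairwise_bootstrap(sites, 100, seed=7)
print(min(reps), max(reps))
```

Repeating this for every pair yields one distance matrix per replicate; building a tree from each matrix and counting how often each clade recurs would then give support values.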
