Removing contaminants from databases of draft genomes.
Lu, Jennifer; Salzberg, Steven L.
Affiliation
  • Lu J; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America.
  • Salzberg SL; Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, United States of America.
PLoS Comput Biol; 14(6): e1006277, June 2018.
Article in English | MEDLINE | ID: mdl-29939994
ABSTRACT
Metagenomic sequencing of patient samples is a very promising method for the diagnosis of human infections. Sequencing can capture all the DNA or RNA from pathogenic organisms in a human sample. However, complete and accurate characterization of the sequence, including identification of any pathogens, depends on the availability and quality of genomes for comparison. Thousands of genomes are now available, and as these numbers grow, the power of metagenomic sequencing for diagnosis should increase. Recent studies, however, have exposed contamination in published genomes, which, when used for diagnosis, increases the risk of identifying the wrong pathogen. To address this problem, we have developed a bioinformatics system for eliminating contamination as well as low-complexity genomic sequences in the draft genomes of eukaryotic pathogens. We applied this software to identify and remove human, bacterial, archaeal, and viral sequences present in a comprehensive database of all sequenced eukaryotic pathogen genomes. We also removed low-complexity genomic sequences, another source of false positives. Using this pipeline, we have produced a database of "clean" eukaryotic pathogen genomes for use with bioinformatics classification and analysis tools. We demonstrate that when attempting to find eukaryotic pathogens in metagenomic samples, the new database provides better sensitivity than one using the original genomes while offering a dramatic reduction in false positives.
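The abstract does not say how low-complexity sequence was detected. A common approach is a DUST-style score, the family of method behind tools such as NCBI dustmasker. In that approach, windows in which a few nucleotide triplets repeat many times score high and are masked. The Python sketch below illustrates the idea only and is not the authors' pipeline. The function names `dust_score` and `mask_low_complexity`, the 64-base non-overlapping windows, and the threshold of 20 are all hypothetical, illustrative choices.

```python
from collections import Counter


def dust_score(window: str) -> float:
    """Simplified DUST-style score (hypothetical helper): repeated triplets raise the score."""
    window = window.upper()
    n_triplets = len(window) - 2
    if n_triplets < 2:
        # Too short to score meaningfully; treat as complex.
        return 0.0
    counts = Counter(window[i:i + 3] for i in range(n_triplets))
    # Sum of c*(c-1)/2 over distinct triplets, normalised by (number of triplets - 1).
    return sum(c * (c - 1) / 2 for c in counts.values()) / (n_triplets - 1)


def mask_low_complexity(seq: str, window: int = 64, threshold: float = 20.0) -> str:
    """Replace each non-overlapping window scoring above `threshold` with 'N'.

    `window` and `threshold` are assumed, illustrative values, not parameters from the paper.
    """
    out = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        out.append("N" * len(chunk) if dust_score(chunk) > threshold else chunk)
    return "".join(out)


if __name__ == "__main__":
    demo = "ACGT" * 16 + "A" * 64
    # The homopolymer run scores 31.0 and is masked; the ACGT repeat scores about 7.4 and is kept.
    print(mask_low_complexity(demo))
```

Masked bases are replaced with `N` rather than deleted, so the coordinates of the surrounding sequence are preserved. A production tool would also score overlapping windows and merge adjacent masked intervals.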
Subjects
Database: MEDLINE | Main subject: Sequence Analysis, DNA / Computational Biology / Metagenomics | Language: English | Publication year: 2018 | Document type: Article | Country of affiliation: United States