Consensus assessment of the contamination level of publicly available cyanobacterial genomes.
Cornet, Luc; Meunier, Loïc; Van Vlierberghe, Mick; Léonard, Raphaël R; Durieu, Benoit; Lara, Yannick; Misztak, Agnieszka; Sirjacobs, Damien; Javaux, Emmanuelle J; Philippe, Hervé; Wilmotte, Annick; Baurain, Denis.
Affiliation
  • Cornet L; InBioS-PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium.
  • Meunier L; UR Geology-Palaeobiogeology-Palaeobotany-Palaeopalynology, University of Liège, Liège, Belgium.
  • Van Vlierberghe M; InBioS-PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium.
  • Léonard RR; InBioS-PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium.
  • Durieu B; InBioS-PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium.
  • Lara Y; InBioS-CIP, Macromolecular Crystallography, University of Liège, Liège, Belgium.
  • Misztak A; InBioS-CIP, Centre for Protein Engineering, University of Liège, Liège, Belgium.
  • Sirjacobs D; InBioS-CIP, Centre for Protein Engineering, University of Liège, Liège, Belgium.
  • Javaux EJ; InBioS-PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium.
  • Philippe H; Intercollegiate Faculty of Biotechnology UG-MUG, Gdansk, Poland.
  • Wilmotte A; InBioS-PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium.
  • Baurain D; UR Geology-Palaeobiogeology-Palaeobotany-Palaeopalynology, University of Liège, Liège, Belgium.
PLoS One; 13(7): e0200323, 2018.
Article in English | MEDLINE | ID: mdl-30044797
ABSTRACT
Publicly available genomes are crucial for phylogenetic and metagenomic studies, in which contaminating sequences can cause major problems. This issue is expected to be especially important for Cyanobacteria because axenic strains are notoriously difficult to obtain and keep in culture. Yet, despite their great scientific interest, no data are currently available concerning the quality of publicly available cyanobacterial genomes. As reliably detecting contaminants is a complex task, we designed a pipeline combining six methods in a consensus strategy to assess the contamination level of 440 genome assemblies of Cyanobacteria. Two methods are based on published reference databases of ribosomal genes (SSU rRNA 16S and ribosomal proteins), one is indirectly based on a reference database of marker genes (CheckM), and three are based on complete genome analysis. Among those genome-wide methods, Kraken and DIAMOND blastx share the same reference database that we derived from Ensembl Bacteria, whereas CONCOCT does not require any reference database, instead relying on differences in DNA tetramer frequencies. Given that all six methods appear to have their own strengths and limitations, we used the consensus of their rankings to infer that >5% of cyanobacterial genome assemblies are highly contaminated by foreign DNA (i.e., contaminants were detected by 5 or 6 methods). Our results will help researchers to check the quality of publicly available genomic data before use in their own analyses. Moreover, we argue that journals should make mandatory the submission of raw read data along with genome assemblies in order to facilitate the detection of contaminants in sequence databases.
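The consensus criterion stated in the abstract (an assembly counts as highly contaminated when 5 or 6 of the six methods detect contaminants) can be illustrated with a minimal sketch. This is not the authors' pipeline: the method keys, the function names, and the assumption that each method's output has already been reduced to a yes/no detection flag are all hypothetical, and the abstract does not say how each method's ranking is converted into a detection.

```python
from typing import Dict, Mapping

# Hypothetical keys for the six methods named in the abstract.
METHODS = (
    "ssu_rrna_16s",
    "ribosomal_proteins",
    "checkm",
    "kraken",
    "diamond_blastx",
    "concoct",
)


def count_detections(flags: Mapping[str, bool]) -> int:
    """Count how many of the six methods flagged contaminants in one assembly."""
    missing = [m for m in METHODS if m not in flags]
    if missing:
        raise ValueError(f"missing results for methods: {missing}")
    return sum(1 for m in METHODS if flags[m])


def is_highly_contaminated(flags: Mapping[str, bool], min_methods: int = 5) -> bool:
    """Apply the consensus rule: detection by at least `min_methods` methods."""
    return count_detections(flags) >= min_methods


def fraction_highly_contaminated(assemblies: Dict[str, Mapping[str, bool]]) -> float:
    """Fraction of assemblies that meet the consensus rule (0.0 for an empty input)."""
    if not assemblies:
        return 0.0
    flagged = sum(1 for flags in assemblies.values() if is_highly_contaminated(flags))
    return flagged / len(assemblies)
```

Requiring agreement between most methods, rather than trusting any single one, reflects the abstract's point that each method has its own strengths and limitations.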
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Genome, Bacterial / Cyanobacteria / DNA Contamination Study type: Guideline Language: English Journal: PLoS One Journal subject: SCIENCE / MEDICINE Year: 2018 Document type: Article Affiliation country: Belgium