ABSTRACT
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Subject(s)
Metagenomics, Software, Algorithms, Benchmarking, DNA Sequence Analysis
ABSTRACT
The predominance of rRNAs in the transcriptome is a major technical challenge in sequence-based analysis of cDNAs from microbial isolates and communities. Several approaches have been applied to deplete rRNAs from (meta)transcriptomes, but no systematic investigation of potential biases introduced by any of these approaches has been reported. Here we validated the effectiveness and fidelity of the two most commonly used approaches, subtractive hybridization and exonuclease digestion, as well as combinations of these treatments, on two synthetic five-microorganism metatranscriptomes using massively parallel sequencing. We found that the effectiveness of rRNA removal was a function of community composition and RNA integrity for these treatments. Subtractive hybridization alone introduced the least bias in relative transcript abundance, whereas exonuclease and in particular combined treatments greatly compromised mRNA abundance fidelity. Illumina sequencing itself also can compromise quantitative data analysis by introducing a G+C bias between runs.
Subject(s)
Bacteria/classification, Euryarchaeota/classification, Gene Expression Profiling/methods, Archaeal RNA/genetics, Bacterial RNA/genetics, Messenger RNA/genetics, Ribosomal RNA/genetics, Bacteria/genetics, Archaeal DNA/genetics, Bacterial DNA/genetics, Euryarchaeota/genetics, Exonucleases/metabolism, In Situ Hybridization, 16S Ribosomal RNA/genetics, 23S Ribosomal RNA/genetics, Reproducibility of Results, Sequence Alignment, RNA Sequence Analysis
ABSTRACT
Fosmid end sequencing has been widely utilized in genome sequence assemblies and genome structural variation studies. We have developed a new approach to construct fosmid paired-end libraries that is suitable for the Illumina sequencing platform. This approach employs a newly modified fosmid vector (pFosClip), which contains two loxP sites with identical orientation and two inverse Illumina adaptor priming sites flanking the cloning site. DNA prepared from the fosmid library constructed with pFosClip can be treated with the Cre recombinase to remove most of the vector DNA, leaving only 107 bp of the vector sequence with insert DNA. Frequent-cutting restriction enzymes and ligase are used to digest the fosmid DNA to small (less than 1 kb) fragments and recircularize the fosmid ends and all the internal fragments. Finally, an inverse PCR step with the Illumina primers is used to enrich the fosmid paired ends (PEs) for sequencing. The advantages of this approach are the following: (1) the circularization of short fragments with sticky ends is efficient, so the success rate is higher than that of other approaches that attempt to join both blunt ends of large fosmid vectors; (2) the restriction enzyme cutting generates an identifiable junction tag for splitting the paired reads; and (3) multiple restriction enzymes can be used to overcome possible enzyme-cutting bias. Our results show that this approach produced mostly fosmid-size (30-40 kb) pairs from the targeted fungal and plant genomes and drastically increased the scaffold sizes in the assembled genomes.
Subject(s)
Gene Library, Fungal Genome, High-Throughput Nucleotide Sequencing/methods, Integrases/genetics, Genetic Recombination, Saccharomyces cerevisiae/genetics, Basidiomycota/genetics, Basidiomycota/metabolism, DNA Restriction Enzymes/metabolism, Circular DNA/genetics, Circular DNA/metabolism, Fungal DNA/genetics, Fungal DNA/metabolism, High-Throughput Nucleotide Sequencing/instrumentation, Integrases/metabolism, Plasmids/chemistry, Plasmids/metabolism, Polymerase Chain Reaction/methods, Saccharomyces cerevisiae/metabolism, DNA Sequence Analysis/methods
ABSTRACT
Large insert mate pair reads have been used in de novo assembly and discovery of structural variants. We developed a new approach, Cre-LoxP inverse PCR paired end (CLIP-PE), which exploits the advantages of (1) the Cre-LoxP recombination system to efficiently circularize large DNA fragments, (2) inverse PCR to enrich for the desired products that contain both ends of the large DNA fragments, and (3) the use of restriction enzymes to introduce a recognizable junction site between ligated fragment ends. We have successfully created CLIP-PE libraries with jumping pairs of up to 22 kb and demonstrated their ability to improve genome assemblies. The CLIP-PE methodology can be implemented with existing and future next-generation sequencing platforms.
Subject(s)
Gene Library, Fungal Genome, High-Throughput Nucleotide Sequencing/methods, Integrases/genetics, Genetic Recombination, Saccharomyces cerevisiae/genetics, DNA Restriction Enzymes/metabolism, Fungal DNA/genetics, Fungal DNA/metabolism, High-Throughput Nucleotide Sequencing/instrumentation, Integrases/metabolism, Plasmids/chemistry, Plasmids/metabolism, Polymerase Chain Reaction/methods, Saccharomyces cerevisiae/metabolism, DNA Sequence Analysis/methods
ABSTRACT
Large insert mate pair reads have a major impact on the overall success of de novo assembly and the discovery of inherited and acquired structural variants. The positional information of mate pair reads generally improves genome assembly by resolving repeat elements and/or ordering contigs. Currently available methods for building such libraries have one or more limitations, such as relatively small insert size, inability to distinguish the junction of the two ends, and/or low throughput. We developed a new approach, Cre-LoxP Inverse PCR Paired-End (CLIP-PE), which exploits the advantages of (1) the Cre-LoxP recombination system to efficiently circularize large DNA fragments, (2) inverse PCR to enrich for the desired products that contain both ends of the large DNA fragments, and (3) the use of restriction enzymes to introduce a recognizable junction site between ligated fragment ends and to improve the self-ligation efficiency. We have successfully created CLIP-PE libraries of up to 22 kb that are rich in informative read pairs and low in small-fragment background. These libraries have demonstrated the ability to improve genome assemblies. The CLIP-PE methodology can be implemented with existing and future next-generation sequencing platforms.
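
The recognizable junction site mentioned above is what lets a read spanning both ligated ends be separated into its two halves. The following is only a minimal sketch of that idea, not the authors' pipeline: the function name `split_at_junction`, the default site `CATG`, and the `min_len` threshold are all hypothetical choices made for illustration, since the abstracts do not name the enzymes or any software used.

```python
from typing import Optional, Tuple


def split_at_junction(
    read: str,
    junction: str = "CATG",  # hypothetical site; the source names no enzyme
    min_len: int = 20,       # hypothetical minimum usable end length
) -> Optional[Tuple[str, str]]:
    """Split a read at the first junction site that leaves two usable ends.

    Returns (left_end, right_end) with the junction removed, or None when
    no occurrence of the junction leaves at least min_len bases on each side.
    """
    start = read.find(junction)
    while start != -1:
        left = read[:start]
        right = read[start + len(junction):]
        if len(left) >= min_len and len(right) >= min_len:
            return left, right
        # Try the next occurrence; this one sits too close to a read edge.
        start = read.find(junction, start + 1)
    return None


if __name__ == "__main__":
    example = "A" * 25 + "CATG" + "C" * 25
    print(split_at_junction(example))
```

The sketch does not address read orientation; depending on how a library is constructed, one of the two ends may need to be reverse-complemented before mapping.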