ABSTRACT
Proksee (https://proksee.ca) provides users with a powerful, easy-to-use, and feature-rich system for assembling, annotating, analysing, and visualizing bacterial genomes. Proksee accepts Illumina sequence reads as compressed FASTQ files or pre-assembled contigs in raw, FASTA, or GenBank format. Alternatively, users can supply a GenBank accession or a previously generated Proksee map in JSON format. Proksee then performs assembly (for raw sequence data), generates a graphical map, and provides an interface for customizing the map and launching further analysis jobs. Notable features of Proksee include unique and informative assembly metrics provided via a custom reference database of assemblies; a deeply integrated high-performance genome browser for viewing and comparing analysis results at individual base resolution (developed specifically for Proksee); an ever-growing list of embedded analysis tools whose results can be seamlessly added to the map or searched and explored in other formats; and the option to export graphical maps, analysis results, and log files for data sharing and research reproducibility. All these features are provided via a carefully designed multi-server cloud-based system that can easily scale to meet user demand and that ensures the web server is robust and responsive.
Subject(s)
Genome, Bacterial , Software , Reproducibility of Results , Databases, Nucleic Acid , Internet
ABSTRACT
Hepatitis B virus (HBV) vaccination starting at birth is approximately 95% effective in preventing mother-to-child transmission to infants born to HBV-infected mothers. A higher risk of transmission is associated with birth to a highly viremic mother, often due to transplacental exposure, while later horizontal transmission is much less common, particularly following complete vaccination. This study reports a case of infection in an older child despite appropriate immunoprophylaxis starting at birth and an apparent protective immune response post-vaccination. Two immune escape mutations within the antigenic determinant of the surface antigen-coding region were observed in the child's dominant HBV sequence, whereas the maternal HBV variant lacked mutations at both sites. Ultra-deep sequencing confirmed the presence of 1 mutation at low levels within the maternal HBV quasispecies population, suggesting early exposure to the child followed by viral evolution resulting in immunoprophylaxis escape and chronic infection.
Subject(s)
Hepatitis B Vaccines/immunology , Hepatitis B virus/immunology , Hepatitis B/transmission , Immune Evasion/immunology , Mutation/immunology , Child, Preschool , Female , Hepatitis B/immunology , Hepatitis B/prevention & control , Hepatitis B Surface Antigens/genetics , Hepatitis B Surface Antigens/immunology , Hepatitis B virus/genetics , High-Throughput Nucleotide Sequencing , Humans , Infectious Disease Transmission, Vertical/prevention & control , Pregnancy , Pregnancy Complications, Infectious/immunology , Pregnancy Complications, Infectious/virology
ABSTRACT
The ready availability of vast amounts of genomic sequence data has created the need to rethink comparative genomics algorithms using 'big data' approaches. Neptune is an efficient system for rapidly locating differentially abundant genomic content in bacterial populations using an exact k-mer matching strategy, while accommodating k-mer mismatches. Neptune's loci discovery process identifies sequences that are sufficiently common to a group of target sequences and sufficiently absent from non-targets using probabilistic models. Neptune uses parallel computing to efficiently identify and extract these loci from draft genome assemblies without requiring multiple sequence alignments or other computationally expensive comparative sequence analyses. Tests on simulated and real datasets showed that Neptune rapidly identifies regions that are both sensitive and specific. We demonstrate that this system can identify trait-specific loci from different bacterial lineages. Neptune is broadly applicable for comparative bacterial analyses, yet will particularly benefit pathogenomic applications, owing to efficient and sensitive discovery of differentially abundant genomic loci. The software is available for download at: http://github.com/phac-nml/neptune.
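The core idea of locating sequence content that is common in targets and rare in non-targets can be illustrated with a minimal sketch. This is not Neptune's implementation: it replaces the probabilistic models with fixed, hypothetical fraction thresholds, ignores k-mer mismatches and parallelism, and the function names and example sequences are assumptions made for illustration only.

```python
from typing import Sequence, Set


def kmers(sequence: str, k: int) -> Set[str]:
    """Return the set of all k-length substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def candidate_kmers(
    targets: Sequence[str],
    non_targets: Sequence[str],
    k: int,
    min_target_fraction: float = 1.0,      # hypothetical threshold
    max_non_target_fraction: float = 0.0,  # hypothetical threshold
) -> Set[str]:
    """Select k-mers frequent in targets and infrequent in non-targets.

    Fixed thresholds stand in for a statistical model; this is a toy
    illustration of exact k-mer matching, not a published algorithm.
    """
    if not targets or not non_targets:
        raise ValueError("both groups need at least one sequence")

    target_sets = [kmers(s, k) for s in targets]
    non_target_sets = [kmers(s, k) for s in non_targets]

    selected = set()
    for kmer in set().union(*target_sets):
        # Fraction of sequences in each group that contain this exact k-mer.
        in_targets = sum(kmer in s for s in target_sets) / len(target_sets)
        in_non_targets = sum(kmer in s for s in non_target_sets) / len(non_target_sets)
        if in_targets >= min_target_fraction and in_non_targets <= max_non_target_fraction:
            selected.add(kmer)
    return selected


# Made-up toy sequences, not real genomes.
toy_targets = ["ACGTACGTTT", "GGACGTTTCC"]
toy_non_targets = ["ACGTACGAAA"]
print(sorted(candidate_kmers(toy_targets, toy_non_targets, k=4)))
```

Because only set membership of exact k-mers is compared, no sequence alignment is needed, which is the property the abstract highlights.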
Subject(s)
Bacteria/genetics , Computational Biology/methods , DNA Mutational Analysis/methods , Genetic Association Studies , Microbiological Techniques/methods , Sequence Analysis, DNA/methods , Software , Bacillus anthracis/genetics , Gene Expression Regulation, Bacterial , Genome, Bacterial , Transcriptome , Vibrio cholerae/genetics
ABSTRACT
BACKGROUND: Second-generation sequencers generate millions of relatively short, but error-prone, reads. These errors make sequence assembly and other downstream projects more challenging. Correcting these errors improves the quality of assemblies and of other projects that benefit from error-free reads. RESULTS: We have developed a general-purpose error corrector that corrects errors introduced by Illumina, Ion Torrent, and Roche 454 sequencing technologies and can be applied to single- or mixed-genome data. In addition to correcting substitution errors, we locate and correct insertion, deletion, and homopolymer errors while remaining sensitive to low coverage areas of sequencing projects. Using published data sets, we correct 94% of Illumina MiSeq errors, 88% of Ion Torrent PGM errors, and 85% of Roche 454 GS Junior errors. Introduced errors are 20 to 70 times rarer than successfully corrected errors. Furthermore, we show that the quality of assemblies improves when reads are corrected by our software. CONCLUSIONS: Pollux is highly effective at correcting errors across platforms, and is consistently able to perform as well as or better than currently available error correction software. Pollux provides general-purpose error correction and may be used in applications with or without assembly.
Subject(s)
Algorithms , Bacteria/genetics , Genome, Bacterial , High-Throughput Nucleotide Sequencing/instrumentation , Sequence Analysis, DNA/instrumentation , Sequence Analysis, DNA/methods , Software , Bacteria/classification , Computational Biology , DNA, Bacterial/analysis , Databases, Genetic
ABSTRACT
Introduction: Serum hepatitis B virus (HBV) RNA is a promising new biomarker to manage and predict clinical outcomes of chronic hepatitis B (CHB) infection. However, the HBV serum transcriptome within encapsidated particles, which is the biomarker analyte measured in serum, remains poorly characterized. This study aimed to evaluate serum HBV RNA transcript composition and proportionality by PCR-cDNA nanopore sequencing of samples from CHB patients with varied HBV genotypes (gt A to F) and HBeAg status. Methods: Longitudinal specimens from 3 individuals during and following pregnancy (approximately 7 months between time points) were also investigated. HBV RNA extracted from 16 serum samples obtained from 13 patients (73.3% female, 84.6% Asian) was sequenced, and serum HBV RNA isoform detection and quantification were performed using three bioinformatic workflows: FLAIR, RATTLE, and a GraphMap-based workflow within the Galaxy application. A spike-in RNA variant (SIRV) control mix was used to assess run quality and coverage. The proportionality of transcript isoforms was based on total HBV reads determined by each workflow. Results: All chosen isoform detection workflows showed high agreement in transcript proportionality and composition for most samples. HBV pregenomic RNA (pgRNA) was the most frequently observed transcript isoform (93.8% of patient samples), while other detected transcripts included pgRNA spliced variants, 3' truncated variants and HBx mRNA, depending on the isoform detection method. Spliced variants of pgRNA were primarily observed in HBV gtB, C, E, or F-infected patients, with the Sp1 spliced variant detected most frequently. Twelve other pgRNA spliced variant transcripts were identified, including 3 previously unidentified transcripts, although spliced isoform identification depended strongly on the workflow used to analyze sequence data. Longitudinal sampling among pregnant and post-partum antiviral-treated individuals showed increasing proportions of 3' truncated pgRNA variants over time. Conclusions: This study demonstrated long-read sequencing as a promising tool for the characterization of the serum HBV transcriptome. However, further studies are needed to better understand how serum HBV RNA isoform type and proportion are linked to CHB disease progression and antiviral treatment response.
ABSTRACT
Next generation sequencing (NGS) is emerging as a new standard for genotypic HIV-1 drug resistance (HIVDR) testing. Many NGS HIVDR data analysis pipelines have been independently developed, each with variable outputs and data management protocols. Standardization of such analytical methods and comparison of available pipelines are lacking, yet may impact subsequent HIVDR interpretation and other downstream applications. Here we compared the performance of five NGS HIVDR pipelines using proficiency panel samples from the NIAID Virology Quality Assurance (VQA) program. Ten VQA panel specimens were genotyped by each of six international laboratories using their own in-house NGS assays. Raw NGS data were then processed using each of the five pipelines: HyDRA, MiCall, PASeq, Hivmmer and DEEPGEN. All pipelines detected amino acid variants (AAVs) across the full range of frequencies (1-100%) and demonstrated good linearity compared to the reference frequency values. While sensitivity in detecting low-abundance AAVs, with frequencies between 1% and 20%, was less of a concern for all pipelines, their specificity decreased dramatically at AAV frequencies <2%, suggesting that 2% may be a more reliable reporting threshold for ensuring specificity in AAV calling and reporting. More variation was observed among the pipelines for low-abundance AAVs, likely due to differences in their NGS read quality control strategies. Findings from this study highlight the need for standardized strategies for NGS HIVDR data analysis, especially for the detection of minority HIVDR variants.
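The suggested 2% reporting threshold amounts to a simple frequency filter on called variants. The sketch below is only an illustration of that idea: the function name, the dictionary layout, and the example frequencies are hypothetical and do not come from any of the five pipelines.

```python
from typing import Dict


def reportable_variants(
    frequencies_percent: Dict[str, float],
    threshold_percent: float = 2.0,
) -> Dict[str, float]:
    """Keep only amino acid variants at or above the reporting threshold.

    frequencies_percent maps a variant label to its observed frequency
    in percent. Variants below the threshold are dropped because calls
    in that range were reported to be less specific.
    """
    return {
        variant: freq
        for variant, freq in frequencies_percent.items()
        if freq >= threshold_percent
    }


# Hypothetical frequencies for illustration only; not data from the study.
observed = {"K103N": 1.2, "M184V": 15.0, "Y181C": 2.0}
print(reportable_variants(observed))
```

Keeping the threshold as a parameter makes it easy to compare reporting at 1%, 2%, or 5% on the same set of calls.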
Subject(s)
Drug Resistance, Viral/genetics , HIV-1/genetics , High-Throughput Nucleotide Sequencing/methods , Amino Acids/genetics , Genetic Variation/genetics , Genotype , HIV Infections/virology , HIV Seropositivity , Humans , Sensitivity and Specificity
ABSTRACT
Developments in high-throughput next generation sequencing (NGS) technology have rapidly advanced the understanding of overall microbial ecology as well as occurrence and diversity of specific genes within diverse environments. In the present study, we compared the ability of varying sequencing depths to generate meaningful information about the taxonomic structure and prevalence of antimicrobial resistance genes (ARGs) in the bovine fecal microbial community. Metagenomic sequencing was conducted on eight composite fecal samples originating from four beef cattle feedlots. Metagenomic DNA was sequenced to various depths, D1, D0.5 and D0.25, with average sample read counts of 117, 59 and 26 million, respectively. A comparative analysis of the relative abundance of reads aligning to different phyla and antimicrobial classes indicated that the relative proportions of read assignments remained fairly constant regardless of depth. However, the number of reads being assigned to ARGs as well as to microbial taxa increased significantly with increasing depth. We found a depth of D0.5 was suitable to describe the microbiome and resistome of cattle fecal samples. This study helps define a balance between cost and required sequencing depth to acquire meaningful results.
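The distinction between absolute read counts, which grow with depth, and relative proportions, which stay roughly constant, can be made concrete with a small sketch. The taxa counts below are invented for illustration, and halving every count is an idealized stand-in for sequencing at half depth; real subsampling would add noise.

```python
from typing import Dict


def relative_abundance(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-category read counts into proportions of the total."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any category")
    return {category: count / total for category, count in read_counts.items()}


# Hypothetical read counts per phylum; not data from the study.
full_depth = {"Firmicutes": 600, "Bacteroidetes": 300, "Proteobacteria": 100}
# Idealized half-depth run: every count is exactly halved.
half_depth = {phylum: count // 2 for phylum, count in full_depth.items()}

print(sum(full_depth.values()), relative_abundance(full_depth))
print(sum(half_depth.values()), relative_abundance(half_depth))
```

In this idealized case the proportions are identical at both depths while the total number of assigned reads differs by a factor of two, mirroring the pattern the abstract describes.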
Subject(s)
Archaea/genetics , Bacteria/genetics , Drug Resistance/genetics , Fungi/genetics , Gastrointestinal Microbiome/genetics , Viruses/genetics , Animals , Anti-Bacterial Agents/pharmacology , Archaea/classification , Archaea/drug effects , Archaea/isolation & purification , Archaeal Proteins/genetics , Archaeal Proteins/metabolism , Bacteria/classification , Bacteria/drug effects , Bacteria/isolation & purification , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Cattle , Feces/microbiology , Fungal Proteins/genetics , Fungal Proteins/metabolism , Fungi/classification , Fungi/drug effects , Fungi/isolation & purification , Gastrointestinal Microbiome/drug effects , Gene Expression , High-Throughput Nucleotide Sequencing , Metagenomics/methods , Phylogeny , Viral Proteins/genetics , Viral Proteins/metabolism , Viruses/classification , Viruses/drug effects , Viruses/isolation & purification
ABSTRACT
INTRODUCTION: Next-generation sequencing (NGS) has several advantages over conventional Sanger sequencing for HIV drug resistance (HIVDR) genotyping, including detection and quantitation of low-abundance variants bearing drug resistance mutations (DRMs). However, the high HIV genomic diversity, unprecedentedly large volume of data, complexity of analysis and potential for error pose significant challenges for data processing. Several NGS analysis pipelines have been developed and used in HIVDR research; however, the absence of uniformity in data processing strategies results in a lack of consistency and comparability of outputs from different pipelines. To fill this gap, an international symposium on bioinformatic strategies for NGS-based HIVDR testing was held in February 2018 in Winnipeg, Canada, convening laboratory scientists, bioinformaticians and clinicians involved in four recently developed, publicly available NGS HIVDR pipelines. The goal of this symposium was to establish a consensus on effective bioinformatic strategies for NGS data management and its use for HIVDR reporting. DISCUSSION: Essential functionalities of an NGS HIVDR pipeline were divided into five analytic blocks: (1) NGS read quality control (QC)/quality assurance (QA); (2) NGS read alignment and reference mapping; (3) HIV variant calling and variant QC; (4) NGS HIVDR reporting; and (5) extended data applications and additional considerations for data management. The consensuses reached among the participants on all major aspects of these blocks are summarized here. They encompass not only recommended data management and analysis strategies, but also detailed bioinformatic approaches that help ensure accuracy of the derived HIVDR analysis outputs for both research and potential clinical use. CONCLUSIONS: While NGS is being adopted more broadly in HIVDR testing laboratories, data processing is often a bottleneck hindering its generalized application. The proposed standardization of NGS read QC/QA, read alignment and reference mapping, variant calling and QC, HIVDR reporting and relevant data management strategies in this "Winnipeg Consensus" may serve as a starting guideline for NGS HIVDR data processing that informs the refinement of existing pipelines and those yet to be developed. Moreover, the bioinformatic strategies presented here may apply more broadly to NGS data analysis of microbes harbouring significant genomic diversity.