ABSTRACT
While the malaria parasite Plasmodium falciparum has low average genome-wide diversity levels, likely due to its recent introduction from a gorilla-infecting ancestor (approximately 10,000 to 50,000 years ago), some genes display extremely high diversity levels. In particular, certain proteins expressed on the surface of human red blood cell-infecting merozoites (merozoite surface proteins (MSPs)) possess exactly 2 deeply diverged lineages that have seemingly not recombined. While of considerable interest, the evolutionary origin of this phenomenon remains unknown. In this study, we analysed the genetic diversity of 2 of the most variable MSPs, DBLMSP and DBLMSP2, which are paralogs (descended from an ancestral duplication). Despite thousands of available Illumina WGS datasets from malaria-endemic countries, diversity in these genes has been hard to characterise as reads containing highly diverged alleles completely fail to align to the reference genome. To solve this, we developed a pipeline leveraging genome graphs, enabling us to genotype them at high accuracy and completeness. Using our newly resolved sequences, we found that both genes exhibit 2 deeply diverged lineages in a specific protein domain (DBL) and that one of the 2 lineages is shared across the genes. We identified clear evidence of nonallelic gene conversion between the 2 genes as the likely mechanism behind sharing, leading us to propose that gene conversion between diverged paralogs, and not recombination suppression, can generate this surprising genealogy; a model that is furthermore consistent with high diversity levels in these 2 genes despite the strong historical P. falciparum transmission bottleneck.
Subjects
Hominidae , Falciparum Malaria , Malaria , Parasites , Animals , Humans , Plasmodium falciparum/metabolism , Parasites/metabolism , Gene Conversion , Surface Antigens , Malaria/parasitology , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , Genetic Variation
ABSTRACT
MOTIVATION: Metagenome-Assembled Genomes (MAGs) or Single-cell Amplified Genomes (SAGs) are often incomplete, with sequences missing due to errors in assembly or low coverage. This presents a particular challenge for the identification of true gene frequencies within a microbial population, as core genes missing in only a few assemblies will be mischaracterized by current pangenome approaches. RESULTS: Here, we present CELEBRIMBOR, a Snakemake pangenome analysis pipeline which uses a measure of genome completeness to automatically adjust the frequency threshold at which core genes are identified, enabling accurate core gene identification in MAGs and SAGs. AVAILABILITY AND IMPLEMENTATION: CELEBRIMBOR is published under open source Apache 2.0 licence at https://github.com/bacpop/CELEBRIMBOR and is available as a Docker container from this repository. Supplementary material is available in the online version of the article.
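The idea of lowering the core-gene frequency threshold when assemblies are incomplete can be illustrated with a very small sketch. This is a simplified stand-in and not CELEBRIMBOR's actual model: the function names, the assumption that genes are missed independently at a rate of (1 - completeness), and the example numbers are all hypothetical.

```python
from statistics import mean


def adjusted_core_threshold(core_threshold: float, completeness: list[float]) -> float:
    """Scale a nominal core-gene frequency threshold by mean genome completeness.

    Illustrative assumption: a gene present in every organism is expected to be
    observed in a fraction `mean(completeness)` of the assemblies.
    """
    if not completeness:
        raise ValueError("need at least one completeness estimate")
    return core_threshold * mean(completeness)


def classify_genes(presence: dict[str, int], n_genomes: int, threshold: float) -> dict[str, str]:
    """Label each gene 'core' or 'accessory' from its observed frequency."""
    return {
        gene: "core" if count / n_genomes >= threshold else "accessory"
        for gene, count in presence.items()
    }


# Hypothetical input: 20 assemblies, each estimated to be 90% complete.
completeness = [0.9] * 20
gene_counts = {"geneA": 20, "geneB": 18, "geneC": 5}

naive = classify_genes(gene_counts, 20, 0.95)
adjusted = classify_genes(gene_counts, 20, adjusted_core_threshold(0.95, completeness))
print(naive["geneB"], adjusted["geneB"])
```

In this toy input, a gene seen in 18 of 20 incomplete assemblies falls below a fixed 95% cutoff but is recovered as core once the cutoff is scaled by completeness.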
Subjects
Metagenome , Software , Metagenomics/methods
ABSTRACT
The open sharing of genomic data provides an incredibly rich resource for the study of bacterial evolution and function and even anthropogenic activities such as the widespread use of antimicrobials. However, these data consist of genomes assembled with different tools and levels of quality checking, and of large volumes of completely unprocessed raw sequence data. In both cases, considerable computational effort is required before biological questions can be addressed. Here, we assembled and characterised 661,405 bacterial genomes retrieved from the European Nucleotide Archive (ENA) in November of 2018 using a uniform standardised approach. Of these, 311,006 did not previously have an assembly. We produced a searchable COmpact Bit-sliced Signature (COBS) index, facilitating the easy interrogation of the entire dataset for a specific sequence (e.g., gene, mutation, or plasmid). Additional MinHash and pp-sketch indices support genome-wide comparisons and estimations of genomic distance. Combined, this resource will allow data to be easily subset and searched, phylogenetic relationships between genomes to be quickly elucidated, and hypotheses rapidly generated and tested. We believe that this combination of uniform processing and variety of search/filter functionalities will make this a resource of very wide utility. In terms of diversity within the data, a breakdown of the 639,981 high-quality genomes emphasised the uneven species composition of the ENA/public databases, with just 20 of the total 2,336 species making up 90% of the genomes. The overrepresented species tend to be acute/common human pathogens, aligning with research priorities at different levels from individual interests to funding bodies and national and global public health agencies.
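To give a sense of how a MinHash index supports fast estimates of genomic distance, the following sketch builds bottom-s sketches from k-mers and estimates the Jaccard similarity between two sequences. It is a toy illustration only: the SHA-1-based hash, the k-mer length, and the sketch size are arbitrary assumptions, and strand handling (canonical k-mers) is omitted.

```python
import hashlib


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of k-mers in a sequence (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer using the first 8 bytes of SHA-1."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def minhash_sketch(seq: str, k: int = 5, size: int = 16) -> list[int]:
    """Bottom-s sketch: the `size` smallest hash values among the k-mers."""
    return sorted(hash64(km) for km in kmers(seq, k))[:size]


def jaccard_estimate(a: list[int], b: list[int], size: int = 16) -> float:
    """Estimate Jaccard similarity from two bottom-s sketches."""
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


same = jaccard_estimate(minhash_sketch("ACGTACGTTGCA"), minhash_sketch("ACGTACGTTGCA"))
different = jaccard_estimate(minhash_sketch("AAAAAAAAAA"), minhash_sketch("CCCCCCCCCC"))
print(same, different)
```

Because only the small sketches are compared, the cost per genome pair is independent of genome length, which is what makes all-against-all comparisons feasible at the scale of hundreds of thousands of assemblies.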
Subjects
Bacteria/genetics , Biodiversity , Bacterial DNA/genetics , Data Curation , Base Sequence , Bacterial Drug Resistance/genetics , Species Specificity
ABSTRACT
The characterization of de novo mutations in regions of high sequence and structural diversity from whole-genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, in which short reads do not capture the long-range context required for resolution, and mapping approaches, in which improper alignment of reads to a reference genome that is highly diverged from that of the sample can lead to false or partial calls. Long-read technologies can potentially solve such problems but are currently unfeasible to use at scale. Here we present Corticall, a graph-based method that combines the advantages of multiple technologies and prior data sources to detect arbitrary classes of genetic variant. We construct multisample, colored de Bruijn graphs from short-read data for all samples, align long-read-derived haplotypes and multiple reference data sources to restore graph connectivity information, and call variants using graph path-finding algorithms and a model for simultaneous alignment and recombination. We validate and evaluate the approach using extensive simulations and use it to characterize the rate and spectrum of de novo mutation events in 119 progeny from four Plasmodium falciparum experimental crosses, using long-read data on the parents to inform reconstructions of the progeny and to detect several known and novel nonallelic homologous recombination events.
Subjects
Protozoan Genome/genetics , High-Throughput Nucleotide Sequencing/methods , Mutation/genetics , Plasmodium falciparum/genetics , Whole Genome Sequencing/methods , Algorithms , Base Sequence , Genetic Variation/genetics , Sequence Alignment , DNA Sequence Analysis/methods , Software
ABSTRACT
Universal access to drug susceptibility testing for newly diagnosed tuberculosis patients is recommended. Access to culture-based diagnostics remains limited, and targeted molecular assays are vulnerable to emerging resistance mutations. Improved protocols for direct-from-sputum Mycobacterium tuberculosis sequencing would accelerate access to comprehensive drug susceptibility testing and molecular typing. We assessed a thermo-protection buffer-based direct-from-sample M. tuberculosis whole-genome sequencing protocol. We prospectively analyzed 60 acid-fast bacilli smear-positive clinical sputum samples in India and Madagascar. A diversity of semiquantitative smear positivity-level samples were included. Sequencing was performed using Illumina and MinION (monoplex and multiplex) technologies. We measured the impact of bacterial inoculum and sequencing platforms on genomic read depth, drug susceptibility prediction performance, and typing accuracy. M. tuberculosis was identified by direct sputum sequencing in 45/51 samples using Illumina, 34/38 were identified using MinION-monoplex sequencing, and 20/24 were identified using MinION-multiplex sequencing. The fraction of M. tuberculosis reads from MinION sequencing was lower than from Illumina, but monoplexing grade 3+ samples on MinION produced higher read depth than Illumina (P < 0.05) and MinION multiplexing (P < 0.01). No significant differences in sensitivity and specificity of drug susceptibility predictions were seen across sequencing modalities or within each technology when stratified by smear grade. Illumina sequencing from sputum accurately identified 1/8 (rifampin) and 6/12 (isoniazid) resistant samples, compared to 2/3 (rifampin) and 3/6 (isoniazid) accurately identified with Nanopore monoplex. Lineage agreement levels between direct and culture-based sequencing were 85% (MinION-monoplex), 88% (Illumina), and 100% (MinION-multiplex). M. tuberculosis direct-from-sample whole-genome sequencing remains challenging. Improved and affordable sample treatment protocols are needed prior to clinical deployment.
Subjects
Mycobacterium tuberculosis , Multidrug-Resistant Tuberculosis , Tuberculosis , Humans , Mycobacterium tuberculosis/genetics , Antitubercular Agents/pharmacology , Antitubercular Agents/therapeutic use , Isoniazid , Rifampin , Microbial Sensitivity Tests , Sputum/microbiology , Tuberculosis/diagnosis , Tuberculosis/drug therapy , Genomics , Multidrug-Resistant Tuberculosis/microbiology
ABSTRACT
SUMMARY: Viral sequence data from clinical samples frequently contain contaminating human reads, which must be removed prior to sharing for legal and ethical reasons. To enable host read removal for SARS-CoV-2 sequencing data on low-specification laptops, we developed ReadItAndKeep, a fast lightweight tool for Illumina and nanopore data that only keeps reads matching the SARS-CoV-2 genome. Peak RAM usage is typically below 10 MB, and runtime less than 1 min. We show that by excluding the polyA tail from the viral reference, ReadItAndKeep prevents bleed-through of human reads, whereas mapping to the human genome lets some reads escape. We believe our test approach (including all possible reads from the human genome, human samples from each of the 26 populations in the 1000 genomes data and a diverse set of SARS-CoV-2 genomes) will also be useful for others. AVAILABILITY AND IMPLEMENTATION: ReadItAndKeep is implemented in C++, released under the MIT license, and available from https://github.com/GenomePathogenAnalysisService/read-it-and-keep. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
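The reference-preparation step mentioned above, excluding the polyA tail from the viral reference, can be sketched in a few lines. This is an illustrative helper and not code from ReadItAndKeep; the function name and the `min_run` cutoff are assumptions made for the example.

```python
def strip_polya_tail(reference: str, min_run: int = 10) -> str:
    """Remove a trailing run of 'A' bases if it is at least `min_run` long.

    A long polyA tail in the reference could attract human reads that end
    in polyA, so it is trimmed before reads are matched against the reference.
    """
    trimmed = reference.rstrip("Aa")
    if len(reference) - len(trimmed) >= min_run:
        return trimmed
    return reference


toy_reference = "ACGTTGCAGT" + "A" * 33   # hypothetical sequence, not a real genome
print(len(toy_reference), len(strip_polya_tail(toy_reference)))
```

Note that this simple trim also removes any genuine 'A' bases immediately preceding the tail, which is usually acceptable for a read-filtering reference.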
Subjects
COVID-19 , Software , Humans , DNA Sequence Analysis , SARS-CoV-2/genetics , Decontamination , High-Throughput Nucleotide Sequencing , Human Genome
ABSTRACT
MOTIVATION: Short-read whole-genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences and sequencing bias reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized. To benchmark short-read variant calling, we used 36 diverse clinical Mycobacterium tuberculosis (Mtb) isolates dually sequenced with Illumina short-reads and PacBio long-reads. We systematically studied the short-read variant calling accuracy and the influence of sequence uniqueness, reference bias and GC content. RESULTS: Reference-based Illumina variant calling demonstrated a maximum recall of 89.0% and minimum precision of 98.5% across parameters evaluated. The approach that maximized variant recall while still maintaining high precision (>99%) was tuning the mapping quality filtering threshold, i.e. confidence of the read mapping (recall = 85.8%, precision = 99.1%, MQ ≥ 40). Additional masking of repetitive sequence content is an alternative conservative approach to variant calling that increases precision at cost to recall (recall = 70.2%, precision = 99.6%, MQ ≥ 40). Of the genomic positions typically excluded for Mtb, 68% are accurately called using Illumina WGS including 52/168 PE/PPE genes (34.5%). From these results, we present a refined list of low confidence regions across the Mtb genome, which we found to frequently overlap with regions with structural variation, low sequence uniqueness and low sequencing coverage. Our benchmarking results have broad implications for the use of WGS in the study of Mtb biology, inference of transmission in public health surveillance systems and more generally for WGS applications in other organisms. AVAILABILITY AND IMPLEMENTATION: All relevant code is available at https://github.com/farhat-lab/mtb-illumina-wgs-evaluation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
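The precision/recall trade-off that comes from tightening a mapping-quality (MQ) filter can be made concrete with a small sketch. The call set, the truth set, and the tuple layout below are invented for illustration and are unrelated to the study's data or code.

```python
def filter_by_mq(calls: list[tuple[int, str, int]], min_mq: int) -> set[tuple[int, str]]:
    """Keep (position, alt allele) for calls whose mapping quality is >= min_mq."""
    return {(pos, alt) for pos, alt, mq in calls if mq >= min_mq}


def precision_recall(called: set, truth: set) -> tuple[float, float]:
    """Compute precision and recall of a call set against a truth set."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical truth set (e.g. from long-read assemblies) and short-read calls.
truth = {(100, "A"), (200, "T"), (300, "G"), (400, "C")}
calls = [(100, "A", 60), (200, "T", 45), (300, "G", 20), (500, "C", 10)]

for min_mq in (0, 40):
    print(min_mq, precision_recall(filter_by_mq(calls, min_mq), truth))
```

In this toy example, raising the MQ cutoff removes the one false positive (precision rises) but also discards a true variant supported only by low-MQ reads (recall falls).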
Subjects
Mycobacterium tuberculosis , Tuberculosis , Humans , Benchmarking , Mycobacterium tuberculosis/genetics , Software , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
We diagnosed tuberculosis in an illegally wild-captured pet ring-tailed lemur manifesting lethargy, anorexia, and cervical lymphadenopathy. Whole-genome sequencing confirmed the Mycobacterium tuberculosis isolate belonged to lineage 3 and harbored streptomycin resistance. We recommend reverse zoonosis prevention and determination of whether lemurs are able to maintain M. tuberculosis infection.
Subjects
Lemur , Multidrug-Resistant Tuberculosis , Animals , Madagascar
ABSTRACT
BACKGROUND: The World Health Organization recommends drug-susceptibility testing of Mycobacterium tuberculosis complex for all patients with tuberculosis to guide treatment decisions and improve outcomes. Whether DNA sequencing can be used to accurately predict profiles of susceptibility to first-line antituberculosis drugs has not been clear. METHODS: We obtained whole-genome sequences and associated phenotypes of resistance or susceptibility to the first-line antituberculosis drugs isoniazid, rifampin, ethambutol, and pyrazinamide for isolates from 16 countries across six continents. For each isolate, mutations associated with drug resistance and drug susceptibility were identified across nine genes, and individual phenotypes were predicted unless mutations of unknown association were also present. To identify how whole-genome sequencing might direct first-line drug therapy, complete susceptibility profiles were predicted. These profiles were predicted to be susceptible to all four drugs (i.e., pansusceptible) if they were predicted to be susceptible to isoniazid and to the other drugs or if they contained mutations of unknown association in genes that affect susceptibility to the other drugs. We simulated the way in which the negative predictive value changed with the prevalence of drug resistance. RESULTS: A total of 10,209 isolates were analyzed. The largest proportion of phenotypes was predicted for rifampin (9660 [95.4%] of 10,130) and the smallest was predicted for ethambutol (8794 [89.8%] of 9794). Resistance to isoniazid, rifampin, ethambutol, and pyrazinamide was correctly predicted with 97.1%, 97.5%, 94.6%, and 91.3% sensitivity, respectively, and susceptibility to these drugs was correctly predicted with 99.0%, 98.8%, 93.6%, and 96.8% specificity. Of the 7516 isolates with complete phenotypic drug-susceptibility profiles, 5865 (78.0%) had complete genotypic predictions, among which 5250 profiles (89.5%) were correctly predicted. 
Among the 4037 phenotypic profiles that were predicted to be pansusceptible, 3952 (97.9%) were correctly predicted. CONCLUSIONS: Genotypic predictions of the susceptibility of M. tuberculosis to first-line drugs were found to be correlated with phenotypic susceptibility to these drugs. (Funded by the Bill and Melinda Gates Foundation and others.).
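The dependence of negative predictive value (NPV) on resistance prevalence follows from Bayes' rule, and a short sketch shows the shape of that relationship. It plugs in the isoniazid sensitivity and specificity reported above; the prevalence grid is an arbitrary choice for illustration and this is not the simulation used in the study.

```python
def negative_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that an isolate predicted susceptible is truly susceptible.

    'Positive' here means resistant, so sensitivity is the fraction of
    resistant isolates detected and specificity the fraction of susceptible
    isolates correctly predicted susceptible.
    """
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    if true_negatives + false_negatives == 0:
        raise ValueError("NPV is undefined when no negative predictions are possible")
    return true_negatives / (true_negatives + false_negatives)


# Isoniazid figures from the abstract: 97.1% sensitivity, 99.0% specificity.
for prevalence in (0.01, 0.05, 0.20, 0.50):
    npv = negative_predictive_value(0.971, 0.990, prevalence)
    print(f"prevalence={prevalence:.2f}  NPV={npv:.4f}")
```

NPV is highest where resistance is rare and declines as prevalence rises, which is why a genotypic prediction of susceptibility is most reassuring in low-resistance settings.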
Subjects
Antitubercular Agents/pharmacology , Bacterial Drug Resistance/genetics , Bacterial Genome , Mycobacterium tuberculosis/genetics , Tuberculosis/drug therapy , Whole Genome Sequencing , Antitubercular Agents/therapeutic use , Ethambutol/pharmacology , Genotype , Humans , Isoniazid/pharmacology , Microbial Sensitivity Tests , Mutation , Mycobacterium tuberculosis/drug effects , Mycobacterium tuberculosis/isolation & purification , Phenotype , Pyrazinamide/pharmacology , Rifampin/pharmacology , Tuberculosis/microbiology
ABSTRACT
MOTIVATION: In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets. RESULTS: We used REINDEER to index the abundances of sequences within 2585 human RNA-seq experiments in 45 h using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of ~4 billion distinct k-mers across 2585 datasets. REINDEER also supports exact presence/absence queries of k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph of each dataset, then conceptually merges those de Bruijn graphs into a single global one. Then, REINDEER constructs and indexes monotigs, which in a nutshell are groups of k-mers of similar abundances. AVAILABILITY AND IMPLEMENTATION: https://github.com/kamimrcht/REINDEER. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
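The kind of query such an index answers (the abundance of each k-mer in every dataset) can be shown with a naive dictionary-based sketch. This is deliberately not REINDEER's data structure: there are no compacted de Bruijn graphs or monotigs here, and the dataset names, reads, and function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurring in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_index(datasets: dict[str, list[str]], k: int) -> tuple[list[str], dict[str, list[int]]]:
    """Map each k-mer to a vector of abundances, one entry per dataset."""
    names = sorted(datasets)
    index: dict[str, list[int]] = {}
    for column, name in enumerate(names):
        for kmer, n in count_kmers(datasets[name], k).items():
            index.setdefault(kmer, [0] * len(names))[column] = n
    return names, index


def query(index: dict[str, list[int]], n_datasets: int, kmer: str) -> list[int]:
    """Return the per-dataset abundance of a k-mer (all zeros if absent)."""
    return index.get(kmer, [0] * n_datasets)


datasets = {"sampleA": ["ACGTAC", "ACGTAC"], "sampleB": ["ACGTTT"]}
names, index = build_index(datasets, k=4)
print(names, query(index, len(names), "ACGT"), query(index, len(names), "GTAC"))
```

A plain dictionary like this stores one full vector per k-mer, which is exactly the cost that grouping k-mers of similar abundance is meant to avoid at the scale of billions of k-mers.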
Subjects
DNA Sequence Analysis , Software , Algorithms , Humans , RNA Sequence Analysis
ABSTRACT
We are rapidly approaching the point where we have sequenced millions of human genomes. There is a pressing need for new data structures to store raw sequencing data and efficient algorithms for population scale analysis. Current reference-based data formats do not fully exploit the redundancy in population sequencing nor take advantage of shared genetic variation. In recent years, the Burrows-Wheeler transform (BWT) and FM-index have been widely employed as a full-text searchable index for read alignment and de novo assembly. We introduce the concept of a population BWT and use it to store and index the sequencing reads of 2705 samples from the 1000 Genomes Project. A key feature is that, as more genomes are added, identical read sequences are increasingly observed, and compression becomes more efficient. We assess the support in the 1000 Genomes read data for every base position of two human reference assembly versions, identifying that 3.2 Mbp with population support was lost in the transition from GRCh37 with 13.7 Mbp added to GRCh38. We show that the vast majority of variant alleles can be uniquely described by overlapping 31-mers and show how rapid and accurate SNP and indel genotyping can be carried out across the genomes in the population BWT. We use the population BWT to carry out nonreference queries to search for the presence of all known viral genomes and discover human T-lymphotropic virus 1 integrations in six samples in a recognized epidemiological distribution.
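For readers unfamiliar with how a BWT supports exact k-mer lookups across a read collection, here is a minimal sketch of the transform and of FM-index backward search. It is a toy under stated assumptions: reads are joined with a '#' separator and a single '$' terminator, the transform is built by sorting rotations (quadratic, for demonstration only), and none of this reflects the population BWT's actual implementation.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (demo only)."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def index_reads(reads: list[str]) -> str:
    """Join reads with '#', append a unique '$' terminator, return the BWT."""
    return bwt("#".join(reads) + "$")


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count exact matches of `pattern` using backward search."""
    counts = Counter(bwt_str)
    first_row_start, total = {}, 0
    for ch in sorted(counts):          # start of each character's block in the sorted column
        first_row_start[ch] = total
        total += counts[ch]
    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):       # extend the match one character to the left
        if ch not in first_row_start:
            return 0
        lo = first_row_start[ch] + bwt_str[:lo].count(ch)
        hi = first_row_start[ch] + bwt_str[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo


reads_bwt = index_reads(["ACGTAC", "GTACGT", "ACGTAC"])
print(count_occurrences(reads_bwt, "GTAC"), count_occurrences(reads_bwt, "TTT"))
```

Because identical reads sort next to each other, the transformed string contains long runs of the same character, which is the property that makes compression improve as more genomes are added.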
Subjects
Human Genome/genetics , Genomics , Sequence Alignment/methods , Whole Genome Sequencing/methods , Alleles , Data Compression , Genotype , Humans , INDEL Mutation/genetics , DNA Sequence Analysis , Software
ABSTRACT
Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. We used haplotype transmission information to create a phased "Platinum" variant catalog of 4.7 million single-nucleotide variants (SNVs) plus 0.7 million small (1-50 bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and 11 children of this pedigree. Platinum genotypes are highly concordant with the current catalog of the National Institute of Standards and Technology for both SNVs (>99.99%) and indels (99.92%) and add a validated truth catalog that has 26% more SNVs and 45% more indels. Analysis of 334,652 SNVs that were consistent between informatics pipelines yet inconsistent with haplotype transmission ("nonplatinum") revealed that the majority of these variants are de novo and cell-line mutations or reside within previously unidentified duplications and deletions. The reference materials from this study are a resource for objective assessment of the accuracy of variant calls throughout genomes.
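The notion of a genotype being "consistent with the pattern of inheritance" can be illustrated with a basic trio check. This is a much weaker test than the haplotype-transmission analysis across a full pedigree described above; the genotype encoding (allele indices, 0 = reference) and function names are assumptions for the example.

```python
Genotype = tuple[int, int]


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child could have received one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mendelian_error_rate(trios: list[tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of (child, mother, father) genotype triples that violate inheritance."""
    errors = sum(not is_mendelian_consistent(*trio) for trio in trios)
    return errors / len(trios)


# Hypothetical diploid genotypes at four sites.
sites = [
    ((0, 1), (0, 0), (1, 1)),   # consistent: 0 from mother, 1 from father
    ((0, 0), (0, 1), (0, 1)),   # consistent
    ((1, 1), (0, 0), (0, 1)),   # inconsistent: mother cannot supply allele 1
    ((0, 1), (0, 1), (0, 0)),   # consistent
]
print(mendelian_error_rate(sites))
```

Sites that fail such a check are candidates for genotyping errors, de novo mutations, or unrecognised structural variation.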
Subjects
Human Genome/genetics , Genomics , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Algorithms , Genetic Databases , Exome/genetics , Genotype , Humans , INDEL Mutation/genetics , Pedigree , Single Nucleotide Polymorphism , Software
ABSTRACT
The clinical phenotype of zoonotic tuberculosis and its contribution to the global burden of disease are poorly understood and probably underestimated. This shortcoming is partly because of the inability of currently available laboratory and in silico tools to accurately identify all subspecies of the Mycobacterium tuberculosis complex (MTBC). We present SNPs to Identify TB (SNP-IT), a single-nucleotide polymorphism-based tool to identify all members of MTBC, including animal clades. By applying SNP-IT to a collection of clinical genomes from a UK reference laboratory, we detected an unexpectedly high number of M. orygis isolates. M. orygis is seen at a similar rate to M. bovis, yet M. orygis cases have not been previously described in the United Kingdom. From an international perspective, it is possible that M. orygis is an underestimated zoonosis. Accurate identification will enable study of the clinical phenotype, host range, and transmission mechanisms of all subspecies of MTBC in greater detail.
Subjects
Mycobacterium tuberculosis/classification , Mycobacterium tuberculosis/genetics , Single Nucleotide Polymorphism , Tuberculosis/epidemiology , Tuberculosis/microbiology , Animals , Antitubercular Agents/pharmacology , Computational Biology/methods , Bacterial DNA , Bacterial Drug Resistance , Genetic Markers , Humans , Molecular Typing , Mycobacterium tuberculosis/drug effects , Mycobacterium tuberculosis/isolation & purification , Phylogeny , Prevalence , Zoonoses/epidemiology , Zoonoses/microbiology
ABSTRACT
The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable and thus difficult to analyze using short-read sequencing technologies. Here, we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first genome-wide, integrated analysis of SNP, indel and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that indels are exceptionally abundant, being more common than SNPs and thus the dominant mode of polymorphism within the core genome. We use the high density of SNP and indel markers to analyze patterns of meiotic recombination, confirming a high rate of crossover events and providing the first estimates for the rate of non-crossover events and the length of conversion tracts. We observe several instances of meiotic recombination within copy number variants associated with drug resistance, demonstrating a mechanism whereby fitness costs associated with resistance mutations could be compensated and greater phenotypic plasticity could be acquired.
Subjects
Drug Resistance/genetics , Genetic Variation , Falciparum Malaria/genetics , Plasmodium falciparum/genetics , Chromosome Mapping , DNA Copy Number Variations/genetics , Protozoan Genome/genetics , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Falciparum Malaria/drug therapy , Falciparum Malaria/parasitology , Meiosis/genetics , Plasmodium falciparum/drug effects , Plasmodium falciparum/pathogenicity , Single Nucleotide Polymorphism , Genetic Recombination/genetics
ABSTRACT
Motivation: The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, applications that rely on de Bruijn graphs can produce sub-optimal results given their input data. Results: We present a novel assembly graph data structure: the Linked de Bruijn Graph (LdBG). Constructed by adding annotations on top of a de Bruijn graph, it stores long range connectivity information through the graph. We show that with error-free data it is possible to losslessly store and recover sequence from a Linked de Bruijn graph. With assembly simulations we demonstrate that the LdBG data structure outperforms both our de Bruijn graph and the String Graph Assembler (SGA). Finally we apply the LdBG to Klebsiella pneumoniae short read data to make large (12 kbp) variant calls, which we validate using PacBio sequencing data, and to characterize the genomic context of drug-resistance genes. Availability and implementation: Linked de Bruijn Graphs and associated algorithms are implemented as part of McCortex, which is available under the MIT license at https://github.com/mcveanlab/mccortex. Supplementary information: Supplementary data are available at Bioinformatics online.
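The loss of long-range information in a plain de Bruijn graph is easy to see on a toy input. The sketch below builds a k-mer graph from two hypothetical reads that share one k-mer; it illustrates the problem that link annotations are meant to address and is not the LdBG or McCortex implementation.

```python
from collections import defaultdict


def build_de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Nodes are k-mers; an edge joins k-mers that are adjacent in some read."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph


def branching_nodes(graph: dict[str, set[str]]) -> list[str]:
    """K-mers with more than one successor, i.e. points of ambiguity."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Two hypothetical reads sharing the k-mer CGT in different contexts.
reads = ["AACGTCC", "TTCGTGG"]
graph = build_de_bruijn(reads, k=3)
print(branching_nodes(graph), sorted(graph["CGT"]))
```

After construction the graph records that CGT may be followed by GTC or GTG, but not that the path arriving from ACG continued to GTC and the path from TCG continued to GTG. Both reads spell that pairing out, and it is this read-level connectivity that has to be stored separately if it is to be recovered.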
Subjects
Data Visualization , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Software , Algorithms , Humans , Klebsiella pneumoniae/genetics
ABSTRACT
Motivation: Correct and rapid determination of Mycobacterium tuberculosis (MTB) resistance against available tuberculosis (TB) drugs is essential for the control and management of TB. Conventional molecular diagnostic tests assume that the presence of any well-studied single nucleotide polymorphism is sufficient to cause resistance, which yields low sensitivity for resistance classification. Summary: Given the availability of DNA sequencing data from MTB, we developed machine learning models for a cohort of 1839 UK bacterial isolates to classify MTB resistance against eight anti-TB drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, ciprofloxacin, moxifloxacin, ofloxacin, streptomycin) and to classify multi-drug resistance. Results: Compared to the previous rules-based approach, the sensitivities from the best-performing models increased by 2-4% for isoniazid, rifampicin and ethambutol to 97% (P < 0.01), respectively; for ciprofloxacin and multi-drug resistant TB, they increased to 96%. For moxifloxacin and ofloxacin, sensitivities increased by 12 and 15% from 83 and 81% based on existing known resistance alleles to 95% and 96% (P < 0.01), respectively. Particularly, our models improved sensitivities compared to the previous rules-based approach by 15 and 24% to 84 and 87% for pyrazinamide and streptomycin (P < 0.01), respectively. The best-performing models increase the area-under-the-ROC curve by 10% for pyrazinamide and streptomycin (P < 0.01), and 4-8% for other drugs (P < 0.01). Availability and implementation: The details of source code are provided at http://www.robots.ox.ac.uk/~davidc/code.php. Contact: david.clifton@eng.ox.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
Subjects
Antitubercular Agents/therapeutic use , Machine Learning , Mycobacterium tuberculosis/genetics , DNA Sequence Analysis/methods , Multidrug-Resistant Tuberculosis/genetics , Ciprofloxacin/therapeutic use , Ethambutol/therapeutic use , Humans , Isoniazid/therapeutic use , Microbial Sensitivity Tests , Moxifloxacin/therapeutic use , Mycobacterium tuberculosis/classification , Ofloxacin/therapeutic use , Pyrazinamide/therapeutic use , Rifampin/therapeutic use , Streptomycin/therapeutic use , Multidrug-Resistant Tuberculosis/drug therapy
ABSTRACT
In principle, whole-genome sequencing (WGS) can predict phenotypic resistance directly from a genotype, replacing laboratory-based tests. However, the contribution of different bioinformatics methods to genotype-phenotype discrepancies has not been systematically explored to date. We compared three WGS-based bioinformatics methods (Genefinder [read based], Mykrobe [de Bruijn graph based], and Typewriter [BLAST based]) for predicting the presence/absence of 83 different resistance determinants and virulence genes and overall antimicrobial susceptibility in 1,379 Staphylococcus aureus isolates previously characterized by standard laboratory methods (disc diffusion, broth and/or agar dilution, and PCR). In total, 99.5% (113,830/114,457) of individual resistance-determinant/virulence gene predictions were identical between all three methods, with only 627 (0.5%) discordant predictions, demonstrating high overall agreement (Fleiss' kappa = 0.98, P < 0.0001). Discrepancies when identified were in only one of the three methods for all genes except the cassette recombinase, ccrC(b). The genotypic antimicrobial susceptibility prediction matched the laboratory phenotype in 98.3% (14,224/14,464) of cases (2,720 [18.8%] resistant, 11,504 [79.5%] susceptible). There was greater disagreement between the laboratory phenotypes and the combined genotypic predictions (97 [0.7%] phenotypically susceptible, but all bioinformatic methods reported resistance; 89 [0.6%] phenotypically resistant, but all bioinformatics methods reported susceptible) than within the three bioinformatics methods (54 [0.4%] cases, 16 phenotypically resistant, 38 phenotypically susceptible). However, in 36/54 (67%) cases, the consensus genotype matched the laboratory phenotype. In this study, the choice between these three specific bioinformatic methods to identify resistance determinants or other genes in S. aureus did not prove critical, with all demonstrating high concordance with each other and phenotypic/molecular methods. However, each has some limitations; therefore, consensus methods provide some assurance.
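Agreement between three methods, as summarised by Fleiss' kappa above, can be computed from a table of how many methods assigned each item to each category. The sketch below implements the standard formula on invented presence/absence calls; the data are hypothetical and do not reproduce the study's figures.

```python
def fleiss_kappa(ratings: list[list[int]]) -> float:
    """Fleiss' kappa; ratings[i][j] = number of raters placing item i in category j.

    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])
    # Observed agreement: mean, over items, of the proportion of agreeing rater pairs.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    observed = sum(per_item) / n_items
    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    expected = sum(s * s for s in shares)
    return (observed - expected) / (1 - expected)


# Hypothetical calls by three methods: columns are [gene present, gene absent].
all_agree = [[3, 0], [0, 3], [3, 0], [0, 3]]
one_discordant = [[3, 0], [0, 3], [2, 1], [0, 3]]
print(fleiss_kappa(all_agree), round(fleiss_kappa(one_discordant), 3))
```

Kappa corrects the raw proportion of identical calls for the agreement expected by chance, which matters when one category (for example, "gene absent") dominates the data.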
Subjects
Computational Biology/methods , Bacterial Drug Resistance/genetics , Bacterial Genome/genetics , Staphylococcus aureus/genetics , Virulence Factors/genetics , Anti-Bacterial Agents/pharmacology , Bacterial Drug Resistance/drug effects , Genotype , Humans , Microbial Sensitivity Tests , Phenotype , Sensitivity and Specificity , DNA Sequence Analysis , Software , Staphylococcal Infections/microbiology , Staphylococcus aureus/drug effects , Staphylococcus aureus/isolation & purification
ABSTRACT
Use of whole-genome sequencing (WGS) for routine mycobacterial species identification and drug susceptibility testing (DST) is becoming a reality. We compared the performances of WGS and standard laboratory workflows prospectively, by parallel processing at a major mycobacterial reference service over the course of 1 year, for species identification, first-line Mycobacterium tuberculosis resistance prediction, and turnaround time. Among 2,039 isolates with line probe assay results for species identification, 74 (3.6%) failed sequencing or WGS species identification. Excluding these isolates, clinically important species were identified for 1,902 isolates, of which 1,825 (96.0%) were identified as the same species by WGS and the line probe assay. A total of 2,157 line probe test results for detection of resistance to the first-line drugs isoniazid and rifampin were available for 728 M. tuberculosis complex isolates. Excluding 216 (10.0%) cases where there were insufficient sequencing data for WGS to make a prediction, overall concordance was 99.3% (95% confidence interval [CI], 98.9 to 99.6%), sensitivity was 97.6% (91.7 to 99.7%), and specificity was 99.5% (99.0 to 99.7%). A total of 2,982 phenotypic DST results were available for 777 M. tuberculosis complex isolates. Of these, 356 (11.9%) had no WGS comparator due to insufficient sequencing data, and in 154 (5.2%) cases the WGS prediction was indeterminate due to discovery of novel, previously uncharacterized mutations. Excluding these data, overall concordance was 99.2% (98.7 to 99.5%), sensitivity was 94.2% (88.4 to 97.6%), and specificity was 99.4% (99.0 to 99.7%). Median processing times for the routine laboratory tests versus WGS were similar overall, i.e., 20 days (interquartile range [IQR], 15 to 31 days) and 21 days (15 to 29 days), respectively (P = 0.41). In conclusion, WGS predicts species and drug susceptibility with great accuracy, but work is needed to increase the proportion of predictions made.
Subjects
Bacterial Drug Resistance/genetics, Bacterial Genome/genetics, Molecular Typing/methods, Mycobacterium tuberculosis/isolation & purification, Tuberculosis/microbiology, Antitubercular Agents/pharmacology, Bacterial Drug Resistance/drug effects, Humans, Isoniazid/pharmacology, Mycobacterium tuberculosis/drug effects, Mycobacterium tuberculosis/genetics, Prospective Studies, Rifampin/pharmacology, Sensitivity and Specificity, Time Factors, Tuberculosis/diagnosis
ABSTRACT
Diversity at pathogen genetic loci can be driven by host adaptive immune selection pressure and may reveal proteins important for parasite biology. Population-based genome sequencing of Plasmodium falciparum, the parasite responsible for the most severe form of malaria, has highlighted two related polymorphic genes called dblmsp and dblmsp2, which encode Duffy binding-like (DBL) domain-containing proteins located on the merozoite surface but whose function remains unknown. Using recombinant proteins and transgenic parasites, we show that DBLMSP and DBLMSP2 directly and avidly bind human IgM via their DBL domains. We used whole genome sequence data from over 400 African and Asian P. falciparum isolates to show that dblmsp and dblmsp2 exhibit extreme protein polymorphism in their DBL domain, with multiple variants of two major allelic classes present in every population tested. Despite this variability, the IgM binding function was retained across diverse sequence representatives. Although this interaction did not seem to have an effect on the ability of the parasite to invade red blood cells, binding of DBLMSP and DBLMSP2 to IgM inhibited the overall immunoreactivity of these proteins to IgG from patients who had been exposed to the parasite. This suggests that IgM binding might mask these proteins from the host humoral immune system.
Subjects
Protozoan Antigens/metabolism, Immunoglobulin M/metabolism, Plasmodium falciparum/metabolism, Protozoan Proteins/metabolism, Animals, Humans, Protein Binding
ABSTRACT
Routine full characterization of Mycobacterium tuberculosis is culture based, taking many weeks. Whole-genome sequencing (WGS) can generate antibiotic susceptibility profiles to inform treatment, augmented with strain information for global surveillance; such data could be transformative if provided at or near the point of care. We demonstrate a low-cost method of DNA extraction directly from patient samples for M. tuberculosis WGS. We initially evaluated the method by using the Illumina MiSeq sequencer (40 smear-positive respiratory samples obtained after routine clinical testing and 27 matched liquid cultures). M. tuberculosis was identified in all 39 samples from which DNA was successfully extracted. Sufficient data for antibiotic susceptibility prediction were obtained from 24 (62%) samples; all results were concordant with reference laboratory phenotypes. Phylogenetic placement was concordant between direct and cultured samples. With Illumina MiSeq/MiniSeq, the workflow from patient sample to results can be completed in 44/16 h at a reagent cost of £96/£198 per sample. We then employed a nonspecific PCR-based library preparation method for sequencing on an Oxford Nanopore Technologies MinION sequencer. We applied this to cultured Mycobacterium bovis strain BCG DNA and to combined culture-negative sputum DNA and BCG DNA. For flow cell version R9.4, the estimated turnaround time from patient to identification of BCG, detection of pyrazinamide resistance, and phylogenetic placement was 7.5 h, with full susceptibility results 5 h later. Antibiotic susceptibility predictions were fully concordant. A critical advantage of MinION is the ability to continue sequencing until sufficient coverage is obtained, providing a potential solution to the problem of variable amounts of M. tuberculosis DNA in direct samples.