ABSTRACT
MOTIVATION: The size and complexity of modern large-scale genome variation studies demand novel approaches for exploring and sharing the data. In order to unlock the potential of these data for a broad audience of scientists with various areas of expertise, a unified exploration framework is required that is accessible, coherent and user-friendly. RESULTS: Panoptes is an open-source software framework for collaborative visual exploration of large-scale genome variation data and associated metadata in a web browser. It relies on technology choices that allow it to operate in near real-time on very large datasets. It can be used to browse rich, hybrid content in a coherent way, and offers interactive visual analytics approaches to assist the exploration. We illustrate its application using genome variation data of Anopheles gambiae, Plasmodium falciparum and Plasmodium vivax. AVAILABILITY AND IMPLEMENTATION: Freely available at https://github.com/cggh/panoptes, under the GNU Affero General Public License. CONTACT: paul.vauterin@gmail.com.
Subjects
Genetic Variation , Sequence Analysis, DNA/methods , Software , Animals , Anopheles/genetics , Genomics/methods , Internet , Metadata , Plasmodium falciparum/genetics , Plasmodium vivax/genetics , Web Browser
ABSTRACT
The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable and thus difficult to analyze using short-read sequencing technologies. Here, we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first genome-wide, integrated analysis of SNP, indel and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that indels are exceptionally abundant, being more common than SNPs and thus the dominant mode of polymorphism within the core genome. We use the high density of SNP and indel markers to analyze patterns of meiotic recombination, confirming a high rate of crossover events and providing the first estimates for the rate of non-crossover events and the length of conversion tracts. We observe several instances of meiotic recombination within copy number variants associated with drug resistance, demonstrating a mechanism whereby fitness costs associated with resistance mutations could be compensated and greater phenotypic plasticity could be acquired.
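Mendelian error rates are used above as an indicator of genotypic accuracy. Because P. falciparum progeny clones are haploid, each progeny call should match one of the two parental alleles at that site, and a call matching neither is a Mendelian error. The following is a minimal sketch under that assumption; the function name and data layout (one allele string per site, `None` for a missing call) are hypothetical and this is not the authors' pipeline.

```python
from typing import Optional, Sequence


def mendelian_error_rate(
    parent1: Sequence[Optional[str]],
    parent2: Sequence[Optional[str]],
    progeny: Sequence[Sequence[Optional[str]]],
) -> float:
    """Fraction of progeny calls that match neither parental allele.

    Assumes haploid genotypes: one allele string per site, None if missing.
    """
    errors = 0
    checked = 0
    for site, (a1, a2) in enumerate(zip(parent1, parent2)):
        if a1 is None or a2 is None:
            continue  # a site cannot be assessed if either parent is uncalled
        for clone in progeny:
            call = clone[site]
            if call is None:
                continue  # missing progeny calls are not counted
            checked += 1
            if call not in (a1, a2):
                errors += 1  # allele not inherited from either parent
    return errors / checked if checked else 0.0


# Hypothetical example: a SNP, a site where the parents agree, and an indel.
p1 = ["A", "C", "T"]
p2 = ["G", "C", "TA"]
clones = [["A", "C", "T"], ["G", "C", "TA"], ["A", "G", None]]
print(mendelian_error_rate(p1, p2, clones))  # 0.125
```

SNP and indel alleles are treated identically here, which mirrors the integrated analysis of both marker types described in the abstract.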
Subjects
Drug Resistance/genetics , Genetic Variation , Malaria, Falciparum/genetics , Plasmodium falciparum/genetics , Chromosome Mapping , DNA Copy Number Variations/genetics , Genome, Protozoan/genetics , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Malaria, Falciparum/drug therapy , Malaria, Falciparum/parasitology , Meiosis/genetics , Plasmodium falciparum/drug effects , Plasmodium falciparum/pathogenicity , Polymorphism, Single Nucleotide , Recombination, Genetic/genetics
ABSTRACT
After its emergence in 2003, a livestock-associated (LA-)MRSA clade (CC398) has caused an impressive increase in the number of isolates submitted for the Dutch national MRSA surveillance and now comprises 40% of all isolates. The currently used molecular typing techniques have limited discriminatory power for this MRSA clade, which hampers studies on its origin and transmission routes. Recently, a new molecular analysis technique named whole genome mapping (WGM) was introduced. This method creates high-resolution, ordered whole genome restriction maps that may have potential for strain typing. In this study, we assessed and validated the capability of whole genome mapping to differentiate LA-MRSA isolates. Multiple validation experiments showed that whole genome mapping produced highly reproducible results. Assessment of the technique on two well-documented MRSA outbreaks showed that whole genome mapping was able to confirm one outbreak, but revealed major differences between the maps of the second, indicating that not all isolates belonged to that outbreak. Whole genome mapping of epidemiologically unlinked LA-MRSA isolates provided much higher discriminatory power than spa-typing or MLVA. In contrast, maps created from LA-MRSA isolates obtained during a proven LA-MRSA outbreak were nearly indistinguishable, showing that transmission of LA-MRSA can be detected by whole genome mapping. Finally, whole genome maps of LA-MRSA isolates originating from two unrelated veterinarians and their household members showed that veterinarians may carry and transmit different LA-MRSA strains at the same time. No such conclusions could be drawn based on spa-typing and MLVA. Although PFGE seems to be suitable for molecular typing of LA-MRSA, WGM provides much higher discriminatory power. Furthermore, whole genome mapping can provide a comparison with other maps within 2 days after the bacterial culture is received, making it suitable for investigating transmission events and outbreaks caused by LA-MRSA.
Subjects
Bacterial Typing Techniques/methods , Chromosome Mapping , Genome, Bacterial , Methicillin-Resistant Staphylococcus aureus/classification , Methicillin-Resistant Staphylococcus aureus/genetics , Staphylococcal Infections , Animals , Disease Outbreaks , Livestock/microbiology , Methicillin-Resistant Staphylococcus aureus/isolation & purification , Netherlands , Staphylococcal Infections/classification , Staphylococcal Infections/genetics , Staphylococcal Infections/transmission , Staphylococcal Infections/veterinary
ABSTRACT
In classical approaches to tree reconstruction, such as pair group methods, maximum parsimony or minimum spanning trees, two major problems are not addressed at a fundamental level. First, for numerous kinds of experimental data, these methods produce multiple equivalent solutions but provide no way of handling those degeneracies. Second, the real-life data fed to these methods are treated as exact, so possible measurement errors cannot be taken into account. We provide a statistical solution to both the degeneracy problem and the data imperfection problem, built as a framework around the clustering method. It is therefore independent of the particular choice of clustering or population modeling algorithm and is applicable to any of the presently known methods that are subject to one or both of these problems.
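The degeneracy problem can be made concrete with a small example. The sketch below is only an illustration of the problem, not the statistical framework described in the abstract: it runs UPGMA (one pair group method) on a hypothetical three-taxon distance matrix in which two pairs are equally close, and the two possible tie-breaking choices yield different tree topologies. All names and distances are assumptions.

```python
from itertools import combinations


def leaves(cluster):
    """Flatten a nested-tuple cluster into its leaf labels."""
    if isinstance(cluster, str):
        return [cluster]
    return [leaf for child in cluster for leaf in leaves(child)]


def average_linkage(a, b, d):
    """UPGMA distance: mean of all leaf-to-leaf distances between two clusters."""
    la, lb = leaves(a), leaves(b)
    return sum(d[frozenset((x, y))] for x in la for y in lb) / (len(la) * len(lb))


def tied_min_pairs(clusters, d):
    """All cluster pairs at the current minimum distance; more than one means degeneracy."""
    pairs = list(combinations(clusters, 2))
    best = min(average_linkage(a, b, d) for a, b in pairs)
    return [(a, b) for a, b in pairs if average_linkage(a, b, d) == best]


def upgma(labels, d, choose):
    """Agglomerate clusters; `choose` picks one pair among the equally close ones."""
    clusters = list(labels)
    while len(clusters) > 1:
        a, b = choose(tied_min_pairs(clusters, d))
        clusters = [c for c in clusters if c not in (a, b)] + [(a, b)]
    return clusters[0]


dist = {
    frozenset(("A", "B")): 1.0,
    frozenset(("B", "C")): 1.0,
    frozenset(("A", "C")): 2.0,
}
print(tied_min_pairs(["A", "B", "C"], dist))                 # [('A', 'B'), ('B', 'C')]
print(upgma(["A", "B", "C"], dist, lambda ties: ties[0]))   # ('C', ('A', 'B'))
print(upgma(["A", "B", "C"], dist, lambda ties: ties[-1]))  # ('A', ('B', 'C'))
```

Exact float comparison is adequate here only because the distances are small exact values; with measured data, pairs whose distances differ by less than the measurement error are effectively tied as well, which is the second problem named in the abstract.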
Subjects
Algorithms , Cluster Analysis , Models, Statistical , Phylogeny
ABSTRACT
At present, MALDI-TOF MS methodology for the characterization of bacteria varies considerably, owing to differences in, e.g., sample preparation methods, matrix solutions, organic solvents, acquisition methods and data analysis methods. After evaluation of the existing methods, a standard protocol was developed to generate MALDI-TOF mass spectra from a collection of reference strains belonging to the genera Leuconostoc, Fructobacillus and Lactococcus. Bacterial cells were harvested after 24 h of growth at 28°C on MRS or TSA medium. Mass spectra were generated using the CHCA matrix combined with a 50:48:2 acetonitrile:water:trifluoroacetic acid matrix solution, and analyzed by the cell smear method and the cell extract method. After a data preprocessing step, the resulting high-quality data set was used for PCA, distance calculation and multidimensional scaling. These analyses demonstrated that the MALDI-TOF mass spectra contain species-specific information. As a next step, the spectra, as well as the binary character set derived from these spectra, were successfully used for species identification within the genera Leuconostoc, Fructobacillus and Lactococcus. Using MALDI-TOF MS identification libraries for Leuconostoc and Fructobacillus strains, 84% of the MALDI-TOF mass spectra were correctly identified at the species level. Similarly, the same analysis strategy within the genus Lactococcus resulted in 94% correct identifications, taking species and subspecies levels into consideration. Finally, two machine learning techniques, support vector machines and random forests, were evaluated as alternative species identification tools; they achieved accuracies between 94% and 98% for the identification of Leuconostoc and Fructobacillus species, respectively.
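The abstract mentions a binary character set derived from the spectra and a distance calculation without giving details. One way this might look is sketched below, assuming fixed-width m/z bins and Jaccard distance (neither is specified in the abstract): peak lists become presence/absence sets, and a query spectrum is identified by its nearest library entry. The bin width, reference names and peak values are all hypothetical.

```python
def binary_profile(peaks, bin_width=5.0):
    """Presence/absence character set: the m/z bins that contain at least one peak."""
    return {int(mz // bin_width) for mz in peaks}


def jaccard_distance(p, q):
    """1 - |shared bins| / |all bins|; 0 means identical character sets."""
    union = p | q
    if not union:
        return 0.0
    return 1.0 - len(p & q) / len(union)


def nearest_reference(query_peaks, library, bin_width=5.0):
    """Return the library entry whose binary profile is closest to the query."""
    query = binary_profile(query_peaks, bin_width)
    return min(
        library,
        key=lambda name: jaccard_distance(query, binary_profile(library[name], bin_width)),
    )


# Hypothetical peak lists (m/z); not real Leuconostoc or Lactococcus spectra.
library = {
    "ref_1": [4365.2, 5121.0, 6250.1],
    "ref_2": [4410.0, 5300.5, 7020.3],
}
print(nearest_reference([4366.0, 5123.0, 6251.9], library))  # ref_1
```

Fixed bins can split two nearly identical peaks across a bin boundary; matching peaks within a mass tolerance is a common alternative when building the character set.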
Subjects
Artificial Intelligence , Lactococcus/chemistry , Lactococcus/classification , Leuconostocaceae/chemistry , Leuconostocaceae/classification , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Statistics as Topic/methods
ABSTRACT
The PulseNet USA subtyping network recently established a standardized protocol for multiple-locus variable-number tandem repeat analysis (MLVA) to characterize Shiga toxin-producing Escherichia coli O157. To enable data comparisons from different laboratories in the same database, reproducibility and high quality of the data must be ensured. The aim of this study was to test the robustness and reproducibility of the proposed standardized protocol by subjecting it to a multilaboratory validation process and to address any discrepancies that may have arisen from the study. A set of 50 strains was tested in 10 PulseNet participating laboratories that used capillary electrophoresis instruments from two manufacturers. Six out of the 10 laboratories were able to generate correct MLVA types for 46 (92%) or more strains. The discrepancies in MLVA type assignment were caused mainly by difficulties in optimizing polymerase chain reactions that were attributed to technical inexperience of the staff and suboptimal quality of reagents and instrumentation. It was concluded that proper training of staff must be an integral part of technology transfer. The interlaboratory reproducibility of fragment sizing was excellent when the same capillary electrophoresis platform was used. However, sizing discrepancies of up to six base pairs for the same fragment were detected between the two platforms. These discrepancies were attributed to different dye and polymer chemistries employed by the manufacturers. A novel software script was developed to assign alleles based on two platform-specific (Beckman Coulter CEQ8000 and Applied Biosystems Genetic Analyzer 3130xl) look-up tables containing fragment size ranges for all alleles. The new allele assignment method was validated at the PulseNet central laboratory using a diverse set of 502 Shiga toxin-producing Escherichia coli O157 isolates. The validation confirmed that the script reliably assigned the same allele for the same fragment regardless of the platform used to size the fragment.
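The allele assignment script is described only at the level of platform-specific look-up tables of fragment size ranges. A minimal sketch of that idea follows; the platform keys, locus name and every size range are hypothetical placeholders, not the PulseNet tables.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical platform-specific look-up tables: for each locus, a list of
# (minimum size, maximum size, allele) in base pairs. The values are placeholders
# chosen to mimic a systematic offset between instruments, not real PulseNet data.
SizeRange = Tuple[float, float, int]
LOOKUP: Dict[str, Dict[str, List[SizeRange]]] = {
    "CEQ8000": {"locus_1": [(150.0, 153.5, 1), (156.0, 159.5, 2)]},
    "3130xl": {"locus_1": [(154.0, 157.5, 1), (160.0, 163.5, 2)]},
}


def assign_allele(platform: str, locus: str, size: float) -> Optional[int]:
    """Map a sized fragment to an allele using the table for the platform that sized it."""
    for low, high, allele in LOOKUP[platform][locus]:
        if low <= size <= high:
            return allele
    return None  # outside every range: flag the fragment for manual review


print(assign_allele("CEQ8000", "locus_1", 152.1))  # 1
print(assign_allele("3130xl", "locus_1", 156.3))   # 1
print(assign_allele("CEQ8000", "locus_1", 155.0))  # None
```

Because each platform has its own ranges, the same allele is reported even though the raw sizes for that fragment differ by several base pairs between instruments.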
Subjects
DNA, Bacterial/analysis , Electrophoresis, Capillary/standards , Escherichia coli Infections/microbiology , Escherichia coli O157/classification , Escherichia coli O157/genetics , Alleles , Base Sequence , DNA Fragmentation , Electrophoresis, Capillary/instrumentation , Electrophoresis, Capillary/methods , Escherichia coli O157/metabolism , Food Microbiology , Humans , Laboratories/standards , Phylogeny , Reproducibility of Results , Sensitivity and Specificity , Shiga Toxins/biosynthesis , Tandem Repeat Sequences
ABSTRACT
Recently, the number of cases of invasive disease caused by Haemophilus influenzae serotype b (Hib) has increased in The Netherlands. To study a possible change in the Hib population that could explain the rise in incidence, a multiple-locus variable-number tandem repeat analysis (MLVA) was developed to genotype H. influenzae isolates. The MLVA differentiated H. influenzae serotype b strains with higher discriminatory power than multilocus sequence typing (MLST). MLVA profiles of noncapsulated H. influenzae and H. influenzae serotype f strains were more heterogeneous than those of serotype b strains and were distinct from Hib, although some overlap occurred. The MLVA was used to genotype a collection of 520 H. influenzae serotype b strains isolated from patients with invasive disease in The Netherlands. The strains were collected from 1983 to 2002, covering a period of 10 years before and 9 years after the introduction of the Hib vaccine into the Dutch national vaccination program. MLVA revealed a sharp increase in the genetic diversity of Hib strains isolated from patients ranging from neonates to 4-year-olds after 1993, when the Hib vaccine was introduced. Hib strains isolated from patients older than 4 years were genetically diverse, and no significant change in diversity was seen after the introduction of the vaccine. These observations suggest that, after the introduction of the Hib vaccine, young children no longer constitute the reservoir for Hib and that they are infected by adults carrying genetically diverse Hib strains.
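The abstract does not say how genetic diversity or discriminatory power was quantified. A common choice for typing data is Simpson's index of diversity, in the form widely used as a discriminatory index for typing methods; the sketch below assumes that measure and uses hypothetical MLVA type labels.

```python
from collections import Counter
from typing import Hashable, Sequence


def simpson_diversity(types: Sequence[Hashable]) -> float:
    """Simpson's index of diversity: D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)).

    D is the probability that two isolates drawn at random without replacement
    have different types: 0 when all isolates share one type, 1 when all differ.
    """
    n = len(types)
    if n < 2:
        return 0.0  # undefined for fewer than two isolates; reported as 0 here
    same_type_pairs = sum(c * (c - 1) for c in Counter(types).values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical MLVA types for two small groups of isolates.
clonal = ["MT1", "MT1", "MT1", "MT1"]
diverse = ["MT1", "MT1", "MT2", "MT3"]
print(simpson_diversity(clonal))             # 0.0
print(round(simpson_diversity(diverse), 3))  # 0.833
```

Comparing D for isolates collected before and after a cut-off year is one simple way to express a change in diversity, though a formal test would also need confidence intervals.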
Subjects
Genetic Variation , Haemophilus Infections/prevention & control , Haemophilus Vaccines/administration & dosage , Haemophilus influenzae type b/classification , Polysaccharides, Bacterial/administration & dosage , Adult , Bacterial Capsules , Child, Preschool , Genotype , Haemophilus Infections/microbiology , Haemophilus influenzae type b/genetics , Haemophilus influenzae type b/immunology , Humans , Immunization Programs , Infant , Infant, Newborn , Minisatellite Repeats/genetics , National Health Programs , Netherlands , Sequence Analysis, DNA , Serotyping , Vaccination
ABSTRACT
The PulseNet National Database, established by the Centers for Disease Control and Prevention in 1996, consists of pulsed-field gel electrophoresis (PFGE) patterns obtained from isolates of food-borne pathogens (currently Escherichia coli O157:H7, Salmonella, Shigella, and Listeria) and textual information about the isolates. Electronic images and accompanying text are submitted by over 60 U.S. public health and food regulatory agency laboratories. The PFGE patterns are generated according to highly standardized PFGE protocols. Normalization and accurate comparison of gel images require the use of a well-characterized size standard in at least three lanes of each gel. Originally, a well-characterized strain of each organism was chosen as the reference standard for that particular database. The increasing number of databases, the difficulty of identifying an organism-specific standard for each database, the wider range of band sizes generated by the use of additional restriction endonucleases, and the need to maintain many different organism-specific strains encouraged us to search for a more versatile, universal DNA size marker. A Salmonella serotype Braenderup strain (H9812) was chosen as the universal size standard. This strain was subjected to rigorous testing in our laboratories to ensure that it met the desired criteria, including coverage of a wide range of DNA fragment sizes, even distribution of bands, and stability of the PFGE pattern. The strategy used to convert and compare data generated by the new and old reference standards is described.
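Normalization against a size standard amounts to converting band positions into fragment sizes using bands of known size run on the same gel. The sketch below assumes piecewise log-linear interpolation between adjacent standard bands, which is a simplification; the standard band positions and sizes are hypothetical rather than the actual H9812 pattern, and correcting distortion across lanes (the reason for running the standard in several lanes) is not modeled.

```python
import bisect
import math
from typing import Sequence


def size_from_migration(
    distance: float, std_distances: Sequence[float], std_sizes_kb: Sequence[float]
) -> float:
    """Estimate fragment size (kb) by log-linear interpolation between standard bands.

    std_distances must be strictly increasing (larger fragments migrate less),
    with std_sizes_kb giving the matching, decreasing band sizes.
    """
    if not std_distances[0] <= distance <= std_distances[-1]:
        raise ValueError("band lies outside the range covered by the size standard")
    # Index of the first standard band below (farther than) the query band.
    i = bisect.bisect_right(std_distances, distance)
    i = min(max(i, 1), len(std_distances) - 1)
    d0, d1 = std_distances[i - 1], std_distances[i]
    s0, s1 = math.log(std_sizes_kb[i - 1]), math.log(std_sizes_kb[i])
    frac = (distance - d0) / (d1 - d0)
    return math.exp(s0 + frac * (s1 - s0))


standard_positions = [10.0, 20.0, 30.0]  # hypothetical migration distances (mm)
standard_sizes = [400.0, 200.0, 100.0]   # hypothetical band sizes (kb)
print(round(size_from_migration(15.0, standard_positions, standard_sizes), 1))  # 282.8
```

Bands outside the span of the standard raise an error instead of being extrapolated, which is one reason a standard covering a wide size range with evenly distributed bands is desirable.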
Subjects
Databases as Topic , Electrophoresis, Gel, Pulsed-Field/standards , Electrophoresis, Gel, Pulsed-Field/methods , Escherichia coli O157/genetics , Listeria monocytogenes/genetics , Reference Standards , Salmonella/genetics , Serotyping
ABSTRACT
Bordetella pertussis, the causative agent of whooping cough, has remained endemic in The Netherlands despite extensive nationwide vaccination since 1953. In the 1990s, several epidemic periods resulted in many cases of pertussis. We have proposed that strain variation has played a major role in the upsurges of this disease in The Netherlands. Molecular characterization of strains is therefore important for identifying the factors that drive pertussis epidemiology. For this reason, we have developed a multiple-locus variable-number tandem repeat analysis (MLVA) typing system for B. pertussis. By combining the MLVA profile with the allelic profile based on multiple-antigen sequence typing, we were able to differentiate strains further. The relationships between the various genotypes were visualized by constructing a minimum spanning tree. MLVA of Dutch B. pertussis strains revealed that the genotypes of strains isolated in the prevaccination period were diverse and clearly distinct from those of strains isolated in the 1990s. Furthermore, diversity decreased among strains from the late 1990s, with a remarkable clonal expansion that coincided with the epidemic periods. Using this genotyping method, we have been able to show that the B. pertussis population is much more dynamic than expected.
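A minimum spanning tree over MLVA types can be sketched with Prim's algorithm, assuming a categorical distance equal to the number of loci whose repeat counts differ. The type names and repeat counts below are hypothetical, and tie-breaking simply follows input order, whereas dedicated typing software applies additional priority rules. A combined MLVA and multiple-antigen sequence typing profile could be handled the same way by concatenating the two tuples.

```python
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]


def locus_differences(p: Profile, q: Profile) -> int:
    """Categorical distance: number of loci whose repeat counts differ."""
    return sum(a != b for a, b in zip(p, q))


def minimum_spanning_tree(profiles: Dict[str, Profile]) -> List[Tuple[str, str, int]]:
    """Prim's algorithm on the complete graph of MLVA types.

    Returns (tree node, newly added node, distance) edges. Among equally short
    edges, the first one found in input order is kept.
    """
    names = list(profiles)
    if not names:
        return []
    in_tree = [names[0]]
    edges = []
    while len(in_tree) < len(names):
        best = None
        for u in in_tree:
            for v in names:
                if v in in_tree:
                    continue
                d = locus_differences(profiles[u], profiles[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.append(best[1])
    return edges


# Hypothetical repeat counts at four VNTR loci.
types = {
    "MT_a": (3, 5, 2, 4),
    "MT_b": (3, 5, 2, 5),
    "MT_c": (3, 6, 2, 5),
    "MT_d": (4, 6, 1, 5),
}
for edge in minimum_spanning_tree(types):
    print(edge)
```

This quadratic-per-step version is fine for the few hundred types typical of a national collection; larger data sets would call for a heap-based implementation.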