ABSTRACT
The Uppsala University Chlamydia trachomatis multilocus sequence type (MLST) database (http://mlstdb.bmc.uu.se) is based on five target regions (non-housekeeping genes) and the ompA gene. Each target has a different number of alleles (hctB, 89; CT058, 51; CT144, 30; CT172, 38; and pbpB, 35) derived from 13 studies. Our aims were to perform an overall analysis of all C. trachomatis MLST sequence types (STs) in the database, examine STs with global spread, and evaluate the phylogenetic capability of the five targets. A total of 415 STs were recognized from 2,089 specimens. The addition of 49 ompA gene variants created 459 profiles. ST variation and geographical distribution were characterized using eBURST and minimum spanning tree analyses. There were 609 samples from men who have sex with men (MSM), with 4 predominating STs detected in this group, comprising 63% of MSM cases. Four other STs predominated among 1,383 heterosexual cases, comprising 31% of this group. The diversity index in ocular trachoma cases was significantly lower than in sexually transmitted chlamydia infections. Predominating STs were identified in 12 available C. trachomatis whole genomes, which were compared to 22 C. trachomatis full genomes without predominating STs. No specific gene in the 12 genomes with predominating STs could be linked to successful spread of certain STs. Phylogenetic analysis showed that MLST targets provide a tree similar to trees based on whole-genome analysis. The presented MLST scheme identified C. trachomatis strains with global spread. It provides a tool for epidemiological investigations and is useful for phylogenetic analyses.
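The abstract does not say which diversity index was used. As a minimal sketch, assuming the Hunter-Gaston form of Simpson's index of diversity (a common choice in typing studies), the comparison between two specimen groups could look like the following. The function name and the toy ST labels are hypothetical and are not taken from the database.

```python
from collections import Counter


def hunter_gaston_diversity(sequence_types):
    """Return D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)).

    sequence_types is a list with one ST label per typed specimen;
    n_j is the number of specimens with ST j and N is the total.
    """
    counts = Counter(sequence_types)
    total = sum(counts.values())
    if total < 2:
        raise ValueError("need at least two typed specimens")
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_type_pairs / (total * (total - 1))


# Hypothetical toy data (assumption, for illustration only):
# a group dominated by one ST versus a group with many different STs.
ocular = ["ST1"] * 8 + ["ST2"] * 2
genital = ["ST1", "ST2", "ST3", "ST4", "ST5", "ST6", "ST7", "ST8", "ST9", "ST9"]

print(hunter_gaston_diversity(ocular))
print(hunter_gaston_diversity(genital))
```

A group dominated by a few STs yields a lower index than a group in which most specimens carry different STs, which is the kind of contrast the abstract reports between ocular and sexually transmitted infections.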
Subjects
Chlamydia Infections/microbiology, Chlamydia trachomatis/classification, Chlamydia trachomatis/genetics, Genetic Variation, Genotype, Multilocus Sequence Typing, Chlamydia trachomatis/isolation & purification, Cluster Analysis, Female, Global Health, Humans, Male, Phylogeography
ABSTRACT
BACKGROUND: Pneumococcal serotypes are represented by a varying number of clonal lineages with different genetic contents, potentially affecting invasiveness. However, genetic variation within the same genetic lineage may be larger than anticipated. METHODS: A total of 715 invasive and carriage isolates from children in the same region and during the same period were compared using pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing. Bacterial genome sequencing, functional assays, and in vivo mouse virulence studies were performed. RESULTS: Clonal types of the same serotype, and also intraclonal variants within clonal complexes (CCs), showed differences in invasive-disease potential. CC138, a common CC, was divided into several PFGE patterns, partly explained by the number, location, and type of temperate bacteriophages. Whole-genome sequencing of 4 CC138 isolates representing PFGE clones with different invasive-disease potentials revealed intraclonal sequence variations of the virulence-associated proteins pneumococcal surface protein A (PspA) and pneumococcal choline-binding protein C (PspC). A carrier isolate lacking PcpA exhibited decreased virulence in mice, and there was differential binding of human factor H, depending on invasiveness. CONCLUSIONS: Pneumococcal clonal types, and also intraclonal variants, exhibited different invasive-disease potentials in children. Intraclonal variants, reflecting different prophage contents, showed differences in major surface antigens. This suggests ongoing immune selection, such as that due to PspC-mediated complement resistance through varied human factor H binding, that may affect invasiveness in children.
Subjects
Genetic Variation, Pneumococcal Infections/epidemiology, Pneumococcal Infections/pathology, Streptococcus pneumoniae/classification, Streptococcus pneumoniae/genetics, Adolescent, Animals, Bacterial Antigens/analysis, Carrier State/epidemiology, Carrier State/microbiology, Child, Preschool Child, Animal Disease Models, Pulsed-Field Gel Electrophoresis, Female, Bacterial Genome, Genotype, Humans, Infant, Male, Membrane Proteins/analysis, Mice, Inbred C57BL Mice, Molecular Typing, Pneumococcal Infections/microbiology, Prophages/genetics, DNA Sequence Analysis, Streptococcus Phages/genetics, Streptococcus pneumoniae/isolation & purification, Virulence
ABSTRACT
Due to their submerged and cryptic lifestyle, the vast majority of fungal species are difficult to observe and describe morphologically, and many remain known to science only from sequences detected in environmental samples. The lack of practices to delimit and name most fungal species is a staggering limitation to communication and interpretation of ecology and evolution in kingdom Fungi. Here, we use environmental sequence data as taxonomic evidence and combine phylogenetic and ecological data to generate and test species hypotheses in the class Archaeorhizomycetes (Taphrinomycotina, Ascomycota). Based on environmental amplicon sequencing from a well-studied Swedish pine forest podzol soil, we generate 68 distinct species hypotheses of Archaeorhizomycetes, of which two correspond to the only described species in the class. Nine of the species hypotheses represent 78% of the sequenced Archaeorhizomycetes community, and are supported by long-read data that form the backbone for delimiting species hypotheses based on phylogenetic branch lengths. Soil fungal communities are shaped by environmental filtering and competitive exclusion so that closely related species are less likely to co-occur in a niche if adaptive traits are evolutionarily conserved. In soil profiles, distinct vertical horizons represent a testable niche dimension, and we found a significantly differential distribution across samples for a well-supported pair of sister species hypotheses. Based on the combination of phylogenetic and ecological evidence, we identify two novel species for which we provide molecular diagnostics and propose names. While environmental sequences cannot be automatically translated to species, they can be used to generate phylogenetically distinct species hypotheses that can be further tested using sequences as ecological evidence.
We conclude that in the case of abundantly and frequently observed species, environmental sequences can support species recognition in the absence of physical specimens, while rare taxa remain uncaptured at our sampling and sequencing intensity.
ABSTRACT
The ever-increasing speed of DNA sequencing widens the discrepancy between the number of known gene products and the knowledge of their function and structure. Proper annotation of protein sequences is therefore crucial if the missing information is to be deduced from sequence-based similarity comparisons. These comparisons become exceedingly difficult as the pairwise identities drop to very low values. To improve the accuracy of domain identification, we exploit the fact that the three-dimensional structures of domains are much more conserved than their sequences. Based on structure-anchored multiple sequence alignments of low-identity homologues, we constructed 850 structure-anchored hidden Markov models (saHMMs), each representing one domain family. Since the saHMMs are highly family specific, they can be used to assign a domain to its correct family and clearly distinguish it from domains belonging to other families, even within the same superfamily. This task is not trivial and becomes particularly difficult if the unknown domain is distantly related to the rest of the domain sequences within the family. In a search with full-length protein sequences, harbouring at least one domain as defined by the structural classification of proteins database (SCOP), version 1.71, versus the saHMM database based on SCOP version 1.69, we achieve an accuracy of 99.0%. All of the few hits outside the family fall within the correct superfamily. Compared to Pfam_ls HMMs, the saHMMs obtain about 11% higher coverage. A comparison with BLAST and PSI-BLAST demonstrates that the saHMMs have consistently fewer errors per query at a given coverage. Within our recommended E-value range, the same is true for a comparison with SUPERFAMILY. Furthermore, we are able to annotate 232 proteins with 530 nonoverlapping domains belonging to 102 different domain families among human proteins labelled "unknown" in the NCBI protein database.
Our results demonstrate that the saHMM database represents a versatile and reliable tool for identification of domains in protein sequences. With the aid of saHMMs, homology on the family level can be assigned, even for distantly related sequences. Due to the construction of the saHMMs, the hits they provide are always associated with high quality crystal structures. The saHMM database can be accessed via the FISH server at http://babel.ucmp.umu.se/fish/.
Subjects
Computational Biology/methods, Markov Chains, Tertiary Protein Structure, Protein Databases, Proteins/chemistry
ABSTRACT
Epidemiological contact tracing complemented with genotyping of clinical Mycobacterium tuberculosis isolates is important for understanding disease transmission. In Sweden, tuberculosis (TB) is mostly reported in migrant and homeless populations, where epidemiologic contact tracing can be difficult. This study compared epidemiologic linking with genotyping in a low-burden country. Mycobacterium tuberculosis isolates (n = 93) collected at Scania University Hospital in Southern Sweden were analysed with the standard genotyping method, mycobacterial interspersed repetitive units-variable number tandem repeats (MIRU-VNTR), and the results were compared with whole genome sequencing (WGS). Using a maximum of twelve single nucleotide polymorphisms (SNPs) as the upper threshold of genomic relatedness noted among hosts, we identified 18 clusters with WGS comprising 52 patients, with overall maximum pairwise genetic distances ranging from zero to nine SNPs. MIRU-VNTR and WGS clustered the same isolates, although the distribution differed depending on MIRU-VNTR limitations. Both genotyping techniques identified clusters where epidemiologic linking was insufficient, although WGS had higher correlation with epidemiologic data. To summarize, WGS provided better resolution of transmission than MIRU-VNTR in a setting with low TB incidence. WGS predicted epidemiologic links better, which could consolidate and correct the epidemiologically linked cases, thus avoiding false clustering.
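The threshold-based clustering described above can be illustrated with a small sketch. It assumes single-linkage clustering on pairwise SNP distances between aligned sequences; the abstract gives only the twelve-SNP threshold, so the linkage rule, the function names, and the toy sequences are assumptions made for illustration.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def cluster_isolates(sequences, max_snps=12):
    """Single-linkage clustering: link isolates whose distance is <= max_snps.

    sequences maps an isolate name to its aligned sequence. Returns a sorted
    list of clusters, keeping only clusters with at least two isolates.
    """
    parent = {name: name for name in sequences}

    def find(name):
        # Follow parent links to the cluster representative (path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(sequences, 2):
        if snp_distance(sequences[a], sequences[b]) <= max_snps:
            parent[find(a)] = find(b)

    groups = {}
    for name in sequences:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical toy alignment; a threshold of 2 is used only because the
# sequences are 10 bases long. The study's threshold was 12 SNPs.
isolates = {
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAT",
    "C": "AAAAAAAATT",
    "D": "TTTTTTAAAA",
}
print(cluster_isolates(isolates, max_snps=2))
```

With single linkage, an isolate joins a cluster if it is within the threshold of any member, so the maximum pairwise distance inside a cluster can exceed the threshold. The abstract reports maximum within-cluster distances of zero to nine SNPs, below the twelve-SNP limit.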
Subjects
Bacterial Genome, Minisatellite Repeats, Mycobacterium tuberculosis/genetics, Pulmonary Tuberculosis/epidemiology, Pulmonary Tuberculosis/transmission, Whole Genome Sequencing, Adolescent, Adult, Aged, Aged 80 and over, Bacterial Typing Techniques, Child, Preschool Child, Cluster Analysis, Contact Tracing/statistics & numerical data, Female, Genomics, Humans, Infant, Newborn Infant, Male, Middle Aged, Molecular Epidemiology, Multigene Family, Mycobacterium tuberculosis/classification, Mycobacterium tuberculosis/isolation & purification, Single Nucleotide Polymorphism, Sweden/epidemiology, Pulmonary Tuberculosis/microbiology
ABSTRACT
The FISH server is highly accurate in identifying the family membership of domains in a query protein sequence, even in the case of very low sequence identities to known homologues. A performance test using SCOP sequences and an E-value cut-off of 0.1 showed that 99.3% of the top hits are to the correct family saHMM. Matches to a query sequence provide the user not only with an annotation of the identified domains and hence a hint to their function, but also with probable 2D and 3D structures, as well as with pairwise and multiple sequence alignments to homologues with low sequence identity. In addition, the FISH server allows users to upload and search their own protein sequence collection or to query public protein sequence databases with individual saHMMs. The FISH server can be accessed at http://babel.ucmp.umu.se/fish/.
Subjects
Markov Chains, Tertiary Protein Structure, Amino Acid Sequence Homology, Software, Protein Databases, Internet, User-Computer Interface
ABSTRACT
Reversing the loop lengths of the small protein S6 by circular permutation has a dramatic effect on the transition state structure: it changes from globally diffuse to locally condensed. The phenomenon arises from a biased dispersion of the contact energies. Stability data derived from point mutations throughout the S6 structure show that interactions between residues that are far apart in sequence are stronger than those that are close. This entropy compensation drives all parts of the protein to fold simultaneously and produces the diffuse transition-state structure typical for two-state proteins. In the circular permutant, where strong contacts and short sequence separations are engineered to concur, the transition state becomes atypically condensed and polarized. Taken together with earlier findings that S6 may also fold by a 'collapsed' trajectory with an intermediate, the results suggest that this protein may fold by a multiplicity of mechanisms. The observations indicate that the diffuse transition state of S6 is not required for folding but could be an evolutionary development to optimize cooperativity.