1.
Brief Bioinform ; 25(3), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38747283

ABSTRACT

The analysis and comparison of gene neighborhoods is a powerful approach for exploring microbial genome structure, function, and evolution. Although numerous tools exist for genome visualization and comparison, genome exploration across large genomic databases or user-generated datasets remains a challenge. Here, we introduce AnnoView, a web server designed for interactive exploration of gene neighborhoods across the bacterial and archaeal tree of life. Our server offers users the ability to identify, compare, and visualize gene neighborhoods of interest from 30 238 bacterial genomes and 1672 archaeal genomes, through integration with the comprehensive Genome Taxonomy Database and AnnoTree databases. Identified gene neighborhoods can be visualized using pre-computed functional annotations from different sources such as KEGG, Pfam and TIGRFAM, or clustered based on similarity. Alternatively, users can upload and explore their own custom genomic datasets in GBK, GFF or CSV format, or use AnnoView as a genome browser for relatively small genomes (e.g. viruses and plasmids). Ultimately, we anticipate that AnnoView will catalyze biological discovery by enabling user-friendly search, comparison, and visualization of genomic data. AnnoView is available at http://annoview.uwaterloo.ca.


Subjects
Software , Genetic Databases , Bacterial Genome , Archaeal Genome , Genomics/methods , Archaea/genetics , Microbial Genes/genetics , Computational Biology/methods , Bacteria/genetics , Bacteria/classification
2.
Nucleic Acids Res ; 52(3): 1064-1079, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38038264

ABSTRACT

mRNA translation is a fundamental process for life. Selection of the translation initiation site (TIS) is crucial, as it establishes the correct open reading frame for mRNA decoding. Studies in vertebrate mRNAs discovered that a purine at -3 and a G at +4 (where the A of the AUG initiator codon is numbered +1) promote TIS recognition. However, the TIS context in other eukaryotes has received little experimental analysis. We analyzed in vitro the influence of the -3, -2, -1 and +4 positions of the TIS context in rabbit, Drosophila, wheat, and yeast. We observed that -3A conferred the best translational efficiency across these species. However, we found variability at the +4 position for optimal translation. In addition, the Kozak motif that was defined from mammalian cells was only weakly predictive for wheat and essentially non-predictive for yeast. We discovered eight conserved sequences that significantly disfavored translation. Due to the large differences in translational efficiency observed among weak TIS context sequences, we define a novel category that we termed 'barren AUG context sequences (BACS)', which represent sequences disfavoring translation. Analysis of mRNA-ribosome complex structures provided insights into the function of BACS. The gene ontology of the BACS-containing mRNAs is presented.


Subjects
Initiator Codon , Conserved Sequence , Protein Biosynthesis , Animals , Rabbits , Initiator Codon/genetics , Mammals/genetics , Translational Peptide Chain Initiation , Messenger RNA/metabolism , Yeasts , Eukaryota/genetics , Eukaryota/metabolism
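
The position numbering used in the abstract above (the A of AUG is +1, there is no position 0, and +4 is the nucleotide right after the AUG) can be illustrated with a small sketch. The function names `tis_context` and `matches_vertebrate_rule` are hypothetical and the example sequence is made up; this is not the authors' analysis code, only a way to read the -3/-2/-1/+4 positions off a sequence.

```python
def tis_context(mrna: str, aug_index: int) -> dict:
    """Return the nucleotides at -3, -2, -1 and +4 around an AUG.

    aug_index is the 0-based index of the A of the AUG (position +1).
    """
    seq = mrna.upper().replace("T", "U")
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    if aug_index < 3 or aug_index + 3 >= len(seq):
        raise ValueError("sequence too short to cover -3..+4")
    return {
        -3: seq[aug_index - 3],
        -2: seq[aug_index - 2],
        -1: seq[aug_index - 1],
        +4: seq[aug_index + 3],  # A=+1, U=+2, G=+3, next base=+4
    }


def matches_vertebrate_rule(context: dict) -> bool:
    """Purine at -3 and G at +4, the vertebrate pattern cited in the abstract."""
    return context[-3] in "AG" and context[4] == "G"


# Illustrative sequence only (not data from the study).
ctx = tis_context("GCCACCAUGG", 6)
print(ctx, matches_vertebrate_rule(ctx))
```

The abstract reports that this vertebrate pattern was only weakly predictive for wheat and essentially non-predictive for yeast, so a check like this should not be read as a general predictor of translational efficiency.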
3.
Nucleic Acids Res ; 49(D1): D461-D467, 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33170213

ABSTRACT

The Transporter Classification Database (TCDB; tcdb.org) is a freely accessible reference resource, which provides functional, structural, mechanistic, medical and biotechnological information about transporters from organisms of all types. TCDB is the only transport protein classification database adopted by the International Union of Biochemistry and Molecular Biology (IUBMB) and now (October 1, 2020) consists of 20 653 proteins classified in 15 528 non-redundant transport systems with 1567 tabulated 3D structures, 18 336 reference citations describing 1536 transporter families, of which 26% are members of 82 recognized superfamilies. Overall, this is an increase of over 50% since the last published update of the database in 2016. This comprehensive update of the database contents and features includes (i) adoption of a chemical ontology for substrates of transporters, (ii) inclusion of new superfamilies, (iii) a domain-based characterization of transporter families for the identification of new members as well as functional and evolutionary relationships between families, (iv) development of novel software to facilitate curation and use of the database, (v) addition of new subclasses of transport systems including 11 novel types of channels and 3 types of group translocators and (vi) the inclusion of many man-made (artificial) transmembrane pores/channels and carriers.


Subjects
Protein Databases , Membrane Transport Proteins/chemistry , Metagenomics , Protein Domains , Software , Substrate Specificity
4.
BMC Genomics ; 22(1): 663, 2021 Sep 14.
Article in English | MEDLINE | ID: mdl-34521345

ABSTRACT

BACKGROUND: A substantial fraction of genes identified within bacterial genomes encode proteins of unknown function. Identifying which of these proteins represent potential virulence factors, and mapping their key virulence determinants, is a challenging but important goal. RESULTS: To facilitate virulence factor discovery, we performed a comprehensive analysis of 17,929 protein domain families within the Pfam database, and scored them based on their overrepresentation in pathogenic versus non-pathogenic species, taxonomic distribution, relative abundance in metagenomic datasets, and other factors. CONCLUSIONS: We identify pathogen-associated domain families, candidate virulence factors in the human gut, and eukaryotic-like mimicry domains with likely roles in virulence. Furthermore, we provide an interactive database called PathFams to allow users to explore pathogen-associated domains as well as identify pathogen-associated domains and domain architectures in user-uploaded sequences of interest. PathFams is freely available at https://pathfams.uwaterloo.ca.


Subjects
Metagenomics , Virulence Factors , Bacterial Genome , Humans , Metagenome , Protein Domains , Virulence Factors/genetics
5.
BMC Genomics ; 21(1): 741, 2020 Oct 24.
Article in English | MEDLINE | ID: mdl-33099302

ABSTRACT

BACKGROUND: Finding orthologs remains an important bottleneck in comparative genomics analyses. While the authors of software for the quick comparison of protein sequences evaluate the speed of their software and compare their results against the most commonly used software for the task, it is not common for them to evaluate their software for more particular uses, such as finding orthologs as reciprocal best hits (RBH). Here we compared RBH results obtained using software that runs faster than blastp, namely lastal, diamond, and MMseqs2. RESULTS: We found that lastal required the least time to produce results. However, it yielded fewer results than any other program when comparing the proteins encoded by evolutionarily distant genomes. The program producing the most similar number of RBH to blastp was diamond run with the "ultra-sensitive" option. However, this option was diamond's slowest, with the "very-sensitive" option offering the best balance between speed and RBH results. The speed-up of the programs was much more evident when dealing with eukaryotic genomes, which code for more numerous proteins. For example, lastal took a median of approx. 1.5% of the blastp time to run with bacterial proteomes and 0.6% with eukaryotic ones, while diamond with the very-sensitive option took 7.4% and 5.2%, respectively. Though estimated error rates were very similar among the RBH obtained with all programs, RBH obtained with MMseqs2 had the lowest error rates among the programs tested. CONCLUSIONS: The fast algorithms for pairwise protein comparison produced results very similar to blast in a fraction of the time, with diamond offering the best compromise in speed, sensitivity and quality, as long as a sensitivity option other than the default was chosen.


Subjects
Diamond , Software , Algorithms , Amino Acid Sequence , Genome
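
The reciprocal best hit (RBH) criterion named in the abstract above can be sketched as follows. This is a minimal illustration, assuming each search result has already been parsed into `(query, subject, bit_score)` tuples. The function names and the toy hits are hypothetical, and the filters a real pipeline would apply (coverage, e-value, tie handling) are not taken from the abstract and are left out.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples.
    Ties keep the first hit seen, which is a simplification.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy hits, invented for illustration only.
a_vs_b = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b2", 90)]
b_vs_a = [("b1", "a1", 198), ("b2", "a1", 140), ("b2", "a2", 80)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In the toy data, `a2` picks `b2` as its best hit, but `b2` prefers `a1`, so only the `a1`/`b1` pair is reciprocal. The same two functions apply whichever search program produced the hits, as long as its output is reduced to the same tuple form.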
6.
BMC Evol Biol ; 18(1): 148, 2018 Oct 03.
Article in English | MEDLINE | ID: mdl-30285626

ABSTRACT

BACKGROUND: Members of the Bacillus genus have been isolated from a variety of environments. However, the relationship between potential metabolism and the niche from which bacteria of this genus have been isolated has not been extensively studied. The existence of a monophyletic aquatic Bacillus group, composed of members isolated from both marine and fresh water has been proposed. Here, we present a phylogenetic/phylogenomic analysis to investigate the potential relationship between the environment from which group members have been isolated and their evolutionary origin. We also carried out hierarchical clustering based on functional content to test for potential environmental effects on the genetic content of these bacteria. RESULTS: The phylogenetic reconstruction showed that Bacillus strains classified as aquatic have evolutionary origins in different lineages. Although we observed the presence of a clade consisting exclusively of aquatic Bacillus, it is not comprised of the same strains previously reported. In contrast to phylogeny, clustering based on the functional categories of the encoded proteomes resulted in groups more compatible with the environments from which the organisms were isolated. This evidence suggests a detectable environmental influence on bacterial genetic content, despite their different evolutionary origins. CONCLUSION: Our results suggest that aquatic Bacillus species have polyphyletic origins, but exhibit convergence at the gene content level.


Subjects
Bacillus/classification , Bacillus/genetics , Environment , Bacterial Genes , Cluster Analysis , Molecular Evolution , Genomics , Phylogeny
7.
Nucleic Acids Res ; 44(D1): D372-9, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26546518

ABSTRACT

The Transporter Classification Database (TCDB; http://www.tcdb.org) is a freely accessible reference database for transport protein research, which provides structural, functional, mechanistic, evolutionary and disease/medical information about transporters from organisms of all types. TCDB is the only transport protein classification database adopted by the International Union of Biochemistry and Molecular Biology (IUBMB). It consists of more than 10,000 non-redundant transport systems with more than 11 000 reference citations, classified into over 1000 transporter families. Transporters in TCDB can be single or multi-component systems, categorized in a functional/phylogenetic hierarchical system of classes, subclasses, families, subfamilies and transport systems. TCDB also includes updated software designed to analyze the distinctive features of transport proteins, extending its usefulness. Here we present a comprehensive update of the database contents and features and summarize recent discoveries recorded in TCDB.


Subjects
Protein Databases , Membrane Transport Proteins/classification , Membrane Transport Proteins/chemistry , Membrane Transport Proteins/metabolism , Protein Sequence Analysis
8.
PLoS Genet ; 10(2): e1004120, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24586182

ABSTRACT

Large-scale proteomic analyses in Escherichia coli have documented the composition and physical relationships of multiprotein complexes, but not their functional organization into biological pathways and processes. Conversely, genetic interaction (GI) screens can provide insights into the biological role(s) of individual gene and higher order associations. Combining the information from both approaches should elucidate how complexes and pathways intersect functionally at a systems level. However, such integrative analysis has been hindered due to the lack of relevant GI data. Here we present a systematic, unbiased, and quantitative synthetic genetic array screen in E. coli describing the genetic dependencies and functional cross-talk among over 600,000 digenic mutant combinations. Combining this epistasis information with putative functional modules derived from previous proteomic data and genomic context-based methods revealed unexpected associations, including new components required for the biogenesis of iron-sulphur and ribosome integrity, and the interplay between molecular chaperones and proteases. We find that functionally-linked genes co-conserved among γ-proteobacteria are far more likely to have correlated GI profiles than genes with divergent patterns of evolution. Overall, examining bacterial GIs in the context of protein complexes provides avenues for a deeper mechanistic understanding of core microbial systems.


Subjects
Genetic Epistasis , Escherichia coli/genetics , Multiprotein Complexes/genetics , Proteomics , Cytoplasm/metabolism , Bacterial Genome , Humans , Molecular Chaperones/genetics , Molecular Chaperones/metabolism , Multiprotein Complexes/metabolism , Mutation , Oligonucleotide Array Sequence Analysis , Protein Interaction Maps
9.
Bioinformatics ; 29(7): 947-9, 2013 Apr 01.
Article in English | MEDLINE | ID: mdl-23396122

ABSTRACT

MOTIVATION: Analyses in comparative genomics often require non-redundant genome datasets. Eliminating redundancy is not as simple as keeping one strain for each named species because genomes might be redundant at a higher taxonomic level than that of species for some analyses; some strains with different species names can be as similar as most strains sharing a species name, whereas some strains sharing a species name can be so different that they should be put into different groups; and some genomes lack a species name. RESULTS: We have implemented a method and Web server that clusters a genome dataset into groups of redundant genomes at different thresholds based on a few phylogenomic distance measures. AVAILABILITY: The Web interface, similarity and distance data and R-scripts can be accessed at http://microbiome.wlu.ca/research/redundancy/.


Subjects
Genomics/methods , Phylogeny , Genome , Internet , Software
10.
Appl Environ Microbiol ; 80(18): 5717-22, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25002428

ABSTRACT

Massively parallel sequencing of 16S rRNA genes enables the comparison of terrestrial, aquatic, and host-associated microbial communities with sufficient sequencing depth for robust assessments of both alpha and beta diversity. Establishing standardized protocols for the analysis of microbial communities is dependent on increasing the reproducibility of PCR-based molecular surveys by minimizing sources of methodological bias. In this study, we tested the effects of template concentration, pooling of PCR amplicons, and sample preparation/interlane sequencing on the reproducibility associated with paired-end Illumina sequencing of bacterial 16S rRNA genes. Using DNA extracts from soil and fecal samples as templates, we sequenced pooled amplicons and individual reactions for both high (5- to 10-ng) and low (0.1-ng) template concentrations. In addition, all experimental manipulations were repeated on two separate days and sequenced on two different Illumina MiSeq lanes. Although within-sample sequence profiles were highly consistent, template concentration had a significant impact on sample profile variability for most samples. Pooling of multiple PCR amplicons, sample preparation, and interlane variability did not influence sample sequence data significantly. This systematic analysis underlines the importance of optimizing template concentration in order to minimize variability in microbial-community surveys and indicates that the practice of pooling multiple PCR amplicons prior to sequencing contributes proportionally less to reducing bias in 16S rRNA gene surveys with next-generation sequencing.


Subjects
Bacteria/classification , Bacteria/genetics , Diagnostic Errors , High-Throughput Nucleotide Sequencing/standards , 16S Ribosomal RNA/genetics , Bias , Ribosomal DNA/chemistry , Ribosomal DNA/genetics , Feces/microbiology , rRNA Genes , High-Throughput Nucleotide Sequencing/methods , Reproducibility of Results , Soil Microbiology
11.
Nucleic Acids Res ; 40(15): 7104-12, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22618875

ABSTRACT

Using profiles of phylogenetic profiles (P-cubic) we compared the evolutionary dynamics of different kinds of functional associations. Ordered from most to least evolutionarily stable, these associations were genes in the same operons, genes whose products participate in the same biochemical pathway, genes coding for physically interacting proteins and genes in the same regulons. Regulons showed the most plastic functional interactions with evolutionary stabilities barely better than those of unrelated genes. Further regulon analyses showed that global regulators contain less evolutionarily stable associations than local regulators. Genes co-repressed by global regulators had a higher evolutionary conservation than genes co-activated by global regulators. However, the reverse was true for genes co-repressed and co-activated by local regulators. Of all the regulon-related associations, the relationship between regulators and their target genes showed the most evolutionary stability. Different negative data sets built to contrast against each of the analysed kinds of modules also differed in evolutionary conservation revealing further underlying genome organization. Applying P-cubic analyses to other genomes might help visualize genome organization, understand the evolutionary importance and plasticity of functional associations and compare the quality of data sets expected to reflect functional interactions, such as those coming from high-throughput experiments.


Subjects
Escherichia coli K12/genetics , Molecular Evolution , Regulon , Escherichia coli Proteins/genetics , Bacterial Genes , Operon , Phylogeny , Genetic Transcription
12.
PLoS Genet ; 7(11): e1002377, 2011 Nov.
Article in English | MEDLINE | ID: mdl-22125496

ABSTRACT

As the interface between a microbe and its environment, the bacterial cell envelope has broad biological and clinical significance. While numerous biosynthesis genes and pathways have been identified and studied in isolation, how these intersect functionally to ensure envelope integrity during adaptive responses to environmental challenge remains unclear. To this end, we performed high-density synthetic genetic screens to generate quantitative functional association maps encompassing virtually the entire cell envelope biosynthetic machinery of Escherichia coli under both auxotrophic (rich medium) and prototrophic (minimal medium) culture conditions. The differential patterns of genetic interactions detected among > 235,000 digenic mutant combinations tested reveal unexpected condition-specific functional crosstalk and genetic backup mechanisms that ensure stress-resistant envelope assembly and maintenance. These networks also provide insights into the global systems connectivity and dynamic functional reorganization of a universal bacterial structure that is both broadly conserved among eubacteria (including pathogens) and an important target.


Subjects
Cell Membrane/genetics , Genetic Epistasis/genetics , Escherichia coli/genetics , Escherichia coli/metabolism , Membrane Proteins/genetics , Microtubule-Associated Proteins/genetics , Culture Media , Drug Resistance/genetics , Escherichia coli/growth & development , Bacterial Gene Expression Regulation , Gene-Environment Interaction , Membrane Proteins/metabolism , Metabolic Networks and Pathways/genetics , Electron Microscopy , Microtubule-Associated Proteins/metabolism , Molecular Sequence Annotation , Oligonucleotide Array Sequence Analysis
13.
Sci Rep ; 14(1): 9155, 2024 Apr 21.
Article in English | MEDLINE | ID: mdl-38644393

ABSTRACT

Deep learning models (DLMs) have gained importance in predicting, detecting, translating, and classifying a diversity of inputs. In bioinformatics, DLMs have been used to predict protein structures, transcription factor-binding sites, and promoters. In this work, we propose a hybrid model to identify transcription factors (TFs) among prokaryotic and eukaryotic protein sequences, named Deep Regulation (DeepReg) model. Two architectures were used in the DL model: a convolutional neural network (CNN), and a bidirectional long-short-term memory (BiLSTM). DeepReg reached a precision of 0.99, a recall of 0.97, and an F1-score of 0.98. The quality of our predictions, the bias-variance trade-off approach, and the characterization of new TF predictions were evaluated and compared against those produced by DeepTFactor, as well as against experimental data from three model organisms. Predictions based on our DLM tended to exhibit less variance and bias than those from DeepTFactor, thus increasing reliability and decreasing overfitting.


Subjects
Deep Learning , Transcription Factors , Transcription Factors/genetics , Transcription Factors/metabolism , Computational Biology/methods , Prokaryotic Cells/metabolism , Computer Neural Networks , Eukaryota/genetics , Genome , Eukaryotic Cells/metabolism , Binding Sites
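
The three metrics reported for DeepReg in the abstract above are linked by the standard definition of the F1-score as the harmonic mean of precision and recall. A minimal sketch of that arithmetic, using only the two values quoted in the abstract (the function name `f1_score` is chosen here for illustration):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract.
print(round(f1_score(0.99, 0.97), 2))  # 0.98
```

The result, rounded to two decimals, agrees with the F1-score of 0.98 stated in the abstract.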
14.
Nucleic Acids Res ; 39(5): 1732-8, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21051341

ABSTRACT

Almost 50 years following the discovery of the prokaryotic operon, the functional relevance of gene order within operons remains unclear. In this work, we take advantage of the eroded genome of Mycobacterium leprae to add evidence supporting the notion that functionally less important genes have a tendency to be located at the end of its operons. M. leprae's genome includes 1133 pseudogenes and 1614 protein-coding genes and can be compared with the close genome of M. tuberculosis. Assuming M. leprae's pseudogenes to represent dispensable genes, we have studied the position of these pseudogenes in the operons of M. leprae and of their orthologs in M. tuberculosis. We observed that both tend to be located in the 3' (downstream) half of the operon (P-values of 0.03 and 0.18, respectively). Analysis of pseudogenes in all available prokaryotic genomes confirms this trend (P-value of 7.1 × 10⁻⁷). In a complementary analysis, we found a significant tendency for essential genes to be located at the 5' (upstream) half of the operon (P-value of 0.006). Our work provides an indication that, in prokarya, functionally less important genes have a tendency to be located at the end of operons, while more relevant genes tend to be located toward operon starts.


Subjects
Mycobacterium leprae/genetics , Operon , Pseudogenes , Gene Order , Bacterial Genes , Genomics
15.
PLoS One ; 18(9): e0291492, 2023.
Article in English | MEDLINE | ID: mdl-37708115

ABSTRACT

Average Nucleotide Identity (ANI) is becoming a standard measure for bacterial species delimitation. However, its calculation can take orders of magnitude longer than similarity estimates based on sampling of short nucleotides, compiled into so-called sketches. These estimates are widely used. However, their variable correlation with ANI has suggested that they might not be as accurate. For a where-the-rubber-meets-the-road assessment, we compared two sketching programs, mash and dashing, against ANI, in delimiting species among Enterobacterales genomes. Receiver Operating Characteristic (ROC) analysis found Area Under the Curve (AUC) values of 0.99, almost perfect species discrimination for all three measures. Subsampling to avoid over-represented species reduced these AUC values to 0.92, still highly accurate. Focused tests with ten genera, each represented by more than three species, also showed almost identical results for all methods. Shigella showed the lowest AUC values (0.68), followed by Citrobacter (0.80). All other genera, Dickeya, Enterobacter, Escherichia, Klebsiella, Pectobacterium, Proteus, Providencia and Yersinia, produced AUC values above 0.90. The species delimitation thresholds varied, with species distance ranges in a few genera overlapping the genus ranges of other genera. Mash was able to separate the E. coli + Shigella complex into 25 apparent phylogroups, four of them corresponding, roughly, to the four Shigella species represented in the data. Our results suggest that fast estimates of genome similarity are as good as ANI for species delimitation. Therefore, these estimates might suffice for covering the role of genomic similarity in bacterial taxonomy, and should increase confidence in their use for efficient bacterial identification and clustering, from epidemiological to genome-based detection of potential contaminants in farming and industry settings.


Subjects
Escherichia coli , Gammaproteobacteria , Animals , Dickeya , Genomics , Agriculture
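
The AUC values quoted in the abstract above summarize how well a genome similarity score separates same-species pairs from different-species pairs. A minimal sketch of the statistic, assuming each genome pair carries a similarity score and a label (1 for same species, 0 otherwise). The function name `roc_auc` and the toy scores are invented for illustration, and the pairwise loop is written for clarity, not speed.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive pair outscores a negative pair.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy similarity scores for five genome pairs (not data from the study).
scores = [0.99, 0.97, 0.96, 0.90, 0.80]
labels = [1, 1, 0, 1, 0]
print(roc_auc(scores, labels))
```

For a distance such as the one Mash reports, the same function can be applied to the negated distance, since only the ordering of the scores matters.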
16.
Heliyon ; 9(3): e13955, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36938451

ABSTRACT

Cytokinin is a major phytohormone that has been used in agriculture as a plant-growth stimulating compound since its initial discovery in the 1960s. Isopentenyl transferase (IPT) is a rate-limiting enzyme for cytokinin biosynthesis, which is produced by plants as well as bacteria including both plant pathogenic species and plant growth-promoting bacteria (PGPB). It has been hypothesized that there may be differences in IPT function between plant pathogens and PGPB. However, a comprehensive comparison of IPT genes between plant pathogenic and PGPB species has not been performed. Here, we performed a global comparison of IPT genes across bacteria, analyzing their DNA sequences, codon usage, phyletic distribution, promoter structure and genomic context. We found that adenylate type IPT genes are highly specific to plant-associated bacteria and subdivide into two major clades: clade A, largely composed of proteobacterial plant pathogens; and clade B, largely composed of actinomycete PGPB species. Besides these phylogenetic differences, we identified several genomic features that suggest differences in IPT regulation between pathogens and PGPB. Pathogen-associated IPTs tended to occur in predicted virulence loci, whereas PGPB-associated IPTs tended to co-occur with other genes involved in cytokinin metabolism and degradation. Pathogen-associated IPTs also showed elevated gene copy numbers, significant deviation in codon usage patterns, and extended promoters, suggesting differences in regulation and activity levels. Our results are consistent with the hypothesis that differences in IPT regulation and activity exist between plant pathogens and PGPB, which determine their effect on plant host phenotypes through the control of cytokinin levels.

17.
Microb Physiol ; 33(1): 49-62, 2023.
Article in English | MEDLINE | ID: mdl-37321192

ABSTRACT

Members of the Piezo family of mechanically activated cation channels are involved in multiple physiological processes in higher eukaryotes, including vascular development, cell differentiation, touch perception, hearing, and more, but they are also common in single-celled eukaryotic microorganisms. Mutations in these proteins in humans are associated with a variety of diseases, such as colorectal adenomatous polyposis, dehydrated hereditary stomatocytosis, and hereditary xerocytosis. Available 3D structures for Piezo proteins show nine regions of four transmembrane segments each that have the same fold. Despite the remarkable similarity among the nine characteristic structural repeats in the family, no significant sequence similarity among them has been reported. Using bioinformatics approaches and the Transporter Classification Database (TCDB) as reference, we reliably identified sequence similarity among repeats based on four lines of evidence: (1) hidden Markov model-profile similarities across repeats at the family level, (2) pairwise sequence similarities between different repeats across Piezo homologs, (3) Piezo-specific conserved sequence signatures that consistently identify the same regions across repeats, and (4) conserved residues that maintain the same orientation and location in 3D space.


Subjects
Bacterial Toxins , Clostridioides difficile , Humans , Clostridioides difficile/metabolism , Ion Channels/genetics , Ion Channels/chemistry , Ion Channels/metabolism , Mutation , Conserved Sequence
18.
PLoS Biol ; 7(4): e96, 2009 Apr 28.
Article in English | MEDLINE | ID: mdl-19402753

ABSTRACT

One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans' biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a "systems-wide" functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.


Subjects
Escherichia coli Proteins/genetics , Escherichia coli/genetics , Bacterial Genome , Proteome/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/metabolism , Multiprotein Complexes/genetics , Protein Interaction Mapping/methods
19.
PeerJ ; 10: e13784, 2022.
Article in English | MEDLINE | ID: mdl-35891643

ABSTRACT

Bacteria of the genus Klebsiella are among the most important multi-drug resistant human pathogens, though they have been isolated from a variety of environments. The importance and ubiquity of these organisms call for quick and accurate methods for their classification. Average Nucleotide Identity (ANI) is becoming a standard for species delimitation based on whole genome sequence comparison. However, much faster genome comparison tools have been appearing in the literature. In this study we tested the quality of different approaches for genome-based species delineation against ANI. To this end, we compared 1,189 Klebsiella genomes using measures calculated with Mash, Dashing, and DNA compositional signatures, all of which run in a fraction of the time required to obtain ANI. Receiver Operating Characteristic (ROC) curve analyses showed equal quality in species discrimination for ANI, Mash and Dashing, with Area Under the Curve (AUC) values above 0.99, followed by DNA signatures (AUC: 0.96). Accordingly, groups obtained at optimized cutoffs largely agree with species designation, with ANI, Mash and Dashing producing 15 species-level groups. DNA signatures broke the dataset into more than 30 groups. Testing Mash to map species after adding draft genomes to the dataset also showed excellent results (AUC above 0.99), producing a total of 26 Klebsiella species-level groups. The ecological niches of Klebsiella strains were found to neither be related to species delimitation, nor to protein functional content, suggesting that a single Klebsiella species can have a wide repertoire of ecological functions.


Subjects
Bacterial Genome , Klebsiella , Humans , Klebsiella/genetics , Bacterial Genome/genetics , Bacteria , DNA
20.
PeerJ ; 10: e13843, 2022.
Article in English | MEDLINE | ID: mdl-36065404

ABSTRACT

Orthologs separate after lineages split from each other and paralogs after gene duplications. Thus, orthologs are expected to remain more functionally coherent across lineages, while paralogs have been proposed as a source of new functions. Because protein functional divergence follows from non-synonymous substitutions, we performed an analysis based on the ratio of non-synonymous to synonymous substitutions (dN/dS), as a proxy for functional divergence. We used five working definitions of orthology, including reciprocal best hits (RBH), among other definitions based on network analyses and clustering. The results showed that orthologs, by all definitions tested, had values of dN/dS noticeably lower than those of paralogs, suggesting that orthologs generally tend to be more functionally stable than paralogs. The differences in dN/dS ratios, and thus the suggested functional stability of orthologs, remained after eliminating gene comparisons with potential problems, such as genes with high codon usage biases, low coverage of either of the aligned sequences, or sequences with very high similarities. Separation by percent identity of the encoded proteins showed that the differences between the dN/dS ratios of orthologs and paralogs were more evident at high sequence identity, less so as identity dropped. These last results suggest that the differences between dN/dS ratios were partially related to differences in protein identity. However, they also suggest that paralogs undergo functional divergence relatively early after duplication. Our analyses indicate that choosing orthologs as probably functionally coherent remains the right approach in comparative genomics.


Subjects
Genomics , Proteins , Genomics/methods , Gene Duplication