ABSTRACT
This paper discusses the properties of proteins and their relations in the interactomes of selected subsets of the SARS-CoV-2 proteome: the membrane protein, the nonstructural proteins, and, finally, the full proteome. Protein disorder according to several measures, liquid-liquid phase separation probabilities, and protein node degrees in the interaction networks were singled out as the features of interest. Additionally, the viral interactomes were combined with the interactome of human lung tissue to examine whether the new connections in the resulting virus-host interactome are linked to protein disorder. Correlation analysis shows no clear relationship between the raw features of interest, whereas there is a positive correlation between a protein's disorder and the mean disorder of its network neighborhood. There are also indications that highly connected viral hubs tend to be, on average, more ordered than proteins with few connections. This contrasts with previous similar studies of eukaryotic interactomes and possibly raises new questions for research on viral interactomes.
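The neighborhood and hub comparisons can be illustrated with a minimal Python sketch. It assumes one disorder score in [0, 1] per protein (for example, a predicted disordered fraction) and an undirected list of interaction pairs. The function names, the toy proteins A to E, the use of the Pearson coefficient, and the hub cutoff `hub_min_degree` are illustrative assumptions, not the measures or thresholds used in the study.

```python
from math import sqrt
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def build_neighbors(edges: List[Edge], disorder: Dict[str, float]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; every edge endpoint must have a disorder score."""
    neighbors: Dict[str, Set[str]] = {p: set() for p in disorder}
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def neighborhood_mean_disorder(neighbors: Dict[str, Set[str]], disorder: Dict[str, float]) -> Dict[str, float]:
    """Mean disorder of each protein's direct partners (isolated proteins are skipped)."""
    return {
        p: sum(disorder[q] for q in nbrs) / len(nbrs)
        for p, nbrs in neighbors.items()
        if nbrs
    }


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_disorder_hubs_vs_rest(neighbors: Dict[str, Set[str]], disorder: Dict[str, float],
                               hub_min_degree: int) -> Tuple[float, float]:
    """Average disorder of hubs (degree >= hub_min_degree) and of all other proteins."""
    hubs = [disorder[p] for p, n in neighbors.items() if len(n) >= hub_min_degree]
    rest = [disorder[p] for p, n in neighbors.items() if len(n) < hub_min_degree]
    return sum(hubs) / len(hubs), sum(rest) / len(rest)


# Toy data (hypothetical proteins and scores, for illustration only).
disorder = {"A": 0.10, "B": 0.15, "C": 0.60, "D": 0.70, "E": 0.20}
edges = [("A", "B"), ("A", "E"), ("B", "E"), ("C", "D")]

neighbors = build_neighbors(edges, disorder)
nmd = neighborhood_mean_disorder(neighbors, disorder)
proteins = sorted(nmd)
r = pearson([disorder[p] for p in proteins], [nmd[p] for p in proteins])
hub_mean, rest_mean = mean_disorder_hubs_vs_rest(neighbors, disorder, hub_min_degree=2)
print(f"r = {r:.3f}, hub mean = {hub_mean:.3f}, other mean = {rest_mean:.3f}")
```

In this toy network the ordered proteins interact with each other, so a protein's disorder correlates positively with its neighborhood mean, and the better-connected proteins happen to be the more ordered ones. A rank-based coefficient such as Spearman's could be swapped in for `pearson` if the disorder scores are far from normally distributed.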
ABSTRACT
BACKGROUND: Over the last decade and a half, it has been firmly established that a large number of proteins do not adopt a well-defined (ordered) structure under physiological conditions. Such intrinsically disordered proteins (IDPs) and intrinsically disordered (protein) regions (IDRs) are involved in essential cell processes through two basic mechanisms: the entropic chain mechanism, which is responsible for rapid fluctuations among many alternative conformations, and molecular recognition via short recognition elements that bind to other molecules. IDPs possess a high adaptive potential, and there is special interest in investigating their involvement in the evolution of organisms. RESULTS: We analyzed 2554 Bacterial and 139 Archaeal proteomes, comprising a total of 8,455,194 proteins, for disorder content and its implications for the adaptation of organisms, using three disorder predictors and three measures. Among other findings, we revealed that for all three predictors and all three measures (1) Bacteria exhibit significantly more disorder than Archaea; (2) plasmid-encoded proteins contain considerably more IDRs than proteins encoded on chromosomes (or whole genomes) in both prokaryotic superkingdoms; (3) plasmid proteins are significantly more disordered than chromosomal proteins only in the group of proteins with no COG category assigned; (4) antitoxin proteins are the most disordered, with almost double the disorder of other proteins, in both Bacterial and Archaeal proteomes; (5) plasmid proteins are more disordered than chromosomal proteins among Bacterial antitoxins and toxin-unclassified proteins, but have almost the same disorder content among toxin proteins. CONCLUSION: Our results suggest that while disorder content depends on genome and proteome characteristics, it is more influenced by functional engagement than by gene location (on a chromosome or a plasmid).
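The three measures are not named in this abstract. Purely as an illustration, the sketch below computes three commonly used proteome-level measures from per-residue binary predictor output: the fraction of disordered residues, the fraction of proteins with at least one long disordered region, and the mean per-protein disordered fraction. The 30-residue cutoff `LONG_IDR_MIN`, the function names, and the toy proteome are assumptions, not necessarily the measures used in the study.

```python
from typing import Dict, List

# Assumed cutoff: a "long" disordered region is >= 30 consecutive disordered residues.
LONG_IDR_MIN = 30


def longest_disordered_run(mask: List[bool]) -> int:
    """Length of the longest stretch of consecutive residues predicted as disordered."""
    best = current = 0
    for is_disordered in mask:
        current = current + 1 if is_disordered else 0
        best = max(best, current)
    return best


def disorder_measures(proteome: Dict[str, List[bool]]) -> Dict[str, float]:
    """Three illustrative proteome-level disorder measures from per-residue predictions."""
    masks = [m for m in proteome.values() if m]  # skip empty sequences
    total_residues = sum(len(m) for m in masks)
    disordered_residues = sum(sum(m) for m in masks)
    with_long_idr = sum(1 for m in masks if longest_disordered_run(m) >= LONG_IDR_MIN)
    per_protein = [sum(m) / len(m) for m in masks]
    return {
        "residue_fraction": disordered_residues / total_residues,
        "long_idr_protein_fraction": with_long_idr / len(masks),
        "mean_protein_fraction": sum(per_protein) / len(per_protein),
    }


# Toy proteome: True marks a residue predicted as disordered (hypothetical data).
proteome = {
    "prot1": [True] * 40 + [False] * 60,  # one 40-residue disordered stretch
    "prot2": [True] * 10 + [False] * 40,  # only a short disordered stretch
}
measures = disorder_measures(proteome)
print(measures)
```

Computing the same measures separately for, say, the plasmid-encoded and the chromosomal proteins of an organism gives the kind of side-by-side comparison summarized under RESULTS.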
Subjects
Archaea/genetics , Archaeal Proteins/chemistry , Bacteria/genetics , Bacterial Proteins/chemistry , Intrinsically Disordered Proteins/chemistry , Plasmids/metabolism , Chromosomes, Archaeal/metabolism , Chromosomes, Bacterial/metabolism , Proteome/metabolism , Toxins, Biological/chemistry
ABSTRACT
BACKGROUND: A significant number of proteins have been shown to be intrinsically disordered, meaning that they lack a fixed 3D structure or contain regions that do not possess a well-defined 3D structure. It has also been shown that a protein's disorder content is related to its function. We performed an exhaustive analysis and comparison of the disorder content of proteins from prokaryotic organisms (i.e., the superkingdoms Archaea and Bacteria) with respect to the functional categories they belong to, i.e., Clusters of Orthologous Groups of proteins (COGs) and groups of COGs: Cellular processes (Cp), Information storage and processing (Isp), Metabolism (Me), and Poorly characterized (Pc). We also analyzed the disorder content of proteins with respect to various genomic, metabolic, and ecological characteristics of the organisms they belong to. We used correlations and association rule mining to identify the most confident associations between specific modalities of the characteristics considered and disorder content. RESULTS: Bacteria are shown to have a somewhat higher level of protein disorder than archaea, except for proteins in the Me functional group. The Isp and Cp functional groups (in particular, COG L, repair function, and COG N, cell motility and secretion) possess the highest disorder content, while Me proteins in general possess the lowest. Disorder fractions are confirmed to be lowest for the so-called order-promoting amino acids and highest for the so-called disorder-promoting amino acids. For each pair of organism characteristics, the specific modalities whose organisms have the most disordered proteins are identified, e.g., high genome size-high GC content organisms, facultative anaerobic-low GC content organisms, aerobic-high genome size organisms, etc.
Maximum disorder in archaea is observed for high GC content-low genome size organisms, high GC content-facultative anaerobic or aquatic or mesophilic organisms, etc. Maximum disorder in bacteria is observed for high GC content-high genome size organisms, high genome size-aerobic organisms, etc. Some of the most reliable association rules mined establish relationships between high GC content and high protein disorder, medium GC content and both medium and low protein disorder, anaerobic organisms and medium protein disorder, Gammaproteobacteria and low protein disorder, etc. A website, Prokaryote Disorder Database, has been designed and implemented at http://bioinfo.matf.bg.ac.rs/disorder; it contains the complete results of the protein disorder analysis performed for 296 completely sequenced prokaryotic genomes. CONCLUSIONS: An exhaustive disorder analysis has been performed by functional classes of proteins, for a larger dataset of prokaryotic organisms than in previous studies. The results correlate well with previously published ones, with some extension in the range of disorder levels and a clear distinction between functional classes of proteins. A wide correlation and association analysis between protein disorder and genomic and ecological characteristics has been performed for the first time. The results give insight into the multiple relationships among these characteristics and protein disorder. Such analysis provides a better understanding of the evolutionary process and may be useful for taxon determination. The main drawback of the approach is that the disorder considered was predicted rather than experimentally established.
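The abstract does not state which mining algorithm or thresholds were used. The minimal sketch below only illustrates the standard support and confidence definitions behind association rules, by exhaustively enumerating rules whose antecedent is a set of organism modalities and whose consequent is a disorder level. The item labels (e.g. `GC=high`), thresholds, function names, and toy organisms are hypothetical; a real analysis over hundreds of organisms would typically use an algorithm such as Apriori.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions (organisms) that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimated P(consequent | antecedent) = support(A u C) / support(A)."""
    s_a = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / s_a if s_a else 0.0


def rules_to_disorder(transactions: List[FrozenSet[str]], min_support: float,
                      min_confidence: float, max_len: int = 2) -> List[Rule]:
    """Brute-force rules {organism modalities} -> {disorder level}, best confidence first."""
    items = {i for t in transactions for i in t}
    levels = sorted(i for i in items if i.startswith("disorder="))
    traits = sorted(i for i in items if not i.startswith("disorder="))
    rules: List[Rule] = []
    for k in range(1, max_len + 1):
        for combo in combinations(traits, k):
            antecedent = frozenset(combo)
            for level in levels:
                consequent = frozenset([level])
                s = support(antecedent | consequent, transactions)
                if s < min_support:
                    continue
                c = confidence(antecedent, consequent, transactions)
                if c >= min_confidence:
                    rules.append((antecedent, consequent, s, c))
    return sorted(rules, key=lambda rule: -rule[3])


# Toy organisms described by discretized modalities (hypothetical data).
organisms = [
    frozenset({"GC=high", "genome=high", "disorder=high"}),
    frozenset({"GC=high", "genome=low", "disorder=high"}),
    frozenset({"GC=medium", "genome=high", "disorder=medium"}),
    frozenset({"GC=medium", "genome=low", "disorder=low"}),
    frozenset({"GC=high", "genome=high", "disorder=high"}),
]
rules = rules_to_disorder(organisms, min_support=0.4, min_confidence=0.8)
for a, c, s, conf in rules:
    print(sorted(a), "->", sorted(c), f"support={s:.2f} confidence={conf:.2f}")
```

Confidence alone favors rules whose consequent is frequent anyway, which is why a minimum support threshold is applied first; lift is a common additional filter.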
Subjects
Archaeal Proteins/analysis , Bacterial Proteins/analysis , Computational Biology/methods , Amino Acids/analysis , Archaea/genetics , Archaea/metabolism , Archaeal Proteins/chemistry , Bacteria/genetics , Bacteria/metabolism , Bacterial Proteins/chemistry , Base Composition , Cluster Analysis , Databases, Protein , Genomics/methods , Internet , Protein Conformation , Proteome/analysis
ABSTRACT
The correlation between molecular function and protein intrinsic disorder is an important aspect of understanding the relationship between function, sequence, and structure. This research was inspired by the statistical correlation evaluation method described by Xie et al. (J Proteome Res 6 (2007) 1882-1898, the reference study), in which the authors analyzed the relationship between the structure and function of proteins from the Swiss-Prot database, with functions described by Swiss-Prot function keywords. In this research, we investigated whether the conclusions of the reference study hold for another dataset with richer functional annotation. We used the CAFA3 challenge training dataset, in which function is described with terms from the Gene Ontology (GO terms). To compare the results with the previous work, we associated the GO terms with the corresponding Swiss-Prot function keywords. The results were compared with the reference study by first repeating the analysis with Swiss-Prot function keywords and then with GO terms. We used the PONDR VSL2b disorder predictor to label over 66,000 CAFA3 proteins as putatively disordered or ordered. Out of 186 Swiss-Prot keywords (of the molecular function type) with more than 20 annotated proteins, we found 47 to be highly order-related and 44 highly disorder-related. Using the same dataset and annotation constraints, out of 1781 GO terms (of the molecular function type), we found 746 to be highly order-related and 564 highly disorder-related. GO term results are presented as interactive graphs displaying the complex hierarchical structure of the Gene Ontology. Comparison of the two functional annotations, GO and Swiss-Prot keywords, showed consistent results in cases where it was possible to map a Swiss-Prot keyword to a corresponding GO term.
Because such cases are few, we propose a new method for deriving the missing mappings between Swiss-Prot keywords and GO terms with the highest likelihood, by measuring the similarity (Jaccard index) between the sets of proteins annotated with different functions. Comparison with the results of the reference study revealed a prevalence of binding-related (disorder-related) functions in the current dataset, even though the same functions were not present in the previous results.
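A minimal sketch of a Jaccard-based mapping, assuming each Swiss-Prot keyword and each GO term is represented by the set of protein accessions annotated with it. For each keyword it picks the GO term with the highest Jaccard index J(A, B) = |A ∩ B| / |A ∪ B|, which is one plausible reading of "highest likelihood" rather than the authors' exact procedure. The function names, the optional `min_score` cutoff, and the toy accessions P1 to P6 are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_go_match(
    keyword_proteins: Dict[str, Set[str]],
    go_proteins: Dict[str, Set[str]],
    min_score: float = 0.0,
) -> Dict[str, Tuple[Optional[str], float]]:
    """For each keyword, the GO term whose annotated protein set is most similar.

    Keywords with no GO term scoring above min_score map to (None, 0.0).
    """
    mapping: Dict[str, Tuple[Optional[str], float]] = {}
    for keyword, kw_set in keyword_proteins.items():
        best_term: Optional[str] = None
        best_score = min_score
        for term, go_set in go_proteins.items():
            score = jaccard(kw_set, go_set)
            if score > best_score:
                best_term, best_score = term, score
        mapping[keyword] = (best_term, best_score if best_term is not None else 0.0)
    return mapping


# Toy annotations (hypothetical protein accessions).
keyword_proteins = {
    "DNA-binding": {"P1", "P2", "P3"},
    "Kinase": {"P4", "P5"},
}
go_proteins = {
    "GO:0003677": {"P1", "P2", "P3", "P6"},  # DNA binding
    "GO:0016301": {"P4", "P5"},              # kinase activity
}
mapping = best_go_match(keyword_proteins, go_proteins)
print(mapping)
```

Raising `min_score` trades coverage for reliability: fewer keywords receive a GO term, but the accepted mappings share a larger proportion of their annotated proteins.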
Subjects
Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Databases, Protein , Molecular Sequence Annotation , Protein Conformation , Protein Unfolding , Sequence Analysis, Protein
ABSTRACT
A bioinformatics analysis of the disorder content of proteins from the DisProt database has been performed with respect to the position of disordered residues. Each protein chain was divided into three parts: the N- and C-terminal parts, each containing 30 amino acid (AA) residues, and the middle region containing the remaining AA residues. The results show that the terminal parts hold a higher percentage of the disordered AA residues than of all AA residues (17% of disordered AA residues versus 11% of all AA residues). We analyzed the percentage of disorder for each of the 20 AAs in the three parts of proteins with respect to their hydropathy and molecular weight. For each AA, the percentage of disorder in the middle part is lower than in the terminal parts, where it is comparable at the two termini. A new scale of AAs has been introduced according to their disorder content in the middle part of proteins: CIFWMLYHRNVTAGQDSKEP. All large hydrophobic AAs are less frequently disordered, while almost all small hydrophilic AAs are more frequently disordered. The results obtained may be useful for constructing and improving protein disorder predictors.
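A minimal sketch of the positional split and of how a middle-region scale can be derived, assuming each protein is given as a sequence string with a per-residue boolean disorder mask. The function names, the handling of chains shorter than 60 residues, and the toy chain are assumptions; the scale is assumed here to list amino acids from least to most disordered, which matches C (order-promoting) opening and P (disorder-promoting) closing the scale above.

```python
from collections import Counter
from typing import Dict, List, Tuple

TERMINAL_LEN = 30  # residues assigned to each terminal part


def region_of(i: int, n: int, k: int = TERMINAL_LEN) -> str:
    """Region of residue i in a chain of length n.

    Assumption: in chains shorter than 2k the N-terminal part takes precedence,
    so such chains have no middle region.
    """
    if i < k:
        return "N"
    if i >= n - k:
        return "C"
    return "middle"


def disorder_percent_by_region(proteins: List[Tuple[str, List[bool]]]) -> Dict[Tuple[str, str], float]:
    """Percentage of disordered residues for every (region, amino acid) pair."""
    totals: Counter = Counter()
    disordered: Counter = Counter()
    for seq, mask in proteins:
        for i, (aa, is_dis) in enumerate(zip(seq, mask)):
            key = (region_of(i, len(seq)), aa)
            totals[key] += 1
            disordered[key] += int(is_dis)
    return {key: 100.0 * disordered[key] / totals[key] for key in totals}


def middle_scale(percent: Dict[Tuple[str, str], float]) -> str:
    """Amino acids ordered from least to most disordered in the middle region."""
    middle = {aa: p for (region, aa), p in percent.items() if region == "middle"}
    return "".join(sorted(middle, key=middle.get))


# Toy chain of 64 residues: 30 N-terminal, 4 middle (CCPP), 30 C-terminal (hypothetical).
seq = "A" * 30 + "CCPP" + "A" * 30
mask = [True] * 30 + [False, False, True, True] + [True] * 30
percent = disorder_percent_by_region([(seq, mask)])
print(middle_scale(percent))  # prints: CP
```

Applied to a large annotated set, the same per-(region, AA) percentages also allow the terminal and middle disorder of each amino acid to be compared directly.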