Search | Portal Regional da BVS

1.

Predicting Protein-Protein Interactions via Random Ferns with Evolutionary Matrix Representation.

Li, Yang; Wang, Zheng; You, Zhu-Hong; Li, Li-Ping; Hu, Xuegang.

Comput Math Methods Med ; 2022: 7191684, 2022.

Article in English | MEDLINE | ID: mdl-35242211

ABSTRACT

Protein-protein interactions (PPIs) play a crucial role in understanding disease pathogenesis, genetic mechanisms, drug design, and other biochemical processes; the identification of PPIs is therefore of great importance. With the rapid development of high-throughput sequencing technology, a large amount of PPI sequence data has accumulated. Researchers have designed many experimental methods to detect PPIs using these sequence data, and the prediction of PPIs has become a research hotspot in proteomics. However, because traditional experimental methods are both time-consuming and costly, it is difficult to analyze and predict the massive amount of PPI data quickly and accurately. To address these issues, many computational systems based on machine learning have been applied to PPI prediction, improving the overall recognition rate. In this paper, a novel and efficient computational method is presented to implement a protein interaction prediction system using only protein sequence information. First, the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST) was employed to generate a position-specific scoring matrix (PSSM) containing protein evolutionary information from the initial protein sequence. Second, we used a novel feature representation scheme, MatFLDA, to extract the essential information of the PSSM for protein sequences and obtained five training and five testing datasets by adopting a five-fold cross-validation method. Finally, the random ferns (RFs) classifier was employed to infer the interactions among proteins, and a model called MatFLDA_RFs was developed. The proposed MatFLDA_RFs model achieved good prediction performance, with 95.03% average accuracy on the Yeast dataset and 85.35% average accuracy on the H. pylori dataset, outperforming other existing computational methods. The experimental results indicate that the proposed method yields better PPI prediction results, providing an effective tool for the detection of new PPIs and the in-depth study of proteomics. Finally, we also developed a web server for the proposed model to predict protein-protein interactions, which is freely accessible online at http://120.77.11.78:5001/webserver/MatFLDA_RFs.
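
The five-fold cross-validation step mentioned in this abstract can be illustrated with a minimal sketch. The function name `k_fold_splits`, the seed, and the toy sample count are assumptions made for illustration; the abstract does not say how the authors shuffled or partitioned their data.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Hypothetical toy example: 23 protein pairs split into five folds.
for fold_no, (train_idx, test_idx) in enumerate(k_fold_splits(23, k=5), start=1):
    print(fold_no, len(train_idx), len(test_idx))
```

Each sample appears in exactly one test fold, which is what yields the "five training and five testing datasets" described above.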

Subjects

Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Amino Acid Sequence , Bacterial Proteins/genetics , Computational Biology , Databases, Protein/statistics & numerical data , Discriminant Analysis , Evolution, Molecular , Helicobacter pylori/genetics , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Machine Learning , Position-Specific Scoring Matrices , Protein Interaction Mapping/statistics & numerical data , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , Support Vector Machine

2.

Identification of Helicobacter pylori Membrane Proteins Using Sequence-Based Features.

Liu, Mujiexin; Chen, Hui; Gao, Dong; Ma, Cai-Yi; Zhang, Zhao-Yue.

Comput Math Methods Med ; 2022: 7493834, 2022.

Article in English | MEDLINE | ID: mdl-35069791

ABSTRACT

Helicobacter pylori (H. pylori) is the most common risk factor for gastric cancer worldwide. The membrane proteins of H. pylori are involved in bacterial adherence and play a vital role in drug discovery. Thus, an accurate and cost-effective computational model is needed to predict the uncharacterized membrane proteins of H. pylori. In this study, a reliable benchmark dataset consisting of 114 membrane and 219 nonmembrane proteins was constructed based on UniProt. A support vector machine (SVM)-based model was developed for discriminating H. pylori membrane proteins from nonmembrane proteins using sequence information. Cross-validation showed that our method achieved good performance, with an accuracy of 91.29%. It is anticipated that the proposed model will be useful for the annotation of H. pylori membrane proteins and the development of new anti-H. pylori agents.
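
To put the reported accuracy in context, the short calculation below compares it with a majority-class baseline on the same class sizes. It assumes, which the abstract does not state explicitly, that the 91.29% figure was computed over all 333 benchmark proteins.

```python
n_membrane = 114
n_nonmembrane = 219
total = n_membrane + n_nonmembrane  # 333 proteins in the benchmark

# Accuracy of a trivial classifier that always predicts "nonmembrane".
majority_baseline = max(n_membrane, n_nonmembrane) / total

# Number of correct predictions implied by the reported accuracy,
# under the assumption that all 333 proteins were scored.
reported_accuracy = 0.9129
implied_correct = round(reported_accuracy * total)

print(f"majority baseline: {majority_baseline:.2%}")
print(f"implied correct predictions: {implied_correct} of {total}")
```

Because the classes are imbalanced, the baseline is already close to two thirds, so accuracy alone overstates the margin gained by the model.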

Subjects

Bacterial Proteins/genetics , Helicobacter pylori/genetics , Membrane Proteins/genetics , Amino Acid Sequence , Amino Acids/analysis , Bacterial Proteins/chemistry , Computational Biology , Databases, Protein/statistics & numerical data , Helicobacter pylori/chemistry , Helicobacter pylori/pathogenicity , Host Microbial Interactions , Humans , Membrane Proteins/chemistry , Support Vector Machine

3.

A paradigm shift in structural biology.

Subramaniam, Sriram; Kleywegt, Gerard J.

Nat Methods ; 19(1): 20-23, 2022 01.

Article in English | MEDLINE | ID: mdl-35017736

Subjects

Algorithms , Computational Biology/methods , Databases, Protein , Models, Molecular , Proteins/chemistry , Cell Cycle Proteins/chemistry , Cell Cycle Proteins/metabolism , Cryoelectron Microscopy , Databases, Protein/statistics & numerical data , Humans , Ligands , Protein Conformation , Protein Interaction Maps , Transcription Factors/chemistry , Transcription Factors/metabolism

4.

MemDis: Predicting Disordered Regions in Transmembrane Proteins.

Dobson, Laszlo; Tusnády, Gábor E.

Int J Mol Sci ; 22(22)2021 Nov 12.

Article in English | MEDLINE | ID: mdl-34830151

ABSTRACT

Transmembrane proteins (TMPs) play important roles in cells, ranging from transport processes and cell adhesion to communication. Many of these functions are mediated by intrinsically disordered regions (IDRs), flexible protein segments without a well-defined structure. Although a variety of methods are available for predicting IDRs, their accuracy is very limited on TMPs because of the special physico-chemical properties of these proteins. We prepared a dataset containing membrane proteins exclusively, using X-ray crystallography data. MemDis is a novel prediction method that uses convolutional neural networks and long short-term memory networks to predict disordered regions in TMPs. In addition to attributes commonly used in IDR predictors, we defined several TMP-specific features to further enhance the accuracy of our method. MemDis achieved the highest prediction accuracy on the TMP-specific dataset when compared with other popular IDR prediction methods.

Subjects

Computational Biology/methods , Intrinsically Disordered Proteins/chemistry , Membrane Proteins/chemistry , Neural Networks, Computer , Amino Acid Sequence , Data Mining/methods , Databases, Protein/statistics & numerical data , Internet , Models, Molecular , Protein Conformation , Reproducibility of Results

5.

GPRuler: Metabolic gene-protein-reaction rules automatic reconstruction.

Di Filippo, Marzia; Damiani, Chiara; Pescini, Dario.

PLoS Comput Biol ; 17(11): e1009550, 2021 11.

Article in English | MEDLINE | ID: mdl-34748537

ABSTRACT

Metabolic network models are increasingly being used in health care and industry. As a consequence, many tools have been released to automate their de novo reconstruction. To enable gene deletion simulations and the integration of gene expression data, these networks must include gene-protein-reaction (GPR) rules, which describe, using Boolean logic, the relationships between the gene products (e.g., enzyme isoforms or subunits) associated with the catalysis of a given reaction. Nevertheless, the reconstruction of GPRs remains a largely manual and time-consuming process. Aiming at fully automating the reconstruction of GPRs for any organism, we propose the open-source Python-based framework GPRuler. By mining text and data from 9 different biological databases, GPRuler can reconstruct GPRs starting either from just the name of the target organism or from an existing metabolic model. The performance of the developed tool is evaluated at small scale for a manually curated metabolic model, and at genome scale for three metabolic models of Homo sapiens and Saccharomyces cerevisiae. Using these models as benchmarks, the proposed tool showed its ability to reproduce the original GPR rules with a high level of accuracy. In all the tested scenarios, a manual investigation of the mismatches between the rules proposed by GPRuler and the original ones showed that the proposed approach was in many cases more accurate than the original models. By complementing existing tools for metabolic network reconstruction with the ability to reconstruct GPRs quickly and with few resources, GPRuler paves the way to the study of context-specific metabolic networks, which represent the active portion of the complete network in given conditions, for organisms of industrial or biomedical interest that have not yet been characterized metabolically.
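
The role of Boolean GPR rules in gene deletion simulations can be sketched as follows. This is an illustrative evaluator only: the nested-tuple rule encoding, the function name `reaction_active`, and the gene names are hypothetical and are not taken from GPRuler.

```python
def reaction_active(rule, present_genes):
    """Evaluate a GPR rule against the set of genes still present.

    A rule is either a gene name (str) or a tuple (op, operand, ...),
    where op is "and" (all subunits of a complex are required) or
    "or" (any one isoform is sufficient).
    """
    if isinstance(rule, str):
        return rule in present_genes
    op, *operands = rule
    results = [reaction_active(operand, present_genes) for operand in operands]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical rule: a two-subunit complex (geneA AND geneB) OR an isoform geneC.
rule = ("or", ("and", "geneA", "geneB"), "geneC")
all_genes = {"geneA", "geneB", "geneC"}

print(reaction_active(rule, all_genes))                       # wild type
print(reaction_active(rule, all_genes - {"geneA"}))           # single deletion
print(reaction_active(rule, all_genes - {"geneA", "geneC"}))  # double deletion
```

Deleting one subunit leaves the reaction available through the isoform, whereas removing both the subunit and the isoform switches it off, which is the behaviour a deletion simulation needs from the rules.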

Subjects

Metabolic Networks and Pathways/genetics , Models, Biological , Software , Computational Biology , Computer Simulation , Databases, Genetic/statistics & numerical data , Databases, Protein/statistics & numerical data , Humans , Models, Genetic , Molecular Sequence Annotation , Protein Interaction Maps/genetics , Protein Structure, Quaternary , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism

6.

A Deep Learning Approach for Predicting Antigenic Variation of Influenza A H3N2.

Xia, Yuan-Ling; Li, Weihua; Li, Yongping; Ji, Xing-Lai; Fu, Yun-Xin; Liu, Shu-Qun.

Comput Math Methods Med ; 2021: 9997669, 2021.

Article in English | MEDLINE | ID: mdl-34697557

ABSTRACT

Modeling antigenic variation in influenza (flu) virus A H3N2 using amino acid sequences is a promising approach for improving the prediction accuracy of immune efficacy of vaccines and increasing the efficiency of vaccine screening. Antigenic drift and antigenic jump/shift, which arise from the accumulation of mutations with small or moderate effects and from a major, abrupt change with large effects on the surface antigen hemagglutinin (HA), respectively, are two types of antigenic variation that facilitate immune evasion of flu virus A and make it challenging to predict the antigenic properties of new viral strains. Despite considerable progress in modeling antigenic variation based on the amino acid sequences, few studies focus on the deep learning framework which could be most suitable to be applied to this task. Here, we propose a novel deep learning approach that incorporates a convolutional neural network (CNN) and bidirectional long-short-term memory (BLSTM) neural network to predict antigenic variation. In this approach, CNN extracts the complex local contexts of amino acids while the BLSTM neural network captures the long-distance sequence information. When compared to the existing methods, our deep learning approach achieves the overall highest prediction performance on the validation dataset, and more encouragingly, it achieves prediction agreements of 99.20% and 96.46% for the strains in the forthcoming year and in the next two years included in an existing set of chronological amino acid sequences, respectively. These results indicate that our deep learning approach is promising to be applied to antigenic variation prediction of flu virus A H3N2.

Subjects

Antigenic Variation , Deep Learning , Influenza A Virus, H3N2 Subtype/genetics , Influenza A Virus, H3N2 Subtype/immunology , Influenza, Human/virology , Amino Acid Sequence , Antigens, Viral/genetics , Computational Biology , Databases, Protein/statistics & numerical data , Hemagglutinin Glycoproteins, Influenza Virus/genetics , Hemagglutinin Glycoproteins, Influenza Virus/immunology , Humans , Neural Networks, Computer

7.

iMPT-FDNPL: Identification of Membrane Protein Types with Functional Domains and a Natural Language Processing Approach.

Chen, Wei; Chen, Lei; Dai, Qi.

Comput Math Methods Med ; 2021: 7681497, 2021.

Article in English | MEDLINE | ID: mdl-34671418

ABSTRACT

Membrane proteins are an important class of proteins and play essential roles in several cellular processes. Based on their intramolecular arrangements and positions in a cell, membrane proteins can be divided into several types. The type of a membrane protein is reported to be highly related to its function. Determining membrane protein types has been a hot topic in recent years, and many computational methods have been proposed so far. Some of them used functional domain information to encode proteins; however, this procedure was still crude. In this study, we designed a novel feature extraction scheme to obtain informative features of proteins from their functional domain information. The scheme treats domains as words and proteins, represented by their domains, as sentences. The natural language processing approach word2vector was applied to obtain the features of domains, which were further refined into protein features. Based on these features, RAndom k-labELsets with random forest as the base classifier was employed to build the multilabel classifier, namely, iMPT-FDNPL. The tenfold cross-validation results indicated the good performance of this classifier. Furthermore, the classifier was superior to other classifiers based on features derived from functional domains via a one-hot scheme or derived from other properties of proteins, suggesting the effectiveness of the protein features generated by the proposed scheme.

Subjects

Membrane Proteins/chemistry , Membrane Proteins/classification , Natural Language Processing , Algorithms , Computational Biology , Databases, Protein/statistics & numerical data , Humans , Protein Domains , Support Vector Machine

8.

OGT Protein Interaction Network (OGT-PIN): A Curated Database of Experimentally Identified Interaction Proteins of OGT.

Ma, Junfeng; Hou, Chunyan; Li, Yaoxiang; Chen, Shufu; Wu, Ci.

Int J Mol Sci ; 22(17)2021 Sep 06.

Article in English | MEDLINE | ID: mdl-34502531

ABSTRACT

Interactions between proteins are essential to any cellular process and constitute the basis for molecular networks that determine the functional state of a cell. With the technical advances in recent years, an astonishingly high number of protein-protein interactions has been revealed. However, the interactome of O-linked N-acetylglucosamine transferase (OGT), the sole enzyme adding the O-linked β-N-acetylglucosamine (O-GlcNAc) onto its target proteins, has been largely undefined. To that end, we collated OGT interaction proteins experimentally identified in the past several decades. Rigorous curation of datasets from public repositories and O-GlcNAc-focused publications led to the identification of up to 929 high-stringency OGT interactors from multiple species studied (including Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Arabidopsis thaliana, and others). Among them, 784 human proteins were found to be interactors of human OGT. Moreover, these proteins spanned a very diverse range of functional classes (e.g., DNA repair, RNA metabolism, translational regulation, and cell cycle), with significant enrichment in regulating transcription and (co)translation. Our dataset demonstrates that OGT is likely a hub protein in cells. A webserver OGT-Protein Interaction Network (OGT-PIN) has also been created, which is freely accessible.

Subjects

Acetylglucosamine/metabolism , Data Curation/methods , Databases, Protein/statistics & numerical data , N-Acetylglucosaminyltransferases/metabolism , Protein Interaction Maps , Protein Processing, Post-Translational , Animals , Arabidopsis Proteins/metabolism , Drosophila Proteins/metabolism , Humans , Mice , Rats

9.

Multiple Imputation Approaches Applied to the Missing Value Problem in Bottom-Up Proteomics.

Gardner, Miranda L; Freitas, Michael A.

Int J Mol Sci ; 22(17)2021 Sep 06.

Article in English | MEDLINE | ID: mdl-34502557

ABSTRACT

Analysis of differential abundance in proteomics data sets requires careful application of missing value imputation. The proportion of missing abundance values varies widely when comparing different sample treatments. For example, one would expect a consistent rate of "missing at random" (MAR) across batches of samples and varying rates of "missing not at random" (MNAR) depending on the inherent differences between sample treatments within the study. The missing value imputation strategy selected must therefore be the one that best accounts for both MAR and MNAR simultaneously. Several important issues must be considered when deciding on the appropriate missing value imputation strategy: (1) when it is appropriate to impute data; (2) how to choose a method that reflects the combination of MAR and MNAR that occurs in an experiment. This paper provides an evaluation of missing value imputation strategies used in proteomics and presents a case for the use of hybrid left-censored missing value imputation approaches that can handle the MNAR problem common to proteomics data.
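
One way to picture a hybrid strategy of the kind the abstract argues for is the heuristic below. It is a sketch under stated assumptions, not the authors' method: the function name `hybrid_impute`, the 50% threshold, and the use of a fixed low value and a group mean are all illustrative choices.

```python
def hybrid_impute(group_values, low_value, mnar_fraction=0.5):
    """Impute missing abundances (None) for one protein in one treatment group.

    If most replicates are missing, the values are assumed to be MNAR
    (below the detection limit) and filled with a left-censored low value.
    Otherwise they are assumed to be MAR and filled with the group mean.
    """
    if not group_values:
        return []
    observed = [v for v in group_values if v is not None]
    missing_fraction = 1 - len(observed) / len(group_values)
    if not observed or missing_fraction > mnar_fraction:
        fill = low_value  # left-censored replacement for likely MNAR values
    else:
        fill = sum(observed) / len(observed)  # mean replacement for likely MAR values
    return [fill if v is None else v for v in group_values]


# Hypothetical log-abundances for one protein in two treatment groups.
print(hybrid_impute([10.0, None, 12.0], low_value=1.0))
print(hybrid_impute([None, None, 9.0], low_value=1.0))
```

The point of the split is that a value missing from one replicate out of three is treated differently from a protein that is absent in nearly the whole group, which a single imputation rule cannot do.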

Subjects

Data Accuracy , Databases, Protein/statistics & numerical data , Mass Spectrometry/methods , Proteomics/statistics & numerical data , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Cell Line, Tumor , Glucose/metabolism , Humans , Proteomics/methods , Proteomics/standards

10.

Experimental and computational investigation of enzyme functional annotations uncovers misannotation in the EC 1.1.3.15 enzyme class.

Rembeza, Elzbieta; Engqvist, Martin K M.

PLoS Comput Biol ; 17(9): e1009446, 2021 09.

Article in English | MEDLINE | ID: mdl-34555022

ABSTRACT

Only a small fraction of genes deposited to databases have been experimentally characterised. The majority of proteins have their function assigned automatically, which can result in erroneous annotations. The reliability of current annotations in public databases is largely unknown; experimental attempts to validate the accuracy within individual enzyme classes are lacking. In this study we performed an overview of functional annotations to the BRENDA enzyme database. We first applied a high-throughput experimental platform to verify functional annotations to an enzyme class of S-2-hydroxyacid oxidases (EC 1.1.3.15). We chose 122 representative sequences of the class and screened them for their predicted function. Based on the experimental results, predicted domain architecture and similarity to previously characterised S-2-hydroxyacid oxidases, we inferred that at least 78% of sequences in the enzyme class are misannotated. We experimentally confirmed four alternative activities among the misannotated sequences and showed that misannotation in the enzyme class increased over time. Finally, we performed a computational analysis of annotations to all enzyme classes in the BRENDA database, and showed that nearly 18% of all sequences are annotated to an enzyme class while sharing no similarity or domain architecture to experimentally characterised representatives. We showed that even well-studied enzyme classes of industrial relevance are affected by the problem of functional misannotation.

Subjects

Alcohol Oxidoreductases/classification , Databases, Protein/statistics & numerical data , Molecular Sequence Annotation/statistics & numerical data , Alcohol Oxidoreductases/chemistry , Alcohol Oxidoreductases/genetics , Animals , Computational Biology , Enzymes/chemistry , Enzymes/classification , Enzymes/genetics , Humans , Models, Molecular , Protein Domains , Sequence Homology, Amino Acid

11.

PPIDomainMiner: Inferring domain-domain interactions from multiple sources of protein-protein interactions.

Alborzi, Seyed Ziaeddin; Ahmed Nacer, Amina; Najjar, Hiba; Ritchie, David W; Devignes, Marie-Dominique.

PLoS Comput Biol ; 17(8): e1008844, 2021 08.

Article in English | MEDLINE | ID: mdl-34370723

ABSTRACT

Many biological processes are mediated by protein-protein interactions (PPIs). Because protein domains are the building blocks of proteins, PPIs likely rely on domain-domain interactions (DDIs). Several attempts exist to infer DDIs from PPI networks but the produced datasets are heterogeneous and sometimes not accessible, while the PPI interactome data keeps growing. We describe a new computational approach called "PPIDM" (Protein-Protein Interactions Domain Miner) for inferring DDIs using multiple sources of PPIs. The approach is an extension of our previously described "CODAC" (Computational Discovery of Direct Associations using Common neighbors) method for inferring new edges in a tripartite graph. The PPIDM method has been applied to seven widely used PPI resources, using as "Gold-Standard" a set of DDIs extracted from 3D structural databases. Overall, PPIDM has produced a dataset of 84,552 non-redundant DDIs. Statistical significance (p-value) is calculated for each source of PPI and used to classify the PPIDM DDIs in Gold (9,175 DDIs), Silver (24,934 DDIs) and Bronze (50,443 DDIs) categories. Dataset comparison reveals that PPIDM has inferred from the 2017 releases of PPI sources about 46% of the DDIs present in the 2020 release of the 3did database, not counting the DDIs present in the Gold-Standard. The PPIDM dataset contains 10,229 DDIs that are consistent with more than 13,300 PPIs extracted from the IMEx database, and nearly 23,300 DDIs (27.5%) that are consistent with more than 214,000 human PPIs extracted from the STRING database. Examples of newly inferred DDIs covering more than 10 PPIs in the IMEx database are provided. Further exploitation of the PPIDM DDI reservoir includes the inventory of possible partners of a protein of interest and characterization of protein interactions at the domain level in combination with other methods. The result is publicly available at http://ppidm.loria.fr/.
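
The category counts reported in this abstract can be cross-checked with a few lines of arithmetic. The STRING-consistent count is given only as "nearly 23,300", so the share computed here is approximate.

```python
# DDI counts per confidence category, as reported in the abstract.
gold, silver, bronze = 9_175, 24_934, 50_443
total = gold + silver + bronze

# "Nearly 23,300" DDIs are consistent with human PPIs from STRING;
# the rounded figure is used, so the share is only approximate.
string_consistent = 23_300
share = string_consistent / total

print(f"total DDIs: {total:,}")
print(f"STRING-consistent share: about {share:.1%}")
```

The three categories sum to the stated 84,552 non-redundant DDIs, and the share lands close to the quoted 27.5%.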

Subjects

Protein Interaction Domains and Motifs , Protein Interaction Mapping/statistics & numerical data , Protein Interaction Maps , Algorithms , Computational Biology , Data Mining/statistics & numerical data , Databases, Protein/statistics & numerical data , Humans , Software

12.

Accurate Identification of Antioxidant Proteins Based on a Combination of Machine Learning Techniques and Hidden Markov Model Profiles.

Shen, Zhehan; Liu, Taigang; Xu, Ting.

Comput Math Methods Med ; 2021: 5770981, 2021.

Article in English | MEDLINE | ID: mdl-34413898

ABSTRACT

Antioxidant proteins (AOPs) play important roles in the management and prevention of several human diseases due to their ability to neutralize excess free radicals. However, the identification of AOPs by using wet-lab experimental techniques is often time-consuming and expensive. In this study, we proposed an accurate computational model, called AOP-HMM, to predict AOPs by extracting discriminatory evolutionary features from hidden Markov model (HMM) profiles. First, auto cross-covariance (ACC) variables were applied to transform the HMM profiles into fixed-length feature vectors. Then, we performed the analysis of variance (ANOVA) method to reduce the dimensionality of the raw feature space. Finally, a support vector machine (SVM) classifier was adopted to conduct the prediction of AOPs. To comprehensively evaluate the performance of the proposed AOP-HMM model, the 10-fold cross-validation (CV), the jackknife CV, and the independent test were carried out on two widely used benchmark datasets. The experimental results demonstrated that AOP-HMM outperformed most of the existing methods and could be used to quickly annotate AOPs and guide the experimental process.
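
The auto cross-covariance (ACC) step, which turns a variable-length profile into a fixed-length vector, can be sketched as below. It follows the commonly used ACC definition (lagged covariance between profile columns, normalised by the number of position pairs); whether AOP-HMM uses exactly this normalisation, and the function name `acc_features`, are assumptions.

```python
def acc_features(profile, max_lag):
    """Auto cross-covariance features of an L x N profile (list of rows).

    For each lag g in 1..max_lag and each column pair (c1, c2), compute
    the mean over positions j of
        (P[j][c1] - mean(c1)) * (P[j + g][c2] - mean(c2)).
    Pairs with c1 == c2 are the auto-covariance terms; the rest are
    cross-covariance terms. The result has N * N * max_lag entries,
    independent of the sequence length L.
    """
    length = len(profile)
    n_cols = len(profile[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    means = [sum(row[c] for row in profile) / length for c in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for c1 in range(n_cols):
            for c2 in range(n_cols):
                total = sum(
                    (profile[j][c1] - means[c1]) * (profile[j + lag][c2] - means[c2])
                    for j in range(length - lag)
                )
                features.append(total / (length - lag))
    return features


# Toy profiles with 2 columns (a real HMM profile would have 20).
short_profile = [[1, 0], [2, 1], [3, 0], [4, 1]]
long_profile = short_profile + [[5, 0], [6, 1]]
print(len(acc_features(short_profile, max_lag=2)))
print(len(acc_features(long_profile, max_lag=2)))
```

Both toy profiles produce vectors of the same length even though the sequences differ in length, which is what allows a standard classifier such as an SVM to be trained on them.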

Subjects

Antioxidants/chemistry , Machine Learning , Peroxiredoxins/chemistry , Proteins/chemistry , Algorithms , Amino Acids/analysis , Antioxidants/classification , Computational Biology , Databases, Protein/statistics & numerical data , Evolution, Molecular , Humans , Markov Chains , Peroxiredoxins/classification , Proteins/classification

13.

Critical assessment of coiled-coil predictions based on protein structure data.

Simm, Dominic; Hatje, Klas; Waack, Stephan; Kollmar, Martin.

Sci Rep ; 11(1): 12439, 2021 06 14.

Article in English | MEDLINE | ID: mdl-34127723

ABSTRACT

Coiled-coil regions were among the first protein motifs described structurally and theoretically. The simplicity of the motif promises that coiled-coil regions can be detected with reasonable accuracy and precision in any protein sequence. Here, we re-evaluated the most commonly used coiled-coil prediction tools with respect to the most comprehensive reference data set available, the entire Protein Data Bank, down to each amino acid and its secondary structure. Apart from the 30-fold difference between the minimum and maximum number of coiled coils predicted, the tools vary strongly in where they predict coiled-coil regions. Accordingly, there is a high number of false predictions and of missed true coiled-coil regions. The evaluation of the binary classification metrics in comparison with naïve coin-flip models and the calculation of the Matthews correlation coefficient, the most reliable performance metric for imbalanced data sets, suggest that the tested tools' performance is close to random. This implies that the tools' predictions have only limited informative value. Coiled-coil predictions are often used to interpret biochemical data and are part of in-silico functional genome annotation. Our results indicate that these predictions should be treated very cautiously and need to be supported and validated by experimental evidence.
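
The reason the Matthews correlation coefficient (MCC) is preferred for imbalanced data can be seen in a small example. The confusion-matrix counts below are invented for illustration and do not come from the study.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical per-residue counts: coiled-coil residues are rare (10 of 1000).
tp, fp, fn, tn = 5, 45, 5, 945
accuracy = (tp + tn) / (tp + fp + fn + tn)
mcc = matthews_corrcoef(tp, fp, fn, tn)

print(f"accuracy: {accuracy:.2f}")
print(f"MCC: {mcc:.2f}")
```

With these counts the accuracy is 0.95 while the MCC is only about 0.21, because accuracy is dominated by the abundant negative class and MCC is not.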

Subjects

Amino Acid Motifs , Models, Molecular , Protein Structure, Secondary , Amino Acid Sequence , Databases, Protein/statistics & numerical data , Software

14.

Using Recursive Feature Selection with Random Forest to Improve Protein Structural Class Prediction for Low-Similarity Sequences.

Wang, Yaoxin; Xu, Yingjie; Yang, Zhenyu; Liu, Xiaoqing; Dai, Qi.

Comput Math Methods Med ; 2021: 5529389, 2021.

Article in English | MEDLINE | ID: mdl-34055035

ABSTRACT

Many combinations of protein features are used to improve protein structural class prediction, but information redundancy is often ignored. To select the important features with strong classification ability, we proposed a recursive feature selection method with random forest to improve protein structural class prediction. We evaluated the proposed method in four experiments and compared it with the available competing prediction methods. The results indicate that the proposed feature selection method effectively improves the efficiency of protein structural class prediction: fewer than 5% of the features are used, yet the prediction accuracy improves by 4.6-13.3%. We further compared different protein features and found that the predicted secondary structural features achieve the best performance. This understanding can be used to design more powerful prediction methods for protein structural class.

Subjects

Proteins/chemistry , Proteins/classification , Algorithms , Amino Acid Sequence , Amino Acids/chemistry , Computational Biology , Databases, Protein/statistics & numerical data , Hydrophobic and Hydrophilic Interactions , Protein Conformation , Protein Structural Elements , Protein Structure, Secondary , Sequence Homology, Amino Acid , Support Vector Machine

15.

Structure-based protein function prediction using graph convolutional networks.

Gligorijevic, Vladimir; Renfrew, P Douglas; Kosciolek, Tomasz; Leman, Julia Koehler; Berenberg, Daniel; Vatanen, Tommi; Chandler, Chris; Taylor, Bryn C; Fisk, Ian M; Vlamakis, Hera; Xavier, Ramnik J; Knight, Rob; Cho, Kyunghyun; Bonneau, Richard.

Nat Commun ; 12(1): 3168, 2021 05 26.

Article in English | MEDLINE | ID: mdl-34039967

ABSTRACT

The rapid increase in the number of proteins in sequence databases and the diversity of their functions challenge computational approaches for automated function prediction. Here, we introduce DeepFRI, a Graph Convolutional Network for predicting protein functions by leveraging sequence features extracted from a protein language model and protein structures. It outperforms current leading methods and sequence-based Convolutional Neural Networks and scales to the size of current sequence repositories. Augmenting the training set of experimental structures with homology models allows us to significantly expand the number of predictable functions. DeepFRI has significant de-noising capability, with only a minor drop in performance when experimental structures are replaced by protein models. Class activation mapping allows function predictions at an unprecedented resolution, allowing site-specific annotations at the residue-level in an automated manner. We show the utility and high performance of our method by annotating structures from the PDB and SWISS-MODEL, making several new confident function predictions. DeepFRI is available as a webserver at https://beta.deepfri.flatironinstitute.org/ .

Subjects

Computational Biology/methods , Deep Learning , Models, Biological , Protein Structure, Tertiary , Proteins/physiology , Amino Acid Sequence , Databases, Protein/statistics & numerical data , Datasets as Topic , Models, Molecular , Proteins/ultrastructure , Structure-Activity Relationship

16.

Harnessing machine learning to guide phylogenetic-tree search algorithms.

Azouri, Dana; Abadi, Shiran; Mansour, Yishay; Mayrose, Itay; Pupko, Tal.

Nat Commun ; 12(1): 1983, 2021 03 31.

Article in English | MEDLINE | ID: mdl-33790270

ABSTRACT

Inferring a phylogenetic tree is a fundamental challenge in evolutionary studies. Current paradigms for phylogenetic tree reconstruction rely on performing costly likelihood optimizations. With the aim of making tree inference feasible for problems involving more than a handful of sequences, inference under the maximum-likelihood paradigm integrates heuristic approaches to evaluate only a subset of all potential trees. Consequently, existing methods suffer from the known tradeoff between accuracy and running time. In this proof-of-concept study, we train a machine-learning algorithm over an extensive cohort of empirical data to predict the neighboring trees that increase the likelihood, without actually computing their likelihood. This provides means to safely discard a large set of the search space, thus potentially accelerating heuristic tree searches without losing accuracy. Our analyses suggest that machine learning can guide tree-search methodologies towards the most promising candidate trees.

Subjects

Algorithms , Evolution, Molecular , Machine Learning , Phylogeny , Animals , Databases, Genetic/statistics & numerical data , Databases, Protein/statistics & numerical data , Humans , Models, Genetic

17.

CoronaPep: An Anti-Coronavirus Peptide Generation Tool.

Kaushik, Aman Chandra; Mehmood, Aamir; Selvaraj, Gurudeeban; Dai, Xiaofeng; Pan, Yi; Wei, Dong-Qing.

IEEE/ACM Trans Comput Biol Bioinform ; 18(4): 1299-1304, 2021.

Article in English | MEDLINE | ID: mdl-33687847

ABSTRACT

The novel coronavirus (COVID-19) infections have now become a global pandemic, demanding urgent vaccine design. The current work reports the development of an anti-coronavirus peptide scanner tool to identify anti-coronavirus targets in the form of peptides. The proposed CoronaPep tool offers fast fingerprinting of anti-coronavirus targets, a topic of great prominence in current bioinformatics research. The anti-coronavirus target protein sequences reported from the current outbreak are scanned against the anti-coronavirus target data sets via CoronaPep, which provides precision-based anti-coronavirus peptides. The tool is designed specifically for coronavirus data and can predict peptides from the whole genome or from a list of genes or proteins. In addition, it is relatively fast, accurate, and user-friendly, and can generate maximum output from limited information. The availability of tools like CoronaPep will greatly benefit researchers in the disciplines of oncology and structure-based drug design.

Subjects

COVID-19 Drug Treatment, COVID-19/virology, SARS-CoV-2/chemistry, SARS-CoV-2/drug effects, Software, Viral Proteins/chemistry, Viral Proteins/drug effects, Antiviral Agents/pharmacology, COVID-19/prevention & control, COVID-19 Vaccines/chemistry, COVID-19 Vaccines/genetics, Computational Biology, Protein Databases/statistics & numerical data, Drug Design, Viral Genome, Host Microbial Interactions/drug effects, Humans, Pandemics, Peptides/chemistry, Peptides/drug effects, Peptides/genetics, SARS-CoV-2/genetics, Viral Proteins/genetics

18.

Open Science Resources for the Mass Spectrometry-Based Analysis of SARS-CoV-2.

Bittremieux, Wout; Adams, Charlotte; Laukens, Kris; Dorrestein, Pieter C; Bandeira, Nuno.

J Proteome Res ; 20(3): 1464-1475, 2021 03 05.

Article in English | MEDLINE | ID: mdl-33605735

Abstract

The SARS-CoV-2 virus is the causative agent of the 2020 pandemic leading to the COVID-19 respiratory disease. With many scientific and humanitarian efforts ongoing to develop diagnostic tests, vaccines, and treatments for COVID-19, and to prevent the spread of SARS-CoV-2, mass spectrometry research, including proteomics, is playing a role in determining the biology of this viral infection. Proteomics studies are starting to lead to an understanding of the roles of viral and host proteins during SARS-CoV-2 infection, their protein-protein interactions, and post-translational modifications. This is beginning to provide insights into potential therapeutic targets or diagnostic strategies that can be used to reduce the long-term burden of the pandemic. However, the extraordinary situation caused by the global pandemic is also highlighting the need to improve mass spectrometry data and workflow sharing. We therefore describe freely available data and computational resources that can facilitate and assist the mass spectrometry-based analysis of SARS-CoV-2. We exemplify this by reanalyzing a virus-host interactome data set to detect protein-protein interactions and identify host proteins that could potentially be used as targets for drug repurposing.

Subjects

COVID-19/virology, Information Dissemination/methods, Mass Spectrometry/methods, SARS-CoV-2/chemistry, COVID-19/epidemiology, COVID-19 Testing/methods, COVID-19 Testing/statistics & numerical data, Computational Biology, Protein Databases/statistics & numerical data, Drug Repositioning, Host Microbial Interactions/physiology, Humans, Mass Spectrometry/statistics & numerical data, Pandemics, Protein Interaction Domains and Motifs, Protein Interaction Maps, Post-Translational Protein Processing, Proteomics/methods, Proteomics/statistics & numerical data, SARS-CoV-2/pathogenicity, SARS-CoV-2/physiology, Viral Proteins/chemistry, Viral Proteins/physiology, COVID-19 Drug Treatment

19.

IDPology of the living cell: intrinsic disorder in the subcellular compartments of the human cell.

Zhao, Bi; Katuwawala, Akila; Uversky, Vladimir N; Kurgan, Lukasz.

Cell Mol Life Sci ; 78(5): 2371-2385, 2021 Mar.

Article in English | MEDLINE | ID: mdl-32997198

Abstract

Intrinsic disorder can be found in all proteomes of all kingdoms of life and in viruses, being particularly prevalent in the eukaryotes. We conduct a comprehensive analysis of the intrinsic disorder in the human proteins while mapping them into 24 compartments of the human cell. In agreement with previous studies, we show that human proteins are significantly enriched in disorder relative to a generic protein set that represents the protein universe. In fact, the fraction of proteins with long disordered regions and the average protein-level disorder content in the human proteome are about 3 times higher than in the protein universe. Furthermore, levels of intrinsic disorder in the majority of human subcellular compartments significantly exceed the average disorder content in the protein universe. Relative to the overall amount of disorder in the human proteome, proteins localized in the nucleus and cytoskeleton have significantly increased amounts of disorder, measured by both high disorder content and presence of multiple long intrinsically disordered regions. We empirically demonstrate that, on average, human proteins are assigned to 2.3 subcellular compartments, with proteins localized to few subcellular compartments being more disordered than the proteins that are localized to many compartments. Functionally, the disordered proteins localized in the most disorder-enriched subcellular compartments are primarily responsible for interactions with nucleic acids and protein partners. This is the first time disorder has been comprehensively mapped into the human cell. Our observations add a missing piece to the puzzle of functional disorder and its organization inside the cell.

Subjects

Cell Compartmentation, Eukaryotic Cells/metabolism, Intracellular Space/metabolism, Intrinsically Disordered Proteins/metabolism, Proteome/metabolism, Cell Nucleus/metabolism, Cytoskeleton/metabolism, Protein Databases/statistics & numerical data, Humans, Intrinsically Disordered Proteins/classification, Biological Models, Proteome/classification

20.

ADPriboDB 2.0: an updated database of ADP-ribosylated proteins.

Ayyappan, Vinay; Wat, Ricky; Barber, Calvin; Vivelo, Christina A; Gauch, Kathryn; Visanpattanasin, Pat; Cook, Garth; Sazeides, Christos; Leung, Anthony K L.

Nucleic Acids Res ; 49(D1): D261-D265, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33137182

Abstract

ADP-ribosylation is a protein modification responsible for biological processes such as DNA repair, RNA regulation, cell cycle and biomolecular condensate formation. Dysregulation of ADP-ribosylation is implicated in cancer, neurodegeneration and viral infection. We developed ADPriboDB (adpribodb.leunglab.org) to facilitate studies in uncovering insights into the mechanisms and biological significance of ADP-ribosylation. ADPriboDB 2.0 serves as a one-stop repository comprising 48 346 entries and 9097 ADP-ribosylated proteins, of which 6708 were newly identified since the original database release. In this updated version, we provide information regarding the sites of ADP-ribosylation in 32 946 entries. The wealth of information allows us to interrogate existing databases or newly available data. For example, we found that ADP-ribosylated substrates are significantly associated with the recently identified human protein interaction networks associated with SARS-CoV-2, which encodes a conserved protein domain called macrodomain that binds and removes ADP-ribosylation. In addition, we create a new interactive tool to visualize the local context of ADP-ribosylation, such as structural and functional features as well as other post-translational modifications (e.g. phosphorylation, methylation and ubiquitination). This information provides opportunities to explore the biology of ADP-ribosylation and generate new hypotheses for experimental testing.

Subjects

Adenosine Diphosphate Ribose/metabolism, Computational Biology/statistics & numerical data, Protein Databases/statistics & numerical data, Proteins/metabolism, ADP-Ribosylation, Binding Sites, COVID-19/epidemiology, COVID-19/prevention & control, COVID-19/virology, Computational Biology/methods, Humans, Protein Domains, Post-Translational Protein Processing, Proteins/chemistry, SARS-CoV-2/metabolism, SARS-CoV-2/physiology, Viral Proteins/chemistry, Viral Proteins/metabolism

