Search | BVS - Ministry of Health

1.

Post-Alignment Adjustment and Its Automation.

Xia, Xuhua.

Genes (Basel) ; 12(11)2021 11 18.

Article in English | MEDLINE | ID: mdl-34828415

ABSTRACT

Multiple sequence alignment (MSA) is the basis for almost all sequence comparison and molecular phylogenetic inferences. Large-scale genomic analyses are typically associated with automated progressive MSA without subsequent manual adjustment, which itself is often error-prone because of the lack of a consistent and explicit criterion. Here, I outlined several commonly encountered alignment errors that cannot be avoided by progressive MSA for nucleotide, amino acid, and codon sequences. Methods that could be automated to fix such alignment errors were then presented. I emphasized the utility of position weight matrix as a new tool for MSA refinement and illustrated its usage by refining the MSA of nucleotide and amino acid sequences. The main advantages of the position weight matrix approach include (1) its use of information from all sequences, in contrast to other commonly used methods based on pairwise alignment scores and inconsistency measures, and (2) its speedy computation, making it suitable for a large number of long viral genomic sequences.

Subjects

Laboratory Automation/methods , Genomics/methods , Sequence Alignment/methods , Algorithms , Animals , Laboratory Automation/standards , Genomics/standards , Humans , Phylogeny , Sensitivity and Specificity , Sequence Alignment/standards , DNA Sequence Analysis/methods , DNA Sequence Analysis/standards , Protein Sequence Analysis/methods , Protein Sequence Analysis/standards
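
As a rough illustration of the position weight matrix idea in the abstract above, the sketch below builds a log-odds matrix from all sequences in an existing alignment and uses it to find the column offset at which an ungapped segment fits best. It is not the author's implementation. The function names, the uniform background frequencies, and the pseudocount of 1 are assumptions made for this example.

```python
import math
from collections import Counter


def build_pwm(aligned, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds PWM from equal-length aligned sequences.

    Gap characters (anything outside `alphabet`) are ignored.
    A uniform background is assumed purely for illustration.
    """
    length = len(aligned[0])
    background = 1.0 / len(alphabet)
    pwm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned if seq[col] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pwm.append({
            sym: math.log2(((counts[sym] + pseudocount) / total) / background)
            for sym in alphabet
        })
    return pwm


def pwm_score(pwm, segment, start):
    """Sum the log-odds of an ungapped segment placed at column `start`."""
    return sum(pwm[start + i][ch] for i, ch in enumerate(segment))


def best_offset(pwm, segment):
    """Return the column offset at which the segment scores highest."""
    offsets = range(len(pwm) - len(segment) + 1)
    return max(offsets, key=lambda off: pwm_score(pwm, segment, off))


if __name__ == "__main__":
    toy_alignment = ["ACGTAC", "ACGTAC", "ACGAAC"]  # hypothetical data
    pwm = build_pwm(toy_alignment)
    print(best_offset(pwm, "GTA"))
```

Because every column of the matrix summarizes all sequences at once, scoring a candidate placement takes time proportional to the segment length rather than to the number of sequences, which is consistent with the speed advantage the abstract describes.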

2.

UbiComb: A Hybrid Deep Learning Model for Predicting Plant-Specific Protein Ubiquitylation Sites.

Siraj, Arslan; Lim, Dae Yeong; Tayara, Hilal; Chong, Kil To.

Genes (Basel) ; 12(5)2021 05 11.

Article in English | MEDLINE | ID: mdl-34064731

ABSTRACT

Protein ubiquitylation is an essential post-translational modification process that performs a critical role in a wide range of biological functions, even a degenerative role in certain diseases, and is consequently used as a promising target for the treatment of various diseases. Owing to the significant role of protein ubiquitylation, these sites can be identified by enzymatic approaches, mass spectrometry analysis, and combinations of multidimensional liquid chromatography and tandem mass spectrometry. However, these large-scale experimental screening techniques are time consuming, expensive, and laborious. To overcome the drawbacks of experimental methods, machine learning and deep learning-based predictors were considered for prediction in a timely and cost-effective manner. In the literature, several computational predictors have been published across species; however, predictors are species-specific because of the unclear patterns in different species. In this study, we proposed a novel approach for predicting plant ubiquitylation sites using a hybrid deep learning model by utilizing convolutional neural network and long short-term memory. The proposed method uses the actual protein sequence and physicochemical properties as inputs to the model and provides more robust predictions. The proposed predictor achieved the best result with accuracy values of 80% and 81% and F-scores of 79% and 82% on the 10-fold cross-validation and an independent dataset, respectively. Moreover, we also compared the testing of the independent dataset with popular ubiquitylation predictors; the results demonstrate that our model significantly outperforms the other methods in prediction classification results.

Subjects

Plant Proteins/chemistry , Protein Sequence Analysis/methods , Software , Ubiquitination , Amino Acid Motifs , Deep Learning , Plant Proteins/metabolism , Sensitivity and Specificity , Protein Sequence Analysis/standards

3.

Thousands of protein linear motif classes may still be undiscovered.

Bulavka, Denys; Aptekmann, Ariel A; Méndez, Nicolás A; Krick, Teresa; Sánchez, Ignacio E.

PLoS One ; 16(5): e0248841, 2021.

Article in English | MEDLINE | ID: mdl-33939703

ABSTRACT

Linear motifs are short protein subsequences that mediate protein interactions. Hundreds of motif classes including thousands of motif instances are known. Our theory estimates how many motif classes remain undiscovered. As commonly done, we describe motif classes as regular expressions specifying motif length and the allowed amino acids at each motif position. We measure motif specificity for a pair of motif classes by quantifying how many motif-discriminating positions prevent a protein subsequence from matching the two classes at once. We derive theorems for the maximal number of motif classes that can simultaneously maintain a certain number of motif-discriminating positions between all pairs of classes in the motif universe, for a given amino acid alphabet. We also calculate the fraction of all protein subsequences that would belong to a motif class if all potential motif classes came into existence. Naturally occurring pairs of motif classes present most often a single motif-discriminating position. This mild specificity maximizes the potential number of coexisting motif classes, the expansion of the motif universe due to amino acid modifications and the fraction of amino acid sequences that code for a motif instance. As a result, thousands of linear motif classes may remain undiscovered.

Subjects

Amino Acid Motifs , Protein Sequence Analysis/methods , Humans , Sensitivity and Specificity , Protein Sequence Analysis/standards
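
The notion of a motif-discriminating position in the abstract above can be sketched in a few lines. The example assumes a simplified, fixed-length regular-expression syntax (single letters, `.`, `[ABC]`, and `[^ABC]`) and treats a position as discriminating when the two classes allow no amino acid in common there, so no subsequence can match both classes at once. The function names and example motifs are hypothetical, and the paper's formal definitions may differ in detail.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_motif(pattern):
    """Parse a simple fixed-length motif into one set of allowed residues per position."""
    tokens = re.findall(r"\[\^?[A-Z]+\]|[A-Z.]", pattern)
    if "".join(tokens) != pattern:
        raise ValueError(f"unsupported motif syntax: {pattern!r}")
    positions = []
    for tok in tokens:
        if tok == ".":
            allowed = set(AMINO_ACIDS)                      # wildcard
        elif tok.startswith("[^"):
            allowed = set(AMINO_ACIDS) - set(tok[2:-1])     # negated class
        elif tok.startswith("["):
            allowed = set(tok[1:-1])                        # explicit class
        else:
            allowed = {tok}                                 # single residue
        positions.append(allowed)
    return positions


def discriminating_positions(motif_a, motif_b):
    """Return the indices at which the two motifs share no allowed residue."""
    a, b = parse_motif(motif_a), parse_motif(motif_b)
    if len(a) != len(b):
        raise ValueError("this sketch only compares motifs of equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if not (x & y)]


if __name__ == "__main__":
    # Hypothetical motif classes, chosen only to illustrate the idea.
    print(discriminating_positions("P.[ST]P", "P.[DE]P"))
```

A pair of classes with an empty result can be matched by the same subsequence, whereas a single discriminating position already makes the two classes mutually exclusive.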

4.

Concerns with computational protein engineering programmes IPRO and OptMAVEn and metabolic pathway engineering programme optStoic.

Wood, Thomas K.

Open Biol ; 11(2): 200173, 2021 02.

Article in English | MEDLINE | ID: mdl-33529550

ABSTRACT

It has become customary in engineering to require a modelling component in research endeavours. In addition, as the code for these models becomes more byzantine in complexity, it is difficult for reviewers and readers to discern their value and understand the underlying code. This opinion piece summarizes the negative experience of the author with the IPRO and OptMAVEn computational protein engineering models as well as problems with the optStoic metabolic pathway model. In our hands, these models often fail to predict reliable ways to engineer proteins and metabolic pathways.

Subjects

Computational Biology/methods , Protein Engineering/methods , Protein Sequence Analysis/methods , Software/standards , Animals , Computational Biology/standards , Humans , Metabolic Networks and Pathways , Protein Engineering/standards , Protein Sequence Analysis/standards

5.

Estimating the Quality of 3D Protein Models Using the ModFOLD7 Server.

Maghrabi, Ali H A; McGuffin, Liam J.

Methods Mol Biol ; 2165: 69-81, 2020.

Article in English | MEDLINE | ID: mdl-32621219

ABSTRACT

Assessing the accuracy of 3D models has become a keystone in the protein structure prediction field. ModFOLD7 is our leading resource for Estimates of Model Accuracy (EMA), which has been upgraded by integrating a number of the pioneering pure-single- and quasi-single-model approaches. Such an integration has given our latest version the strengths to accurately score and rank predicted models, with higher consistency compared to older EMA methods. Additionally, the server provides three options for producing global score estimates, depending on the requirements of the user: (1) ModFOLD7_rank, which is optimized for ranking/selection, (2) ModFOLD7_cor, which is optimized for correlations of predicted and observed scores, and (3) ModFOLD7 global for balanced performance. ModFOLD7 has been ranked among the top few EMA methods according to independent blind testing by the CASP13 assessors. Another evaluation resource for ModFOLD7 is the CAMEO project, where the method is continuously automatically evaluated, showing a significant improvement compared to our previous versions. The ModFOLD7 server is freely available at http://www.reading.ac.uk/bioinf/ModFOLD/ .

Subjects

Molecular Dynamics Simulation/standards , Protein Conformation , Protein Sequence Analysis/standards , Software/standards

6.

Prediction of Intrinsic Disorder with Quality Assessment Using QUARTER.

Wu, Zhonghua; Hu, Gang; Oldfield, Christopher J; Kurgan, Lukasz.

Methods Mol Biol ; 2165: 83-101, 2020.

Article in English | MEDLINE | ID: mdl-32621220

ABSTRACT

Intrinsically disordered regions (IDRs) are estimated to be highly abundant in nature. While only several thousand proteins are annotated with experimentally derived IDRs, computational methods can be used to predict IDRs for the millions of currently uncharacterized protein chains. Several dozen disorder predictors were developed over the last few decades. While some of these methods provide accurate predictions, unavoidably they also make some mistakes. Consequently, one of the challenges facing users of these methods is how to decide which predictions can be trusted and which are likely incorrect. This practical problem can be solved using quality assessment (QA) scores that predict correctness of the underlying (disorder) predictions at a residue level. We motivate and describe a first-of-its-kind toolbox of QA methods, QUARTER (QUality Assessment for pRotein inTrinsic disordEr pRedictions), which provides the scores for a diverse set of ten disorder predictors. QUARTER is available to the end users as a free and convenient webserver at http://biomine.cs.vcu.edu/servers/QUARTER/ . We briefly describe the predictive architecture of QUARTER and provide detailed instructions on how to use the webserver. We also explain how to interpret results produced by QUARTER with the help of a case study.

Subjects

Intrinsically Disordered Proteins/chemistry , Protein Conformation , Protein Sequence Analysis/methods , Software , Protein Sequence Analysis/standards

7.

NLGenomeSweeper: A Tool for Genome-Wide NBS-LRR Resistance Gene Identification.

Toda, Nicholas; Rustenholz, Camille; Baud, Agnès; Le Paslier, Marie-Christine; Amselem, Joelle; Merdinoglu, Didier; Faivre-Rampant, Patricia.

Genes (Basel) ; 11(3)2020 03 20.

Article in English | MEDLINE | ID: mdl-32245073

ABSTRACT

Although there are a number of bioinformatic tools to identify plant nucleotide-binding leucine-rich repeat (NLR) disease resistance genes based on conserved protein sequences, only a few of these tools have attempted to identify disease resistance genes that have not been annotated in the genome. The overall goal of the NLGenomeSweeper pipeline is to annotate NLR disease resistance genes, including RPW8, in the genome assembly with high specificity and a focus on complete functional genes. This is based on the identification of the complete NB-ARC domain, the most conserved domain of NLR genes, using the BLAST suite. In this way, the tool has a high specificity for complete genes and relatively intact pseudogenes. The tool returns all candidate NLR gene locations as well as InterProScan ORF and domain annotations for manual curation of the gene structure.

Subjects

Genomics/methods , NLR Proteins/genetics , Plant Proteins/genetics , Protein Sequence Analysis/methods , Software/standards , Arabidopsis , Conserved Sequence , Disease Resistance , Genomics/standards , Helianthus , NLR Proteins/chemistry , Plant Proteins/chemistry , Protein Binding , Protein Domains , Protein Sequence Analysis/standards

8.

Analysis of protein missense alterations by combining sequence- and structure-based methods.

Gyulkhandanyan, Aram; Rezaie, Alireza R; Roumenina, Lubka; Lagarde, Nathalie; Fremeaux-Bacchi, Veronique; Miteva, Maria A; Villoutreix, Bruno O.

Mol Genet Genomic Med ; 8(4): e1166, 2020 04.

Article in English | MEDLINE | ID: mdl-32096919

ABSTRACT

BACKGROUND: Different types of in silico approaches can be used to predict the phenotypic consequence of missense variants. Such algorithms are often categorized as sequence based or structure based, the latter when they require 3D structural information. In addition, many other in silico tools, not dedicated to the analysis of variants, can be used to gain additional insights about the possible mechanisms at play. METHODS: Here we applied different computational approaches to a set of 20 known missense variants present on different proteins (CYP, complement factor B, antithrombin and blood coagulation factor VIII). The tools that were used include fast computational approaches and web servers such as PolyPhen-2, PopMusic, DUET, MaestroWeb, SAAFEC, Missense3D, VarSite, FlexPred, PredyFlexy, Clustal Omega, meta-PPISP, FTMap, ClusPro, pyDock, PPM, RING, Cytoscape, and ChannelsDB. RESULTS: We observe some conflicting results among the methods but, most of the time, the combination of several engines helped to clarify the potential impacts of the amino acid substitutions. CONCLUSION: Combining different computational approaches, including some that were not developed to investigate missense variants, helps to predict the possible impact of the amino acid substitutions. Yet, when the modified residues are involved in a salt bridge, the tools tend to fail, even when the analysis is performed in 3D. Thus, interactive structural analysis with molecular graphics packages such as Chimera or PyMOL is still needed to clarify automatic predictions.

Subjects

Molecular Dynamics Simulation/standards , Missense Mutation , Protein Sequence Analysis/methods , Software/standards , Blood Coagulation Factors/chemistry , Blood Coagulation Factors/genetics , Cytochrome P-450 Enzyme System/chemistry , Cytochrome P-450 Enzyme System/genetics , Humans , Protein Sequence Analysis/standards

9.

HAMAP as SPARQL rules-A portable annotation pipeline for genomes and proteomes.

Bolleman, Jerven; de Castro, Edouard; Baratin, Delphine; Gehant, Sebastien; Cuche, Beatrice A; Auchincloss, Andrea H; Coudert, Elisabeth; Hulo, Chantal; Masson, Patrick; Pedruzzi, Ivo; Rivoire, Catherine; Xenarios, Ioannis; Redaschi, Nicole; Bridge, Alan.

Gigascience ; 9(2)2020 02 01.

Article in English | MEDLINE | ID: mdl-32034905

ABSTRACT

BACKGROUND: Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. RESULTS: Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline. CONCLUSIONS: HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.

Subjects

Genomics/methods , Molecular Sequence Annotation/methods , DNA Sequence Analysis/methods , Protein Sequence Analysis/methods , Software/standards , Animals , Genomics/standards , Humans , Molecular Sequence Annotation/standards , DNA Sequence Analysis/standards , Protein Sequence Analysis/standards

10.

A step-by-step classification algorithm of protein secondary structures based on double-layer SVM model.

Ge, Yongzhen; Zhao, Shuo; Zhao, Xiqiang.

Genomics ; 112(2): 1941-1946, 2020 03.

Article in English | MEDLINE | ID: mdl-31740293

ABSTRACT

In this paper, a step-by-step classification algorithm based on a double-layer SVM model is constructed to predict the secondary structure of proteins. The most important feature of this algorithm is that it improves the prediction accuracy of the α+β and α/β classes by transforming the prediction of these two classes, which had low accuracy in the past, into the prediction of all-α and all-β classes, which has high accuracy. A widely used dataset, the 25PDB dataset with sequence similarity lower than 40%, is used to evaluate this method. The results show that this method performs well and, while maintaining the accuracy for the other three structural classes of proteins, significantly improves the accuracy for α+β class proteins.

Subjects

Protein Sequence Analysis/methods , Support Vector Machine , Animals , Humans , alpha-Helical Protein Conformation , beta-Strand Protein Conformation , Protein Sequence Analysis/standards

11.

End-to-End Differentiable Learning of Protein Structure.

AlQuraishi, Mohammed.

Cell Syst ; 8(4): 292-301.e3, 2019 04 24.

Article in English | MEDLINE | ID: mdl-31005579

ABSTRACT

Predicting protein structure from sequence is a central challenge of biochemistry. Co-evolution methods show promise, but an explicit sequence-to-structure map remains elusive. Advances in deep learning that replace complex, human-designed pipelines with differentiable models optimized end to end suggest the potential benefits of similarly reformulating structure prediction. Here, we introduce an end-to-end differentiable model for protein structure learning. The model couples local and global protein structure via geometric units that optimize global geometry without violating local covalent chemistry. We test our model using two challenging tasks: predicting novel folds without co-evolutionary data and predicting known folds without structural templates. In the first task, the model achieves state-of-the-art accuracy, and in the second, it comes within 1-2 Å; competing methods using co-evolution and experimental templates have been refined over many years, and it is likely that the differentiable approach has substantial room for further improvement, with applications ranging from drug discovery to protein design.

Subjects

Machine Learning , Protein Sequence Analysis/methods , Software , Molecular Evolution , Protein Folding , Protein Sequence Analysis/standards

12.

Prediction of Signal Peptides in Proteins from Malaria Parasites.

Burdukiewicz, Michal; Sobczyk, Piotr; Chilimoniuk, Jaroslaw; Gagat, Przemyslaw; Mackiewicz, Pawel.

Int J Mol Sci ; 19(12)2018 Nov 22.

Article in English | MEDLINE | ID: mdl-30469512

ABSTRACT

Signal peptides are N-terminal presequences responsible for targeting proteins to the endomembrane system, and subsequent subcellular or extracellular compartments, and consequently condition their proper function. The significance of signal peptides stimulates development of new computational methods for their detection. These methods employ learning systems trained on datasets comprising signal peptides from different types of proteins and taxonomic groups. As a result, the accuracy of predictions is high in the case of signal peptides that are well represented in databases, but might be low in other, atypical cases. Such atypical signal peptides are present in proteins found in apicomplexan parasites, causative agents of malaria and toxoplasmosis. Apicomplexan proteins have a unique amino acid composition due to their AT-biased genomes. Therefore, we designed a new, more flexible and universal probabilistic model for recognition of atypical eukaryotic signal peptides. Our approach, called signalHsmm, includes knowledge about the structure of signal peptides and physicochemical properties of amino acids. It is able to recognize signal peptides from the malaria parasites and related species more accurately than popular programs. Moreover, it is still universal enough to provide prediction of other signal peptides on par with the best-performing predictors.

Subjects

Plasmodium/chemistry , Protein Sorting Signals , Protozoan Proteins/chemistry , Protein Sequence Analysis/methods , Amino Acids/chemistry , Markov Chains , Protein Sequence Analysis/standards

13.

Decision-Tree Based Meta-Strategy Improved Accuracy of Disorder Prediction and Identified Novel Disordered Residues Inside Binding Motifs.

Zhao, Bi; Xue, Bin.

Int J Mol Sci ; 19(10)2018 Oct 07.

Article in English | MEDLINE | ID: mdl-30301243

ABSTRACT

Using computational techniques to identify intrinsically disordered residues is practical and effective in biological studies. Therefore, designing novel high-accuracy strategies is always preferable when existing strategies have a lot of room for improvement. Among many possibilities, a meta-strategy that integrates the results of multiple individual predictors has been broadly used to improve the overall performance of predictors. Nonetheless, a simple and direct integration of individual predictors may not effectively improve the performance. In this project, dual-threshold two-step significance voting and neural networks were used to integrate the predictive results of four individual predictors, including: DisEMBL, IUPred, VSL2, and ESpritz. The new meta-strategy has improved the prediction performance of intrinsically disordered residues significantly, compared to all four individual predictors and another four recently-designed predictors. The improvement was validated using five-fold cross-validation and in independent test datasets.

Subjects

Intrinsically Disordered Proteins/chemistry , Computer Neural Networks , Protein Sequence Analysis/methods , Humans , Intrinsically Disordered Proteins/metabolism , Protein Sequence Analysis/standards , Software

14.

Identification of Bacteriophage Virion Proteins Using Multinomial Naïve Bayes with g-Gap Feature Tree.

Pan, Yanyuan; Gao, Hui; Lin, Hao; Liu, Zhen; Tang, Lixia; Li, Songtao.

Int J Mol Sci ; 19(6)2018 Jun 15.

Article in English | MEDLINE | ID: mdl-29914091

ABSTRACT

Bacteriophages, which are tremendously important to the ecology and evolution of bacteria, play a key role in the development of genetic engineering. Bacteriophage virion proteins are essential materials of the infectious viral particles and are in charge of several biological functions. The correct identification of bacteriophage virion proteins is of great importance for understanding both life at the molecular level and genetic evolution. However, few computational methods are available for identifying bacteriophage virion proteins. In this paper, we proposed a new method to predict bacteriophage virion proteins using a Multinomial Naïve Bayes classification model based on discrete features generated from the g-gap feature tree. The accuracy of the proposed model reaches 98.37% with an MCC of 96.27% in 10-fold cross-validation. This result suggests that the proposed method can be a useful approach in identifying bacteriophage virion proteins from sequence information. For the convenience of experimental scientists, a web server (PhagePred) that implements the proposed predictor is available, which can be freely accessed on the Internet.

Subjects

Bacteriophages/chemistry , Protein Sequence Analysis/methods , Viral Structural Proteins/chemistry , Bayes Theorem , Protein Sequence Analysis/standards , Software
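
For readers unfamiliar with g-gap features, the sketch below shows the commonly used g-gap dipeptide count: pairs of residues separated by exactly g intervening positions. It covers only this counting step, under that assumed definition. It does not reproduce the feature tree or the Multinomial Naïve Bayes model described in the abstract above, and the function names are hypothetical.

```python
from collections import Counter


def g_gap_dipeptides(sequence, g):
    """Count residue pairs (sequence[i], sequence[i + g + 1]).

    g = 0 gives ordinary adjacent dipeptides; g = 1 skips one residue, and so on.
    """
    step = g + 1
    return Counter(
        sequence[i] + sequence[i + step] for i in range(len(sequence) - step)
    )


def g_gap_frequencies(sequence, g):
    """Normalize the g-gap counts so that they sum to 1 (empty dict if no pairs)."""
    counts = g_gap_dipeptides(sequence, g)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


if __name__ == "__main__":
    toy_sequence = "MKVLAAGIVK"  # hypothetical peptide, for illustration only
    for gap in range(3):
        print(gap, g_gap_dipeptides(toy_sequence, gap).most_common(3))
```

Discrete counts of this kind are a natural input for a multinomial model, which is presumably why the two are paired in the abstract.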

15.

RBLOSUM performs better than CorBLOSUM with lesser error per query.

Govindarajan, Renganayaki; Leela, Biji Christopher; Nair, Achuthsankar S.

BMC Res Notes ; 11(1): 328, 2018 May 21.

Article in English | MEDLINE | ID: mdl-29784028

ABSTRACT

OBJECTIVE: BLOSUM matrices serve as standard matrices for many protein sequence alignment programs. BLOSUM matrices have been constructed using BLOCKS version 5.0 with 27,102 BLOCKS, whereas the latest updated version 14.3 has 6,739,916 BLOCKS. We read with interest the research article by Hess et al. (BMC Bioinform 17:189, 2016) on CorBLOSUM, wherein it is argued that an inaccuracy in the BLOSUM code affects the cluster memberships of sequences. They show that replacing the integer-based clustering threshold with a floating-point one arguably improves the performance of CorBLOSUM over BLOSUM and RBLOSUM matrices. They compare BLOSUM62_14.3 against RBLOSUM69, with relative entropies of 0.2685 and 0.2662, respectively. The present work attempts to repeat the computation to verify the respective analog matrices. RESULTS: In our attempt to repeat the computation, we observed that the relative entropy of BLOSUM62_14.3 is 0.2360 and that of BLOSUM50_14.3 is 0.1198. As only matrices of similar entropies can be compared, BLOSUM62 can be compared only with RBLOSUM66 and BLOSUM50 can be compared only with RBLOSUM56. We conducted experiments with Astral data sets, and demonstrated the improved accuracy in the coverage. Our results imply that RBLOSUM performs statistically better than CorBLOSUM and BLOSUM matrices.

Subjects

Algorithms , Computational Biology/methods , Protein Databases , Sequence Alignment/methods , Protein Sequence Analysis/methods , Protein Sequence Analysis/standards
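
The relative entropy that the abstract above uses to decide which matrices are comparable is conventionally defined for BLOSUM-style matrices as H = sum over unordered pairs i <= j of q_ij * log2(q_ij / e_ij), where q_ij are the observed pair frequencies and e_ij the frequencies expected from the marginal residue frequencies. The sketch below computes it for a toy alphabet. The function name and input layout are assumptions for this example, and the code is not taken from the BLOSUM, RBLOSUM, or CorBLOSUM programs.

```python
import math


def relative_entropy(pair_freqs):
    """Relative entropy (in bits) of a set of observed pair frequencies.

    `pair_freqs` maps unordered residue pairs (a, b), each stored once,
    to observed frequencies q_ab that sum to 1.
    """
    # Marginal residue frequencies: p_a = q_aa + sum over b != a of q_ab / 2
    marginals = {}
    for (a, b), q in pair_freqs.items():
        if a == b:
            marginals[a] = marginals.get(a, 0.0) + q
        else:
            marginals[a] = marginals.get(a, 0.0) + q / 2
            marginals[b] = marginals.get(b, 0.0) + q / 2

    entropy = 0.0
    for (a, b), q in pair_freqs.items():
        # Expected frequency: p_a^2 for identical residues, 2 * p_a * p_b otherwise
        expected = marginals[a] * marginals[b] * (1 if a == b else 2)
        if q > 0:
            entropy += q * math.log2(q / expected)
    return entropy


if __name__ == "__main__":
    # Toy two-letter alphabet; the frequencies are invented for illustration.
    print(relative_entropy({("A", "A"): 0.4, ("A", "B"): 0.2, ("B", "B"): 0.4}))
```

A value near zero means that the observed pairs are close to what chance alone would produce, so matrices with very different entropies reflect different evolutionary distances and are not directly comparable.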

16.

The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction.

Li, Hongjian; Peng, Jiangjun; Leung, Yee; Leung, Kwong-Sak; Wong, Man-Hon; Lu, Gang; Ballester, Pedro J.

Biomolecules ; 8(1)2018 03 14.

Article in English | MEDLINE | ID: mdl-29538331

ABSTRACT

It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score which has the best test set performance out of 16 classical SFs). We have found that random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 is able to keep learning with an increasing training set size, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with dissimilar proteins to those in the test set, against what has been previously concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to enlarge in the future.

Subjects

Machine Learning , Molecular Docking Simulation/standards , Protein Interaction Mapping/methods , Protein Sequence Analysis/standards , Protein Interaction Mapping/standards

17.

Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate.

Yu, Chun Yan; Li, Xiao Xu; Yang, Hong; Li, Ying Hong; Xue, Wei Wei; Chen, Yu Zong; Tao, Lin; Zhu, Feng.

Int J Mol Sci ; 19(1)2018 Jan 08.

Article in English | MEDLINE | ID: mdl-29316706

ABSTRACT

The function of a protein is of great interest in the cutting-edge research of biological mechanisms, disease development and drug/target discovery. Besides experimental explorations, a variety of computational methods have been designed to predict protein function. Among these in silico methods, the prediction of BLAST is based on protein sequence similarity, while that of machine learning is also based on the sequence, but without the consideration of their similarity. This unique characteristic of machine learning makes it a good complement to BLAST and many other approaches in predicting the function of remotely relevant proteins and the homologous proteins of distinct function. However, the identification accuracies of these in silico methods and their false discovery rates have not yet been assessed, which greatly limits the usage of these algorithms. Herein, a comprehensive comparison of the performances among four popular prediction algorithms (BLAST, SVM, PNN and KNN) was conducted. In particular, the performance of these methods was systematically assessed by four standard statistical indexes based on the independent test datasets of 93 functional protein families defined by UniProtKB keywords. Moreover, the false discovery rates of these algorithms were evaluated by scanning the genomes of four representative model organisms (Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae and Mycobacterium tuberculosis). As a result, the substantially higher sensitivity of SVM and BLAST was observed compared with that of PNN and KNN. However, the machine learning algorithms (PNN, KNN and SVM) were found capable of substantially reducing the false discovery rate (SVM < PNN < KNN). In sum, this study comprehensively assessed the performance of four popular algorithms applied to protein function prediction, which could facilitate the selection of the most appropriate method in the related biomedical research.

Subjects

Protein Sequence Analysis/standards , Software , Machine Learning , Proteomics/methods , Proteomics/standards , Reproducibility of Results , Protein Sequence Analysis/methods

18.

A Computational-Based Method for Predicting Drug-Target Interactions by Using Stacked Autoencoder Deep Neural Network.

Wang, Lei; You, Zhu-Hong; Chen, Xing; Xia, Shi-Xiong; Liu, Feng; Yan, Xin; Zhou, Yong; Song, Ke-Jian.

J Comput Biol ; 25(3): 361-373, 2018 03.

Article in English | MEDLINE | ID: mdl-28891684

ABSTRACT

Identifying the interaction between drugs and target proteins is an important area of drug research, which provides a broad prospect for low-risk and faster drug development. However, due to the limitations of traditional experiments when revealing drug-protein interactions (DTIs), the screening of targets not only takes a lot of time and money but also has high false-positive and false-negative rates. Therefore, it is imperative to develop effective automatic computational methods to accurately predict DTIs in the postgenome era. In this article, we propose a new computational method for predicting DTIs from drug molecular structure and protein sequence by using the stacked autoencoder of deep learning, which can adequately extract the raw data information. The proposed method has the advantage that it can automatically mine the hidden information from protein sequences and generate highly representative features through iterations of multiple layers. The feature descriptors are then constructed by combining the molecular substructure fingerprint information, and fed into the rotation forest for accurate prediction. The experimental results of fivefold cross-validation indicate that the proposed method achieves superior performance on gold standard data sets (enzymes, ion channels, GPCRs [G-protein-coupled receptors], and nuclear receptors) with accuracy of 0.9414, 0.9116, 0.8669, and 0.8056, respectively. We further comprehensively explore the performance of the proposed method by comparing it with other feature extraction algorithms, state-of-the-art classifiers, and other excellent methods on the same data set. The excellent comparison results demonstrate that the proposed method is highly competitive when predicting drug-target interactions.

Subjects

Deep Learning , Molecular Docking Simulation/methods , Sequence Analysis, Protein/methods , Databases, Chemical , Molecular Docking Simulation/standards , Protein Binding , Reproducibility of Results , Sequence Analysis, Protein/standards

19.

Mapping genetic variations to three-dimensional protein structures to enhance variant interpretation: a proposed framework.

Glusman, Gustavo; Rose, Peter W; Prlic, Andreas; Dougherty, Jennifer; Duarte, José M; Hoffman, Andrew S; Barton, Geoffrey J; Bendixen, Emøke; Bergquist, Timothy; Bock, Christian; Brunk, Elizabeth; Buljan, Marija; Burley, Stephen K; Cai, Binghuang; Carter, Hannah; Gao, JianJiong; Godzik, Adam; Heuer, Michael; Hicks, Michael; Hrabe, Thomas; Karchin, Rachel; Leman, Julia Koehler; Lane, Lydie; Masica, David L; Mooney, Sean D; Moult, John; Omenn, Gilbert S; Pearl, Frances; Pejaver, Vikas; Reynolds, Sheila M; Rokem, Ariel; Schwede, Torsten; Song, Sicheng; Tilgner, Hagen; Valasatava, Yana; Zhang, Yang; Deutsch, Eric W.

Genome Med ; 9(1): 113, 2017 Dec 18.

Article in English | MEDLINE | ID: mdl-29254494

ABSTRACT

The translation of personal genomics to precision medicine depends on the accurate interpretation of the multitude of genetic variants observed for each individual. However, even when genetic variants are predicted to modify a protein, their functional implications may be unclear. Many diseases are caused by genetic variants affecting important protein features, such as enzyme active sites or interaction interfaces. The scientific community has catalogued millions of genetic variants in genomic databases and thousands of protein structures in the Protein Data Bank. Mapping mutations onto three-dimensional (3D) structures enables atomic-level analyses of protein positions that may be important for the stability or formation of interactions; these may explain the effect of mutations and in some cases even open a path for targeted drug development. To accelerate progress in the integration of these data types, we held a two-day Gene Variation to 3D (GVto3D) workshop to report on the latest advances and to discuss unmet needs. The overarching goal of the workshop was to address the question: what can be done together as a community to advance the integration of genetic variants and 3D protein structures that could not be done by a single investigator or laboratory? Here we describe the workshop outcomes, review the state of the field, and propose the development of a framework with which to promote progress in this arena. The framework will include a set of standard formats, common ontologies, a common application programming interface to enable interoperation of the resources, and a Tool Registry to make it easy to find and apply the tools to specific analysis problems. Interoperability will enable integration of diverse data sources and tools and collaborative development of variant effect prediction methods.

Subjects

Genome-Wide Association Study/methods , Polymorphism, Genetic , Protein Conformation , Sequence Analysis, Protein/methods , Algorithms , Congresses as Topic , Genome-Wide Association Study/standards , Humans , Sequence Analysis, Protein/standards

20.

Validation of Structures in the Protein Data Bank.

Gore, Swanand; Sanz García, Eduardo; Hendrickx, Pieter M S; Gutmanas, Aleksandras; Westbrook, John D; Yang, Huanwang; Feng, Zukang; Baskaran, Kumaran; Berrisford, John M; Hudson, Brian P; Ikegawa, Yasuyo; Kobayashi, Naohiro; Lawson, Catherine L; Mading, Steve; Mak, Lora; Mukhopadhyay, Abhik; Oldfield, Thomas J; Patwardhan, Ardan; Peisach, Ezra; Sahni, Gaurav; Sekharan, Monica R; Sen, Sanchayita; Shao, Chenghua; Smart, Oliver S; Ulrich, Eldon L; Yamashita, Reiko; Quesada, Martha; Young, Jasmine Y; Nakamura, Haruki; Markley, John L; Berman, Helen M; Burley, Stephen K; Velankar, Sameer; Kleywegt, Gerard J.

Structure ; 25(12): 1916-1927, 2017 12 05.

Article in English | MEDLINE | ID: mdl-29174494

ABSTRACT

The Worldwide PDB recently launched a deposition, biocuration, and validation tool: OneDep. At various stages of OneDep data processing, validation reports for three-dimensional structures of biological macromolecules are produced. These reports are based on recommendations of expert task forces representing crystallography, nuclear magnetic resonance, and cryoelectron microscopy communities. The reports provide useful metrics with which depositors can evaluate the quality of the experimental data, the structural model, and the fit between them. The validation module is also available as a stand-alone web server and as a programmatically accessible web service. A growing number of journals require the official wwPDB validation reports (produced at biocuration) to accompany manuscripts describing macromolecular structures. Upon public release of the structure, the validation report becomes part of the public PDB archive. Geometric quality scores for proteins in the PDB archive have improved over the past decade.

Subjects

Databases, Protein/standards , Validation Studies as Topic , Sequence Analysis, Protein/methods , Sequence Analysis, Protein/standards
