Search | BVS Enfermería Search Portal

1.

DescribePROT in 2023: more, higher-quality and experimental annotations and improved data download options.

Basu, Sushmita; Zhao, Bi; Biró, Bálint; Faraggi, Eshel; Gsponer, Jörg; Hu, Gang; Kloczkowski, Andrzej; Malhis, Nawar; Mirdita, Milot; Söding, Johannes; Steinegger, Martin; Wang, Duolin; Wang, Kui; Xu, Dong; Zhang, Jian; Kurgan, Lukasz.

Nucleic Acids Res ; 52(D1): D426-D433, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37933852

ABSTRACT

The DescribePROT database of amino acid-level descriptors of protein structures and functions has been substantially expanded since its release in 2020. This expansion includes a substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and the whole database. The annotations in DescribePROT are useful for a broad spectrum of studies that include investigations of protein structure and function, development and validation of predictive tools, and efforts to understand the molecular underpinnings of diseases and to develop therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.

Subject(s)

Amino Acids , Proteome , Proteome/chemistry , Databases, Factual

2.

Rapid discrimination between deleterious and benign missense mutations in the CAGI 6 experiment.

Faraggi, Eshel; Jernigan, Robert L; Kloczkowski, Andrzej.

Hum Genomics ; 18(1): 89, 2024 Aug 27.

Article in English | MEDLINE | ID: mdl-39192324

ABSTRACT

We describe the machine learning tool that we applied in the CAGI 6 experiment to predict whether single residue mutations in proteins are deleterious or benign. This tool was trained using only single sequences, i.e., without multiple sequence alignments or structural information. Instead, we used global characterizations of the protein sequence. Training and testing data for human gene mutations were obtained from ClinVar (ncbi.nlm.nih.gov/pub/ClinVar/), and for non-human gene mutations from Uniprot (www.uniprot.org). Testing was done on post-training data from ClinVar. This testing yielded high AUC and Matthews correlation coefficient (MCC) for well-trained examples but low generalizability. For genes with either sparse or unbalanced training data, the prediction accuracy is poor. The resulting prediction server is available online at http://www.mamiris.com/Shoni.cagi6.
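
For readers unfamiliar with the Matthews correlation coefficient (MCC) named in the abstract above, the following is a minimal Python sketch of the standard binary-classification formula. It is an illustration only, not the authors' code; the function name `matthews_corrcoef_binary` and the label convention (1 = deleterious, 0 = benign) are assumptions made for this example.

```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """Return the MCC for two equal-length sequences of 0/1 labels.

    Assumed convention for this sketch: 1 = deleterious, 0 = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By common convention the MCC is reported as 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike plain accuracy, the MCC uses all four confusion-matrix cells, which is why it is often reported for unbalanced data sets such as the sparse or skewed per-gene training sets the abstract mentions.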

Subject(s)

Machine Learning , Mutation, Missense , Humans , Mutation, Missense/genetics , Software , Computational Biology/methods , Proteins/genetics

3.

Prediction of Deleterious Single Amino Acid Polymorphisms with a Consensus Holdout Sampler.

Álvarez-Machancoses, Óscar; Faraggi, Eshel; deAndrés-Galiana, Enrique J; Fernández-Martínez, Juan L; Kloczkowski, Andrzej.

Curr Genomics ; 25(3): 171-184, 2024 May 31.

Article in English | MEDLINE | ID: mdl-39086995

ABSTRACT

Background: Single Amino Acid Polymorphisms (SAPs) or nonsynonymous Single Nucleotide Variants (nsSNVs) are the most common genetic variations. They result from missense mutations where a single base pair substitution changes the genetic code in such a way that the triplet of bases (codon) at a given position is coding a different amino acid. Since genetic mutations sometimes cause genetic diseases, it is important to comprehend and foresee which variations are harmful and which ones are neutral (not causing changes in the phenotype). This can be posed as a classification problem. Methods: Computational methods using machine intelligence are gradually replacing repetitive and exceedingly overpriced mutagenic tests. By and large, uneven quality, deficiencies, and irregularities of nsSNVs datasets debase the convenience of artificial intelligence-based methods. Subsequently, strong and more exact approaches are needed to address these problems. In the present paper, we show a consensus classifier built on the holdout sampler, which appears strong and precise and outperforms all other popular methods. Results: We produced 100 holdouts to test the structures and diverse classification variables of diverse classifiers during the training phase. The finest performing holdouts were chosen to develop a consensus classifier and tested using a k-fold (1 ≤ k ≤ 5) cross-validation method. We also examined which protein properties have the biggest impact on the precise prediction of the effects of nsSNVs. Conclusion: Our Consensus Holdout Sampler outperforms other popular algorithms, and gives excellent results, highly accurate with low standard deviation. The advantage of our method emerges from using a tree of holdouts, where diverse LM/AI-based programs are sampled in diverse ways.

4.

DescribePROT: database of amino acid-level protein structure and function predictions.

Zhao, Bi; Katuwawala, Akila; Oldfield, Christopher J; Dunker, A Keith; Faraggi, Eshel; Gsponer, Jörg; Kloczkowski, Andrzej; Malhis, Nawar; Mirdita, Milot; Obradovic, Zoran; Söding, Johannes; Steinegger, Martin; Zhou, Yaoqi; Kurgan, Lukasz.

Nucleic Acids Res ; 49(D1): D298-D308, 2021 Jan 08.

Article in English | MEDLINE | ID: mdl-33119734

ABSTRACT

We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.

Subject(s)

Amino Acids/chemistry , Databases, Protein , Genome , Proteins/genetics , Proteome/genetics , Software , Amino Acid Sequence , Amino Acids/metabolism , Animals , Archaea/genetics , Archaea/metabolism , Bacteria/genetics , Bacteria/metabolism , Binding Sites , Conserved Sequence , Fungi/genetics , Fungi/metabolism , Humans , Internet , Plants/genetics , Plants/metabolism , Prokaryotic Cells/metabolism , Protein Binding , Protein Structure, Secondary , Proteins/chemistry , Proteins/classification , Proteins/metabolism , Proteome/chemistry , Proteome/metabolism , Sequence Analysis, Protein , Viruses/genetics , Viruses/metabolism

5.

Entropy, Fluctuations, and Disordered Proteins.

Faraggi, Eshel; Dunker, A Keith; Jernigan, Robert L; Kloczkowski, Andrzej.

Entropy (Basel) ; 21(8), 2019 Aug.

Article in English | MEDLINE | ID: mdl-32336912

ABSTRACT

Entropy should directly reflect the extent of disorder in proteins. By clustering structurally related proteins and studying the multiple-sequence alignment (MSA) of the sequences in these clusters, we were able to link sequence, structure, and disorder information. We introduced several parameters as measures of fluctuations at a given MSA site and used these as representative of the sequence and structure entropy at that site. In general, we found a tendency for negative correlations between disorder and structure, and significant positive correlations between disorder and the fluctuations in the system. We also found evidence for residue-type conservation for those residues proximate to potentially disordered sites. Mutations at the disordered site itself appear to be allowed. In addition, we found a positive correlation between disorder and accessible surface area, validating that disordered residues occur in exposed regions of proteins. Finally, we also found that fluctuations in the dihedral angles at the original mutated residue and disorder are positively correlated while dihedral angle fluctuations in spatially proximal residues are negatively correlated with disorder. Our results seem to indicate permissible variability in the disordered site, but greater rigidity in the parts of the protein with which the disordered site interacts. This is another indication that disordered residues are involved in protein function.

6.

A global machine learning based scoring function for protein structure prediction.

Faraggi, Eshel; Kloczkowski, Andrzej.

Proteins ; 82(5): 752-9, 2014 May.

Article in English | MEDLINE | ID: mdl-24264942

ABSTRACT

We present a knowledge-based function to score protein decoys based on their similarity to native structure. A set of features is constructed to describe the structure and sequence of the entire protein chain. Furthermore, a qualitative relationship is established between the calculated features and the underlying electromagnetic interaction that dominates this scale. The features we use are associated with residue-residue distances, residue-solvent distances, pairwise knowledge-based potentials and a four-body potential. In addition, we introduce a new target to be predicted, the fitness score, which measures the similarity of a model to the native structure. This new approach enables us to obtain information both from decoys and from native structures. It is also devoid of previous problems associated with knowledge-based potentials. These features were obtained for a large set of native and decoy structures and a back-propagating neural network was trained to predict the fitness score. Overall this new scoring potential proved to be superior to the knowledge-based scoring functions used as its inputs. In particular, in the latest CASP (CASP10) experiment our method was ranked third for all targets, and second for freely modeled hard targets among about 200 groups for top model prediction. Ours was the only method ranked in the top three for all targets and for hard targets. This shows that initial results from the novel approach are able to capture details that were missed by a broad spectrum of protein structure prediction approaches. Source code and executables from this work are freely available at http://mathmed.org/#Software and http://mamiris.com/.

Subject(s)

Artificial Intelligence , Computational Biology/methods , Proteins/chemistry , Software

7.

Accurate single-sequence prediction of solvent accessible surface area using local and global features.

Faraggi, Eshel; Zhou, Yaoqi; Kloczkowski, Andrzej.

Proteins ; 82(11): 3170-6, 2014 Nov.

Article in English | MEDLINE | ID: mdl-25204636

ABSTRACT

We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the new approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both far more efficient than sequence alignment-based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases, indicating generalizability and possible usability for de novo protein structure prediction. The source code and Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org.

Subject(s)

Neural Networks, Computer , Proteins/chemistry , Algorithms , Amino Acid Sequence , Databases, Protein , Protein Conformation , Proteins/metabolism , Solvents/chemistry

8.

Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles.

Li, Zhixiu; Yang, Yuedong; Faraggi, Eshel; Zhan, Jian; Zhou, Yaoqi.

Proteins ; 82(10): 2565-73, 2014 Oct.

Article in English | MEDLINE | ID: mdl-24898915

ABSTRACT

Locating sequences compatible with a protein structural fold is the well-known inverse protein-folding problem. While significant progress has been made, the success rate of protein design remains low. As a result, a library of designed sequences or profile of sequences is currently employed for guiding experimental screening or directed evolution. Sequence profiles can be computationally predicted by iterative mutations of a random sequence to produce energy-optimized sequences, or by combining sequences of structurally similar fragments in a template library. The latter approach is computationally more efficient but yields less accurate profiles than the former because it lacks tertiary structural information. Here we present a method called SPIN that predicts Sequence Profiles by Integrated Neural network based on fragment-derived sequence profiles and structure-derived energy profiles. SPIN improves over the fragment-derived profile by 6.7% (from 23.6 to 30.3%) in sequence identity between predicted and wild-type sequences. The method also reduces the number of residues in low-complexity regions by 15.7% and has a significantly better balance of hydrophilic and hydrophobic residues at the protein surface. The accuracy of sequence profiles obtained is comparable to those generated from the protein design program RosettaDesign 3.5. This highly efficient method for predicting sequence profiles from structures will be useful as a single-body scoring term for improving scoring functions used in protein design and fold recognition. It also complements protein design programs in guiding experimental design of the sequence library for screening and directed evolution of designed sequences. The SPIN server is available at http://sparks-lab.org.

Subject(s)

Models, Molecular , Peptide Fragments/chemistry , Protein Engineering/methods , Proteins/chemistry , Amino Acid Sequence , Amino Acid Substitution , Animals , Artificial Intelligence , Computational Biology , Humans , Hydrophobic and Hydrophilic Interactions , Internet , Mutant Proteins/chemistry , Mutant Proteins/metabolism , Neural Networks, Computer , Peptide Fragments/genetics , Peptide Fragments/metabolism , Peptide Library , Protein Conformation , Protein Folding , Proteins/genetics , Proteins/metabolism , Sequence Homology, Amino Acid , Software , Surface Properties

9.

Evaluation of enzyme activity predictions for variants of unknown significance in Arylsulfatase A.

Jain, Shantanu; Trinidad, Marena; Nguyen, Thanh Binh; Jones, Kaiya; Neto, Santiago Diaz; Ge, Fang; Glagovsky, Ailin; Jones, Cameron; Moran, Giankaleb; Wang, Boqi; Rahimi, Kobra; Çalici, Sümeyra Zeynep; Cedillo, Luis R; Berardelli, Silvia; Özden, Buse; Chen, Ken; Katsonis, Panagiotis; Williams, Amanda; Lichtarge, Olivier; Rana, Sadhna; Pradhan, Swatantra; Srinivasan, Rajgopal; Sajeed, Rakshanda; Joshi, Dinesh; Faraggi, Eshel; Jernigan, Robert; Kloczkowski, Andrzej; Xu, Jierui; Song, Zigang; Özkan, Selen; Padilla, Natàlia; de la Cruz, Xavier; Acuna-Hidalgo, Rocio; Grafmüller, Andrea; Jiménez Barrón, Laura T; Manfredi, Matteo; Savojardo, Castrense; Babbi, Giulia; Martelli, Pier Luigi; Casadio, Rita; Sun, Yuanfei; Zhu, Shaowen; Shen, Yang; Pucci, Fabrizio; Rooman, Marianne; Cia, Gabriel; Raimondi, Daniele; Hermans, Pauline; Kwee, Sofia; Chen, Ella.

bioRxiv ; 2024 Jun 17.

Article in English | MEDLINE | ID: mdl-38798479

ABSTRACT

Continued advances in variant effect prediction are necessary to demonstrate the ability of machine learning methods to accurately determine the clinical impact of variants of unknown significance (VUS). Towards this goal, the ARSA Critical Assessment of Genome Interpretation (CAGI) challenge was designed to characterize progress by utilizing 219 experimentally assayed missense VUS in the Arylsulfatase A (ARSA) gene to assess the performance of community-submitted predictions of variant functional effects. The challenge involved 15 teams, and evaluated additional predictions from established and recently released models. Notably, a model developed by participants of a genetics and coding bootcamp, trained with standard machine-learning tools in Python, demonstrated superior performance among submissions. Furthermore, the study observed that state-of-the-art deep learning methods provided small but statistically significant improvement in predictive performance compared to less elaborate techniques. These findings underscore the utility of variant effect prediction, and the potential for models trained with modest resources to accurately classify VUS in genetic and clinical research.

10.

SPINE X: improving protein secondary structure prediction by multistep learning coupled with prediction of solvent accessible surface area and backbone torsion angles.

Faraggi, Eshel; Zhang, Tuo; Yang, Yuedong; Kurgan, Lukasz; Zhou, Yaoqi.

J Comput Chem ; 33(3): 259-67, 2012 Jan 30.

Article in English | MEDLINE | ID: mdl-22045506

ABSTRACT

Accurate prediction of protein secondary structure is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. The accuracy of ab initio secondary structure prediction from sequence, however, has only increased from around 77 to 80% over the past decade. Here, we developed a multistep neural-network algorithm by coupling secondary structure prediction with prediction of solvent accessibility and backbone torsion angles in an iterative manner. Our method, called SPINE X, was applied to a dataset of 2640 proteins (25% sequence identity cutoff) previously built for the first version of SPINE and achieved an 82.0% accuracy based on 10-fold cross validation (Q(3)). Surpassing 81% accuracy by SPINE X is further confirmed by employing an independently built test dataset of 1833 protein chains, a recently built dataset of 1975 proteins and 117 CASP 9 targets (critical assessment of structure prediction techniques) with an accuracy of 81.3%, 82.3% and 81.8%, respectively. The prediction accuracy is further improved to 83.8% for the dataset of 2640 proteins if the DSSP assignment used above is replaced by a more consistent consensus secondary structure assignment method. Comparison to the popular PSIPRED and CASP-winning structure-prediction techniques is made. SPINE X predicts the number of helices and sheets correctly for 21.0% of 1833 proteins, compared to 17.6% by PSIPRED. It further shows that SPINE X consistently makes more accurate predictions in helical residues (6%) without overprediction while PSIPRED makes more accurate predictions in coil residues (3-5%) and overpredicts them by 7%. SPINE X Server and its training/test datasets are available at http://sparks.informatics.iupui.edu/
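
The Q(3) figure quoted in the abstract above is the percentage of residues whose predicted three-state class matches the assigned class. The following minimal Python sketch illustrates that calculation; it is not part of SPINE X, and the function name `q3_accuracy` and the one-letter state codes (H = helix, E = strand, C = coil) are assumptions made for this example.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues with a correctly predicted three-state label.

    Both arguments are strings of per-residue state codes. This sketch assumes
    the codes H (helix), E (strand) and C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("at least one residue is required")

    # Count positions where the predicted state equals the assigned state.
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)
```

For example, `q3_accuracy("HHHEECCC", "HHHEECCH")` compares eight residues, seven of which agree.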

Subject(s)

Proteins/chemistry , Solvents/chemistry , Neural Networks, Computer , Protein Structure, Secondary

11.

Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates.

Yang, Yuedong; Faraggi, Eshel; Zhao, Huiying; Zhou, Yaoqi.

Bioinformatics ; 27(15): 2076-82, 2011 Aug 01.

Article in English | MEDLINE | ID: mdl-21666270

ABSTRACT

MOTIVATION: In recent years, the development of single-method fold-recognition servers has lagged behind that of consensus and multiple-template techniques. However, a good consensus prediction relies on the accuracy of individual methods. This article reports our efforts to further improve a single-method fold recognition technique called SPARKS by changing the alignment scoring function and incorporating the SPINE-X techniques that make improved predictions of secondary structure, backbone torsion angle and solvent accessible surface area. RESULTS: The new method, called SPARKS-X, was tested with the SALIGN benchmark for alignment accuracy, Lindahl and SCOP benchmarks for fold recognition, and CASP 9 blind test for structure prediction. The method is compared to several state-of-the-art techniques such as HHPRED and BoostThreader. Results show that SPARKS-X is one of the best single-method fold recognition techniques. We further note that incorporating multiple templates and refinement in model building will likely further improve SPARKS-X. AVAILABILITY: The method is available as a SPARKS-X server at http://sparks.informatics.iupui.edu/

Subject(s)

Computational Biology/methods , Protein Folding , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Algorithms , Models, Molecular , Protein Structure, Secondary

12.

Trends in template/fragment-free protein structure prediction.

Zhou, Yaoqi; Duan, Yong; Yang, Yuedong; Faraggi, Eshel; Lei, Hongxing.

Theor Chem Acc ; 128(1): 3-16, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21423322

ABSTRACT

Predicting the structure of a protein from its amino acid sequence is a long-standing unsolved problem in computational biology. Its solution would be of both fundamental and practical importance as the gap between the number of known sequences and the number of experimentally solved structures widens rapidly. Currently, the most successful approaches are based on fragment/template reassembly. The lack of progress in template-free structure prediction calls for novel ideas and approaches. This article reviews trends in the development of physical and specific knowledge-based energy functions as well as sampling techniques for fragment-free structure prediction. Recent physical- and knowledge-based studies demonstrated that it is possible to sample and predict highly accurate protein structures without borrowing native fragments from known protein structures. These emerging approaches with fully flexible sampling have the potential to move the field forward.

13.

A Hybrid Levenberg-Marquardt Algorithm on a Recursive Neural Network for Scoring Protein Models.

Faraggi, Eshel; Jernigan, Robert L; Kloczkowski, Andrzej.

Methods Mol Biol ; 2190: 307-316, 2021.

Article in English | MEDLINE | ID: mdl-32804373

ABSTRACT

We have studied the ability of three types of neural networks to predict the closeness of a given protein model to the native structure associated with its sequence. We show that a partial combination of the Levenberg-Marquardt algorithm and the back-propagation algorithm produced the best results, giving the lowest error and largest Pearson correlation coefficient. We also find, as in previous studies, that adding associative memory to a neural network improves its performance. Additionally, we find that the hybrid method we propose was the most robust in the sense that other configurations of it experienced less decline in comparison to the other methods. We find that the hybrid networks also undergo more fluctuations on the path to convergence. We propose that these fluctuations allow for better sampling. Overall we find it may be beneficial to treat different parts of a neural network with varied computational approaches during optimization.

Subject(s)

Neural Networks, Computer , Proteins/chemistry , Algorithms

14.

Fluctuations of backbone torsion angles obtained from NMR-determined structures and their prediction.

Zhang, Tuo; Faraggi, Eshel; Zhou, Yaoqi.

Proteins ; 78(16): 3353-62, 2010 Dec.

Article in English | MEDLINE | ID: mdl-20818661

ABSTRACT

Protein molecules exhibit varying degrees of flexibility throughout their three-dimensional structures. Protein structural flexibility is often characterized by fluctuations in the Cartesian coordinate space. On the other hand, the protein backbone can be mostly defined by two torsion angles φ and ψ only. We introduce a new flexibility descriptor, backbone torsion-angle fluctuation derived from the variation of backbone torsion angles from different NMR models. The torsion-angle fluctuations correlate with mean-squared spatial fluctuations derived from the same collection of NMR models. We developed a neural-network based real-value predictor based on sequence information only. The predictor achieved ten-fold cross-validated correlation coefficients of 0.59 and 0.60, and mean absolute errors of 22.7° and 24.3° for the angle fluctuation of φ and ψ, respectively. This predictor is expected to be useful for function prediction and protein structure prediction when predicted torsion angles are used as restraints. Both sequence- and structure-based prediction of torsion-angle fluctuation will be available at http://sparks.informatics.iupui.edu within the SPINE-X package.

Subject(s)

Nuclear Magnetic Resonance, Biomolecular , Proteins/chemistry , Torsion, Mechanical , Amino Acids/chemistry , Databases, Protein , Pliability , Protein Structure, Secondary , Reproducibility of Results , Solvents/chemistry

15.

Computational Ways to Enhance Protein Inhibitor Design.

Jernigan, Robert L; Sankar, Kannan; Jia, Kejue; Faraggi, Eshel; Kloczkowski, Andrzej.

Front Mol Biosci ; 7: 607323, 2020.

Article in English | MEDLINE | ID: mdl-33614705

ABSTRACT

Two new computational approaches are described to aid in the design of new peptide-based drugs by evaluating ensembles of protein structures from their dynamics and by assessing structures using empirical contact potentials. These approaches build on the concept that conformational variability can aid in the binding process and, for disordered proteins, can even facilitate the binding of more diverse ligands. This latter consideration indicates that such a design process should be less restrictive so that multiple inhibitors might be effective. The example chosen here focuses on proteins/peptides that bind to hemagglutinin (HA) to block the large-scale conformational change for activation. Variability in the conformations is considered from sets of experimental structures, or as an alternative, from their simple computed dynamics; the set of peptides/small proteins designed by the David Baker lab to bind to hemagglutinin is the large set considered and is assessed with the new empirical contact potentials.

16.

Many-to-one binding by intrinsically disordered protein regions.

Alterovitz, Wei-Lun; Faraggi, Eshel; Oldfield, Christopher J; Meng, Jingwei; Xue, Bin; Huang, Fei; Romero, Pedro; Kloczkowski, Andrzej; Uversky, Vladimir N; Dunker, A Keith.

Pac Symp Biocomput ; 25: 159-170, 2020.

Article in English | MEDLINE | ID: mdl-31797594

ABSTRACT

Disordered binding regions (DBRs), which are embedded within intrinsically disordered proteins or regions (IDPs or IDRs), enable IDPs or IDRs to mediate multiple protein-protein interactions. DBR-protein complexes were collected from the Protein Data Bank for which two or more DBRs having different amino acid sequences bind to the same (100% sequence identical) globular protein partner, a type of interaction herein called many-to-one binding. Two distinct binding profiles were identified: independent and overlapping. For the overlapping binding profiles, the distinct DBRs interact by means of almost identical binding sites (herein called "similar"), or the binding sites contain both common and divergent interaction residues (herein called "intersecting"). Further analysis of the sequence and structural differences among these three groups indicates how IDP flexibility allows different segments to adjust to similar, intersecting, and independent binding pockets.

Subject(s)

Intrinsically Disordered Proteins , Amino Acid Sequence , Computational Biology , Databases, Protein , Humans , Protein Binding , Protein Conformation

17.

Predicting residue-residue contact maps by a two-layer, integrated neural-network method.

Xue, Bin; Faraggi, Eshel; Zhou, Yaoqi.

Proteins ; 76(1): 176-83, 2009 Jul.

Article in English | MEDLINE | ID: mdl-19137600

ABSTRACT

A neural network method (SPINE-2D) is introduced to provide a sequence-based prediction of residue-residue contact maps. This method is built on the success of SPINE in predicting secondary structure, residue solvent accessibility, and backbone torsion angles via large-scale training with overfit protection and a two-layer neural network. SPINE-2D achieved a 10-fold cross-validated accuracy of 47% (±2%) for top L/5 predicted contacts between two residues with sequence separation of six or more and an accuracy of 24% (±1%) for nonlocal contacts with sequence separation of 24 residues or more. The accuracies of 23% and 26% for nonlocal contact predictions are achieved for two independent datasets of 500 proteins and 82 CASP 7 targets, respectively. A comparison with other methods indicates that SPINE-2D is among the most accurate methods for contact-map prediction. SPINE-2D is available as a webserver at http://sparks.informatics.iupui.edu.
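
The "top L/5" accuracy reported in the abstract above is conventionally computed by ranking predicted residue pairs by score, keeping the L/5 highest-ranked pairs (L being the chain length) that satisfy a minimum sequence separation, and counting how many are true contacts. The sketch below illustrates that convention under stated assumptions; it is not the SPINE-2D evaluation code, and the function name, the dictionary input format and the default parameters are all hypothetical.

```python
def top_l_over_k_accuracy(scores, true_contacts, length, k=5, min_sep=6):
    """Fraction of the top length//k predicted pairs that are true contacts.

    scores        -- dict mapping (i, j) residue-index pairs to predicted scores
    true_contacts -- set of (i, j) pairs that are contacts in the native structure
    length        -- chain length L
    min_sep       -- minimum sequence separation |i - j| for a pair to be counted
    (All names and defaults are assumptions made for this illustration.)
    """
    # Keep only pairs that meet the sequence-separation threshold.
    candidates = [
        (score, pair)
        for pair, score in scores.items()
        if abs(pair[0] - pair[1]) >= min_sep
    ]
    # Highest-scoring pairs first.
    candidates.sort(reverse=True)

    n_top = max(1, length // k)
    top_pairs = [pair for _, pair in candidates[:n_top]]
    if not top_pairs:
        return 0.0

    hits = sum(1 for pair in top_pairs if pair in true_contacts)
    return hits / len(top_pairs)
```

Raising `min_sep` to 24 would correspond to the nonlocal-contact setting that the abstract reports separately.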

Subject(s)

Computational Biology/methods , Neural Networks, Computer , Proteins/chemistry , Amino Acid Sequence , Binding Sites , Computer Simulation , Models, Molecular , Protein Structure, Secondary

18.

Improving the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins by guided-learning through a two-layer neural network.

Faraggi, Eshel; Xue, Bin; Zhou, Yaoqi.

Proteins ; 74(4): 847-56, 2009 Mar.

Article in English | MEDLINE | ID: mdl-18704931

ABSTRACT

This article attempts to increase the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins through improved learning. Most methods developed for improving the backpropagation algorithm of artificial neural networks are limited to small neural networks. Here, we introduce a guided-learning method suitable for networks of any size. The method employs a part of the weights for guiding and the other part for training and optimization. We demonstrate this technique by predicting residue solvent accessibility and real-value backbone torsion angles of proteins. In this application, the guiding factor is designed to satisfy the intuitive condition that for most residues, the contribution of a residue to the structural properties of another residue is smaller for greater separation in the protein-sequence distance between the two residues. We show that the guided-learning method makes a 2-4% reduction in 10-fold cross-validated mean absolute errors (MAE) for predicting residue solvent accessibility and backbone torsion angles, regardless of the size of the database, the number of hidden layers and the size of input windows. This, together with the introduction of a two-layer neural network with a bipolar activation function, leads to a new method that has a MAE of 0.11 for residue solvent accessibility, 36 degrees for psi, and 22 degrees for phi. The method is available as a Real-SPINE 3.0 server at http://sparks.informatics.iupui.edu.

Subject(s)

Neural Networks, Computer , Proteins/chemistry , Binding Sites , Computer Simulation , Databases, Protein , Models, Molecular , Protein Conformation , Solvents/chemistry

19.

Real-value prediction of backbone torsion angles.

Xue, Bin; Dor, Ofer; Faraggi, Eshel; Zhou, Yaoqi.

Proteins ; 72(1): 427-33, 2008 Jul.

Article in English | MEDLINE | ID: mdl-18214956

ABSTRACT

The backbone structure of a protein is largely determined by the phi and psi torsion angles. Thus, knowing these angles, even approximately, will be very useful for protein-structure prediction. However, in a previous work, a sequence-based, real-value prediction of the psi angle could only achieve a mean absolute error of 54 degrees (83 degrees, 35 degrees, 33 degrees for coil, strand, and helix residues, respectively) between predicted and actual angles. Moreover, a real-value prediction of the phi angle is not yet available. This article employs a neural-network-based approach to improve psi prediction by taking advantage of angle periodicity and applies the new method to the prediction of phi angles. The 10-fold cross-validated mean absolute error for the new method is 38 degrees (58 degrees, 33 degrees, 22 degrees for coil, strand, and helix, respectively) for psi and 25 degrees (35 degrees, 22 degrees, 16 degrees for coil, strand, and helix, respectively) for phi. The real-value prediction is comparable in accuracy to, or more accurate than, predictions based on multistate classification of the phi-psi map. More accurate prediction of real-value angles will likely be useful for improving the accuracy of fold recognition and ab initio protein-structure prediction. The Real-SPINE 2.0 server is available on the website http://sparks.informatics.iupui.edu.
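
Because torsion angles are periodic, a prediction of 179 degrees for an actual angle of -179 degrees is off by 2 degrees, not 358. The abstract does not say how periodicity is exploited, so the sketch below shows only one common way to account for it when scoring predictions: measuring each error along the circle. The function names are hypothetical.

```python
def angular_error(predicted_deg, actual_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(predicted_deg - actual_deg) % 360.0
    return min(diff, 360.0 - diff)

def periodic_mae(predicted, actual):
    """Mean absolute error over paired angle lists, respecting periodicity."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [angular_error(p, a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)
```

A naive absolute difference would inflate the error for residues whose angles lie near the +180/-180 boundary, which is why the wrap-around step matters for a mean absolute error reported in degrees.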

Subject(s)

Amino Acids/chemistry , Torsion, Mechanical , Algorithms , Amino Acid Sequence , Databases, Protein , Proteins/chemistry , Reproducibility of Results , Solvents

20.

Reoptimized UNRES Potential for Protein Model Quality Assessment.

Faraggi, Eshel; Krupa, Pawel; Mozolewska, Magdalena A; Liwo, Adam; Kloczkowski, Andrzej.

Genes (Basel) ; 9(12), 2018 Dec 03.

Article in English | MEDLINE | ID: mdl-30513992

ABSTRACT

Ranking protein structure models is an elusive problem in bioinformatics. These models are evaluated on both the degree of similarity to the native structure and the folding pathway. Here, we simulated the use of the coarse-grained UNited RESidue (UNRES) force field as a tool to choose the best protein structure models for a given protein sequence among a pool of candidate models, using server data from the CASP11 experiment. Because the original UNRES was optimized for Molecular Dynamics simulations, we reoptimized UNRES using a deep feed-forward neural network, and we show that introducing additional descriptive features can produce better results. Overall, we found that the reoptimized UNRES performs better in selecting the best structures and tracking protein unwinding from its native state. We also found a relatively poor correlation between UNRES values and the model's Template Modeling Score (TMS). This is remedied by reoptimization. We discuss some cases where our reoptimization procedure is useful. The reoptimized version of UNRES (OUNRES) is available at http://mamiris.com and http://www.unres.pl.
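
Choosing the best model from a pool of candidates with an energy function can be sketched as below. This is a generic illustration, not the OUNRES procedure: it assumes that a lower energy marks a better model, and the "loss" measure (the gap in Template Modeling Score between the best candidate in the pool and the one selected) is a hypothetical evaluation metric chosen for this sketch.

```python
def select_model(energies):
    """Index of the candidate with the lowest energy (assumed to be the best)."""
    if not energies:
        raise ValueError("no candidate models")
    return min(range(len(energies)), key=lambda i: energies[i])

def selection_loss(energies, tm_scores):
    """TM-score gap between the best candidate in the pool and the selected one.

    A loss of 0.0 means the energy function picked the best available model.
    """
    if len(energies) != len(tm_scores):
        raise ValueError("energies and tm_scores must have equal length")
    chosen = select_model(energies)
    return max(tm_scores) - tm_scores[chosen]
```

With toy values such as energies `[-120.0, -150.0, -90.0]` and TM-scores `[0.62, 0.55, 0.70]` (invented for illustration, not taken from CASP11), the second candidate is selected even though the third has the highest TM-score, which is the kind of mismatch a reoptimized scoring function aims to reduce.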
