1.
Nucleic Acids Res ; 52(2): e10, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38048333

ABSTRACT

Current predictors of DNA-binding residues (DBRs) from protein sequences belong to two distinct groups: those trained on binding annotations extracted from structured protein-DNA complexes (structure-trained) and those trained on annotations from intrinsically disordered proteins (disorder-trained). We complete the first empirical analysis of predictive performance across the structure- and disorder-annotated proteins for a representative collection of ten predictors. The majority of the structure-trained tools perform well on the structure-annotated proteins while doing relatively poorly on the disorder-annotated proteins, and vice versa. Several methods make accurate predictions for the structure-annotated proteins or the disorder-annotated proteins, but none performs highly accurately for both annotation types. Moreover, most predictors make excessive cross-predictions for the disorder-annotated proteins, where residues that interact with non-DNA ligand types are predicted as DBRs. Motivated by these results, we design, validate and deploy an innovative meta-model, hybridDBRpred, that uses a deep transformer network to combine predictions generated by the three best current predictors. HybridDBRpred provides accurate predictions and low levels of cross-predictions across the two annotation types, and is statistically more accurate than each of the ten tools and than baseline meta-predictors that rely on averaging and logistic regression. We deploy hybridDBRpred as a convenient web server at http://biomine.cs.vcu.edu/servers/hybridDBRpred/ and provide the corresponding source code at https://github.com/jianzhang-xynu/hybridDBRpred.


Subject(s)
DNA-Binding Proteins , Software , Amino Acid Sequence , Amino Acids , Computational Biology/methods , Databases, Protein , DNA , Intrinsically Disordered Proteins/chemistry , DNA-Binding Proteins/chemistry
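
The abstract above mentions baseline meta-predictors that rely on averaging. A minimal sketch of such a baseline follows, assuming each tool outputs one propensity in [0, 1] per residue. The function name, the 0.5 threshold and the example scores are hypothetical, and this is not the hybridDBRpred transformer model.

```python
from statistics import mean


def average_meta_predictor(predictions, threshold=0.5):
    """Average per-residue propensities from several predictors.

    predictions: list of equal-length lists, one per predictor, each
    holding a propensity in [0, 1] for every residue (an assumed format).
    Returns (averaged propensities, binary DBR calls).
    """
    if not predictions:
        raise ValueError("need at least one predictor")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictors must score the same residues")
    # zip(*predictions) walks the residues, yielding one score per tool.
    averaged = [mean(column) for column in zip(*predictions)]
    calls = [score >= threshold for score in averaged]
    return averaged, calls


# Hypothetical propensities from three tools for a 4-residue protein.
tool_a = [0.9, 0.2, 0.6, 0.1]
tool_b = [0.8, 0.1, 0.3, 0.2]
tool_c = [0.7, 0.3, 0.9, 0.0]
scores, calls = average_meta_predictor([tool_a, tool_b, tool_c])
print(calls)
```

Averaging treats every tool as equally reliable for every residue, which is one reason a learned combiner may do better when tools specialize in different annotation types.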
2.
Nucleic Acids Res ; 52(D1): D426-D433, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37933852

ABSTRACT

The DescribePROT database of amino acid-level descriptors of protein structures and functions has been substantially expanded since its release in 2020. This expansion includes a substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and the whole database. The annotations in DescribePROT are useful for a broad spectrum of studies that include investigations of protein structure and function, development and validation of predictive tools, and efforts to understand the molecular underpinnings of diseases and to develop therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.


Subject(s)
Amino Acids , Proteome , Proteome/chemistry , Databases, Factual
3.
Brief Bioinform ; 24(3), 2023 05 19.
Article in English | MEDLINE | ID: mdl-37068304

ABSTRACT

Human leukocyte antigen class I (HLA-I) molecules bind intracellular peptides produced by protein hydrolysis and present them to the T cells for immune recognition and response. Prediction of peptides that bind HLA-I molecules is very important in immunotherapy. A growing number of computational predictors have been developed in recent years. We survey a comprehensive collection of 27 tools focusing on their input and output data characteristics, key aspects of the underlying predictive models and their availability. Moreover, we evaluate predictive performance for eight representative predictors. We consider a wide spectrum of relevant aspects including allele-specific analysis, influence of negative to positive data ratios and runtime. We also curate high-quality benchmark datasets based on analysis of the consistency of the data labels. Results reveal that each considered method provides accurate results, which can be explained by our analysis that finds that their predictive models capture meaningful binding motifs. Although some methods are overall more accurate than others, we find that none of them is universally superior. We provide a comprehensive comparison of the convenience as well as the accuracy of the methods under specific prediction scenarios, such as for specific alleles, metrics of predictive performance and constraints on runtime. Our systematic and broad analysis provides informative clues to the users to identify the most suitable tools for a given prediction scenario and for the developers to design future methods.


Subject(s)
Histocompatibility Antigens Class I , Peptides , Humans , Protein Binding , Peptides/chemistry
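
The abstract above lists the negative-to-positive data ratio among the evaluated aspects. A small worked example of why that ratio matters is sketched below: with a fixed true-positive rate and false-positive rate, the expected precision drops as negatives become more numerous. The rates used here are hypothetical and are not taken from the study.

```python
def precision_at_ratio(tpr, fpr, neg_to_pos):
    """Expected precision when there are neg_to_pos negatives per positive.

    tpr: fraction of true binders that are predicted as binders.
    fpr: fraction of non-binders that are predicted as binders.
    """
    true_pos = tpr * 1.0          # expected hits per one positive peptide
    false_pos = fpr * neg_to_pos  # expected false alarms among the negatives
    if true_pos + false_pos == 0:
        return 0.0
    return true_pos / (true_pos + false_pos)


# Hypothetical predictor: 90% sensitivity, 5% false-positive rate.
for ratio in (1, 10, 100):
    print(ratio, round(precision_at_ratio(0.9, 0.05, ratio), 3))
```

The same predictor therefore looks very different on a balanced benchmark and on a benchmark dominated by non-binding peptides, so comparisons are only meaningful at a stated ratio.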
4.
Brief Bioinform ; 24(1), 2023 01 19.
Article in English | MEDLINE | ID: mdl-36458437

ABSTRACT

One of the key features of intrinsically disordered regions (IDRs) is facilitation of protein-protein and protein-nucleic acid interactions. These disordered binding regions include molecular recognition features (MoRFs), short linear motifs (SLiMs) and longer binding domains. The vast majority of current predictors of disordered binding regions target MoRFs, with a handful of methods that predict SLiMs and disordered protein-binding domains. A new and broader class of disordered binding regions, linear interacting peptides (LIPs), was introduced recently and applied in the MobiDB resource. LIPs are segments in protein sequences that undergo disorder-to-order transition upon binding to a protein or a nucleic acid, and they cover MoRFs, SLiMs and disordered protein-binding domains. Although current predictors of MoRFs and disordered protein-binding regions could be used to identify some LIPs, there are no dedicated sequence-based predictors of LIPs. To this end, we introduce CLIP, a new predictor of LIPs that utilizes a robust logistic regression model to combine three complementary types of inputs: co-evolutionary information derived from multiple sequence alignments, physicochemical profiles and disorder predictions. Ablation analysis suggests that the co-evolutionary information is particularly useful for this prediction and that combining the three inputs provides substantial improvements when compared to using these inputs individually. Comparative empirical assessments using low-similarity test datasets reveal that CLIP secures an area under the receiver operating characteristic curve (AUC) of 0.8 and substantially improves over the results produced by the closest current tools that predict MoRFs and disordered protein-binding regions. The webserver of CLIP is freely available at http://biomine.cs.vcu.edu/servers/CLIP/ and the standalone code can be downloaded from http://yanglab.qd.sdu.edu.cn/download/CLIP/.


Subject(s)
Intrinsically Disordered Proteins , Intrinsically Disordered Proteins/chemistry , Computational Biology/methods , Amino Acid Sequence , Peptides/metabolism , Protein Domains , Databases, Protein , Protein Binding
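
The abstract above reports performance as the area under the receiver operating characteristic curve (AUC). As a reminder of what that number measures, here is a minimal sketch that computes AUC as the probability that a randomly chosen positive residue scores higher than a randomly chosen negative one, with ties counted as half. The labels and scores are made-up illustrations, not CLIP outputs.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for binding (positive) residues, 0 for non-binding ones.
    scores: predicted propensities, same length as labels.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical residue-level labels and propensities.
example_labels = [1, 1, 0, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(example_labels, example_scores))
```

The quadratic loop is fine for illustration; for proteome-scale data a rank-based implementation would be the usual choice.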
5.
Brief Bioinform ; 24(6), 2023 09 22.
Article in English | MEDLINE | ID: mdl-37874948

ABSTRACT

Proteases contribute to a broad spectrum of cellular functions. Given a relatively limited amount of experimental data, developing accurate sequence-based predictors of substrate cleavage sites facilitates a better understanding of protease functions and substrate specificity. While many protease-specific predictors of substrate cleavage sites were developed, these efforts are outpaced by the growth of the protease substrate cleavage data. In particular, since data for 100+ protease types are available and this number continues to grow, it becomes impractical to publish predictors for new protease types, and instead it might be better to provide a computational platform that helps users to quickly and efficiently build predictors that address their specific needs. To this end, we conceptualized, developed, tested and released a versatile bioinformatics platform, ProsperousPlus, that empowers users, even those with no programming or little bioinformatics background, to build fast and accurate predictors of substrate cleavage sites. ProsperousPlus facilitates the use of the rapidly accumulating substrate cleavage data to train, empirically assess and deploy predictive models for user-selected substrate types. Benchmarking tests on test datasets show that our platform produces predictors that on average exceed the predictive performance of current state-of-the-art approaches. ProsperousPlus is available as a webserver and a stand-alone software package at http://prosperousplus.unimelb-biotools.cloud.edu.au/.


Subject(s)
Machine Learning , Peptide Hydrolases , Peptide Hydrolases/metabolism , Substrate Specificity , Algorithms
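
Sequence-based predictors of cleavage sites, as discussed in the abstract above, typically score a window of residues around each candidate scissile bond. The sketch below extracts such windows using the common P4 to P4' convention (four residues on each side of the bond). The window size, the padding character and the function name are assumptions for illustration; the abstract does not describe how ProsperousPlus encodes its inputs.

```python
def cleavage_windows(sequence, flank=4, pad="X"):
    """Return (cut_position, window) for every peptide bond in sequence.

    cut_position k means the bond between residues k-1 and k (0-based).
    Each window holds `flank` residues before the bond (P4..P1 by default)
    followed by `flank` residues after it (P1'..P4'), padded with `pad`
    near the termini.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for cut in range(1, len(sequence)):
        # Residue i of the sequence sits at index i + flank in `padded`,
        # so the window starting `flank` residues before the bond
        # begins at padded index `cut`.
        windows.append((cut, padded[cut:cut + 2 * flank]))
    return windows


# Hypothetical 10-residue substrate.
example_windows = cleavage_windows("ABCDEFGHIJ")
print(example_windows[0])
print(example_windows[4])
```

Each window can then be turned into numeric features and passed to whichever classifier is being trained.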
6.
Nucleic Acids Res ; 51(W1): W141-W147, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37140058

ABSTRACT

Intrinsic disorder in proteins is relatively abundant in nature and essential for a broad spectrum of cellular functions. While disorder can be accurately predicted from protein sequences, as it was empirically demonstrated in recent community-organized assessments, it is rather challenging to collect and compile a comprehensive prediction that covers multiple disorder functions. To this end, we introduce the DEPICTER2 (DisorderEd PredictIon CenTER) webserver that offers convenient access to a curated collection of fast and accurate disorder and disorder function predictors. This server includes a state-of-the-art disorder predictor, flDPnn, and five modern methods that cover all currently predictable disorder functions: disordered linkers and protein, peptide, DNA, RNA and lipid binding. DEPICTER2 allows selection of any combination of the six methods, batch predictions of up to 25 proteins per request and provides interactive visualization of the resulting predictions. The webserver is freely available at http://biomine.cs.vcu.edu/servers/DEPICTER2/.


Subject(s)
Computational Biology , Data Visualization , Internet , Proteins , Computational Biology/instrumentation , Computational Biology/methods , Databases, Protein , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Protein Binding , User-Computer Interface
7.
Nucleic Acids Res ; 51(5): e25, 2023 03 21.
Article in English | MEDLINE | ID: mdl-36629262

ABSTRACT

The sequence-based predictors of RNA-binding residues (RBRs) are trained on either structure-annotated or disorder-annotated binding regions. A recent study of predictors of protein-binding residues shows that they are plagued by high levels of cross-predictions (protein-binding residues are predicted as nucleic acid-binding) and that structure-trained predictors perform poorly for the disorder-annotated regions and vice versa. Consequently, we analyze a representative set of the structure- and disorder-trained predictors of RBRs to comprehensively assess the quality of their predictions. Our empirical analysis, which relies on a new and low-similarity benchmark dataset, reveals that the structure-trained predictors of RBRs perform well for the structure-annotated proteins while the disorder-trained predictors provide accurate results for the disorder-annotated proteins. However, these methods work only modestly well on the opposite types of annotations, motivating the need for new solutions. Using an empirical approach, we design the HybridRNAbind meta-model, which generates accurate predictions and low amounts of cross-predictions when tested on data that combines structure- and disorder-annotated RBRs. We release this meta-model as a convenient webserver which is available at https://www.csuligroup.com/hybridRNAbind/.


Subject(s)
Proteins , RNA-Binding Proteins , RNA , Computational Biology/methods , Databases, Protein , Protein Binding/genetics , Proteins/chemistry , RNA/chemistry , RNA-Binding Proteins/chemistry
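
The abstract above describes cross-prediction as residues that bind one ligand type being predicted as binding another. One plausible way to quantify this is sketched below: among residues annotated as binding some other ligand but not RNA, count the fraction predicted as RNA-binding. The exact formula used in the study is not given in the abstract, so this definition and the example annotations are assumptions.

```python
def cross_prediction_rate(predicted_rna, binds_rna, binds_other):
    """Fraction of other-ligand-only residues predicted as RNA-binding.

    All three arguments are equal-length sequences of 0/1 flags,
    one entry per residue.
    """
    considered = 0
    flagged = 0
    for pred, rna, other in zip(predicted_rna, binds_rna, binds_other):
        if other and not rna:      # binds another ligand, but not RNA
            considered += 1
            if pred:               # yet predicted as RNA-binding
                flagged += 1
    return flagged / considered if considered else 0.0


# Hypothetical annotations for a 6-residue protein.
predicted = [1, 0, 1, 1, 0, 0]
rna_bound = [1, 0, 0, 0, 0, 0]
other_bound = [0, 1, 1, 1, 0, 1]
print(cross_prediction_rate(predicted, rna_bound, other_bound))
```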
8.
Brief Bioinform ; 23(1), 2022 01 17.
Article in English | MEDLINE | ID: mdl-34905768

ABSTRACT

Proteins with intrinsically disordered regions (IDRs) are common among eukaryotes. Many IDRs interact with nucleic acids and proteins. Annotation of these interactions is supported by computational predictors, but to date, only one tool that predicts interactions with nucleic acids was released, and recent assessments demonstrate that current predictors offer modest levels of accuracy. We have developed DeepDISOBind, an innovative deep multi-task architecture that accurately predicts deoxyribonucleic acid (DNA)-, ribonucleic acid (RNA)- and protein-binding IDRs from protein sequences. DeepDISOBind relies on an information-rich sequence profile that is processed by an innovative multi-task deep neural network, where subsequent layers are gradually specialized to predict interactions with specific partner types. The common input layer links to a layer that differentiates protein- and nucleic acid-binding, which further links to layers that discriminate between DNA and RNA interactions. Empirical tests show that this multi-task design provides statistically significant gains in predictive quality across the three partner types when compared to a single-task design and a representative selection of the existing methods that cover both disorder- and structure-trained tools. Analysis of the predictions on the human proteome reveals that DeepDISOBind predictions can be encoded into protein-level propensities that accurately predict DNA- and RNA-binding proteins and protein hubs. DeepDISOBind is available at https://www.csuligroup.com/DeepDISOBind/.


Subject(s)
DNA-Binding Proteins/chemistry , DNA/chemistry , Deep Learning , Intrinsically Disordered Proteins/chemistry , RNA-Binding Proteins/chemistry , RNA/chemistry , Computational Biology/methods , DNA/metabolism , DNA-Binding Proteins/metabolism , Humans , Neural Networks, Computer , Nucleic Acids/metabolism , Protein Binding , Proteome/metabolism , RNA/metabolism , RNA-Binding Proteins/metabolism
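
The abstract above states that residue-level predictions can be encoded into protein-level propensities. The abstract does not say how this encoding is done, so the sketch below uses one simple and purely hypothetical choice: the fraction of residues whose predicted propensity reaches a threshold.

```python
def protein_level_propensity(residue_scores, threshold=0.5):
    """Fraction of residues predicted as binding (an assumed encoding).

    residue_scores: per-residue binding propensities in [0, 1].
    Returns a value in [0, 1]; higher suggests a more likely binder.
    """
    if not residue_scores:
        return 0.0
    binding = sum(1 for score in residue_scores if score >= threshold)
    return binding / len(residue_scores)


# Hypothetical residue-level propensities for a 5-residue protein.
print(protein_level_propensity([0.9, 0.8, 0.2, 0.1, 0.6]))
```

Other encodings, such as the maximum propensity or the length of the longest predicted binding segment, would emphasize different aspects of the residue-level profile.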
9.
Nucleic Acids Res ; 50(W1): W434-W447, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35524557

ABSTRACT

The rapid accumulation of molecular data motivates development of innovative approaches to computationally characterize sequences, structures and functions of biological and chemical molecules in an efficient, accessible and accurate manner. Notwithstanding several computational tools that characterize protein or nucleic acids data, there are no one-stop computational toolkits that comprehensively characterize a wide range of biomolecules. We address this vital need by developing a holistic platform that generates features from sequence and structural data for a diverse collection of molecule types. Our freely available and easy-to-use iFeatureOmega platform generates, analyzes and visualizes 189 representations for biological sequences, structures and ligands. To the best of our knowledge, iFeatureOmega provides the largest scope when directly compared to the current solutions, in terms of the number of feature extraction and analysis approaches and coverage of different molecules. We release three versions of iFeatureOmega including a webserver, command line interface and graphical interface to satisfy needs of experienced bioinformaticians and less computer-savvy biologists and biochemists. With the assistance of iFeatureOmega, users can encode their molecular data into representations that facilitate construction of predictive models and analytical studies. We highlight benefits of iFeatureOmega based on three research applications, demonstrating how it can be used to accelerate and streamline research in bioinformatics, computational biology, and cheminformatics areas. The iFeatureOmega webserver is freely available at http://ifeatureomega.erc.monash.edu and the standalone versions can be downloaded from https://github.com/Superzchen/iFeatureOmega-GUI/ and https://github.com/Superzchen/iFeatureOmega-CLI/.


Subject(s)
Computational Biology , Ligands , Software , Proteins
10.
Brief Bioinform ; 22(6), 2021 11 05.
Article in English | MEDLINE | ID: mdl-34415020

ABSTRACT

Efforts to elucidate protein-DNA interactions at the molecular level rely in part on accurate predictions of DNA-binding residues in protein sequences. While there are over a dozen computational predictors of the DNA-binding residues, they are DNA-type agnostic and significantly cross-predict residues that interact with other ligands as DNA binding. We leverage a custom-designed machine learning architecture to introduce DNAgenie, a first-of-its-kind predictor of residues that interact with A-DNA, B-DNA and single-stranded DNA. DNAgenie uses a comprehensive physicochemical profile extracted from an input protein sequence and implements a two-step refinement process to provide accurate predictions and to minimize the cross-predictions. Comparative tests on an independent test dataset demonstrate that DNAgenie outperforms the current methods that we adapt to predict residue-level interactions with the three DNA types. Further analysis finds that the use of the second (refinement) step leads to a substantial reduction in the cross-predictions. Empirical tests show that DNAgenie's outputs that are converted to coarse-grained protein-level predictions compare favorably against recent tools that predict which DNA-binding proteins interact with double-stranded versus single-stranded DNAs. Moreover, predictions from the sequences of the whole human proteome reveal that the results produced by DNAgenie substantially overlap with the known DNA-binding proteins while also including promising leads for several hundred previously unknown putative DNA binders. These results suggest that DNAgenie is a valuable tool for the sequence-based characterization of protein functions. The DNAgenie webserver is available at http://biomine.cs.vcu.edu/servers/DNAgenie/.


Subject(s)
Base Sequence , Binding Sites , Computational Biology/methods , DNA-Binding Proteins/metabolism , DNA/chemistry , Software , Amino Acid Sequence , DNA/genetics , DNA-Binding Proteins/chemistry , Databases, Genetic , Machine Learning , Models, Molecular , Protein Binding , Reproducibility of Results , Structure-Activity Relationship , Web Browser
11.
Brief Bioinform ; 22(3), 2021 05 20.
Article in English | MEDLINE | ID: mdl-32459334

ABSTRACT

In recent years, high-throughput experimental techniques have significantly enhanced the accuracy and coverage of protein-protein interaction identification, including human-pathogen protein-protein interactions (HP-PPIs). Despite this progress, experimental methods are, in general, expensive in terms of both time and labour costs, especially considering the enormous number of potential protein-interacting partners. Developing computational methods to predict interactions between humans and bacterial pathogens has thus become critical and meaningful, both in facilitating the detection of interactions and in mining incomplete interaction maps. In this paper, we present a systematic evaluation of machine learning-based computational methods for human-bacterium protein-protein interactions (HB-PPIs). We first reviewed a vast number of publicly available databases of HP-PPIs and then critically evaluated the availability of these databases. Benefitting from the well-structured nature of these databases, we subsequently preprocessed the data and identified six bacterial pathogens that could be used to study bacterium subjects in which a human was the host. Additionally, we thoroughly reviewed the literature on 'host-pathogen interactions' and summarized the existing models, which we used to jointly study the impact of different feature representation algorithms and to evaluate the performance of existing machine learning computational models. Owing to the abundance of sequence information and the limited scale of other protein-related information, we adopted the primary protocol from the literature and dedicated our analysis to a comprehensive assessment of sequence information and machine learning models. A systematic evaluation of machine learning models and a wide range of feature representation algorithms based on sequence information is presented as a comparative survey of the prediction performance for HB-PPIs.


Subject(s)
Host-Pathogen Interactions , Machine Learning , Protein Interaction Mapping/methods , Algorithms , Computational Biology/methods , Humans
12.
Methods ; 204: 132-141, 2022 08.
Article in English | MEDLINE | ID: mdl-35367597

ABSTRACT

With over 40 years of research, researchers in the intrinsic disorder prediction field developed over 100 computational predictors. This review offers a holistic perspective of this field by highlighting accurate and popular disorder predictors and introducing a wide range of practical resources that support collection, interpretation and application of disorder predictions. These resources include meta webservers that expedite collection of multiple disorder predictions, large databases of pre-computed disorder predictions that ease collection of predictions particularly for large datasets of proteins, and modern quality assessment tools. The latter methods facilitate identification of accurate predictions in a specific protein sequence, reducing uncertainty associated with the use of the putative disorder. Altogether, we review eleven predictors, four meta webservers, three databases and two quality assessment tools, all of which are conveniently available online. We also offer a perspective on future developments of the disorder prediction and the quality assessment tools. The availability of this comprehensive toolbox of useful resources should stimulate further growth in the application of the disorder predictions across many areas including rational drug design, systems medicine, structural bioinformatics and structural genomics.


Subject(s)
Intrinsically Disordered Proteins , Amino Acid Sequence , Computational Biology , Databases, Protein , Drug Design , Intrinsically Disordered Proteins/chemistry
13.
Nucleic Acids Res ; 49(10): e60, 2021 06 04.
Article in English | MEDLINE | ID: mdl-33660783

ABSTRACT

Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. Rapid accumulation of sequences requires equally pervasive development of new predictive models, which depends on the availability of effective tools that support these efforts. We introduce iLearnPlus, the first machine-learning platform with graphical- and web-based interfaces for the construction of machine-learning pipelines for analysis and predictions using nucleic acid and protein sequences. iLearnPlus provides a comprehensive set of algorithms and automates sequence-based feature extraction and analysis, construction and deployment of models, assessment of predictive performance, statistical analysis, and data visualization; all without programming. iLearnPlus includes a wide range of feature sets which encode information from the input sequences and over twenty machine-learning algorithms that cover several deep-learning approaches, outnumbering the current solutions by a wide margin. Our solution caters to experienced bioinformaticians, given the broad range of options, and biologists with no programming background, given the point-and-click interface and easy-to-follow design process. We showcase iLearnPlus with two case studies concerning prediction of long noncoding RNAs (lncRNAs) from RNA transcripts and prediction of crotonylation sites in protein chains. iLearnPlus is an open-source platform available at https://github.com/Superzchen/iLearnPlus/ with the webserver at http://ilearnplus.erc.monash.edu/.


Subject(s)
Computational Biology/methods , Machine Learning , Sequence Analysis/methods , Software , Amino Acid Sequence , Animals , Base Sequence , Humans
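
The abstract above refers to feature sets that encode information from input sequences. A widely used example of such an encoding is amino acid composition, the relative frequency of each of the 20 standard residues. The sketch below is a standalone illustration of that idea; it does not use or reproduce the iLearnPlus code, and the function name is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Encode a protein sequence as a 20-dimensional frequency vector.

    Characters outside the 20 standard residues are ignored.
    """
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [count / total for count in counts]


# Toy sequence: half alanine, half cysteine.
features = amino_acid_composition("ACCA")
print(features[:2])
```

Fixed-length vectors like this one let sequences of different lengths be fed to the same machine-learning model.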
14.
Nucleic Acids Res ; 49(D1): D298-D308, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33119734

ABSTRACT

We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNA. Users can search DescribePROT by amino acid sequence, UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.


Subject(s)
Amino Acids/chemistry , Databases, Protein , Genome , Proteins/genetics , Proteome/genetics , Software , Amino Acid Sequence , Amino Acids/metabolism , Animals , Archaea/genetics , Archaea/metabolism , Bacteria/genetics , Bacteria/metabolism , Binding Sites , Conserved Sequence , Fungi/genetics , Fungi/metabolism , Humans , Internet , Plants/genetics , Plants/metabolism , Prokaryotic Cells/metabolism , Protein Binding , Protein Structure, Secondary , Proteins/chemistry , Proteins/classification , Proteins/metabolism , Proteome/chemistry , Proteome/metabolism , Sequence Analysis, Protein , Viruses/genetics , Viruses/metabolism
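
The totals quoted in the abstract above can be cross-checked with simple arithmetic: 7.8 billion predictions over close to 600 million amino acids gives about 13 predictions per residue, which matches the 13 descriptors. The snippet below performs that check; because the abstract gives rounded figures, the results are approximate.

```python
predictions = 7.8e9   # "7.8 billion predictions"
amino_acids = 600e6   # "close to 600 million amino acids"
proteins = 1.4e6      # "1.4 million proteins"

# Predictions per residue; should be close to the 13 descriptors.
per_residue = predictions / amino_acids
# Average protein length implied by the rounded totals.
mean_length = amino_acids / proteins

print(round(per_residue, 1), round(mean_length))
```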
15.
Brief Bioinform ; 21(5): 1509-1522, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31616935

ABSTRACT

Experimental annotations of intrinsic disorder are available for 0.1% of the 147 000 000 currently sequenced proteins. Over 60 sequence-based disorder predictors were developed to help bridge this gap. Current benchmarks of these methods assess predictive performance on datasets of proteins; however, predictions are often interpreted for individual proteins. We demonstrate that the protein-level predictive performance varies substantially from the dataset-level benchmarks. Thus, we perform a first-of-its-kind protein-level assessment for 13 popular disorder predictors using 6200 disorder-annotated proteins. We show that the protein-level distributions are substantially skewed toward high predictive quality while having long tails of poor predictions. Consequently, between 57% and 75% of proteins secure higher predictive performance than the currently used dataset-level assessment suggests, but as many as 30% of proteins that are located in the long tails suffer low predictive performance. These proteins typically have relatively high amounts of disorder, in contrast to the mostly structured proteins that are predicted accurately by all 13 methods. Interestingly, each predictor provides the most accurate results for some number of proteins, while the method that performs best at the dataset level is in fact the best for only about 30% of proteins. Moreover, the majority of proteins are predicted more accurately than the dataset-level performance of the most accurate tool by at least four disorder predictors. While these results suggest that disorder predictors outperform their current benchmark performance for the majority of proteins and that they complement each other, novel tools that accurately identify the hard-to-predict proteins and that make accurate predictions for these proteins are needed.


Subject(s)
Intrinsically Disordered Proteins/chemistry , Algorithms , Computational Biology/methods , Crystallography, X-Ray , Databases, Protein , Datasets as Topic , Nuclear Magnetic Resonance, Biomolecular , Protein Conformation , Sequence Analysis, Protein/methods
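
The distinction between dataset-level and protein-level assessment in the abstract above can be illustrated with a toy calculation. The sketch below pools all residues to get one dataset-level score and, separately, scores each protein on its own. Plain accuracy is used as a stand-in metric for simplicity; the abstract does not name the metrics used in the study, and the two example proteins are invented.

```python
def accuracy(truth, predicted):
    """Fraction of residues whose predicted state matches the annotation."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def dataset_and_protein_level(proteins):
    """proteins: list of (truth, predicted) pairs of equal-length 0/1 lists.

    Returns (dataset-level accuracy over pooled residues,
             list of per-protein accuracies).
    """
    pooled_truth = [t for truth, _ in proteins for t in truth]
    pooled_pred = [p for _, pred in proteins for p in pred]
    dataset_level = accuracy(pooled_truth, pooled_pred)
    protein_level = [accuracy(truth, pred) for truth, pred in proteins]
    return dataset_level, protein_level


# A structured protein predicted perfectly and a shorter, fully
# disordered protein predicted poorly (both hypothetical).
structured = ([0] * 8, [0] * 8)
disordered = ([1, 1, 1, 1], [1, 0, 0, 0])
overall, per_protein = dataset_and_protein_level([structured, disordered])
print(overall, per_protein)
```

The pooled score of 0.75 describes neither protein well: one is predicted perfectly and the other mostly wrong, which is the kind of gap a protein-level assessment exposes.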
16.
Bioinformatics ; 38(1): 115-124, 2021 12 22.
Article in English | MEDLINE | ID: mdl-34487138

ABSTRACT

MOTIVATION: Intrinsically disordered protein regions interact with proteins, nucleic acids and lipids. Regions that bind lipids are implicated in a wide spectrum of cellular functions and several human diseases. Motivated by the growing amount of experimental data for these interactions and the lack of tools that can predict them from the protein sequence, we develop DisoLipPred, the first predictor of the disordered lipid-binding residues (DLBRs). RESULTS: DisoLipPred relies on a deep bidirectional recurrent network that implements three innovative features: transfer learning, a bypass module that sidesteps predictions for putative structured residues, and expanded inputs that cover physicochemical properties associated with the protein-lipid interactions. Ablation analysis shows that these features drive predictive quality of DisoLipPred. Tests on an independent test dataset and the yeast proteome reveal that DisoLipPred generates accurate results and that none of the related existing tools can be used to indirectly identify DLBRs. We also show that DisoLipPred's predictions complement the results generated by predictors of the transmembrane regions. Altogether, we conclude that DisoLipPred provides high-quality predictions of DLBRs that complement the currently available methods. AVAILABILITY AND IMPLEMENTATION: DisoLipPred's webserver is available at http://biomine.cs.vcu.edu/servers/DisoLipPred/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Intrinsically Disordered Proteins , Humans , Computational Biology/methods , Amino Acid Sequence , Intrinsically Disordered Proteins/chemistry , Machine Learning , Lipids
17.
Bioinformatics ; 37(23): 4366-4374, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34247234

ABSTRACT

MOTIVATION: X-ray crystallography was used to produce nearly 90% of protein structures. These efforts were supported by numerous sequence-based tools that accurately predict crystallizable proteins. However, protein structures vary widely in their quality, typically measured with resolution and R-free. This impacts the ability to use these structures for some applications including rational drug design and molecular docking and motivates development of methods that accurately predict structure quality from sequence. RESULTS: We introduce XRRpred, the first predictor of the resolution and R-free values from protein sequences. XRRpred relies on original sequence profiles, hand-crafted features, empirically selected and parametrized regressors and modern resampling techniques. Using an independent test dataset, we show that XRRpred provides accurate predictions of resolution and R-free. We demonstrate that XRRpred's predictions correctly model the relationship between the resolution and R-free and reproduce structure quality relations between structural classes of proteins. We also show that XRRpred significantly outperforms indirect alternative ways to predict the structure quality that include predictors of crystallization propensity and an alignment-based approach. XRRpred is available as a convenient webserver that allows batch predictions and offers informative visualization of the results. AVAILABILITY AND IMPLEMENTATION: http://biomine.cs.vcu.edu/servers/XRRPred/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteins , Molecular Docking Simulation , Proteins/chemistry , Amino Acid Sequence , Crystallography, X-Ray , Crystallization
18.
Cell Mol Life Sci ; 78(5): 2371-2385, 2021 Mar.
Article in English | MEDLINE | ID: mdl-32997198

ABSTRACT

Intrinsic disorder can be found in all proteomes of all kingdoms of life and in viruses, being particularly prevalent in the eukaryotes. We conduct a comprehensive analysis of the intrinsic disorder in the human proteins while mapping them into 24 compartments of the human cell. In agreement with previous studies, we show that human proteins are significantly enriched in disorder relative to a generic protein set that represents the protein universe. In fact, the fraction of proteins with long disordered regions and the average protein-level disorder content in the human proteome are about 3 times higher than in the protein universe. Furthermore, levels of intrinsic disorder in the majority of human subcellular compartments significantly exceed the average disorder content in the protein universe. Relative to the overall amount of disorder in the human proteome, proteins localized in the nucleus and cytoskeleton have significantly increased amounts of disorder, measured by both high disorder content and the presence of multiple long intrinsically disordered regions. We empirically demonstrate that, on average, human proteins are assigned to 2.3 subcellular compartments, with proteins localized to few subcellular compartments being more disordered than the proteins that are localized to many compartments. Functionally, the disordered proteins localized in the most disorder-enriched subcellular compartments are primarily responsible for interactions with nucleic acids and protein partners. This is the first time that disorder has been comprehensively mapped into the human cell. Our observations add a missing piece to the puzzle of functional disorder and its organization inside the cell.
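
The two protein-level measures named in the abstract above, disorder content and long disordered regions, can be illustrated with a short sketch. It assumes that per-residue disorder is given as a binary string ('1' for disordered, '0' for ordered) and that a region counts as long at 30 or more consecutive residues. The 30-residue cutoff is a common convention but is an assumption here, since the abstract does not state the threshold, and the function names are hypothetical.

```python
from itertools import groupby


def disorder_content(labels):
    """Fraction of residues marked '1' (disordered) in a binary label string."""
    if not labels:
        return 0.0
    return labels.count("1") / len(labels)


def long_disordered_regions(labels, min_length=30):
    """Return 0-based, half-open (start, end) spans of disordered runs.

    Only runs of at least min_length consecutive '1' labels are reported.
    min_length=30 is an assumed cutoff, not a value from the abstract.
    """
    regions = []
    position = 0
    for state, run in groupby(labels):
        length = len(list(run))
        if state == "1" and length >= min_length:
            regions.append((position, position + length))
        position += length
    return regions


if __name__ == "__main__":
    # Toy annotation: one 35-residue disordered run and one 10-residue run.
    toy = "0" * 10 + "1" * 35 + "0" * 5 + "1" * 10
    print(disorder_content(toy))
    print(long_disordered_regions(toy))
```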


Subject(s)
Cell Compartmentation , Eukaryotic Cells/metabolism , Intracellular Space/metabolism , Intrinsically Disordered Proteins/metabolism , Proteome/metabolism , Cell Nucleus/metabolism , Cytoskeleton/metabolism , Databases, Protein/statistics & numerical data , Humans , Intrinsically Disordered Proteins/classification , Models, Biological , Proteome/classification
19.
Proteins ; 89(10): 1289-1299, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34008220

ABSTRACT

A novel virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19) worldwide, appeared in 2019. Detailed scientific knowledge of the members of the Coronaviridae family, including the Middle East respiratory syndrome coronavirus (MERS-CoV), is currently lacking. Structural studies of the MERS-CoV proteins in the current literature are extremely limited. We present here a detailed characterization of the structural properties of the MERS-CoV macro domain in aqueous solution. Additionally, we studied the impacts of chosen force field parameters and parallel tempering simulation techniques on the predicted structural properties of the MERS-CoV macro domain in aqueous solution. For this purpose, we conducted extensive Hamiltonian-replica exchange molecular dynamics simulations and temperature-replica exchange molecular dynamics simulations using the CHARMM36m and AMBER99SB parameters for the macro domain. This study shows that the predicted secondary structure properties, including their propensities, depend on the chosen simulation technique and force field parameters. We perform structural clustering based on the radius of gyration and end-to-end distance of the MERS-CoV macro domain in aqueous solution. We also report and analyze the residue-level intrinsic disorder features, flexibility and secondary structure. Furthermore, we study the propensities of this macro domain for protein-protein interactions and for RNA and DNA binding. Overall, the results are in agreement with available nuclear magnetic resonance spectroscopy findings and present more detailed insights into the structural properties of the MERS-CoV macro domain in aqueous solution. All in all, we present the structural properties of the aqueous MERS-CoV macro domain using different parallel tempering simulation techniques, force field parameters and bioinformatics tools.
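
The two quantities used for structural clustering in the abstract above, the radius of gyration and the end-to-end distance, have standard definitions that can be sketched for a single conformation. The sketch assumes that coordinates are given as a list of (x, y, z) tuples ordered along the chain and that all masses are equal unless supplied. It is not the authors' analysis pipeline, and the function names are hypothetical.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted root-mean-square distance of atoms from their center of mass."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal masses by default
    total_mass = sum(masses)
    center = [
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    weighted_sq = sum(
        m * sum((c[axis] - center[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


def end_to_end_distance(coords):
    """Euclidean distance between the first and last points of the chain."""
    return math.dist(coords[0], coords[-1])


if __name__ == "__main__":
    # Toy three-point chain along the x axis (arbitrary length units).
    chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(radius_of_gyration(chain), end_to_end_distance(chain))
```

Computing both values for every frame of a trajectory yields the two-dimensional points on which a clustering of conformations could then be run.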


Subject(s)
Middle East Respiratory Syndrome Coronavirus/chemistry , Middle East Respiratory Syndrome Coronavirus/metabolism , Molecular Dynamics Simulation , Water/chemistry , Water/metabolism , Humans , Protein Domains/physiology , Protein Structure, Secondary , Solutions
20.
Brief Bioinform ; 20(6): 2066-2087, 2019 Nov 27.
Article in English | MEDLINE | ID: mdl-30102367

ABSTRACT

Drug-protein interactions (DPIs) underlie the desired therapeutic actions and the adverse side effects of a significant majority of drugs. Computational prediction of DPIs facilitates research in drug discovery, characterization and repurposing. Similarity-based methods that do not require knowledge of protein structures are particularly suitable for druggable genome-wide predictions of DPIs. We review 35 high-impact similarity-based predictors that were published in the past decade. We group them based on the three types of similarities, and the combinations of these types, that they use. We discuss and compare key aspects of these methods, including source databases, internal databases and their predictive models. Using our novel benchmark database, we perform a comparative empirical analysis of the predictive performance of seven types of representative predictors that utilize each type of similarity individually and all possible combinations of similarities. We assess predictive quality at the database-wide DPI level and we are the first to also include evaluation over individual drugs. Our comprehensive analysis shows that predictors that use more similarity types outperform methods that employ fewer similarities, and that the model combining all three types of similarities achieves an area under the receiver operating characteristic curve of 0.93. We offer a comprehensive analysis of the sensitivity of predictive performance to intrinsic and extrinsic characteristics of the considered predictors. We find that predictive performance is sensitive to low levels of similarity between sequences of the drug targets and to several extrinsic properties of the input drug structures, drug profiles and drug targets. The benchmark database and a webserver for the seven predictors are freely available at http://biomine.cs.vcu.edu/servers/CONNECTOR/.
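
The performance measure quoted in the abstract above, the area under the receiver operating characteristic curve, equals the probability that a randomly chosen true interaction receives a higher score than a randomly chosen non-interaction, with ties counted as one half. The sketch below computes it that way on made-up data. The scores, labels and function name are illustrative assumptions and are unrelated to the benchmark database.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise score comparisons.

    scores -- predicted propensities, higher meaning more likely to interact
    labels -- 1 for a true interaction, 0 otherwise
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy drug-protein pairs with hypothetical scores.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise loop is quadratic in the number of examples. It is written this way for clarity, and a rank-based formulation would be preferable for database-wide evaluation.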


Subject(s)
Pharmaceutical Preparations/metabolism , Proteins/metabolism , Proteome , Computational Biology/methods , Databases, Protein , Humans , Protein Binding