Search | Secretaria de Estado da Saúde

1.

DNAgenie: accurate prediction of DNA-type-specific binding residues in protein sequences.

Zhang, Jian; Ghadermarzi, Sina; Katuwawala, Akila; Kurgan, Lukasz.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-34415020

ABSTRACT

Efforts to elucidate protein-DNA interactions at the molecular level rely in part on accurate predictions of DNA-binding residues in protein sequences. While there are over a dozen computational predictors of the DNA-binding residues, they are DNA-type agnostic and significantly cross-predict residues that interact with other ligands as DNA binding. We leverage a custom-designed machine learning architecture to introduce DNAgenie, a first-of-its-kind predictor of residues that interact with A-DNA, B-DNA and single-stranded DNA. DNAgenie uses a comprehensive physicochemical profile extracted from an input protein sequence and implements a two-step refinement process to provide accurate predictions and to minimize the cross-predictions. Comparative tests on an independent test dataset demonstrate that DNAgenie outperforms the current methods that we adapt to predict residue-level interactions with the three DNA types. Further analysis finds that the use of the second (refinement) step leads to a substantial reduction in the cross-predictions. Empirical tests show that DNAgenie's outputs that are converted to coarse-grained protein-level predictions compare favorably against recent tools that predict which DNA-binding proteins interact with double-stranded versus single-stranded DNAs. Moreover, predictions from the sequences of the whole human proteome reveal that the results produced by DNAgenie substantially overlap with the known DNA-binding proteins while also including promising leads for several hundred previously unknown putative DNA binders. These results suggest that DNAgenie is a valuable tool for the sequence-based characterization of protein functions. The DNAgenie webserver is available at http://biomine.cs.vcu.edu/servers/DNAgenie/.

Subjects

Base Sequence , Binding Sites , Computational Biology/methods , DNA-Binding Proteins/metabolism , DNA/chemistry , Software , Amino Acid Sequence , DNA/genetics , DNA-Binding Proteins/chemistry , Databases, Genetic , Machine Learning , Models, Molecular , Protein Binding , Reproducibility of Results , Structure-Activity Relationship , Web Browser

2.

XRRpred: accurate predictor of crystal structure quality from protein sequence.

Ghadermarzi, Sina; Krawczyk, Bartosz; Song, Jiangning; Kurgan, Lukasz.

Bioinformatics ; 37(23): 4366-4374, 2021 12 07.

Article in English | MEDLINE | ID: mdl-34247234

ABSTRACT

MOTIVATION: X-ray crystallography was used to produce nearly 90% of protein structures. These efforts were supported by numerous sequence-based tools that accurately predict crystallizable proteins. However, protein structures vary widely in their quality, typically measured with resolution and R-free. This impacts the ability to use these structures for some applications, including rational drug design and molecular docking, and motivates the development of methods that accurately predict structure quality from sequence. RESULTS: We introduce XRRpred, the first predictor of the resolution and R-free values from protein sequences. XRRpred relies on original sequence profiles, hand-crafted features, empirically selected and parametrized regressors and modern resampling techniques. Using an independent test dataset, we show that XRRpred provides accurate predictions of resolution and R-free. We demonstrate that XRRpred's predictions correctly model the relationship between the resolution and R-free and reproduce structure quality relations between structural classes of proteins. We also show that XRRpred significantly outperforms indirect alternative ways to predict the structure quality that include predictors of crystallization propensity and an alignment-based approach. XRRpred is available as a convenient webserver that allows batch predictions and offers informative visualization of the results. AVAILABILITY AND IMPLEMENTATION: http://biomine.cs.vcu.edu/servers/XRRPred/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Proteins , Molecular Docking Simulation , Proteins/chemistry , Amino Acid Sequence , Crystallography, X-Ray , Crystallization

3.

Prediction of protein-binding residues: dichotomy of sequence-based methods developed using structured complexes versus disordered proteins.

Zhang, Jian; Ghadermarzi, Sina; Kurgan, Lukasz.

Bioinformatics ; 36(18): 4729-4738, 2020 09 15.

Article in English | MEDLINE | ID: mdl-32860044

ABSTRACT

MOTIVATION: There are over 30 sequence-based predictors of the protein-binding residues (PBRs). They use either structure-annotated or disorder-annotated training datasets, potentially creating a dichotomy where the structure-/disorder-specific models may not be able to cross-over to accurately predict the other type. Moreover, the structure-trained predictors were shown to substantially cross-predict PBRs among residues that interact with non-protein partners (nucleic acids and small ligands). We address these issues by performing a first-of-its-kind comparative study of a representative collection of disorder- and structure-trained predictors using a comprehensive benchmark set with the structure- and disorder-derived annotations of PBRs (to analyze the cross-over) and the protein-, nucleic acid- and small ligand-binding proteins (to study the cross-predictions). RESULTS: Three predictors provide accurate results: SCRIBER, ANCHOR and disoRDPbind. Some of the structure-trained methods make accurate predictions on the structure-annotated proteins. Similarly, the disorder-trained predictors predict well on the disorder-annotated proteins. However, the considered predictors generally fail to cross-over, with the exception of SCRIBER. Our study also reveals that virtually all methods substantially cross-predict PBRs, except for SCRIBER for the structure-annotated proteins and disoRDPbind for the disorder-annotated proteins. We formulate a novel hybrid predictor, hybridPBRpred, that combines results produced by disoRDPbind and SCRIBER to accurately predict disorder- and structure-annotated PBRs. HybridPBRpred generates accurate results that cross-over structure- and disorder-annotated proteins and produces a relatively low number of cross-predictions, offering an accurate alternative to predict PBRs. AVAILABILITY AND IMPLEMENTATION: HybridPBRpred webserver, benchmark dataset and supplementary information are available at http://biomine.cs.vcu.edu/servers/hybridPBRpred/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Computational Biology , Proteins , Benchmarking , Databases, Protein , Protein Binding

4.

Comparative evaluation of AlphaFold2 and disorder predictors for prediction of intrinsic disorder, disorder content and fully disordered proteins.

Zhao, Bi; Ghadermarzi, Sina; Kurgan, Lukasz.

Comput Struct Biotechnol J ; 21: 3248-3258, 2023.

Article in English | MEDLINE | ID: mdl-38213902

ABSTRACT

We expand studies of AlphaFold2 (AF2) in the context of intrinsic disorder prediction by comparing it against a broad selection of 20 accurate, popular and recently released disorder predictors. We use a 25% larger benchmark dataset with 646 proteins and cover protein-level predictions of disorder content and fully disordered proteins. AF2-based disorder predictions secure a relatively high Area Under receiver operating characteristic Curve (AUC) of 0.77 and are statistically outperformed by several modern disorder predictors that secure AUCs around 0.8 with a median runtime of about 20 s compared to 1200 s for AF2. Moreover, AF2 provides modestly accurate predictions of fully disordered proteins (F1 = 0.59 vs. 0.91 for the best disorder predictor) and disorder content (mean absolute error of 0.21 vs. 0.15). AF2 also generates statistically more accurate disorder predictions for about 20% of proteins that have relatively short sequences and a few disordered regions that tend to be located at the sequence termini, and that lack disordered protein-binding regions. Interestingly, AF2 and the most accurate disorder predictors rely on deep neural networks, suggesting that these models are useful for protein structure and disorder predictions.
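
The protein-level measures named in this abstract (disorder content, its mean absolute error, and F1 for fully disordered proteins) can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the toy labels and the 0.95 cutoff used to call a protein fully disordered are assumptions made for illustration only.

```python
from typing import Sequence


def disorder_content(labels: Sequence[int]) -> float:
    """Fraction of residues labelled as disordered (1) in one protein."""
    if not labels:
        raise ValueError("protein has no residues")
    return sum(labels) / len(labels)


def mean_absolute_error(predicted: Sequence[float], native: Sequence[float]) -> float:
    """Mean absolute difference between predicted and native values."""
    if len(predicted) != len(native) or not native:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - n) for p, n in zip(predicted, native)) / len(native)


def f1_score(predicted: Sequence[bool], native: Sequence[bool]) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no positives."""
    tp = sum(1 for p, n in zip(predicted, native) if p and n)
    fp = sum(1 for p, n in zip(predicted, native) if p and not n)
    fn = sum(1 for p, n in zip(predicted, native) if not p and n)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Assumption: a protein counts as fully disordered when its content is >= 0.95.
FULL_DISORDER_CUTOFF = 0.95

# Toy residue-level labels (1 = disordered, 0 = ordered), one list per protein.
native_labels = [[1, 1, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
predicted_labels = [[1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 1, 1]]

native_content = [disorder_content(p) for p in native_labels]
predicted_content = [disorder_content(p) for p in predicted_labels]
content_mae = mean_absolute_error(predicted_content, native_content)

native_full = [c >= FULL_DISORDER_CUTOFF for c in native_content]
predicted_full = [c >= FULL_DISORDER_CUTOFF for c in predicted_content]
full_f1 = f1_score(predicted_full, native_full)

print(f"MAE of disorder content: {content_mae:.2f}")
print(f"F1 for fully disordered proteins: {full_f1:.2f}")
```

Both measures are computed per protein rather than per residue, which is why they can disagree with the residue-level AUC reported in the same abstract.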

5.

Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins.

Kurgan, Lukasz; Hu, Gang; Wang, Kui; Ghadermarzi, Sina; Zhao, Bi; Malhis, Nawar; Erdos, Gábor; Gsponer, Jörg; Uversky, Vladimir N; Dosztányi, Zsuzsanna.

Nat Protoc ; 18(11): 3157-3172, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37740110

ABSTRACT

Intrinsic disorder is instrumental for a wide range of protein functions, and its analysis, using computational predictions from primary structures, complements secondary and tertiary structure-based approaches. In this Tutorial, we provide an overview and comparison of 23 publicly available computational tools with complementary parameters useful for intrinsic disorder prediction, partly relying on results from the Critical Assessment of protein Intrinsic Disorder prediction experiment. We consider factors such as accuracy, runtime, availability and the need for functional insights. The selected tools are available as web servers and downloadable programs, offer state-of-the-art predictions and can be used in a high-throughput manner. We provide examples and instructions for the selected tools to illustrate practical aspects related to the submission, collection and interpretation of predictions, as well as the timing and their limitations. We highlight two predictors for intrinsically disordered proteins, flDPnn as accurate and fast and IUPred as very fast and moderately accurate, while suggesting ANCHOR2 and MoRFchibi as two of the best-performing predictors for intrinsically disordered region binding. We link these tools to additional resources, including databases of predictions and web servers that integrate multiple predictive methods. Altogether, this Tutorial provides a hands-on guide to comparatively evaluating multiple predictors, submitting and collecting their own predictions, and reading and interpreting results. It is suitable for experimentalists and computational biologists interested in accurately and conveniently identifying intrinsic disorder, facilitating the functional characterization of the rapidly growing collections of protein sequences.

Subjects

Computational Biology , Intrinsically Disordered Proteins , Computational Biology/methods , Databases, Protein , Intrinsically Disordered Proteins/chemistry , Amino Acid Sequence

6.

QUARTERplus: Accurate disorder predictions integrated with interpretable residue-level quality assessment scores.

Katuwawala, Akila; Ghadermarzi, Sina; Hu, Gang; Wu, Zhonghua; Kurgan, Lukasz.

Comput Struct Biotechnol J ; 19: 2597-2606, 2021.

Article in English | MEDLINE | ID: mdl-34025946

ABSTRACT

A recent advance in the disorder prediction field is the development of the quality assessment (QA) scores. QA scores complement the propensities produced by the disorder predictors by identifying regions where these predictions are more likely to be correct. We develop, empirically test and release a new QA tool, QUARTERplus, that addresses several key drawbacks of the current QA method, QUARTER. QUARTERplus is the first solution that utilizes QA scores and the associated input disorder predictions to produce very accurate disorder predictions with the help of a modern deep learning meta-model. The deep neural network utilizes the QA scores to identify and fix the regions where the original/input disorder predictions are poor. More importantly, the accurate QUARTERplus predictions are accompanied by easy-to-interpret residue-level QA scores that reliably quantify their residue-level predictive quality. We provide these interpretable QA scores for QUARTERplus and 10 other popular disorder predictors. Empirical tests on a large and independent (low similarity) test dataset show that QUARTERplus predictions secure AUC = 0.93 and are statistically more accurate than the results of twelve state-of-the-art disorder predictors. We also demonstrate that the new QA scores produced by QUARTERplus are highly correlated with the actual predictive quality and that they can be effectively used to identify regions of correct disorder predictions. This feature empowers the users to easily identify which parts of the predictions generated by the modern disorder predictors are more trustworthy. QUARTERplus is available as a convenient webserver at http://biomine.cs.vcu.edu/servers/QUARTERplus/.

7.

flDPnn: Accurate intrinsic disorder prediction with putative propensities of disorder functions.

Hu, Gang; Katuwawala, Akila; Wang, Kui; Wu, Zhonghua; Ghadermarzi, Sina; Gao, Jianzhao; Kurgan, Lukasz.

Nat Commun ; 12(1): 4438, 2021 07 21.

Article in English | MEDLINE | ID: mdl-34290238

ABSTRACT

Identification of intrinsic disorder in proteins relies in large part on computational predictors, which demands that their accuracy should be high. Since intrinsic disorder carries out a broad range of cellular functions, it is desirable to couple the disorder and disorder function predictions. We report a computational tool, flDPnn, that provides accurate, fast and comprehensive disorder and disorder function predictions from protein sequences. The recent Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment and results on other test datasets demonstrate that flDPnn offers accurate predictions of disorder, fully disordered proteins and four common disorder functions. These predictions are substantially better than the results of the existing disorder predictors and methods that predict functions of disorder. Ablation tests reveal that the high predictive performance stems from innovative ways used in flDPnn to derive sequence profiles and encode inputs. flDPnn's webserver is available at http://biomine.cs.vcu.edu/servers/flDPnn/.

Subjects

Computational Biology/methods , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/metabolism , Machine Learning , Protein Binding , Sequence Analysis, Protein

8.

Disordered Function Conjunction: On the in-silico function annotation of intrinsically disordered regions.

Ghadermarzi, Sina; Katuwawala, Akila; Oldfield, Christopher J; Barik, Amita; Kurgan, Lukasz.

Pac Symp Biocomput ; 25: 171-182, 2020.

Article in English | MEDLINE | ID: mdl-31797595

ABSTRACT

Intrinsically disordered regions (IDRs) lack a stable structure, yet perform biological functions. The functions of IDRs include mediating interactions with other molecules, including proteins, DNA, or RNA, and entropic functions, including domain linkers. Computational predictors provide residue-level indications of function for disordered proteins, which contrasts with the need to functionally annotate the thousands of experimentally and computationally discovered IDRs. In this work, we investigate the feasibility of using residue-level prediction methods for region-level function predictions. For an initial examination of the multiple function region-level prediction problem, we constructed a dataset of (likely) single function IDRs in proteins that are dissimilar to the training datasets of the residue-level function predictors. We find that available residue-level prediction methods are only modestly useful in predicting multiple region-level functions. Classification is enhanced by simultaneous use of multiple residue-level function predictions and is further improved by inclusion of amino acid content extracted from the protein sequence. We conclude that multifunction prediction for IDRs is feasible and benefits from the results produced by current residue-level function predictors; however, it has to accommodate inaccuracy in functional annotations.
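
One simple way to turn residue-level function propensities into a region-level call is to average them over the region and keep every function whose average passes a cutoff. The sketch below illustrates only that idea. It is not the classifier described in the abstract, which also uses amino acid content, and the function names, the 0.5 cutoff and the toy propensities are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def region_scores(
    propensities: Dict[str, Sequence[float]],
    region: Tuple[int, int],
) -> Dict[str, float]:
    """Mean residue-level propensity per function over one region.

    `region` is (start, end) in 0-based residue indices, end exclusive.
    """
    start, end = region
    if not 0 <= start < end:
        raise ValueError("region must satisfy 0 <= start < end")
    scores: Dict[str, float] = {}
    for function, values in propensities.items():
        if end > len(values):
            raise ValueError(f"region exceeds sequence length for {function!r}")
        window = values[start:end]
        scores[function] = sum(window) / len(window)
    return scores


def region_functions(
    propensities: Dict[str, Sequence[float]],
    region: Tuple[int, int],
    cutoff: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> List[str]:
    """Functions whose mean propensity over the region reaches the cutoff."""
    scores = region_scores(propensities, region)
    return sorted(name for name, score in scores.items() if score >= cutoff)


# Toy per-residue propensities from two hypothetical residue-level predictors.
toy_propensities = {
    "protein-binding": [0.1, 0.8, 0.9, 0.7, 0.2],
    "DNA-binding": [0.1, 0.2, 0.1, 0.3, 0.1],
}
toy_region = (1, 4)  # residues at indices 1, 2 and 3

print(region_scores(toy_propensities, toy_region))
print(region_functions(toy_propensities, toy_region))
```

Because each function is thresholded independently, a region can receive zero, one or several function labels, which matches the multi-function framing of the problem.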

Subjects

Intrinsically Disordered Proteins , Amino Acid Sequence , Computational Biology , Computer Simulation , DNA , Humans , Intrinsically Disordered Proteins/genetics

9.

Computational prediction of functions of intrinsically disordered regions.

Katuwawala, Akila; Ghadermarzi, Sina; Kurgan, Lukasz.

Prog Mol Biol Transl Sci ; 166: 341-369, 2019.

Article in English | MEDLINE | ID: mdl-31521235

ABSTRACT

Intrinsically disordered regions (IDRs) are abundant in nature, particularly among Eukaryotes. While they facilitate a wide spectrum of cellular functions including signaling, molecular assembly and recognition, translation, transcription and regulation, only several hundred IDRs are annotated functionally. This annotation gap motivates the development of fast and accurate computational methods that predict IDR functions directly from protein sequences. We introduce and describe a comprehensive collection of 25 methods that provide accurate predictions of IDRs that interact with proteins and nucleic acids, that function as flexible linkers and that moonlight multiple functions. Virtually all of these predictors can be accessed online and many were developed in the last few years. They utilize a wide range of predictive architectures and take advantage of modern machine learning algorithms. Our empirical analysis shows that predictors that are available as webservers enjoy high rates of citations, attesting to their practical value and popularity. The most cited methods include DISOPRED3, ANCHOR, alpha-MoRFpred, MoRFpred, fMoRFpred and MoRFCHiBi. We present two case studies to demonstrate that predictions produced by these computational tools are relatively easy to interpret and that they deliver valuable functional clues. However, the current computational tools cover a relatively narrow range of disorder functions. Further development efforts that would cover a broader range of functions should be pursued. We demonstrate that a sufficient number of functionally annotated IDRs that are associated with several other disorder functions is already available and can be used to design and validate novel predictors.

Subjects

Computational Biology , Intrinsically Disordered Proteins/chemistry , Databases, Protein , Humans , Molecular Sequence Annotation

10.

Sequence-Derived Markers of Drug Targets and Potentially Druggable Human Proteins.

Ghadermarzi, Sina; Li, Xingyi; Li, Min; Kurgan, Lukasz.

Front Genet ; 10: 1075, 2019.

Article in English | MEDLINE | ID: mdl-31803227

ABSTRACT

Recent research shows that the majority of the druggable human proteome is yet to be annotated and explored. Accurate identification of these unexplored druggable proteins would facilitate development, screening, repurposing, and repositioning of drugs, as well as prediction of new drug-protein interactions. We contrast the current drug targets against the datasets of non-druggable and possibly druggable proteins to formulate markers that could be used to identify druggable proteins. We focus on the markers that can be extracted from protein sequences or names/identifiers to ensure that they can be applied across the entire human proteome. These markers quantify key features covered in the past works (topological features of PPIs, cellular functions, and subcellular locations) and several novel factors (intrinsic disorder, residue-level conservation, alternative splicing isoforms, domains, and sequence-derived solvent accessibility). We find that the possibly druggable proteins have a significantly higher abundance of alternative splicing isoforms, a relatively large number of domains, a higher degree of centrality in the protein-protein interaction networks, and lower numbers of conserved and surface residues, when compared with the non-druggable proteins. We show that the current drug targets and possibly druggable proteins share involvement in the catalytic and signaling functions. However, unlike the drug targets, the possibly druggable proteins participate in the metabolic and biosynthesis processes, are enriched in the intrinsic disorder, interact with proteins and nucleic acids, and are localized across the cell. To sum up, we formulate several markers that can help with finding novel druggable human proteins and provide interesting insights into the cellular functions and subcellular locations of the current drug targets and potentially druggable proteins.
