Results 1 - 20 of 620
1.
Bioinformatics ; 40(6)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38781500

ABSTRACT

MOTIVATION: Today, predicting the structures of large protein complexes solely from their sequences requires prior knowledge of the complex's stoichiometry. To address this challenge, we have enhanced the Monte Carlo Tree Search algorithms in MoLPC to enable the assembly of protein complexes while simultaneously predicting their stoichiometry. RESULTS: In MoLPC2, we have improved the predictions by allowing the sampling of alternative AlphaFold predictions. Using MoLPC2, we accurately predicted the structures of 50 out of 175 nonredundant protein complexes (TM-score ≥ 0.8) without knowing the stoichiometry. MoLPC2 provides new opportunities for predicting protein complex structures without stoichiometry information. AVAILABILITY AND IMPLEMENTATION: MoLPC2 is freely available at https://github.com/hychim/molpc2. A notebook is also available from the repository for easy use.


Subjects
Algorithms; Monte Carlo Method; Proteins; Software; Proteins/chemistry; Proteins/metabolism; Computational Biology/methods; Protein Conformation; Protein Folding; Databases, Protein
2.
Proteins ; 92(8): 975-983, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38618860

ABSTRACT

Pore-forming toxins (PFTs) are proteins that form lesions in biological membranes. A better understanding of the structure and function of these proteins will benefit a number of biotechnological applications, including the development of new pest control methods in agriculture. When searching for new pore formers, existing sequence homology-based methods fail to discover truly novel proteins with low sequence identity to known proteins. Search methodologies based on protein structures would help us move beyond this limitation. Because very few PFT structures are known, it is challenging to identify new proteins with similar structures using computational approaches such as deep learning. In this article, we therefore propose a sample-efficient graphical model, in which a protein structure graph is first constructed according to consensus secondary structures. A semi-Markov conditional random field model is then developed to perform protein sequence segmentation. We demonstrate that our method is able to distinguish structurally similar proteins even in the absence of sequence similarity (pairwise sequence identity < 0.4), a feat not achievable by traditional approaches such as HMMs. To extract proteins of interest from a genome-wide protein database for further study, we also develop an efficient framework for UniRef50, which contains 43 million proteins.


Subjects
Databases, Protein; Pore Forming Cytotoxic Proteins; Pore Forming Cytotoxic Proteins/chemistry; Pore Forming Cytotoxic Proteins/metabolism; Computational Biology/methods; Models, Molecular; Algorithms; Markov Chains; Amino Acid Sequence; Protein Structure, Secondary; Deep Learning
3.
Biomolecules ; 14(3)2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38540707

ABSTRACT

Disordered linkers (DLs) are intrinsically disordered regions that facilitate movement between adjacent functional regions/domains, contributing to many key cellular functions. The recently completed second Critical Assessment of protein Intrinsic Disorder prediction (CAID2) experiment evaluated DL predictions under a rather narrow scenario: predicting DLs in 40 proteins that are already known to have them. We expand this evaluation by using a much larger set of nearly 350 test proteins from CAID2 and by investigating three distinct scenarios: (1) prediction of residues in DLs vs. in non-DL regions (the typical use of DL predictors); (2) prediction of residues in DLs vs. other disordered residues (to evaluate whether predictors can differentiate residues in DLs from other types of intrinsically disordered residues); and (3) prediction of proteins harboring DLs. We find that several methods provide relatively accurate predictions of DLs in the first scenario. However, only one method, APOD, accurately identifies DLs among other types of disordered residues (scenario 2) and predicts proteins harboring DLs (scenario 3). We also find that APOD's predictive performance is modest, motivating further research into the development of new and more accurate DL predictors. We note that these efforts will benefit from a growing amount of training data and the availability of sophisticated deep network models, and we emphasize that future methods should provide accurate results across all three scenarios.


Subjects
Computational Biology; Intrinsically Disordered Proteins; Computational Biology/methods; Proteins/chemistry; Intrinsically Disordered Proteins/chemistry; Databases, Protein
4.
Proteins ; 91(12): 1925-1934, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37621223

ABSTRACT

Protein intrinsic disorder (ID) is a complex and context-dependent phenomenon that covers a continuum between fully disordered states and folded states with long dynamic regions. The lack of a ground truth that fits all ID flavors and the potential for order-to-disorder transitions under specific conditions make ID prediction challenging. The CAID2 challenge aimed to evaluate the performance of different prediction methods across different benchmarks, leveraging the annotation provided by the DisProt database, which stores the coordinates of ID regions when there is experimental evidence in the literature. CAID2 showed that prediction methods perform differently across benchmarks, highlighting the need for continued development of more versatile and efficient prediction software. Depending on the application, researchers may need to balance performance against execution time when selecting a predictor. Methods based on AlphaFold2 seem to be good ID predictors, but they are better at detecting the absence of order than at detecting ID regions as defined in DisProt. The CAID2 predictors can be freely used through the CAID Prediction Portal, and CAID has been integrated into OpenEBench, which will become the official platform for running future CAID challenges.


Subjects
Intrinsically Disordered Proteins; Proteins; Software; Databases, Protein
5.
J Proteome Res ; 22(10): 3123-3134, 2023 10 06.
Article in English | MEDLINE | ID: mdl-36809008

ABSTRACT

Protein database search engines are an integral component of mass spectrometry-based peptidomic analyses. Given the unique computational challenges of peptidomics, many factors must be considered when selecting a search engine, as each platform uses different algorithms to score tandem mass spectra for subsequent peptide identification. In this study, four database search engines, PEAKS, MS-GF+, OMSSA, and X! Tandem, were compared using Aplysia californica and Rattus norvegicus peptidomics data sets, and metrics such as the number of unique peptide and neuropeptide identifications and the peptide length distributions were assessed. Under the tested conditions, PEAKS yielded the highest number of peptide and neuropeptide identifications of the four search engines in both data sets. Furthermore, principal component analysis and multivariate logistic regression were employed to determine whether specific spectral features contribute to false C-terminal amidation assignments by each search engine. This analysis showed that the primary features influencing incorrect peptide assignments were the precursor and fragment ion m/z errors. Finally, a mixed-species protein database was used to evaluate search engine precision and sensitivity when searching against an enlarged search space containing human proteins.


Subjects
Neuropeptides; Search Engine; Humans; Animals; Rats; Peptides; Algorithms; Tandem Mass Spectrometry; Databases, Protein; Software
6.
Nat Struct Mol Biol ; 29(11): 1056-1067, 2022 11.
Article in English | MEDLINE | ID: mdl-36344848

ABSTRACT

Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure prediction have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site prediction; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and, when the confidence metrics are critically considered, AF2 models can be used across diverse applications as well as experimentally determined structures. In summary, we find that these advances are likely to have a transformative impact on structural biology and broader life-science research.


Subjects
Computational Biology; Furylfuramide; Computational Biology/methods; Binding Sites; Proteins/chemistry; Databases, Protein; Protein Conformation
7.
Protein Sci ; 31(11): e4465, 2022 11.
Article in English | MEDLINE | ID: mdl-36208126

ABSTRACT

Automated domain annotation is an important tool for structural informatics. These pipelines typically involve searching query sequences against hidden Markov model (HMM) profiles, yielding matches to profiles for various domains. However, domain annotation can be ambiguous or inaccurate when proteins contain domains with non-contiguous residue ranges, and especially when insertional domains are hosted within them. Here, we present DomainMapper, an algorithm that accurately assigns a unique domain structure annotation to a query sequence, including sequences with complex topologies. We validate our domain assignments using the AlphaFold database and confirm that non-contiguity is pervasive (10.74% of all domains in yeast and 4.52% in human). Using this resource, we find that certain folds have strong propensities to be non-contiguous or insertional across the Tree of Life. DomainMapper is freely available and can be run as a single command-line function.


Subjects
Algorithms; Proteins; Humans; Protein Structure, Tertiary; Proteins/chemistry; Markov Chains; Databases, Protein
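DomainMapper's own algorithm is not described in this record. As a hypothetical illustration of the two properties the abstract highlights, the Python sketch below assumes each domain is given as a list of inclusive (start, end) residue segments. It flags a domain as non-contiguous when it has more than one segment, and treats a domain as insertional when its whole residue span falls inside a gap of another domain. The function names and data layout are assumptions, not DomainMapper's interface.

```python
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # hypothetical layout: inclusive residue range (start, end)


def is_noncontiguous(segments: List[Segment]) -> bool:
    """A domain is non-contiguous if its residues span more than one segment."""
    return len(segments) > 1


def gaps(segments: List[Segment]) -> List[Segment]:
    """Inclusive residue ranges lying between consecutive segments of one domain."""
    ordered = sorted(segments)
    result = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end > 1:  # at least one residue between the segments
            result.append((prev_end + 1, next_start - 1))
    return result


def insertional_domains(domains: Dict[str, List[Segment]]) -> Dict[str, str]:
    """Map each domain lying entirely inside a gap of another domain to that host."""
    hosts = {}
    for name, segs in domains.items():
        lo = min(start for start, _ in segs)
        hi = max(end for _, end in segs)
        for host, host_segs in domains.items():
            if host == name:
                continue
            if any(g_lo <= lo and hi <= g_hi for g_lo, g_hi in gaps(host_segs)):
                hosts[name] = host
                break
    return hosts
```

For example, with `domains = {"A": [(1, 100), (181, 250)], "B": [(101, 180)]}`, domain A is non-contiguous and `insertional_domains(domains)` returns `{"B": "A"}`.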
8.
Bioinformatics ; 38(7): 2062-2063, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35104317

ABSTRACT

SUMMARY: Comparisons of protein structures are critical for developing novel protein designs, annotating protein functions and predicting protein structure. The template modeling score (TM-score) is a widely used but computationally expensive measure of protein similarity that is applicable to a wide variety of structural biology problems. We introduce TMQuery, a continuously updated database containing over eight billion precomputed TM-score values for every pair of proteins in the Protein Data Bank, allowing researchers to quickly query and download TM-scores via a web interface. AVAILABILITY AND IMPLEMENTATION: Publicly available at https://tmquery.gsk.com/.


Subjects
Proteins; Software; Protein Conformation; Proteins/chemistry; Databases, Protein
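The TM-score stored by TMQuery has a standard closed form. For a target of length L and a superposition that places aligned residue pairs at distances d_i, TM = (1/L) * sum over i of 1 / (1 + (d_i / d0)^2), with d0 = 1.24 * (L - 15)^(1/3) - 1.8 Å. Real TM-score programs search over superpositions to maximize this sum. The minimal Python sketch below only scores one given superposition, so it illustrates the formula, not how TMQuery computes its values. Flooring d0 at 0.5 Å for short chains is an assumption that follows common implementations.

```python
from typing import Sequence


def d0(target_length: int) -> float:
    """Length-dependent distance scale (Å) of the TM-score.

    Assumption: floored at 0.5 Å for short chains, where the formula would
    otherwise be tiny, negative, or undefined.
    """
    if target_length <= 15:
        return 0.5
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score_fixed(distances: Sequence[float], target_length: int) -> float:
    """TM-score of ONE fixed alignment and superposition (no optimization).

    distances: distances (Å) between aligned residue pairs after superposition.
    target_length: residues in the reference structure; dividing by it means
    unaligned residues contribute zero.
    """
    scale = d0(target_length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / target_length
```

A perfect superposition of all residues scores 1.0. Aligning only half of the target perfectly caps the score at 0.5, which is why the TM-score penalizes partial matches.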
9.
Structure ; 30(2): 252-262.e4, 2022 02 03.
Article in English | MEDLINE | ID: mdl-35026162

ABSTRACT

More than 70% of the experimentally determined macromolecular structures in the Protein Data Bank (PDB) contain small-molecule ligands. Quality indicators of ∼643,000 ligands present in ∼106,000 PDB X-ray crystal structures have been analyzed. Ligand quality varies greatly with regard to goodness of fit between ligand structure and experimental data, deviations in bond lengths and angles from known chemical structures, and inappropriate interatomic clashes between the ligand and its surroundings. Based on principal component analysis, correlated quality indicators of ligand structure have been aggregated into two largely orthogonal composite indicators measuring goodness of fit to experimental data and deviation from ideal chemical structure. Ranking the composite quality indicators across the PDB archive enabled the construction of a uniformly distributed composite ranking score. This score is implemented at RCSB.org to compare chemically identical ligands in distinct PDB structures with easy-to-interpret two-dimensional ligand quality plots, allowing PDB users to quickly assess ligand structure quality and select the best exemplars.


Subjects
Proteins/chemistry; Proteins/metabolism; Small Molecule Libraries/pharmacology; Databases, Protein; Ligands; Models, Molecular; Protein Conformation
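This record states that ranking the composite indicators across the archive yields a uniformly distributed score. One simple way to obtain such a score is a percentile rank, sketched below in Python. The abstract does not give the exact RCSB formula, tie handling, or which direction counts as better. Here higher values receive higher ranks and tied values share their average rank, both assumptions.

```python
from typing import List, Sequence


def percentile_ranks(values: Sequence[float]) -> List[float]:
    """Map each value to (average 1-based rank) / n, giving scores in (0, 1].

    Without ties the result is spread evenly over 1/n, 2/n, ..., 1, i.e. uniformly
    distributed regardless of the shape of the original value distribution.
    """
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # extend the block while the next sorted value ties with the current one
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # average 1-based rank of the tied block
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank / n
        start = end + 1
    return ranks
```

If lower indicator values mean better quality, negating the values before ranking reverses the direction.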
10.
Nucleic Acids Res ; 50(D1): D11-D19, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850134

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, the introduction of trRosetta and RoseTTAFold models in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI) and the BioModels Reproducibility Scorecard. Training is one of EMBL-EBI's core missions and a key component of the provision of bioinformatics services to users: this year's update includes many of the improvements made to EMBL-EBI's online training offering.


Subjects
Computational Biology/education; Computational Biology/methods; Databases, Factual; Academies and Institutes; Artificial Intelligence; COVID-19; Databases, Factual/economics; Databases, Factual/statistics & numerical data; Databases, Pharmaceutical; Databases, Protein; Europe; Genome, Human; Humans; Information Storage and Retrieval; RNA, Untranslated/genetics; SARS-CoV-2/genetics
11.
Proteins ; 90(3): 720-731, 2022 03.
Article in English | MEDLINE | ID: mdl-34716620

ABSTRACT

Predicting the quaternary structure of protein complexes is an important problem. Inter-chain residue-residue contact prediction can provide useful information to guide the ab initio reconstruction of quaternary structures. However, few methods have been developed to build quaternary structures from predicted inter-chain contacts. Here, we develop the first method based on gradient descent optimization (GD) to build quaternary structures of protein dimers, using inter-chain contacts as distance restraints. We evaluate GD on several datasets of homodimers and heterodimers using true/predicted contacts and monomer structures as input. GD consistently performs better than both simulated annealing and Markov Chain Monte Carlo simulation. Starting from an arbitrary quaternary structure randomly initialized from the tertiary structures of protein chains and using true inter-chain contacts as input, GD can reconstruct high-quality structural models for homodimers and heterodimers with average TM-scores ranging from 0.92 to 0.99 and average interface root mean square distances from 0.72 Å to 1.64 Å. On a dataset of 115 homodimers, using predicted inter-chain contacts as restraints, the average TM-score of the structural models built by GD is 0.76. For 46% of the homodimers, high-quality structural models with TM-score ≥ 0.9 are reconstructed from predicted contacts. There is a strong correlation between the quality of the reconstructed models and the precision and recall of predicted contacts. Only moderate precision or recall of inter-chain contact prediction is needed to build good structural models for most homodimers. Moreover, GD improves the quality of quaternary structures predicted by AlphaFold2 on a Critical Assessment of Techniques for Protein Structure Prediction-Critical Assessment of Predicted Interactions dataset.


Subjects
Proteins/chemistry; Computational Biology; Databases, Protein; Molecular Docking Simulation; Monte Carlo Method; Protein Binding; Protein Multimerization; Protein Structure, Quaternary
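The paper's optimizer is not reproduced in this record. The Python sketch below is a deliberately simplified illustration of its core idea: treat inter-chain contacts as distance restraints and minimize their violations by gradient descent. It reduces each residue to one 3D point, moves chain B by a rigid translation only (the real problem also needs rotations), and assumes each contact (i, j) asks residue i of chain A and residue j of chain B to lie within a hypothetical 8 Å cutoff. The function names, cutoff, learning rate, and step count are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def restraint_loss_and_grad(chain_a: Sequence[Point], chain_b: Sequence[Point],
                            contacts: Sequence[Tuple[int, int]], shift: Sequence[float],
                            cutoff: float) -> Tuple[float, List[float]]:
    """Mean squared violation of 'distance <= cutoff' restraints and its gradient w.r.t. shift."""
    loss = 0.0
    grad = [0.0, 0.0, 0.0]
    for i, j in contacts:
        diff = [chain_b[j][k] + shift[k] - chain_a[i][k] for k in range(3)]
        dist = math.sqrt(sum(x * x for x in diff))
        excess = dist - cutoff
        if excess > 0.0:  # satisfied restraints contribute nothing
            loss += excess ** 2
            for k in range(3):
                # d(excess^2)/d(shift_k) = 2 * excess * diff_k / dist
                grad[k] += 2.0 * excess * diff[k] / dist
    n = len(contacts)
    return loss / n, [g / n for g in grad]


def dock_by_translation(chain_a: Sequence[Point], chain_b: Sequence[Point],
                        contacts: Sequence[Tuple[int, int]], cutoff: float = 8.0,
                        lr: float = 0.1, steps: int = 1000) -> List[float]:
    """Plain gradient descent on a translation applied to every point of chain B."""
    shift = [0.0, 0.0, 0.0]
    for _ in range(steps):
        loss, grad = restraint_loss_and_grad(chain_a, chain_b, contacts, shift, cutoff)
        if loss == 0.0:  # all restraints satisfied
            break
        shift = [s - lr * g for s, g in zip(shift, grad)]
    return shift
```

Averaging the loss over contacts keeps the step size comparable when the number of restraints changes. A fuller version would also optimize a rotation of chain B.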
12.
Proteins ; 89(12): 1787-1799, 2021 12.
Article in English | MEDLINE | ID: mdl-34337786

ABSTRACT

In CASP14, 39 research groups submitted more than 2500 3D models on 22 protein complexes. In general, the community performed well in predicting the fold of the assemblies (for 80% of the targets), although it faced significant challenges in reproducing the native contacts. This is especially the case for the complexes without whole-assembly templates. The leading predictor, BAKER-experimental, used a methodology combining classical techniques (template-based modeling, protein docking) with deep learning-based contact predictions and a fold-and-dock approach. The Venclovas team achieved the runner-up position with template-based modeling and docking. By analyzing the target interfaces, we showed that the complexes with depleted charged contacts or dominating hydrophobic interactions were the most challenging ones to predict. We also demonstrated that if AlphaFold2 predictions were at hand, the interface prediction challenge could be alleviated for most of the targets. All in all, it is evident that new approaches are needed for the accurate prediction of assemblies, which undoubtedly will expand on the significant improvements in the tertiary structure prediction field.


Subjects
Models, Molecular; Protein Conformation; Proteins; Software; Computational Biology; Databases, Protein; Protein Structure, Quaternary; Proteins/chemistry; Proteins/metabolism; Sequence Analysis, Protein
13.
Mol Biol Evol ; 38(12): 5806-5818, 2021 12 09.
Article in English | MEDLINE | ID: mdl-34459919

ABSTRACT

Sequence annotation is fundamental for studying the evolution of protein families, particularly when working with nonmodel species. Given the rapidly increasing number of species receiving high-quality genome sequencing, accurate domain modeling that is representative of species diversity is crucial for understanding protein family sequence evolution and inferred function(s). Here, we describe a bioinformatic tool called Taxon-Informed Adjustment of Markov Model Attributes (TIAMMAt), which revises domain profile hidden Markov models (HMMs) by incorporating homologous domain sequences from underrepresented and nonmodel species. Using innate immunity pathways as a case study, we show that revising profile HMM parameters to directly account for variation in homologs among underrepresented species provides valuable insight into the evolution of protein families. Following adjustment by TIAMMAt, domain profile HMMs exhibit changes in their per-site amino acid state emission probabilities and insertion/deletion probabilities while maintaining the overall structure of the consensus sequence. Our results show that domain revision can heavily impact evolutionary interpretations for some families (e.g., the NACHT domain of NLRs), whereas the impact on other domains (e.g., rel homology domain and interferon regulatory factor domains) is minimal due to high levels of sequence conservation across the sampled phylogenetic depth (i.e., Metazoa). Importantly, TIAMMAt revises target domain models to reflect homologous sequence variation using the taxonomic distribution under consideration by the user. TIAMMAt's flexibility to revise any subset of the Pfam database using a user-defined taxonomic pool will make it a valuable tool for future protein evolution studies, particularly when incorporating (or focusing on) nonmodel species.


Subjects
Biodiversity; Immunity, Innate; Databases, Protein; Immunity, Innate/genetics; Markov Chains; Phylogeny; Protein Domains
14.
Nat Commun ; 12(1): 5011, 2021 08 18.
Article in English | MEDLINE | ID: mdl-34408149

ABSTRACT

Sequence-based contact prediction has shown considerable promise in assisting non-homologous structure modeling, but it often requires many homologous sequences and a sufficient number of correct contacts to achieve correct folds. Here, we developed a method, C-QUARK, that integrates multiple deep-learning and coevolution-based contact-maps to guide the replica-exchange Monte Carlo fragment assembly simulations. The method was tested on 247 non-redundant proteins, where C-QUARK could fold 75% of the cases with TM-scores (template-modeling scores) ≥0.5, which was 2.6 times more than that achieved by QUARK. For the 59 cases that had either low contact accuracy or few homologous sequences, C-QUARK correctly folded 6 times more proteins than other contact-based folding methods. C-QUARK was also tested on 64 free-modeling targets from the 13th CASP (critical assessment of protein structure prediction) experiment and had an average GDT_TS (global distance test) score that was 5% higher than the best CASP predictors. These data demonstrate, in a robust manner, the progress in modeling non-homologous protein structures using low-accuracy and sparse contact-map predictions.


Subjects
Computational Biology/methods; Proteins/chemistry; Databases, Protein; Models, Molecular; Monte Carlo Method; Protein Conformation; Protein Folding; Proteins/genetics; Software
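GDT_TS, quoted in this record, is conventionally the average of the percentages of target residues lying within 1, 2, 4, and 8 Å of their reference positions after superposition. The Python sketch below assumes the per-residue distances for a single given superposition are already known. Actual GDT implementations such as LGA search for superpositions that maximize each fraction separately, so this only illustrates the averaging step.

```python
from typing import Sequence

CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_ts_fixed(distances: Sequence[float], target_length: int) -> float:
    """GDT_TS (0-100) for one fixed superposition.

    distances: per-residue deviations (Å) of the aligned model residues.
    target_length: residues in the target; unaligned residues count as misses.
    """
    fractions = [
        sum(1 for d in distances if d <= cutoff) / target_length
        for cutoff in CUTOFFS_ANGSTROM
    ]
    return 100.0 * sum(fractions) / len(CUTOFFS_ANGSTROM)
```

For deviations of 0.5, 1.5, 3.0, 6.0 and 10.0 Å on a 5-residue target, the four fractions are 0.2, 0.4, 0.6 and 0.8, giving GDT_TS = 50.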
15.
Comput Math Methods Med ; 2021: 5770981, 2021.
Article in English | MEDLINE | ID: mdl-34413898

ABSTRACT

Antioxidant proteins (AOPs) play important roles in the management and prevention of several human diseases due to their ability to neutralize excess free radicals. However, identifying AOPs with wet-lab experimental techniques is often time-consuming and expensive. In this study, we propose an accurate computational model, called AOP-HMM, to predict AOPs by extracting discriminatory evolutionary features from hidden Markov model (HMM) profiles. First, auto cross-covariance (ACC) variables were applied to transform the HMM profiles into fixed-length feature vectors. Then, analysis of variance (ANOVA) was used to reduce the dimensionality of the raw feature space. Finally, a support vector machine (SVM) classifier was adopted to predict AOPs. To comprehensively evaluate the performance of the proposed AOP-HMM model, 10-fold cross-validation (CV), jackknife CV, and independent tests were carried out on two widely used benchmark datasets. The experimental results demonstrated that AOP-HMM outperformed most existing methods and could be used to quickly annotate AOPs and guide the experimental process.


Subjects
Antioxidants/chemistry; Machine Learning; Peroxiredoxins/chemistry; Proteins/chemistry; Algorithms; Amino Acids/analysis; Antioxidants/classification; Computational Biology; Databases, Protein/statistics & numerical data; Evolution, Molecular; Humans; Markov Chains; Peroxiredoxins/classification; Proteins/classification
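The ACC step described in this record turns a variable-length L x 20 HMM profile into a fixed-length vector. The abstract gives no formula, so a common formulation is assumed here. For each lag g = 1..G and each pair of profile columns (i1, i2), it averages the product of the mean-centered score of column i1 at position j and column i2 at position j + g. Terms with i1 = i2 are auto-covariances and the rest are cross-covariances, giving 20 x 20 x G features regardless of L. The Python sketch takes the profile as a list of rows; the function name and the choice of G are hypothetical.

```python
from typing import List, Sequence


def acc_features(profile: Sequence[Sequence[float]], max_lag: int) -> List[float]:
    """Auto/cross-covariance features of an L x C profile (C = 20 for amino acids).

    Returns C * C * max_lag values ordered by lag, then column i1, then column i2.
    """
    length = len(profile)
    n_cols = len(profile[0])
    if not 0 < max_lag < length:
        raise ValueError("max_lag must be between 1 and the profile length - 1")
    # center every column on its mean over all positions
    means = [sum(row[c] for row in profile) / length for c in range(n_cols)]
    centered = [[row[c] - means[c] for c in range(n_cols)] for row in profile]
    features = []
    for g in range(1, max_lag + 1):
        for i1 in range(n_cols):
            for i2 in range(n_cols):
                total = sum(centered[j][i1] * centered[j + g][i2]
                            for j in range(length - g))
                features.append(total / (length - g))
    return features
```

Because the output length depends only on the number of columns and `max_lag`, profiles of proteins with different lengths map to vectors a standard classifier such as an SVM can consume.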
16.
Proteins ; 89(12): 1673-1686, 2021 12.
Article in English | MEDLINE | ID: mdl-34240477

ABSTRACT

This report describes the assessment of tertiary structure prediction for difficult modeling targets in the 14th round of the Critical Assessment of Structure Prediction (CASP14). We implemented an official ranking scheme that used the same scores as the previous CASP topology-based assessment, but combined these scores with one that emphasized physically realistic models. The top-performing AlphaFold2 group outperformed the rest of the prediction community on all but two of the difficult targets considered in this assessment. They provided high-quality models for most of the targets (86% with GDT_TS over 70), including larger targets above 150 residues, and they correctly predicted the topology of almost all the rest. AlphaFold2 was followed by two manual Baker methods, a Feig method that refined Zhang-server models, two notable automated Zhang server methods (QUARK and Zhang-server), and a Zhang manual group. Despite the remarkable progress in protein structure prediction of difficult targets, both the prediction community and, to a lesser extent, AlphaFold2 faced challenges with flexible regions and obligate oligomeric assemblies. The official ranking of top-performing methods was supported by principal component analysis (PCA) and heatmap clusters generated from performance scores, which gave insight into target difficulties and the most successful state-of-the-art structure prediction methodologies.


Subjects
Computational Biology/methods; Models, Molecular; Protein Conformation; Protein Folding; Software; Databases, Protein; Proteins/chemistry; Proteins/metabolism; Sequence Analysis, Protein
17.
Sci Rep ; 11(1): 12439, 2021 06 14.
Article in English | MEDLINE | ID: mdl-34127723

ABSTRACT

Coiled-coil regions were among the first protein motifs described structurally and theoretically. The simplicity of the motif promises that coiled-coil regions can be detected with reasonable accuracy and precision in any protein sequence. Here, we re-evaluated the most commonly used coiled-coil prediction tools against the most comprehensive reference data set available, the entire Protein Data Bank, down to each amino acid and its secondary structure. Apart from a 30-fold difference between the minimum and maximum numbers of coiled coils predicted, the tools vary strongly in where they predict coiled-coil regions. Accordingly, there are many false predictions and many missed true coiled-coil regions. The evaluation of binary classification metrics in comparison with naïve coin-flip models, together with the Matthews correlation coefficient, the most reliable performance metric for imbalanced data sets, suggests that the tested tools' performance is close to random. This implies that the tools' predictions have only limited informative value. Coiled-coil predictions are often used to interpret biochemical data and are part of in silico functional genome annotation. Our results indicate that these predictions should be treated very cautiously and need to be supported and validated by experimental evidence.


Subjects
Amino Acid Motifs; Models, Molecular; Protein Structure, Secondary; Amino Acid Sequence; Databases, Protein/statistics & numerical data; Software
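The Matthews correlation coefficient cited in this record is computed from the four confusion-matrix counts as MCC = (TP*TN - FP*FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The short Python sketch below uses hypothetical per-residue counts to illustrate why MCC is informative on imbalanced data. A coin-flip predictor and an "always negative" predictor both score 0, even though the latter reaches 99% accuracy when only 1% of residues are positive. Returning 0 when a marginal is empty is a common convention, assumed here.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention: an undefined MCC is reported as 0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of correctly classified residues."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical data set: 100 coiled-coil residues and 9900 other residues.
coin_flip = dict(tp=50, fp=4950, tn=4950, fn=50)  # expected counts for p = 0.5
always_negative = dict(tp=0, fp=0, tn=9900, fn=100)
perfect = dict(tp=100, fp=0, tn=9900, fn=0)

for name, counts in [("coin flip", coin_flip),
                     ("always negative", always_negative),
                     ("perfect", perfect)]:
    print(f"{name}: accuracy={accuracy(**counts):.2f}, MCC={mcc(**counts):.2f}")
```

For any predictor that labels residues positive at random, independently of the truth, the expected TP*TN and FP*FN terms cancel. This is why an MCC near 0 is read as "close to random".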
18.
Proteins ; 89(9): 1167-1179, 2021 09.
Article in English | MEDLINE | ID: mdl-33957009

ABSTRACT

A comparison of protein backbones makes clear that no more than approximately 1400 different folds exist, each specifying the three-dimensional topology of a protein domain. Large proteins are composed of specific domain combinations, and many domains can accommodate different functions. These findings confirm that the reuse of domains is key to the evolution of multi-domain proteins. If reuse was also the driving force for domain evolution, ancestral fragments of sub-domain size should exist that are shared between domains with significantly different topologies. For the fully automated detection of putatively ancestral motifs, we developed the algorithm Fragstatt, which compares proteins pairwise to identify fragments, that is, instantiations of the same motif. To reach maximal sensitivity, Fragstatt compares sequences by means of cascaded alignments of profile hidden Markov models. If the fragment sequences are sufficiently similar, the program determines and scores the structural concordance of the fragments. By analyzing a comprehensive set of proteins from the CATH database, Fragstatt identified 12 532 partially overlapping and structurally similar motifs that clustered into 134 unique motifs. The dissemination of these motifs is limited: we found only two domain topologies that contain two different motifs, and generally these motifs occur in no more than 18% of the CATH topologies. Interestingly, motifs are enriched in topologies that are considered ancestral. Thus, our findings suggest that the reuse of sub-domain-sized fragments was relevant in early phases of protein evolution and became less important later on.


Subjects
Algorithms; Amino Acids/chemistry; Proteins/chemistry; Software; Amino Acid Motifs; Databases, Protein; Evolution, Molecular; History, 21st Century; History, Ancient; Markov Chains; Models, Molecular; Origin of Life; Protein Conformation; Protein Domains; Protein Folding; Proteins/history
19.
Int J Mol Sci ; 22(8)2021 Apr 12.
Article in English | MEDLINE | ID: mdl-33921228

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) encodes the papain-like protease (PLpro). The protein not only plays an essential role in viral replication but also cleaves ubiquitin and the ubiquitin-like interferon-stimulated gene 15 protein (ISG15) from host proteins, making it an important target for developing new antiviral drugs. In this study, we searched for novel, noncovalent potential PLpro inhibitors by employing a multistep in silico screening of a 15-million-compound library. The selectivity of the best-scored compounds was evaluated by checking their binding affinity to the human ubiquitin carboxy-terminal hydrolase L1 (UCH-L1), which, as a deubiquitylating enzyme, exhibits structural and functional similarities to PLpro. As a result, we identified 387 potential selective PLpro inhibitors, from which we retrieved the 20 best compounds according to their IC50 values toward PLpro as estimated by a multiple linear regression model. The selected candidates display potential activity against the protein, with estimated IC50 values in the nanomolar range from approximately 159 to 505 nM, and mostly adopt a binding mode similar to that of the known noncovalent SARS-CoV-2 PLpro inhibitors. We further propose the six most promising compounds for future in vitro evaluation. The results for the top potential PLpro inhibitors are deposited in a database prepared to facilitate research on anti-SARS-CoV-2 drugs.


Subjects
Antiviral Agents/chemistry; Antiviral Agents/metabolism; Coronavirus Papain-Like Proteases/antagonists & inhibitors; Protease Inhibitors/chemistry; Protease Inhibitors/metabolism; SARS-CoV-2/enzymology; Animals; Antiviral Agents/toxicity; Computer Simulation; Crystallography, X-Ray; Databases, Chemical; Databases, Protein; Drug Evaluation, Preclinical; Humans; Inhibitory Concentration 50; Lethal Dose 50; Ligands; Mutagenicity Tests; Protease Inhibitors/toxicity; Quantitative Structure-Activity Relationship; Rats; Ubiquitin Thiolesterase/chemistry; Ubiquitin Thiolesterase/metabolism
20.
Nat Methods ; 18(5): 472-481, 2021 05.
Article in English | MEDLINE | ID: mdl-33875885

ABSTRACT

Intrinsically disordered proteins, defying the traditional protein structure-function paradigm, are a challenge to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that their accuracy be high. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in predicting intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 after bona fide structured regions are filtered out. Disordered binding regions remain hard to predict, with Fmax = 0.231. Interestingly, computing times among methods can vary by up to four orders of magnitude.


Subjects
Computational Biology; Intrinsically Disordered Proteins/chemistry; Amino Acid Sequence; Databases, Protein; Protein Binding; Protein Conformation; Protein Folding; Software
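Fmax, the headline metric of this record, is the maximum F1 score (harmonic mean of precision and recall) over all score thresholds. The minimal Python sketch below assumes per-residue disorder scores and binary labels pooled over a data set, and it tries every observed score as a threshold. It ignores CAID-specific details such as how residues without annotation are handled.

```python
from typing import Sequence


def f_max(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Maximum F1 over thresholds; a residue is predicted disordered if score >= threshold."""
    positives = sum(labels)
    best = 0.0
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
        if tp == 0:
            continue  # F1 is 0 at this threshold
        precision = tp / (tp + fp)
        recall = tp / positives
        best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

Because the threshold is optimized on the evaluation data itself, Fmax describes the best achievable operating point. A user of a predictor would still need to choose a threshold in advance.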