Results 1 - 20 of 99
1.
Proteins ; 91(12): 1539-1549, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37920879

ABSTRACT

Computing protein structure from amino acid sequence information has been a long-standing grand challenge. Critical assessment of structure prediction (CASP) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every 2 years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementation and combination with other methods. Second, using the standard AlphaFold2 protocol and default parameters produces the highest quality result for only about two thirds of the targets, and more extensive sampling is required for the others. The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures. Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning-derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable. CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also, for the first time, CASP included the computation of protein-ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear methods will continue to advance.


Subjects
Computational Biology, Proteins, Protein Conformation, Molecular Models, Proteins/chemistry, Amino Acid Sequence, Computational Biology/methods
2.
Proteins ; 91(12): 1903-1911, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37872703

ABSTRACT

For the first time, the 2022 CASP (Critical Assessment of Structure Prediction) community experiment included a section on computing multiple conformations for protein and RNA structures. There was full or partial success in reproducing the ensembles for four of the nine targets, an encouraging result. For protein structures, enhanced sampling with variations of the AlphaFold2 deep learning method was by far the most effective approach. One substantial conformational change caused by a single mutation across a complex interface was accurately reproduced. In two other assembly modeling cases, methods succeeded in sampling conformations near to the experimental ones even though environmental factors were not included in the calculations. An experimentally derived flexibility ensemble allowed a single accurate RNA structure model to be identified. Difficulties included how to handle sparse or low-resolution experimental data and the current lack of effective methods for modeling RNA/protein complexes. However, these and other obstacles appear addressable.


Subjects
Proteins, RNA, Protein Conformation, Proteins/chemistry, Mutation
3.
Proc Natl Acad Sci U S A ; 120(28): e2221745120, 2023 07 11.
Article in English | MEDLINE | ID: mdl-37399411

ABSTRACT

The CASP14 experiment demonstrated the extraordinary structure modeling capabilities of artificial intelligence (AI) methods. That result has ignited a fierce debate about what these methods are actually doing. One of the criticisms has been that the AI does not have any sense of the underlying physics but is merely performing pattern recognition. Here, we address that issue by analyzing the extent to which the methods identify rare structural motifs. The rationale underlying the approach is that a pattern recognition machine tends to choose the more frequently occurring motifs, whereas some sense of subtle energetic factors is required to choose infrequently occurring ones. To reduce the possibility of bias from related experimental structures and to minimize the effect of experimental errors, we examined only CASP14 target protein crystal structures determined to a resolution limit better than 2 Å, which lacked significant amino acid sequence homology to proteins of known structure. In those experimental structures and in the corresponding models, we track cis peptides, π-helices, 3₁₀-helices, and other small 3D motifs that occur in the PDB database at a frequency of lower than 1% of total amino acid residues. The best-performing AI method, AlphaFold2, captured these uncommon structural elements exquisitely well. All discrepancies appeared to be a consequence of crystal environment effects. We propose that the neural network learned a protein structure potential of mean force, enabling it to correctly identify situations where unusual structural features represent the lowest local free energy because of subtle influences from the atomic environment.


Subjects
Artificial Intelligence, Proteins, Amino Acid Sequence, Proteins/chemistry, Secondary Protein Structure, Computer Neural Networks, Protein Conformation
4.
Proteins ; 91(12): 1571-1599, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37493353

ABSTRACT

We present an in-depth analysis of selected CASP15 targets, focusing on their biological and functional significance. The authors of the structures identify and discuss key protein features and evaluate how effectively these aspects were captured in the submitted predictions. While the overall ability to predict three-dimensional protein structures continues to impress, reproducing uncommon features not previously observed in experimental structures is still a challenge. Furthermore, instances with conformational flexibility and large multimeric complexes highlight the need for novel scoring strategies to better emphasize biologically relevant structural regions. Looking ahead, closer integration of computational and experimental techniques will play a key role in determining the next challenges to be unraveled in the field of structural molecular biology.


Subjects
Computational Biology, Proteins, Protein Conformation, Molecular Models, Computational Biology/methods, Proteins/chemistry
5.
Proteins ; 91(12): 1600-1615, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37466021

ABSTRACT

The first RNA category of the Critical Assessment of Techniques for Structure Prediction competition was only made possible because of the scientists who provided experimental structures to challenge the predictors. In this article, these scientists offer a unique and valuable analysis of both the successes and areas for improvement in the predicted models. All 10 RNA-only targets yielded predictions topologically similar to experimentally determined structures. For one target, experimentalists were able to phase their x-ray diffraction data by molecular replacement, showing a potential application of structure predictions for RNA structural biologists. Recommended areas for improvement include: enhancing the accuracy in local interaction predictions and increased consideration of the experimental conditions such as multimerization, structure determination method, and time along folding pathways. The prediction of RNA-protein complexes remains the most significant challenge. Finally, given the intrinsic flexibility of many RNAs, we propose the consideration of ensemble models.


Subjects
Computational Biology, Proteins, Protein Conformation, Proteins/chemistry, Molecular Models, Computational Biology/methods, X-Ray Diffraction
6.
Proteins ; 91(12): 1550-1557, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37306011

ABSTRACT

Prediction categories in the Critical Assessment of Structure Prediction (CASP) experiments change with the need to address specific problems in structure modeling. In CASP15, four new prediction categories were introduced: RNA structure, ligand-protein complexes, accuracy of oligomeric structures and their interfaces, and ensembles of alternative conformations. This paper lists technical specifications for these categories and describes their integration in the CASP data management system.


Subjects
Computational Biology, Proteins, Protein Conformation, Proteins/chemistry, Molecular Models, Ligands
7.
Comput Struct Biotechnol J ; 20: 4952-4968, 2022.
Article in English | MEDLINE | ID: mdl-36147680

ABSTRACT

Antibodies are fundamental effectors of humoral immunity, and have become a highly successful class of therapeutics. There is increasing evidence that antibodies utilize transient homotypic interactions to enhance function, and elucidation of such interactions can provide insights into their biology and new opportunities for their optimization as drugs. Yet the transitory nature of weak interactions makes them difficult to investigate. Capitalizing on their rich structural data and high conservation, we have characterized all the ways that antibody fragment antigen-binding (Fab) regions interact crystallographically. This approach led to the discovery of previously unrealized interfaces between antibodies. While diverse interactions exist, β-sheet dimers and variable-constant elbow dimers are recurrent motifs. Disulfide engineering enabled interactions to be trapped and investigated structurally and functionally, providing experimental validation of the interfaces and illustrating their potential for optimization. This work provides first insight into previously undiscovered oligomeric interactions between antibodies, and enables new opportunities for their biotherapeutic optimization.

8.
Nat Rev Chem ; 6(4): 287-295, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35783295

ABSTRACT

One aspirational goal of computational chemistry is to predict potent and drug-like binders for any protein, such that only those that bind are synthesized. In this Roadmap, we describe the launch of Critical Assessment of Computational Hit-finding Experiments (CACHE), a public benchmarking project to compare and improve small molecule hit-finding algorithms through cycles of prediction and experimental testing. Participants will predict small molecule binders for new and biologically relevant protein targets representing different prediction scenarios. Predicted compounds will be tested rigorously in an experimental hub, and all predicted binders as well as all experimental screening data, including the chemical structures of experimentally tested compounds, will be made publicly available, and not subject to any intellectual property restrictions. The ability of a range of computational approaches to find novel binders will be evaluated, compared, and openly published. CACHE will launch 3 new benchmarking exercises every year. The outcomes will be better prediction methods, new small molecule binders for target proteins of importance for fundamental biology or drug discovery, and a major technological step towards achieving the goal of Target 2035, a global initiative to identify pharmacological probes for all human proteins.

9.
Proteins ; 89(12): 1607-1617, 2021 12.
Article in English | MEDLINE | ID: mdl-34533838

ABSTRACT

Critical assessment of structure prediction (CASP) is a community experiment to advance methods of computing three-dimensional protein structure from amino acid sequence. Core components are rigorous blind testing of methods and evaluation of the results by independent assessors. In the most recent experiment (CASP14), deep-learning methods from one research group consistently delivered computed structures rivaling the corresponding experimental ones in accuracy. In this sense, the results represent a solution to the classical protein-folding problem, at least for single proteins. The models have already been shown to be capable of providing solutions for problematic crystal structures, and there are broad implications for the rest of structural biology. Other research groups also substantially improved performance. Here, we describe these results and outline some of the many implications. Other related areas of CASP, including modeling of protein complexes, structure refinement, estimation of model accuracy, and prediction of inter-residue contacts and distances, are also described.


Subjects
Protein Conformation, Protein Folding, Proteins, Software, Amino Acid Sequence, Computational Biology, Statistical Models, Molecular Dynamics Simulation, Proteins/chemistry, Proteins/metabolism, Protein Sequence Analysis
10.
Proteins ; 89(12): 1647-1672, 2021 12.
Article in English | MEDLINE | ID: mdl-34561912

ABSTRACT

The biological and functional significance of selected Critical Assessment of Techniques for Protein Structure Prediction 14 (CASP14) targets is described by the authors of the structures. The authors highlight the most relevant features of the target proteins and discuss how well these features were reproduced in the respective submitted predictions. The overall ability to predict three-dimensional structures of proteins has improved remarkably in CASP14, and many difficult targets were modeled with impressive accuracy. For the first time in the history of CASP, the experimentalists not only highlighted that computational models can accurately reproduce the most critical structural features observed in their targets, but also envisaged that models could serve as guidance for further studies of biologically relevant properties of proteins.


Subjects
Molecular Models, Protein Conformation, Proteins/chemistry, Software, Amino Acid Sequence, Computational Biology, Cryoelectron Microscopy, X-Ray Crystallography, Protein Sequence Analysis
11.
Proteins ; 89(12): 1633-1646, 2021 12.
Article in English | MEDLINE | ID: mdl-34449113

ABSTRACT

Critical assessment of structure prediction (CASP) conducts community experiments to determine the state of the art in computing protein structure from amino acid sequence. The process relies on the experimental community providing information about not yet public or about to be solved structures, for use as targets. For some targets, the experimental structure is not solved in time for use in CASP. Calculated structure accuracy improved dramatically in this round, implying that models should now be much more useful for resolving many sorts of experimental difficulties. To test this, selected models for seven unsolved targets were provided to the experimental groups. These models were from the AlphaFold2 group, who overall submitted the most accurate predictions in CASP14. Four targets were solved with the aid of the models, and, additionally, the structure of an already solved target was improved. An a posteriori analysis showed that, in some cases, models from other groups would also be effective. This paper provides accounts of the successful application of models to structure determination, including molecular replacement for X-ray crystallography, backbone tracing and sequence positioning in a cryo-electron microscopy structure, and correction of local features. The results suggest that, in future, there will be greatly increased synergy between computational and experimental approaches to structure determination.


Subjects
Computational Biology/methods, Cryoelectron Microscopy, X-Ray Crystallography, Molecular Models, Proteins/chemistry, Protein Conformation, Software
12.
Proteins ; 89(12): 1987-1996, 2021 12.
Article in English | MEDLINE | ID: mdl-34462960

ABSTRACT

Critical Assessment of Structure Prediction (CASP) is an organization aimed at advancing the state of the art in computing protein structure from sequence. In the spring of 2020, CASP launched a community project to compute the structures of the most structurally challenging proteins coded for in the SARS-CoV-2 genome. Forty-seven research groups submitted over 3000 three-dimensional models and 700 sets of accuracy estimates on 10 proteins. The resulting models were released to the public. CASP community members also worked together to provide estimates of local and global accuracy and identify structure-based domain boundaries for some proteins. Subsequently, two of these structures (ORF3a and ORF8) have been solved experimentally, allowing assessment of both model quality and the accuracy estimates. Models from the AlphaFold2 group were found to have good agreement with the experimental structures, with main chain GDT_TS accuracy scores ranging from 63 (a correct topology) to 87 (competitive with experiment).


Subjects
SARS-CoV-2/chemistry, Viral Proteins/chemistry, COVID-19/virology, Viral Genome, Humans, Molecular Models, Protein Conformation, Protein Domains, SARS-CoV-2/genetics, Viral Proteins/genetics, Viroporin Proteins/chemistry, Viroporin Proteins/genetics
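
Note on the GDT_TS scores quoted in entry 12: the sketch below illustrates how a score of this kind can be computed. It is a simplified illustration, not the CASP assessors' implementation. It assumes the per-residue Cα distances (in Å) between model and experimental structure have already been obtained from one fixed superposition, whereas the full GDT procedure searches over many superpositions for each cutoff. The function name `gdt_ts` and the example distances are hypothetical.

```python
from typing import Sequence

# Distance cutoffs (in angstroms) conventionally averaged in GDT_TS.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(distances: Sequence[float]) -> float:
    """Simplified GDT_TS from per-residue C-alpha distances.

    Assumption: the distances come from one fixed superposition of the
    model onto the experimental structure (no per-cutoff optimisation).
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    # Fraction of residues within each cutoff, then average and scale to 0-100.
    fractions = [sum(d <= cutoff for d in distances) / n for cutoff in GDT_TS_CUTOFFS]
    return 100.0 * sum(fractions) / len(GDT_TS_CUTOFFS)


# Hypothetical distances for a five-residue toy example.
example_distances = [0.5, 1.5, 3.0, 6.0, 10.0]
print(gdt_ts(example_distances))
```

On this 0 to 100 scale, higher values mean that more residues of the model lie close to their experimental positions, which is how the abstract's 63 and 87 should be read.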
13.
Bioinformatics ; 37(22): 4180-4186, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34117883

ABSTRACT

MOTIVATION: Experimental findings on genetic disease mechanisms are scattered throughout the literature and represented in many ways, including unstructured text, cartoons, pathway diagrams and network graphs. Integration and structuring of such mechanistic information greatly enhances its utility. RESULTS: MecCog is a graphical framework for building integrated representations (mechanism schemas) of mechanisms by which a genetic variant causes a disease phenotype. A MecCog mechanism schema displays the propagation of system perturbations across stages of biological organization, using graphical notations to symbolize perturbed entities and activities, hyperlinked evidence tagging, a mechanism ontology and depiction of knowledge gaps, ambiguities and uncertainties. The web platform enables a user to construct, store, publish, browse, query and comment on schemas. MecCog facilitates the identification of potential biomarkers, therapeutic intervention sites and critical future experiments. AVAILABILITY AND IMPLEMENTATION: The MecCog framework is freely available at http://www.meccog.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Inborn Genetic Diseases, Phenotype, Computational Biology
14.
Hum Mutat ; 41(2): 347-362, 2020 02.
Article in English | MEDLINE | ID: mdl-31680375

ABSTRACT

Precise identification of causative variants from whole-genome sequencing data, including both coding and noncoding variants, is challenging. The Critical Assessment of Genome Interpretation 5 SickKids clinical genome challenge provided an opportunity to assess our ability to extract such information. Participants in the challenge were required to match each of the 24 whole-genome sequences to the correct phenotypic profile and to identify the disease class of each genome. These are all rare disease cases that have resisted genetic diagnosis in a state-of-the-art pipeline. The patients have a range of eye, neurological, and connective-tissue disorders. We used a gene-centric approach to address this problem, assigning each gene a multiphenotype-matching score. Mutations in the top-scoring genes for each phenotype profile were ranked on a 6-point scale of pathogenicity probability, resulting in an approximately equal number of top-ranked coding and noncoding candidate variants overall. We were able to assign the correct disease class for 12 cases and the correct genome to a clinical profile for five cases. The challenge assessor found genes in three of these five cases as likely appropriate. In the postsubmission phase, after careful screening of the genes in the correct genome, we identified additional potential diagnostic variants, a high proportion of which are noncoding.


Subjects
Genetic Association Studies/methods, Inborn Genetic Diseases/diagnosis, Inborn Genetic Diseases/genetics, Genetic Predisposition to Disease, Human Genome, Genomics/methods, Rare Diseases, Algorithms, Alleles, Genetic Variation, Genome-Wide Association Study/methods, Genotype, Humans, Theoretical Models, Phenotype, Whole Genome Sequencing, Workflow
15.
Proteins ; 87(12): 1011-1020, 2019 12.
Article in English | MEDLINE | ID: mdl-31589781

ABSTRACT

CASP (critical assessment of structure prediction) assesses the state of the art in modeling protein structure from amino acid sequence. The most recent experiment (CASP13 held in 2018) saw dramatic progress in structure modeling without use of structural templates (historically "ab initio" modeling). Progress was driven by the successful application of deep learning techniques to predict inter-residue distances. In turn, these results drove dramatic improvements in three-dimensional structure accuracy: with the proviso that there are an adequate number of sequences known for the protein family, the new methods essentially solve the long-standing problem of predicting the fold topology of monomeric proteins. Further, the number of sequences required in the alignment has fallen substantially. There is also substantial improvement in the accuracy of template-based models. Other areas (model refinement, accuracy estimation, and the structure of protein assemblies) have again yielded interesting results. CASP13 placed increased emphasis on the use of sparse data together with modeling, and chemical crosslinking, SAXS, and NMR all yielded more mature results. This paper summarizes the key outcomes of CASP13. The special issue of PROTEINS contains papers describing the CASP13 assessments in each modeling category and contributions from the participants.


Subjects
Amino Acid Sequence/genetics, Computational Biology, Protein Conformation, Proteins/ultrastructure, Humans, Molecular Models, Proteins/chemistry, Proteins/genetics, Small Angle Scattering, X-Ray Diffraction
17.
Hum Mutat ; 40(9): 1530-1545, 2019 09.
Article in English | MEDLINE | ID: mdl-31301157

ABSTRACT

Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges.


Subjects
Amino Acid Substitution, Computational Biology/methods, Cystathionine beta-Synthase/genetics, Cystathionine/metabolism, Cystathionine beta-Synthase/metabolism, Homocysteine/metabolism, Humans, Phenotype, Precision Medicine
18.
Hum Mutat ; 40(9): 1519-1529, 2019 09.
Article in English | MEDLINE | ID: mdl-31342580

ABSTRACT

The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016 invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α-N-acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to 0.61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population-scale analysis of disease epidemiology and rare variant association analysis.


Subjects
Acetylglucosaminidase/metabolism, Computational Biology/methods, Missense Mutation, Acetylglucosaminidase/genetics, Humans, Genetic Models, Regression Analysis
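
Note on the correlation metric in entry 18: the abstract reports Pearson's correlation between predicted and measured enzymatic activity. The sketch below shows how that coefficient can be computed for two paired lists. The function name `pearson_r` and the toy activity values are hypothetical, and the abstract does not say what software the assessors actually used.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sums of cross-products and squared deviations about the means.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_p * var_o)


# Hypothetical predicted vs. measured relative activities for five variants.
predicted_activity = [0.10, 0.40, 0.35, 0.80, 0.95]
measured_activity = [0.05, 0.55, 0.20, 0.60, 1.00]
print(round(pearson_r(predicted_activity, measured_activity), 3))
```

The same function applied to two sets of predictions, rather than predictions and measurements, gives the method-to-method correlation that the abstract compares against.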
19.
Hum Mutat ; 40(9): 1373-1391, 2019 09.
Article in English | MEDLINE | ID: mdl-31322791

ABSTRACT

Whole-genome sequencing (WGS) holds great potential as a diagnostic test. However, the majority of patients currently undergoing WGS lack a molecular diagnosis, largely due to the vast number of undiscovered disease genes and our inability to assess the pathogenicity of most genomic variants. The CAGI SickKids challenges attempted to address this knowledge gap by assessing state-of-the-art methods for clinical phenotype prediction from genomes. CAGI4 and CAGI5 participants were provided with WGS data and clinical descriptions of 25 and 24 undiagnosed patients from the SickKids Genome Clinic Project, respectively. Predictors were asked to identify primary and secondary causal variants. In addition, for CAGI5, groups had to match each genome to one of three disorder categories (neurologic, ophthalmologic, and connective), and separately to each patient. The performance of matching genomes to categories was no better than random but two groups performed significantly better than chance in matching genomes to patients. Two of the ten variants proposed by two groups in CAGI4 were deemed to be diagnostic, and several proposed pathogenic variants in CAGI5 are good candidates for phenotype expansion. We discuss implications for improving in silico assessment of genomic variants and identifying new disease genes.


Subjects
Computational Biology/methods, Genetic Variation, Undiagnosed Diseases/diagnosis, Adolescent, Child, Preschool Child, Computer Simulation, Genetic Databases, Female, Genetic Predisposition to Disease, Humans, Male, Phenotype, Undiagnosed Diseases/genetics, Whole Genome Sequencing
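
Note on "better than chance" in entry 19: one way to make the phrase concrete is to ask how likely a given number of correct genome-to-patient matches would be under purely random assignment. The sketch below computes that probability exactly. It assumes each submission is a one-to-one assignment of n genomes to n patients; the abstract does not describe the assessors' actual statistical test, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def derangements(m: int) -> int:
    """Number of permutations of m items that leave no item in place."""
    if m == 0:
        return 1
    if m == 1:
        return 0
    prev2, prev1 = 1, 0  # D(0), D(1)
    for i in range(2, m + 1):
        # Recurrence: D(i) = (i - 1) * (D(i - 1) + D(i - 2))
        prev2, prev1 = prev1, (i - 1) * (prev1 + prev2)
    return prev1


def p_at_least_k_matches(n: int, k: int) -> Fraction:
    """P(at least k correct matches) for a uniformly random one-to-one assignment."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    # Permutations with exactly j fixed points: choose the j, derange the rest.
    favourable = sum(comb(n, j) * derangements(n - j) for j in range(k, n + 1))
    return Fraction(favourable, factorial(n))


# Example: 24 genomes (the CAGI5 set size), at least 5 matched correctly.
print(float(p_at_least_k_matches(24, 5)))
```

Exact rational arithmetic is used so that small tail probabilities are not affected by floating-point rounding until the final conversion for display.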
20.
Hum Mutat ; 40(9): 1197-1201, 2019 09.
Article in English | MEDLINE | ID: mdl-31334884

ABSTRACT

Interpretation of genomic variation plays an essential role in the analysis of cancer and monogenic disease, and increasingly also in complex trait disease, with applications ranging from basic research to clinical decisions. Many computational impact prediction methods have been developed, yet the field lacks a clear consensus on their appropriate use and interpretation. The Critical Assessment of Genome Interpretation (CAGI, /'ka-je/) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make blind predictions of resulting phenotype. Independent assessors evaluate the predictions by comparing with experimental and clinical data. CAGI has completed five editions with the goals of establishing the state of the art in genome interpretation and of encouraging new methodological developments. This special issue (https://onlinelibrary.wiley.com/toc/10981004/2019/40/9) comprises reports from CAGI, focusing on the fifth edition that culminated in a conference that took place 5 to 7 July 2018. CAGI5 comprised 14 challenges and engaged hundreds of participants from a dozen countries. This edition had a notable increase in splicing and expression regulatory variant challenges, while also continuing challenges on clinical genomics, as well as complex disease datasets and missense variants in diseases ranging from cancer to Pompe disease to schizophrenia. Full information about CAGI is at https://genomeinterpretation.org.


Subjects
Computational Biology/methods, Human Genome, Algorithms, Congresses as Topic, Statistical Data Interpretation, Genomics, Humans, Precision Medicine