Search | VHL Search Portal

1.

More than just pattern recognition: Prediction of uncommon protein structure features by AI methods.

Herzberg, Osnat; Moult, John.

Proc Natl Acad Sci U S A ; 120(28): e2221745120, 2023 07 11.

Article in English | MEDLINE | ID: mdl-37399411

ABSTRACT

The CASP14 experiment demonstrated the extraordinary structure modeling capabilities of artificial intelligence (AI) methods. That result has ignited a fierce debate about what these methods are actually doing. One of the criticisms has been that the AI does not have any sense of the underlying physics but is merely performing pattern recognition. Here, we address that issue by analyzing the extent to which the methods identify rare structural motifs. The rationale underlying the approach is that a pattern recognition machine tends to choose the more frequently occurring motifs, whereas some sense of subtle energetic factors is required to choose infrequently occurring ones. To reduce the possibility of bias from related experimental structures and to minimize the effect of experimental errors, we examined only CASP14 target protein crystal structures determined to a resolution limit better than 2 Å, which lacked significant amino acid sequence homology to proteins of known structure. In those experimental structures and in the corresponding models, we track cis peptides, π-helices, 3₁₀-helices, and other small 3D motifs that occur in the PDB at a frequency lower than 1% of total amino acid residues. The best-performing AI method, AlphaFold2, captured these uncommon structural elements exquisitely well. All discrepancies appeared to be a consequence of crystal environment effects. We propose that the neural network learned a protein structure potential of mean force, enabling it to correctly identify situations where unusual structural features represent the lowest local free energy because of subtle influences from the atomic environment.

Subjects

Artificial Intelligence , Proteins , Amino Acid Sequence , Proteins/chemistry , Protein Structure, Secondary , Neural Networks, Computer , Protein Conformation

2.

Breaking the conformational ensemble barrier: Ensemble structure modeling challenges in CASP15.

Kryshtafovych, Andriy; Montelione, Gaetano T; Rigden, Daniel J; Mesdaghi, Shahram; Karaca, Ezgi; Moult, John.

Proteins ; 91(12): 1903-1911, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37872703

ABSTRACT

For the first time, the 2022 CASP (Critical Assessment of Structure Prediction) community experiment included a section on computing multiple conformations for protein and RNA structures. There was full or partial success in reproducing the ensembles for four of the nine targets, an encouraging result. For protein structures, enhanced sampling with variations of the AlphaFold2 deep learning method was by far the most effective approach. One substantial conformational change caused by a single mutation across a complex interface was accurately reproduced. In two other assembly modeling cases, methods succeeded in sampling conformations near to the experimental ones even though environmental factors were not included in the calculations. An experimentally derived flexibility ensemble allowed a single accurate RNA structure model to be identified. Difficulties included how to handle sparse or low-resolution experimental data and the current lack of effective methods for modeling RNA/protein complexes. However, these and other obstacles appear addressable.

Subjects

Proteins , RNA , Protein Conformation , Proteins/chemistry , Mutation

3.

Critical assessment of methods of protein structure prediction (CASP)-Round XV.

Kryshtafovych, Andriy; Schwede, Torsten; Topf, Maya; Fidelis, Krzysztof; Moult, John.

Proteins ; 91(12): 1539-1549, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37920879

ABSTRACT

Computing protein structure from amino acid sequence information has been a long-standing grand challenge. Critical assessment of structure prediction (CASP) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every 2 years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementation and combination with other methods. Second, using the standard AlphaFold2 protocol and default parameters only produces the highest quality result for about two thirds of the targets, and more extensive sampling is required for the others. The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures. Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning-derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable. CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also, for the first time, CASP included the computation of protein-ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear methods will continue to advance.

Subjects

Computational Biology , Proteins , Protein Conformation , Models, Molecular , Proteins/chemistry , Amino Acid Sequence , Computational Biology/methods

4.

New prediction categories in CASP15.

Kryshtafovych, Andriy; Antczak, Maciej; Szachniuk, Marta; Zok, Tomasz; Kretsch, Rachael C; Rangan, Ramya; Pham, Phillip; Das, Rhiju; Robin, Xavier; Studer, Gabriel; Durairaj, Janani; Eberhardt, Jerome; Sweeney, Aaron; Topf, Maya; Schwede, Torsten; Fidelis, Krzysztof; Moult, John.

Proteins ; 91(12): 1550-1557, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37306011

ABSTRACT

Prediction categories in the Critical Assessment of Structure Prediction (CASP) experiments change with the need to address specific problems in structure modeling. In CASP15, four new prediction categories were introduced: RNA structure, ligand-protein complexes, accuracy of oligomeric structures and their interfaces, and ensembles of alternative conformations. This paper lists technical specifications for these categories and describes their integration in the CASP data management system.

Subjects

Computational Biology , Proteins , Protein Conformation , Proteins/chemistry , Models, Molecular , Ligands

5.

RNA target highlights in CASP15: Evaluation of predicted models by structure providers.

Kretsch, Rachael C; Andersen, Ebbe S; Bujnicki, Janusz M; Chiu, Wah; Das, Rhiju; Luo, Bingnan; Masquida, Benoît; McRae, Ewan K S; Schroeder, Griffin M; Su, Zhaoming; Wedekind, Joseph E; Xu, Lily; Zhang, Kaiming; Zheludev, Ivan N; Moult, John; Kryshtafovych, Andriy.

Proteins ; 91(12): 1600-1615, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37466021

ABSTRACT

The first RNA category of the Critical Assessment of Techniques for Structure Prediction competition was only made possible because of the scientists who provided experimental structures to challenge the predictors. In this article, these scientists offer a unique and valuable analysis of both the successes and areas for improvement in the predicted models. All 10 RNA-only targets yielded predictions topologically similar to experimentally determined structures. For one target, experimentalists were able to phase their X-ray diffraction data by molecular replacement, showing a potential application of structure predictions for RNA structural biologists. Recommended areas for improvement include enhancing the accuracy of local interaction predictions and giving greater consideration to experimental conditions such as multimerization, structure determination method, and time along folding pathways. The prediction of RNA-protein complexes remains the most significant challenge. Finally, given the intrinsic flexibility of many RNAs, we propose the consideration of ensemble models.

Subjects

Computational Biology , Proteins , Protein Conformation , Proteins/chemistry , Models, Molecular , Computational Biology/methods , X-Ray Diffraction

6.

Protein target highlights in CASP15: Analysis of models by structure providers.

Alexander, Leila T; Durairaj, Janani; Kryshtafovych, Andriy; Abriata, Luciano A; Bayo, Yusupha; Bhabha, Gira; Breyton, Cécile; Caulton, Simon G; Chen, James; Degroux, Séraphine; Ekiert, Damian C; Erlandsen, Benedikte S; Freddolino, Peter L; Gilzer, Dominic; Greening, Chris; Grimes, Jonathan M; Grinter, Rhys; Gurusaran, Manickam; Hartmann, Marcus D; Hitchman, Charlie J; Keown, Jeremy R; Kropp, Ashleigh; Kursula, Petri; Lovering, Andrew L; Lemaitre, Bruno; Lia, Andrea; Liu, Shiheng; Logotheti, Maria; Lu, Shuze; Markússon, Sigurbjörn; Miller, Mitchell D; Minasov, George; Niemann, Hartmut H; Opazo, Felipe; Phillips, George N; Davies, Owen R; Rommelaere, Samuel; Rosas-Lemus, Monica; Roversi, Pietro; Satchell, Karla; Smith, Nathan; Wilson, Mark A; Wu, Kuan-Lin; Xia, Xian; Xiao, Han; Zhang, Wenhua; Zhou, Z Hong; Fidelis, Krzysztof; Topf, Maya; Moult, John.

Proteins ; 91(12): 1571-1599, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37493353

ABSTRACT

We present an in-depth analysis of selected CASP15 targets, focusing on their biological and functional significance. The authors of the structures identify and discuss key protein features and evaluate how effectively these aspects were captured in the submitted predictions. While the overall ability to predict three-dimensional protein structures continues to impress, reproducing uncommon features not previously observed in experimental structures is still a challenge. Furthermore, instances with conformational flexibility and large multimeric complexes highlight the need for novel scoring strategies to better emphasize biologically relevant structural regions. Looking ahead, closer integration of computational and experimental techniques will play a key role in determining the next challenges to be unraveled in the field of structural molecular biology.

Subjects

Computational Biology , Proteins , Protein Conformation , Models, Molecular , Computational Biology/methods , Proteins/chemistry

7.

MecCog: a knowledge representation framework for genetic disease mechanism.

Kundu, Kunal; Darden, Lindley; Moult, John.

Bioinformatics ; 37(22): 4180-4186, 2021 11 18.

Article in English | MEDLINE | ID: mdl-34117883

ABSTRACT

MOTIVATION: Experimental findings on genetic disease mechanisms are scattered throughout the literature and represented in many ways, including unstructured text, cartoons, pathway diagrams and network graphs. Integration and structuring of such mechanistic information greatly enhances its utility. RESULTS: MecCog is a graphical framework for building integrated representations (mechanism schemas) of mechanisms by which a genetic variant causes a disease phenotype. A MecCog mechanism schema displays the propagation of system perturbations across stages of biological organization, using graphical notations to symbolize perturbed entities and activities, hyperlinked evidence tagging, a mechanism ontology and depiction of knowledge gaps, ambiguities and uncertainties. The web platform enables a user to construct, store, publish, browse, query and comment on schemas. MecCog facilitates the identification of potential biomarkers, therapeutic intervention sites and critical future experiments. AVAILABILITY AND IMPLEMENTATION: The MecCog framework is freely available at http://www.meccog.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genetic Diseases, Inborn , Phenotype , Computational Biology

8.

Critical assessment of methods of protein structure prediction (CASP)-Round XIV.

Kryshtafovych, Andriy; Schwede, Torsten; Topf, Maya; Fidelis, Krzysztof; Moult, John.

Proteins ; 89(12): 1607-1617, 2021 12.

Article in English | MEDLINE | ID: mdl-34533838

ABSTRACT

Critical assessment of structure prediction (CASP) is a community experiment to advance methods of computing three-dimensional protein structure from amino acid sequence. Core components are rigorous blind testing of methods and evaluation of the results by independent assessors. In the most recent experiment (CASP14), deep-learning methods from one research group consistently delivered computed structures rivaling the corresponding experimental ones in accuracy. In this sense, the results represent a solution to the classical protein-folding problem, at least for single proteins. The models have already been shown to be capable of providing solutions for problematic crystal structures, and there are broad implications for the rest of structural biology. Other research groups also substantially improved performance. Here, we describe these results and outline some of the many implications. Other related areas of CASP, including modeling of protein complexes, structure refinement, estimation of model accuracy, and prediction of inter-residue contacts and distances, are also described.

Subjects

Protein Conformation , Protein Folding , Proteins , Software , Amino Acid Sequence , Computational Biology , Models, Statistical , Molecular Dynamics Simulation , Proteins/chemistry , Proteins/metabolism , Sequence Analysis, Protein

9.

Modeling SARS-CoV-2 proteins in the CASP-commons experiment.

Kryshtafovych, Andriy; Moult, John; Billings, Wendy M; Della Corte, Dennis; Fidelis, Krzysztof; Kwon, Sohee; Olechnovic, Kliment; Seok, Chaok; Venclovas, Ceslovas; Won, Jonghun.

Proteins ; 89(12): 1987-1996, 2021 12.

Article in English | MEDLINE | ID: mdl-34462960

ABSTRACT

Critical Assessment of Structure Prediction (CASP) is an organization aimed at advancing the state of the art in computing protein structure from sequence. In the spring of 2020, CASP launched a community project to compute the structures of the most structurally challenging proteins coded for in the SARS-CoV-2 genome. Forty-seven research groups submitted over 3000 three-dimensional models and 700 sets of accuracy estimates on 10 proteins. The resulting models were released to the public. CASP community members also worked together to provide estimates of local and global accuracy and identify structure-based domain boundaries for some proteins. Subsequently, two of these structures (ORF3a and ORF8) have been solved experimentally, allowing assessment of both model quality and the accuracy estimates. Models from the AlphaFold2 group were found to have good agreement with the experimental structures, with main chain GDT_TS accuracy scores ranging from 63 (a correct topology) to 87 (competitive with experiment).
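As a rough illustration of the GDT_TS scores quoted above, the sketch below averages the percentage of residues whose Cα atoms lie within 1, 2, 4, and 8 Å of their experimental positions. It is a simplification under stated assumptions: the distances are taken from one fixed superposition of model and experiment, whereas the assessment software searches over many superpositions for each cutoff. The function name `gdt_ts` and its input format are hypothetical and are not taken from the paper.

```python
def gdt_ts(distances):
    """Approximate GDT_TS from per-residue C-alpha distances (in angstroms).

    Assumption: distances come from a single, fixed superposition of the
    model onto the experimental structure. The real procedure optimizes the
    superposition separately for each cutoff, so this value is a lower bound.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    cutoffs = (1.0, 2.0, 4.0, 8.0)  # standard GDT_TS distance thresholds
    n = len(distances)
    # Fraction of residues within each cutoff
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    # GDT_TS is the mean of the four fractions, expressed on a 0-100 scale
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Illustrative distances only, not data from the study
    print(gdt_ts([0.5, 1.5, 3.0, 9.0]))
```

On this scale a perfect model scores 100, which is why values in the high 80s are described as competitive with experiment.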

Subjects

SARS-CoV-2/chemistry , Viral Proteins/chemistry , COVID-19/virology , Genome, Viral , Humans , Models, Molecular , Protein Conformation , Protein Domains , SARS-CoV-2/genetics , Viral Proteins/genetics , Viroporin Proteins/chemistry , Viroporin Proteins/genetics

10.

Computational models in the service of X-ray and cryo-electron microscopy structure determination.

Kryshtafovych, Andriy; Moult, John; Albrecht, Reinhard; Chang, Geoffrey A; Chao, Kinlin; Fraser, Alec; Greenfield, Julia; Hartmann, Marcus D; Herzberg, Osnat; Josts, Inokentijs; Leiman, Petr G; Linden, Sara B; Lupas, Andrei N; Nelson, Daniel C; Rees, Steven D; Shang, Xiaoran; Sokolova, Maria L; Tidow, Henning.

Proteins ; 89(12): 1633-1646, 2021 12.

Article in English | MEDLINE | ID: mdl-34449113

ABSTRACT

Critical assessment of structure prediction (CASP) conducts community experiments to determine the state of the art in computing protein structure from amino acid sequence. The process relies on the experimental community providing information about not yet public or about to be solved structures, for use as targets. For some targets, the experimental structure is not solved in time for use in CASP. Calculated structure accuracy improved dramatically in this round, implying that models should now be much more useful for resolving many sorts of experimental difficulties. To test this, selected models for seven unsolved targets were provided to the experimental groups. These models were from the AlphaFold2 group, who overall submitted the most accurate predictions in CASP14. Four targets were solved with the aid of the models, and, additionally, the structure of an already solved target was improved. An a posteriori analysis showed that, in some cases, models from other groups would also be effective. This paper provides accounts of the successful application of models to structure determination, including molecular replacement for X-ray crystallography, backbone tracing and sequence positioning in a cryo-electron microscopy structure, and correction of local features. The results suggest that, in future, there will be greatly increased synergy between computational and experimental approaches to structure determination.

Subjects

Computational Biology/methods , Cryoelectron Microscopy , Crystallography, X-Ray , Models, Molecular , Proteins/chemistry , Protein Conformation , Software

11.

Target highlights in CASP14: Analysis of models by structure providers.

Alexander, Leila T; Lepore, Rosalba; Kryshtafovych, Andriy; Adamopoulos, Athanassios; Alahuhta, Markus; Arvin, Ann M; Bomble, Yannick J; Böttcher, Bettina; Breyton, Cécile; Chiarini, Valerio; Chinnam, Naga Babu; Chiu, Wah; Fidelis, Krzysztof; Grinter, Rhys; Gupta, Gagan D; Hartmann, Marcus D; Hayes, Christopher S; Heidebrecht, Tatjana; Ilari, Andrea; Joachimiak, Andrzej; Kim, Youngchang; Linares, Romain; Lovering, Andrew L; Lunin, Vladimir V; Lupas, Andrei N; Makbul, Cihan; Michalska, Karolina; Moult, John; Mukherjee, Prasun K; Nutt, William Sam; Oliver, Stefan L; Perrakis, Anastassis; Stols, Lucy; Tainer, John A; Topf, Maya; Tsutakawa, Susan E; Valdivia-Delgado, Mauricio; Schwede, Torsten.

Proteins ; 89(12): 1647-1672, 2021 12.

Article in English | MEDLINE | ID: mdl-34561912

ABSTRACT

The biological and functional significance of selected Critical Assessment of Techniques for Protein Structure Prediction 14 (CASP14) targets is described by the authors of the structures. The authors highlight the most relevant features of the target proteins and discuss how well these features were reproduced in the respective submitted predictions. The overall ability to predict three-dimensional structures of proteins has improved remarkably in CASP14, and many difficult targets were modeled with impressive accuracy. For the first time in the history of CASP, the experimentalists not only highlighted that computational models can accurately reproduce the most critical structural features observed in their targets, but also envisaged that models could serve as guidance for further studies of biologically relevant properties of proteins.

Subjects

Models, Molecular , Protein Conformation , Proteins/chemistry , Software , Amino Acid Sequence , Computational Biology , Cryoelectron Microscopy , Crystallography, X-Ray , Sequence Analysis, Protein

12.

Matching whole genomes to rare genetic disorders: Identification of potential causative variants using phenotype-weighted knowledge in the CAGI SickKids5 clinical genomes challenge.

Pal, Lipika R; Kundu, Kunal; Yin, Yizhou; Moult, John.

Hum Mutat ; 41(2): 347-362, 2020 02.

Article in English | MEDLINE | ID: mdl-31680375

ABSTRACT

Precise identification of causative variants from whole-genome sequencing data, including both coding and noncoding variants, is challenging. The Critical Assessment of Genome Interpretation 5 SickKids clinical genome challenge provided an opportunity to assess our ability to extract such information. Participants in the challenge were required to match each of the 24 whole-genome sequences to the correct phenotypic profile and to identify the disease class of each genome. These are all rare disease cases that have resisted genetic diagnosis in a state-of-the-art pipeline. The patients have a range of eye, neurological, and connective-tissue disorders. We used a gene-centric approach to address this problem, assigning each gene a multiphenotype-matching score. Mutations in the top-scoring genes for each phenotype profile were ranked on a 6-point scale of pathogenicity probability, resulting in an approximately equal number of top-ranked coding and noncoding candidate variants overall. We were able to assign the correct disease class for 12 cases and the correct genome to a clinical profile for five cases. The challenge assessor found genes in three of these five cases as likely appropriate. In the postsubmission phase, after careful screening of the genes in the correct genome, we identified additional potential diagnostic variants, a high proportion of which are noncoding.

Subjects

Genetic Association Studies/methods , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genome, Human , Genomics/methods , Rare Diseases , Algorithms , Alleles , Genetic Variation , Genome-Wide Association Study/methods , Genotype , Humans , Models, Theoretical , Phenotype , Whole Genome Sequencing , Workflow

13.

Reports from the fifth edition of CAGI: The Critical Assessment of Genome Interpretation.

Andreoletti, Gaia; Pal, Lipika R; Moult, John; Brenner, Steven E.

Hum Mutat ; 40(9): 1197-1201, 2019 09.

Article in English | MEDLINE | ID: mdl-31334884

ABSTRACT

Interpretation of genomic variation plays an essential role in the analysis of cancer and monogenic disease, and increasingly also in complex trait disease, with applications ranging from basic research to clinical decisions. Many computational impact prediction methods have been developed, yet the field lacks a clear consensus on their appropriate use and interpretation. The Critical Assessment of Genome Interpretation (CAGI, /'ka-je/) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make blind predictions of resulting phenotype. Independent assessors evaluate the predictions by comparing with experimental and clinical data. CAGI has completed five editions with the goals of establishing the state of the art in genome interpretation and of encouraging new methodological developments. This special issue (https://onlinelibrary.wiley.com/toc/10981004/2019/40/9) comprises reports from CAGI, focusing on the fifth edition that culminated in a conference that took place 5 to 7 July 2018. CAGI5 comprised 14 challenges and engaged hundreds of participants from a dozen countries. This edition had a notable increase in splicing and expression regulatory variant challenges, while also continuing challenges on clinical genomics, as well as complex disease datasets and missense variants in diseases ranging from cancer to Pompe disease to schizophrenia. Full information about CAGI is at https://genomeinterpretation.org.

Subjects

Computational Biology/methods , Genome, Human , Algorithms , Congresses as Topic , Data Interpretation, Statistical , Genomics , Humans , Precision Medicine

14.

Assessment of methods for predicting the effects of PTEN and TPMT protein variants.

Pejaver, Vikas; Babbi, Giulia; Casadio, Rita; Folkman, Lukas; Katsonis, Panagiotis; Kundu, Kunal; Lichtarge, Olivier; Martelli, Pier Luigi; Miller, Maximilian; Moult, John; Pal, Lipika R; Savojardo, Castrense; Yin, Yizhou; Zhou, Yaoqi; Radivojac, Predrag; Bromberg, Yana.

Hum Mutat ; 40(9): 1495-1506, 2019 09.

Article in English | MEDLINE | ID: mdl-31184403

ABSTRACT

Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. While different methods emerged as top performers depending on the metric, it is nontrivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.

Subjects

Computational Biology/methods , Methyltransferases/chemistry , Mutation , PTEN Phosphohydrolase/chemistry , High-Throughput Nucleotide Sequencing , Humans , Methyltransferases/genetics , PTEN Phosphohydrolase/genetics , Protein Stability

15.

Assessment of patient clinical descriptions and pathogenic variants from gene panel sequences in the CAGI-5 intellectual disability challenge.

Carraro, Marco; Monzon, Alexander Miguel; Chiricosta, Luigi; Reggiani, Francesco; Aspromonte, Maria Cristina; Bellini, Mariagrazia; Pagel, Kymberleigh; Jiang, Yuxiang; Radivojac, Predrag; Kundu, Kunal; Pal, Lipika R; Yin, Yizhou; Limongelli, Ivan; Andreoletti, Gaia; Moult, John; Wilson, Stephen J; Katsonis, Panagiotis; Lichtarge, Olivier; Chen, Jingqi; Wang, Yaqiong; Hu, Zhiqiang; Brenner, Steven E; Ferrari, Carlo; Murgia, Alessandra; Tosatto, Silvio C E; Leonardi, Emanuela.

Hum Mutat ; 40(9): 1330-1345, 2019 09.

Article in English | MEDLINE | ID: mdl-31144778

ABSTRACT

The Critical Assessment of Genome Interpretation-5 intellectual disability challenge asked participants to use computational methods to predict patient clinical phenotypes and the causal variant(s) based on an analysis of their gene panel sequence data. Sequence data for 74 genes associated with intellectual disability (ID) and/or autism spectrum disorders (ASD) from a cohort of 150 patients with a range of neurodevelopmental manifestations (i.e., ID, autism, epilepsy, microcephaly, macrocephaly, hypotonia, ataxia) were made available for this challenge. For each patient, predictors had to report the causative variants and which of the seven phenotypes were present. Since neurodevelopmental disorders are characterized by strong comorbidity, tested individuals often present more than one pathological condition. Considering the overall clinical manifestation of each patient, the correct phenotype was predicted by at least one group for 93 individuals (62%). ID and ASD were the best predicted among the seven phenotypic traits. Also, causative or potentially pathogenic variants were predicted correctly by at least one group. However, the prediction of the correct causative variant seems to be insufficient to predict the correct phenotype. In some cases, the correct prediction was supported by rare or common variants in genes different from the causative one.

Subjects

Autism Spectrum Disorder/genetics , Computational Biology/methods , Intellectual Disability/genetics , Sequence Analysis, DNA/methods , Female , Genetic Predisposition to Disease , Humans , Male , Phenotype , Quantitative Trait Loci

16.

Performance of computational methods for the evaluation of pericentriolar material 1 missense variants in CAGI-5.

Monzon, Alexander Miguel; Carraro, Marco; Chiricosta, Luigi; Reggiani, Francesco; Han, James; Ozturk, Kivilcim; Wang, Yanran; Miller, Maximilian; Bromberg, Yana; Capriotti, Emidio; Savojardo, Castrense; Babbi, Giulia; Martelli, Pier L; Casadio, Rita; Katsonis, Panagiotis; Lichtarge, Olivier; Carter, Hannah; Kousi, Maria; Katsanis, Nicholas; Andreoletti, Gaia; Moult, John; Brenner, Steven E; Ferrari, Carlo; Leonardi, Emanuela; Tosatto, Silvio C E.

Hum Mutat ; 40(9): 1474-1485, 2019 09.

Article in English | MEDLINE | ID: mdl-31260570

ABSTRACT

The CAGI-5 pericentriolar material 1 (PCM1) challenge aimed to predict the effect of 38 transgenic human missense mutations in the PCM1 protein implicated in schizophrenia. Participants were provided with 16 benign variants (negative controls), 10 hypomorphic, and 12 loss-of-function variants. Six groups participated and were asked to predict the probability of effect and the standard deviation associated with each mutation. Here, we present the challenge assessment. Prediction performance was evaluated using different measures to arrive at a final ranking that highlights the strengths and weaknesses of each group. The results show a great variety of predictions, with some methods performing significantly better than others. Benign variants played an important role as negative controls, highlighting predictors biased to identify disease phenotypes. The best predictor, Bromberg lab, used a neural-network-based method able to discriminate between neutral and non-neutral single nucleotide polymorphisms. The CAGI-5 PCM1 challenge allowed us to evaluate the state-of-the-art techniques for interpreting the effect of novel variants for a difficult target protein.

Subjects

Autoantigens/genetics , Cell Cycle Proteins/genetics , Computational Biology/methods , Mutation, Missense , Schizophrenia/genetics , Databases, Genetic , Genetic Predisposition to Disease , Humans , Neural Networks, Computer , Phenotype , Polymorphism, Single Nucleotide

17.

Predicting venous thromboembolism risk from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges.

McInnes, Gregory; Daneshjou, Roxana; Katsonis, Panagiotis; Lichtarge, Olivier; Srinivasan, Rajgopal; Rana, Sadhna; Radivojac, Predrag; Mooney, Sean D; Pagel, Kymberleigh A; Stamboulian, Moses; Jiang, Yuxiang; Capriotti, Emidio; Wang, Yanran; Bromberg, Yana; Bovo, Samuele; Savojardo, Castrense; Martelli, Pier Luigi; Casadio, Rita; Pal, Lipika R; Moult, John; Brenner, Steven E; Altman, Russ.

Hum Mutat ; 40(9): 1314-1320, 2019 09.

Article in English | MEDLINE | ID: mdl-31140652

ABSTRACT

Genetics plays a key role in venous thromboembolism (VTE) risk; however, established risk factors in European populations do not translate to individuals of African descent because of the differences in allele frequencies between populations. As part of the fifth iteration of the Critical Assessment of Genome Interpretation, participants were asked to predict VTE status in exome data from African American subjects. Participants were provided with 103 unlabeled exomes from patients treated with warfarin for non-VTE causes or VTE and asked to predict which disease each subject had been treated for. Given the lack of training data, many participants opted to use unsupervised machine learning methods, clustering the exomes by variation in genes known to be associated with VTE. The best-performing method using only VTE-related genes achieved an area under the ROC curve of 0.65. Here, we discuss the range of methods used in the prediction of VTE from sequence data and explore some of the difficulties of conducting a challenge with known confounders. In addition, we show that an existing genetic risk score for VTE that was developed in European subjects works well in African Americans.
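For readers unfamiliar with the area under the ROC curve reported above, the following minimal sketch computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions for illustration; they are not the challenge data or the assessors' code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 1 (positive, e.g. VTE) or 0 (negative)
    scores: predicted scores, higher meaning more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count positive-negative pairs ranked correctly; ties count as half
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example only, not data from the challenge
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.65 indicates modest discrimination.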

Subjects

Exome Sequencing/methods, Venous Thromboembolism/genetics, Warfarin/administration & dosage, Cluster Analysis, Computational Biology/methods, Congresses as Topic, Female, Genetic Predisposition to Disease, Humans, Male, ROC Curve, Unsupervised Machine Learning, Venous Thromboembolism/drug therapy, Warfarin/therapeutic use

18.

CAGI SickKids challenges: Assessment of phenotype and variant predictions derived from clinical and genomic data of children with undiagnosed diseases.

Kasak, Laura; Hunter, Jesse M; Udani, Rupa; Bakolitsa, Constantina; Hu, Zhiqiang; Adhikari, Aashish N; Babbi, Giulia; Casadio, Rita; Gough, Julian; Guerrero, Rafael F; Jiang, Yuxiang; Joseph, Thomas; Katsonis, Panagiotis; Kotte, Sujatha; Kundu, Kunal; Lichtarge, Olivier; Martelli, Pier Luigi; Mooney, Sean D; Moult, John; Pal, Lipika R; Poitras, Jennifer; Radivojac, Predrag; Rao, Aditya; Sivadasan, Naveen; Sunderam, Uma; Saipradeep, V G; Yin, Yizhou; Zaucha, Jan; Brenner, Steven E; Meyn, M Stephen.

Hum Mutat ; 40(9): 1373-1391, 2019 09.

Article in English | MEDLINE | ID: mdl-31322791

ABSTRACT

Whole-genome sequencing (WGS) holds great potential as a diagnostic test. However, the majority of patients currently undergoing WGS lack a molecular diagnosis, largely due to the vast number of undiscovered disease genes and our inability to assess the pathogenicity of most genomic variants. The CAGI SickKids challenges attempted to address this knowledge gap by assessing state-of-the-art methods for clinical phenotype prediction from genomes. CAGI4 and CAGI5 participants were provided with WGS data and clinical descriptions of 25 and 24 undiagnosed patients from the SickKids Genome Clinic Project, respectively. Predictors were asked to identify primary and secondary causal variants. In addition, for CAGI5, groups had to match each genome to one of three disorder categories (neurologic, ophthalmologic, and connective), and separately to each patient. The performance of matching genomes to categories was no better than random, but two groups performed significantly better than chance in matching genomes to patients. Two of the ten variants proposed by two groups in CAGI4 were deemed to be diagnostic, and several proposed pathogenic variants in CAGI5 are good candidates for phenotype expansion. We discuss implications for improving in silico assessment of genomic variants and identifying new disease genes.

Subjects

Computational Biology/methods, Genetic Variation, Undiagnosed Diseases/diagnosis, Adolescent, Child, Preschool Child, Computer Simulation, Genetic Databases, Female, Genetic Predisposition to Disease, Humans, Male, Phenotype, Undiagnosed Diseases/genetics, Whole Genome Sequencing

19.

Assessing computational predictions of the phenotypic effect of cystathionine-beta-synthase variants.

Kasak, Laura; Bakolitsa, Constantina; Hu, Zhiqiang; Yu, Changhua; Rine, Jasper; Dimster-Denk, Dago F; Pandey, Gaurav; De Baets, Greet; Bromberg, Yana; Cao, Chen; Capriotti, Emidio; Casadio, Rita; Van Durme, Joost; Giollo, Manuel; Karchin, Rachel; Katsonis, Panagiotis; Leonardi, Emanuela; Lichtarge, Olivier; Martelli, Pier Luigi; Masica, David; Mooney, Sean D; Olatubosun, Ayodeji; Radivojac, Predrag; Rousseau, Frederic; Pal, Lipika R; Savojardo, Castrense; Schymkowitz, Joost; Thusberg, Janita; Tosatto, Silvio C E; Vihinen, Mauno; Väliaho, Jouni; Repo, Susanna; Moult, John; Brenner, Steven E; Friedberg, Iddo.

Hum Mutat ; 40(9): 1530-1545, 2019 09.

Article in English | MEDLINE | ID: mdl-31301157

ABSTRACT

Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine; variations in CBS are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, and identify methodological and experimental constraints with lessons to be learned for future challenges.

Subjects

Amino Acid Substitution, Computational Biology/methods, Cystathionine beta-Synthase/genetics, Cystathionine/metabolism, Cystathionine beta-Synthase/metabolism, Homocysteine/metabolism, Humans, Phenotype, Precision Medicine

20.

Assessment of predicted enzymatic activity of α-N-acetylglucosaminidase variants of unknown significance for CAGI 2016.

Clark, Wyatt T; Kasak, Laura; Bakolitsa, Constantina; Hu, Zhiqiang; Andreoletti, Gaia; Babbi, Giulia; Bromberg, Yana; Casadio, Rita; Dunbrack, Roland; Folkman, Lukas; Ford, Colby T; Jones, David; Katsonis, Panagiotis; Kundu, Kunal; Lichtarge, Olivier; Martelli, Pier L; Mooney, Sean D; Nodzak, Conor; Pal, Lipika R; Radivojac, Predrag; Savojardo, Castrense; Shi, Xinghua; Zhou, Yaoqi; Uppal, Aneeta; Xu, Qifang; Yin, Yizhou; Pejaver, Vikas; Wang, Meng; Wei, Liping; Moult, John; Yu, Guoying Karen; Brenner, Steven E; LeBowitz, Jonathan H.

Hum Mutat ; 40(9): 1519-1529, 2019 09.

Article in English | MEDLINE | ID: mdl-31342580

ABSTRACT

The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016 invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α-N-acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to 0.61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population-scale analysis of disease epidemiology and rare variant association analysis.

Subjects

Acetylglucosaminidase/metabolism, Computational Biology/methods, Missense Mutation, Acetylglucosaminidase/genetics, Humans, Genetic Models, Regression Analysis
