Search | BVS Regional Portal

Physics-driven structural docking and protein language models accelerate antibody screening and design for broad-spectrum antiviral therapy.

Almubarak, Hannah Faisal; Tan, Wuwei; Hoffmann, Andrew D; Wei, Juncheng; El-Shennawy, Lamiaa; Squires, Joshua R; Sun, Yuanfei; Dashzeveg, Nurmaa K; Simonton, Brooke; Jia, Yuzhi; Iyer, Radhika; Xu, Yanan; Nicolaescu, Vlad; Elli, Derek; Randall, Glenn C; Schipma, Matthew J; Swaminathan, Suchitra; Ison, Michael G; Liu, Huiping; Fang, Deyu; Shen, Yang.

bioRxiv ; 2024 Mar 04.

Article in English | MEDLINE | ID: mdl-38496411

ABSTRACT

Therapeutic antibodies have become one of the most influential classes of therapeutics in modern medicine for fighting infectious pathogens, cancer, and many other diseases. However, experimental screening for highly efficacious targeting antibodies is labor-intensive and costly, a problem exacerbated by antigen targets that evolve under selective pressure, such as fast-mutating viral variants. As a proof of concept, we developed a machine learning-assisted antibody generation pipeline that greatly accelerates the screening and redesign of immunoglobulin G (IgG) antibodies against a broad spectrum of SARS-CoV-2 variant strains. These viruses infect human host cells through binding of the viral spike protein to the host cell receptor angiotensin-converting enzyme 2 (ACE2). Using over 1300 sequences of IgGs that bind the spike's receptor-binding domain (RBD), derived from B cells of convalescent patients, we first established protein structural docking models to assess the RBD-IgG-ACE2 interaction interfaces and to predict the virus-neutralizing activity of each IgG with a confidence score. Additionally, employing Gaussian process regression (also known as Kriging) in the latent space of an antibody language model, we predicted the landscape of IgG activity profiles against individual coronavirus variants of concern. With functional analyses and experimental validation, we efficiently prioritized IgG candidates that neutralize a broad spectrum of viral variants (wild type, Delta, and Omicron) and prevent infection of host cells in vitro and of hACE2 transgenic mice in vivo. Furthermore, the computational analyses enabled rational redesign of selected IgG clones with single amino acid substitutions at the RBD-binding interface, improving IgG blockade efficacy against Delta (B.1.617), one of the severe, therapy-resistant strains.
Our work expedites applications of artificial intelligence to antibody screening and redesign, even in low-data regimes, by combining protein language models and Kriging for antibody sequence analysis, activity prediction, and efficacy improvement with physics-driven protein docking models for analyzing and functionally optimizing antibody-antigen interface structures.
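The Gaussian process regression (Kriging) step described above can be illustrated with a minimal NumPy sketch. It assumes a squared-exponential kernel, a constant prior mean equal to the training average, and fixed hyperparameters; the embeddings and activity values are hypothetical random stand-ins, not the study's data, language model, or pipeline. The posterior mean plays the role of a predicted activity, and the posterior variance flags sequences the model is unsure about.

```python
import numpy as np

def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 1.0,
               variance: float = 1.0) -> np.ndarray:
    """Squared-exponential (RBF) kernel between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return variance * np.exp(-0.5 * sq_dists / length_scale ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2, length_scale=1.0, variance=1.0):
    """GP posterior mean and variance at x_test (prior mean = training average)."""
    y_mean = float(np.mean(y_train))
    k_xx = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    k_xs = rbf_kernel(x_train, x_test, length_scale, variance)
    chol = np.linalg.cholesky(k_xx)  # k_xx = L L^T
    # alpha = k_xx^{-1} (y - mean), via two solves with the Cholesky factor
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = y_mean + k_xs.T @ alpha
    v = np.linalg.solve(chol, k_xs)
    var = variance - np.sum(v ** 2, axis=0)  # RBF prior variance is constant on the diagonal
    return mean, np.maximum(var, 0.0)

# Hypothetical usage: 20 "embedded" antibodies with made-up activity values.
rng = np.random.default_rng(0)
emb_train = rng.normal(size=(20, 8))
activity = np.sin(emb_train[:, 0])
emb_new = rng.normal(size=(5, 8))
pred_mean, pred_var = gp_posterior(emb_train, activity, emb_new, length_scale=2.0)
```

In a screening loop, candidates with a high predicted mean are natural picks for validation, while a high variance marks regions of the embedding space where the few labeled examples say little.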

Structure-Informed Protein Language Models are Robust Predictors for Variant Effects.

Sun, Yuanfei; Shen, Yang.

Res Sq ; 2023 Aug 03.

Article in English | MEDLINE | ID: mdl-37577664

ABSTRACT

Predicting protein variant effects through machine learning is often challenged by the scarcity of experimentally measured effect labels. Recently, protein language models (pLMs) have emerged as zero-shot predictors that need no effect labels, modeling instead the evolutionary distribution of functional protein sequences. However, biological contexts important to variant effects are only implicitly modeled and effectively marginalized. By assessing the sequence awareness and the structure awareness of pLMs, we find that improvements in either often correlate with better variant effect prediction, but the tradeoff between them can present a barrier, as observed when models are over-finetuned to sequences of a specific family. We introduce a framework of structure-informed pLMs (SI-pLMs) that injects protein structural contexts purposely and controllably by extending the masked sequence denoising of conventional pLMs to cross-modality denoising. The SI-pLM framework can be applied to revise any sequence-only pLM through its model architecture and training objectives. SI-pLMs do not require structure data as model inputs for variant effect prediction; structures serve only as context providers and model regularizers during training. Numerical results on deep mutational scanning benchmarks show that our SI-pLMs, despite their relatively compact sizes, are robustly top performers against competing methods, including other pLMs, regardless of the evolutionary information content of the target protein family or its tendency toward overfitting or over-finetuning. Learned distributions in structural contexts could enhance sequence distributions in predicting variant effects. Ablation studies reveal the major contributing factors, and analyses of sequence embeddings provide further insights. The data and scripts are available at https://github.com/Stephen2526/Structure-informed_PLM.git.
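Zero-shot variant effect prediction with a masked pLM is commonly done by masking the mutated position and comparing the model's log-probabilities of the mutant and wild-type residues there. The sketch below shows that masked-marginal scoring rule under stated assumptions: `MASK`, `toy_composition_model`, and the 1-indexed mutation string format (e.g. `A4G`) are hypothetical, and the toy model (add-one-smoothed residue frequencies of the unmasked context) merely stands in for a real pLM or SI-pLM. It is not the authors' scoring code.

```python
import math
from typing import Callable, Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "#"  # hypothetical mask character for this sketch

# A masked model maps (masked sequence, masked index) -> log-probability per residue.
MaskedModel = Callable[[str, int], Dict[str, float]]

def masked_marginal_score(seq: str, mutation: str, model: MaskedModel) -> float:
    """Return log p(mutant) - log p(wild type) at the masked site; mutation like 'A4G'."""
    wt, pos, mt = mutation[0], int(mutation[1:-1]), mutation[-1]
    idx = pos - 1  # convert 1-indexed position to a 0-indexed string offset
    if seq[idx] != wt:
        raise ValueError(f"wild-type mismatch at position {pos}: {seq[idx]} != {wt}")
    masked = seq[:idx] + MASK + seq[idx + 1:]
    log_probs = model(masked, idx)
    return log_probs[mt] - log_probs[wt]

def toy_composition_model(masked_seq: str, idx: int) -> Dict[str, float]:
    """Stand-in for a pLM (ignores idx): smoothed residue frequencies of the context."""
    counts = {aa: 1.0 for aa in AMINO_ACIDS}  # add-one pseudocounts
    for aa in masked_seq:
        if aa in counts:  # the mask character is skipped
            counts[aa] += 1.0
    total = sum(counts.values())
    return {aa: math.log(c / total) for aa, c in counts.items()}

# One A remains in the masked context (count 2 with smoothing) versus G (count 1),
# so the score is log(1/2), about -0.693.
print(masked_marginal_score("MKTAYIAKQR", "A4G", toy_composition_model))
```

Negative scores mean the model finds the mutant less plausible than the wild type in that context, which is the usual reading as a predicted deleterious effect.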

The ENCODE Imputation Challenge: a critical assessment of methods for cross-cell type imputation of epigenomic profiles.

Schreiber, Jacob; Boix, Carles; Wook Lee, Jin; Li, Hongyang; Guan, Yuanfang; Chang, Chun-Chieh; Chang, Jen-Chien; Hawkins-Hooker, Alex; Schölkopf, Bernhard; Schweikert, Gabriele; Carulla, Mateo Rojas; Canakoglu, Arif; Guzzo, Francesco; Nanni, Luca; Masseroli, Marco; Carman, Mark James; Pinoli, Pietro; Hong, Chenyang; Yip, Kevin Y; Spence, Jeffrey P; Batra, Sanjit Singh; Song, Yun S; Mahony, Shaun; Zhang, Zheng; Tan, Wuwei; Shen, Yang; Sun, Yuanfei; Shi, Minyi; Adrian, Jessika; Sandstrom, Richard; Farrell, Nina; Halow, Jessica; Lee, Kristen; Jiang, Lixia; Yang, Xinqiong; Epstein, Charles; Strattan, J Seth; Bernstein, Bradley; Snyder, Michael; Kellis, Manolis; Stafford, William; Kundaje, Anshul.

Genome Biol ; 24(1): 79, 2023 04 18.

Article in English | MEDLINE | ID: mdl-37072822

ABSTRACT

A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.

Subject(s)

Algorithms , Epigenomics , Genomics/methods

Assessment of blind predictions of the clinical significance of BRCA1 and BRCA2 variants.

Cline, Melissa S; Babbi, Giulia; Bonache, Sandra; Cao, Yue; Casadio, Rita; de la Cruz, Xavier; Díez, Orland; Gutiérrez-Enríquez, Sara; Katsonis, Panagiotis; Lai, Carmen; Lichtarge, Olivier; Martelli, Pier L; Mishne, Gilad; Moles-Fernández, Alejandro; Montalban, Gemma; Mooney, Sean D; O'Conner, Robert; Ootes, Lars; Özkan, Selen; Padilla, Natalia; Pagel, Kymberleigh A; Pejaver, Vikas; Radivojac, Predrag; Riera, Casandra; Savojardo, Castrense; Shen, Yang; Sun, Yuanfei; Topper, Scott; Parsons, Michael T; Spurdle, Amanda B; Goldgar, David E.

Hum Mutat ; 40(9): 1546-1556, 2019 09.

Article in English | MEDLINE | ID: mdl-31294896

ABSTRACT

Testing for variation in BRCA1 and BRCA2 (commonly referred to as BRCA1/2) has emerged as standard clinical practice and is helping countless women better understand and manage their heritable risk of breast and ovarian cancer. Yet the increased rate of BRCA1/2 testing has led to a growing number of Variants of Uncertain Significance (VUS), and the rate of VUS discovery currently outpaces the rate of clinical variant interpretation. Computational prediction is a key component of the variant interpretation pipeline. In the CAGI5 ENIGMA Challenge, six prediction teams submitted predictions on 326 newly interpreted variants from the ENIGMA Consortium. By evaluating these predictions against the new interpretations, we have gained a number of insights into the state of the art of variant prediction and identified specific steps to advance it further.

Subject(s)

BRCA1 Protein/genetics , BRCA2 Protein/genetics , Breast Neoplasms/diagnosis , Computational Biology/methods , Ovarian Neoplasms/diagnosis , Breast Neoplasms/genetics , Early Detection of Cancer , Female , Genetic Predisposition to Disease , Genetic Testing , Genetic Variation , Humans , Models, Genetic , Ovarian Neoplasms/genetics

Assessing the performance of in silico methods for predicting the pathogenicity of variants in the gene CHEK2, among Hispanic females with breast cancer.

Voskanian, Alin; Katsonis, Panagiotis; Lichtarge, Olivier; Pejaver, Vikas; Radivojac, Predrag; Mooney, Sean D; Capriotti, Emidio; Bromberg, Yana; Wang, Yanran; Miller, Max; Martelli, Pier Luigi; Savojardo, Castrense; Babbi, Giulia; Casadio, Rita; Cao, Yue; Sun, Yuanfei; Shen, Yang; Garg, Aditi; Pal, Debnath; Yu, Yao; Huff, Chad D; Tavtigian, Sean V; Young, Erin; Neuhausen, Susan L; Ziv, Elad; Pal, Lipika R; Andreoletti, Gaia; Brenner, Steven E; Kann, Maricel G.

Hum Mutat ; 40(9): 1612-1622, 2019 09.

Article in English | MEDLINE | ID: mdl-31241222

ABSTRACT

The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment for Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships.

Subject(s)

Breast Neoplasms/genetics , Checkpoint Kinase 2/genetics , Computational Biology/methods , Hispanic or Latino/genetics , Polymorphism, Single Nucleotide , Adult , Aged , Breast Neoplasms/ethnology , Case-Control Studies , Computer Simulation , Female , Genetic Predisposition to Disease , Humans , Linear Models , Middle Aged , United States/ethnology , Exome Sequencing

Predicting pathogenicity of missense variants with weakly supervised regression.

Cao, Yue; Sun, Yuanfei; Karimi, Mostafa; Chen, Haoran; Moronfoye, Oluwaseyi; Shen, Yang.

Hum Mutat ; 40(9): 1579-1592, 2019 09.

Article in English | MEDLINE | ID: mdl-31144781

ABSTRACT

Quickly growing genetic variation data of unknown clinical significance demand computational methods that can reliably predict clinical phenotypes and deeply unravel molecular mechanisms. On the platform enabled by the Critical Assessment of Genome Interpretation (CAGI), we develop a novel "weakly supervised" regression (WSR) model that not only predicts precise clinical significance (probability of pathogenicity) from inexact training annotations (class of pathogenicity) but also infers underlying molecular mechanisms in a variant-specific manner. Compared to multiclass logistic regression, a representative multiclass classifier, our kernelized WSR improves the performance for the ENIGMA Challenge set from 0.72 to 0.97 in binary area under the receiver operating characteristic curve (AUC) and from 0.64 to 0.80 in ordinal multiclass AUC. WSR model interpretation and protein structural interpretation reach consensus in corroborating the most probable molecular mechanisms by which some pathogenic BRCA1 variants confer clinical significance, namely metal-binding disruption for p.C44F and p.C47Y, protein-binding disruption for p.M18T, and structure destabilization for p.S1715N.

Subject(s)

BRCA1 Protein/genetics , Computational Biology/methods , Mutation, Missense , Area Under Curve , Genetic Predisposition to Disease , Humans , Logistic Models , Machine Learning , Models, Genetic , Phenotype
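The weakly supervised regression record above reports performance as binary AUC. As a reference for what that number means, AUC equals the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one (the Mann-Whitney formulation), with ties counted as one half. The sketch below computes exactly that by brute force over all positive/negative pairs; the scores and labels are hypothetical, and the ordinal multiclass AUC also quoted in the abstract is not reproduced here.

```python
from typing import Sequence

def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of pathogenicity (1 = pathogenic, 0 = benign).
print(binary_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 3 of 4 pairs ranked correctly: 0.75
```

The pairwise loop is quadratic in the number of variants; for large sets, sorting the scores and using rank sums gives the same value in O(n log n).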

Predicting protein conformational changes for unbound and homology docking: learning from intrinsic and induced flexibility.

Chen, Haoran; Sun, Yuanfei; Shen, Yang.

Proteins ; 85(3): 544-556, 2017 03.

Article in English | MEDLINE | ID: mdl-27862345

ABSTRACT

Predicting protein conformational changes from unbound structures or even homology models to bound structures remains a critical challenge for protein docking. Here we present a study directly addressing the challenge by reducing the dimensionality and narrowing the range of the corresponding conformational space. The study builds on cNMA, our new framework of partner- and contact-specific normal mode analysis, which exploits encounter complexes and considers both intrinsic and induced flexibility. First, using a CAPRI (Critical Assessment of PRedicted Interactions) target set, we established that the direction of conformational changes from unbound structures and homology models can be reproduced to a great extent by a small set of cNMA modes. In particular, the homology-to-bound interface root-mean-square deviation (iRMSD) can be reduced by 40% on average with the slowest 30 modes. Second, we developed novel and interpretable features from cNMA and used various machine learning approaches to predict the extent of conformational changes. The models learned from a set of unbound-to-bound conformational changes could predict the actual extent of iRMSD with errors around 0.6 Å for unbound proteins in a held-out benchmark subset, around 0.8 Å for unbound proteins in the CAPRI set, and around 1 Å even for homology models in the CAPRI set. Our results shed new light on the origins of conformational differences between homology models and bound structures and provide new support for the low dimensionality of conformational adjustment during protein associations. The results also provide new tools for ensemble generation and conformational sampling in unbound and homology docking. Proteins 2017; 85:544-556. © 2016 Wiley Periodicals, Inc.

Subject(s)

Computational Biology/methods , Machine Learning , Models, Statistical , Molecular Docking Simulation/methods , Proteins/chemistry , Software , Benchmarking , Binding Sites , Dimensional Measurement Accuracy , Protein Binding , Protein Conformation , Protein Multimerization , Research Design , Structural Homology, Protein , Thermodynamics
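The conformational-change record above rests on the idea that a handful of slow normal modes can span much of an observed displacement. That idea can be sketched with a plain anisotropic network model (ANM), which is a swapped-in, simpler technique: unlike cNMA, it ignores the binding partner, encounter complexes, and contact-specific springs. The 15 Å cutoff, uniform spring constant, and random coordinates are illustrative assumptions, and the displacement is assumed to be already superposed so that rigid-body motion has been removed.

```python
import numpy as np

def anm_hessian(coords: np.ndarray, cutoff: float = 15.0, gamma: float = 1.0) -> np.ndarray:
    """3N x 3N Hessian of an anisotropic network model over (N, 3) coordinates."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = float(d @ d)
            if dist2 > cutoff ** 2:
                continue  # no spring between distant nodes
            block = -gamma * np.outer(d, d) / dist2
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess

def slowest_modes(coords: np.ndarray, k: int, cutoff: float = 15.0) -> np.ndarray:
    """Orthonormal (3N, k) matrix of the k slowest non-rigid modes."""
    _, evecs = np.linalg.eigh(anm_hessian(coords, cutoff))
    # eigh sorts eigenvalues ascending; the first six (~0) are rigid-body
    # translations/rotations, assuming a connected, non-collinear network.
    return evecs[:, 6:6 + k]

def rmsd(disp: np.ndarray) -> float:
    """Root-mean-square per-node displacement."""
    return float(np.sqrt(np.mean(np.sum(disp.reshape(-1, 3) ** 2, axis=1))))

def residual_after_projection(displacement: np.ndarray, modes: np.ndarray) -> np.ndarray:
    """Part of the displacement not captured by the span of the given modes."""
    flat = displacement.ravel()
    return (flat - modes @ (modes.T @ flat)).reshape(-1, 3)

# Hypothetical usage: 30 random "C-alpha" positions (Å) and a small superposed change.
rng = np.random.default_rng(0)
unbound = rng.uniform(0.0, 20.0, size=(30, 3))
bound = unbound + rng.normal(scale=0.5, size=unbound.shape)
modes = slowest_modes(unbound, k=30)
disp = bound - unbound
print(rmsd(disp), rmsd(residual_after_projection(disp, modes)))
```

Because the modes are orthonormal and the subspaces are nested, the residual RMSD can only shrink as more slow modes are added, which is what makes "how much of the change do the slowest k modes explain" a well-posed question.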