ABSTRACT
Therapeutic antibodies have become one of the most influential classes of therapeutics in modern medicine for fighting infectious pathogens, cancer, and many other diseases. However, experimental screening for highly efficacious targeting antibodies is labor-intensive and costly, a burden exacerbated by antigen targets that evolve under selective pressure, such as fast-mutating viral variants. As a proof of concept, we developed a machine learning-assisted antibody generation pipeline, AbGen, that greatly accelerates the screening and redesign of immunoglobulin G antibodies (IgGs) against a broad spectrum of SARS-CoV-2 variant strains. AbGen centers on a novel antibody language model (AbLM) that is pretrained on 12 million generic protein domain sequences and fine-tuned on 4,000+ paired VH-VL sequences, with IgG-specific CDR masking and VH-VL cross-attention. AbLM provides a latent space of IgG sequence embeddings for AbGen, in which (a) landscapes of IgGs' activities in neutralizing the wild-type virus are analyzed through structure prediction for IgGs and for IgG-antigen interactions (the antigen being the receptor-binding domain, RBD, of the viral spike protein); and (b) landscapes of IgGs' susceptibility to variant viruses are predicted through Gaussian process regression, even though responses to variants of concern are available for as few as 14 clinical antibodies. The AbGen pipeline was applied to over 1,300 IgG sequences that we collected from RBD-binding B cells of convalescent patients. With experimental validation, AbGen efficiently prioritized IgG candidates against a broad spectrum of viral variants (wild-type, Delta, and Omicron) that prevented infection of host cells in vitro and of hACE2 transgenic mice in vivo. Compared with existing protein language models that require 10-100 times more model parameters, AbLM improved the precision of predicting IgGs with low variant susceptibility from around 50% to 75%.
Furthermore, AbGen enables structure-based computational protein redesign for selected IgG clones, with single amino acid substitutions at the RBD-binding interface that doubled the IgG blockade efficacy against one of the severe, therapy-resistant strains, Delta (B.1.617). Our work expedites applications of artificial intelligence in antibody screening and redesign by combining data-driven protein language models and Kriging for antibody sequence analysis and activity prediction, in synergy with physics-driven protein docking and design for antibody-antigen interface analysis and functional optimization.
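The abstract names Gaussian process regression (Kriging) over the AbLM embedding space as the way variant susceptibility is predicted from very few labeled antibodies. A minimal sketch of that idea follows, assuming a zero-mean Gaussian process with a squared-exponential kernel. The kernel choice, length scale, noise term, toy 2-D "embeddings", and all function names are illustrative assumptions, not AbGen's actual settings.

```python
import math

def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel between two embedding vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predict(train_x, train_y, query_x, length_scale=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at one query embedding."""
    n = len(train_x)
    K = [[rbf_kernel(train_x[i], train_x[j], length_scale)
          + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    k_star = [rbf_kernel(x, query_x, length_scale) for x in train_x]
    alpha = solve(K, train_y)                      # K^{-1} y
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)                           # K^{-1} k_*
    var = rbf_kernel(query_x, query_x, length_scale) - sum(
        k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)

# Toy data: three labeled "antibodies" with hypothetical susceptibility scores.
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
train_y = [0.2, 0.8, 0.5]
mean_near, var_near = gp_predict(train_x, train_y, [1.0, 0.0])
mean_far, var_far = gp_predict(train_x, train_y, [10.0, 10.0])
```

Near a labeled point the posterior variance shrinks toward zero, while far from all labeled points the prediction falls back to the prior. That uncertainty estimate is what makes a Gaussian process a reasonable choice when only a handful of labeled antibodies exist.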
ABSTRACT
Physiologic activation of estrogen receptor α (ERα) is mediated by estradiol (E2) binding in the ligand-binding pocket of the receptor, repositioning helix 12 (H12) to facilitate binding of coactivator proteins in the unoccupied coactivator binding groove. In breast cancer, activation of ERα is often observed through point mutations that lead to the same H12 repositioning in the absence of E2. Through expanded genetic sequencing of breast cancer patients, we identified a collection of mutations located far from H12 but nonetheless capable of promoting E2-independent transcription and breast cancer cell growth. Using machine learning and computational structure analyses, this set of mutants was inferred to act distinctly from the H12-repositioning mutants and instead was associated with conformational changes across the ERα dimer interface. Through both in vitro and in-cell assays of full-length ERα protein and isolated ligand-binding domain, we found that these mutants promoted ERα dimerization, stability, and nuclear localization. Point mutations that selectively disrupted dimerization abrogated E2-independent transcriptional activity of these dimer-promoting mutants. The results reveal a distinct mechanism for activation of ERα function through enforced receptor dimerization and suggest dimer disruption as a potential therapeutic strategy to treat ER-dependent cancers.
Subject(s)
Breast Neoplasms , Female , Humans , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Cell Proliferation , Dimerization , Estradiol/pharmacology , Estradiol/metabolism , Estrogen Receptor alpha/genetics , Estrogen Receptor alpha/metabolism , Ligands , Mutation
ABSTRACT
MOTIVATION: A growing number of noncoding genetic variants, including single-nucleotide polymorphisms, are found to be associated with complex human traits and diseases. Their mechanistic interpretation is relatively limited and could benefit from computational prediction of their effects on epigenetic profiles. However, current models often focus on local, 1D genome sequence determinants and disregard global, 3D chromatin structure that critically affects epigenetic events. RESULTS: We find that noncoding variants whose epigenetic profiles are unexpectedly similar, given the relatively low similarity of their local sequences, can be largely explained by their proximity in chromatin structure. Accordingly, we have developed a multimodal deep learning scheme that incorporates both 1D genome sequence data and 3D chromatin structure data for predicting noncoding variant effects. Specifically, we have integrated convolutional and recurrent neural networks for sequence embedding and graph neural networks for structure embedding, despite the resolution gap between the two types of data, while utilizing recent DNA language models. Numerical results show that our models outperform competing sequence-only models in predicting epigenetic profiles and that their use of long-range interactions complements sequence-only models in extracting regulatory motifs. They prove to be excellent predictors of noncoding variant effects on gene expression and pathogenicity, whether in unsupervised "zero-shot" learning or supervised "few-shot" learning. AVAILABILITY AND IMPLEMENTATION: Code and data can be accessed at https://github.com/Shen-Lab/ncVarPred-1D3D and https://zenodo.org/record/7975777.
Subject(s)
Chromatin , Epigenomics , Humans , Chromatin/genetics , Language , Multifactorial Inheritance , Neural Networks, Computer
ABSTRACT
In the context of the Critical Assessment of Genome Interpretation, 6th edition (CAGI6), the Genetics of Neurodevelopmental Disorders Lab in Padua proposed a new ID-Challenge to provide an opportunity to develop computational methods for predicting patients' phenotypes and causal variants. Eight research teams, with 30 models in total, had access to the phenotype details and real genetic data, based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by neurodevelopmental disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions with onset in infancy. In this study, we evaluate the ability and accuracy of computational methods to predict comorbid phenotypes, based on the clinical features described for each patient, and causal variants. Finally, we asked participants to develop a method to find new possible genetic causes for patients without a genetic diagnosis. As in CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia) and variants (causative, putative pathogenic, and contributing factors) were provided. Considering the overall clinical manifestation of our cohort, we provided the variant data and phenotypic traits of the 150 patients from the CAGI5 ID-Challenge as training and validation data for developing the prediction methods.
ABSTRACT
A promising alternative to comprehensively performing genomics experiments is to, instead, perform a subset of experiments and use computational methods to impute the remainder. However, identifying the best imputation methods and what measures meaningfully evaluate performance are open questions. We address these questions by comprehensively analyzing 23 methods from the ENCODE Imputation Challenge. We find that imputation evaluations are challenging and confounded by distributional shifts from differences in data collection and processing over time, the amount of available data, and redundancy among performance measures. Our analyses suggest simple steps for overcoming these issues and promising directions for more robust research.
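The abstract does not name the performance measures it compares. As a hedged illustration of why the choice of measure matters when scoring imputations, the sketch below evaluates two toy imputed signal tracks against an observed track with mean squared error and Pearson correlation. Both measures, the function names, and the data are assumptions chosen for illustration, not the challenge's actual scoring.

```python
import math

def mse(observed, imputed):
    """Mean squared error between an observed and an imputed track."""
    return sum((o - i) ** 2 for o, i in zip(observed, imputed)) / len(observed)

def pearson(observed, imputed):
    """Pearson correlation between an observed and an imputed track."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_i = sum(imputed) / n
    cov = sum((o - mean_o) * (i - mean_i) for o, i in zip(observed, imputed))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    sd_i = math.sqrt(sum((i - mean_i) ** 2 for i in imputed))
    return cov / (sd_o * sd_i)

# Toy signal values at five genomic positions (made-up numbers).
observed = [0.0, 1.0, 4.0, 2.0, 0.5]
imputed_a = [0.1, 0.9, 3.8, 2.2, 0.4]   # small pointwise errors
imputed_b = [1.0, 2.0, 5.0, 3.0, 1.5]   # observed track shifted by +1

scores = {
    "a": (mse(observed, imputed_a), pearson(observed, imputed_a)),
    "b": (mse(observed, imputed_b), pearson(observed, imputed_b)),
}
```

A constant shift, such as one introduced by a change in data processing, leaves the correlation of `imputed_b` perfect while inflating its squared error, so the two measures rank the same pair of imputations differently.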
Subject(s)
Algorithms , Epigenomics , Genomics/methods
ABSTRACT
Myelofibrosis (MF) is the deadliest form of myeloproliferative neoplasm (MPN). The JAK inhibitor Ruxolitinib can reduce constitutional symptoms, but it does not substantially improve bone marrow fibrosis. Pim1 expression is significantly elevated in MPN/MF hematopoietic progenitors. Here, we show that genetic ablation of Pim1 blocked the development of myelofibrosis induced by Jak2V617F and MPLW515L. Pharmacologic inhibition of Pim1 with a second-generation Pim kinase inhibitor, TP-3654, significantly reduced leukocytosis and splenomegaly and attenuated bone marrow fibrosis in Jak2V617F and MPLW515L mouse models of MF. Combined treatment with TP-3654 and Ruxolitinib resulted in greater reduction of spleen size, normalization of blood leukocyte counts, and abrogation of bone marrow fibrosis in murine models of MF. TP-3654 treatment also preferentially inhibited Jak2V617F mutant hematopoietic progenitors in mice. Mechanistically, we show that TP-3654 treatment significantly inhibits mTORC1, MYC, and TGF-β signaling in Jak2V617F mutant hematopoietic cells and diminishes the expression of fibrotic markers in the bone marrow. Collectively, our results suggest that Pim1 plays an important role in the pathogenesis of MF, and that inhibition of Pim1 with TP-3654 might be useful for the treatment of MF.
Subject(s)
Primary Myelofibrosis/drug therapy , Primary Myelofibrosis/genetics , Protein Kinase Inhibitors/pharmacology , Proto-Oncogene Proteins c-pim-1/antagonists & inhibitors , Proto-Oncogene Proteins c-pim-1/genetics , Animals , Cell Line , Disease Models, Animal , Gene Deletion , Humans , Janus Kinase 2/genetics , Mice , Mice, Knockout , Protein Kinase Inhibitors/therapeutic use
ABSTRACT
Replication-dependent canonical histone messenger RNAs (mRNAs) do not terminate with a poly(A) tail at the 3' end. We previously demonstrated that exposure to arsenic, an environmental carcinogen, induces polyadenylation of canonical histone H3.1 mRNA, causing transformation of human cells in vitro. Here we report that polyadenylation of H3.1 mRNA increases H3.1 protein, resulting in displacement of histone variant H3.3 at active promoters, enhancers, and insulator regions, leading to transcriptional deregulation, G2/M cell-cycle arrest, chromosome aneuploidy, and aberrations. In support of these observations, knocking down the expression of H3.3 induced cell transformation, whereas ectopic expression of H3.3 attenuated arsenic-induced cell transformation. Notably, arsenic exposure also resulted in displacement of H3.3 from active promoters, enhancers, and insulator regions. These data suggest that H3.3 displacement might be central to carcinogenesis caused by polyadenylation of H3.1 mRNA upon arsenic exposure. Our findings illustrate the importance of proper histone stoichiometry in maintaining genome integrity.