Results 1 - 20 of 97
1.
Nat Commun ; 15(1): 5141, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38902262

ABSTRACT

A major challenge in protein design is to augment existing functional proteins with multiple property enhancements. Altering several properties likely necessitates numerous primary sequence changes, and novel methods are needed to accurately predict combinations of mutations that maintain or enhance function. Models of sequence co-variation (e.g., EVcouplings), which leverage extensive information about various protein properties and activities from homologous protein sequences, have proven effective for many applications including structure determination and mutation effect prediction. We apply EVcouplings to computationally design variants of the model protein TEM-1 β-lactamase. Nearly all the 14 experimentally characterized designs were functional, including one with 84 mutations from the nearest natural homolog. The designs also had large increases in thermostability, increased activity on multiple substrates, and nearly identical structure to the wild type enzyme. This study highlights the efficacy of evolutionary models in guiding large sequence alterations to generate functional diversity for protein design applications.


Subject(s)
Molecular Evolution, Mutation, Protein Engineering, beta-Lactamases, beta-Lactamases/genetics, beta-Lactamases/metabolism, beta-Lactamases/chemistry, Protein Engineering/methods, Molecular Models, Amino Acid Sequence, Enzyme Stability, Protein Conformation
2.
Alzheimers Dement ; 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38923692

ABSTRACT

INTRODUCTION: Variants of uncertain significance (VUS) surged with affordable genetic testing, posing challenges for determining pathogenicity. We examine the pathogenicity of a novel VUS P93S in Annexin A11 (ANXA11) - an amyotrophic lateral sclerosis/frontotemporal dementia-associated gene - in a corticobasal syndrome kindred. Established ANXA11 mutations cause ANXA11 aggregation, altered lysosomal-RNA granule co-trafficking, and transactive response DNA binding protein of 43 kDa (TDP-43) mis-localization. METHODS: We described the clinical presentation and explored the phenotypic diversity of ANXA11 variants. P93S's effect on ANXA11 function and TDP-43 biology was characterized in induced pluripotent stem cell-derived neurons alongside multiomic neuronal and microglial profiling. RESULTS: ANXA11 mutations were linked to corticobasal syndrome cases. P93S led to decreased lysosome colocalization, neuritic RNA, and nuclear TDP-43 with cryptic exon expression. Multiomic microglial signatures implicated immune dysregulation and interferon signaling pathways. DISCUSSION: This study establishes ANXA11 P93S pathogenicity, broadens the phenotypic spectrum of ANXA11 mutations, underscores neuronal and microglial dysfunction in ANXA11 pathophysiology, and demonstrates the potential of cellular models to determine variant pathogenicity. HIGHLIGHTS: ANXA11 P93S is a pathogenic variant. Corticobasal syndrome is part of the ANXA11 phenotypic spectrum. Hybridization chain reaction fluorescence in situ hybridization (HCR FISH) is a new tool for the detection of cryptic exons due to TDP-43-related loss of splicing regulation. Microglial ANXA11 and related immune pathways are important drivers of disease. Cellular models are powerful tools for adjudicating variants of uncertain significance.

3.
Cell ; 187(11): 2735-2745.e12, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38723628

ABSTRACT

Hepatitis B virus (HBV) is a small double-stranded DNA virus that chronically infects 296 million people. Over half of its compact genome encodes proteins in two overlapping reading frames, and during evolution, multiple selective pressures can act on shared nucleotides. This study combines an RNA-based HBV cell culture system with deep mutational scanning (DMS) to uncouple cis- and trans-acting sequence requirements in the HBV genome. The results support a leaky ribosome scanning model for polymerase translation, provide a fitness map of the HBV polymerase at single-nucleotide resolution, and identify conserved prolines adjacent to the HBV polymerase termination codon that stall ribosomes. Further experiments indicated that stalled ribosomes tether the nascent polymerase to its template RNA, ensuring cis-preferential RNA packaging and reverse transcription of the HBV genome.


Subject(s)
Hepatitis B virus, Reverse Transcription, Humans, Viral Genome/genetics, Hepatitis B virus/genetics, Mutation, Ribosomes/metabolism, Viral RNA/genetics, Viral RNA/metabolism, Cell Line
4.
ArXiv ; 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38699161

ABSTRACT

Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released to date, and there is tremendous variability in their underlying algorithms and outputs, and in the ways in which the methodologies and predictions are shared. This leads to considerable challenges for end users in knowing which VEPs to use and how to use them. Here, to address these issues, we provide guidelines and recommendations for the release of novel VEPs. Emphasising open-source availability, transparent methodologies, clear variant effect score interpretations, standardised scales, accessible predictions, and rigorous training data disclosure, we aim to improve the usability and interpretability of VEPs, and promote their integration into analysis and evaluation pipelines. We also provide a large, categorised list of currently available VEPs, aiming to facilitate the discovery and encourage the usage of novel methods within the scientific community.

5.
Nat Commun ; 15(1): 1639, 2024 Feb 22.
Article in English | MEDLINE | ID: mdl-38388493

ABSTRACT

Recent developments in protein design rely on large neural networks with up to hundreds of millions of parameters, yet it is unclear which residue dependencies are critical for determining protein function. Here, we show that amino acid preferences at individual residues, without accounting for mutation interactions, explain much and sometimes virtually all of the combinatorial mutation effects across 8 datasets (R² ~ 78-98%). Hence, few observations (~100 times the number of mutated residues) enable accurate prediction of held-out variant effects (Pearson r > 0.80). We hypothesized that the local structural contexts around a residue could be sufficient to predict mutation preferences, and developed an unsupervised approach termed CoVES (Combinatorial Variant Effects from Structure). Our results suggest that CoVES not only outperforms model-free methods but also performs similarly to complex models for creating functional and diverse protein variants. CoVES offers an effective alternative to complicated models for identifying functional protein mutations.


Subject(s)
Computer Neural Networks, Proteins, Proteins/metabolism, Amino Acids/chemistry, Mutation
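
The abstract of record 5 reports that per-residue amino acid preferences, with no interaction terms, explain most combinatorial mutation effects. The sketch below illustrates what such a site-independent (additive) model could look like. It is not CoVES and not the authors' code: ordinary least squares on a one-hot encoding is a stand-in technique chosen for illustration, and the names `one_hot`, `fit_additive` and `predict`, as well as the toy data, are hypothetical.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(variants):
    """Encode equal-length sequences of the mutated positions as 0/1 features."""
    n_sites = len(variants[0])
    x = np.zeros((len(variants), n_sites * len(ALPHABET)))
    for row, seq in enumerate(variants):
        for pos, aa in enumerate(seq):
            x[row, pos * len(ALPHABET) + ALPHABET.index(aa)] = 1.0
    return x


def design_matrix(variants):
    """Prepend an intercept column to the one-hot features."""
    return np.hstack([np.ones((len(variants), 1)), one_hot(variants)])


def fit_additive(variants, fitness):
    """Least-squares fit of one weight per (position, amino acid) pair.

    Assumption: fitness is a sum of independent per-site contributions.
    lstsq returns the minimum-norm solution when the system is underdetermined.
    """
    y = np.asarray(fitness, dtype=float)
    weights, *_ = np.linalg.lstsq(design_matrix(variants), y, rcond=None)
    return weights


def predict(weights, variants):
    """Predicted fitness: intercept plus the sum of per-site weights."""
    return design_matrix(variants) @ weights


# Toy, made-up data: two mutated positions, fitness additive by construction.
train_variants = ["AA", "AC", "CA"]
train_fitness = [0.0, 1.0, 2.0]
weights = fit_additive(train_variants, train_fitness)
held_out = predict(weights, ["CC"])  # double mutant never seen in training
```

Because the model has no pairwise terms, the held-out double mutant is predicted purely from the single-site effects seen in training, which is the kind of extrapolation the abstract describes.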
6.
Nat Struct Mol Biol ; 31(4): 667-677, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38326651

ABSTRACT

The orphan G protein-coupled receptor (GPCR) GPR161 plays a central role in development by suppressing Hedgehog signaling. The fundamental basis of how GPR161 is activated remains unclear. Here, we determined a cryogenic-electron microscopy structure of active human GPR161 bound to heterotrimeric Gs. This structure revealed an extracellular loop 2 that occupies the canonical GPCR orthosteric ligand pocket. Furthermore, a sterol that binds adjacent to transmembrane helices 6 and 7 stabilizes a GPR161 conformation required for Gs coupling. Mutations that prevent sterol binding to GPR161 suppress Gs-mediated signaling. These mutants retain the ability to suppress GLI2 transcription factor accumulation in primary cilia, a key function of ciliary GPR161. By contrast, a protein kinase A-binding site in the GPR161 C terminus is critical in suppressing GLI2 ciliary accumulation. Our work highlights how structural features of GPR161 interface with the Hedgehog pathway and sets a foundation to understand the role of GPR161 function in other signaling pathways.


Subject(s)
Hedgehog Proteins, Signal Transduction, Humans, Hedgehog Proteins/genetics, G-Protein-Coupled Receptors/metabolism, Mutation, Cilia/metabolism
7.
Nat Biotechnol ; 42(2): 216-228, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38361074

ABSTRACT

Recent breakthroughs in AI coupled with the rapid accumulation of protein sequence and structure data have radically transformed computational protein design. New methods promise to escape the constraints of natural and laboratory evolution, accelerating the generation of proteins for applications in biotechnology and medicine. To make sense of the exploding diversity of machine learning approaches, we introduce a unifying framework that classifies models on the basis of their use of three core data modalities: sequences, structures and functional labels. We discuss the new capabilities and outstanding challenges for the practical design of enzymes, antibodies, vaccines, nanomachines and more. We then highlight trends shaping the future of this field, from large-scale assays to more robust benchmarks, multimodal foundation models, enhanced sampling strategies and laboratory automation.


Subject(s)
Machine Learning, Proteins, Biotechnology, Amino Acid Sequence, Antibodies
8.
Res Sq ; 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38260496

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome [1-6]. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data [7] and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders [8] from potentially healthy individuals [9]. popEVE identifies 442 genes in patients in this developmental disorder cohort, including evidence of 123 novel genetic disorders, many without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. A majority of these variants are close to interacting partners in 3D complexes. Preliminary analyses on child exomes indicate that popEVE can identify candidate variants without the need for inheritance labels. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable.

9.
Nat Methods ; 21(3): 531-540, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38279009

ABSTRACT

Analysis across a growing number of single-cell perturbation datasets is hampered by poor data interoperability. To facilitate development and benchmarking of computational methods, we collect a set of 44 publicly available single-cell perturbation-response datasets with molecular readouts, including transcriptomics, proteomics and epigenomics. We apply uniform quality control pipelines and harmonize feature annotations. The resulting information resource, scPerturb, enables development and testing of computational methods, and facilitates comparison and integration across datasets. We describe energy statistics (E-statistics) for quantification of perturbation effects and significance testing, and demonstrate E-distance as a general distance measure between sets of single-cell expression profiles. We illustrate the application of E-statistics for quantifying similarity and efficacy of perturbations. The perturbation-response datasets and E-statistics computation software are publicly available at scperturb.org. This work provides an information resource for researchers working with single-cell perturbation data and recommendations for experimental design, including optimal cell counts and read depth.


Subject(s)
Proteomics, Software, Gene Expression Profiling/methods, Epigenomics, Single-Cell Analysis
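
The abstract of record 9 describes E-distance as a general distance measure between sets of single-cell expression profiles. As a rough illustration, the sketch below computes the standard energy distance between two sets of points: twice the mean between-set distance minus the two mean within-set distances. The choice of Euclidean distance, the inclusion of zero self-distances in the within-set means, and the function names are assumptions made here; the abstract does not specify them, and this is not the scPerturb software.

```python
import numpy as np


def pairwise_mean_distance(a, b):
    """Mean Euclidean distance over all pairs (one row from a, one from b)."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).mean()


def e_distance(x, y):
    """Energy distance between two sets of profiles (rows = cells).

    E(X, Y) = 2 * mean d(x, y) - mean d(x, x') - mean d(y, y').
    Within-set means include the zero self-distances (an assumption).
    """
    delta_xy = pairwise_mean_distance(x, y)
    sigma_x = pairwise_mean_distance(x, x)
    sigma_y = pairwise_mean_distance(y, y)
    return 2.0 * delta_xy - sigma_x - sigma_y


# Toy, made-up data: 40 "cells" with 5 features each, and a shifted copy
# standing in for a perturbed population.
rng = np.random.default_rng(0)
control = rng.normal(size=(40, 5))
perturbed = control + 3.0
shift_effect = e_distance(control, perturbed)
```

A value near zero suggests the two sets are drawn from similar distributions, while larger values indicate a stronger shift. In practice the rows would typically be reduced-dimension expression profiles rather than raw counts, though that preprocessing is also an assumption here.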
10.
medRxiv ; 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38076790

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders from potentially healthy individuals. popEVE identifies 442 genes in a cohort of developmental disorder cases, including evidence of 119 novel genetic disorders without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable. Interactive web viewer and downloads available at pop.evemodel.org.

11.
bioRxiv ; 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38106034

ABSTRACT

Protein design holds immense potential for optimizing naturally occurring proteins, with broad applications in drug discovery, material design, and sustainability. However, computational methods for protein engineering are confronted with significant challenges, such as an expansive design space, sparse functional regions, and a scarcity of available labels. These issues are further exacerbated in practice by the fact that most real-life design scenarios necessitate the simultaneous optimization of multiple properties. In this work, we introduce ProteinNPT, a non-parametric transformer variant tailored to protein sequences and particularly suited to label-scarce and multi-task learning settings. We first focus on the supervised fitness prediction setting and develop several cross-validation schemes which support robust performance assessment. We subsequently reimplement prior top-performing baselines, introduce several extensions of these baselines by integrating diverse branches of the protein engineering literature, and demonstrate that ProteinNPT consistently outperforms all of them across a diverse set of protein property prediction tasks. Finally, we demonstrate the value of our approach for iterative protein design across extensive in silico Bayesian optimization and conditional sampling experiments.

12.
bioRxiv ; 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38106144

ABSTRACT

Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, as well as curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We report the performance of a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) in a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures, model predictions and develop a user-friendly website that facilitates data access and analysis.

13.
bioRxiv ; 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-38014077

ABSTRACT

When nature maintains or evolves a gene's function over millions of years at scale, it produces a diversity of homologous sequences whose patterns of conservation and change contain rich structural, functional, and historical information about the gene. However, natural gene diversity likely excludes vast regions of functional sequence space and includes phylogenetic and evolutionary eccentricities, limiting what information we can extract. We introduce an accessible experimental approach for compressing long-term gene evolution to laboratory timescales, allowing for the direct observation of extensive adaptation and divergence followed by inference of structural, functional, and environmental constraints for any selectable gene. To enable this approach, we developed a new orthogonal DNA replication (OrthoRep) system that durably hypermutates chosen genes at a rate of >10⁻⁴ substitutions per base in vivo. When OrthoRep was used to evolve a conditionally essential maladapted enzyme, we obtained thousands of unique multi-mutation sequences with many pairs >60 amino acids apart (>15% divergence), revealing known and new factors influencing enzyme adaptation. The fitness of evolved sequences was not predictable by advanced machine learning models trained on natural variation. We suggest that OrthoRep supports the prospective and systematic discovery of constraints shaping gene evolution, uncovering of new regions in fitness landscapes, and general applications in biomolecular engineering.

14.
Nature ; 622(7984): 818-825, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821700

ABSTRACT

Effective pandemic preparedness relies on anticipating viral mutations that are able to evade host immune responses to facilitate vaccine and therapeutic design. However, current strategies for viral evolution prediction are not available early in a pandemic: experimental approaches require host polyclonal antibodies to test against [1-16], and existing computational methods draw heavily from current strain prevalence to make reliable predictions of variants of concern [17-19]. To address this, we developed EVEscape, a generalizable modular framework that combines fitness predictions from a deep learning model of historical sequences with biophysical and structural information. EVEscape quantifies the viral escape potential of mutations at scale and has the advantage of being applicable before surveillance sequencing, experimental scans or three-dimensional structures of antibody complexes are available. We demonstrate that EVEscape, trained on sequences available before 2020, is as accurate as high-throughput experimental scans at anticipating pandemic variation for SARS-CoV-2 and is generalizable to other viruses including influenza, HIV and understudied viruses with pandemic potential such as Lassa and Nipah. We provide continually revised escape scores for all current strains of SARS-CoV-2 and predict probable further mutations to forecast emerging strains as a tool for continuing vaccine development (evescape.org).


Subject(s)
Molecular Evolution, Forecasting, Immune Evasion, Mutation, Pandemics, Viruses, Humans, Drug Design, HIV Infections, Immune Evasion/genetics, Immune Evasion/immunology, Human Influenza, Lassa virus, Nipah Virus, SARS-CoV-2/genetics, SARS-CoV-2/immunology, Viral Vaccines/immunology, Viruses/genetics, Viruses/immunology
15.
Res Sq ; 2023 Oct 19.
Article in English | MEDLINE | ID: mdl-37886540

ABSTRACT

As genetic testing has become more accessible and affordable, variants of uncertain significance (VUS) are increasingly identified, and determining whether these variants play causal roles in disease is a major challenge. The known disease-associated Annexin A11 (ANXA11) mutations result in ANXA11 aggregation, alterations in lysosomal-RNA granule co-trafficking, and TDP-43 mis-localization and present as amyotrophic lateral sclerosis or frontotemporal dementia. We identified a novel VUS in ANXA11 (P93S) in a kindred with corticobasal syndrome and unique radiographic features that segregated with disease. We then queried neurodegenerative disorder clinic databases to identify the phenotypic spread of ANXA11 mutations. Multi-modal computational analysis of this variant was performed and the effect of this VUS on ANXA11 function and TDP-43 biology was characterized in iPSC-derived neurons. Single-cell sequencing and proteomic analysis of iPSC-derived neurons and microglia were used to determine the multiomic signature of this VUS. Mutations in ANXA11 were found in association with clinically diagnosed corticobasal syndrome, thereby establishing corticobasal syndrome as part of the ANXA11 clinical spectrum. In iPSC-derived neurons expressing mutant ANXA11, we found decreased colocalization of lysosomes and decreased neuritic RNA as well as decreased nuclear TDP-43 and increased formation of cryptic exons compared to controls. Multiomic assessment of the P93S variant in iPSC-derived neurons and microglia indicates that the pathogenic omic signature in neurons is modest compared to microglia. Additionally, omic studies reveal that immune dysregulation and interferon signaling pathways in microglia are central to disease. Collectively, these findings identify a new pathogenic variant in ANXA11, expand the range of clinical syndromes caused by ANXA11 mutations, and implicate both neuronal and microglial dysfunction in ANXA11 pathophysiology. This work illustrates the potential for iPSC-derived cellular models to revolutionize the variant annotation process and provides a generalizable approach to determining causality of novel variants across genes.

16.
Nature ; 620(7972): 47-60, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37532811

ABSTRACT

Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation.


Subject(s)
Artificial Intelligence, Research Design, Artificial Intelligence/standards, Artificial Intelligence/trends, Datasets as Topic, Deep Learning, Research Design/standards, Research Design/trends, Unsupervised Machine Learning
19.
Genome Biol ; 24(1): 147, 2023 Jul 03.
Article in English | MEDLINE | ID: mdl-37394429

ABSTRACT

Sequencing has revealed hundreds of millions of human genetic variants, and continued efforts will only add to this variant avalanche. Insufficient information exists to interpret the effects of most variants, limiting opportunities for precision medicine and comprehension of genome function. A solution lies in experimental assessment of the functional effect of variants, which can reveal their biological and clinical impact. However, variant effect assays have generally been undertaken reactively for individual variants only after and, in most cases long after, their first observation. Now, multiplexed assays of variant effect can characterise massive numbers of variants simultaneously, yielding variant effect maps that reveal the function of every possible single nucleotide change in a gene or regulatory element. Generating maps for every protein encoding gene and regulatory element in the human genome would create an 'Atlas' of variant effect maps and transform our understanding of genetics and usher in a new era of nucleotide-resolution functional knowledge of the genome. An Atlas would reveal the fundamental biology of the human genome, inform human evolution, empower the development and use of therapeutics and maximize the utility of genomics for diagnosing and treating disease. The Atlas of Variant Effects Alliance is an international collaborative group comprising hundreds of researchers, technologists and clinicians dedicated to realising an Atlas of Variant Effects to help deliver on the promise of genomics.


Subject(s)
Genetic Variation, Genomics, Humans, Human Genome, High-Throughput Nucleotide Sequencing, Precision Medicine
20.
bioRxiv ; 2023 May 24.
Article in English | MEDLINE | ID: mdl-37292845

ABSTRACT

The orphan G protein-coupled receptor (GPCR) GPR161 is enriched in primary cilia, where it plays a central role in suppressing Hedgehog signaling [1]. GPR161 mutations lead to developmental defects and cancers [2,3,4]. The fundamental basis of how GPR161 is activated, including potential endogenous activators and pathway-relevant signal transducers, remains unclear. To elucidate GPR161 function, we determined a cryogenic-electron microscopy structure of active GPR161 bound to the heterotrimeric G protein complex Gs. This structure revealed an extracellular loop 2 that occupies the canonical GPCR orthosteric ligand pocket. Furthermore, we identify a sterol that binds to a conserved extrahelical site adjacent to transmembrane helices 6 and 7 and stabilizes a GPR161 conformation required for Gs coupling. Mutations that prevent sterol binding to GPR161 suppress cAMP pathway activation. Surprisingly, these mutants retain the ability to suppress GLI2 transcription factor accumulation in cilia, a key function of ciliary GPR161 in Hedgehog pathway suppression. By contrast, a protein kinase A-binding site in the GPR161 C-terminus is critical in suppressing GLI2 ciliary accumulation. Our work highlights how unique structural features of GPR161 interface with the Hedgehog pathway and sets a foundation to understand the broader role of GPR161 function in other signaling pathways.
