Results 1 - 20 of 95
1.
Cell ; 187(11): 2735-2745.e12, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38723628

ABSTRACT

Hepatitis B virus (HBV) is a small double-stranded DNA virus that chronically infects 296 million people. Over half of its compact genome encodes proteins in two overlapping reading frames, and during evolution, multiple selective pressures can act on shared nucleotides. This study combines an RNA-based HBV cell culture system with deep mutational scanning (DMS) to uncouple cis- and trans-acting sequence requirements in the HBV genome. The results support a leaky ribosome scanning model for polymerase translation, provide a fitness map of the HBV polymerase at single-nucleotide resolution, and identify conserved prolines adjacent to the HBV polymerase termination codon that stall ribosomes. Further experiments indicated that stalled ribosomes tether the nascent polymerase to its template RNA, ensuring cis-preferential RNA packaging and reverse transcription of the HBV genome.


Subjects
Hepatitis B virus; Reverse Transcription; Humans; Genome, Viral/genetics; Hepatitis B virus/genetics; Mutation; Ribosomes/metabolism; RNA, Viral/genetics; RNA, Viral/metabolism; Cell Line
2.
ArXiv ; 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38699161

ABSTRACT

Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released to date, and there is tremendous variability in their underlying algorithms and outputs, and in the ways in which the methodologies and predictions are shared. This leads to considerable challenges for end users in knowing which VEPs to use and how to use them. Here, to address these issues, we provide guidelines and recommendations for the release of novel VEPs. Emphasising open-source availability, transparent methodologies, clear variant effect score interpretations, standardised scales, accessible predictions, and rigorous training data disclosure, we aim to improve the usability and interpretability of VEPs, and promote their integration into analysis and evaluation pipelines. We also provide a large, categorised list of currently available VEPs, aiming to facilitate the discovery and encourage the usage of novel methods within the scientific community.

3.
Nat Struct Mol Biol ; 31(4): 667-677, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38326651

ABSTRACT

The orphan G protein-coupled receptor (GPCR) GPR161 plays a central role in development by suppressing Hedgehog signaling. The fundamental basis of how GPR161 is activated remains unclear. Here, we determined a cryogenic-electron microscopy structure of active human GPR161 bound to heterotrimeric Gs. This structure revealed an extracellular loop 2 that occupies the canonical GPCR orthosteric ligand pocket. Furthermore, a sterol that binds adjacent to transmembrane helices 6 and 7 stabilizes a GPR161 conformation required for Gs coupling. Mutations that prevent sterol binding to GPR161 suppress Gs-mediated signaling. These mutants retain the ability to suppress GLI2 transcription factor accumulation in primary cilia, a key function of ciliary GPR161. By contrast, a protein kinase A-binding site in the GPR161 C terminus is critical in suppressing GLI2 ciliary accumulation. Our work highlights how structural features of GPR161 interface with the Hedgehog pathway and sets a foundation to understand the role of GPR161 function in other signaling pathways.


Subjects
Hedgehog Proteins; Signal Transduction; Humans; Hedgehog Proteins/genetics; Receptors, G-Protein-Coupled/metabolism; Mutation; Cilia/metabolism
4.
Nat Biotechnol ; 42(2): 216-228, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38361074

ABSTRACT

Recent breakthroughs in AI coupled with the rapid accumulation of protein sequence and structure data have radically transformed computational protein design. New methods promise to escape the constraints of natural and laboratory evolution, accelerating the generation of proteins for applications in biotechnology and medicine. To make sense of the exploding diversity of machine learning approaches, we introduce a unifying framework that classifies models on the basis of their use of three core data modalities: sequences, structures and functional labels. We discuss the new capabilities and outstanding challenges for the practical design of enzymes, antibodies, vaccines, nanomachines and more. We then highlight trends shaping the future of this field, from large-scale assays to more robust benchmarks, multimodal foundation models, enhanced sampling strategies and laboratory automation.


Subjects
Machine Learning; Proteins; Biotechnology; Amino Acid Sequence; Antibodies
5.
Nat Commun ; 15(1): 1639, 2024 Feb 22.
Article in English | MEDLINE | ID: mdl-38388493

ABSTRACT

Recent developments in protein design rely on large neural networks with up to hundreds of millions of parameters, yet it is unclear which residue dependencies are critical for determining protein function. Here, we show that amino acid preferences at individual residues, without accounting for mutation interactions, explain much and sometimes virtually all of the combinatorial mutation effects across 8 datasets (R² ~ 78-98%). Hence, few observations (~100 times the number of mutated residues) enable accurate prediction of held-out variant effects (Pearson r > 0.80). We hypothesized that the local structural contexts around a residue could be sufficient to predict mutation preferences, and developed an unsupervised approach termed CoVES (Combinatorial Variant Effects from Structure). Our results suggest that CoVES not only outperforms model-free methods but also performs similarly to complex models for creating functional and diverse protein variants. CoVES offers an effective alternative to complicated models for identifying functional protein mutations.
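
The claim that per-residue preferences alone explain most combinatorial effects can be illustrated with a small additive fit. The sketch below is not the CoVES method or the authors' code; it assumes a toy, fully balanced combinatorial library (every combination of residues observed once), in which the least-squares site effects reduce to marginal means. The sequences, effect sizes, and function names are all hypothetical.

```python
from itertools import product
from statistics import mean


def fit_additive(variants, fitness):
    """Per-site residue effects as marginal mean minus grand mean.

    For a balanced full-factorial library this equals the least-squares
    fit of a model with no interaction terms.
    """
    grand = mean(fitness)
    effects = []
    for i in range(len(variants[0])):
        by_residue = {}
        for v, f in zip(variants, fitness):
            by_residue.setdefault(v[i], []).append(f)
        effects.append({aa: mean(fs) - grand for aa, fs in by_residue.items()})
    return grand, effects


def predict(grand, effects, variant):
    """Sum the independent site effects for one variant."""
    return grand + sum(site[aa] for site, aa in zip(effects, variant))


def r_squared(observed, predicted):
    """Fraction of variance in observed values explained by predictions."""
    mu = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mu) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy library: 3 sites, 2 residues each, 8 variants in total.
site_choices = ["AV", "LI", "KR"]
true_effect = [{"A": 0.0, "V": 1.0}, {"L": 0.0, "I": -0.5}, {"K": 0.0, "R": 2.0}]
variants = ["".join(c) for c in product(*site_choices)]
fitness = [sum(true_effect[i][aa] for i, aa in enumerate(v)) for v in variants]
# Add one pairwise interaction (epistasis) that the additive model cannot see.
fitness = [f + (0.4 if v[0] == "V" and v[2] == "R" else 0.0)
           for v, f in zip(variants, fitness)]

grand, effects = fit_additive(variants, fitness)
predicted = [predict(grand, effects, v) for v in variants]
r2 = r_squared(fitness, predicted)
print(f"additive model R^2 = {r2:.4f}")
```

With unbalanced or sparsely sampled libraries the marginal-mean shortcut no longer applies, and a regularized regression on one-hot encoded residues would be the more usual choice.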


Subjects
Neural Networks, Computer; Proteins; Proteins/metabolism; Amino Acids/chemistry; Mutation
6.
Nat Methods ; 21(3): 531-540, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38279009

ABSTRACT

Analysis across a growing number of single-cell perturbation datasets is hampered by poor data interoperability. To facilitate development and benchmarking of computational methods, we collect a set of 44 publicly available single-cell perturbation-response datasets with molecular readouts, including transcriptomics, proteomics and epigenomics. We apply uniform quality control pipelines and harmonize feature annotations. The resulting information resource, scPerturb, enables development and testing of computational methods, and facilitates comparison and integration across datasets. We describe energy statistics (E-statistics) for quantification of perturbation effects and significance testing, and demonstrate E-distance as a general distance measure between sets of single-cell expression profiles. We illustrate the application of E-statistics for quantifying similarity and efficacy of perturbations. The perturbation-response datasets and E-statistics computation software are publicly available at scperturb.org. This work provides an information resource for researchers working with single-cell perturbation data and recommendations for experimental design, including optimal cell counts and read depth.
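
As a rough illustration of the E-distance idea, the sketch below computes the classical energy distance between two sets of profiles: twice the mean between-set distance minus the two mean within-set distances. It is not the scPerturb software; it assumes plain Euclidean distance on low-dimensional toy vectors, and the exact distance and preprocessing used in the paper may differ. The function names and data are hypothetical.

```python
import math
from itertools import product


def mean_pairwise_distance(xs, ys):
    """Mean Euclidean distance over all (x, y) pairs, self-pairs included."""
    return sum(math.dist(x, y) for x, y in product(xs, ys)) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Classical energy distance: 2*d(X, Y) - d(X, X) - d(Y, Y)."""
    return (2.0 * mean_pairwise_distance(xs, ys)
            - mean_pairwise_distance(xs, xs)
            - mean_pairwise_distance(ys, ys))


# Hypothetical toy "expression profiles" in a 2-dimensional embedding.
control = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
perturbed = [(x + 3.0, y + 3.0) for x, y in control]

print(energy_distance(control, control))    # identical sets
print(energy_distance(control, perturbed))  # shifted set
```

A larger value indicates that the perturbed cells sit farther from the controls relative to the spread within each group; a permutation test over group labels is one common way to attach a significance level to it.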


Subjects
Proteomics; Software; Gene Expression Profiling/methods; Epigenomics; Single-Cell Analysis
7.
Res Sq ; 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38260496

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome [1-6]. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data [7] and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders [8] from potentially healthy individuals [9]. popEVE identifies 442 genes in patients in this developmental disorder cohort, including evidence of 123 novel genetic disorders, many without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. A majority of these variants are close to interacting partners in 3D complexes. Preliminary analyses on child exomes indicate that popEVE can identify candidate variants without the need for inheritance labels. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable.

8.
medRxiv ; 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38076790

ABSTRACT

Identifying causal mutations accelerates genetic disease diagnosis and therapeutic development. Missense variants present a bottleneck in genetic diagnoses as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are increasingly successful at prediction for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data and achieves state-of-the-art performance at ranking variants by severity to distinguish patients with severe developmental disorders from potentially healthy individuals. popEVE identifies 442 genes in a cohort of developmental disorder cases, including evidence of 119 novel genetic disorders without the need for gene-level enrichment and without overestimating the prevalence of pathogenic variants in the population. By placing variants on a unified scale, our model offers a comprehensive perspective on the distribution of fitness effects across the entire proteome and the broader human population. popEVE provides compelling evidence for genetic diagnoses even in exceptionally rare single-patient disorders where conventional techniques relying on repeated observations may not be applicable. Interactive web viewer and downloads available at pop.evemodel.org.

9.
bioRxiv ; 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38106034

ABSTRACT

Protein design holds immense potential for optimizing naturally occurring proteins, with broad applications in drug discovery, material design, and sustainability. However, computational methods for protein engineering are confronted with significant challenges, such as an expansive design space, sparse functional regions, and a scarcity of available labels. These issues are further exacerbated in practice by the fact that most real-life design scenarios necessitate the simultaneous optimization of multiple properties. In this work, we introduce ProteinNPT, a non-parametric transformer variant tailored to protein sequences and particularly suited to label-scarce and multi-task learning settings. We first focus on the supervised fitness prediction setting and develop several cross-validation schemes which support robust performance assessment. We subsequently reimplement prior top-performing baselines, introduce several extensions of these baselines by integrating diverse branches of the protein engineering literature, and demonstrate that ProteinNPT consistently outperforms all of them across a diverse set of protein property prediction tasks. Finally, we demonstrate the value of our approach for iterative protein design across extensive in silico Bayesian optimization and conditional sampling experiments.

10.
bioRxiv ; 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38106144

ABSTRACT

Predicting the effects of mutations in proteins is critical to many applications, from understanding genetic disease to designing novel proteins that can address our most pressing challenges in climate, agriculture and healthcare. Despite a surge in machine learning-based protein models to tackle these questions, an assessment of their respective benefits is challenging due to the use of distinct, often contrived, experimental datasets, and the variable performance of models across different protein families. Addressing these challenges requires scale. To that end we introduce ProteinGym, a large-scale and holistic set of benchmarks specifically designed for protein fitness prediction and design. It encompasses both a broad collection of over 250 standardized deep mutational scanning assays, spanning millions of mutated sequences, as well as curated clinical datasets providing high-quality expert annotations about mutation effects. We devise a robust evaluation framework that combines metrics for both fitness prediction and design, factors in known limitations of the underlying experimental methods, and covers both zero-shot and supervised settings. We report the performance of a diverse set of over 70 high-performing models from various subfields (e.g., alignment-based, inverse folding) in a unified benchmark suite. We open source the corresponding codebase, datasets, MSAs, structures, model predictions and develop a user-friendly website that facilitates data access and analysis.

11.
bioRxiv ; 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-38014077

ABSTRACT

When nature maintains or evolves a gene's function over millions of years at scale, it produces a diversity of homologous sequences whose patterns of conservation and change contain rich structural, functional, and historical information about the gene. However, natural gene diversity likely excludes vast regions of functional sequence space and includes phylogenetic and evolutionary eccentricities, limiting what information we can extract. We introduce an accessible experimental approach for compressing long-term gene evolution to laboratory timescales, allowing for the direct observation of extensive adaptation and divergence followed by inference of structural, functional, and environmental constraints for any selectable gene. To enable this approach, we developed a new orthogonal DNA replication (OrthoRep) system that durably hypermutates chosen genes at a rate of >10⁻⁴ substitutions per base in vivo. When OrthoRep was used to evolve a conditionally essential maladapted enzyme, we obtained thousands of unique multi-mutation sequences with many pairs >60 amino acids apart (>15% divergence), revealing known and new factors influencing enzyme adaptation. The fitness of evolved sequences was not predictable by advanced machine learning models trained on natural variation. We suggest that OrthoRep supports the prospective and systematic discovery of constraints shaping gene evolution, uncovering of new regions in fitness landscapes, and general applications in biomolecular engineering.

12.
Res Sq ; 2023 Oct 19.
Article in English | MEDLINE | ID: mdl-37886540

ABSTRACT

As genetic testing has become more accessible and affordable, variants of uncertain significance (VUS) are increasingly identified, and determining whether these variants play causal roles in disease is a major challenge. The known disease-associated Annexin A11 (ANXA11) mutations result in ANXA11 aggregation, alterations in lysosomal-RNA granule co-trafficking, and TDP-43 mis-localization and present as amyotrophic lateral sclerosis or frontotemporal dementia. We identified a novel VUS in ANXA11 (P93S) in a kindred with corticobasal syndrome and unique radiographic features that segregated with disease. We then queried neurodegenerative disorder clinic databases to identify the phenotypic spread of ANXA11 mutations. Multi-modal computational analysis of this variant was performed and the effect of this VUS on ANXA11 function and TDP-43 biology was characterized in iPSC-derived neurons. Single-cell sequencing and proteomic analysis of iPSC-derived neurons and microglia were used to determine the multiomic signature of this VUS. Mutations in ANXA11 were found in association with clinically diagnosed corticobasal syndrome, thereby establishing corticobasal syndrome as part of the ANXA11 clinical spectrum. In iPSC-derived neurons expressing mutant ANXA11, we found decreased colocalization of lysosomes and decreased neuritic RNA as well as decreased nuclear TDP-43 and increased formation of cryptic exons compared to controls. Multiomic assessment of the P93S variant in iPSC-derived neurons and microglia indicates that the pathogenic omic signature in neurons is modest compared to microglia. Additionally, omic studies reveal that immune dysregulation and interferon signaling pathways in microglia are central to disease. Collectively, these findings identify a new pathogenic variant in ANXA11, expand the range of clinical syndromes caused by ANXA11 mutations, and implicate both neuronal and microglial dysfunction in ANXA11 pathophysiology. This work illustrates the potential for iPSC-derived cellular models to revolutionize the variant annotation process and provides a generalizable approach to determining causality of novel variants across genes.

13.
Nature ; 622(7984): 818-825, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821700

ABSTRACT

Effective pandemic preparedness relies on anticipating viral mutations that are able to evade host immune responses to facilitate vaccine and therapeutic design. However, current strategies for viral evolution prediction are not available early in a pandemic: experimental approaches require host polyclonal antibodies to test against [1-16], and existing computational methods draw heavily from current strain prevalence to make reliable predictions of variants of concern [17-19]. To address this, we developed EVEscape, a generalizable modular framework that combines fitness predictions from a deep learning model of historical sequences with biophysical and structural information. EVEscape quantifies the viral escape potential of mutations at scale and has the advantage of being applicable before surveillance sequencing, experimental scans or three-dimensional structures of antibody complexes are available. We demonstrate that EVEscape, trained on sequences available before 2020, is as accurate as high-throughput experimental scans at anticipating pandemic variation for SARS-CoV-2 and is generalizable to other viruses including influenza, HIV and understudied viruses with pandemic potential such as Lassa and Nipah. We provide continually revised escape scores for all current strains of SARS-CoV-2 and predict probable further mutations to forecast emerging strains as a tool for continuing vaccine development (evescape.org).


Subjects
Evolution, Molecular; Forecasting; Immune Evasion; Mutation; Pandemics; Viruses; Humans; Drug Design; HIV Infections; Immune Evasion/genetics; Immune Evasion/immunology; Influenza, Human; Lassa virus; Nipah Virus; SARS-CoV-2/genetics; SARS-CoV-2/immunology; Viral Vaccines/immunology; Viruses/genetics; Viruses/immunology
14.
Nature ; 620(7972): 47-60, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37532811

ABSTRACT

Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation.


Subjects
Artificial Intelligence; Research Design; Artificial Intelligence/standards; Artificial Intelligence/trends; Datasets as Topic; Deep Learning; Research Design/standards; Research Design/trends; Unsupervised Machine Learning
17.
Genome Biol ; 24(1): 147, 2023 Jul 03.
Article in English | MEDLINE | ID: mdl-37394429

ABSTRACT

Sequencing has revealed hundreds of millions of human genetic variants, and continued efforts will only add to this variant avalanche. Insufficient information exists to interpret the effects of most variants, limiting opportunities for precision medicine and comprehension of genome function. A solution lies in experimental assessment of the functional effect of variants, which can reveal their biological and clinical impact. However, variant effect assays have generally been undertaken reactively for individual variants only after and, in most cases long after, their first observation. Now, multiplexed assays of variant effect can characterise massive numbers of variants simultaneously, yielding variant effect maps that reveal the function of every possible single nucleotide change in a gene or regulatory element. Generating maps for every protein encoding gene and regulatory element in the human genome would create an 'Atlas' of variant effect maps and transform our understanding of genetics and usher in a new era of nucleotide-resolution functional knowledge of the genome. An Atlas would reveal the fundamental biology of the human genome, inform human evolution, empower the development and use of therapeutics and maximize the utility of genomics for diagnosing and treating disease. The Atlas of Variant Effects Alliance is an international collaborative group comprising hundreds of researchers, technologists and clinicians dedicated to realising an Atlas of Variant Effects to help deliver on the promise of genomics.


Subjects
Genetic Variation; Genomics; Humans; Genome, Human; High-Throughput Nucleotide Sequencing; Precision Medicine
18.
bioRxiv ; 2023 May 24.
Article in English | MEDLINE | ID: mdl-37292845

ABSTRACT

The orphan G protein-coupled receptor (GPCR) GPR161 is enriched in primary cilia, where it plays a central role in suppressing Hedgehog signaling [1]. GPR161 mutations lead to developmental defects and cancers [2,3,4]. The fundamental basis of how GPR161 is activated, including potential endogenous activators and pathway-relevant signal transducers, remains unclear. To elucidate GPR161 function, we determined a cryogenic-electron microscopy structure of active GPR161 bound to the heterotrimeric G protein complex Gs. This structure revealed an extracellular loop 2 that occupies the canonical GPCR orthosteric ligand pocket. Furthermore, we identify a sterol that binds to a conserved extrahelical site adjacent to transmembrane helices 6 and 7 and stabilizes a GPR161 conformation required for Gs coupling. Mutations that prevent sterol binding to GPR161 suppress cAMP pathway activation. Surprisingly, these mutants retain the ability to suppress GLI2 transcription factor accumulation in cilia, a key function of ciliary GPR161 in Hedgehog pathway suppression. By contrast, a protein kinase A-binding site in the GPR161 C-terminus is critical in suppressing GLI2 ciliary accumulation. Our work highlights how unique structural features of GPR161 interface with the Hedgehog pathway and sets a foundation to understand the broader role of GPR161 function in other signaling pathways.

19.
Nat Med ; 29(5): 1113-1122, 2023 May.
Article in English | MEDLINE | ID: mdl-37156936

ABSTRACT

Pancreatic cancer is an aggressive disease that typically presents late with poor outcomes, indicating a pronounced need for early detection. In this study, we applied artificial intelligence methods to clinical data from 6 million patients (24,000 pancreatic cancer cases) in Denmark (Danish National Patient Registry (DNPR)) and from 3 million patients (3,900 cases) in the United States (US Veterans Affairs (US-VA)). We trained machine learning models on the sequence of disease codes in clinical histories and tested prediction of cancer occurrence within incremental time windows (CancerRiskNet). For cancer occurrence within 36 months, the performance of the best DNPR model has area under the receiver operating characteristic (AUROC) curve = 0.88 and decreases to AUROC (3m) = 0.83 when disease events within 3 months before cancer diagnosis are excluded from training, with an estimated relative risk of 59 for 1,000 highest-risk patients older than age 50 years. Cross-application of the Danish model to US-VA data had lower performance (AUROC = 0.71), and retraining was needed to improve performance (AUROC = 0.78, AUROC (3m) = 0.76). These results improve the ability to design realistic surveillance programs for patients at elevated risk, potentially benefiting lifespan and quality of life by early detection of this aggressive cancer.
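
The AUROC figures quoted above can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case. The sketch below computes AUROC directly from that pairwise definition. It is not the CancerRiskNet code, and the scores, labels, and function name are hypothetical toy values.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUROC needs at least one case and one non-case")
    correct = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return correct / (len(cases) * len(controls))


# Hypothetical risk scores for four patients; label 1 marks a cancer case.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
print(auroc(scores, labels))  # 3 of 4 case/non-case pairs ranked correctly -> 0.75
```

The pairwise loop is quadratic in cohort size, so for registries with millions of patients a rank-based formulation or a library routine would be the practical choice.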


Subjects
Deep Learning; Pancreatic Neoplasms; Humans; Middle Aged; Artificial Intelligence; Quality of Life; Pancreatic Neoplasms/diagnosis; Pancreatic Neoplasms/epidemiology; Algorithms; Pancreatic Neoplasms
20.
bioRxiv ; 2023 May 09.
Article in English | MEDLINE | ID: mdl-37214973

ABSTRACT

Designing optimized proteins is important for a range of practical applications. Protein design is a rapidly developing field that would benefit from approaches that enable many changes in the amino acid primary sequence, rather than a small number of mutations, while maintaining structure and enhancing function. Homologous protein sequences contain extensive information about various protein properties and activities that have emerged over billions of years of evolution. Evolutionary models of sequence co-variation, derived from a set of homologous sequences, have proven effective in a range of applications including structure determination and mutation effect prediction. In this work we apply one of these models (EVcouplings) to computationally design highly divergent variants of the model protein TEM-1 β-lactamase, and characterize these designs experimentally using multiple biochemical and biophysical assays. Nearly all designed variants were functional, including one with 84 mutations from the nearest natural homolog. Surprisingly, all functional designs had large increases in thermostability and most had a broadening of available substrates. These property enhancements occurred while maintaining a nearly identical structure to the wild-type enzyme. Collectively, this work demonstrates that evolutionary models of sequence co-variation (1) are able to capture complex epistatic interactions that successfully guide large sequence departures from natural contexts, and (2) can be applied to generate functional diversity useful for many applications in protein design.
