Results 1 - 20 of 42
1.
bioRxiv ; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39005366

ABSTRACT

Variant annotation is a crucial objective in mammalian functional genomics. Deep Mutational Scanning (DMS) is a well-established method for annotating human gene variants, but CRISPR base editing (BE) is emerging as an alternative. However, questions remain about how well high-throughput base editing measurements can annotate variant function and the extent of downstream experimental validation required. This study presents the first direct comparison of DMS and BE in the same lab and cell line. Results indicate that focusing on the most likely edits and highest efficiency sgRNAs enhances the agreement between a "gold standard" DMS dataset and a BE screen. A simple filter for sgRNAs making single edits in their window could sufficiently annotate a large proportion of variants directly from sgRNA sequencing of large pools. When multi-edit guides are unavoidable, directly measuring the variants created in the pool, rather than sgRNA abundance, can recover high-quality variant annotation measurements in multiplexed pools. Taken together, our data show an impressive degree of correlation between base editor data and gold standard deep mutational scanning.
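
The "simple filter for sgRNAs making single edits in their window" mentioned in the abstract can be sketched as below. This is a minimal illustration rather than the study's pipeline: the cytosine target base, the editing window at protospacer positions 4 to 8, and the guide sequences are all assumptions chosen for the example.

```python
def editable_positions(protospacer, target_base="C", window=(4, 8)):
    """Return 1-indexed positions of target_base inside the editing window.

    The window (4, 8) and the C target are assumed defaults for illustration.
    """
    start, end = window
    return [
        pos
        for pos, base in enumerate(protospacer.upper(), start=1)
        if start <= pos <= end and base == target_base
    ]


def single_edit_guides(guides, target_base="C", window=(4, 8)):
    """Keep only guides with exactly one editable base in the window."""
    return {
        name: seq
        for name, seq in guides.items()
        if len(editable_positions(seq, target_base, window)) == 1
    }


# Hypothetical 20-nt protospacers, for illustration only.
guides = {
    "g1": "GATCGATTAGGTTAGGATTG",  # one C in the window
    "g2": "GATCCATTAGGTTAGGATTG",  # two Cs in the window (multi-edit guide)
    "g3": "GATGGATTAGGTTAGGATTG",  # no C in the window
}
kept = single_edit_guides(guides)
```

Guides dropped by such a filter are the multi-edit cases for which the abstract recommends measuring the created variants directly.
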

2.
ArXiv ; 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38699164

ABSTRACT

Biological sequences do not come at random. Instead, they appear with particular frequencies that reflect properties of the associated system or phenomenon. Knowing how biological sequences are distributed in sequence space is thus a natural first step toward understanding the underlying mechanisms. Here we propose a new method for inferring the probability distribution from which a sample of biological sequences was drawn for the case where the sequences are composed of elements that admit a natural ordering. Our method is based on Bayesian field theory, a physics-based machine learning approach, and can be regarded as a nonparametric extension of the traditional maximum entropy estimate. As an example, we use it to analyze the aneuploidy data pertaining to gliomas from The Cancer Genome Atlas project. In addition, we demonstrate two follow-up analyses that can be performed with the resulting probability distribution. One of them is to investigate the associations among the sequence sites. This gives us a way to infer the governing biological grammar. The other is to study the global geometry of the probability landscape, which allows us to look at the problem from an evolutionary point of view. This methodology thus enables us to learn from a sample of sequences how a biological system or phenomenon in the real world works.

3.
PLoS Biol ; 22(5): e3002594, 2024 May.
Article in English | MEDLINE | ID: mdl-38754362

ABSTRACT

The standard genetic code defines the rules of translation for nearly every life form on Earth. It also determines the amino acid changes accessible via single-nucleotide mutations, thus influencing protein evolvability, the ability of mutation to bring forth adaptive variation in protein function. One of the most striking features of the standard genetic code is its robustness to mutation, yet it remains an open question whether such robustness facilitates or frustrates protein evolvability. To answer this question, we use data from massively parallel sequence-to-function assays to construct and analyze 6 empirical adaptive landscapes under hundreds of thousands of rewired genetic codes, including those of codon compression schemes relevant to protein engineering and synthetic biology. We find that robust genetic codes tend to enhance protein evolvability by rendering smooth adaptive landscapes with few peaks, which are readily accessible from throughout sequence space. However, the standard genetic code is rarely exceptional in this regard, because many alternative codes render smoother landscapes than the standard code. By constructing low-dimensional visualizations of these landscapes, which each comprise more than 16 million mRNA sequences, we show that such alternative codes radically alter the topological features of the network of high-fitness genotypes. Whereas the genetic codes that optimize evolvability depend to some extent on the detailed relationship between amino acid sequence and protein function, we also uncover general design principles for engineering nonstandard genetic codes for enhanced and diminished evolvability, which may facilitate directed protein evolution experiments and the bio-containment of synthetic organisms, respectively.


Subject(s)
Evolution, Molecular ; Genetic Code ; Proteins ; Proteins/genetics ; Proteins/metabolism ; Mutation/genetics ; Codon/genetics ; Models, Genetic ; Synthetic Biology/methods ; Protein Biosynthesis ; Protein Engineering/methods
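
The statement in this abstract that the genetic code "determines the amino acid changes accessible via single-nucleotide mutations" can be made concrete with a short sketch. It builds the standard code table and lists, for a given codon, the amino acids (or stop, written `*`) reachable by one nucleotide change. The function names are illustrative and are not taken from the paper; a rewired code could be passed through the `code` argument.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_nucleotide_neighbors(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for i, original in enumerate(codon):
        for base in BASES:
            if base != original:
                yield codon[:i] + base + codon[i + 1:]


def accessible_amino_acids(codon, code=STANDARD_CODE):
    """Amino acids (or '*') reachable by one nucleotide change, synonymous excluded."""
    current = code[codon]
    return {code[n] for n in single_nucleotide_neighbors(codon)} - {current}


from_met = accessible_amino_acids("ATG")  # methionine codon
from_trp = accessible_amino_acids("TGG")  # tryptophan codon
```
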
4.
ArXiv ; 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38699161

ABSTRACT

Computational methods for assessing the likely impacts of mutations, known as variant effect predictors (VEPs), are widely used in the assessment and interpretation of human genetic variation, as well as in other applications like protein engineering. Many different VEPs have been released to date, and there is tremendous variability in their underlying algorithms and outputs, and in the ways in which the methodologies and predictions are shared. This leads to considerable challenges for end users in knowing which VEPs to use and how to use them. Here, to address these issues, we provide guidelines and recommendations for the release of novel VEPs. Emphasising open-source availability, transparent methodologies, clear variant effect score interpretations, standardised scales, accessible predictions, and rigorous training data disclosure, we aim to improve the usability and interpretability of VEPs, and promote their integration into analysis and evaluation pipelines. We also provide a large, categorised list of currently available VEPs, aiming to facilitate the discovery and encourage the usage of novel methods within the scientific community.

5.
bioRxiv ; 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38798671

ABSTRACT

Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
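
To illustrate what it means for parameter values to change without affecting predictions, the sketch below uses a small additive model and fixes its gauge by forcing the parameters at each position to sum to zero (a commonly used "zero-sum" choice). The parameter values and function names are hypothetical, and this is only one simple gauge, not the general family derived in the paper.

```python
from itertools import product

ALPHABET = "ACGT"


def predict(theta0, theta, seq):
    """Additive model: constant term plus one parameter per position and character."""
    return theta0 + sum(theta[pos][char] for pos, char in enumerate(seq))


def fix_zero_sum_gauge(theta0, theta):
    """Shift each position's parameters to sum to zero, absorbing the shift into theta0."""
    new_theta0 = theta0
    new_theta = []
    for pos_params in theta:
        mean = sum(pos_params.values()) / len(pos_params)
        new_theta0 += mean
        new_theta.append({c: v - mean for c, v in pos_params.items()})
    return new_theta0, new_theta


# Hypothetical parameters for sequences of length 2.
theta0 = 1.0
theta = [
    {"A": 0.5, "C": -1.0, "G": 2.0, "T": 0.5},
    {"A": 3.0, "C": 1.0, "G": 1.0, "T": -1.0},
]
g0, g = fix_zero_sum_gauge(theta0, theta)

# Largest change in any prediction caused by the gauge transformation.
max_diff = max(
    abs(predict(theta0, theta, s) - predict(g0, g, s))
    for s in ("".join(p) for p in product(ALPHABET, repeat=2))
)
```

Here `max_diff` is zero up to floating-point error, while the individual parameters differ between the two gauges, which is why parameters should be compared only after a gauge has been fixed.
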

6.
bioRxiv ; 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38798625

ABSTRACT

Quantitative models that describe how biological sequences encode functional activities are ubiquitous in modern biology. One important aspect of these models is that they commonly exhibit gauge freedoms, i.e., directions in parameter space that do not affect model predictions. In physics, gauge freedoms arise when physical theories are formulated in ways that respect fundamental symmetries. However, the connections that gauge freedoms in models of sequence-function relationships have to the symmetries of sequence space have yet to be systematically studied. Here we study the gauge freedoms of models that respect a specific symmetry of sequence space: the group of position-specific character permutations. We find that gauge freedoms arise when model parameters transform under redundant irreducible matrix representations of this group. Based on this finding, we describe an "embedding distillation" procedure that enables analytic calculation of the number of independent gauge freedoms, as well as efficient computation of a sparse basis for the space of gauge freedoms. We also study how parameter transformation behavior affects parameter interpretability. We find that in many (and possibly all) nontrivial models, the ability to interpret individual model parameters as quantifying intrinsic allelic effects requires that gauge freedoms be present. This finding establishes an incompatibility between two distinct notions of parameter interpretability. Our work thus advances the understanding of symmetries, gauge freedoms, and parameter interpretability in sequence-function relationships.

7.
Nat Commun ; 15(1): 1880, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38424098

ABSTRACT

Drugs that target pre-mRNA splicing hold great therapeutic potential, but the quantitative understanding of how these drugs work is limited. Here we introduce mechanistically interpretable quantitative models for the sequence-specific and concentration-dependent behavior of splice-modifying drugs. Using massively parallel splicing assays, RNA-seq experiments, and precision dose-response curves, we obtain quantitative models for two small-molecule drugs, risdiplam and branaplam, developed for treating spinal muscular atrophy. The results quantitatively characterize the specificities of risdiplam and branaplam for 5' splice site sequences, suggest that branaplam recognizes 5' splice sites via two distinct interaction modes, and contradict the prevailing two-site hypothesis for risdiplam activity at SMN2 exon 7. The results also show that anomalous single-drug cooperativity, as well as multi-drug synergy, are widespread among small-molecule drugs and antisense-oligonucleotide drugs that promote exon inclusion. Our quantitative models thus clarify the mechanisms of existing treatments and provide a basis for the rational development of new therapies.


Subject(s)
Muscular Atrophy, Spinal ; Pyrimidines ; RNA Splicing ; Humans ; RNA Splicing/genetics ; Azo Compounds ; Oligonucleotides/genetics ; Oligonucleotides, Antisense/genetics ; Oligonucleotides, Antisense/therapeutic use ; RNA Splice Sites ; Muscular Atrophy, Spinal/drug therapy ; Muscular Atrophy, Spinal/genetics
8.
bioRxiv ; 2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38013993

ABSTRACT

Deep neural networks (DNNs) have greatly advanced the ability to predict genome function from sequence. Interpreting genomic DNNs in terms of biological mechanisms, however, remains difficult. Here we introduce SQUID, a genomic DNN interpretability framework based on surrogate modeling. SQUID approximates genomic DNNs in user-specified regions of sequence space using surrogate models, i.e., simpler models that are mechanistically interpretable. Importantly, SQUID removes the confounding effects that nonlinearities and heteroscedastic noise in functional genomics data can have on model interpretation. Benchmarking analysis on multiple genomic DNNs shows that SQUID, when compared to established interpretability methods, identifies motifs that are more consistent across genomic loci and yields improved single-nucleotide variant-effect predictions. SQUID also supports surrogate models that quantify epistatic interactions within and between cis-regulatory elements. SQUID thus advances the ability to mechanistically interpret genomic DNNs.

9.
Science ; 382(6668): 315-320, 2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37856609

ABSTRACT

Epistasis between genes is traditionally studied with mutations that eliminate protein activity, but most natural genetic variation is in cis-regulatory DNA and influences gene expression and function quantitatively. In this study, we used natural and engineered cis-regulatory alleles in a plant stem-cell circuit to systematically evaluate epistatic relationships controlling tomato fruit size. Combining a promoter allelic series with two other loci, we collected over 30,000 phenotypic data points from 46 genotypes to quantify how allele strength transforms epistasis. We revealed a saturating dose-dependent relationship but also allele-specific idiosyncratic interactions, including between alleles driving a step change in fruit size during domestication. Our approach and findings expose an underexplored dimension of epistasis, in which cis-regulatory allelic diversity within gene regulatory networks elicits nonlinear, unpredictable interactions that shape phenotypes.


Subject(s)
Epistasis, Genetic ; Fruit ; Solanum lycopersicum ; Alleles ; Fruit/anatomy & histology ; Fruit/genetics ; Genetic Variation ; Genotype ; Phenotype ; Solanum lycopersicum/anatomy & histology ; Solanum lycopersicum/genetics ; Gene Expression Regulation, Plant ; Promoter Regions, Genetic ; Gene Dosage
10.
Am Nat ; 202(4): 534-557, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37792926

ABSTRACT

The joint distribution of selection coefficients and mutation rates is a key determinant of the genetic architecture of molecular adaptation. Three different distributions are of immediate interest: (1) the "nominal" distribution of possible changes, prior to mutation or selection; (2) the "de novo" distribution of realized mutations; and (3) the "fixed" distribution of selectively established mutations. Here, we formally characterize the relationships between these joint distributions under the strong-selection/weak-mutation (SSWM) regime. The de novo distribution is enriched relative to the nominal distribution for the highest rate mutations, and the fixed distribution is further enriched for the most highly beneficial mutations. Whereas mutation rates and selection coefficients are often assumed to be uncorrelated, we show that even with no correlation in the nominal distribution, the resulting de novo and fixed distributions can have correlations with any combination of signs. Nonetheless, we suggest that natural systems with a finite number of beneficial mutations will frequently have the kind of nominal distribution that induces negative correlations in the fixed distribution. We apply our mathematical framework, along with population simulations, to explore joint distributions of selection coefficients and mutation rates from deep mutational scanning and cancer informatics. Finally, we consider the evolutionary implications of these joint distributions together with two additional joint distributions relevant to parallelism and the rate of adaptation.


Subject(s)
Mutation Rate ; Selection, Genetic ; Models, Genetic ; Mutation ; Biological Evolution ; Evolution, Molecular
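
The three distributions named in this abstract can be illustrated with a toy calculation. The sketch weights each possible beneficial mutation equally (nominal), by its mutation rate (de novo), and by its mutation rate times its fixation probability (fixed). The fixation probability of roughly 2s for a beneficial mutation is Haldane's classic approximation and is an assumption made here, as are the example rates and selection coefficients; the paper's own formalism may differ in detail.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def sswm_distributions(rates, selection_coeffs):
    """Return (nominal, de_novo, fixed) probabilities over the possible mutations."""
    n = len(rates)
    nominal = [1.0 / n] * n                      # every possible change counted once
    de_novo = normalize(rates)                   # weighted by mutation rate u
    fixed = normalize(                           # weighted by u times P(fix) ~ 2s
        [u * 2.0 * s for u, s in zip(rates, selection_coeffs)]
    )
    return nominal, de_novo, fixed


# Hypothetical mutations: rate and selection coefficient are uncorrelated here.
rates = [1e-9, 4e-9, 1e-9, 4e-9]
selection_coeffs = [0.01, 0.01, 0.03, 0.03]
nominal, de_novo, fixed = sswm_distributions(rates, selection_coeffs)
```

In this toy case the de novo distribution is shifted toward the two high-rate mutations, and the fixed distribution is shifted further toward the mutation that is both high-rate and highly beneficial.
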
11.
Nat Commun ; 14(1): 2890, 2023 May 20.
Article in English | MEDLINE | ID: mdl-37210560

ABSTRACT

Mutations in a protein active site can lead to dramatic and useful changes in protein activity. The active site, however, is sensitive to mutations due to a high density of molecular interactions, substantially reducing the likelihood of obtaining functional multipoint mutants. We introduce an atomistic and machine-learning-based approach, called high-throughput Functional Libraries (htFuncLib), that designs a sequence space in which mutations form low-energy combinations that mitigate the risk of incompatible interactions. We apply htFuncLib to the GFP chromophore-binding pocket, and, using fluorescence readout, recover >16,000 unique designs encoding as many as eight active-site mutations. Many designs exhibit substantial and useful diversity in functional thermostability (up to 96 °C), fluorescence lifetime, and quantum yield. By eliminating incompatible active-site mutations, htFuncLib generates a large diversity of functional sequences. We envision that htFuncLib will be used in one-shot optimization of activity in enzymes, binders, and other proteins.


Subject(s)
Proteins ; Catalytic Domain ; Gene Library ; Proteins/genetics ; Mutation ; Fluorescence ; Green Fluorescent Proteins/metabolism
12.
Res Sq ; 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-37131620

ABSTRACT

Some protein binding pairs exhibit extreme specificities that functionally insulate them from homologs. Such pairs evolve mostly by accumulating single-point mutations, and mutants are selected if their affinity exceeds the threshold required for function [1-4]. Thus, homologous and high-specificity binding pairs bring to light an evolutionary conundrum: how does a new specificity evolve while maintaining the required affinity in each intermediate [5,6]? Until now, a fully functional single-mutation path that connects two orthogonal pairs has only been described where the pairs were mutationally close, thus enabling experimental enumeration of all intermediates [2]. We present an atomistic and graph-theoretical framework for discovering low molecular strain single-mutation paths that connect two extant pairs, enabling enumeration beyond experimental capability. We apply it to two orthogonal bacterial colicin endonuclease-immunity pairs separated by 17 interface mutations [7]. We were not able to find a strain-free and functional path in the sequence space defined by the two extant pairs. But including mutations that bridge amino acids that cannot be exchanged through single-nucleotide mutations led us to a strain-free 19-mutation trajectory that is completely viable in vivo. Our experiments show that the specificity switch is remarkably abrupt, resulting from only one radical mutation on each partner. Furthermore, each of the critical specificity-switch mutations increases fitness, demonstrating that functional divergence could be driven by positive Darwinian selection. These results reveal how even radical functional changes in an epistatic fitness landscape may evolve.

13.
Cell ; 186(9): 1824-1845, 2023 Apr 27.
Article in English | MEDLINE | ID: mdl-37116469

ABSTRACT

Cachexia, a systemic wasting condition, is considered a late consequence of diseases, including cancer, organ failure, or infections, and contributes to significant morbidity and mortality. The induction process and mechanistic progression of cachexia are incompletely understood. Refocusing academic efforts away from advanced cachexia to the etiology of cachexia may enable discoveries of new therapeutic approaches. Here, we review drivers, mechanisms, organismal predispositions, evidence for multi-organ interaction, model systems, clinical research, trials, and care provision from early onset to late cachexia. Evidence is emerging that distinct inflammatory, metabolic, and neuro-modulatory drivers can initiate processes that ultimately converge on advanced cachexia.


Subject(s)
Cachexia ; Humans ; Cachexia/drug therapy ; Cachexia/etiology ; Cachexia/metabolism ; Cachexia/pathology ; Muscle, Skeletal/metabolism ; Neoplasms/complications ; Neoplasms/metabolism ; Neoplasms/pathology ; Infections/complications ; Infections/pathology ; Multiple Organ Failure/complications ; Multiple Organ Failure/pathology
14.
Philos Trans R Soc Lond B Biol Sci ; 378(1877): 20220055, 2023 May 22.
Article in English | MEDLINE | ID: mdl-37004719

ABSTRACT

Predicting evolutionary outcomes is an important research goal in a diversity of contexts. The focus of evolutionary forecasting is usually on adaptive processes, and efforts to improve prediction typically focus on selection. However, adaptive processes often rely on new mutations, which can be strongly influenced by predictable biases in mutation. Here, we provide an overview of existing theory and evidence for such mutation-biased adaptation and consider the implications of these results for the problem of prediction, in regard to topics such as the evolution of infectious diseases, resistance to biochemical agents, as well as cancer and other kinds of somatic evolution. We argue that empirical knowledge of mutational biases is likely to improve in the near future, and that this knowledge is readily applicable to the challenges of short-term prediction. This article is part of the theme issue 'Interdisciplinary approaches to predicting evolutionary biology'.


Subject(s)
Adaptation, Physiological ; Biological Evolution ; Mutation ; Adaptation, Physiological/genetics ; Acclimatization ; Bias ; Evolution, Molecular
15.
Proc Natl Acad Sci U S A ; 119(39): e2204233119, 2022 Sep 27.
Article in English | MEDLINE | ID: mdl-36129941

ABSTRACT

Contemporary high-throughput mutagenesis experiments are providing an increasingly detailed view of the complex patterns of genetic interaction that occur between multiple mutations within a single protein or regulatory element. By simultaneously measuring the effects of thousands of combinations of mutations, these experiments have revealed that the genotype-phenotype relationship typically reflects not only genetic interactions between pairs of sites but also higher-order interactions among larger numbers of sites. However, modeling and understanding these higher-order interactions remains challenging. Here we present a method for reconstructing sequence-to-function mappings from partially observed data that can accommodate all orders of genetic interaction. The main idea is to make predictions for unobserved genotypes that match the type and extent of epistasis found in the observed data. This information on the type and extent of epistasis can be extracted by considering how phenotypic correlations change as a function of mutational distance, which is equivalent to estimating the fraction of phenotypic variance due to each order of genetic interaction (additive, pairwise, three-way, etc.). Using these estimated variance components, we then define an empirical Bayes prior that in expectation matches the observed pattern of epistasis and reconstruct the genotype-phenotype mapping by conducting Gaussian process regression under this prior. To demonstrate the power of this approach, we present an application to the antibody-binding domain GB1 and also provide a detailed exploration of a dataset consisting of high-throughput measurements for the splicing efficiency of human pre-mRNA [Formula: see text] splice sites, for which we also validate our model predictions via additional low-throughput experiments.


Subject(s)
Epistasis, Genetic ; RNA Precursors ; Bayes Theorem ; Chromosome Mapping ; Computational Biology ; Genotype ; Humans ; Models, Genetic ; Mutation ; Phenotype ; RNA Splicing
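
The idea of "considering how phenotypic correlations change as a function of mutational distance" can be sketched for a small, fully observed landscape. The code below averages the product of mean-centered phenotypes over all ordered pairs of genotypes at each Hamming distance and divides by the overall variance. The toy landscape and function names are assumptions for illustration; the paper works with partially observed data and goes on to convert such correlations into variance components, which is not attempted here.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_correlation(phenotypes):
    """Map each Hamming distance d to the phenotypic correlation of pairs at distance d."""
    seqs = list(phenotypes)
    mean = sum(phenotypes.values()) / len(seqs)
    var = sum((v - mean) ** 2 for v in phenotypes.values()) / len(seqs)
    sums, counts = {}, {}
    for a in seqs:
        for b in seqs:  # ordered pairs, including a == b (distance 0)
            d = hamming(a, b)
            sums[d] = sums.get(d, 0.0) + (phenotypes[a] - mean) * (phenotypes[b] - mean)
            counts[d] = counts.get(d, 0) + 1
    return {d: sums[d] / counts[d] / var for d in sorted(sums)}


# Toy additive landscape on binary sequences of length 3: phenotype = number of 1s.
landscape = {"".join(s): float(s.count("1")) for s in product("01", repeat=3)}
rho = distance_correlation(landscape)
```

For this purely additive toy landscape the correlation falls linearly with distance (1, 1/3, -1/3, -1); epistatic landscapes show different shapes of decay, which is the kind of signal the abstract describes using.
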
16.
Genome Biol ; 23(1): 98, 2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35428271

ABSTRACT

Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps, including biophysically interpretable models, from MAVE datasets. We demonstrate MAVE-NN in multiple biological contexts, and highlight the ability of our approach to deconvolve mutational effects from otherwise confounding experimental nonlinearities and noise.


Subject(s)
Biological Assay ; Neural Networks, Computer ; Genotype ; Mutation ; Phenotype
17.
Proc Natl Acad Sci U S A ; 119(7), 2022 Feb 15.
Article in English | MEDLINE | ID: mdl-35145034

ABSTRACT

Evolutionary adaptation often occurs by the fixation of beneficial mutations. This mode of adaptation can be characterized quantitatively by a spectrum of adaptive substitutions, i.e., a distribution for types of changes fixed in adaptation. Recent work establishes that the changes involved in adaptation reflect common types of mutations, raising the question of how strongly the mutation spectrum shapes the spectrum of adaptive substitutions. We address this question with a codon-based model for the spectrum of adaptive amino acid substitutions, applied to three large datasets covering thousands of amino acid changes identified in natural and experimental adaptation in Saccharomyces cerevisiae, Escherichia coli, and Mycobacterium tuberculosis. Using species-specific mutation spectra based on prior knowledge, we find that the mutation spectrum has a proportional influence on the spectrum of adaptive substitutions in all three species. Indeed, we find that by inferring the mutation rates that best explain the spectrum of adaptive substitutions, we can accurately recover the species-specific mutation spectra. However, we also find that the predictive power of the model differs substantially between the three species. To better understand these differences, we use population simulations to explore the factors that influence how closely the spectrum of adaptive substitutions mirrors the mutation spectrum. The results show that the influence of the mutation spectrum decreases with increasing mutational supply ([Formula: see text]) and that predictive power is strongly affected by the number and diversity of beneficial mutations.


Subject(s)
Adaptation, Physiological ; Escherichia coli/genetics ; Mycobacterium tuberculosis/genetics ; Saccharomyces cerevisiae/genetics ; Bacterial Proteins/genetics ; Bacterial Proteins/metabolism ; Escherichia coli/physiology ; Fungal Proteins/genetics ; Fungal Proteins/metabolism ; Gene Expression Regulation, Bacterial ; Gene Expression Regulation, Fungal ; Mutation ; Mycobacterium tuberculosis/physiology ; Saccharomyces cerevisiae/physiology ; Species Specificity
18.
iScience ; 24(11): 103343, 2021 Nov 19.
Article in English | MEDLINE | ID: mdl-34825133

ABSTRACT

Genomic data can facilitate personalized treatment decisions by enabling therapeutic hypotheses in individual patients. Mutual exclusivity has been an empirically useful signal for identifying activating mutations that respond to single-agent targeted therapies. However, a low mutation frequency can underpower this signal for rare variants. We develop a resampling-based method for the direct pairwise comparison of conditional selection between sets of gene pairs. We apply this method to a transcript variant of anaplastic lymphoma kinase (ALK) in melanoma, termed ALK^ATI, that was suggested to predict sensitivity to ALK inhibitors, and we find that it is not mutually exclusive with key melanoma oncogenes. Furthermore, we find that ALK^ATI is not likely to be sufficient for cellular transformation or growth, and it does not predict single-agent therapeutic dependency. Our work strongly disfavors the role of ALK^ATI as a targetable oncogenic driver that might be sensitive to single-agent ALK treatment.

19.
Proc Natl Acad Sci U S A ; 118(40), 2021 Oct 05.
Article in English | MEDLINE | ID: mdl-34599093

ABSTRACT

Density estimation in sequence space is a fundamental problem in machine learning that is also of great importance in computational biology. Due to the discrete nature and large dimensionality of sequence space, how best to estimate such probability distributions from a sample of observed sequences remains unclear. One common strategy for addressing this problem is to estimate the probability distribution using maximum entropy (i.e., calculating point estimates for some set of correlations based on the observed sequences and predicting the probability distribution that is as uniform as possible while still matching these point estimates). Building on recent advances in Bayesian field-theoretic density estimation, we present a generalization of this maximum entropy approach that provides greater expressivity in regions of sequence space where data are plentiful while still maintaining a conservative maximum entropy character in regions of sequence space where data are sparse or absent. In particular, we define a family of priors for probability distributions over sequence space with a single hyperparameter that controls the expected magnitude of higher-order correlations. This family of priors then results in a corresponding one-dimensional family of maximum a posteriori estimates that interpolate smoothly between the maximum entropy estimate and the observed sample frequencies. To demonstrate the power of this method, we use it to explore the high-dimensional geometry of the distribution of 5' splice sites found in the human genome and to understand patterns of chromosomal abnormalities across human cancers.


Subject(s)
Aneuploidy ; Computational Biology/methods ; Models, Theoretical ; Neoplasms/genetics ; RNA Splice Sites ; Humans ; Probability
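
The abstract describes a family of estimates that interpolates between the maximum entropy estimate and the observed sample frequencies. The sketch below computes only those two endpoints for the simplest case, where the matched correlations are single-site frequencies, so that the maximum entropy distribution is the product of site marginals. It does not implement the Bayesian field-theoretic interpolation itself, and the sample and function names are hypothetical.

```python
from collections import Counter
from itertools import product


def empirical_frequencies(sample):
    """Observed frequency of each distinct sequence in the sample."""
    n = len(sample)
    return {seq: c / n for seq, c in Counter(sample).items()}


def site_marginals(sample):
    """Per-position character frequencies."""
    n = len(sample)
    return [
        {char: c / n for char, c in Counter(seq[i] for seq in sample).items()}
        for i in range(len(sample[0]))
    ]


def maxent_first_order(sample, alphabet):
    """Maximum entropy distribution matching single-site frequencies only.

    With only first-order constraints this is the product of site marginals.
    """
    marginals = site_marginals(sample)
    dist = {}
    for chars in product(alphabet, repeat=len(marginals)):
        p = 1.0
        for i, c in enumerate(chars):
            p *= marginals[i].get(c, 0.0)
        dist["".join(chars)] = p
    return dist


# Hypothetical sample of length-2 sequences over a two-letter alphabet.
sample = ["AA", "AA", "AB", "BB"]
observed = empirical_frequencies(sample)
maxent = maxent_first_order(sample, "AB")
```

The unobserved sequence `BA` has zero observed frequency but positive probability under the maximum entropy estimate, which illustrates the conservative behavior in data-sparse regions that the abstract refers to.
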