Results 1 - 20 of 36
1.
bioRxiv ; 2024 May 13.
Article in English | MEDLINE | ID: mdl-38798625

ABSTRACT

Quantitative models of sequence-function relationships, which describe how biological sequences encode functional activities, are ubiquitous in modern biology. One important aspect of these models is that they commonly exhibit gauge freedoms, i.e., directions in parameter space that do not affect model predictions. In physics, gauge freedoms arise when physical theories are formulated in ways that respect fundamental symmetries. However, the connections that gauge freedoms in models of sequence-function relationships have to the symmetries of sequence space have yet to be systematically studied. Here we study the gauge freedoms of models that respect a specific symmetry of sequence space: the group of position-specific character permutations. We find that gauge freedoms arise when the transformations of model parameters that compensate for these symmetry transformations are described by redundant irreducible matrix representations. Based on this finding, we describe an "embedding distillation" procedure that enables analytic calculation of the dimension of the space of gauge freedoms, as well as efficient computation of a sparse basis for this space. Finally, we show that the ability to interpret model parameters as quantifying allelic effects places strong constraints on the form that models can take, and in particular show that all nontrivial equivariant models of allelic effects must exhibit gauge freedoms. Our work thus advances the understanding of the relationship between symmetries and gauge freedoms in quantitative models of sequence-function relationships.
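
As a concrete illustration of what a gauge freedom is, the sketch below counts the gauge freedoms of a toy one-hot additive model with a constant term. It is a minimal sketch under stated assumptions, not the "embedding distillation" procedure described in the abstract: it simply builds the design matrix over all sequences and subtracts its rank from the number of parameters. The function name `count_gauge_freedoms` and the toy sequence length are hypothetical.

```python
import itertools
import numpy as np

def count_gauge_freedoms(length, alphabet="ACGT"):
    """Return (n_params, rank, n_gauge) for a one-hot additive model."""
    alpha = len(alphabet)
    rows = []
    # Enumerate every sequence so the design matrix covers all model predictions.
    for seq in itertools.product(range(alpha), repeat=length):
        row = [1.0]  # constant term
        for char in seq:
            one_hot = [0.0] * alpha
            one_hot[char] = 1.0
            row.extend(one_hot)
        rows.append(row)
    X = np.array(rows)
    n_params = X.shape[1]
    rank = int(np.linalg.matrix_rank(X))
    # Parameter directions in the null space of X leave every prediction unchanged.
    return n_params, rank, n_params - rank

n_params, rank, n_gauge = count_gauge_freedoms(3)
print(n_params, rank, n_gauge)
```

For this toy model there is one gauge freedom per position, because adding a constant to every character effect at a position can be offset by subtracting it from the constant term.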

2.
bioRxiv ; 2024 May 13.
Article in English | MEDLINE | ID: mdl-38798671

ABSTRACT

Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
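
To make "fixing the gauge" concrete, the sketch below uses a toy additive model, which is an assumption for illustration and not the family of gauges derived in the paper. It moves each position's mean effect into the constant term (the commonly used zero-sum gauge) and leaves every prediction unchanged. The function names and parameter values are hypothetical.

```python
def predict(theta0, theta, seq, alphabet="ACGT"):
    """Additive model: constant term plus one effect per position."""
    return theta0 + sum(theta[l][alphabet.index(c)] for l, c in enumerate(seq))

def fix_zero_sum_gauge(theta0, theta):
    """Make each position's effects sum to zero; absorb the means into theta0."""
    new_theta0 = theta0
    new_theta = []
    for effects in theta:
        mean = sum(effects) / len(effects)
        new_theta0 += mean
        new_theta.append([e - mean for e in effects])
    return new_theta0, new_theta

# Hypothetical parameters for a length-2 DNA model.
theta0 = 0.5
theta = [[1.0, 2.0, 3.0, 4.0], [0.0, -1.0, 0.5, 2.5]]
fixed0, fixed = fix_zero_sum_gauge(theta0, theta)
```

After gauge fixing, the parameters at each position describe deviations from that position's mean effect, which is one way of making their values comparable across positions.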

3.
Nat Commun ; 15(1): 1880, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38424098

ABSTRACT

Drugs that target pre-mRNA splicing hold great therapeutic potential, but the quantitative understanding of how these drugs work is limited. Here we introduce mechanistically interpretable quantitative models for the sequence-specific and concentration-dependent behavior of splice-modifying drugs. Using massively parallel splicing assays, RNA-seq experiments, and precision dose-response curves, we obtain quantitative models for two small-molecule drugs, risdiplam and branaplam, developed for treating spinal muscular atrophy. The results quantitatively characterize the specificities of risdiplam and branaplam for 5' splice site sequences, suggest that branaplam recognizes 5' splice sites via two distinct interaction modes, and contradict the prevailing two-site hypothesis for risdiplam activity at SMN2 exon 7. The results also show that anomalous single-drug cooperativity, as well as multi-drug synergy, are widespread among small-molecule drugs and antisense-oligonucleotide drugs that promote exon inclusion. Our quantitative models thus clarify the mechanisms of existing treatments and provide a basis for the rational development of new therapies.


Subjects
Muscular Atrophy, Spinal; Pyrimidines; RNA Splicing; Humans; RNA Splicing/genetics; Azo Compounds; Oligonucleotides/genetics; Oligonucleotides, Antisense/genetics; Oligonucleotides, Antisense/therapeutic use; RNA Splice Sites; Muscular Atrophy, Spinal/drug therapy; Muscular Atrophy, Spinal/genetics
4.
bioRxiv ; 2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38013993

ABSTRACT

Deep neural networks (DNNs) have greatly advanced the ability to predict genome function from sequence. Interpreting genomic DNNs in terms of biological mechanisms, however, remains difficult. Here we introduce SQUID, a genomic DNN interpretability framework based on surrogate modeling. SQUID approximates genomic DNNs in user-specified regions of sequence space using surrogate models, i.e., simpler models that are mechanistically interpretable. Importantly, SQUID removes the confounding effects that nonlinearities and heteroscedastic noise in functional genomics data can have on model interpretation. Benchmarking analysis on multiple genomic DNNs shows that SQUID, when compared to established interpretability methods, identifies motifs that are more consistent across genomic loci and yields improved single-nucleotide variant-effect predictions. SQUID also supports surrogate models that quantify epistatic interactions within and between cis-regulatory elements. SQUID thus advances the ability to mechanistically interpret genomic DNNs.

5.
Proc Natl Acad Sci U S A ; 119(39): e2204233119, 2022 09 27.
Article in English | MEDLINE | ID: mdl-36129941

ABSTRACT

Contemporary high-throughput mutagenesis experiments are providing an increasingly detailed view of the complex patterns of genetic interaction that occur between multiple mutations within a single protein or regulatory element. By simultaneously measuring the effects of thousands of combinations of mutations, these experiments have revealed that the genotype-phenotype relationship typically reflects not only genetic interactions between pairs of sites but also higher-order interactions among larger numbers of sites. However, modeling and understanding these higher-order interactions remains challenging. Here we present a method for reconstructing sequence-to-function mappings from partially observed data that can accommodate all orders of genetic interaction. The main idea is to make predictions for unobserved genotypes that match the type and extent of epistasis found in the observed data. This information on the type and extent of epistasis can be extracted by considering how phenotypic correlations change as a function of mutational distance, which is equivalent to estimating the fraction of phenotypic variance due to each order of genetic interaction (additive, pairwise, three-way, etc.). Using these estimated variance components, we then define an empirical Bayes prior that in expectation matches the observed pattern of epistasis and reconstruct the genotype-phenotype mapping by conducting Gaussian process regression under this prior. To demonstrate the power of this approach, we present an application to the antibody-binding domain GB1 and also provide a detailed exploration of a dataset consisting of high-throughput measurements for the splicing efficiency of human pre-mRNA [Formula: see text] splice sites, for which we also validate our model predictions via additional low-throughput experiments.


Subjects
Epistasis, Genetic; RNA Precursors; Bayes Theorem; Chromosome Mapping; Computational Biology; Genotype; Humans; Models, Genetic; Mutation; Phenotype; RNA Splicing
6.
Proc Natl Acad Sci U S A ; 119(23): e2201301119, 2022 06 07.
Article in English | MEDLINE | ID: mdl-35653571

ABSTRACT

In σ-dependent transcriptional pausing, the transcription initiation factor σ, translocating with RNA polymerase (RNAP), makes sequence-specific protein-DNA interactions with a promoter-like sequence element in the transcribed region, inducing pausing. It has been proposed that, in σ-dependent pausing, the RNAP active center can access off-pathway "backtracked" states that are substrates for the transcript-cleavage factors of the Gre family and on-pathway "scrunched" states that mediate pause escape. Here, using site-specific protein-DNA photocrosslinking to define positions of the RNAP trailing and leading edges and of σ relative to DNA at the λPR' promoter, we show directly that σ-dependent pausing in the absence of GreB in vitro predominantly involves a state backtracked by 2-4 bp, and σ-dependent pausing in the presence of GreB in vitro and in vivo predominantly involves a state scrunched by 2-3 bp. Analogous experiments with a library of $4^7$ (∼16,000) transcribed-region sequences show that the state scrunched by 2-3 bp, and only that state, is associated with the consensus sequence $T_{-3}N_{-2}Y_{-1}G_{+1}$ (where −1 corresponds to the position of the RNA 3' end), which is identical to the consensus for pausing in initial transcription and which is related to the consensus for pausing in transcription elongation. Experiments with heteroduplex templates show that sequence information at position $T_{-3}$ resides in the DNA nontemplate strand. A cryoelectron microscopy structure of a complex engaged in σ-dependent pausing reveals positions of DNA scrunching on the DNA nontemplate and template strands and suggests that position $T_{-3}$ of the consensus sequence exerts its effects by facilitating scrunching.


Subjects
DNA-Directed RNA Polymerases; Transcription, Genetic; Cryoelectron Microscopy; DNA; DNA-Directed RNA Polymerases/metabolism; Escherichia coli/genetics
7.
Genome Biol ; 23(1): 98, 2022 04 15.
Article in English | MEDLINE | ID: mdl-35428271

ABSTRACT

Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps, including biophysically interpretable models, from MAVE datasets. We demonstrate MAVE-NN in multiple biological contexts, and highlight the ability of our approach to deconvolve mutational effects from otherwise confounding experimental nonlinearities and noise.


Subjects
Biological Assay; Neural Networks, Computer; Genotype; Mutation; Phenotype
8.
Proc Natl Acad Sci U S A ; 118(40), 2021 10 05.
Article in English | MEDLINE | ID: mdl-34599093

ABSTRACT

Density estimation in sequence space is a fundamental problem in machine learning that is also of great importance in computational biology. Due to the discrete nature and large dimensionality of sequence space, how best to estimate such probability distributions from a sample of observed sequences remains unclear. One common strategy for addressing this problem is to estimate the probability distribution using maximum entropy (i.e., calculating point estimates for some set of correlations based on the observed sequences and predicting the probability distribution that is as uniform as possible while still matching these point estimates). Building on recent advances in Bayesian field-theoretic density estimation, we present a generalization of this maximum entropy approach that provides greater expressivity in regions of sequence space where data are plentiful while still maintaining a conservative maximum entropy character in regions of sequence space where data are sparse or absent. In particular, we define a family of priors for probability distributions over sequence space with a single hyperparameter that controls the expected magnitude of higher-order correlations. This family of priors then results in a corresponding one-dimensional family of maximum a posteriori estimates that interpolate smoothly between the maximum entropy estimate and the observed sample frequencies. To demonstrate the power of this method, we use it to explore the high-dimensional geometry of the distribution of 5' splice sites found in the human genome and to understand patterns of chromosomal abnormalities across human cancers.


Subjects
Aneuploidy; Computational Biology/methods; Models, Theoretical; Neoplasms/genetics; RNA Splice Sites; Humans; Probability
9.
Proc Natl Acad Sci U S A ; 118(27), 2021 07 06.
Article in English | MEDLINE | ID: mdl-34187896

ABSTRACT

Chemical modifications of RNA 5'-ends enable "epitranscriptomic" regulation, influencing multiple aspects of RNA fate. In transcription initiation, a large inventory of substrates compete with nucleoside triphosphates for use as initiating entities, providing an ab initio mechanism for altering the RNA 5'-end. In Escherichia coli cells, RNAs with a 5'-end hydroxyl are generated by use of dinucleotide RNAs as primers for transcription initiation, "primer-dependent initiation." Here, we use massively systematic transcript end readout (MASTER) to detect and quantify RNA 5'-ends generated by primer-dependent initiation for ∼$4^{10}$ (∼1,000,000) promoter sequences in E. coli. The results show primer-dependent initiation in E. coli involves any of the 16 possible dinucleotide primers and depends on promoter sequences in, upstream, and downstream of the primer binding site. The results yield a consensus sequence for primer-dependent initiation, $Y_{TSS-2}N_{TSS-1}N_{TSS}W_{TSS+1}$, where TSS is the transcription start site, $N_{TSS-1}N_{TSS}$ is the primer binding site, Y is pyrimidine, and W is A or T. Biochemical and structure-determination studies show that the base pair (nontemplate-strand base:template-strand base) immediately upstream of the primer binding site (Y:$R_{TSS-2}$, where R is purine) exerts its effect through the base on the DNA template strand ($R_{TSS-2}$) through interchain base stacking with the RNA primer. Results from analysis of a large set of natural, chromosomally encoded E. coli promoters support the conclusions from MASTER. Our findings provide a mechanistic and structural description of how TSS-region sequence hard-codes not only the TSS position but also the potential for epitranscriptomic regulation through primer-dependent transcription initiation.


Subjects
DNA Primers/metabolism; Escherichia coli/genetics; Promoter Regions, Genetic; Transcription Initiation, Genetic; Base Sequence; Binding Sites; Chromosomes, Bacterial/genetics; Gene Expression Regulation, Bacterial; RNA, Messenger/genetics; RNA, Messenger/metabolism; Transcription Initiation Site
10.
Elife ; 9, 2020 09 21.
Article in English | MEDLINE | ID: mdl-32955440

ABSTRACT

Advances in DNA sequencing have revolutionized our ability to read genomes. However, even in the most well-studied of organisms, the bacterium Escherichia coli, for ≈65% of promoters we remain ignorant of their regulation. Until we crack this regulatory Rosetta Stone, efforts to read and write genomes will remain haphazard. We introduce a new method, Reg-Seq, that links massively parallel reporter assays with mass spectrometry to produce a base pair resolution dissection of more than a E. coli promoters in 12 growth conditions. We demonstrate that the method recapitulates known regulatory information. Then, we examine regulatory architectures for more than 80 promoters which previously had no known regulatory information. In many cases, we also identify which transcription factors mediate their regulation. This method clears a path for highly multiplexed investigations of the regulatory genome of model organisms, with the potential of moving to an array of microbes of ecological and medical relevance.


Subjects
Escherichia coli/genetics; Gene Expression Regulation, Bacterial; Genome, Bacterial; Promoter Regions, Genetic; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/instrumentation
11.
Bioinformatics ; 36(7): 2272-2274, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31821414

ABSTRACT

SUMMARY: Sequence logos are visually compelling ways of illustrating the biological properties of DNA, RNA and protein sequences, yet it is currently difficult to generate and customize such logos within the Python programming environment. Here we introduce Logomaker, a Python API for creating publication-quality sequence logos. Logomaker can produce both standard and highly customized logos from either a matrix-like array of numbers or a multiple-sequence alignment. Logos are rendered as native matplotlib objects that are easy to stylize and incorporate into multi-panel figures. AVAILABILITY AND IMPLEMENTATION: Logomaker can be installed using the pip package manager and is compatible with both Python 2.7 and Python 3.6. Documentation is provided at http://logomaker.readthedocs.io; source code is available at http://github.com/jbkinney/logomaker.


Subjects
Documentation; Software; DNA; Position-Specific Scoring Matrices
12.
Annu Rev Genomics Hum Genet ; 20: 99-127, 2019 08 31.
Article in English | MEDLINE | ID: mdl-31091417

ABSTRACT

Over the last decade, a rich variety of massively parallel assays have revolutionized our understanding of how biological sequences encode quantitative molecular phenotypes. These assays include deep mutational scanning, high-throughput SELEX, and massively parallel reporter assays. Here, we review these experimental methods and how the data they produce can be used to quantitatively model sequence-function relationships. In doing so, we touch on a diverse range of topics, including the identification of clinically relevant genomic variants, the modeling of transcription factor binding to DNA, the functional and evolutionary landscapes of proteins, and cis-regulatory mechanisms in both transcription and mRNA splicing. We further describe a unified conceptual framework and a core set of mathematical modeling strategies that studies in these diverse areas can make use of. Finally, we highlight key aspects of experimental design and mathematical modeling that are important for the results of such studies to be interpretable and reproducible.


Subjects
Epistasis, Genetic; Genetic Association Studies; High-Throughput Nucleotide Sequencing/methods; Models, Genetic; SELEX Aptamer Technique/methods; DNA/genetics; DNA/metabolism; Genotype; Humans; Mutation; Phenotype; Protein Binding; RNA Splicing; Transcription Factors/genetics; Transcription Factors/metabolism; Transcription, Genetic
13.
PLoS Comput Biol ; 15(2): e1006226, 2019 02.
Article in English | MEDLINE | ID: mdl-30716072

ABSTRACT

Despite the central importance of transcriptional regulation in biology, it has proven difficult to determine the regulatory mechanisms of individual genes, let alone entire gene networks. It is particularly difficult to decipher the biophysical mechanisms of transcriptional regulation in living cells and determine the energetic properties of binding sites for transcription factors and RNA polymerase. In this work, we present a strategy for dissecting transcriptional regulatory sequences using in vivo methods (massively parallel reporter assays) to formulate quantitative models that map a transcription factor binding site's DNA sequence to transcription factor-DNA binding energy. We use these models to predict the binding energies of transcription factor binding sites to within 1 $k_BT$ of their measured values. We further explore how such a sequence-energy mapping relates to the mechanisms of transcriptional regulation in various promoter contexts. Specifically, we show that our models can be used to design specific induction responses, analyze the effects of amino acid mutations on DNA sequence preference, and determine how regulatory context affects a transcription factor's sequence specificity.


Subjects
Binding Sites/genetics; Computational Biology/methods; Sequence Analysis, DNA/methods; Chromosome Mapping; DNA/chemistry; Gene Expression Regulation/genetics; Gene Regulatory Networks; Models, Molecular; Promoter Regions, Genetic/genetics; Protein Binding; Transcription Factors/chemistry; Transcription Factors/metabolism; Transcription, Genetic/physiology
14.
Cell Syst ; 8(1): 86-93.e3, 2019 01 23.
Article in English | MEDLINE | ID: mdl-30611676

ABSTRACT

Epistasis is the phenomenon by which the effect of a mutation depends on its genetic background. While it is usually defined in terms of organismal fitness, for single proteins, it must reflect physical interactions among residues. Here, we systematically extract the specific contribution pairwise epistasis makes to the physical affinity of antibody-antigen binding relevant to affinity maturation, a process of accelerated Darwinian evolution. We find that, among competing definitions of affinity, the binding free energy is the most appropriate to describe epistasis. We show that epistasis is pervasive, accounting for 25%-35% of variability, of which a large fraction is beneficial. This work suggests that epistasis both constrains, through negative epistasis, and enlarges, through positive epistasis, the set of possible evolutionary paths that can produce high-affinity sequences during repeated rounds of mutation and selection.
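
Pairwise epistasis on the free-energy scale can be sketched as the deviation of a double mutant from the sum of its single-mutant effects. The function name and the numbers below are hypothetical and are not data from the study; whether a positive value counts as beneficial depends on the sign convention chosen for the binding free energy.

```python
def pairwise_epistasis(dg_wt, dg_a, dg_b, dg_ab):
    """Double-mutant effect minus the sum of the two single-mutant effects.

    All arguments are binding free energies in the same units (e.g. kcal/mol).
    """
    ddg_a = dg_a - dg_wt    # effect of mutation A alone
    ddg_b = dg_b - dg_wt    # effect of mutation B alone
    ddg_ab = dg_ab - dg_wt  # effect of both mutations together
    return ddg_ab - (ddg_a + ddg_b)

# Hypothetical values: perfectly additive mutations give zero epistasis.
print(pairwise_epistasis(-10.0, -9.0, -9.5, -8.5))
```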


Subjects
Antibodies/metabolism; Antigens/metabolism; Epistasis, Genetic/genetics; Biological Evolution; Humans
15.
Elife ; 7, 2018 12 20.
Article in English | MEDLINE | ID: mdl-30570483

ABSTRACT

Gene expression in all organisms is controlled by cooperative interactions between DNA-bound transcription factors (TFs), but quantitatively measuring TF-DNA and TF-TF interactions remains difficult. Here we introduce a strategy for precisely measuring the Gibbs free energy of such interactions in living cells. This strategy centers on the measurement and modeling of 'allelic manifolds', a multidimensional generalization of the classical genetics concept of allelic series. Allelic manifolds are measured using reporter assays performed on strategically designed cis-regulatory sequences. Quantitative biophysical models are then fit to the resulting data. We used this strategy to study regulation by two Escherichia coli TFs, CRP and [Formula: see text] RNA polymerase. Doing so, we consistently obtained energetic measurements precise to [Formula: see text] kcal/mol. We also obtained multiple results that deviate from the prior literature. Our strategy is compatible with massively parallel reporter assays in both prokaryotes and eukaryotes, and should therefore be highly scalable and broadly applicable. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that minor issues remain unresolved (see decision letter).


Subjects
Cyclic AMP Receptor Protein/genetics; DNA, Bacterial/genetics; DNA-Directed RNA Polymerases/genetics; Escherichia coli Proteins/genetics; Escherichia coli/genetics; Gene Expression Regulation, Bacterial; Models, Statistical; Sigma Factor/genetics; Alleles; Binding Sites; Biological Assay; Cyclic AMP Receptor Protein/metabolism; DNA, Bacterial/metabolism; DNA-Directed RNA Polymerases/metabolism; Escherichia coli/metabolism; Escherichia coli Proteins/metabolism; Genes, Reporter; Kinetics; Protein Binding; Sigma Factor/metabolism; Thermodynamics; beta-Galactosidase/genetics; beta-Galactosidase/metabolism
16.
Cancer Cell ; 34(6): 970-981.e8, 2018 12 10.
Article in English | MEDLINE | ID: mdl-30503706

ABSTRACT

The Mixed Lineage Leukemia gene (MLL) is altered in leukemia by chromosomal translocations to produce oncoproteins composed of the MLL N-terminus fused to the C-terminus of a partner protein. Here, we used domain-focused CRISPR screening to identify ZFP64 as an essential transcription factor in MLL-rearranged leukemia. We show that the critical function of ZFP64 in leukemia is to maintain MLL expression via binding to the MLL promoter, which is the most enriched location of ZFP64 occupancy in the human genome. The specificity of ZFP64 for MLL is accounted for by an exceptional density of ZFP64 motifs embedded within the MLL promoter. These findings demonstrate how a sequence anomaly of an oncogene promoter can impose a transcriptional addiction in cancer.


Subjects
DNA-Binding Proteins/genetics; Leukemia, Biphenotypic, Acute/genetics; Myeloid-Lymphoid Leukemia Protein/genetics; Promoter Regions, Genetic/genetics; Transcription Factors/genetics; Translocation, Genetic; A549 Cells; Animals; Cell Line, Tumor; DNA-Binding Proteins/metabolism; Gene Expression Regulation, Leukemic; HEK293 Cells; High-Throughput Nucleotide Sequencing; Humans; K562 Cells; Leukemia, Biphenotypic, Acute/metabolism; Leukemia, Biphenotypic, Acute/pathology; Mice, Inbred NOD; Mice, Knockout; Mice, SCID; Myeloid-Lymphoid Leukemia Protein/metabolism; Oncogene Proteins, Fusion/genetics; Oncogene Proteins, Fusion/metabolism; THP-1 Cells; Transcription Factors/metabolism; Transplantation, Heterologous
17.
Phys Rev Lett ; 121(16): 160605, 2018 Oct 19.
Article in English | MEDLINE | ID: mdl-30387642

ABSTRACT

How might a smooth probability distribution be estimated with accurately quantified uncertainty from a limited amount of sampled data? Here we describe a field-theoretic approach that addresses this problem remarkably well in one dimension, providing an exact nonparametric Bayesian posterior without relying on tunable parameters or large-data approximations. Strong non-Gaussian constraints, which require a nonperturbative treatment, are found to play a major role in reducing distribution uncertainty. A software implementation of this method is provided.

18.
Mol Cell ; 71(6): 1012-1026.e3, 2018 09 20.
Article in English | MEDLINE | ID: mdl-30174293

ABSTRACT

Pre-mRNA splicing is an essential step in the expression of most human genes. Mutations at the 5' splice site (5'ss) frequently cause defective splicing and disease due to interference with the initial recognition of the exon-intron boundary by U1 small nuclear ribonucleoprotein (snRNP), a component of the spliceosome. Here, we use a massively parallel splicing assay (MPSA) in human cells to quantify the activity of all 32,768 unique 5'ss sequences (NNN/GYNNNN) in three different gene contexts. Our results reveal that although splicing efficiency is mostly governed by the 5'ss sequence, there are substantial differences in this efficiency across gene contexts. Among other uses, these MPSA measurements facilitate the prediction of 5'ss sequence variants that are likely to cause aberrant splicing. This approach provides a framework to assess potential pathogenic variants in the human genome and streamline the development of splicing-corrective therapies.
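
The count of 32,768 follows from the NNN/GYNNNN pattern: seven unconstrained positions ($4^7$) times two choices for Y. The short sketch below enumerates these sequences from IUPAC codes. It writes sequences as DNA (T rather than U), which is a presentation assumption, and the function name `enumerate_5ss` is hypothetical.

```python
import itertools

# IUPAC codes needed for this pattern: N = any base, Y = pyrimidine.
IUPAC = {"N": "ACGT", "Y": "CT", "G": "G"}

def enumerate_5ss(pattern="NNN/GYNNNN"):
    """Yield every sequence matching the pattern; '/' marks the exon-intron boundary."""
    choices = [IUPAC[c] for c in pattern if c != "/"]
    for combo in itertools.product(*choices):
        yield "".join(combo)

sequences = list(enumerate_5ss())
print(len(sequences))  # 4**7 * 2 = 32768
```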


Subjects
Alternative Splicing/genetics; RNA Splice Sites/genetics; RNA Splice Sites/physiology; Alternative Splicing/physiology; Carrier Proteins/genetics; Conserved Sequence/genetics; Exons; Genes, BRCA2; HeLa Cells; Humans; Introns; Mutation; RNA Splicing/genetics; RNA Splicing/physiology; RNA, Small Nuclear/physiology; Ribonucleoprotein, U1 Small Nuclear/physiology; Spliceosomes; Survival of Motor Neuron 1 Protein/genetics; Transcriptional Elongation Factors
19.
Genes (Basel) ; 9(9), 2018 Aug 21.
Article in English | MEDLINE | ID: mdl-30134605

ABSTRACT

A now classical argument for the marginal thermodynamic stability of proteins explains the distribution of observed protein stabilities as a consequence of an entropic pull in protein sequence space. In particular, most sequences that are sufficiently stable to fold will have stabilities near the folding threshold. Here, we extend this argument to consider its predictions for epistatic interactions for the effects of mutations on the free energy of folding. Although there is abundant evidence to indicate that the effects of mutations on the free energy of folding are nearly additive and conserved over evolutionary time, we show that these observations are compatible with the hypothesis that a non-additive contribution to the folding free energy is essential for observed proteins to maintain their native structure. In particular, through both simulations and analytical results, we show that even very small departures from additivity are sufficient to drive this effect.

20.
Proc Natl Acad Sci U S A ; 115(21): E4796-E4805, 2018 05 22.
Article in English | MEDLINE | ID: mdl-29728462

ABSTRACT

Gene regulation is one of the most ubiquitous processes in biology. However, while the catalog of bacterial genomes continues to expand rapidly, we remain ignorant about how almost all of the genes in these genomes are regulated. At present, characterizing the molecular mechanisms by which individual regulatory sequences operate requires focused efforts using low-throughput methods. Here, we take a first step toward multipromoter dissection and show how a combination of massively parallel reporter assays, mass spectrometry, and information-theoretic modeling can be used to dissect multiple bacterial promoters in a systematic way. We show this approach on both well-studied and previously uncharacterized promoters in the enteric bacterium Escherichia coli. In all cases, we recover nucleotide-resolution models of promoter mechanism. For some promoters, including previously unannotated ones, the approach allowed us to further extract quantitative biophysical models describing input-output relationships. Given the generality of the approach presented here, it opens up the possibility of quantitatively dissecting the mechanisms of promoter function in E. coli and a wide range of other bacteria.


Subjects
Escherichia coli Proteins/metabolism; Escherichia coli/genetics; Gene Expression Regulation, Bacterial; Genome, Bacterial; Green Fluorescent Proteins/metabolism; Promoter Regions, Genetic; Escherichia coli/growth & development; Escherichia coli/metabolism; Escherichia coli Proteins/genetics; Transcriptional Activation