1 - 14 of 14
1.
Nature ; 626(7999): 643-652, 2024 Feb.
Article En | MEDLINE | ID: mdl-38109937

Thousands of proteins have been validated genetically as therapeutic targets for human diseases1. However, very few have been successfully targeted, and many are considered 'undruggable'. This is particularly true for proteins that function via protein-protein interactions: direct inhibition of binding interfaces is difficult and requires the identification of allosteric sites. However, most proteins have no known allosteric sites, and a comprehensive allosteric map does not exist for any protein. Here we address this shortcoming by charting multiple global atlases of inhibitory allosteric communication in KRAS. We quantified the effects of more than 26,000 mutations on the folding of KRAS and its binding to six interaction partners. Genetic interactions in double mutants enabled us to perform biophysical measurements at scale, inferring more than 22,000 causal free energy changes. These energy landscapes quantify how mutations tune the binding specificity of a signalling protein and map the inhibitory allosteric sites for an important therapeutic target. Allosteric propagation is particularly effective across the central β-sheet of KRAS, and multiple surface pockets are genetically validated as allosterically active, including a distal pocket in the C-terminal lobe of the protein. Allosteric mutations typically inhibit binding to all tested effectors, but they can also change the binding specificity, revealing the regulatory, evolutionary and therapeutic potential to tune pathway activation. Using the approach described here, it should be possible to rapidly and comprehensively identify allosteric target sites in many proteins.


Allosteric Site , Protein Folding , Proto-Oncogene Proteins p21(ras) , Humans , Allosteric Regulation/drug effects , Allosteric Regulation/genetics , Allosteric Site/drug effects , Allosteric Site/genetics , Mutation , Protein Binding , Proto-Oncogene Proteins p21(ras)/antagonists & inhibitors , Proto-Oncogene Proteins p21(ras)/chemistry , Proto-Oncogene Proteins p21(ras)/genetics , Proto-Oncogene Proteins p21(ras)/metabolism , Reproducibility of Results , Substrate Specificity/drug effects , Substrate Specificity/genetics , Thermodynamics
2.
Nature ; 604(7904): 175-183, 2022 04.
Article En | MEDLINE | ID: mdl-35388192

Allosteric communication between distant sites in proteins is central to biological regulation but still poorly characterized, limiting understanding, engineering and drug development1-6. An important reason for this is the lack of methods to comprehensively quantify allostery in diverse proteins. Here we address this shortcoming and present a method that uses deep mutational scanning to globally map allostery. The approach uses an efficient experimental design to infer en masse the causal biophysical effects of mutations by quantifying multiple molecular phenotypes (here we examine binding and protein abundance) in multiple genetic backgrounds and fitting thermodynamic models using neural networks. We apply the approach to two of the most common protein interaction domains found in humans, an SH3 domain and a PDZ domain, to produce comprehensive atlases of allosteric communication. Allosteric mutations are abundant, with a large mutational target space of network-altering 'edgetic' variants. Mutations are more likely to be allosteric closer to binding interfaces, at glycine residues and at specific residues connecting to an opposite surface within the PDZ domain. This general approach of quantifying mutational effects for multiple molecular phenotypes and in multiple genetic backgrounds should enable the energetic and allosteric landscapes of many proteins to be rapidly and comprehensively mapped.


Allosteric Site , PDZ Domains , Proteins , Allosteric Regulation/genetics , PDZ Domains/genetics , Protein Binding/genetics , Proteins/chemistry , Thermodynamics
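
To make the idea of "fitting thermodynamic models" to binding and abundance phenotypes more concrete, the following is a minimal sketch of a three-state equilibrium (unfolded, folded, folded and bound). The abstract does not give the model equations, so the functional forms, the function names, and the assumed temperature of 30 °C are illustrative assumptions rather than the authors' implementation. The point it shows is that a mutation can reduce the bound fraction either by destabilising the fold or by weakening binding, which is why measuring both phenotypes helps separate the two effects.

```python
import math

# Assumption: gas constant in kcal/(mol*K) and an illustrative temperature of 30 °C.
RT = 0.001987 * 303.0


def fraction_folded(dg_fold):
    """Two-state folding: fraction of molecules folded for a folding free energy dg_fold (kcal/mol)."""
    return 1.0 / (1.0 + math.exp(dg_fold / RT))


def fraction_bound(dg_fold, dg_bind):
    """Three-state model (unfolded, folded, folded and bound): fraction of molecules bound."""
    return 1.0 / (1.0 + math.exp(dg_bind / RT) * (1.0 + math.exp(dg_fold / RT)))


if __name__ == "__main__":
    # Hypothetical wild type and a hypothetical mutant that destabilises folding by 1 kcal/mol.
    wild_type = fraction_bound(dg_fold=-1.0, dg_bind=-1.0)
    destabilised = fraction_bound(dg_fold=0.0, dg_bind=-1.0)
    print(f"wild type bound fraction: {wild_type:.3f}")
    print(f"destabilised mutant bound fraction: {destabilised:.3f}")
```

In this sketch a change in `dg_fold` alters both `fraction_folded` and `fraction_bound`, whereas a change in `dg_bind` alters only `fraction_bound`; a mutation far from the interface that changes `dg_bind` would be a candidate allosteric mutation under these assumptions.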
3.
Elife ; 10, 2021 02 01.
Article En | MEDLINE | ID: mdl-33522485

Plaques of the amyloid beta (Aβ) peptide are a pathological hallmark of Alzheimer's disease (AD), the most common form of dementia. Mutations in Aβ also cause familial forms of AD (fAD). Here, we use deep mutational scanning to quantify the effects of >14,000 mutations on the aggregation of Aβ. The resulting genetic landscape reveals mechanistic insights into fibril nucleation, including the importance of charge and gatekeeper residues in the disordered region outside of the amyloid core in preventing nucleation. Strikingly, unlike computational predictors and previous measurements, the empirical nucleation scores accurately identify all known dominant fAD mutations in Aβ, genetically validating that the mechanism of nucleation in a cell-based assay is likely to be very similar to the mechanism that causes the human disease. These results provide the first comprehensive atlas of how mutations alter the formation of any amyloid fibril and a resource for the interpretation of genetic variation in Aβ.


Alzheimer's disease is the most common form of dementia, affecting more than 50 million people worldwide. Despite more than 400 clinical trials, there are still no effective drugs that can prevent or treat the disease. A common target in Alzheimer's disease trials is a small protein called amyloid beta. Amyloid beta proteins are 'sticky' molecules. In the brains of people with Alzheimer's disease, they join to form first small aggregates and then long chains called fibrils, a process which is toxic to neurons. Specific mutations in the gene for amyloid beta are known to cause rare, aggressive forms of Alzheimer's disease that typically affect people in their fifties or sixties. But these are not the only mutations that can occur in amyloid beta. In principle, any part of the protein could undergo mutation. And given the size of the human population, it is likely that each of these mutations exists in someone alive today. Seuma et al. reasoned that studying these mutations could help us understand the process by which amyloid beta forms new aggregates. Using an approach called deep mutational scanning, Seuma et al. mutated each point in the protein, one at a time. This produced more than 14,000 different versions of amyloid beta. Seuma et al. then measured how quickly these mutants were able to form aggregates by introducing them into yeast cells. All the mutations known to cause early-onset Alzheimer's disease accelerated amyloid beta aggregation in the yeast. But the results also revealed previously unknown properties that control how fast aggregation occurs. In addition, they highlighted a number of positions in the amyloid beta sequence that act as 'gatekeepers'. In healthy brains, these gatekeepers prevent amyloid beta proteins from sticking together. When mutated, they drive the protein to form aggregates. 
This comprehensive dataset will help researchers understand how proteins form toxic aggregates, which could in turn help them find ways to prevent this from happening. By providing an 'atlas' of all possible amyloid beta mutations, the dataset will also help clinicians interpret any new mutations they encounter in patients. By showing whether or not a mutation speeds up aggregation, the atlas will help clinicians predict whether that mutation increases the risk of Alzheimer's disease.


Alzheimer Disease/genetics , Amyloid beta-Peptides/genetics , Amyloid/metabolism , Mutation , DNA Mutational Analysis , High-Throughput Nucleotide Sequencing , Plasmids , Saccharomyces cerevisiae/metabolism
4.
PLoS Genet ; 17(2): e1009353, 2021 02.
Article En | MEDLINE | ID: mdl-33524037

RNA structures are dynamic. As a consequence, mutational effects can be hard to rationalize with reference to a single static native structure. We reasoned that deep mutational scanning experiments, which couple molecular function to fitness, should capture mutational effects across multiple conformational states simultaneously. Here, we provide a proof-of-principle that this is indeed the case, using the self-splicing group I intron from Tetrahymena thermophila as a model system. We comprehensively mutagenized two 4-bp segments of the intron. These segments first come together to form the P1 extension (P1ex) helix at the 5' splice site. Following cleavage at the 5' splice site, the two halves of the helix dissociate to allow formation of an alternative helix (P10) at the 3' splice site. Using an in vivo reporter system that couples splicing activity to fitness in E. coli, we demonstrate that fitness is driven jointly by constraints on P1ex and P10 formation. We further show that patterns of epistasis can be used to infer the presence of intramolecular pleiotropy. Using a machine learning approach that allows quantification of mutational effects in a genotype-specific manner, we demonstrate that the fitness landscape can be deconvoluted to implicate P1ex or P10 as the effective genetic background in which molecular fitness is compromised or enhanced. Our results highlight deep mutational scanning as a tool to study alternative conformational states, with the capacity to provide critical insights into the structure, evolution and evolvability of RNAs as dynamic ensembles. Our findings also suggest that, in the future, deep mutational scanning approaches might help reverse-engineer multiple alternative or successive conformations from a single fitness landscape.


Introns/genetics , Mutation , RNA Splicing , RNA, Protozoan/genetics , RNA/genetics , Tetrahymena thermophila/genetics , Base Sequence , Evolution, Molecular , Genetic Fitness , Genetic Pleiotropy , Genotype , Kinetics , Machine Learning , Nucleic Acid Conformation , RNA/chemistry , RNA Splice Sites/genetics
5.
Genome Biol ; 21(1): 207, 2020 08 17.
Article En | MEDLINE | ID: mdl-32799905

Deep mutational scanning (DMS) enables multiplexed measurement of the effects of thousands of variants of proteins, RNAs, and regulatory elements. Here, we present a customizable pipeline, DiMSum, that represents an end-to-end solution for obtaining variant fitness and error estimates from raw sequencing data. A key innovation of DiMSum is the use of an interpretable error model that captures the main sources of variability arising in DMS workflows, outperforming previous methods. DiMSum is available as an R/Bioconda package and provides summary reports to help researchers diagnose common DMS pathologies and take remedial steps in their analyses.


DNA Mutational Analysis/methods , Molecular Diagnostic Techniques/methods , Mutation , Computational Biology , High-Throughput Nucleotide Sequencing/methods , Models, Genetic , Polymerase Chain Reaction , Proteins/genetics , Software
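
As a rough illustration of what "variant fitness and error estimates from raw sequencing data" can mean in a DMS workflow, the sketch below computes a log-ratio fitness score relative to wild type from input and output read counts, with a purely count-based (Poisson) standard error. This is a generic textbook-style calculation under stated assumptions, not DiMSum's error model, which the abstract describes only as capturing the main sources of variability; the function name and arguments are hypothetical.

```python
import math


def fitness_and_error(in_var, out_var, in_wt, out_wt):
    """Return (fitness, sigma) for one variant from read counts before and after selection.

    fitness: natural log of the variant's output/input ratio relative to wild type.
    sigma:   standard error assuming independent Poisson-distributed counts
             (delta-method approximation; ignores replicate and other error sources).
    """
    if min(in_var, out_var, in_wt, out_wt) <= 0:
        raise ValueError("all counts must be positive")
    fitness = math.log(out_var / in_var) - math.log(out_wt / in_wt)
    sigma = math.sqrt(1 / in_var + 1 / out_var + 1 / in_wt + 1 / out_wt)
    return fitness, sigma


if __name__ == "__main__":
    # Hypothetical counts: the variant halves in frequency while wild type is unchanged.
    f, s = fitness_and_error(in_var=200, out_var=100, in_wt=10000, out_wt=10000)
    print(f"fitness = {f:.3f} +/- {s:.3f}")
```

Because the error term shrinks as counts grow, low-coverage variants receive wide uncertainty under this sketch, which is one reason count-based error alone tends to be treated as a lower bound on the real variability.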
6.
Nat Commun ; 10(1): 4162, 2019 09 13.
Article En | MEDLINE | ID: mdl-31519910

Insoluble protein aggregates are the hallmarks of many neurodegenerative diseases. For example, aggregates of TDP-43 occur in nearly all cases of amyotrophic lateral sclerosis (ALS). However, whether aggregates cause cellular toxicity is still not clear, even in simpler cellular systems. We reasoned that deep mutagenesis might be a powerful approach to disentangle the relationship between aggregation and toxicity. We generated >50,000 mutations in the prion-like domain (PRD) of TDP-43 and quantified their toxicity in yeast cells. Surprisingly, mutations that increase hydrophobicity and aggregation strongly decrease toxicity. In contrast, toxic variants promote the formation of dynamic liquid-like condensates. Mutations have their strongest effects in a hotspot that genetic interactions reveal to be structured in vivo, illustrating how mutagenesis can probe the in vivo structures of unstructured proteins. Our results show that aggregation of TDP-43 is not harmful but protects cells, most likely by titrating the protein away from a toxic liquid-like phase.


Computational Biology/methods , Genomics/methods , Systems Biology/methods , Amyotrophic Lateral Sclerosis/genetics , Amyotrophic Lateral Sclerosis/metabolism , Humans , Hydrophobic and Hydrophilic Interactions , Mutation/genetics , Prions/genetics , Prions/metabolism
7.
Cell Syst ; 5(5): 471-484.e4, 2017 11 22.
Article En | MEDLINE | ID: mdl-29102610

Isogenic cells in a common environment show substantial cell-to-cell variation in gene expression, often referred to as "expression noise." Here, we use multiple single-cell RNA-sequencing datasets to identify features associated with high or low expression noise in mouse embryonic stem cells. These include the core promoter architecture of a gene, with CpG island promoters and a TATA box associated with low and high noise, respectively. High noise is also associated with "conflicting" chromatin states: the absence of transcription-associated histone modifications or the presence of repressive ones in active genes. Genes regulated by pluripotency factors through super-enhancers show high and correlated expression variability, consistent with fluctuations in the pluripotent state. Together, our results provide an integrated view of how core promoters, chromatin, regulation, and pluripotency fluctuations contribute to the variability of gene expression across individual stem cells.


Embryonic Stem Cells/physiology , Gene Expression/genetics , Animals , Chromatin/genetics , CpG Islands/genetics , Histone Code/genetics , Histones/genetics , Mice , Pluripotent Stem Cells/physiology , Promoter Regions, Genetic/genetics , Transcription, Genetic/genetics
8.
Nature ; 544(7648): 59-64, 2017 04 06.
Article En | MEDLINE | ID: mdl-28289288

The folding of genomic DNA from the beads-on-a-string-like structure of nucleosomes into higher-order assemblies is crucially linked to nuclear processes. Here we calculate 3D structures of entire mammalian genomes using data from a new chromosome conformation capture procedure that allows us to first image and then process single cells. The technique enables genome folding to be examined at a scale of less than 100 kb, and chromosome structures to be validated. The structures of individual topological-associated domains and loops vary substantially from cell to cell. By contrast, A and B compartments, lamina-associated domains and active enhancers and promoters are organized in a consistent way on a genome-wide basis in every cell, suggesting that they could drive chromosome and genome folding. By studying genes regulated by pluripotency factor and nucleosome remodelling deacetylase (NuRD), we illustrate how the determination of single-cell genome structure provides a new approach for investigating biological processes.


Chromatin Assembly and Disassembly , Genome , Molecular Imaging/methods , Nucleosomes/chemistry , Single-Cell Analysis/methods , Animals , CCCTC-Binding Factor , Cell Cycle Proteins/metabolism , Chromatin Assembly and Disassembly/genetics , Chromosomal Proteins, Non-Histone/metabolism , Chromosomes, Mammalian/chemistry , Chromosomes, Mammalian/genetics , Chromosomes, Mammalian/metabolism , DNA/chemistry , DNA/genetics , DNA/metabolism , Enhancer Elements, Genetic , G1 Phase , Gene Expression Regulation , Gene Regulatory Networks , Genome/genetics , Haploidy , Mi-2 Nucleosome Remodeling and Deacetylase Complex/metabolism , Mice , Models, Molecular , Molecular Conformation , Molecular Imaging/standards , Mouse Embryonic Stem Cells/cytology , Mouse Embryonic Stem Cells/metabolism , Nucleosomes/genetics , Nucleosomes/metabolism , Promoter Regions, Genetic , Repressor Proteins/metabolism , Reproducibility of Results , Single-Cell Analysis/standards , Cohesins
9.
Genome Res ; 25(4): 504-13, 2015 Apr.
Article En | MEDLINE | ID: mdl-25677180

In addition to mediating sister chromatid cohesion during the cell cycle, the cohesin complex associates with CTCF and with active gene regulatory elements to form long-range interactions between its binding sites. Genome-wide chromosome conformation capture has shown that cohesin's main role in interphase genome organization is in mediating interactions within architectural chromosome compartments, rather than specifying compartments per se. However, it remains unclear how cohesin-mediated interactions contribute to the regulation of gene expression. We have found that the binding of CTCF and cohesin is highly enriched at enhancers and in particular at enhancer arrays or "super-enhancers" in mouse thymocytes. Using local and global chromosome conformation capture, we demonstrate that enhancer elements associate not just in linear sequence, but also in 3D, and that spatial enhancer clustering is facilitated by cohesin. The conditional deletion of cohesin from noncycling thymocytes preserved enhancer position, H3K27ac, H3K4me1, and enhancer transcription, but weakened interactions between enhancers. Interestingly, ~50% of deregulated genes reside in the vicinity of enhancer elements, suggesting that cohesin regulates gene expression through spatial clustering of enhancer elements. We propose a model for cohesin-dependent gene regulation in which spatial clustering of enhancer elements acts as a unified mechanism for both enhancer-promoter "connections" and "insulation."


Cell Cycle Proteins/genetics , Chromosomal Proteins, Non-Histone/genetics , Enhancer Elements, Genetic/genetics , Gene Expression Regulation/genetics , Multigene Family/genetics , Repressor Proteins/metabolism , Thymocytes/cytology , Animals , Binding Sites/genetics , CCCTC-Binding Factor , Cells, Cultured , Histones/genetics , Mice , Promoter Regions, Genetic/genetics , Protein Binding/genetics , Cohesins
10.
Elife ; 3: e02626, 2014 Oct 03.
Article En | MEDLINE | ID: mdl-25279814

As exome sequencing gives way to genome sequencing, the need to interpret the function of regulatory DNA becomes increasingly important. To test whether evolutionary conservation of cis-regulatory modules (CRMs) gives insight into human gene regulation, we determined transcription factor (TF) binding locations of four liver-essential TFs in liver tissue from human, macaque, mouse, rat, and dog. Approximately two thirds of the TF-bound regions fell into CRMs. Less than half of the human CRMs were found as a CRM in the orthologous region of a second species. Shared CRMs were associated with liver pathways and disease loci identified by genome-wide association studies. Recurrent rare human disease-causing mutations at the promoters of several blood coagulation and lipid metabolism genes were also identified within CRMs shared in multiple species. This suggests that multi-species analyses of experimentally determined combinatorial TF binding will help identify genomic regions critical for tissue-specific gene control.


Liver/metabolism , Mammals/metabolism , Signal Transduction , Transcription Factors/metabolism , Animals , Blood Coagulation/genetics , Chromatin Immunoprecipitation , Gene Regulatory Networks , Genome-Wide Association Study , Genomics , Humans , Lipid Metabolism/genetics , Male , Molecular Sequence Annotation , Organ Specificity , Phylogeny , Polymorphism, Single Nucleotide/genetics , Protein Binding , Regulatory Sequences, Nucleic Acid/genetics , Species Specificity
11.
Genome Res ; 23(12): 2066-77, 2013 Dec.
Article En | MEDLINE | ID: mdl-24002784

Chromosome conformation capture approaches have shown that interphase chromatin is partitioned into spatially segregated Mb-sized compartments and sub-Mb-sized topological domains. This compartmentalization is thought to facilitate the matching of genes and regulatory elements, but its precise function and mechanistic basis remain unknown. Cohesin controls chromosome topology to enable DNA repair and chromosome segregation in cycling cells. In addition, cohesin associates with active enhancers and promoters and with CTCF to form long-range interactions important for gene regulation. Although these findings suggest an important role for cohesin in genome organization, this role has not been assessed on a global scale. Unexpectedly, we find that architectural compartments are maintained in noncycling mouse thymocytes after genetic depletion of cohesin in vivo. Cohesin was, however, required for specific long-range interactions within compartments where cohesin-regulated genes reside. Cohesin depletion diminished interactions between cohesin-bound sites, whereas alternative interactions between chromatin features associated with transcriptional activation and repression became more prominent, with corresponding changes in gene expression. Our findings indicate that cohesin-mediated long-range interactions facilitate discrete gene expression states within preexisting chromosomal compartments.


Cell Cycle Proteins/physiology , Chromatin/genetics , Chromatin/metabolism , Chromosomal Proteins, Non-Histone/physiology , Gene Expression Regulation , Repressor Proteins/metabolism , Thymocytes/metabolism , Animals , CCCTC-Binding Factor , Cell Cycle/genetics , Chromosomes, Mammalian , DNA-Binding Proteins , Gene Dosage , Genome , Linear Models , Mice , Nuclear Proteins/metabolism , Phosphoproteins/metabolism , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , Transcription Factors/metabolism , Cohesins
12.
Genome Biol ; 14(12): R148, 2013 Dec 31.
Article En | MEDLINE | ID: mdl-24380390

BACKGROUND: The genomic binding of CTCF is highly conserved across mammals, but the mechanisms that underlie its stability are poorly understood. One transcription factor known to functionally interact with CTCF in the context of X-chromosome inactivation is the ubiquitously expressed YY1. Because combinatorial transcription factor binding can contribute to the evolutionary stabilization of regulatory regions, we tested whether YY1 and CTCF co-binding could in part account for conservation of CTCF binding. RESULTS: Combined analysis of CTCF and YY1 binding in lymphoblastoid cell lines from seven primates, as well as in mouse and human livers, reveals extensive genome-wide co-localization specifically at evolutionarily stable CTCF-bound regions. CTCF-YY1 co-bound regions resemble regions bound by YY1 alone, as they enrich for active histone marks, RNA polymerase II and transcription factor binding. Although these highly conserved, transcriptionally active CTCF-YY1 co-bound regions are often promoter-proximal, gene-distal regions show similar molecular features. CONCLUSIONS: Our results reveal that these two ubiquitously expressed, multi-functional zinc-finger proteins collaborate in functionally active regions to stabilize one another's genome-wide binding across primate evolution.


Evolution, Molecular , Primates/genetics , Repressor Proteins/metabolism , YY1 Transcription Factor/metabolism , Animals , CCCTC-Binding Factor , Cell Line , Genome , Humans , Mice , Repressor Proteins/chemistry
13.
Genome Res ; 22(11): 2163-75, 2012 Nov.
Article En | MEDLINE | ID: mdl-22780989

The cohesin protein complex contributes to transcriptional regulation in a CTCF-independent manner by colocalizing with master regulators at tissue-specific loci. The regulation of transcription involves the concerted action of multiple transcription factors (TFs) and cohesin's role in this context of combinatorial TF binding remains unexplored. To investigate cohesin-non-CTCF (CNC) binding events in vivo we mapped cohesin and CTCF, as well as a collection of tissue-specific and ubiquitous transcriptional regulators using ChIP-seq in primary mouse liver. We observe a positive correlation between the number of distinct TFs bound and the presence of CNC sites. In contrast to regions of the genome where cohesin and CTCF colocalize, CNC sites coincide with the binding of master regulators and enhancer-markers and are significantly associated with liver-specific expressed genes. We also show that cohesin presence partially explains the commonly observed discrepancy between TF motif score and ChIP signal. Evidence from these statistical analyses in wild-type cells, and comparisons to maps of TF binding in Rad21-cohesin haploinsufficient mouse liver, suggests that cohesin helps to stabilize large protein-DNA complexes. Finally, we observe that the presence of mirrored CTCF binding events at promoters and their nearby cohesin-bound enhancers is associated with elevated expression levels.


Cell Cycle Proteins/metabolism , Chromosomal Proteins, Non-Histone/metabolism , Gene Regulatory Networks , Transcription, Genetic , Animals , CCCTC-Binding Factor , Chromatin Immunoprecipitation , DNA-Binding Proteins , Genome , Haploinsufficiency , Mice , Mice, Inbred C57BL , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Organ Specificity , Phosphoproteins/genetics , Phosphoproteins/metabolism , Promoter Regions, Genetic , Protein Binding , Repressor Proteins/metabolism , Sequence Analysis, DNA , Transcription Factors/metabolism , Up-Regulation , Cohesins
14.
BMC Bioinformatics ; 12: 29, 2011 Jan 24.
Article En | MEDLINE | ID: mdl-21261946

BACKGROUND: In order to interpret the results obtained from a microarray experiment, researchers often shift focus from analysis of individual differentially expressed genes to analyses of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then aim to rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research. RESULTS: We show that paralogs, which typically have high sequence identity and similar molecular functions, also exhibit high correlation in their expression patterns. We investigate this correlation as a potential confounding factor common to current GSA methods using Indygene (http://www.cbio.uct.ac.za/indygene), a web tool that reduces a supplied list of genes so that it includes no pairwise paralogy relationships above a specified sequence similarity threshold. We use the tool to reanalyse previously published microarray datasets and determine the potential utility of accounting for the presence of paralogs. CONCLUSIONS: The Indygene tool efficiently removes paralogy relationships from a given dataset and we found that such a reduction, performed prior to GSA, has the ability to generate significantly different results that often represent novel and plausible biological hypotheses. This was demonstrated for three different GSA approaches when applied to the reanalysis of previously published microarray datasets and suggests that the redundancy and non-independence of paralogs is an important consideration when dealing with GSA methodologies.


Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Algorithms , Arabidopsis Proteins/genetics , Models, Statistical , User-Computer Interface
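
The reduction step described in the abstract above (a gene list pruned until no pair exceeds a sequence similarity threshold) can be sketched with a simple greedy filter. The abstract does not say how Indygene chooses which member of a paralogous pair to keep, so the first-come ordering, the function name, and the similarity lookup used here are assumptions for illustration only, not the tool's algorithm.

```python
def remove_paralogs(genes, similarity, threshold):
    """Greedily keep genes so that no kept pair has similarity above `threshold`.

    genes:      list of gene identifiers, in the order of preference for keeping.
    similarity: dict mapping frozenset({gene_a, gene_b}) to a similarity score;
                pairs missing from the dict are treated as unrelated (score 0.0).
    """
    kept = []
    for gene in genes:
        scores = (similarity.get(frozenset((gene, other)), 0.0) for other in kept)
        if all(score <= threshold for score in scores):
            kept.append(gene)
    return kept


if __name__ == "__main__":
    # Hypothetical gene list in which A1 and A2 are close paralogs.
    genes = ["A1", "A2", "B1"]
    similarity = {frozenset(("A1", "A2")): 0.9, frozenset(("A1", "B1")): 0.2}
    print(remove_paralogs(genes, similarity, threshold=0.7))
```

A greedy pass like this guarantees the stated property (no remaining pair above the threshold) but does not necessarily keep the largest possible gene set, and its result depends on input order.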
...