Results 1 - 20 of 72
1.
Cell ; 183(7): 1742-1756, 2020 12 23.
Article in English | MEDLINE | ID: mdl-33357399

ABSTRACT

It is unclear how disease mutations impact intrinsically disordered protein regions (IDRs), which lack a stable folded structure. These mutations, while prevalent in disease, are frequently neglected or annotated as variants of unknown significance. Biomolecular phase separation, a physical process often mediated by IDRs, has increasingly appreciated roles in cellular organization and regulation. We find that autism spectrum disorder (ASD)- and cancer-associated proteins are enriched for predicted phase separation propensities, suggesting that IDR mutations disrupt phase separation in key cellular processes. More generally, we hypothesize that combinations of small-effect IDR mutations perturb phase separation, potentially contributing to "missing heritability" in complex disease susceptibility.


Subject(s)
Disease/genetics , Mutation/genetics , Chromatin/metabolism , Humans , Intrinsically Disordered Proteins/genetics , Models, Biological , Proteome/metabolism
2.
Cell ; 181(4): 818-831.e19, 2020 05 14.
Article in English | MEDLINE | ID: mdl-32359423

ABSTRACT

Cells sense elevated temperatures and mount an adaptive heat shock response that involves changes in gene expression, but the underlying mechanisms, particularly on the level of translation, remain unknown. Here we report that, in budding yeast, the essential translation initiation factor Ded1p undergoes heat-induced phase separation into gel-like condensates. Using ribosome profiling and an in vitro translation assay, we reveal that condensate formation inactivates Ded1p and represses translation of housekeeping mRNAs while promoting translation of stress mRNAs. Testing a variant of Ded1p with altered phase behavior as well as Ded1p homologs from diverse species, we demonstrate that Ded1p condensation is adaptive and fine-tuned to the maximum growth temperature of the respective organism. We conclude that Ded1p condensation is an integral part of an extended heat shock response that selectively represses translation of housekeeping mRNAs to promote survival under conditions of severe heat stress.


Subject(s)
DEAD-box RNA Helicases/metabolism , Gene Expression Regulation, Fungal/genetics , Protein Biosynthesis/genetics , Saccharomyces cerevisiae Proteins/metabolism , DEAD-box RNA Helicases/physiology , Gene Expression/genetics , Genes, Essential/genetics , Heat-Shock Proteins/metabolism , Heat-Shock Response/genetics , RNA, Messenger/metabolism , Ribosomes/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/physiology
3.
Proc Natl Acad Sci U S A ; 120(44): e2304302120, 2023 10 31.
Article in English | MEDLINE | ID: mdl-37878721

ABSTRACT

The AlphaFold Protein Structure Database contains predicted structures for millions of proteins. For the majority of human proteins that contain intrinsically disordered regions (IDRs), which do not adopt a stable structure, it is generally assumed that these regions have low AlphaFold2 confidence scores that reflect low-confidence structural predictions. Here, we show that AlphaFold2 assigns confident structures to nearly 15% of human IDRs. By comparison to experimental NMR data for a subset of IDRs that are known to conditionally fold (i.e., upon binding or under other specific conditions), we find that AlphaFold2 often predicts the structure of the conditionally folded state. Based on databases of IDRs that are known to conditionally fold, we estimate that AlphaFold2 can identify conditionally folding IDRs at a precision as high as 88% at a 10% false positive rate, which is remarkable considering that conditionally folded IDR structures were minimally represented in its training data. We find that human disease mutations are nearly fivefold enriched in conditionally folded IDRs over IDRs in general and that up to 80% of IDRs in prokaryotes are predicted to conditionally fold, compared to less than 20% of eukaryotic IDRs. These results indicate that a large majority of IDRs in the proteomes of human and other eukaryotes function in the absence of conditional folding, but the regions that do acquire folds are more sensitive to mutations. We emphasize that the AlphaFold2 predictions do not reveal functionally relevant structural plasticity within IDRs and cannot offer realistic ensemble representations of conditionally folded IDRs.


Subject(s)
Intrinsically Disordered Proteins , Humans , Intrinsically Disordered Proteins/chemistry , Eukaryota/metabolism , Protein Conformation
4.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38588559

ABSTRACT

MOTIVATION: Supervised deep learning is used to model the complex relationship between genomic sequence and regulatory function. Understanding how these models make predictions can provide biological insight into regulatory functions. Given the complexity of the sequence to regulatory function mapping (the cis-regulatory code), it has been suggested that the genome contains insufficient sequence variation to train models with suitable complexity. Data augmentation is a widely used approach to increase the data variation available for model training; however, current data augmentation methods for genomic sequence data are limited. RESULTS: Inspired by the success of comparative genomics, we show that augmenting genomic sequences with evolutionarily related sequences from other species, which we term phylogenetic augmentation, improves the performance of deep learning models trained on regulatory genomic sequences to predict high-throughput functional assay measurements. Additionally, we show that phylogenetic augmentation can rescue model performance when the training set is down-sampled and permits deep learning on a real-world small dataset, demonstrating that this approach improves data efficiency. Overall, this data augmentation method represents a solution for improving model performance that is applicable to many supervised deep-learning problems in genomics. AVAILABILITY AND IMPLEMENTATION: The open-source GitHub repository agduncan94/phylogenetic_augmentation_paper includes the code for rerunning the analyses here and recreating the figures.
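A minimal sketch of the augmentation idea described in this abstract, assuming each training example carries a reference sequence, its measured activity, and a list of orthologous sequences from other species, and assuming each ortholog reuses the reference label. The function name `phylogenetic_augment` and the tuple layout are hypothetical; the authors' own code is in the repository named above.

```python
import random


def phylogenetic_augment(examples, n_epochs, seed=0):
    """Yield (sequence, label) pairs, swapping in orthologs at random.

    examples: list of (reference_sequence, label, ortholog_sequences).
    Assumption: an ortholog inherits the label measured for its reference
    sequence, so each epoch can show the model a different evolutionary
    variant of the same regulatory element.
    """
    rng = random.Random(seed)
    for _ in range(n_epochs):
        for reference, label, orthologs in examples:
            # Draw uniformly from the reference plus all of its orthologs.
            sequence = rng.choice([reference] + orthologs)
            yield sequence, label


# Hypothetical toy data: the second element has no orthologs available.
examples = [
    ("ACGTACGT", 1.7, ["ACGTTCGT", "ACCTACGT"]),
    ("TTGACCAA", 0.2, []),
]
batch = list(phylogenetic_augment(examples, n_epochs=2))
```

Uniform sampling over the reference and its orthologs is only one plausible design; weighting the reference sequence more heavily would be an equally reasonable variant.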


Subject(s)
Deep Learning , Genomics , Phylogeny , Genomics/methods , Supervised Machine Learning , Humans
5.
Chem Rev ; 123(14): 9036-9064, 2023 07 26.
Article in English | MEDLINE | ID: mdl-36662637

ABSTRACT

Stress granules (SGs) are cytosolic biomolecular condensates that form in response to cellular stress. Weak, multivalent interactions between their protein and RNA constituents drive their rapid, dynamic assembly through phase separation coupled to percolation. Though a consensus model of SG function has yet to be determined, their perceived implication in cytoprotective processes (e.g., antiviral responses and inhibition of apoptosis) and possible role in the pathogenesis of various neurodegenerative diseases (e.g., amyotrophic lateral sclerosis and frontotemporal dementia) have drawn great interest. Consequently, new studies using numerous cell biological, genetic, and proteomic methods have been performed to unravel the mechanisms underlying SG formation, organization, and function and, with them, a more clearly defined SG proteome. Here, we provide a consensus SG proteome through literature curation and an update of the user-friendly database RNAgranuleDB to version 2.0 (http://rnagranuledb.lunenfeld.ca/). With this updated SG proteome, we use next-generation phase separation prediction tools to assess the predisposition of SG proteins for phase separation and aggregation. Next, we analyze the primary sequence features of intrinsically disordered regions (IDRs) within SG-resident proteins. Finally, we review the protein- and RNA-level determinants, including post-translational modifications (PTMs), that regulate SG composition and assembly/disassembly dynamics.


Subject(s)
Amyotrophic Lateral Sclerosis , Proteome , Humans , Proteomics , Stress Granules , Amyotrophic Lateral Sclerosis/pathology , RNA
6.
Genome Res ; 31(4): 564-575, 2021 04.
Article in English | MEDLINE | ID: mdl-33712417

ABSTRACT

Transcriptional enhancers are critical for development and phenotype evolution and are often mutated in disease contexts; however, even in well-studied cell types, the sequence code conferring enhancer activity remains unknown. To examine the enhancer regulatory code for pluripotent stem cells, we identified genomic regions with conserved binding of multiple transcription factors in mouse and human embryonic stem cells (ESCs). Examination of these regions revealed that they contain on average 12.6 conserved transcription factor binding site (TFBS) sequences. Enriched TFBSs are a diverse repertoire of 70 different sequences representing the binding sequences of both known and novel ESC regulators. Using a diverse set of TFBSs from this repertoire was sufficient to construct short synthetic enhancers with activity comparable to native enhancers. Site-directed mutagenesis of conserved TFBSs in endogenous enhancers or TFBS deletion from synthetic sequences revealed a requirement for 10 or more different TFBSs. Furthermore, specific TFBSs, including the POU5F1:SOX2 comotif, are dispensable, despite cobinding the POU5F1 (also known as OCT4), SOX2, and NANOG master regulators of pluripotency. These findings reveal that a TFBS sequence diversity threshold overrides the need for optimized regulatory grammar and individual TFBSs that recruit specific master regulators.


Subject(s)
Embryonic Stem Cells/metabolism , Enhancer Elements, Genetic , Transcription Factors/metabolism , Animals , Binding Sites , Humans , Mice , Pluripotent Stem Cells/metabolism
7.
PLoS Genet ; 17(9): e1009629, 2021 09.
Article in English | MEDLINE | ID: mdl-34506483

ABSTRACT

Stochastic signaling dynamics expand living cells' information processing capabilities. An increasing number of studies report that regulators encode information in their pulsatile dynamics. The evolutionary mechanisms that lead to complex signaling dynamics remain uncharacterized, perhaps because key interactions of signaling proteins are encoded in intrinsically disordered regions (IDRs), whose evolution is difficult to analyze. Here we focused on the IDR that controls the stochastic pulsing dynamics of Crz1, a transcription factor in fungi downstream of the widely conserved calcium signaling pathway. We find that Crz1 IDRs from anciently diverged fungi can all respond transiently to calcium stress; however, only Crz1 IDRs from the Saccharomyces clade support pulsatility, encode extra information, and rescue fitness in competition assays, while the Crz1 IDRs from distantly related fungi do none of the three. On the other hand, we find that Crz1 pulsing is conserved in the distantly related fungi, consistent with the evolutionary model of stabilizing selection on the signaling phenotype. Further, we show that a calcineurin docking site in a specific part of the IDRs appears to be sufficient for pulsing and show evidence for a beneficial increase in the relative calcineurin affinity of this docking site. We propose that evolutionary flexibility of functionally divergent IDRs underlies the conservation of stochastic signaling by stabilizing selection.


Subject(s)
Intrinsically Disordered Proteins/metabolism , Signal Transduction , Stochastic Processes , DNA-Binding Proteins/metabolism , Evolution, Molecular , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/metabolism
8.
PLoS Comput Biol ; 18(9): e1010452, 2022 09.
Article in English | MEDLINE | ID: mdl-36074804

ABSTRACT

Constraint-based modeling is a powerful framework for studying cellular metabolism, with applications ranging from predicting growth rates and optimizing production of high-value metabolites to identifying enzymes in pathogens that may be targeted for therapeutic interventions. Results from modeling experiments can be affected at least in part by the quality of the metabolic models used. Reconstructing a metabolic network manually can produce a high-quality metabolic model but is a time-consuming task. At the same time, current methods for automating the process typically transfer metabolic function based on sequence similarity, a process known to produce many false positives. We created Architect, a pipeline for automatic metabolic model reconstruction from protein sequences. First, it performs enzyme annotation through an ensemble approach, whereby a likelihood score is computed for an EC prediction based on predictions from existing tools; for this step, our method shows both increased precision and recall compared to individual tools. Next, Architect uses these annotations to construct a high-quality metabolic network which is then gap-filled based on likelihood scores from the ensemble approach. The resulting metabolic model is output in SBML format, suitable for constraint-based analyses. Through comparisons of enzyme annotations and curated metabolic models, we demonstrate improved performance of Architect over other state-of-the-art tools, notably with higher precision and recall on the eukaryote C. elegans and when compared to UniProt annotations in two bacterial species. Code for Architect is available at https://github.com/ParkinsonLab/Architect. For ease of use, Architect can be readily set up and utilized using its Docker image, maintained on Docker Hub.


Subject(s)
Caenorhabditis elegans , Metabolic Networks and Pathways , Animals , Bacteria , Molecular Sequence Annotation
9.
PLoS Comput Biol ; 18(6): e1010238, 2022 06.
Article in English | MEDLINE | ID: mdl-35767567

ABSTRACT

A major challenge to the characterization of intrinsically disordered regions (IDRs), which are widespread in the proteome, but relatively poorly understood, is the identification of molecular features that mediate functions of these regions, such as short motifs, amino acid repeats and physicochemical properties. Here, we introduce a proteome-scale feature discovery approach for IDRs. Our approach, which we call "reverse homology", exploits the principle that important functional features are conserved over evolution. We use this as a contrastive learning signal for deep learning: given a set of homologous IDRs, the neural network has to correctly choose a held-out homolog from another set of IDRs sampled randomly from the proteome. We pair reverse homology with a simple architecture and standard interpretation techniques, and show that the network learns conserved features of IDRs that can be interpreted as motifs, repeats, or bulk features like charge or amino acid propensities. We also show that our model can be used to produce visualizations of what residues and regions are most important to IDR function, generating hypotheses for uncharacterized IDRs. Our results suggest that feature discovery using unsupervised neural networks is a promising avenue to gain systematic insight into poorly understood protein sequences.
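The contrastive objective described in this abstract can be sketched as a softmax cross-entropy over candidate IDRs, with the held-out homolog as the correct answer. This sketch assumes IDR embeddings have already been produced by some encoder, that a homolog set is summarized by averaging its members' embeddings, and that candidates are scored by dot product; all three are illustrative assumptions, not details given in the abstract.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reverse_homology_loss(homolog_set, held_out, distractors):
    """Cross-entropy for picking the held-out homolog among candidates.

    homolog_set: embeddings of IDRs from one homologous family.
    held_out: embedding of a homolog withheld from that set.
    distractors: embeddings of IDRs sampled at random from the proteome.
    """
    query = mean_vector(homolog_set)  # assumed set summary: mean pooling
    candidates = [held_out] + distractors  # correct answer at index 0
    logits = [dot(query, c) for c in candidates]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]  # -log softmax probability of the true homolog
```

Minimizing this loss pushes the set summary toward its true homolog and away from random proteome IDRs, so any feature that helps the network succeed has to be one shared across homologs.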


Subject(s)
Intrinsically Disordered Proteins , Proteome , Amino Acid Sequence , Evolution, Molecular , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Proteome/metabolism
10.
Bioinformatics ; 35(21): 4525-4527, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31095270

ABSTRACT

SUMMARY: We introduce YeastSpotter, a web application for the segmentation of yeast microscopy images into single cells. YeastSpotter is user-friendly and generalizable, reducing the computational expertise required for this critical preprocessing step in many image analysis pipelines. AVAILABILITY AND IMPLEMENTATION: YeastSpotter is available at http://yeastspotter.csb.utoronto.ca/. Code is available at https://github.com/alexxijielu/yeast_segmentation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Microscopy , Software , Cell Count , Saccharomyces cerevisiae
11.
Bioinformatics ; 35(18): 3232-3239, 2019 09 15.
Article in English | MEDLINE | ID: mdl-30753279

ABSTRACT

MOTIVATION: Mammalian genomes can contain thousands of enhancers but only a subset are actively driving gene expression in a given cellular context. Integrated genomic datasets can be harnessed to predict active enhancers. One challenge in integration of large genomic datasets is the increasing heterogeneity: continuous, binary and discrete features may all be relevant. Coupled with the typically small numbers of training examples, semi-supervised approaches for heterogeneous data are needed; however, current enhancer prediction methods are not designed to handle heterogeneous data in the semi-supervised paradigm. RESULTS: We implemented a Dirichlet Process Heterogeneous Mixture model that infers Gaussian, Bernoulli and Poisson distributions over features. We derived a novel variational inference algorithm to handle semi-supervised learning tasks where certain observations are forced to cluster together. We applied this model to enhancer candidates in mouse heart tissues based on heterogeneous features. We constrained a small number of known active enhancers to appear in the same cluster, and 47 additional regions clustered with them. Many of these are located near heart-specific genes. The model also predicted 1176 active promoters, suggesting that it can discover new enhancers and promoters. AVAILABILITY AND IMPLEMENTATION: We created the 'dphmix' Python package: https://pypi.org/project/dphmix/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
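A sketch of one ingredient of such a model: the log-likelihood of a candidate region's heterogeneous features (continuous, binary and count) under a single mixture component, assuming the features are independent given the component. The Dirichlet process prior, the variational updates and the constraint forcing known enhancers into one cluster are not shown. The function names and the dictionary layout are hypothetical and are not the `dphmix` API.

```python
import math


def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def bernoulli_logpmf(x, p):
    return math.log(p) if x == 1 else math.log(1 - p)


def poisson_logpmf(k, rate):
    # log(rate^k * exp(-rate) / k!), using lgamma(k + 1) = log(k!)
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def cluster_loglik(features, params):
    """Log-likelihood of one candidate region under one mixture component.

    features: dict with 'continuous', 'binary' and 'counts' lists.
    params: dict with matching 'gaussian' (mean, var), 'bernoulli' p and
    'poisson' rate lists for the component.
    Features are treated as independent given the component (an assumption).
    """
    total = 0.0
    for x, (mean, var) in zip(features["continuous"], params["gaussian"]):
        total += gaussian_logpdf(x, mean, var)
    for x, p in zip(features["binary"], params["bernoulli"]):
        total += bernoulli_logpmf(x, p)
    for k, rate in zip(features["counts"], params["poisson"]):
        total += poisson_logpmf(k, rate)
    return total
```

Because the per-type terms simply add in log space, a region's score under each component can be compared directly, whatever mix of feature types it carries.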


Subject(s)
Genome , Genomics , Heart , Animals , Cluster Analysis , Humans , Mice , Software , Supervised Machine Learning
12.
Biochem Soc Trans ; 48(5): 2151-2158, 2020 10 30.
Article in English | MEDLINE | ID: mdl-32985656

ABSTRACT

What do we know about the molecular evolution of functional protein condensation? The capacity of proteins to form biomolecular condensates (compact, protein-rich states, not bound by membranes, but still separated from the rest of the contents of the cell) appears in many cases to be bestowed by weak, transient interactions within one or between proteins. Natural selection is expected to remove or fix amino acid changes, insertions or deletions that preserve or change this condensation capacity when doing so is beneficial to the cell. A few recent studies have begun to explore this frontier of phylogenetics at the intersection of biophysics and cell biology.


Subject(s)
Biophysics/methods , Evolution, Molecular , Phylogeny , Proteins/chemistry , Amino Acids/chemistry , Amino Acids/metabolism , Animals , Biophysical Phenomena , Caenorhabditis elegans , Cell Biology , DEAD-box RNA Helicases/chemistry , Gene Deletion , Humans , Models, Biological , Multigene Family , Mutation , Protein Interaction Mapping , Saccharomyces cerevisiae
13.
PLoS Comput Biol ; 15(9): e1007348, 2019 09.
Article in English | MEDLINE | ID: mdl-31479439

ABSTRACT

Cellular microscopy images contain rich insights about biology. To extract this information, researchers use features, or measurements of the patterns of interest in the images. Here, we introduce a convolutional neural network (CNN) to automatically design features for fluorescence microscopy. We use a self-supervised method to learn feature representations of single cells in microscopy images without labelled training data. We train CNNs on a simple task that leverages the inherent structure of microscopy images and controls for variation in cell morphology and imaging: given one cell from an image, the CNN is asked to predict the fluorescence pattern in a second different cell from the same image. We show that our method learns high-quality features that describe protein expression patterns in single cells in both yeast and human microscopy datasets. Moreover, we demonstrate that our features are useful for exploratory biological analysis, by capturing high-resolution cellular components in a proteome-wide cluster analysis of human proteins, and by quantifying multi-localized proteins and single-cell variability. We believe paired cell inpainting is a generalizable method to obtain feature representations of single cells in multichannel microscopy images.
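The self-supervised task described in this abstract depends on pairing two different cells taken from the same image. A minimal sketch of that pairing step, assuming cells arrive as hypothetical `(image_id, cell_id)` records; the CNN, the image crops and the training loop are omitted, and the function name is illustrative only.

```python
import random
from collections import defaultdict


def sample_cell_pairs(cells, n_pairs, seed=0):
    """Sample (image_id, source, target) triples of distinct cells.

    cells: list of (image_id, cell_id) records (hypothetical format).
    For each pair, the CNN would be given the source cell and asked to
    predict the fluorescence pattern of the target cell from the same image.
    """
    rng = random.Random(seed)
    by_image = defaultdict(list)
    for image_id, cell_id in cells:
        by_image[image_id].append(cell_id)
    # Only images with at least two segmented cells can supply a pair.
    eligible = [img for img, ids in by_image.items() if len(ids) >= 2]
    pairs = []
    for _ in range(n_pairs):
        image_id = rng.choice(eligible)
        # sample() without replacement guarantees source != target.
        source, target = rng.sample(by_image[image_id], 2)
        pairs.append((image_id, source, target))
    return pairs
```

Restricting pairs to one image is what lets the task control for imaging conditions: both cells share the same acquisition settings, so the model is pushed to learn the protein's localization pattern rather than image-level artifacts.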


Subject(s)
Microscopy/methods , Single-Cell Analysis/methods , Unsupervised Machine Learning , Cells, Cultured , Computational Biology , Humans , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Yeasts/cytology
14.
Proc Natl Acad Sci U S A ; 114(8): E1450-E1459, 2017 02 21.
Article in English | MEDLINE | ID: mdl-28167781

ABSTRACT

Intrinsically disordered regions (IDRs) are characterized by their lack of stable secondary or tertiary structure and comprise a large part of the eukaryotic proteome. Although these regions play a variety of signaling and regulatory roles, they appear to be rapidly evolving at the primary sequence level. To understand the functional implications of this rapid evolution, we focused on a highly diverged IDR in Saccharomyces cerevisiae that is involved in regulating multiple conserved MAPK pathways. We hypothesized that under stabilizing selection, the functional output of orthologous IDRs could be maintained, such that diverse genotypes could lead to similar function and fitness. Consistent with the stabilizing selection hypothesis, we find that diverged, orthologous IDRs can mostly recapitulate wild-type function and fitness in S. cerevisiae. We also find that the electrostatic charge of the IDR is correlated with signaling output and, using phylogenetic comparative methods, find evidence for selection maintaining this quantitative molecular trait despite underlying genotypic divergence.
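The abstract does not say how IDR charge was quantified. A simple approximation, assumed here for illustration only, counts K and R as +1 and D and E as -1 and ignores histidine, the termini and local pKa shifts. The function names are hypothetical.

```python
POSITIVE = set("KR")
NEGATIVE = set("DE")


def net_charge(sequence):
    """Approximate net charge of an IDR at neutral pH.

    Simplification (assumption): K and R count +1, D and E count -1;
    histidine, chain termini and pKa shifts are ignored.
    """
    seq = sequence.upper()
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def charge_per_residue(sequence):
    """Net charge normalized by length, so IDRs of different sizes compare."""
    return net_charge(sequence) / len(sequence)
```

Normalizing by length matters when comparing orthologous IDRs, which often differ in length as well as in sequence; a trait like this can stay roughly constant even when few individual positions are conserved.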


Subject(s)
Intrinsically Disordered Proteins/metabolism , Amino Acid Sequence , Phylogeny , Protein Conformation , Proteome/metabolism , Saccharomyces cerevisiae/metabolism , Signal Transduction/physiology
15.
PLoS Genet ; 13(4): e1006735, 2017 04.
Article in English | MEDLINE | ID: mdl-28410373

ABSTRACT

Regulatory networks often increase in complexity during evolution through gene duplication and divergence of component proteins. Two models that explain this increase in complexity are: 1) adaptive changes after gene duplication, such as resolution of adaptive conflicts, and 2) non-adaptive processes such as duplication, degeneration and complementation. Both of these models predict complementary changes in the retained duplicates, but they can be distinguished by direct fitness measurements in organisms with short generation times. Previously, it has been observed that repeated duplication of an essential protein in the spindle checkpoint pathway has occurred multiple times over the eukaryotic tree of life, leading to convergent protein domain organization in its duplicates. Here, we replace the paralog pair in S. cerevisiae with a single-copy protein from a species that did not undergo gene duplication. Surprisingly, using quantitative fitness measurements in laboratory conditions stressful for the spindle-checkpoint pathway, we find no evidence that reorganization of protein function after gene duplication is beneficial. We then reconstruct several evolutionary intermediates from the inferred ancestral network to the extant one, and find that, at the resolution of our assay, there exist stepwise mutational paths from the single protein to the divergent pair of extant proteins with no apparent fitness defects. Parallel evolution has been taken as strong evidence for natural selection, but our results suggest that even in these cases, reorganization of protein function after gene duplication may be explained by neutral processes.


Subject(s)
Directed Molecular Evolution , Genetic Drift , Genetic Fitness , Selection, Genetic/genetics , Gene Deletion , Gene Duplication , Green Fluorescent Proteins/genetics , M Phase Cell Cycle Checkpoints/genetics , Nucleotide Motifs/genetics , Saccharomyces cerevisiae/genetics
16.
Entropy (Basel) ; 21(7)2019 Jul 06.
Article in English | MEDLINE | ID: mdl-33267376

ABSTRACT

Bioinformatics and biophysical studies of intrinsically disordered proteins and regions (IDRs) note the high entropy at individual sequence positions and in conformations sampled in solution. This prevents application of the canonical sequence-structure-function paradigm to IDRs and motivates the development of new methods to extract information from IDR sequences. We argue that the information in IDR sequences cannot be fully revealed through positional conservation, which largely measures stable structural contacts and interaction motifs. Instead, considerations of evolutionary conservation of molecular features can reveal the full extent of information in IDRs. Experimental quantification of the large conformational entropy of IDRs is challenging but can be approximated through the extent of conformational sampling measured by a combination of NMR spectroscopy and lower-resolution structural biology techniques, which can be further interpreted with simulations. Conformational entropy and other biophysical features can be modulated by post-translational modifications that provide functional advantages to IDRs by tuning their energy landscapes and enabling a variety of functional interactions and modes of regulation. The diverse mosaic of functional states of IDRs and their conformational features within complexes demands novel metrics of information, which will reflect the complicated sequence-conformational ensemble-function relationship of IDRs.
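The positional entropy mentioned in this abstract is usually computed per alignment column with the Shannon formula H = -Σ p(a) log2 p(a), where p(a) is the frequency of residue a in the column. A minimal sketch, assuming a hypothetical toy alignment of equal-length sequences in which `-` marks gaps that are excluded from the counts. This is the positional measure the abstract argues is insufficient on its own, not the new information metrics it calls for.

```python
import math
from collections import Counter


def column_entropy(column, ignore_gaps=True):
    """Shannon entropy (in bits) of one alignment column."""
    residues = [aa for aa in column if not (ignore_gaps and aa == "-")]
    counts = Counter(residues)
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def positional_entropies(alignment):
    """Entropy at every position of a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]


# Hypothetical toy alignment: columns 1 and 3 are invariant, column 4 is not.
entropies = positional_entropies(["ACDE", "ACDK", "AGDR", "ACDS"])
```

A fully conserved column scores 0 bits and a column with four equally frequent residues scores 2 bits, which is the kind of high per-position entropy the abstract describes for IDRs.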

17.
BMC Bioinformatics ; 19(1): 65, 2018 02 27.
Article in English | MEDLINE | ID: mdl-29482494

ABSTRACT

BACKGROUND: Crm1-dependent Nuclear Export Signals (NESs) are clusters of alternating hydrophobic and non-hydrophobic amino acid residues between 10 and 15 amino acids in length. NESs were largely thought to follow simple consensus patterns, based on which they were categorized into 6-10 classes. However, newly discovered NESs often deviate from the established consensus patterns. Thus, identifying NESs within protein sequences remains a bioinformatics challenge. RESULTS: We describe a probabilistic representation of NESs using a new generative model we call NoLogo that can account for a large diversity of NESs. Using this model to predict NESs, we demonstrate improved performance over PSSM and GLAM2 models, but do not achieve the performance of the state-of-the-art NES predictor LocNES. Our findings illustrate that over 30% of NESs are best described by novel NES classes rather than the 6-10 classes proposed by current models. Finally, many NESs have additional hydrophobic residues either upstream or downstream of the canonical four residues, suggesting possible functionality. CONCLUSION: Applying the NoLogo model highlights the observation that NESs are more diverse than previously appreciated. Our work questions the practice of assigning each NES to one of several predefined NES classes. Finally, our analysis suggests a novel and testable biophysical perspective on interaction between Crm1 receptor and Crm1-dependent NESs.
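The "simple consensus patterns" this abstract contrasts with NoLogo can be illustrated with a regular-expression scan for one commonly cited leucine-rich NES form, Φ1-x(2,3)-Φ2-x(2,3)-Φ3-x-Φ4, where Φ is a hydrophobic residue and x is any residue. The hydrophobic set (L, I, V, F, M) and the function name are assumptions for illustration; this is a rigid consensus scanner, not the probabilistic NoLogo model.

```python
import re

# One commonly cited NES consensus: Φ1-x(2,3)-Φ2-x(2,3)-Φ3-x-Φ4.
# Assumption: Φ is drawn from the hydrophobic set L, I, V, F, M.
PHI = "[LIVFM]"
# The zero-width lookahead lets overlapping candidate sites be reported.
NES_CONSENSUS = re.compile(rf"(?=({PHI}.{{2,3}}{PHI}.{{2,3}}{PHI}.{PHI}))")


def scan_nes(sequence):
    """Return (start, matched_segment) for each position matching the consensus."""
    return [(m.start(), m.group(1)) for m in NES_CONSENSUS.finditer(sequence.upper())]
```

A scanner like this either matches or it does not, so any NES whose hydrophobic spacing falls outside the fixed pattern is missed; that rigidity is the limitation a generative model of NES diversity is meant to address.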


Subject(s)
Karyopherins/metabolism , Models, Statistical , Nuclear Export Signals , Receptors, Cytoplasmic and Nuclear/metabolism , Software , Amino Acid Sequence , Cluster Analysis , Humans , Hydrophobic and Hydrophilic Interactions , Karyopherins/chemistry , Markov Chains , Position-Specific Scoring Matrices , Probability , Receptors, Cytoplasmic and Nuclear/chemistry , Saccharomyces cerevisiae/metabolism , Exportin 1 Protein
18.
PLoS Biol ; 13(5): e1002146, 2015 May.
Article in English | MEDLINE | ID: mdl-25966461

ABSTRACT

Eukaryotic cells commonly use protein kinases in signaling systems that relay information and control a wide range of processes. These enzymes have a fundamentally similar structure, but achieve functional diversity through variable regions that determine how the catalytic core is activated and recruited to phosphorylation targets. "Hippo" pathways are ancient protein kinase signaling systems that control cell proliferation and morphogenesis; the NDR/LATS family protein kinases, which associate with "Mob" coactivator proteins, are central but incompletely understood components of these pathways. Here we describe the crystal structure of budding yeast Cbk1-Mob2, to our knowledge the first of an NDR/LATS kinase-Mob complex. It shows a novel coactivator-organized activation region that may be unique to NDR/LATS kinases, in which a key regulatory motif apparently shifts from an inactive binding mode to an active one upon phosphorylation. We also provide a structural basis for a substrate docking mechanism previously unknown in AGC family kinases, and show that docking interaction provides robustness to Cbk1's regulation of its two known in vivo substrates. Co-evolution of docking motifs and phosphorylation consensus sites strongly indicates that a protein is an in vivo regulatory target of this hippo pathway, and predicts a new group of high-confidence Cbk1 substrates that function at sites of cytokinesis and cell growth. Moreover, docking peptides arise in unstructured regions of proteins that are probably already kinase substrates, suggesting a broad sequential model for adaptive acquisition of kinase docking in rapidly evolving intrinsically disordered polypeptides.


Subject(s)
Cell Cycle Proteins/metabolism , Intracellular Signaling Peptides and Proteins/metabolism , Molecular Docking Simulation , Protein Serine-Threonine Kinases/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Amino Acid Motifs , Cell Cycle Proteins/chemistry , Conserved Sequence , Intracellular Signaling Peptides and Proteins/chemistry , Phosphorylation , Protein Serine-Threonine Kinases/chemistry , Saccharomyces cerevisiae , Saccharomyces cerevisiae Proteins/chemistry
19.
Mol Biol Evol ; 33(6): 1478-85, 2016 06.
Article in English | MEDLINE | ID: mdl-26882985

ABSTRACT

Characteristics of pseudogene degeneration at the coding level are well-known, such as a shift toward neutral rates of nonsynonymous substitutions and gain of frameshift mutations. In contrast, degeneration of pseudogene transcriptional regulation is not well understood. Here, we test two predictions of regulatory degeneration along a pseudogenized lineage: 1) decreased transcription factor (TF) binding and 2) accelerated evolution in putative cis-regulatory regions. We find evidence for decreased TF binding levels nearby two primate pseudogenes compared with functional liver genes. However, the majority of TF-bound sequences nearby pseudogenes do not show evidence for lineage-specific accelerated rates of evolution. We conclude that decreases in TF binding level could be a marker for regulatory degeneration, while sequence degeneration in primate cis-regulatory modules may be obscured by background rates of TF binding site turnover.


Subject(s)
Gene Expression Regulation , Primates/genetics , Regulatory Elements, Transcriptional , Animals , Biological Evolution , Evolution, Molecular , Humans , Macaca , Models, Genetic , Protein Binding , Pseudogenes , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Transcription Factors/genetics
20.
Nucleic Acids Res ; 43(21): 10180-9, 2015 Dec 02.
Article in English | MEDLINE | ID: mdl-26527718

ABSTRACT

The protein-DNA interactions between transcription factors and transcription factor binding sites are essential activities in gene regulation. To decipher the binding codes, it is a long-standing challenge to understand the binding mechanism across different transcription factor DNA binding families. Past computational learning studies usually focus on learning and predicting the DNA binding residues on the protein side. Taking into account both sides (protein and DNA), we propose and describe a computational study for learning the specificity-determining residue-nucleotide interactions of different known DNA-binding domain families. The proposed learning models are compared comprehensively to state-of-the-art models, demonstrating their competitive learning performance. In addition, we propose two applications which demonstrate how the learnt models can provide meaningful insights into protein-DNA interactions across different DNA binding families.


Subject(s)
DNA-Binding Proteins/chemistry , DNA/chemistry , Sequence Analysis, Protein/methods , Binding Sites , Computational Biology/methods , DNA/metabolism , DNA-Binding Proteins/metabolism , Humans , Machine Learning , Models, Molecular , Nucleotide Motifs , Position-Specific Scoring Matrices , Protein Binding , Protein Structure, Tertiary , Sequence Analysis, DNA