Results 1 - 20 of 37
1.
Open Biol ; 14(6): 230439, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38862022

ABSTRACT

Volatile low complexity regions (LCRs) are a novel source of adaptive variation, functional diversification and evolutionary novelty. An interplay of selection and mutation governs the composition and length of low complexity regions. High %GC and mutations provide length variability because of mechanisms like replication slippage. Owing to the complex dynamics between selection and mutation, we need a better understanding of their coexistence. Our findings underscore that positively selected sites (PSS) and low complexity regions prefer the terminal regions of genes, co-occurring in most Tetrapoda clades. We observed that positively selected sites within a gene have position-specific roles. Central-positively selected site genes primarily participate in defence responses, whereas terminal-positively selected site genes exhibit non-specific functions. Low complexity region-containing genes in the Tetrapoda clade exhibit a significantly higher %GC and lower ω (dN/dS: non-synonymous substitution rate/synonymous substitution rate) compared with genes without low complexity regions. This lower ω implies that despite providing rapid functional diversity, low complexity region-containing genes are subjected to intense purifying selection. Furthermore, we observe that low complexity regions consistently display ubiquitous prevalence at lower purity levels, but exhibit a preference for specific positions within a gene as the purity of the low complexity region stretch increases, implying a composition-dependent evolutionary role. Our findings collectively contribute to the understanding of how genetic diversity and adaptation are shaped by the interplay of selection and low complexity regions in the Tetrapoda clade.


Subjects
Molecular Evolution , Genetic Selection , Animals , Mutation , Phylogeny , Proteins/genetics , Proteins/chemistry , Base Composition
2.
Genes (Basel) ; 15(5), 2024 05 16.
Article in English | MEDLINE | ID: mdl-38790262

ABSTRACT

Intermediate filaments (IFs) are integral components of the cytoskeleton which provide cells with tissue-specific mechanical properties and are involved in a plethora of cellular processes. Unfortunately, due to their intricate architecture, the 3D structure of the complete molecule of IFs has remained unresolved. Even though most of the rod domain structure has been revealed by means of crystallographic analyses, the flanking head and tail domains are still mostly unknown. Only recently have studies shed light on head or tail domains of IFs, revealing certain secondary structures and conformational changes during IF assembly. Thus, a deeper understanding of their structure could provide insights into their function.


Subjects
Intermediate Filaments , Protein Domains , Intermediate Filaments/metabolism , Intermediate Filaments/genetics , Intermediate Filaments/chemistry , Humans , Animals , Intermediate Filament Proteins/genetics , Intermediate Filament Proteins/chemistry , Intermediate Filament Proteins/metabolism , Cytoskeleton , Molecular Models
3.
Int J Mol Sci ; 25(7), 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38612505

ABSTRACT

SARS-CoV-2 has accumulated many mutations since its emergence in late 2019. Nucleotide substitutions leading to amino acid replacements constitute the primary material for natural selection. Insertions, deletions, and substitutions appear to be critical for coronavirus's macro- and microevolution. Understanding the molecular mechanisms of mutations in the mutational hotspots (positions, loci with recurrent mutations, and nucleotide context) is important for disentangling roles of mutagenesis and selection. In the SARS-CoV-2 genome, deletions and insertions are frequently associated with repetitive sequences, whereas C>U substitutions are often surrounded by nucleotides resembling the APOBEC mutable motifs. We describe various approaches to mutation spectra analyses, including the context features of RNAs that are likely to be involved in the generation of recurrent mutations. We also discuss the interplay between mutations and natural selection as a complex evolutionary trend. The substantial variability and complexity of pipelines for the reconstruction of mutations and the huge number of genomic sequences are major problems for the analyses of mutations in the SARS-CoV-2 genome. As a solution, we advocate for the development of a centralized database of predicted mutations, which needs to be updated on a regular basis.


Subjects
COVID-19 , Humans , COVID-19/genetics , SARS-CoV-2/genetics , Mutagenesis , Mutation , Nucleotides
4.
Comput Struct Biotechnol J ; 21: 5408-5412, 2023.
Article in English | MEDLINE | ID: mdl-38022702

ABSTRACT

PolyXY regions are compositionally biased regions composed of two different amino acids. They are classified according to the arrangement of the two amino acid types 'X' and 'Y' into direpeats (composed of alternating amino acids, e.g. 'XYXYXY'), joined (composed of two consecutive stretches of each amino acid, e.g. 'XXXYYY') and shuffled (other arrangements, e.g., 'XYXXYY'). They have been characterized at the amino acid level in all domains of life, and are described as often found within intrinsically disordered regions. Since DNA replication slippage has been proposed as a driver of repeat variation, and given that some polyXY have a repetitive nature, we hypothesized that characterizing the nucleotide coding of various types of polyXY could give hints about their origin and evolution. To test this, we obtained all polyXY regions in the human transcriptome, categorized them, and studied their coding nucleotide sequences. We observed that polyXY exacerbates the codon biases, and that the similarity between the X and Y codons is higher than in the background proteome. Our results support a general mechanism of emergence and evolution of polyXY from single-codon polyX. PolyXY are revealed as hotspots for replication slippage, particularly those composed of repeats: joined and direpeat polyXY. Inter-conversion to shuffled polyXY disrupts nucleotide repeats and restricts further evolution by replication slippage, a mechanism that we previously observed in polyX. Our results shed light on polyXY composition and should simplify the determination of their functions.
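
The three arrangement categories described in this abstract can be illustrated with a minimal sketch. It assumes the region has already been detected and contains exactly two residue types; the function name `classify_polyxy` is hypothetical, and the sketch does not reproduce the authors' detection script or any length or purity thresholds they may apply.

```python
def classify_polyxy(region: str) -> str:
    """Classify a two-residue region as 'direpeat', 'joined' or 'shuffled'.

    Illustrative only: assumes `region` is already a detected polyXY stretch.
    """
    if len(set(region)) != 2:
        raise ValueError("a polyXY region must contain exactly two residue types")

    # Number of positions where the residue type changes along the sequence.
    changes = sum(a != b for a, b in zip(region, region[1:]))

    # Direpeat: the two residues strictly alternate (e.g. XYXYXY).
    if changes == len(region) - 1:
        return "direpeat"
    # Joined: one stretch of X followed by one stretch of Y (e.g. XXXYYY).
    if changes == 1:
        return "joined"
    # Shuffled: any other arrangement (e.g. XYXXYY).
    return "shuffled"


for example in ("XYXYXY", "XXXYYY", "XYXXYY"):
    print(example, classify_polyxy(example))
```

A two-residue region such as 'XY' satisfies both the alternation and the single-change criteria; this sketch reports it as a direpeat, which is an arbitrary choice.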

5.
Biophys Rev ; 15(5): 1367-1378, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37974990

ABSTRACT

We review current methods and bioinformatics tools for estimating text complexity (information and entropy measures). The search for DNA regions with extreme statistical characteristics, such as low complexity regions, is important for biophysical models of chromosome function and gene transcription regulation at genome scale. We discuss complexity profiling for the segmentation and delineation of genome sequences, the search for genome repeats and transposable elements, and applications to next-generation sequencing reads. We review the complexity methods and new application fields: analysis of mutation hotspot loci, analysis of short sequencing reads with quality control, and alignment-free genome comparisons. Algorithms implementing various numerical measures of text complexity, including combinatorial and linguistic measures, were developed before the genome sequencing era. A series of tools for estimating sequence complexity use compression approaches, mainly modifications of Lempel-Ziv compression. Most of the tools are available online, providing large-scale services for whole-genome analysis. Novel machine learning applications for the classification of complete genome sequences also include sequence compression and complexity algorithms. We present a comparison of the complexity methods on different sequence sets and their applications to the analysis of gene transcription regulatory regions. Furthermore, we discuss approaches to and applications of sequence complexity for proteins. Complexity measures for amino acid sequences can be calculated by the same entropy and compression-based algorithms, but the functional and evolutionary roles of low complexity regions in proteins have specific features that differ from those in DNA. The tools for protein sequence complexity are aimed at protein structural constraints. It has been shown that low complexity regions in protein sequences are conserved in evolution and have important biological and structural functions. Finally, we summarize recent findings in large-scale genome complexity comparison and applications to coronavirus genome analysis.
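
One of the entropy measures mentioned in this abstract, Shannon entropy over a sliding window, can be sketched as follows. This is a generic illustration, not one of the reviewed tools; the function names, the window length and the threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per symbol) of the symbol frequencies in `seq`."""
    n = len(seq)
    return sum((c / n) * math.log2(n / c) for c in Counter(seq).values())


def entropy_profile(seq: str, window: int = 12) -> list[float]:
    """Entropy of every window of length `window`, sliding one symbol at a time."""
    return [shannon_entropy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


def low_complexity_windows(seq: str, window: int = 12,
                           threshold: float = 1.0) -> list[int]:
    """0-based start positions of windows whose entropy is at or below `threshold`."""
    return [i for i, h in enumerate(entropy_profile(seq, window)) if h <= threshold]


# Hypothetical toy sequence: a mixed prefix followed by a homopolymer run.
dna = "ACGTACGT" + "A" * 12
print(low_complexity_windows(dna, window=8, threshold=0.5))
```

The same functions work unchanged on amino acid sequences, since they only count symbols; compression-based measures such as Lempel-Ziv variants would need a different implementation.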

6.
Genes (Basel) ; 14(9), 2023 08 28.
Article in English | MEDLINE | ID: mdl-37761851

ABSTRACT

Intrinsically disordered regions (IDRs) in protein sequences are emerging as functionally important elements for interaction and regulation. Although IDRs are generally flexible, we previously showed, by observation of experimentally obtained structures, that they contain regions of reduced sequence complexity that have an increased propensity to form structure. Here we expand the universe of cases by taking advantage of structural predictions by AlphaFold. Our studies focus on low complexity regions (LCRs) found within IDRs, where these LCRs have only one or two residue types (polyX and polyXY, respectively). In addition to confirming previous observations that polyE and polyEK have a tendency towards helical structure, we find a similar tendency for other LCRs such as polyQ and polyER, most of them including charged residues. We analyzed the position of polyXY-containing IDRs within proteins, which allowed us to show that polyAG and polyAK accumulate at the N-terminus, with the latter showing increased helical propensity at that location. Functional enrichment analysis of polyXY with helical propensity indicated functions requiring interaction with RNA and DNA. Our work adds evidence of the function of LCRs in interaction-dependent structuring of disordered regions, encouraging the development of tools for the prediction of their dynamic structural properties.


Subjects
RNA , Amino Acid Sequence , Protein Domains
7.
Cell Rep ; 42(8): 112955, 2023 08 29.
Article in English | MEDLINE | ID: mdl-37586369

ABSTRACT

Biomolecular condensates are implicated in core cellular processes such as gene regulation and ribosome biogenesis. Although the architecture of biomolecular condensates is thought to rely on collective interactions between many components, it is unclear how the collective interactions required for their formation emerge during evolution. Here, we show that the structure and evolution of a recently emerged biomolecular condensate, the nucleolar fibrillar center (FC), is explained by a single self-assembling scaffold, TCOF1. TCOF1 is necessary to form the FC, and it structurally defines the FC through self-assembly mediated by homotypic interactions of serine/glutamate-rich low-complexity regions (LCRs). Finally, introduction of TCOF1 into a species lacking the FC is sufficient to form an FC-like biomolecular condensate. By demonstrating that a recently emerged biomolecular condensate is built on a simple architecture determined by a single self-assembling protein, our work provides a compelling mechanism by which biomolecular condensates can emerge in the tree of life.


Subjects
Biomolecular Condensates , Cell Nucleolus , Glutamic Acid , Protein Domains , Serine
8.
Biomolecules ; 13(7), 2023 07 13.
Article in English | MEDLINE | ID: mdl-37509152

ABSTRACT

Tandem repeats in proteins are patterns of residues repeated directly adjacent to each other. The evolution of these repeats can be assessed by using groups of homologous sequences, which can help point to events of unit duplication or deletion. High pressure in a protein family for variation of a given type of repeat might point to their function. Here, we propose the analysis of protein families to calculate protein short tandem repeats (pSTRs) in each protein sequence and assess their variability within the family in terms of number of units. To facilitate this analysis, we developed the pSTR tool, a method to analyze the evolution of protein short tandem repeats in a given protein family by pairwise comparisons between evolutionarily related protein sequences. We evaluated pSTR unit number variation in protein families of 12 complete metazoan proteomes. We hypothesize that families with more dynamic ensembles of repeats could reflect particular roles of these repeats in processes that require more adaptability.


Subjects
Microsatellite Repeats , Proteome , Animals , Amino Acid Sequence , Molecular Evolution
9.
Proc Natl Acad Sci U S A ; 120(16): e2300154120, 2023 04 18.
Article in English | MEDLINE | ID: mdl-37036997

ABSTRACT

The evolution of genomes in all life forms involves two distinct, dynamic types of genomic changes: gene duplication (and loss) that shape families of paralogous genes and extension (and contraction) of low-complexity regions (LCR), which occurs through dynamics of short repeats in protein-coding genes. Although the roles of each of these types of events in genome evolution have been studied, their co-evolutionary dynamics is not thoroughly understood. Here, by analyzing a wide range of genomes from diverse bacteria and archaea, we show that LCR and paralogy represent two distinct routes of evolution that are inversely correlated. The emergence of LCR is a prominent evolutionary mechanism in fast evolving, young protein families, whereas paralogy dominates the comparatively slow evolution of old protein families. The analysis of multiple prokaryotic genomes shows that the formation of LCR is likely a widespread, transient evolutionary mechanism that temporally and locally affects also ancestral functions, but apparently, fades away with time, under mutational and selective pressures, yielding to gene paralogy. We propose that compensatory relationships between short-term and longer-term evolutionary mechanisms are universal in the evolution of life.


Subjects
Molecular Evolution , Prokaryotic Cells , Phylogeny , Bacteria/genetics , Archaea/genetics
10.
Comput Struct Biotechnol J ; 20: 5516-5523, 2022.
Article in English | MEDLINE | ID: mdl-36249567

ABSTRACT

Low complexity regions (LCRs) differ in amino acid composition from the background provided by the corresponding proteomes. The simplest LCRs are homorepeats (or polyX), regions composed mostly of one amino acid type. Extensive research has been done to characterize homorepeats, and their taxonomic, functional and structural features depend on the amino acid type and sequence context. Beyond them, the next step in the study of LCRs is the regions composed of two types of amino acids, which we call polyXY. We classify polyXY in three categories based on the arrangement of the two amino acid types 'X' and 'Y': direpeats (e.g. 'XYXYXY'), joined (e.g. 'XXXYYY') and shuffled (e.g. 'XYYXXY'). We developed a script to search for polyXY, and located them in a comprehensive set of 20,340 reference proteomes. These results are available in a dedicated web server called XYs, in which the user can also submit their own protein datasets to detect polyXY. We studied the distribution of polyXY types by amino acid pair XY and category, and show that polyXY in Eukaryota are mainly located within intrinsically disordered regions. Our study provides a first step towards the characterization of polyXY as protein motifs.

11.
Biomolecules ; 12(10), 2022 10 15.
Article in English | MEDLINE | ID: mdl-36291695

ABSTRACT

Intrinsically disordered regions (IDRs) in protein sequences are flexible, have low structural constraints and as a result have faster rates of evolution. This lack of evolutionary conservation greatly limits the use of sequence homology for the classification and functional assessment of IDRs, as opposed to globular domains. The study of IDRs requires other properties for their classification and functional prediction. While composition bias is not a necessary property of IDRs, compositionally biased regions (CBRs) have been noted as a frequent part of IDRs. We hypothesized that to characterize IDRs, it could be helpful to study their overlap with particular types of CBRs. Here, we evaluate this overlap in the human proteome. Two thirds of the residues in IDRs overlap CBRs. Considering CBRs enriched in one type of amino acid, we can distinguish CBRs that tend to be fully included within long IDRs (R, H, N, D, P, G), from those that partially overlap shorter IDRs (S, E, K, T), and others that tend to overlap IDR terminals (Q, A). CBRs overlap IDRs more often in nuclear proteins and in proteins involved in liquid-liquid phase separation (LLPS). Study of protein interaction networks reveals the enrichment of CBRs in IDRs by tandem repetition of short linear motifs (rich in S or P), and the existence of E-rich polar regions that could support specific protein interactions with non-specific interactions. Our results open ways to pin down the function of IDRs from their partial compositional biases.
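
The residue-level overlap reported in this abstract (the share of IDR residues that also fall inside a CBR) can be computed from region coordinates. The sketch below is a generic illustration, not the authors' pipeline; it assumes 1-based inclusive (start, end) coordinates for a single protein, and both function names are hypothetical.

```python
def covered_positions(intervals: list[tuple[int, int]]) -> set[int]:
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    positions: set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def fraction_of_idr_in_cbr(idrs: list[tuple[int, int]],
                           cbrs: list[tuple[int, int]]) -> float:
    """Fraction of IDR residues that are also part of at least one CBR."""
    idr_positions = covered_positions(idrs)
    if not idr_positions:
        return 0.0
    return len(idr_positions & covered_positions(cbrs)) / len(idr_positions)


# Hypothetical annotations for one protein.
idrs = [(1, 30)]
cbrs = [(11, 30), (50, 60)]
print(round(fraction_of_idr_in_cbr(idrs, cbrs), 3))
```

Using sets of positions keeps overlapping or nested annotations from being counted twice, at the cost of memory proportional to the annotated length.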


Subjects
Intrinsically Disordered Proteins , Humans , Intrinsically Disordered Proteins/chemistry , Proteome , Bias , Amino Acids , Nuclear Proteins/metabolism , Protein Conformation
12.
Biomolecules ; 12(8), 2022 08 10.
Article in English | MEDLINE | ID: mdl-36008992

ABSTRACT

There is increasing evidence that many intrinsically disordered regions (IDRs) in proteins play key functional roles through interactions with other proteins or nucleic acids. These interactions often exhibit a context-dependent structural behavior. We hypothesize that low complexity regions (LCRs), often found within IDRs, could have a role in inducing local structure in IDRs. To test this, we predicted IDRs in the human proteome and analyzed their structures or those of homologous sequences in the Protein Data Bank (PDB). We then identified two types of simple LCRs within IDRs: regions with only one (polyX or homorepeats) or with only two types of amino acids (polyXY). We were able to assign structural information from the PDB more often to these LCRs than to the surrounding IDRs (polyX 61.8% > polyXY 50.5% > IDRs 39.7%). The most frequently observed polyX and polyXY within IDRs contained E (Glu) or G (Gly). Structural analyses of these sequences and of homologs indicate that polyEK regions induce helical conformations, while the other most frequent LCRs induce coil structures. Our work proposes bioinformatics methods to help in the study of the structural behavior of IDRs and provides a solid basis suggesting a structuring role of LCRs within them.


Subjects
Intrinsically Disordered Proteins , Proteins , Amino Acids , Computational Biology , Protein Databases , Humans , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Protein Domains , Proteins/chemistry
13.
Brief Bioinform ; 23(5), 2022 09 20.
Article in English | MEDLINE | ID: mdl-35914952

ABSTRACT

Low complexity regions are fragments of protein sequences composed of only a few types of amino acids. These regions frequently occur in proteins and can play an important role in their functions. However, scientists are mainly focused on regions characterized by high diversity of amino acid composition. Similarity between regions of protein sequences frequently reflects functional similarity between them. In this article, we discuss strengths and weaknesses of the similarity analysis of low complexity regions using BLAST, HHblits and CD-HIT. These methods are considered to be the gold standard in protein similarity analysis and were designed for comparison of high complexity regions. However, we lack specialized methods that could be used to compare the similarity of low complexity regions. Therefore, we investigated the existing methods in order to understand how they can be applied to compare such regions. Our results are supported by an exploratory study and a discussion of the amino acid composition and biological roles of selected examples. We show that existing methods need improvements to efficiently search for similar low complexity regions. We suggest features that have to be re-designed specifically for comparing low complexity regions: scoring matrix, multiple sequence alignment, e-value, local alignment and clustering based on a set of representative sequences. Results of this analysis can either be used to improve existing methods or to create new methods for the similarity analysis of low complexity regions.


Subjects
Amino Acids , Proteins , Algorithms , Amino Acid Sequence , Amino Acids/genetics , Cluster Analysis , Proteins/chemistry , Proteins/genetics , Sequence Alignment , Protein Sequence Analysis/methods
14.
Biochim Biophys Acta Mol Cell Res ; 1869(11): 119327, 2022 11.
Article in English | MEDLINE | ID: mdl-35901970

ABSTRACT

Clathrin, made up of the heavy and light chains, constitutes one of the most abundant proteins involved in intracellular protein trafficking and endocytosis. YPR129W, which encodes an RGG-motif-containing translation repressor, was identified as a part of the multi-gene construct (SCD6) that suppressed clathrin deficiency. However, the contribution of YPR129W alone in suppressing clathrin deficiency has not been documented. This study identifies YPR129W as a necessary and sufficient gene in a multi-gene construct SCD6 that suppresses clathrin deficiency. Importantly, we also identify the cytoplasmic RGG-motif protein-encoding gene PSP2 as another novel suppressor of clathrin deficiency. Detailed domain analysis of the two suppressors reveals that the RGG-motif of both Scd6 and Psp2 is important for suppressing clathrin deficiency. Interestingly, the endocytosis function of clathrin heavy chain assayed by internalization of GFP-Snc1 and α-factor secretion activity are not complemented by either Scd6 or Psp2. We further observe that inhibition of TORC1 compromises the suppression activity of both SCD6 and PSP2 to different extents, suggesting that the two suppressors are differentially regulated. Scd6 granules increased based on its RGG-motif upon Chc1 depletion. Strikingly, Psp2 overexpression increased the abundance of ubiquitin-conjugated proteins in Chc1-depleted cells in an RGG-motif-dependent manner and also decreased the accumulation of GFP-Atg8 foci. Overall, based on our results using SCD6 and PSP2, we identify a novel role of RGG-motif-containing proteins in suppressing clathrin deficiency. Since both the suppressors are RNA-binding proteins, this study opens an exciting avenue for exploring the connection between clathrin function and post-transcriptional gene control processes.


Subjects
Clathrin Heavy Chains , Clathrin , Clathrin/genetics , Clathrin Heavy Chains/genetics , Gene Expression Regulation , RNA-Binding Proteins/genetics
15.
Genes (Basel) ; 13(5), 2022 04 25.
Article in English | MEDLINE | ID: mdl-35627143

ABSTRACT

Homorepeat sequences, consecutive runs of identical amino acids, are prevalent in eukaryotic proteins. It has become necessary to annotate and evaluate this feature in entire proteomes. The definition of what constitutes a homorepeat is not fixed, and different research approaches may require different definitions; therefore, flexible approaches to analyze homorepeats in complete proteomes are needed. Here, we present polyX2, a fast, simple but tunable script to scan protein datasets for all possible homorepeats. The user can modify the length of the window to scan, the minimum number of identical residues that must be found in the window, and the types of homorepeats to be found.


Subjects
Eukaryota , Proteome , Amino Acids , Eukaryotic Cells , Proteome/chemistry , Proteome/genetics , Amino Acid Repetitive Sequences
16.
FEBS J ; 289(1): 17-39, 2022 01.
Article in English | MEDLINE | ID: mdl-33583140

ABSTRACT

Eukaryotic cells are intracellularly divided into numerous compartments or organelles, which coordinate specific molecules and biological reactions. Membrane-bound organelles are physically separated by lipid bilayers from the surrounding environment. Biomolecular condensates, also referred to as membraneless organelles, are micron-scale cellular compartments that lack membranous enclosures but function to concentrate proteins and RNA molecules, and these are involved in diverse processes. Liquid-liquid phase separation (LLPS) driven by multivalent weak macromolecular interactions is a critical principle for the formation of biomolecular condensates, and a multitude of combinations among multivalent interactions may drive liquid-liquid phase transition (LLPT). Dysregulation of LLPS and LLPT leads to aberrant condensate and amyloid formation, which causes many human diseases, including neurodegeneration and cancer. Here, we describe recent findings regarding abnormal forms of biomolecular condensates and aggregation via aberrant LLPS and LLPT of cancer-related proteins in cancer development driven by mutation and fusion of genes. Moreover, we discuss the regulatory mechanisms by which aberrant LLPS and LLPT occur in cancer and the drug candidates targeting these mechanisms. Further understanding of the molecular events regulating how biomolecular condensates and aggregation form in cancer tissue is critical for the development of therapeutic strategies against tumorigenesis.


Subjects
Cytoplasm/genetics , Neoplasms/genetics , Organelles/genetics , Phase Transition , Cytoplasm/metabolism , Eukaryotic Cells/metabolism , Humans , Lipid Bilayers/metabolism , Mutation/genetics , Neoplasms/pathology , Organelles/metabolism
17.
Comput Struct Biotechnol J ; 19: 3964-3977, 2021.
Article in English | MEDLINE | ID: mdl-34377363

ABSTRACT

In recent years, attention has been devoted to proteins forming immiscible liquid phases within the liquid intracellular medium, commonly referred to as membraneless organelles (MLO). These organelles enable the spatiotemporal associations of cellular components that exchange dynamically with the cellular milieu. The dysregulation of these liquid-liquid phase separation processes (LLPS) may cause various diseases including neurodegenerative pathologies and cancer, among others. Until very recently, databases containing information on proteins forming MLOs, as well as tools and resources facilitating their analysis, were missing. This has recently changed with the publication of 4 databases that focus on different types of experiments, sets of proteins, inclusion criteria, and levels of annotation or curation. In this study we integrate and analyze the information across these databases, complement their records, and produce a consolidated set of proteins that enables the investigation of the LLPS phenomenon. To gain insight into the features that characterize different types of MLOs and the roles of their associated proteins, they were grouped into categories: High Confidence MLO associated (including Drivers and reviewed proteins), Potential Clients and Regulators, according to their annotated functions. We show that none of the databases taken alone covers the data sufficiently to enable meaningful analysis, validating our integration effort as essential for gaining better understanding of phase separation and laying the foundations for the discovery of new proteins potentially involved in this important cellular process. Lastly, we developed a server, enabling customized selections of different sets of proteins based on MLO location, database, disorder content, among other attributes (https://forti.shinyapps.io/mlos/).

18.
Genetica ; 149(4): 217-237, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34254217

ABSTRACT

The biological meaning of low complexity regions in the proteins of Plasmodium species is a topic of discussion in evolutionary biology. There is a debate between selectionists and neutralists, who either attribute or do not attribute an effect of low-complexity regions on the fitness of these parasites, respectively. In this work, we comparatively study 22 Plasmodium species to understand whether their low complexity regions undergo a neutral or, rather, a selective and species-dependent evolution. The focus is on the connection between the codon repertoire of the genetic coding sequences and the occurrence of low complexity regions in the corresponding proteins. The first part of the work concerns the correlation between the length of plasmodial proteins and their propensity to embed low complexity regions. Relative synonymous codon usage, entropy, and other indicators reveal that the incidence of low complexity regions and their codon bias is species-specific and subject to selective evolutionary pressure. We also observed that protein length, a relaxed selective pressure, and a broad repertoire of codons in proteins are strongly correlated with the occurrence of low complexity regions. Overall, it seems plausible that the codon bias of low-complexity regions contributes to functional innovation and codon bias enhancement of proteins on which Plasmodium species rest as successful evolutionary parasites.
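
Relative synonymous codon usage (RSCU), one of the indicators named in this abstract, is conventionally the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below assumes the standard genetic code and a single in-frame coding sequence; the function name `rscu` is hypothetical and the code is not the authors' analysis.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed applicable here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict[str, float]:
    """RSCU per codon for the amino acids that occur in `cds` (stop codons skipped)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        synonyms[aa].append(codon)

    values: dict[str, float] = {}
    for aa, family in synonyms.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue
        expected = total / len(family)        # equal usage of all synonymous codons
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


# Hypothetical lysine-only fragment: three AAA codons and one AAG codon.
print(rscu("AAAAAAAAAAAG"))
```

An RSCU above 1 marks a codon used more often than expected under equal usage, and a value below 1 marks one used less often.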


Subjects
Codon Usage , Molecular Evolution , Plasmodium/genetics , Protozoan Proteins/chemistry , Protozoan Proteins/genetics , Amino Acid Repetitive Sequences , Genetic Selection
19.
J Biol Chem ; 297(2): 100945, 2021 08.
Article in English | MEDLINE | ID: mdl-34246632

ABSTRACT

RNA-binding proteins play crucial roles in various cellular functions and contain abundant disordered protein regions. The disordered regions in RNA-binding proteins are rich in repetitive sequences, such as poly-K/R, poly-N/Q, poly-A, and poly-G residues. Our bioinformatic analysis identified a largely neglected repetitive sequence family we define as electronegative clusters (ENCs) that contain acidic residues and/or phosphorylation sites. The abundance and length of ENCs exceed those of other known repetitive sequences. Despite their abundance, the functions of ENCs in RNA-binding proteins are still elusive. To investigate the impacts of ENCs on protein stability, RNA-binding affinity, and specificity, we selected one RNA-binding protein, the ribosomal biogenesis factor 15 (Nop15), as a model. We found that the Nop15 ENC increases protein stability and inhibits nonspecific RNA binding, but minimally interferes with specific RNA binding. To investigate the effect of ENCs on sequence specificity of RNA binding, we grafted an ENC to another RNA-binding protein, Ser/Arg-rich splicing factor 3. Using RNA Bind-n-Seq, we found that the engineered ENC inhibits disparate RNA motifs differently, instead of weakening all RNA motifs to the same extent. The motif site directly involved in electrostatic interaction is more susceptible to the ENC inhibition. These results suggest that one of the functions of ENCs is to regulate RNA binding via electrostatic interaction. This is consistent with our finding that ENCs are also overrepresented in DNA-binding proteins, but underrepresented in halophiles, in which nonspecific nucleic acid binding is inhibited by high concentrations of salts.


Subjects
Intrinsically Disordered Proteins , RNA-Binding Proteins , Amino Acid Sequence , Computational Biology , Protein Binding
20.
Genes (Basel) ; 12(3), 2021 03 22.
Article in English | MEDLINE | ID: mdl-33809982

ABSTRACT

Low complexity regions (LCRs) in proteins are characterized by amino acid frequencies that differ from the average. These regions evolve faster and tend to be less conserved between homologs than globular domains. They are not common in bacteria, as compared to their prevalence in eukaryotes. Studying their conservation could help provide hypotheses about their function. To obtain the appropriate evolutionary focus for this rapidly evolving feature, here we study the conservation of LCRs in bacterial strains and compare their high variability to the closeness of the strains. For this, we selected 20 taxonomically diverse bacterial species and obtained the completely sequenced proteomes of two strains per species. We calculated all orthologous pairs for each of the 20 strain pairs. Per orthologous pair, we computed the conservation of two types of LCRs: compositionally biased regions (CBRs) and homorepeats (polyX). Our results show that, in bacteria, Q-rich CBRs are the most conserved, while A-rich CBRs and polyA are the most variable. LCRs have generally higher conservation when comparing pathogenic strains. However, this result depends on protein subcellular location: LCRs accumulate in extracellular and outer membrane proteins, with conservation increased in the extracellular proteins of pathogens, and decreased for polyX in the outer membrane proteins of pathogens. We conclude that these dependencies support the functional importance of LCRs in host-pathogen interactions.


Subjects
Bacteria/pathogenicity , Bacterial Proteins/chemistry , Protein Sequence Analysis/methods , Bacteria/classification , Bacteria/metabolism , Bacterial Proteins/metabolism , Computational Biology , Molecular Evolution , Proteomics , Virulence