Results 1 - 20 of 58
1.
Nature ; 529(7585): 239-42, 2016 Jan 14.
Article in English | MEDLINE | ID: mdl-26762462

ABSTRACT

Nonribosomal peptide synthetases (NRPSs) are very large proteins that produce small peptide molecules with wide-ranging biological activities, including environmentally friendly chemicals and many widely used therapeutics. NRPSs are macromolecular machines, with modular assembly-line logic, a complex catalytic cycle, moving parts and many active sites. In addition to the core domains required to link the substrates, they often include specialized tailoring domains, which introduce chemical modifications and allow the product to access a large expanse of chemical space. It is still unknown how the NRPS tailoring domains are structurally accommodated into megaenzymes or how they have adapted to function in nonribosomal peptide synthesis. Here we present a series of crystal structures of the initiation module of an antibiotic-producing NRPS, linear gramicidin synthetase. This module includes the specialized tailoring formylation domain, and states are captured that represent every major step of the assembly-line synthesis in the initiation module. The transitions between conformations are large in scale, with both the peptidyl carrier protein domain and the adenylation subdomain undergoing huge movements to transport substrate between distal active sites. The structures highlight the great versatility of NRPSs, as small domains repurpose and recycle their limited interfaces to interact with their various binding partners. Understanding tailoring domains is important if NRPSs are to be utilized in the production of novel therapeutics.


Subject(s)
Biocatalysis ; Brevibacillus/enzymology ; Gramicidin/biosynthesis ; Peptide Synthases/chemistry ; Peptide Synthases/metabolism ; Amino Acid Isomerases/chemistry ; Amino Acid Isomerases/metabolism ; Anti-Bacterial Agents/biosynthesis ; Binding Sites ; Carbohydrate Metabolism ; Carrier Proteins/chemistry ; Carrier Proteins/metabolism ; Catalytic Domain ; Coenzymes/metabolism ; Crystallography, X-Ray ; Hydroxymethyl and Formyl Transferases/chemistry ; Hydroxymethyl and Formyl Transferases/metabolism ; Models, Molecular ; Multienzyme Complexes/chemistry ; Multienzyme Complexes/metabolism ; Pantetheine/analogs & derivatives ; Pantetheine/metabolism ; Protein Binding ; Protein Structure, Tertiary ; RNA, Transfer/chemistry ; RNA, Transfer/metabolism
2.
PLoS Comput Biol ; 14(8): e1006349, 2018 08.
Article in English | MEDLINE | ID: mdl-30096183

ABSTRACT

Intrinsically disordered regions (IDRs) of proteins play significant biological functional roles despite lacking a well-defined 3D structure. For example, IDRs provide efficient housing for large numbers of post-translational modification (PTM) sites in eukaryotic proteins. Here, we study the distribution of more than 15,000 experimentally determined human methylation, acetylation and ubiquitination sites (collectively termed 'MAU' sites) in ordered and disordered regions, and analyse their conservation across 380 eukaryotic species. Conservation signals for the maintenance and novel emergence of MAU sites are examined at 11 evolutionary levels from the whole eukaryotic domain down to the ape superfamily, in both ordered and disordered regions. We discover that MAU PTM is a major driver of conservation for arginines and lysines in both ordered and disordered regions, across the 11 levels, most significantly across the mammalian clade. Conservation of human methylatable arginines is very strongly favoured for ordered regions rather than for disordered, whereas methylatable lysines are conserved in either set of regions, and conservation of acetylatable and ubiquitinatable lysines is favoured in disordered over ordered. Notably, we find evidence for the emergence of new lysine MAU sites in disordered regions of proteins in deuterostomes and mammals, and in ordered regions after the dawn of eutherians. For histones specifically, MAU sites demonstrate an idiosyncratic significant conservation pattern that is evident since the last common ancestor of mammals. Similarly, folding-on-binding (FB) regions are highly enriched for MAU sites relative to either ordered or disordered regions, with ubiquitination sites in FBs being highly conserved at all evolutionary levels back as far as mammals. 
This investigation clearly demonstrates the complex patterns of PTM evolution across the human proteome and that it is necessary to consider conservation of sequence features at multiple evolutionary levels in order not to get an incomplete or misleading picture.


Subject(s)
Intrinsically Disordered Proteins/chemistry ; Intrinsically Disordered Proteins/physiology ; Protein Processing, Post-Translational/physiology ; Acetylation ; Amino Acid Sequence ; Animals ; Biological Evolution ; Computational Biology ; Eukaryota ; Evolution, Molecular ; Humans ; Methylation ; Protein Processing, Post-Translational/genetics ; Proteome/metabolism ; Ubiquitination
3.
Proteomics ; 18(21-22): e1800069, 2018 11.
Article in English | MEDLINE | ID: mdl-30260558

ABSTRACT

Compositionally biased regions (BRs) occur when a few amino-acid types are enriched in a protein segment. There are possibly BR types in the known protein universe that have not been characterized experimentally. The UniProt protein database has been surveyed for evidence of such compositional 'dark matter'. A 'dark biased region' (DBR) is defined as a biased region with low probability of being an individual structural domain or intrinsically disordered region. The bias annotation program fLPS is used to generate a list of >13 million BRs, which is then thoroughly filtered for structure and intrinsic disorder. About a third of BRs (31%) have both substantial intrinsic disorder and structure. After filtering, there are ≈0.9 million DBRs (≈7% of the original BRs in ≈1.4% of proteins). These DBRs are hugely enriched in eukaryotes and hugely depleted in bacteria. They tend to be more hydrophobic than other protein regions, but are made of less extreme combinations of hydrophobic/hydrophilic residues. Given varying assumptions, it has been estimated how many DBRs there might be for the high bias levels examined (with p-values < 1 × 10⁻⁶), deriving a reasonable range of 0.7-7.2% of proteins having such DBRs. Hypotheses are examined about what such DBRs might be, that is, that they are from un- or undersampled domain/region categories or are unappreciated categories somewhat like existing ones.


Subject(s)
Proteins/chemistry ; Algorithms ; Databases, Protein ; Prions/chemistry ; Prions/metabolism ; Sequence Analysis, Protein
4.
BMC Bioinformatics ; 18(1): 476, 2017 Nov 13.
Article in English | MEDLINE | ID: mdl-29132292

ABSTRACT

BACKGROUND: Proteins often contain regions that are compositionally biased (CB), i.e., they are made from a small subset of amino-acid residue types. These CB regions can be functionally important, e.g., the prion-forming and prion-like regions that are rich in asparagine and glutamine residues. RESULTS: Here I report a new program fLPS that can rapidly annotate CB regions. It discovers both single-residue and multiple-residue biases. It works through a process of probability minimization. First, contigs are constructed for each amino-acid type out of sequence windows with a low degree of bias; second, these contigs are searched exhaustively for low-probability subsequences (LPSs); third, such LPSs are iteratively assessed for merger into possible multiple-residue biases. At each of these stages, efficiency measures are taken to avoid or delay probability calculations unless/until they are necessary. On a current desktop workstation, the fLPS algorithm can annotate the biased regions of the yeast proteome (>5700 sequences) in <1 s, and of the whole current TrEMBL database (>65 million sequences) in as little as ~1 h, which is >2 times faster than the commonly used program SEG, using default parameters. fLPS discovers both shorter CB regions (of the sort that are often termed 'low-complexity sequence'), and milder biases that may only be detectable over long tracts of sequence. CONCLUSIONS: fLPS can readily handle very large protein data sets, such as might come from metagenomics projects. It is useful in searching for proteins with similar CB regions, and for making functional inferences about CB regions for a protein of interest. The fLPS package is available from: http://biology.mcgill.ca/faculty/harrison/flps.html , or https://github.com/pmharrison/flps , or is a supplement to this article.


Subject(s)
Sequence Analysis, Protein/methods ; Software ; Algorithms ; Bias ; Proteome ; Saccharomyces cerevisiae/metabolism
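
The probability-minimization idea described in the fLPS abstract above can be illustrated with a minimal sketch. This is not the fLPS implementation: it omits the contig-building and merging stages, searches exhaustively, and handles only a single-residue bias scored with a binomial tail probability. The function names, the background frequency of 0.05, and the length limits are assumptions chosen for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def lowest_probability_subsequence(seq, residue, background,
                                   min_len=5, max_len=50):
    """Find the window most improbably enriched in `residue`.

    Returns (probability, start, end) with `end` exclusive. Only windows
    that begin and end with the biasing residue are considered, since
    trimming a non-biasing end residue can only lower the probability.
    """
    # Prefix sums give the residue count of any window in O(1).
    prefix = [0]
    for ch in seq:
        prefix.append(prefix[-1] + (ch == residue))

    best = (1.0, 0, 0)
    for start in range(len(seq)):
        if seq[start] != residue:
            continue
        last_end = min(len(seq), start + max_len)
        for end in range(start + min_len, last_end + 1):
            if seq[end - 1] != residue:
                continue
            n = end - start
            k = prefix[end] - prefix[start]
            prob = binomial_tail(k, n, background)
            if prob < best[0]:
                best = (prob, start, end)
    return best


# Hypothetical toy sequence with a glutamine-rich tract.
toy = "MKT" + "Q" * 12 + "LVEDA"
prob, start, end = lowest_probability_subsequence(toy, "Q", background=0.05)
print(start, end, prob)
```

A real annotator would avoid recomputing the tail probability for every window; the abstract notes that fLPS delays or skips probability calculations wherever possible, which this sketch does not attempt.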
5.
J Physiol ; 594(10): 2751-72, 2016 05 15.
Article in English | MEDLINE | ID: mdl-26915902

ABSTRACT

KEY POINTS: The presynaptic protein α-synuclein forms aggregates during Parkinson's disease. Accumulating evidence suggests that the small soluble oligomers of α-synuclein are more toxic than the larger aggregates appearing later in the disease. The link between oligomer toxicity and structure still remains unclear. In the present study, we have produced two structurally-defined oligomers that have a similar morphology but differ in secondary structure. These oligomers were introduced into neocortical pyramidal cells during whole-cell recording and, using a combination of experimentation and modelling, electrophysiological parameters were extracted. Both oligomeric species had similar effects on neuronal properties reducing input resistance, time constant and increasing capacitance. The net effect was a marked reduction in neuronal excitability that could impact on network activity. ABSTRACT: The presynaptic protein α-synuclein (αSyn) aggregates during Parkinson's disease (PD) to form large proteinaceous amyloid plaques, the spread of which throughout the brain clinically defines the severity of the disease. During early stages of aggregation, αSyn forms soluble annular oligomers that show greater toxicity than much larger fibrils. These oligomers produce toxicity via a number of possible mechanisms, including the production of pore-forming complexes that permeabilize membranes. In the present study, two well-defined species of soluble αSyn oligomers were produced by different protocols: by polymerization of monomer and by sonication of fibrils. The two oligomeric species produced were morphologically similar, with both having an annular structure and consisting of approximately the same number of monomer subunits, although they differed in their secondary structure. Oligomeric and monomeric αSyn were injected directly into the soma of pyramidal neurons in mouse neocortical brain slices during whole-cell patch clamp recording. 
Using a combined experimental and modelling approach, neuronal parameters were extracted to measure, for the first time in the neocortex, specific changes in neuronal electrophysiology. Both species of oligomer had similar effects: (i) a significant reduction in input resistance and the membrane time constant and (ii) an increase in the current required to trigger an action potential with a resultant reduction in the firing rate. Differences in oligomer secondary structure appeared to produce only subtle differences in the activity of the oligomers. Monomeric αSyn had no effect on neuronal parameters, even at high concentrations. The oligomer-induced fall in neuronal excitability has the potential to impact both network activity and cognitive processing.


Subject(s)
Action Potentials/physiology ; Intracellular Fluid/metabolism ; Pyramidal Cells/physiology ; alpha-Synuclein/metabolism ; Action Potentials/drug effects ; Animals ; Humans ; Intracellular Fluid/drug effects ; Male ; Mice ; Mice, Inbred C57BL ; Organ Culture Techniques ; Pyramidal Cells/drug effects ; alpha-Synuclein/pharmacology
6.
BMC Evol Biol ; 16: 24, 2016 Jan 25.
Article in English | MEDLINE | ID: mdl-26809710

ABSTRACT

BACKGROUND: Prions are transmissible, propagating alternative states of proteins, and are usually made from the fibrillar, beta-sheet-rich assemblies termed amyloid. Prions in the budding yeast Saccharomyces cerevisiae propagate heritable phenotypes, uncover hidden genetic variation, function in large-scale gene regulation, and can act like diseases. Almost all these amyloid prions have asparagine/glutamine-rich (N/Q-rich) domains. Other proteins, that we term here 'prionogenic amyloid formers' (PAFs), have been shown to form amyloid in vivo, and to have N/Q-rich domains that can propagate heritable states in yeast cells. Also, there are >200 other S. cerevisiae proteins with prion-like N/Q-rich sequence composition. Furthermore, human proteins with such N/Q-rich composition have been linked to the pathomechanisms of neurodegenerative amyloid diseases. RESULTS: Here, we exploit the increasing abundance of complete fungal genomes to examine the ancestry of prions/PAFs and other N/Q-rich proteins across the fungal kingdom. We find distinct evolutionary behavior for Q-rich and N-rich prions/PAFs; those of ancient ancestry (outside the budding yeasts, Saccharomycetes) are Q-rich, whereas N-rich cases arose early in Saccharomycetes evolution. This emergence of N-rich prion/PAFs is linked to a large-scale emergence of N-rich proteins during Saccharomycetes evolution, with Saccharomycetes showing a distinctive trend for population sizes of prion-like proteins that sets them apart from all the other fungi. Conversely, some clades, e.g. Eurotiales, have far fewer N/Q-rich proteins, and in some cases likely lose them en masse, perhaps due to greater amyloid intolerance, although they contain relatively more non-N/Q-rich predicted prions. We find that recent mutational tendencies arising during Saccharomycetes evolution (i.e., increased numbers of N residues and a tendency to form more poly-N tracts), contributed to the expansion/development of the prion phenomenon. 
Variation in these mutational tendencies in Saccharomycetes is correlated with the population sizes of prion-like proteins, thus implying that selection pressures on N/Q-rich protein sequences against amyloidogenesis are not generally maintained in budding yeasts. CONCLUSIONS: These results help to delineate further the limits and origins of N/Q-rich prions, and provide insight as a case study of the evolution of compositionally-defined protein domains.


Subject(s)
Ascomycota/genetics ; Evolution, Molecular ; Fungal Proteins/genetics ; Prions/genetics ; Yeasts/genetics ; Amino Acid Motifs ; Amyloid/chemistry ; Amyloid/genetics ; Ascomycota/classification ; Fungal Proteins/chemistry ; Genome, Fungal ; Prions/chemistry ; Protein Structure, Tertiary
7.
PLoS Comput Biol ; 11(8): e1004165, 2015 08.
Article in English | MEDLINE | ID: mdl-26291316

ABSTRACT

Models of neocortical networks are increasingly including the diversity of excitatory and inhibitory neuronal classes. Significant variability in cellular properties is also seen within a nominal neuronal class and this heterogeneity can be expected to influence the population response and information processing in networks. Recent studies have examined the population and network effects of variability in a particular neuronal parameter with some plausibly chosen distribution. However, the empirical variability and covariance seen across multiple parameters are rarely included, partly due to the lack of data on parameter correlations in forms convenient for model construction. To address this we quantify the heterogeneity within and between the neocortical pyramidal-cell classes in layers 2/3, 4, and the slender-tufted and thick-tufted pyramidal cells of layer 5 using a combination of intracellular recordings, single-neuron modelling and statistical analyses. From the response to both square-pulse and naturalistic fluctuating stimuli, we examined the class-dependent variance and covariance of electrophysiological parameters and identified the role of the h current in generating parameter correlations. A byproduct of the dynamic I-V method we employed is the straightforward extraction of reduced neuron models from experiment. Empirically these models took the refractory exponential integrate-and-fire form and provided an accurate fit to the perisomatic voltage responses of the diverse pyramidal-cell populations when the class-dependent statistics of the model parameters were respected. By quantifying the parameter statistics we obtained an algorithm which generates populations of model neurons, for each of the four pyramidal-cell classes, that adhere to experimentally observed marginal distributions and parameter correlations. 
As well as providing this tool, which we hope will be of use for exploring the effects of heterogeneity in neocortical networks, we also provide the code for the dynamic I-V method and make the full electrophysiological data set available.


Subject(s)
Computational Biology/methods ; Models, Neurological ; Neocortex/cytology ; Pyramidal Cells/physiology ; Algorithms ; Animals ; Male ; Rats ; Rats, Wistar
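
The exponential integrate-and-fire form mentioned in the abstract above can be sketched as a short forward-Euler simulation. This is a simplified illustration, not the authors' published code: the refractory behaviour is reduced to a fixed absolute refractory period, whereas the refractory model in the paper may let parameters relax after each spike. All parameter values and the function name `simulate_eif` are assumptions for illustration, not fitted values from the study.

```python
from math import exp


def simulate_eif(current_pa, dt_ms=0.05, c_pf=200.0, tau_ms=20.0,
                 e_l=-70.0, v_t=-50.0, delta_t=1.5,
                 v_spike=0.0, v_reset=-60.0, t_ref_ms=2.0):
    """Integrate dV/dt = [(E_L - V) + D_T * exp((V - V_T)/D_T)] / tau + I/C.

    `current_pa` is a sequence of injected currents (pA), one per time step.
    Voltages are in mV, times in ms, capacitance in pF (so pA/pF = mV/ms).
    Returns (voltage_trace, spike_times_ms).
    """
    v = e_l
    refractory_left = 0.0
    trace, spikes = [], []
    for step, i_in in enumerate(current_pa):
        if refractory_left > 0:
            # Hold the membrane at reset during the refractory period.
            refractory_left -= dt_ms
            v = v_reset
        else:
            leak = e_l - v
            spike_current = delta_t * exp((v - v_t) / delta_t)
            v += dt_ms * ((leak + spike_current) / tau_ms + i_in / c_pf)
            if v >= v_spike:
                spikes.append(step * dt_ms)
                v = v_reset
                refractory_left = t_ref_ms
        trace.append(v)
    return trace, spikes


# 200 ms of a constant 500 pA step (hypothetical stimulus).
trace, spikes = simulate_eif([500.0] * 4000)
print(len(spikes), "spikes")
```

Generating a heterogeneous population, as the abstract describes, would amount to drawing the parameter tuple for each model neuron from a joint distribution that respects the measured correlations, rather than fixing the values as done here.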
8.
BMC Genomics ; 16: 444, 2015 Jun 09.
Article in English | MEDLINE | ID: mdl-26054753

ABSTRACT

BACKGROUND: Natural antisense transcripts (NATs) are regulatory RNAs that contain sequence complementary to other RNAs, these other RNAs usually being messenger RNAs. In eukaryotic genomes, cis-NATs overlap the gene they complement. RESULTS: Here, our goal is to analyze the distribution and evolutionary conservation of cis-NATs for a variety of available data sets for Arabidopsis thaliana, to gain insights into cis-NAT functional mechanisms and their significance. Cis-NATs derived from traditional sequencing are largely validated by other data sets, although different cis-NAT data sets have different prevalent cis-NAT topologies with respect to overlapping protein-coding genes. A. thaliana cis-NATs have substantial conservation (28-35% in the three substantive data sets analyzed) of expression in A. lyrata. We examined evolutionary sequence conservation at cis-NAT loci in Arabidopsis thaliana across nine sequenced Brassicaceae species (picked for optimal discernment of purifying selection), focussing on the parts of their sequences not overlapping protein-coding transcripts (dubbed 'NOLPs'). We found significant NOLP sequence conservation for 28-34% of NATs across different cis-NAT sets. This NAT NOLP sequence conservation versus A. lyrata is generally significantly correlated with conservation of expression. We discover a significant enrichment of transcription factor binding sites (as evidenced by ChIP-seq data) in NOLPs compared to randomly sampled near-gene NOLP-like DNA, which is linked to significant sequence conservation. Conversely, there is no such evidence for a general significant link between NOLPs and formation of small interfering RNAs (siRNAs), with the substantial majority of unique siRNAs arising from the overlapping portions of the cis-NATs. CONCLUSIONS: In aggregate, our results suggest that many cis-NAT NOLPs function in the regulation of conserved promoter/regulatory elements that they 'over-hang'.


Subject(s)
Arabidopsis/genetics ; RNA, Antisense/analysis ; RNA, Plant/analysis ; RNA, Small Interfering/analysis ; Binding Sites ; Brassica/classification ; Brassica/genetics ; Conserved Sequence ; Evolution, Molecular ; Gene Expression Regulation, Plant ; RNA, Small Interfering/chemistry ; Sequence Analysis, RNA/methods
9.
Proteomics ; 19(15): e1970134, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31368634
10.
Sci Rep ; 14(1): 680, 2024 01 05.
Article in English | MEDLINE | ID: mdl-38182699

ABSTRACT

Proteins can contain tracts dominated by a subset of amino acids and that have a functional significance. These are often termed 'low-complexity regions' (LCRs) or 'compositionally-biased regions' (CBRs). However, a wide spectrum of compositional bias is possible, and program parameters used to annotate these regions are often arbitrarily chosen. Also, investigators are sometimes interested in longer regions, or sometimes very short ones. Here, two programs for annotating LCRs/CBRs, namely SEG and fLPS, are investigated in detail across the whole expanse of their parameter spaces. In doing so, boundary behaviours are resolved that are used to derive an optimized systematic strategy for annotating LCRs/CBRs. Sets of parameters that progressively annotate or 'cover' more of protein sequence space and are optimized for a given target length have been derived. This progressive annotation can be applied to discern the biological relevance of CBRs, e.g., in parsing domains for experimental constructs and in generating hypotheses. It is also useful for picking out candidate regions of interest of a given target length and bias signature, and for assessing the parameter dependence of annotations. This latter application is demonstrated for a set of human intrinsically-disordered proteins associated with cancer.


Subject(s)
Antifibrinolytic Agents ; Intrinsically Disordered Proteins ; Humans ; Amino Acids ; Amino Acid Sequence ; Protein Domains
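
The parameter dependence discussed in the abstract above is easy to see with a toy complexity scan. The sketch below is a simplified stand-in, not the SEG or fLPS algorithm: it flags fixed-length windows whose Shannon entropy falls at or below a cutoff. The window length of 12 and cutoff of 2.2 bits are borrowed from commonly cited SEG defaults and are used here only as assumed starting values; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of `window`."""
    n = len(window)
    counts = Counter(window)
    return -sum(c / n * log2(c / n) for c in counts.values())


def low_complexity_windows(seq, window=12, max_entropy=2.2):
    """Return start indices of windows with entropy <= `max_entropy`."""
    return [
        i
        for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) <= max_entropy
    ]


# Hypothetical sequence: a pure poly-Q tract followed by 12 distinct residues.
toy = "Q" * 12 + "ACDEFGHIKLMN"
for cutoff in (1.0, 2.2, 3.0):
    hits = low_complexity_windows(toy, window=12, max_entropy=cutoff)
    print(cutoff, len(hits))
```

Raising the cutoff progressively annotates more of the sequence, which is the kind of coverage-versus-stringency trade-off the abstract sets out to systematize.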
11.
PLoS Comput Biol ; 8(8): e1002646, 2012.
Article in English | MEDLINE | ID: mdl-22912570

ABSTRACT

Prion Proteins (PrP) are among a small number of proteins for which large numbers of NMR ensembles have been resolved for sequence mutants and diverse species. Here, we perform a comprehensive principal component analysis (PCA) on the tertiary structures of PrP globular proteins to discern PrP subdomains that exhibit conformational change in response to point mutations and clade-specific evolutionary sequence mutation trends. This is to our knowledge the first such large-scale analysis of multiple NMR ensembles of protein structures, and the first study of its kind for PrPs. We conducted PCA on human (n = 11), mouse (n = 14), and wildtype (n = 21) sets of PrP globular structures, from which we identified five conformationally variable subdomains within PrP. PCA shows that different non-local patterns and rankings of variable subdomains arise for different pathogenic mutants. These subdomains may thus be key areas for initiating PrP conversion during disease. Furthermore, we have observed the conformational clustering of divergent TSE-non-susceptible species pairs; these non-phylogenetic clusterings indicate structural solutions towards TSE resistance that do not necessarily coincide with evolutionary divergence. We discuss the novelty of our approach and the importance of PrP subdomains in structural conversion during disease.


Subject(s)
Mutation ; Nuclear Magnetic Resonance, Biomolecular/methods ; Prions/chemistry ; Amino Acid Sequence ; Animals ; Humans ; Mice ; Molecular Sequence Data ; Phylogeny ; Principal Component Analysis ; Prions/genetics ; Protein Conformation
12.
PeerJ ; 10: e14417, 2022.
Article in English | MEDLINE | ID: mdl-36415860

ABSTRACT

Prions are proteinaceous particles that can propagate an alternative conformation to further copies of the same protein. They have been described in mammals, fungi, bacteria and archaea. Furthermore, across diverse organisms from bacteria to eukaryotes, prion-like proteins that have similar sequence characters are evident. Such prion-like proteins have been linked to pathomechanisms of amyotrophic lateral sclerosis (ALS) in humans, in particular TDP43, FUS, TAF15, EWSR1 and hnRNPA2. Because of the desire to study human disease-linked proteins in model organisms, and to gain insights into the functionally important parts of these proteins and how they have changed across hundreds of millions of years of evolution, we analyzed how the sequence traits of these five proteins have evolved across eukaryotes, including plants and metazoa. We discover that the RNA-binding domain architecture of these proteins is deeply conserved since their emergence. Prion-like regions are also deeply and widely conserved since the origination of the protein families for FUS, TAF15 and EWSR1, and since the last common ancestor of metazoa for TDP43 and hnRNPA2. Prion-like composition is uncommon or weak in any plant orthologs observed; however, in TDP43 many plant proteins have equivalent regions rich in other amino acids (namely glycine and tyrosine and/or serine) that may be linked to stress granule recruitment. Deeply conserved low-complexity domains are identified that likely have functional significance.


Subject(s)
Amyotrophic Lateral Sclerosis ; Prions ; Animals ; Humans ; Amyotrophic Lateral Sclerosis/genetics ; Prions/genetics ; RNA-Binding Proteins/chemistry ; Mammals/metabolism
13.
PLoS One ; 17(6): e0267744, 2022.
Article in English | MEDLINE | ID: mdl-35653309

ABSTRACT

Immunoglobulin superfamily, member 1 (IGSF1) is a transmembrane glycoprotein with high expression in the mammalian pituitary gland. Mutations in the IGSF1 gene cause congenital central hypothyroidism in humans. The IGSF1 protein is co-translationally cleaved into N- and C-terminal domains (NTD and CTD), the latter of which is trafficked to the plasma membrane and appears to be the functional portion of the molecule. Though the IGSF1-NTD is retained in the endoplasmic reticulum and has no apparent function, it has a high degree of sequence identity with the IGSF1-CTD and is conserved across mammalian species. Based upon phylogenetic analyses, we propose that the ancestral IGSF1 gene encoded the IGSF1-CTD, which was duplicated and integrated immediately upstream of itself, yielding a larger protein encompassing the IGSF1-NTD and IGSF1-CTD. The selective pressures favoring the initial gene duplication and subsequent retention of a conserved IGSF1-NTD are unresolved.


Subject(s)
Eutheria ; Gene Duplication ; Animals ; Humans ; Immunoglobulins/genetics ; Immunoglobulins/metabolism ; Membrane Proteins/genetics ; Membrane Proteins/metabolism ; Phylogeny
14.
Genomics ; 95(5): 268-77, 2010 May.
Article in English | MEDLINE | ID: mdl-20206252

ABSTRACT

Prion diseases are devastating neurological disorders caused by the propagation of particles containing an alternative beta-sheet-rich form of the prion protein (PrP). Genes paralogous to PrP, called Doppel and Shadoo, have been identified that also have neuropathological relevance. To aid in the further functional characterization of PrP and its relatives, we annotated completely the PrP gene family (PrP-GF) in the genomes of 42 vertebrates, through combined strategic application of gene prediction programs and advanced remote homology detection techniques (such as HMMs, PSI-TBLASTN and pGenThreader). We have uncovered several previously undescribed paralogous genes and pseudogenes. We find that current high-quality genomic evidence indicates that the PrP relative Doppel was likely present in the last common ancestor of present-day Tetrapoda, but was lost in the bird lineage since its divergence from reptiles. Using the new gene annotations, we have defined the consensus of structural features that are characteristic of the PrP and Doppel structures, across diverse Tetrapoda clades. Furthermore, we describe in detail a transcribed pseudogene derived from Shadoo that is conserved across primates, and that overlaps the meiosis gene, SYCE1, thus possibly regulating its expression. In addition, we analysed the locus of PRNP/PRND for significant conservation across the genomic DNA of eleven mammals, and determined the phylogenetic penetration of non-coding exons. The genomic evidence indicates that the second PRNP non-coding exon found in even-toed ungulates and rodents is conserved in all high-coverage genome assemblies of primates (human, chimp, orang utan and macaque), and is, at least, likely to have fallen out of use during primate speciation. 
Furthermore, we have demonstrated that the PRNT gene (at the PRNP human locus) is conserved across at least sixteen mammals, and evolves like a long non-coding RNA, fashioned from fragments of ancient, long, interspersed elements. These annotations and evolutionary analyses will be of further use for functional characterisation of the PrP-GF, and will be updatable in a semi-automated fashion as more genomes accumulate.


Subject(s)
Evolution, Molecular ; Genetic Loci/genetics ; Genome, Human/genetics ; Prions/genetics ; Software ; Animals ; GPI-Linked Proteins ; Humans ; Prion Proteins ; Sequence Analysis, DNA/methods
15.
PeerJ ; 9: e12363, 2021.
Article in English | MEDLINE | ID: mdl-34760378

ABSTRACT

Compositionally-biased (CB) regions in biological sequences are enriched for a subset of sequence residue types. These can be shorter regions with a concentrated bias (i.e., those termed 'low-complexity'), or longer regions that have a compositional skew. These regions comprise a prominent class of the uncharacterized 'dark matter' of the protein universe. Here, I report the latest version of the fLPS package for the annotation of CB regions, which includes added consideration of DNA sequences, to label the eight possible biased regions of DNA. In this version, the user is now able to restrict analysis to a specified subset of residue types, and also to filter for previously annotated domains to enable detection of discontinuous CB regions. A 'thorough' option has been added which enables the labelling of subtler biases, typically made from a skew for several residue types. In the output, protein CB regions are now labelled with bias classes reflecting the physico-chemical character of the biasing residues. The fLPS 2.0 package is available from: https://github.com/pmharrison/flps2 or in a Supplemental File of this paper.

16.
Methods Mol Biol ; 2324: 35-48, 2021.
Article in English | MEDLINE | ID: mdl-34165707

ABSTRACT

The number of complete genome sequences grows ever faster with each passing year. Thus, methods for genome annotation need to be honed constantly to handle the deluge of information. Annotation of pseudogenes (i.e., gene copies that appear not to make a functional protein) in genomes is a persistent problem; here, we overview pseudogene annotation methods that are based on the detection of sequence homology in genomic DNA.


Subject(s)
Computational Biology/methods ; Molecular Sequence Annotation/methods ; Pseudogenes/genetics ; Sequence Analysis, DNA/methods ; Animals ; Genomics ; Humans ; Sequence Alignment ; Sequence Homology ; Software
17.
Sci Rep ; 11(1): 10025, 2021 05 11.
Article in English | MEDLINE | ID: mdl-33976321

ABSTRACT

Homopeptides (runs of one amino-acid type) are evolutionarily important since they are prone to expand/contract during DNA replication, recombination and repair. To gain insight into the genomic/proteomic traits driving their variation, we analyzed how homopeptides and homocodons (which are pure codon repeats) vary across 405 Dikarya, and probed their linkage to genome GC/AT bias and other factors. We find that amino-acid homopeptide frequencies vary diversely between clades, with the AT-rich Saccharomycotina trending distinctly. As organisms evolve, homocodon and homopeptide numbers are strongly coupled to GC/AT-bias, exhibiting a bifurcated correlation with degree of AT- or GC-bias. Mid-GC/AT genomes tend to have markedly fewer of both simply because they are mid-GC/AT. Despite these trends, homopeptides tend to be GC-biased relative to other parts of coding sequences, even in AT-rich organisms, indicating they absorb AT bias less or are inherently more GC-rich. The most frequent and most variable homopeptide amino acids favour intrinsic disorder, and there are an opposing correlation and anti-correlation versus homopeptide levels for intrinsic disorder and structured-domain content respectively. Specific homopeptides show unique behaviours that we suggest are linked to inherent slippage probabilities during DNA replication and recombination, such as poly-glutamine, which is an evolutionarily very variable homopeptide with a codon repertoire unbiased for GC/AT, and poly-lysine whose homocodons are overwhelmingly made from the codon AAG.


Subject(s)
Amino Acid Sequence , Ascomycota/genetics , Basidiomycota/genetics , Codon Usage , Peptides/genetics
18.
PeerJ ; 8: e9669, 2020.
Article in English | MEDLINE | ID: mdl-32844065

ABSTRACT

Prions are self-propagating alternative states of protein domains. They are linked to both diseases and functional protein roles in eukaryotes. Prion-forming domains in Saccharomyces cerevisiae are typically domains with high intrinsic protein disorder (i.e., that remain unfolded in the cell during at least some part of their functioning), that are converted to self-replicating amyloid forms. S. cerevisiae is a member of the fungal class Saccharomycetes, during the evolution of which a large population of prion-like domains has appeared. It is still unclear what principles might govern the molecular evolution of prion-forming domains, and intrinsically disordered domains generally. Here, we find that, within a set of such prion-forming domains, some evolve across the fungal class Saccharomycetes in such a way as to absorb general mutation biases over millions of years, whereas others do not, indicating a spectrum of selection pressures on composition and sequence. Thus, if the bias-absorbing prion formers are conserving a prion-forming capability, then this capability is not disrupted by the absorption of bias changes over evolutionary epochs. We also find evidence for selective constraint against the occurrence of lysine residues (which likely disrupt prion formation) in S. cerevisiae prion-forming domains as they evolve across Saccharomycetes. These results provide a case study of the absorption of mutational trends by compositionally biased domains, and suggest a methodology for assessing selection pressures on the composition of intrinsically disordered regions.

19.
PeerJ ; 8: e9940, 2020.
Article in English | MEDLINE | ID: mdl-33062426

ABSTRACT

The proteome of the malaria parasite Plasmodium falciparum is notable for the pervasive occurrence of homopeptides or low-complexity regions (i.e., regions that are made from a small subset of amino-acid residue types). The most prevalent of these are made from residues encoded by adenine/thymine (AT)-rich codons, in particular asparagine. We examined homopeptide occurrences within protein domains in P. falciparum. Homopeptide enrichments occur for hydrophobic (e.g., valine) or small residues (alanine or glycine) in short spans (<5 residues), but these enrichments disappear for longer lengths. We observe that short asparagine homopeptides (<10 residues long) are strongly depleted inside protein domains, indicating some selective constraint to keep them from forming. We surmise that this is linked to co-translational protein folding, although there are specific protein domains that are enriched in longer asparagine homopeptides (≥10 residues), indicating a functional linkage for specific poly-asparagine tracts. Top gene ontology functional category enrichments for homopeptides associated with diverse protein domains include "vesicle-mediated transport" and "DNA-directed 5'-3' RNA polymerase activity", with various categories linked to "binding" showing significant homopeptide depletions. Also, in general, homopeptides are substantially enriched in the parts of protein domains that are in or near intrinsically disordered regions (IDRs). The implications of these findings are discussed.

20.
PeerJ ; 8: e9023, 2020.
Article in English | MEDLINE | ID: mdl-32337108

ABSTRACT

Pub1 is an important RNA-binding protein that functions in stress granule assembly in the budding yeast Saccharomyces cerevisiae and, as its co-ortholog Tia1, in humans. It is unique among proteins in showing prion-like aggregation in both its yeast and human forms. Previously, we noted that Pub1/Tia1 was the only protein linked to human disease that has prion-like character and has demonstrated such aggregation in both species. Thus, we were motivated to probe further into the evolution of the Pub1/Tia1 family (and its close relative Nam8 and its orthologs) to gain a picture of how such a protein has evolved over deep evolutionary time since the last common ancestor of eukaryotes. Here, we discover that the prion-like composition of this protein family is deeply conserved across eukaryotes, as is the prion-like composition of its close relative Nam8/Ngr1. A sizeable minority of protein orthologs have multiple prion-like domains within their sequences (6-20% depending on criteria). The number of RNA-binding RRM domains is conserved at three copies over >86% of the Pub1 family (>71% of the Nam8 family), but proteins with just one or two RRM domains occur frequently in some clades, indicating that these are not due to annotation errors. Overall, our results indicate that a basic scaffold comprising three RNA-binding domains and at least one prion-like region has been largely conserved since the last common ancestor of eukaryotes, providing further evidence that prion-like aggregation may be a very ancient and conserved phenomenon for certain specific proteins.
