ABSTRACT
The Simple Modular Architecture Research Tool (SMART) is an online resource (http://smart.embl.de/) used for protein domain identification and the analysis of protein domain architectures. Many new features were implemented to make SMART more accessible to scientists from different fields. The new 'Genomic' mode in SMART makes it easy to analyze domain architectures in completely sequenced genomes. Domain annotation has been updated with a detailed taxonomic breakdown and a prediction of the catalytic activity for 50 SMART domains is now available, based on the presence of essential amino acids. Furthermore, intrinsically disordered protein regions can be identified and displayed. The network context is now displayed in the results page for more than 350 000 proteins, enabling easy analyses of domain interactions.
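The catalytic-activity prediction mentioned above rests on checking whether the amino acids essential for catalysis are present at the expected positions of a domain. A minimal sketch of that idea follows. The function name, the table of essential positions, and the toy sequences are hypothetical and chosen only for illustration; they are not taken from SMART.

```python
def predict_catalytic_activity(domain_seq, essential_sites):
    """Return (is_active, substitutions) for a domain sequence.

    essential_sites maps a 0-based position in the domain to the set of
    residues assumed to be compatible with catalysis at that position.
    Positions and residue sets are hypothetical inputs, not SMART data.
    """
    substitutions = []
    for pos in sorted(essential_sites):
        allowed = essential_sites[pos]
        # A position beyond the end of the sequence is treated as a gap.
        residue = domain_seq[pos] if pos < len(domain_seq) else "-"
        if residue not in allowed:
            substitutions.append((pos, residue))
    return len(substitutions) == 0, substitutions


# Toy example: two essential positions (invented for illustration).
sites = {2: {"H"}, 5: {"D", "E"}}
print(predict_catalytic_activity("AAHAADAA", sites))
print(predict_catalytic_activity("AAHAANAA", sites))
```

A domain is called putatively active only if every essential position carries an allowed residue; any substitution is reported so that it can be inspected by hand.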
Subject(s)
Protein Databases , Genomics , Protein Tertiary Structure , Catalysis , Catalytic Domain , Internet , Biological Models , Multiprotein Complexes/metabolism , Protein Tertiary Structure/genetics , Sequence Alignment , Protein Sequence Analysis , User-Computer Interface
ABSTRACT
BACKGROUND: In plants the hormone cytokinin is perceived by members of a small cytokinin receptor family, which are hybrid sensor histidine kinases. While the immediate downstream signaling pathway is well characterized, the domain of the receptor responsible for ligand binding and the residues involved in this process have not been determined experimentally. RESULTS: Using a live cell hormone-binding assay, we show that cytokinin is bound by a receptor domain predicted to be extracellular, the so-called CHASE (cyclases, histidine kinase associated sensory extracellular) domain. The CHASE domain occurs not only in plant cytokinin receptors but also in numerous orphan receptors in lower eukaryotes and bacteria. Taking advantage of this fact, we used an evolutionary proteomics approach to identify amino acids important for cytokinin binding by looking for residues conserved in cytokinin receptors, but not in other receptors. By comparing differences in evolutionary rates, we predicted five amino acids within the plant CHASE domains to be crucial for cytokinin binding. Mutagenesis of the predicted sites and subsequent binding assays confirmed the relevance of four of the selected amino acids, showing the biological significance of site-specific evolutionary rate differences. CONCLUSION: This work demonstrates the use of a bioinformatic analysis to mine the huge set of genomic data from different taxa in order to generate a testable hypothesis. We verified the hypothesis experimentally and identified four amino acids that are required to different degrees for ligand binding by a plant hormone receptor.
Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Carrier Proteins/genetics , Cytokinins/genetics , Molecular Evolution , Protein Kinases/genetics , Proteomics , Cell Surface Receptors/genetics , Amino Acid Sequence , Binding Sites , Immunoblotting , Ligands , Molecular Sequence Data , Phylogeny , Sequence Alignment
ABSTRACT
MOTIVATION: Due to the growing number of completely sequenced genomes, functional annotation of proteins becomes a more and more important issue. Here, we describe a method for the prediction of sites within protein domains, which are part of protein-ligand interactions. As recently demonstrated, these sites are not trivial to detect because of a varying degree of conservation of their location and type within a domain family. RESULTS: The developed method for the prediction of protein-ligand interaction sites is based on a newly defined interaction profile hidden Markov model (ipHMM) topology that takes structural and sequence data into account. It is based on a homology search via a posterior decoding algorithm that yields probabilities for interacting sequence positions and inherits the efficiency and the power of the profile hidden Markov model (pHMM) methodology. The algorithm enhances the quality of interaction site predictions and is a suitable tool for large scale studies, which was already demonstrated for pHMMs. AVAILABILITY: The MATLAB-files are available on request from the first author.
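Posterior decoding, as referred to above, assigns each sequence position a probability of being in a given state given the whole sequence. The sketch below illustrates this with the standard forward-backward algorithm on a plain two-state HMM. It does not reproduce the ipHMM topology of the abstract, and the state names, alphabet, and all probabilities are assumptions made up for this example.

```python
def posterior_decode(obs, states, start_p, trans_p, emit_p):
    """Forward-backward: P(state at position t | whole sequence).

    Assumes a non-empty observation sequence. No scaling is applied,
    so this sketch is only suitable for short sequences.
    """
    n = len(obs)
    # Forward pass: fwd[t][s] = P(obs[0..t], state_t = s)
    fwd = [{} for _ in range(n)]
    for s in states:
        fwd[0][s] = start_p[s] * emit_p[s][obs[0]]
    for t in range(1, n):
        for s in states:
            fwd[t][s] = emit_p[s][obs[t]] * sum(
                fwd[t - 1][r] * trans_p[r][s] for r in states
            )
    # Backward pass: bwd[t][s] = P(obs[t+1..] | state_t = s)
    bwd = [{} for _ in range(n)]
    for s in states:
        bwd[n - 1][s] = 1.0
    for t in range(n - 2, -1, -1):
        for s in states:
            bwd[t][s] = sum(
                trans_p[s][r] * emit_p[r][obs[t + 1]] * bwd[t + 1][r]
                for r in states
            )
    total = sum(fwd[n - 1][s] for s in states)  # P(obs)
    return [
        {s: fwd[t][s] * bwd[t][s] / total for s in states} for t in range(n)
    ]


# Hypothetical toy model: "site" = interacting, "background" = not.
states = ("site", "background")
start_p = {"site": 0.1, "background": 0.9}
trans_p = {
    "site": {"site": 0.5, "background": 0.5},
    "background": {"site": 0.1, "background": 0.9},
}
emit_p = {
    "site": {"H": 0.8, "A": 0.2},
    "background": {"H": 0.1, "A": 0.9},
}
post = posterior_decode("AAHAA", states, start_p, trans_p, emit_p)
for residue, p in zip("AAHAA", post):
    print(residue, round(p["site"], 3))
```

Unlike Viterbi decoding, which returns a single best state path, this yields a per-position probability that can be thresholded to call putative interaction sites.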
Subject(s)
Algorithms , Automated Pattern Recognition/methods , Protein Interaction Mapping/methods , Sequence Alignment/methods , Protein Sequence Analysis/methods , Amino Acid Sequence , Artificial Intelligence , Binding Sites , Markov Chains , Molecular Sequence Data , Protein Binding , Protein Tertiary Structure
ABSTRACT
BACKGROUND: The functional sites of a protein present important information for determining its cellular function and are fundamental in drug design. Accordingly, accurate methods for the prediction of functional sites are of immense value. Most available methods are based on a set of homologous sequences and structural or evolutionary information, and assume that functional sites are more conserved than the average. In the analysis presented here, we have investigated the conservation of location and type of amino acids at functional sites, and compared the behaviour of functional sites between different protein domains. RESULTS: Functional sites were extracted from experimentally determined structural complexes from the Protein Data Bank harbouring a conserved protein domain from the SMART database. In general, functional (i.e. interacting) sites whose location is more highly conserved are also more conserved in their type of amino acid. However, even highly conserved functional sites can present a wide spectrum of amino acids. The degree of conservation strongly depends on the function of the protein domain and ranges from highly conserved in location and amino acid to very variable. Differentiation by binding partner shows that ion binding sites tend to be more conserved than functional sites binding peptides or nucleotides. CONCLUSION: The results gained by this analysis will help improve the accuracy of functional site prediction and facilitate the characterization of unknown protein sequences.
Subject(s)
Computational Biology/methods , Conserved Sequence , Protein Tertiary Structure/genetics , Proteins/chemistry , Proteins/classification , Amino Acid Sequence , Molecular Computers , DNA Probes , Genetic Databases , Ligands , Nucleotides/classification , Peptides/classification , Protein Interaction Mapping/methods , Sequence Alignment , Sequence Homology
ABSTRACT
Although the catalytic center of an enzyme is usually highly conserved, there have been a few reports of proteins with substitutions at essential catalytic positions, which convert the enzyme into a catalytically inactive form. Here, we report a large-scale analysis of substitutions at enzymes' catalytic sites in order to gain insight into the function and evolution of inactive enzyme-homologues. Our analysis revealed that inactive enzyme-homologues are not an exception only found in single enzyme families, but that they are represented in a large variety of enzyme families and conserved among metazoan species. Even though they have lost their catalytic activity, they have adopted new functions and are now mainly involved in regulatory processes, as shown by several case studies. This modification of existing modules is an efficient mechanism to evolve new functions. The invention of inactive enzyme-homologues in metazoa has thereby led to an enhancement of complexity of regulatory networks.
Subject(s)
Enzymes/metabolism , Enzymes/genetics , Pseudogenes , Signal Transduction
ABSTRACT
N-Acetyl-beta-D-glucosaminidase (O-GlcNAcase) is a key enzyme in the posttranslational modification of intracellular proteins by O-linked N-acetylglucosamine (O-GlcNAc). Here, we show that this protein contains two catalytic domains, one homologous to bacterial hyaluronidases and one belonging to the GCN5-related family of acetyltransferases (GNATs). Using sequence and structural information, we predict that the GNAT homologous region contains the O-GlcNAcase activity. Thus, O-GlcNAcase is the first member of the GNAT family not involved in transfer of acetyl groups, adding a new mode of evolution to this large protein family. Comparison with solved structures of different GNATs led to a reliable structure prediction and mapping of residues involved in binding of the GlcNAc-modified proteins and catalysis.
Subject(s)
N-Acetylglucosaminyltransferases/metabolism , Amino Acid Sequence , Catalytic Domain , Molecular Evolution , Molecular Sequence Data , N-Acetylglucosaminyltransferases/chemistry , Amino Acid Sequence Homology , Structure-Activity Relationship
ABSTRACT
The conquest of the land by plants required dramatic morphological and metabolic adaptations. Complex developmental programs under tight regulation evolved during this process. Key regulators of plant development are phytohormones, such as cytokinins. Cytokinins are adenine derivatives that affect various processes in plants. The cytokinin signal transduction system, which is mediated via a multistep variant of the bacterial two-component signaling system, is well characterized in the model plant Arabidopsis (Arabidopsis thaliana). To understand the origin and evolutionary pattern of this signaling pathway, we surveyed the genomes of several sequenced key plant species ranging from unicellular algae, moss, and lycophytes, to higher land plants, including Arabidopsis and rice (Oryza sativa), for proteins involved in cytokinin signal transduction. Phylogenetic analysis revealed that the hormone-binding receptor and a class of negative regulators first appeared in land plants. Other components of the signaling pathway were present in all species investigated. Furthermore, we found that the receptors evolved under different evolutionary constraints from the other components of the pathway: The number of receptors remained fairly constant, while the other protein families expanded.
Subject(s)
Biological Evolution , Cytokinins/metabolism , Signal Transduction , Bryopsida/metabolism , Gene Duplication , Biological Models , Phylogeny , Plant Proteins/chemistry , Plant Proteins/metabolism , Protein Tertiary Structure , Cell Surface Receptors/metabolism , Time Factors
ABSTRACT
We report the draft genome sequence of the model moss Physcomitrella patens and compare its features with those of flowering plants, from which it is separated by more than 400 million years, and unicellular aquatic algae. This comparison reveals genomic changes concomitant with the evolutionary movement to land, including a general increase in gene family complexity; loss of genes associated with aquatic environments (e.g., flagellar arms); acquisition of genes for tolerating terrestrial stresses (e.g., variation in temperature and water availability); and the development of the auxin and abscisic acid signaling pathways for coordinating multicellular growth and dehydration response. The Physcomitrella genome provides a resource for phylogenetic inferences about gene function and for experimental analysis of plant processes through this plant's unique facility for reverse genetics.
Subject(s)
Biological Evolution , Bryopsida/genetics , Plant Genome , Physiological Adaptation , Animals , Arabidopsis/genetics , Arabidopsis/physiology , Bryopsida/physiology , Chlamydomonas reinhardtii/genetics , Chlamydomonas reinhardtii/physiology , Computational Biology , DNA Repair , Dehydration , Gene Duplication , Plant Genes , Magnoliopsida/genetics , Magnoliopsida/physiology , Metabolic Networks and Pathways/genetics , Multigene Family , Oryza/genetics , Oryza/physiology , Phylogeny , Plant Proteins/genetics , Plant Proteins/physiology , Repetitive Nucleic Acid Sequences , Retroelements , DNA Sequence Analysis , Signal Transduction/genetics
ABSTRACT
The protein tyrosine phosphatase (PTP) family plays a central role in signal transduction pathways by controlling the phosphorylation state of serine, threonine, and tyrosine residues. PTPs can be divided into dual specificity phosphatases and the classical PTPs, which can comprise one or two phosphatase domains. We studied amino acid substitutions at functional sites in the phosphatase domain and identified putative noncatalytic phosphatase domains in all subclasses of the PTP family. The presence of inactive phosphatase domains in all subclasses indicates that they were invented multiple times in evolution. Depending on the domain composition, loss of catalytic activity can result in different consequences for the function of the protein. Inactive single-domain phosphatases can still specifically bind substrate and protect it from dephosphorylation by other phosphatases. The inactive domains of tandem phosphatases can be further subdivided. The first class is more conserved, still able to bind phosphorylated tyrosine residues and might recruit multiphosphorylated substrates for the adjacent active domain. The second has accumulated several variable amino acid substitutions in the catalytic center, indicating a complete loss of tyrosine-binding capabilities. To study the impact of substitutions in the catalytic center on the evolution of the whole domain, we examined the evolutionary rates for each individual site and compared them between the classes. This analysis revealed a release of evolutionary constraint for multiple sites surrounding the catalytic center only in the second class, emphasizing its difference in function compared with the first class. Furthermore, we found a region of higher conservation common to both domain classes, suggesting a new regulatory center. We discuss the influence of evolutionary forces on the development of the phosphatase domain, which has led to additional functions, such as the specific protection of phosphorylated tyrosine residues, substrate recruitment, and regulation of the catalytic activity of adjacent domains.
Subject(s)
Molecular Evolution , Protein Tyrosine Phosphatases/chemistry , Protein Tyrosine Phosphatases/genetics , Amino Acid Substitution , Catalytic Domain , Computational Biology , Protein Tertiary Structure , Protein Tyrosine Phosphatases/classification
ABSTRACT
Prior work has suggested that loss of expression of one or more of the many C/D box small nucleolar RNAs (snoRNAs) encoded within the complex, paternally expressed SNRPN (small nuclear ribonuclear protein N) locus may result in the phenotype of Prader-Willi syndrome (PWS). We suggest that the minimal critical region for PWS is approximately 121 kb within the >460-kb SNRPN locus, bordered by a breakpoint cluster region identified in three individuals with PWS who have balanced reciprocal translocations and by the proximal deletion breakpoint of a familial deletion found in an unaffected mother, her three children with Angelman syndrome, and her father. The subset of SNRPN-encoded snoRNAs within this region comprises the PWCR1/HBII-85 cluster of snoRNAs and the single HBII-438A snoRNA. These are the only known genes within this region, which suggests that loss of their expression may be responsible for much or all of the phenotype of PWS. This hypothesis is challenged by findings in two individuals with PWS who have balanced translocations with breakpoints upstream of the proposed minimal critical region but whose cells were reported to express transcripts within it, adjacent to these snoRNAs. By use of real-time quantitative reverse-transcriptase polymerase chain reaction, we reassessed expression of these transcripts and of the snoRNAs themselves in fibroblasts of one of these patients. We find that the transcripts reported to be expressed in lymphoblast-somatic cell hybrids are not expressed in fibroblasts, and we suggest that the original results were misinterpreted. Most important, we show that the PWCR1/HBII-85 snoRNAs are not expressed in fibroblasts of this individual. These results are consistent with the hypothesis that loss of expression of the snoRNAs in the proposed minimal critical region confers much or all of the phenotype of PWS.