ABSTRACT
Research data management (RDM) is central to the implementation of the FAIR (Findable, Accessible, Interoperable, Reusable) and Open Science principles. Recognising the importance of RDM, ELIXIR Platforms and Nodes have invested in RDM and launched various projects and initiatives to ensure good data management practices for scientific excellence. These projects have resulted in a rich set of tools and resources highly valuable for FAIR data management. However, these resources remain scattered across projects and ELIXIR structures, making their dissemination and application challenging. Therefore, it becomes imperative to coordinate these efforts for sustainable and harmonised RDM practices with dedicated forums for RDM professionals to exchange knowledge and share resources. The proposed ELIXIR RDM Community will bring together RDM experts to develop ELIXIR's vision and coordinate its activities, taking advantage of the available assets. It aims to coordinate RDM best practices and illustrate how to use the existing ELIXIR RDM services. The Community will be built around three integral pillars, namely, a network of RDM professionals, RDM knowledge management and RDM training expertise and resources. It will also engage with external stakeholders to leverage benefits and provide a forum for RDM professionals for regular knowledge exchange, capacity building and development of harmonised RDM practices, keeping in line with the overall scope of the RDM Community. In the short term, the Community aims to build upon the existing resources and ensure that their content remains up to date and fit for purpose. In the long run, the Community will aim to strengthen the skills and knowledge of its RDM professionals to support the emerging needs of the scientific community. The Community will also devise an effective strategy to engage with other ELIXIR structures and international stakeholders to influence and align with developments and solutions in the RDM field.
Subject(s)
Data Management , Data Management/methods , Humans , Research
ABSTRACT
BACKGROUND: Short-chain dehydrogenases/reductases (SDRs) form one of the largest and oldest NAD(P)(H) dependent oxidoreductase families. Despite a conserved 'Rossmann-fold' structure, members of the SDR superfamily exhibit low sequence similarities, which constituted a bottleneck in terms of identification. Recent classification methods, relying on hidden-Markov models (HMMs), improved identification and enabled the construction of a nomenclature. However, functional annotations of plant SDRs remain scarce. RESULTS: Wide-scale analyses were performed on ten plant genomes. The combination of hidden Markov model (HMM) based analyses and similarity searches led to the construction of an exhaustive inventory of plant SDR. With 68 to 315 members found in each analysed genome, the inventory confirmed the over-representation of SDRs in plants compared to animals, fungi and prokaryotes. The plant SDRs were first classified into three major types - 'classical', 'extended' and 'divergent' - but a minority (10% of the predicted SDRs) could not be classified into these general types ('unknown' or 'atypical' types). In a second step, we could categorize the vast majority of land plant SDRs into a set of 49 families. Out of these 49 families, 35 appeared early during evolution since they are commonly found through all the Green Lineage. Yet, some SDR families - tropinone reductase-like proteins (SDR65C), 'ABA2-like'-NAD dehydrogenase (SDR110C), 'salutaridine/menthone-reductase-like' proteins (SDR114C), 'dihydroflavonol 4-reductase'-like proteins (SDR108E) and 'isoflavone-reductase-like' (SDR460A) proteins - have undergone significant functional diversification within vascular plants since they diverged from Bryophytes. 
Interestingly, these diversified families are either involved in the secondary metabolism routes (terpenoids, alkaloids, phenolics) or participate in developmental processes (hormone biosynthesis or catabolism, flower development), in opposition to SDR families involved in primary metabolism which are poorly diversified. CONCLUSION: The application of HMMs to plant genomes enabled us to identify 49 families that encompass all Angiosperms ('higher plants') SDRs, each family being sufficiently conserved to enable simpler analyses based only on overall sequence similarity. The multiplicity of SDRs in plant kingdom is mainly explained by the diversification of large families involved in different secondary metabolism pathways, suggesting that the chemical diversification that accompanied the emergence of vascular plants acted as a driving force for SDR evolution.
Subject(s)
Genetic Variation , Genome, Plant/genetics , Multigene Family , Oxidoreductases/genetics , Plants/enzymology , Plants/genetics , Evolution, Molecular , Lipid Metabolism/genetics , Markov Chains , Multigene Family/genetics , Oxidoreductases/classification , Oxidoreductases/metabolism , Phylogeny , Principal Component Analysis , Quantitative Trait, Heritable
ABSTRACT
BACKGROUND: Many parasites use multicopy protein families to avoid their host's immune system through a strategy called antigenic variation. RIFIN and STEVOR proteins are variable surface antigens uniquely found in the malaria parasites Plasmodium falciparum and P. reichenowi. Although these two protein families are different, they have more similarity to each other than to any other proteins described to date. As a result, they have been grouped together in one Pfam domain. However, a recent study has described the sub-division of the RIFIN protein family into several functionally distinct groups. These sub-groups require phylogenetic analysis to sort out, which is not practical for large-scale projects, such as the sequencing of patient isolates and meta-genomic analysis. RESULTS: We have manually curated the rif and stevor gene repertoires of two Plasmodium falciparum genomes, isolates DD2 and HB3. We have identified 25% of mis-annotated and ~30 missing rif and stevor genes. Using these data sets, as well as sequences from the well curated reference genome (isolate 3D7) and field isolate data from Uniprot, we have developed a tool named RSpred. The tool, based on a set of hidden Markov models and an evaluation program, automatically identifies STEVOR and RIFIN sequences as well as the sub-groups: A-RIFIN, B-RIFIN, B1-RIFIN and B2-RIFIN. In addition to these groups, we distinguish a small subset of STEVOR proteins that we named STEVOR-like, as they either differ remarkably from typical STEVOR proteins or are too fragmented to reach a high enough score. When compared to Pfam and TIGRFAMs, RSpred proves to be a more robust and more sensitive method. We have applied RSpred to the proteomes of several P. falciparum strains, P. reichenowi, P. vivax, P. knowlesi and the rodent malaria species. All groups were found in the P. falciparum strains, and also in the P. reichenowi parasite, whereas none were predicted in the other species. 
CONCLUSIONS: We have generated a tool for the sorting of RIFIN and STEVOR proteins, large antigenic variant protein groups, into homogeneous sub-families. Assigning functions to such protein families requires their subdivision into meaningful groups such as we have shown for the RIFIN protein family. RSpred removes the need for complicated and time consuming phylogenetic analysis methods. It will benefit both research groups sequencing whole genomes as well as others working with field isolates. RSpred is freely accessible via http://www.ifm.liu.se/bioinfo/.
Subject(s)
Antigens, Protozoan/classification , Markov Chains , Membrane Proteins/classification , Plasmodium falciparum/genetics , Protozoan Proteins/classification , Antigens, Protozoan/genetics , Computational Biology/methods , Genome, Protozoan , Limit of Detection , Membrane Proteins/genetics , Phylogeny , Protozoan Proteins/genetics , Sequence Alignment , Sequence Analysis, DNA , Software
ABSTRACT
Dehydrogenases and reductases are enzymes of fundamental metabolic importance that often adopt a specific structure known as the Rossmann fold. This fold, consisting of a six-stranded beta-sheet surrounded by alpha-helices, is responsible for coenzyme binding. We have developed a method to identify Rossmann folds and predict their coenzyme specificity (NAD, NADP or FAD) using only the amino acid sequence as input. The method is based upon hidden Markov models and sequence pattern analysis. The prediction sensitivity is 79% and the selectivity close to 100%. The method was applied on a set of 68 genomes, representing the three kingdoms archaea, bacteria and eukaryota. In prokaryotes, 3% of the genes were found to code for Rossmann-fold proteins, while the corresponding ratio in eukaryotes is only around 1%. In all genomes, NAD is the most preferred cofactor (41-49%), followed by NADP with 30-38%, while FAD is the least preferred cofactor (21%). However, the NAD preponderance over NADP is most pronounced in archaea, and least in eukaryotes. In all three kingdoms, only 3-8% of the Rossmann proteins are predicted to have more than one membrane-spanning segment, which is much lower than the frequency of membrane proteins in general. Analysis of the major protein types in eukaryotes reveals that the most common type (26%) of the Rossmann proteins are short-chain dehydrogenases/reductases. In addition, the identified Rossmann proteins were analyzed with respect to further protein types, enzyme classes and redundancy. The described method is available at http://www.ifm.liu.se/bioinfo, where the preferred coenzyme and its binding region are predicted given an amino acid sequence as input.
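The sensitivity and selectivity figures quoted above can be read as simple ratios of prediction counts. The following is a minimal sketch, assuming that "selectivity" is meant in the sense of positive predictive value, TP / (TP + FP); the abstract does not define the term. The function names and the counts are hypothetical and chosen only to illustrate the arithmetic; they are not data from the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real Rossmann-fold proteins that the method detects."""
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


def selectivity(true_pos: int, false_pos: int) -> float:
    """Fraction of predictions that are correct (assumed definition)."""
    total = true_pos + false_pos
    return true_pos / total if total else 0.0


# Hypothetical counts, invented for illustration only.
tp, fn, fp = 79, 21, 0
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"selectivity = {selectivity(tp, fp):.2f}")
```

Under this reading, a selectivity close to 100% means that almost no false positives are reported, while a sensitivity of 79% means that roughly one in five true Rossmann folds is missed.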
Subject(s)
Genome , Markov Chains , Models, Molecular , Oxidoreductases/chemistry , Protein Folding , Amino Acid Motifs , Amino Acid Sequence , Animals , Catalytic Domain , Coenzymes , Genome, Archaeal , Genome, Bacterial , Genome, Human , Humans , Molecular Sequence Data , Protein Conformation , Sensitivity and Specificity , Stochastic Processes , Substrate Specificity
ABSTRACT
The progress in genome characterizations has opened new routes for studying enzyme families. The availability of the human genome enabled us to delineate the large family of short-chain dehydrogenase/reductase (SDR) members. Although the human genome releases are not yet final, we have already found 63 members. We have also compared these SDR forms with those of three model organisms: Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana. We detect eight SDR ortholog clusters in a cross-genome comparison. Four of these clusters represent extended SDR forms, a subgroup found in all life forms. The other four are classical SDRs with activities involved in cellular differentiation and signalling. We also find 18 SDR genes that are present only in the human genome of the four genomes studied, reflecting enzyme forms specific to mammals. Close to half of these gene products represent steroid dehydrogenases, emphasizing the regulatory importance of these enzymes.
Subject(s)
Multigene Family , Oxidoreductases/genetics , Animals , Arabidopsis/enzymology , Arabidopsis/genetics , Caenorhabditis elegans/enzymology , Caenorhabditis elegans/genetics , Drosophila melanogaster/enzymology , Drosophila melanogaster/genetics , Evolution, Molecular , Genome , Genome, Human , Humans , Oxidoreductases/chemistry , Oxidoreductases/classification
ABSTRACT
Several proteins and peptides that can convert from alpha-helical to beta-sheet conformation and form amyloid fibrils, including the amyloid beta-peptide (Abeta) and the prion protein, contain a discordant alpha-helix that is composed of residues that strongly favor beta-strand formation. In their native states, 37 of 38 discordant helices are now found to interact with other protein segments or with lipid membranes, but Abeta apparently lacks such interactions. The helical propensity of the Abeta discordant region (K16LVFFAED23) is increased by introducing V18A/F19A/F20A replacements, and this is associated with reduced fibril formation. Addition of the tripeptide KAD or phospho-L-serine likewise increases the alpha-helical content of Abeta(12-28) and reduces aggregation and fibril formation of Abeta(1-40), Abeta(12-28), Abeta(12-24), and Abeta(14-23). In contrast, tripeptides with all-neutral, all-acidic or all-basic side chains, as well as phosphoethanolamine, phosphocholine, and phosphoglycerol have no significant effects on Abeta secondary structure or fibril formation. These data suggest that in free Abeta, the discordant alpha-helix lacks stabilizing interactions (likely as a consequence of proteolytic removal from a membrane-associated precursor protein) and that stabilization of this helix can reduce fibril formation.
Subject(s)
Amyloid beta-Peptides/chemistry , Amino Acid Sequence , Amino Acid Substitution , Amyloid beta-Peptides/metabolism , Amyloid beta-Peptides/ultrastructure , Molecular Sequence Data , Oligopeptides/chemistry , Organophosphorus Compounds/chemistry , Protein Structure, Secondary
ABSTRACT
Short-chain dehydrogenases/reductases (SDRs) are enzymes of great functional diversity. In spite of a residue identity of only 15-30%, the folds are conserved to a large extent, with specific sequence motifs detectable. We have developed an assignment scheme based on these motifs and detect five families. Only two of these were known before, called 'Classical' and 'Extended', but are now distinguished at a further level based on patterns of charged residues in the coenzyme-binding region, giving seven subfamilies of classical SDRs and three subfamilies of extended SDRs. Three further families are novel entities, denoted 'Intermediate', 'Divergent' and 'Complex', encompassing short-chain alcohol dehydrogenases, enoyl reductases and multifunctional enzymes, respectively. The assignment scheme was applied to the genomes of human, mouse, D. melanogaster, C. elegans, A. thaliana and S. cerevisiae. In the animal genomes, genes corresponding to the extended SDRs amount to around one quarter or less of the total number of SDR genes, while in those of A. thaliana and S. cerevisiae, the extended members constitute about 40% of the SDR forms. The NAD(H)-dependent SDRs are about equally many as the NADP(H)-dependent ones in human, mouse and plant, while the proportions of NAD(H)-dependent enzymes are much lower in fruit fly, worm and yeast. We also find that NADP(H) is the preferred coenzyme among most classical SDRs, while NAD(H) is that preferred among most extended SDRs.
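A motif-based assignment of the kind described above can be illustrated with a simple pattern scan. This is only a sketch under stated assumptions: the abstract does not list its motifs, so the two patterns below (a glycine-rich coenzyme-binding pattern TGxxxGxG and an active-site pattern YxxxK, both often cited for classical SDRs) are illustrative stand-ins, and the function name and test sequence are hypothetical. It is not the authors' assignment scheme.

```python
import re

# Illustrative patterns only (assumption): "." stands for any residue.
MOTIFS = {
    "coenzyme_binding": re.compile(r"TG...G.G"),
    "active_site": re.compile(r"Y...K"),
}


def find_motifs(sequence: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of each motif in the sequence."""
    return {
        name: [match.start() for match in pattern.finditer(sequence)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical sequence fragment, invented for illustration.
fragment = "MKTGASSGIGLAAAAYSASKAA"
print(find_motifs(fragment))
```

A real scheme would combine several such patterns with positional constraints; the scan above only reports where each pattern occurs.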
Subject(s)
Coenzymes/metabolism , Oxidoreductases/metabolism , Animals , Genome , Humans , Species Specificity
ABSTRACT
Short-chain dehydrogenases/reductases (SDR) form a large, functionally heterogeneous protein family presently with about 3000 primary and about 30 3D structures deposited in databases. Despite low sequence identities between different forms (about 15-30%), the 3D structures display highly similar alpha/beta folding patterns with a central beta-sheet, typical of the Rossmann-fold. Based on distinct sequence motifs functional assignments and classifications are possible, making it possible to build a general nomenclature system. Recent mutagenetic and structural studies considerably extend the knowledge on the general reaction mechanism, thereby establishing a catalytic tetrad of Asn-Ser-Tyr-Lys residues, which presumably form the framework for a proton relay system including the 2'-OH of the nicotinamide ribose, similar to the mechanism found in horse liver ADH. Based on their cellular functions, several SDR enzymes appear as possible and promising pharmacological targets with application areas spanning hormone-dependent cancer forms or metabolic diseases such as obesity and diabetes, and infectious diseases.
Subject(s)
Oxidoreductases/metabolism , Crystallography, X-Ray , Models, Molecular , Oxidoreductases/chemistry , Oxidoreductases/drug effects , Protein Conformation
ABSTRACT
The short-chain dehydrogenases/reductases (SDRs) constitute one of the largest protein superfamilies known today. The members are distantly related with typically 20-30% residue identity in pair-wise comparisons. Still, all hitherto structurally known SDRs present a common three-dimensional structure consisting of a Rossmann fold with a parallel beta sheet flanked by three helices on each side. Using hidden Markov models (HMMs), we have developed a semi-automated subclassification system for this huge family. Currently, 75% of all SDR forms have been assigned to one of the 464 families totalling 122,940 proteins. There are 47 human SDR families, corresponding to 75 genes. Most human SDR families (35 families) have only one gene, while 12 have between 2 and 8 genes. For more than half of the human SDR families, the three-dimensional fold is known. The number of SDR members increases considerably every year, but the number of SDR families now starts to converge. The classification method has paved the ground for a sustainable and expandable nomenclature system. Information on the SDR superfamily is continuously updated at http://sdr-enzymes.org/.
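The "residue identity in pair-wise comparisons" quoted above can be computed from two aligned sequences. The following is a minimal sketch, assuming identity is defined as identical residues divided by aligned columns in which neither sequence has a gap; other conventions (for example, dividing by the shorter sequence length) exist, and the abstract does not say which one was used. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments, invented for illustration.
print(percent_identity("AAAA", "AAAT"))
```

Values in the 20-30% range, as reported for SDR pairs, are low enough that a plain similarity search can miss true family members, which is the motivation given for the HMM-based classification.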
Subject(s)
Butyryl-CoA Dehydrogenase/classification , Fatty Acid Synthases/classification , NADH, NADPH Oxidoreductases/classification , Butyryl-CoA Dehydrogenase/chemistry , Butyryl-CoA Dehydrogenase/genetics , Butyryl-CoA Dehydrogenase/metabolism , Fatty Acid Synthases/chemistry , Fatty Acid Synthases/genetics , Fatty Acid Synthases/metabolism , Humans , NADH, NADPH Oxidoreductases/chemistry , NADH, NADPH Oxidoreductases/genetics , NADH, NADPH Oxidoreductases/metabolism , Terminology as Topic
ABSTRACT
Different lines of alcohol dehydrogenases (ADHs) have separate superfamily origins, already recognized but now extended and re-evaluated by re-screening of the latest databank update. The short-chain form (SDR) is still the superfamily with most abundant occurrence, most multiple divergence, most prokaryotic emphasis, and most non-complicated architecture. This pattern is compatible with an early appearance at the time of the emergence of prokaryotic cellular life. The medium-chain form (MDR) is also old but second in terms of all the parameters above, and therefore compatible with a second emergence. However, this step appears seemingly earlier than previously considered, and may indicate sub-stages of early emergences at the increased resolution available from the now greater number of data entries. The Zn-MDR origin constitutes a third stage, possibly compatible with the transition to oxidative conditions on earth. Within all these three lines, repeated enzymogeneses gave the present divergence. MDR-ADH origin(s), at a fourth stage, may also be further resolved in multiple or extended modes, but the classical liver MDR-ADH of the liver type can still be traced to a gene duplication ~550 MYA (million years ago), at the early vertebrate radiation, compatible with the post-eon-shift, "Cambrian explosion". Classes and isozymes correspond to subsequent and recent duplicatory events, respectively. They illustrate a peculiar pattern with functional and emerging evolutionary distinctions between parent and emerging lines, suggesting a parallelism between duplicatory and mutational events, now also visible at separate sub-stages. Combined, all forms show distinctive patterns at different levels and illustrate correlations with global events. They further show that simple molecular observations on patterns, multiplicities and occurrence give much information, suggesting common divergence rules not much disturbed by horizontal gene transfers after the initial origins.
Subject(s)
Alcohol Dehydrogenase/genetics , Evolution, Molecular , Animals , Gene Transfer, Horizontal , Humans , Isoenzymes/genetics , Liver/enzymology , Liver/metabolism , Oxidation-Reduction , Prokaryotic Cells/enzymology , Prokaryotic Cells/metabolism
ABSTRACT
Ribosome biogenesis in eukaryotes requires coordinated folding and assembly of a pre-rRNA into sequential pre-rRNA-protein complexes in which chemical modifications and RNA cleavages occur. These processes require many small nucleolar RNAs (snoRNAs) and proteins. Rbm19/Mrd1 is one such protein that is built from multiple RNA-binding domains (RBDs). We find that Rbm19/Mrd1 with five RBDs is present in all branches of the eukaryotic phylogenetic tree, except in animals and Choanoflagellates, which instead have a version with six RBDs, and in Microsporidia, which have a minimal Rbm19/Mrd1 protein with four RBDs. Rbm19/Mrd1 therefore evolved as a multi-RBD protein very early in eukaryotes. The linkers between the RBDs have conserved properties; they are disordered, except for linker 3, and position the RBDs at conserved relative distances from each other. All but one of the RBDs have conserved properties for RNA-binding and each RBD has a specific consensus sequence and a conserved position in the protein, suggesting a functionally important modular design. The patterns of evolutionary conservation provide information for experimental analyses of the function of Rbm19/Mrd1. In vivo mutational analysis confirmed that a highly conserved loop 5-β4-strand in RBD6 is essential for function.
Subject(s)
Conserved Sequence , Evolution, Molecular , RNA-Binding Proteins/genetics , Ribosomes/metabolism , Saccharomyces cerevisiae Proteins/genetics , Amino Acid Sequence , Animals , DNA Mutational Analysis , Genome, Fungal/genetics , Humans , Microsporida/genetics , Molecular Sequence Data , Phylogeny , Protein Structure, Secondary , Protein Structure, Tertiary , RNA-Binding Proteins/chemistry , Reproducibility of Results , Saccharomyces cerevisiae Proteins/chemistry , Sequence Alignment , Sequence Homology, Amino Acid
ABSTRACT
The short-chain dehydrogenase/reductase (SDR) superfamily now has over 47 000 members, most of which are distantly related, with typically 20-30% residue identity in pairwise comparisons, making it difficult to obtain an overview of this superfamily. We have therefore developed a family classification system, based upon hidden Markov models (HMMs). To this end, we have identified 314 SDR families, encompassing about 31,900 members. In addition, about 9700 SDR forms belong to families with too few members at present to establish valid HMMs. In the human genome, we find 47 SDR families, corresponding to 82 genes. Thirteen families are present in all three domains (Eukaryota, Bacteria, and Archaea), and are hence expected to catalyze fundamental metabolic processes. The majority of these enzymes are of the 'extended' type, in agreement with earlier findings. About half of the SDR families are only found among bacteria, where the 'classical' SDR type is most prominent. The HMM-based classification is used as a basis for a sustainable and expandable nomenclature system.
Subject(s)
Computational Biology/methods , Markov Chains , Oxidoreductases/classification , Animals , Archaea/enzymology , Archaea/genetics , Bacteria/enzymology , Bacteria/genetics , Cluster Analysis , Databases, Protein , Eukaryota/enzymology , Eukaryota/genetics , Genome/genetics , Humans , Mice , Oxidoreductases/genetics , Rats , Sequence Alignment
ABSTRACT
Short-chain dehydrogenases/reductases (SDR) constitute one of the largest enzyme superfamilies with presently over 46,000 members. In phylogenetic comparisons, members of this superfamily show early divergence where the majority have only low pairwise sequence identity, although sharing common structural properties. The SDR enzymes are present in virtually all genomes investigated, and in humans over 70 SDR genes have been identified. In humans, these enzymes are involved in the metabolism of a large variety of compounds, including steroid hormones, prostaglandins, retinoids, lipids and xenobiotics. It is now clear that SDRs represent one of the oldest protein families and contribute to essential functions and interactions of all forms of life. As this field continues to grow rapidly, a systematic nomenclature is essential for future annotation and reference purposes. A functional subdivision of the SDR superfamily into at least 200 SDR families based upon hidden Markov models forms a suitable foundation for such a nomenclature system, which we present in this paper using human SDRs as examples.
Subject(s)
Oxidoreductases Acting on CH-CH Group Donors , Terminology as Topic , Internet , Markov Chains
ABSTRACT
Diseases associated with protein fibril-formation, such as the prion diseases and Alzheimer's disease, are gaining increased attention due to their medical importance and complex origins. Using molecular dynamics (MD) simulations in an aqueous environment, we have studied the stability of the alpha-helix covering positions 15-25 of the amyloid beta-peptide (A beta) involved in Alzheimer's disease. The effects of residue replacements, including the effects of A beta disease related mutations, were also investigated. The MD simulations show a very early (2 ns) loss of alpha-helical structure for the Flemish (A beta(A21G)), Italian (A beta(E22K)), and Iowa (A beta(D23N)) forms associated with hereditary Alzheimer's disease. Similarly, an early (5 ns) loss of alpha-helical structure was observed for the Dutch (A beta(E22Q)) variant. MD here provides a possible explanation for the structural changes. Two variants of A beta, A beta(K16A,L17A,F20A) and A beta(V18A,F19A,F20A), that do not produce fibrils in vitro were also investigated. The A beta(V18A,F19A,F20A) initially loses its helical conformation but refolds into helix several times and spends most of the simulation time in helical conformation. However, the A beta(K16A,L17A,F20A) loses the alpha-helical structure after 5 ns and does not refold. For the wildtype A beta(1-40) and A beta(1-42), the helical conformation is lost after 5 ns or after 40 ns, respectively, while for the "familial" (A beta(A42T)) variant, the MD simulations suggest that a C-terminal beta-strand is stabilised, which could explain the fibrillation. The simulations for the Arctic (A beta(E22G)) variant indicate that the alpha-helix is kept for 2 ns, but reappears 2 ns later, whereafter it disappears after 10 ns. The MD results are in several cases compatible with known experimental data, but the correlation is not perfect, indicating that multimerisation tendency and other factors might also be important for fibril formation.
Subject(s)
Peptides/chemistry , Nuclear Magnetic Resonance, Biomolecular , Protein Structure, Secondary
ABSTRACT
Short-chain dehydrogenases/reductases (SDRs) are enzymes of great functional diversity. Even at sequence identities of typically only 15-30%, specific sequence motifs are detectable, reflecting common folding patterns. We have developed a functional assignment scheme based on these motifs and we find five families. Two of these families were known previously and are called 'classical' and 'extended' families, but they are now distinguished at a further level based on coenzyme specificities. This analysis gives seven subfamilies of classical SDRs and three subfamilies of extended SDRs. We find that NADP(H) is the preferred coenzyme among most classical SDRs, while NAD(H) is that preferred among most extended SDRs. Three families are novel entities, denoted 'intermediate', 'divergent' and 'complex', encompassing short-chain alcohol dehydrogenases, enoyl reductases and multifunctional enzymes, respectively. The assignment scheme was applied to the genomes of human, mouse, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana and Saccharomyces cerevisiae. In the animal genomes, the extended SDRs amount to around one quarter or less of the total number of SDRs, while in the A. thaliana and S. cerevisiae genomes, the extended members constitute about 40% of the SDR forms. The numbers of NAD(H)-dependent and NADP(H)-dependent SDRs are similar in human, mouse and plant, while the proportions of NAD(H)-dependent enzymes are much lower in fruit fly, worm and yeast. We show that, in spite of the great diversity of the SDR superfamily, the primary structure alone can be used for functional assignments and for predictions of coenzyme preference.