Results 1 - 20 of 21
1.
Nucleic Acids Res ; 52(D1): D67-D71, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37971299

ABSTRACT

The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) provides database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), DDBJ accepts and distributes nucleotide sequence data as well as their study and sample information along with the National Center for Biotechnology Information in the United States and the European Bioinformatics Institute (EBI). Besides INSDC databases, the DDBJ Center provides databases for functional genomics (GEA: Genomic Expression Archive), metabolomics (MetaboBank) and human genetic and phenotypic data (JGA: Japanese Genotype-phenotype Archive). These database systems have been built on the National Institute of Genetics (NIG) supercomputer, which is also open for domestic life science researchers to analyze large-scale sequence data. This paper reports recent updates on the archival databases and the services of the DDBJ Center, highlighting the newly redesigned MetaboBank. MetaboBank uses BioProject and BioSample in its metadata description, making it suitable for large multi-omics studies. Its collaboration with MetaboLights at EBI brings synergy in locating and reusing public data.


Subject(s)
Databases, Nucleic Acid , Metabolomics , Metadata , Humans , Computational Biology , Genomics , Internet , Japan , Multiomics/methods
2.
F1000Res ; 9: 136, 2020.
Article in English | MEDLINE | ID: mdl-32308977

ABSTRACT

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.


Subject(s)
Biological Science Disciplines , Computational Biology , Semantic Web , Data Mining , Metadata , Reproducibility of Results
3.
Sci Rep ; 7: 43368, 2017 Mar 06.
Article in English | MEDLINE | ID: mdl-28262809

ABSTRACT

Although host-plant selection is a central topic in ecology, its general underpinnings are poorly understood. Here, we performed a case study focusing on the publicly available data on Japanese butterflies. A combined statistical analysis of plant-herbivore relationships and taxonomy revealed that some butterfly subfamilies in different families feed on the same plant families, and that this phenomenon occurs more often than expected by chance, thus indicating the independent acquisition of adaptive phenotypes to the same hosts. We consequently integrated plant-herbivore and plant-compound relationship data and conducted a statistical analysis to identify compounds unique to host plants of specific butterfly families. Some of the identified plant compounds are known to attract certain butterfly groups while repelling others. The additional incorporation of insect-compound relationship data revealed potential metabolic processes that are related to host plant selection. Our results demonstrate that data integration enables the computational detection of compounds putatively involved in particular interspecies interactions and that further data enrichment and integration of genomic and transcriptomic data facilitates the unveiling of the molecular mechanisms involved in host plant selection.
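The abstract above does not say which statistical test was used to decide that shared host-plant families occur more often than chance. As a minimal sketch of one common choice, the code below computes a hypergeometric tail probability. The function name and the mapping of its arguments to plant families are assumptions made for illustration only.

```python
from math import comb


def hypergeom_tail(population: int, successes: int, draws: int, observed: int) -> float:
    """Return P(X >= observed) for draws made without replacement.

    Illustrative mapping (an assumption, not taken from the paper):
      population = number of plant families in the dataset
      successes  = plant families used by butterfly subfamily A
      draws      = plant families used by butterfly subfamily B
      observed   = plant families used by both subfamilies
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total
```

A small tail probability would suggest that the overlap between the two host ranges is unlikely to arise by chance alone, under the assumption that host families are drawn uniformly.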


Subject(s)
Butterflies/physiology , Computational Biology/methods , Feeding Behavior , Plants/parasitology , Animals , Chemotactic Factors/analysis , Insect Repellents/analysis , Phytochemicals/analysis , Plants/chemistry
4.
J Chem Inf Model ; 56(3): 510-6, 2016 Mar 28.
Article in English | MEDLINE | ID: mdl-26822930

ABSTRACT

Although there are several databases that contain data on many metabolites and reactions in biochemical pathways, there is still a large gap between the numbers of experimentally identified enzymes and metabolites. It is supposed that many catalytic enzyme genes are still unknown. Although there are previous studies that estimate the number of candidate enzyme genes, these studies required some additional information aside from the structures of metabolites such as gene expression and order in the genome. In this study, we developed a novel method to identify a candidate enzyme gene of a reaction using the chemical structures of the substrate-product pair (reactant pair). The proposed method is based on a search for similar reactant pairs in a reference database and offers ortholog groups that possibly mediate the given reaction. We applied the proposed method to two experimentally validated reactions. As a result, we confirmed that the histidine transaminase was correctly identified. Although our method could not directly identify the asparagine oxo-acid transaminase, we successfully found the paralog gene most similar to the correct enzyme gene. We also applied our method to infer candidate enzyme genes in the mesaconate pathway. The advantage of our method lies in the prediction of possible genes for orphan enzyme reactions for which no associated gene sequences have been determined yet. We believe that this approach will facilitate experimental identification of genes for orphan enzymes.
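The search for similar reactant pairs described above can be sketched as follows. The abstract does not state the similarity measure, so this example swaps in the Tanimoto coefficient over sets of substructure labels. The feature labels, the ortholog group names and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Features = FrozenSet[str]
ReactantPair = Tuple[Features, Features]  # (substrate features, product features)


def tanimoto(a: Features, b: Features) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pair_similarity(query: ReactantPair, ref: ReactantPair) -> float:
    """Score a pair by its weaker side, so substrate and product must both match."""
    return min(tanimoto(query[0], ref[0]), tanimoto(query[1], ref[1]))


def rank_ortholog_groups(
    query: ReactantPair, reference: List[Tuple[ReactantPair, str]]
) -> List[Tuple[str, float]]:
    """Return (ortholog group, best score) tuples, highest score first."""
    best: Dict[str, float] = {}
    for pair, group in reference:
        score = pair_similarity(query, pair)
        if score > best.get(group, -1.0):
            best[group] = score
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical reference library: reactant pairs annotated with ortholog groups.
reference = [
    ((frozenset({"amine", "imidazole"}), frozenset({"ketone", "imidazole"})), "OG_A"),
    ((frozenset({"hydroxyl", "phenyl"}), frozenset({"ketone", "phenyl"})), "OG_B"),
]
query = (frozenset({"amine", "imidazole"}), frozenset({"ketone", "imidazole"}))
ranked = rank_ortholog_groups(query, reference)
```

Taking the minimum of the two side scores is a design choice of this sketch: it prevents a reference pair from ranking highly when only the substrate or only the product resembles the query.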


Subject(s)
Enzymes/genetics , Databases, Protein , Enzymes/metabolism , Substrate Specificity
5.
Plant Physiol ; 168(1): 47-59, 2015 May.
Article in English | MEDLINE | ID: mdl-25761715

ABSTRACT

Grape (Vitis vinifera) accumulates various polyphenolic compounds, which protect against environmental stresses, including ultraviolet-C (UV-C) light and pathogens. In this study, we looked at the transcriptome and metabolome in grape berry skin after UV-C irradiation, which demonstrated the effectiveness of omics approaches to clarify important traits of grape. We performed transcriptome analysis using a genome-wide microarray, which revealed 238 genes up-regulated more than 5-fold by UV-C light. Enrichment analysis of Gene Ontology terms showed that genes encoding stilbene synthase, a key enzyme for resveratrol synthesis, were enriched in the up-regulated genes. We performed metabolome analysis using liquid chromatography-quadrupole time-of-flight mass spectrometry, and 2,012 metabolite peaks, including unidentified peaks, were detected. Principal component analysis using the peaks showed that only one metabolite peak, identified as resveratrol, was highly induced by UV-C light. We updated the metabolic pathway map of grape in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and in the KaPPA-View 4 KEGG system, then projected the transcriptome and metabolome data on a metabolic pathway map. The map showed specific induction of the resveratrol synthetic pathway by UV-C light. Our results showed that multiomics is a powerful tool to elucidate the accumulation mechanisms of secondary metabolites, and updated systems, such as KEGG and KaPPA-View 4 KEGG for grape, can support such studies.
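The selection of genes up-regulated more than 5-fold can be illustrated with a short filter. This is only a sketch under simple assumptions: expression values are plain linear-scale intensities, and the gene names and numbers are invented for the example, not taken from the study.

```python
from typing import Dict


def upregulated_genes(
    control: Dict[str, float], treated: Dict[str, float], min_fold: float = 5.0
) -> Dict[str, float]:
    """Return genes whose treated/control ratio exceeds min_fold."""
    hits: Dict[str, float] = {}
    for gene, base in control.items():
        # Skip genes missing from the treated sample or with no baseline signal.
        if gene in treated and base > 0:
            fold = treated[gene] / base
            if fold > min_fold:
                hits[gene] = fold
    return hits


# Hypothetical intensities before and after UV-C irradiation.
control = {"geneA": 10.0, "geneB": 20.0, "geneC": 0.0}
treated = {"geneA": 80.0, "geneB": 40.0, "geneC": 5.0}
hits = upregulated_genes(control, treated)
```

Real microarray analyses normally add normalization and replicate handling before a threshold like this is applied.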


Subject(s)
Biosynthetic Pathways , Fruit/genetics , Gene Expression Profiling , Metabolomics , Stilbenes/metabolism , Ultraviolet Rays , Vitis/genetics , Biosynthetic Pathways/radiation effects , Calibration , Darkness , Fluorescence , Fruit/metabolism , Fruit/radiation effects , Gene Ontology , Genes, Plant , Metabolome/genetics , Metabolome/radiation effects , Molecular Sequence Annotation , Principal Component Analysis , Secondary Metabolism/genetics , Secondary Metabolism/radiation effects , Vitis/metabolism , Vitis/radiation effects
6.
J Bioinform Comput Biol ; 12(6): 1442001, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25385078

ABSTRACT

Genomics is faced with the issue of many partially annotated putative enzyme-encoding genes for which activities have not yet been verified, while metabolomics is faced with the issue of many putative enzyme reactions for which full equations have not been verified. Knowledge of enzymes has been collected by IUBMB, and has been made public as the Enzyme List. To date, however, the terminology of the Enzyme List has not been assessed comprehensively by bioinformatics studies. Instead, most of the bioinformatics studies simply use the identifiers of the enzymes, i.e. the Enzyme Commission (EC) numbers. We investigated the actual usage of terminology throughout the Enzyme List, and demonstrated that the partial characteristics of reactions cannot be retrieved by simply using EC numbers. Thus, we developed a novel ontology, named PIERO, for annotating biochemical transformations as follows. First, the terminology describing enzymatic reactions was retrieved from the Enzyme List, and was grouped into those related to overall reactions and biochemical transformations. Consequently, these terms were mapped onto the actual transformations taken from enzymatic reaction equations. This ontology was linked to Gene Ontology (GO) and EC numbers, allowing the extraction of common partial reaction characteristics from given sets of orthologous genes and the elucidation of possible enzymes from the given transformations. Further future development of the PIERO ontology should enhance the Enzyme List to promote the integration of genomics and metabolomics.


Subject(s)
Biological Ontologies , Databases, Protein , Enzymes/chemistry , Enzymes/classification , Information Storage and Retrieval/methods , Terminology as Topic , Enzymes/genetics , Natural Language Processing
7.
Bioinformatics ; 30(12): i165-74, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931980

ABSTRACT

MOTIVATION: Metabolic pathway analysis is crucial not only in metabolic engineering but also in rational drug design. However, the biosynthetic/biodegradation pathways are known only for a small portion of metabolites, and a vast number of pathways remain uncharacterized. Therefore, an important challenge in metabolomics is the de novo reconstruction of potential reaction networks on a metabolome-scale. RESULTS: In this article, we develop a novel method to predict the multistep reaction sequences for de novo reconstruction of metabolic pathways in the reaction-filling framework. We propose a supervised approach to learn what we refer to as 'multistep reaction sequence likeness', i.e. whether a compound-compound pair is possibly converted to each other by a sequence of enzymatic reactions. In the algorithm, we propose a recursive procedure of using step-specific classifiers to predict the intermediate compounds in the multistep reaction sequences, based on chemical substructure fingerprints/descriptors of compounds. We further demonstrate the usefulness of our proposed method on the prediction of enzymatic reaction networks from a metabolome-scale compound set and discuss characteristic features of the extracted chemical substructure transformation patterns in multistep reaction sequences. Our comprehensively predicted reaction networks help to fill the metabolic gap and to infer new reaction sequences in metabolic pathways. AVAILABILITY AND IMPLEMENTATION: Materials are available for free at http://web.kuicr.kyoto-u.ac.jp/supp/kot/ismb2014/


Subject(s)
Metabolic Networks and Pathways , Metabolome , Metabolomics/methods , Algorithms , Support Vector Machine
8.
Bioinformatics ; 29(13): i135-44, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812977

ABSTRACT

MOTIVATION: The metabolic pathway is an important biochemical reaction network involving enzymatic reactions among chemical compounds. However, it is assumed that a large number of metabolic pathways remain unknown, and many reactions are still missing even in known pathways. Therefore, the most important challenge in metabolomics is the automated de novo reconstruction of metabolic pathways, which includes the elucidation of previously unknown reactions to bridge the metabolic gaps. RESULTS: In this article, we develop a novel method to reconstruct metabolic pathways from a large compound set in the reaction-filling framework. We define feature vectors representing the chemical transformation patterns of compound-compound pairs in enzymatic reactions using chemical fingerprints. We apply a sparsity-induced classifier to learn what we refer to as 'enzymatic-reaction likeness', i.e. whether compound pairs are possibly converted to each other by enzymatic reactions. The originality of our method lies in the search for potential reactions among many compounds at a time, in the extraction of reaction-related chemical transformation patterns and in the large-scale applicability owing to the computational efficiency. In the results, we demonstrate the usefulness of our proposed method on the de novo reconstruction of 134 metabolic pathways in Kyoto Encyclopedia of Genes and Genomes (KEGG). Our comprehensively predicted reaction networks of 15 698 compounds enable us to suggest many potential pathways and to increase research productivity in metabolomics. AVAILABILITY: Software is available on request. Supplementary materials are available at http://web.kuicr.kyoto-u.ac.jp/supp/kot/ismb2013/.
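The abstract describes feature vectors built from chemical fingerprints for compound-compound pairs, but it does not give the exact encoding. The sketch below shows one possible order-independent encoding, assumed here purely for illustration: the bits shared by both compounds, followed by the bits present in only one of them. A vector of this kind could then be passed to a sparse linear classifier.

```python
from typing import List, Sequence


def pair_features(fp_a: Sequence[int], fp_b: Sequence[int]) -> List[int]:
    """Encode a compound pair as [shared bits] + [differing bits].

    Both inputs are binary fingerprints (lists of 0/1) of equal length.
    The result does not depend on the order of the two compounds.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    shared = [a & b for a, b in zip(fp_a, fp_b)]
    differing = [a ^ b for a, b in zip(fp_a, fp_b)]
    return shared + differing


# Two toy 3-bit fingerprints (hypothetical substructure keys).
features = pair_features([1, 1, 0], [1, 0, 1])
```

The symmetric encoding matches the abstract's phrasing that pairs are "converted to each other". A direction-aware variant would separate lost bits from gained bits instead.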


Subject(s)
Metabolic Networks and Pathways , Metabolomics/methods , Algorithms , Enzymes/metabolism , Linear Models , Metabolome , Support Vector Machine
9.
J Chem Inf Model ; 53(3): 613-22, 2013 Mar 25.
Article in English | MEDLINE | ID: mdl-23384306

ABSTRACT

The metabolic network is both a network of chemical reactions and a network of enzymes that catalyze reactions. Toward better understanding of this duality in the evolution of the metabolic network, we developed a method to extract conserved sequences of reactions called reaction modules from the analysis of chemical compound structure transformation patterns in all known metabolic pathways stored in the KEGG PATHWAY database. The extracted reaction modules are repeatedly used as if they are building blocks of the metabolic network and contain chemical logic of organic reactions. Furthermore, the reaction modules often correspond to traditional pathway modules defined as sets of enzymes in the KEGG MODULE database and sometimes to operon-like gene clusters in prokaryotic genomes. We identified well-conserved, possibly ancient, reaction modules involving 2-oxocarboxylic acids. The chain extension module that appears as the tricarboxylic acid (TCA) reaction sequence in the TCA cycle is now shown to be used in other pathways together with different types of modification modules. We also identified reaction modules and their connection patterns for aromatic ring cleavages in microbial biodegradation pathways, which are most characteristic in terms of both distinct reaction sequences and distinct gene clusters. The modular architecture of biodegradation modules will have a potential for predicting degradation pathways of xenobiotic compounds. The collection of these and many other reaction modules is made available as part of the KEGG database.


Subject(s)
Conserved Sequence , Metabolic Networks and Pathways/genetics , Biotransformation , Citric Acid Cycle/genetics , Databases, Genetic , Enzymes/chemistry , Fatty Acids/chemical synthesis , Multigene Family , Oxidation-Reduction
10.
Nucleic Acids Res ; 41(Database issue): D353-7, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193276

ABSTRACT

The identification of orthologous genes in an increasing number of fully sequenced genomes is a challenging issue in recent genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1 176 030 OCs, obtained by clustering 8 357 175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and 129 archaea). The OCs were constructed by applying the quasi-clique-based clustering method to all possible protein coding genes in all complete genomes, based on their amino acid sequence similarities. It is computationally efficient to calculate OCs, which enables the contents to be updated regularly. KEGG OC has the following two features: (i) It consists of all complete genomes of a wide variety of organisms from three domains of life, and the number of organisms is the largest among the existing databases; and (ii) It is compatible with the KEGG database by sharing the same sets of genes and identifiers, which leads to seamless integration of OCs with useful components in KEGG such as biological pathways, pathway modules, functional hierarchy, diseases and drugs. The KEGG OC resources are accessible via OC Viewer that provides an interactive visualization of OCs at different taxonomic levels.
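To make the term "quasi-clique" concrete, the sketch below tests whether a candidate gene set is a quasi-clique under a density-based definition: the fraction of gene pairs linked by a similarity edge must reach a threshold gamma. The abstract does not specify which definition or threshold KEGG OC uses, so both are assumptions here, and the gene identifiers are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def is_quasi_clique(
    members: Iterable[str], edges: Set[FrozenSet[str]], gamma: float
) -> bool:
    """True if at least a fraction gamma of member pairs are connected."""
    nodes = sorted(set(members))
    if len(nodes) < 2:
        return True  # a single gene is trivially a cluster
    pairs = list(combinations(nodes, 2))
    linked = sum(1 for u, v in pairs if frozenset((u, v)) in edges)
    return linked / len(pairs) >= gamma


# Hypothetical similarity graph: an edge means two genes are similar enough.
edges = {
    frozenset(pair)
    for pair in [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4"), ("g2", "g4")]
}
cluster = ["g1", "g2", "g3", "g4"]
```

With five of the six possible edges present, this toy cluster passes at gamma = 0.8 but fails at gamma = 0.9. A true clique would require all six.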


Subject(s)
Databases, Genetic , Genes, Archaeal , Genes, Bacterial , Genes , Algorithms , Classification/methods , Cluster Analysis , Eukaryota/genetics , Genome, Archaeal , Genome, Bacterial , Genomics/methods , Internet , Sequence Homology, Amino Acid
11.
BMC Syst Biol ; 7 Suppl 6: S2, 2013.
Article in English | MEDLINE | ID: mdl-24564846

ABSTRACT

BACKGROUND: In order to develop hypotheses on unknown metabolic pathways, biochemists frequently rely on literature that uses a free-text format to describe functional groups or substructures. In computational chemistry or cheminformatics, molecules are typically represented by chemical descriptors, i.e., vectors that summarize information on their various properties. However, it is difficult to interpret these chemical descriptors since they are not directly linked to the terminology of functional groups or substructures that the biochemists use. METHODS: In this study, we used KEGG Chemical Function (KCF) format to computationally describe biochemical substructures in seven attributes that resemble biochemists' way of dealing with substructures. RESULTS: We established KCF-S (KCF-and-Substructures) format as an additional layer of structural information for KCF. Applying KCF-S to various datasets of molecules revealed dataset-specific substructures that characterize the respective datasets. Structure-based clustering of molecules using KCF-S resulted in clusters in which molecular weights and structures were less diverse than those obtained by conventional chemical fingerprints. We further applied KCF-S to find the pairs of molecules that are possibly converted to each other in enzymatic reactions, and KCF-S clearly improved predictive performance over that reported previously. CONCLUSIONS: KCF-S defines biochemical substructures while keeping interpretability, suggesting its potential for further studies in chemical bioinformatics. Molfile format can be converted automatically to KCF and KCF-S, making it possible to handle molecules from any data source.


Subject(s)
Computational Biology/methods , Cluster Analysis , Databases, Chemical , Enzymes/metabolism , Metabolic Networks and Pathways , Reproducibility of Results , Structure-Activity Relationship
12.
Methods Mol Biol ; 802: 19-39, 2012.
Article in English | MEDLINE | ID: mdl-22130871

ABSTRACT

In this chapter, we demonstrate the usability of the KEGG (Kyoto Encyclopedia of Genes and Genomes) databases and tools, especially focusing on the visualization of omics data. The desktop application KegArray and many Web-based tools are tightly integrated with the KEGG knowledgebase, which helps visualize and interpret large amounts of data derived from high-throughput measurement techniques including microarray, metagenome, and metabolome analyses. Recently developed resources for human disease, drug, and plant research are also mentioned.


Subject(s)
Databases, Genetic , Genomics , Software , Data Mining , Disease/genetics , Humans , Internet , Metabolic Networks and Pathways , Metabolome , Pharmaceutical Preparations/chemistry
13.
BMC Bioinformatics ; 12 Suppl 14: S1, 2011 Dec 14.
Article in English | MEDLINE | ID: mdl-22373367

ABSTRACT

BACKGROUND: In contrast to the increasing number of successful genome projects, there still remain many orphan metabolites whose synthesis processes are unknown. Metabolites, including these orphan metabolites, can be classified into groups that share the same core substructures, originating from the same biosynthetic pathways. It is known that many metabolites are synthesized by adding building blocks to existing metabolites. Therefore, it is proposed that, for any given group of metabolites, finding the core substructure and the branched substructures can help predict their biosynthetic pathway. There have already been many reports on multiple graph alignment techniques to find the conserved chemical substructures in relatively small molecules. However, they are optimized for ligand binding and are not suitable for metabolomic studies. RESULTS: We developed an efficient multiple graph alignment method named MUCHA (Multiple Chemical Alignment), specialized for finding metabolic building blocks. This method showed strength in finding metabolic building blocks while preserving the relative positions among the substructures, which is not achieved by simply applying frequent graph mining techniques. Compared with combined pairwise alignments, the proposed MUCHA method generally reduced computational costs while improving the quality of the alignment. CONCLUSIONS: MUCHA successfully finds building blocks of secondary metabolites, and has the potential to complement other existing methods to reconstruct metabolic networks using reaction patterns.


Subject(s)
Chemistry/methods , Metabolic Networks and Pathways , Algorithms
14.
Nucleic Acids Res ; 39(Database issue): D677-84, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097783

ABSTRACT

Correlations of gene-to-gene co-expression and metabolite-to-metabolite co-accumulation calculated from large amounts of transcriptome and metabolome data are useful for uncovering unknown functions of genes, functional diversities of gene family members and regulatory mechanisms of metabolic pathway flows. Many databases and tools are available to interpret quantitative transcriptome and metabolome data, but only a few connect correlation data to biological knowledge and can be used to find its biological significance. We report here a new metabolic pathway database, KaPPA-View4 (http://kpv.kazusa.or.jp/kpv4/), which is able to overlay gene-to-gene and/or metabolite-to-metabolite relationships as curves on a metabolic pathway map, or on a combination of up to four maps. This representation would help to discover, for example, novel functions of a transcription factor that regulates genes on a metabolic pathway. Pathway maps of the Kyoto Encyclopedia of Genes and Genomes (KEGG) and maps generated from their gene classifications are available at KaPPA-View4 KEGG version (http://kpv.kazusa.or.jp/kpv4-kegg/). At present, gene co-expression data from the databases ATTED-II, COXPRESdb, CoP and MiBASE for human, mouse, rat, Arabidopsis, rice, tomato and other plants are available.
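Gene-to-gene co-expression relationships of the kind overlaid on pathway maps are typically derived from correlation coefficients between expression profiles. The sketch below assumes Pearson correlation and a simple cutoff. The abstract does not name the measure used by the source databases, and the gene names, profiles and threshold are illustrative.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(
    profiles: Dict[str, Sequence[float]], min_r: float = 0.9
) -> List[Tuple[str, str, float]]:
    """List gene pairs whose correlation is at least min_r."""
    pairs = []
    for (gene_a, xa), (gene_b, xb) in combinations(sorted(profiles.items()), 2):
        r = pearson(xa, xb)
        if r >= min_r:
            pairs.append((gene_a, gene_b, r))
    return pairs


# Hypothetical expression profiles across three samples.
profiles = {"geneA": [1.0, 2.0, 3.0], "geneB": [2.0, 4.0, 6.0], "geneC": [3.0, 2.0, 1.0]}
pairs = coexpressed_pairs(profiles)
```

Each returned pair corresponds to one curve that a viewer could draw between two genes on a pathway map.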


Subject(s)
Databases, Genetic , Gene Expression Profiling , Gene Regulatory Networks , Metabolic Networks and Pathways/genetics , Metabolome/genetics , Animals , Humans , Internet , Mice , Rats
15.
Nucleic Acids Res ; 38(Web Server issue): W138-43, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20435670

ABSTRACT

The KEGG RPAIR database is a collection of biochemical structure transformation patterns, called RDM patterns, and chemical structure alignments of substrate-product pairs (reactant pairs) in all known enzyme-catalyzed reactions taken from the Enzyme Nomenclature and the KEGG PATHWAY database. Here, we present PathPred (http://www.genome.jp/tools/pathpred/), a web-based server to predict plausible pathways of multi-step reactions starting from a query compound, based on the local RDM pattern match and the global chemical structure alignment against the reactant pair library. In this server, we focus on predicting pathways for microbial biodegradation of environmental compounds and biosynthesis of plant secondary metabolites, which correspond to characteristic RDM patterns in 947 and 1397 reactant pairs, respectively. The server provides transformed compounds and reference transformation patterns in each predicted reaction, and displays all predicted multi-step reaction pathways in a tree-shaped graph.
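The tree of multi-step reactions described above can be pictured with a small enumeration routine. This sketch is not PathPred's algorithm: the RDM pattern match and structure alignment are replaced by a hypothetical lookup table that maps each compound to the products of its predicted one-step transformations.

```python
from typing import Dict, List


def enumerate_pathways(
    start: str, rules: Dict[str, List[str]], max_steps: int
) -> List[List[str]]:
    """Depth-first enumeration of reaction paths of at most max_steps steps.

    Each path is a list of compounds beginning with `start`. A path ends
    when no further transformation applies or the step limit is reached.
    """
    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        # Exclude compounds already on the path to avoid cycles.
        products = [p for p in rules.get(path[-1], []) if p not in path]
        if not products or len(path) - 1 == max_steps:
            if len(path) > 1:
                paths.append(path)
            return
        for product in products:
            extend(path + [product])

    extend([start])
    return paths


# Hypothetical one-step predictions; in a real system these would come
# from matching transformation patterns against a reactant pair library.
rules = {"A": ["B", "C"], "B": ["D"], "C": []}
tree_paths = enumerate_pathways("A", rules, max_steps=2)
```

Every root-to-leaf path in the returned list corresponds to one branch of the tree-shaped graph.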


Subject(s)
Enzymes/metabolism , Metabolic Networks and Pathways , Software , Biocatalysis , Biosynthetic Pathways , Environmental Pollutants/metabolism , Internet
16.
Genome Inform ; 24: 104-15, 2010.
Article in English | MEDLINE | ID: mdl-22081593

ABSTRACT

Many cofactors and nucleotides containing sulfur atoms are known to have important functions in a variety of organisms. Recently, the biosynthetic pathways of these sulfur containing compounds have been revealed, where many enzymes relay sulfur atoms. Increasing evidence also suggests that the prokaryotic sulfur-relay enzymes might be the evolutionary origin of ubiquitination and the related systems that control a wide range of physiological processes in eukaryotic cells. However, these sulfur-relay enzymes have been studied in only a small number of organisms. Here we carried out comparative genomic analysis and examined the presence and absence of sulfurtransferases utilized in the biosynthetic pathways of molybdenum cofactor (Moco), 2-thiouridine (S(2)U), and 4-thiouridine (S(4)U), and IscS, a cysteine desulfurase. We found that all eukaryotes and many other organisms lack the intermediate enzymes in S(2)U biosynthesis. It is also found that most genes lack rhodanese homology domain (RHD), a catalytic domain of sulfurtransferase. Some organisms have a conserved sequence composed of about 100 residues in the C terminus of TusA, different from RHD. Host-associated organisms have a tendency to lose Moco biosynthetic enzymes, and some organisms have MoaD-MoaE fusion protein. Our findings suggest that sulfur-relay pathways have been so diversified that some putative sulfurtransferases possibly function in other unknown pathways.


Subject(s)
Gene Expression Regulation , Sulfur/metabolism , Sulfurtransferases/metabolism , Algorithms , Animals , Bacterial Proteins/metabolism , Cluster Analysis , Computational Biology/methods , Escherichia coli/genetics , Fungal Proteins/metabolism , Gene Expression Profiling , Genomics , Humans , Protein Structure, Tertiary , Sequence Alignment , Software , Ubiquitin/metabolism
17.
Genome Inform ; 24: 127-38, 2010.
Article in English | MEDLINE | ID: mdl-22081595

ABSTRACT

UGTs (UDP glycosyltransferases) are the largest glycosyltransferase gene family in higher plants, modifying secondary metabolites, hormones, and xenobiotics. This gene family plays an important role in the vast diversity of plant secondary metabolites specific to species. Experimental data on the biochemical activities and physiological roles of plant UGTs are increasing, but most UGTs are still not functionally characterized. To understand their catalytic specificity and function from sequence data, phylogenetic analyses have been carried out mainly in Arabidopsis, but a large-scale, comprehensive approach covering various species has not yet been applied. In this study, we collected 733 UGT sequences derived from 96 plant species and 252 substrate specificity data. We constructed a phylogenetic tree and divided most of these genes into nine sequence groups, which are characterized by biochemical specificity. Furthermore, we performed genome-wide analysis of UGTs from seven plant species by mapping them into these groups. We propose this as the first step toward understanding the full set of glycosylated secondary metabolites of each plant species from its genome information.


Subject(s)
Computational Biology/methods , Glucuronosyltransferase/genetics , Plant Proteins/genetics , Algorithms , Arabidopsis/enzymology , Arabidopsis/genetics , Catalysis , Genes, Plant , Glycosylation , Multigene Family , Phylogeny , Plants/genetics , Protein Binding , Software , Substrate Specificity
18.
Carbohydr Res ; 344(7): 881-7, 2009 May 12.
Article in English | MEDLINE | ID: mdl-19327755

ABSTRACT

Glycosyltransferases comprise highly divergent groups of enzymes, which play a central role in the synthesis of complex glycans. Because the repertoire of glycosyltransferases in the genome determines the range of synthesizable glycans, and because the increasing amount of genome sequence data is now available, it is essential to examine these enzymes across organisms to explore possible structures and functions of the glycoconjugates. In this study, we systematically investigated 36 eukaryotic genomes and obtained 3426 glycosyltransferase homologs for biosynthesis of major glycans, classified into 53 families based on sequence similarity. The families were further grouped into six functional categories based on the biosynthetic pathways, which revealed characteristic patterns among organism groups in the degree of conservation and in the number of paralogs. The results also revealed a strong correlation between the number of glycosyltransferases and the number of coding genes in each genome. We then predicted the ability to synthesize major glycan structures including N-glycan precursors and GPI-anchors in each organism from the combination of the glycosyltransferase families. This indicates that not only parasitic protists but also some algae are likely to synthesize smaller structures than the structures known to be conserved among a wide range of eukaryotes. Finally we discuss the functions of two large families, sialyltransferases and beta 4-glycosyltransferases, by performing finer classifications into subfamilies. Our findings suggest that universality and diversity of glycans originate from two types of evolution of glycosyltransferase families, namely conserved families with few paralogs and diverged families with many paralogs.


Subject(s)
Eukaryotic Cells/enzymology , Eukaryotic Cells/metabolism , Genome/genetics , Glycosyltransferases/classification , Glycosyltransferases/genetics , Polysaccharides/biosynthesis , Polysaccharides/chemistry , Animals , Glycosyltransferases/metabolism , Humans , Models, Molecular , Sialyltransferases/classification , Sialyltransferases/genetics , Sialyltransferases/metabolism
19.
Nucleic Acids Res ; 36(Database issue): D480-4, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18077471

ABSTRACT

KEGG (http://www.genome.jp/kegg/) is a database of biological systems that integrates genomic, chemical and systemic functional information. KEGG provides a reference knowledge base for linking genomes to life through the process of PATHWAY mapping, which is to map, for example, a genomic or transcriptomic content of genes to KEGG reference pathways to infer systemic behaviors of the cell or the organism. In addition, KEGG provides a reference knowledge base for linking genomes to the environment, such as for the analysis of drug-target relationships, through the process of BRITE mapping. KEGG BRITE is an ontology database representing functional hierarchies of various biological objects, including molecules, cells, organisms, diseases and drugs, as well as relationships among them. KEGG PATHWAY is now supplemented with a new global map of metabolic pathways, which is essentially a combined map of about 120 existing pathway maps. In addition, smaller pathway modules are defined and stored in KEGG MODULE that also contains other functional units and complexes. The KEGG resource is being expanded to suit the needs for practical applications. KEGG DRUG contains all approved drugs in the US and Japan, and KEGG DISEASE is a new database linking disease genes, pathways, drugs and diagnostic markers.


Subject(s)
Databases, Factual , Genomics , Systems Biology , Disease , Humans , Internet , Metabolic Networks and Pathways , Molecular Structure , Pharmaceutical Preparations/chemistry , Systems Integration , User-Computer Interface
20.
Genome Inform ; 19: 3-14, 2007.
Article in English | MEDLINE | ID: mdl-18546500

ABSTRACT

Almost half of biological molecules (proteins and metabolites) are estimated to be glycosylated within cells. Detection of glycosylation patterns and of attached sugar types is therefore an important step in future glycomics research. We present two algorithms to detect sugar types in Haworth projection, i.e., from x-y coordinates. The algorithms were applied to a database of flavonoids and identified backbone-specific biases of sugar types and their conjugated positions. The algorithms contribute not only to bridging polysaccharide databases and pathway databases, but also to detecting structural errors in metabolic databases.


Subject(s)
Computational Biology/methods , Monosaccharides/chemistry , Algorithms , Carbohydrates/chemistry , Glycosylation , Models, Chemical , Molecular Conformation , Plants , Polysaccharides/chemistry , Programming Languages , Stereoisomerism