Results 1 - 20 of 157
1.
Protein Sci ; 32(12): e4820, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37881892

ABSTRACT

The KEGG database and analysis tools (https://www.kegg.jp) have been developed mostly for understanding genes and genomes of cellular organisms. The KO (KEGG Orthology) dataset, which is a collection of functional orthologs, plays the role of linking genes in the genome to pathways and other molecular networks, enabling KEGG mapping to uncover hidden features in the genome. Although viruses were part of KEGG for some time, they were not fully integrated in the KEGG analysis tools, because the KO assignment rate is very low for virus genes. To supplement KOs a new dataset named virus ortholog clusters (VOCs) is computationally generated, covering 90% of viral proteins in KEGG. VOCs can be used, in place of KOs, for taxonomy mapping to uncover relationships of sequence similarity groups and taxonomic groups and for identifying conserved gene orders in virus genomes. Furthermore, selected VOCs are used to define tentative KOs for characterizing protein functions. Here an overview of KEGG tools is presented focusing on these extensions for viral protein analysis.


Subjects
Viral Proteins; Viruses; Viral Proteins/genetics; Genome; Databases, Factual; Viruses/genetics
2.
Nucleic Acids Res ; 51(D1): D587-D592, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36300620

ABSTRACT

KEGG (https://www.kegg.jp) is a manually curated database resource integrating various biological objects categorized into systems, genomic, chemical and health information. Each object (database entry) is identified by the KEGG identifier (kid), which generally takes the form of a prefix followed by a five-digit number, and can be retrieved by appending /entry/kid in the URL. The KEGG pathway map viewer, the Brite hierarchy viewer and the newly released KEGG genome browser can be launched by appending /pathway/kid, /brite/kid and /genome/kid, respectively, in the URL. Together with an improved annotation procedure for KO (KEGG Orthology) assignment, an increasing number of eukaryotic genomes have been included in KEGG for better representation of organisms in the taxonomic tree. Multiple taxonomy files are generated for classification of KEGG organisms and viruses, and the Brite hierarchy viewer is used for taxonomy mapping, a variant of Brite mapping in the new KEGG Mapper suite. The taxonomy mapping enables analysis of, for example, how functional links of genes in the pathway and physical links of genes on the chromosome are conserved among organism groups.
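
The URL scheme described in this abstract can be sketched in a few lines of Python. This is an illustration only: the helper names `looks_like_kid` and `kegg_url` are hypothetical, and the pattern assumes a letters-only prefix followed by exactly five digits, which the abstract gives only as the general form of a KEGG identifier.

```python
import re

KEGG_BASE = "https://www.kegg.jp"

# Viewer path segments named in the abstract.
VIEWERS = ("entry", "pathway", "brite", "genome")

# Assumption: a letters-only prefix followed by exactly five digits.
# The abstract says identifiers "generally" take this form, so treat
# the check as a heuristic, not as validation.
KID_PATTERN = re.compile(r"[A-Za-z]+\d{5}")


def looks_like_kid(kid: str) -> bool:
    """Return True if kid has the general prefix-plus-five-digits shape."""
    return KID_PATTERN.fullmatch(kid) is not None


def kegg_url(viewer: str, kid: str) -> str:
    """Build a viewer URL by appending /<viewer>/<kid> to the KEGG base URL."""
    if viewer not in VIEWERS:
        raise ValueError(f"unknown viewer: {viewer!r}")
    return f"{KEGG_BASE}/{viewer}/{kid}"


if __name__ == "__main__":
    # K00001 (a KO entry) and map00010 (a pathway map) serve as examples.
    print(kegg_url("entry", "K00001"))
    print(kegg_url("pathway", "map00010"))
```

The sketch only constructs URLs; it does not check that an identifier exists or that a given viewer accepts that identifier type.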


Subjects
Genome; Genomics; Genomics/methods; Databases, Factual; Databases, Genetic
3.
Protein Sci ; 31(1): 47-53, 2022 01.
Article in English | MEDLINE | ID: mdl-34423492

ABSTRACT

In contrast to artificial intelligence and machine learning approaches, KEGG (https://www.kegg.jp) has relied on human intelligence to develop "models" of biological systems, especially in the form of KEGG pathway maps that are manually created by capturing knowledge from published literature. The KEGG models can then be used in biological big data analysis, for example, for uncovering systemic functions of an organism hidden in its genome sequence through the simple procedure of KEGG mapping. Here we present an updated version of KEGG Mapper, a suite of KEGG mapping tools reported previously (Kanehisa and Sato, Protein Sci 2020; 29:28-35), together with the new versions of the KEGG pathway map viewer and the BRITE hierarchy viewer. Significant enhancements have been made for BRITE mapping, where the mapping result can be examined by manipulation of hierarchical trees, such as pruning and zooming. The tree manipulation feature has also been implemented in the taxonomy mapping tool for linking KO (KEGG Orthology) groups and modules to phenotypes.


Subjects
Artificial Intelligence; Computational Biology; Databases, Genetic; Software
4.
Nucleic Acids Res ; 49(D1): D545-D551, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33125081

ABSTRACT

KEGG (https://www.kegg.jp/) is a manually curated resource integrating eighteen databases categorized into systems, genomic, chemical and health information. It also provides KEGG mapping tools, which enable understanding of cellular and organism-level functions from genome sequences and other molecular datasets. KEGG mapping is a predictive method of reconstructing molecular network systems from molecular building blocks based on the concept of functional orthologs. Since the introduction of the KEGG NETWORK database, various diseases have been associated with network variants, which are perturbed molecular networks caused by human gene variants, viruses, other pathogens and environmental factors. The network variation maps are created as aligned sets of related networks showing, for example, how different viruses inhibit or activate specific cellular signaling pathways. The KEGG pathway maps are now integrated with network variation maps in the NETWORK database, as well as with conserved functional units of KEGG modules and reaction modules in the MODULE database. The KO database for functional orthologs continues to be improved and virus KOs are being expanded for better understanding of virus-cell interactions and for enabling prediction of viral perturbations.


Subjects
Cells/metabolism; Viruses/metabolism; Apoptosis/genetics; Gene Regulatory Networks; Genome; Humans; Metabolic Networks and Pathways/genetics; Molecular Sequence Annotation
5.
Protein Sci ; 29(1): 28-35, 2020 01.
Article in English | MEDLINE | ID: mdl-31423653

ABSTRACT

KEGG is a reference knowledge base for biological interpretation of large-scale molecular datasets, such as genome and metagenome sequences. It accumulates experimental knowledge about high-level functions of the cell and the organism represented in terms of KEGG molecular networks, including KEGG pathway maps, BRITE hierarchies, and KEGG modules. By the process called KEGG mapping, a set of protein coding genes in the genome, for example, can be converted to KEGG molecular networks enabling interpretation of cellular functions and other high-level features. Here we report a new version of KEGG Mapper, a suite of KEGG mapping tools available at the KEGG website (https://www.kegg.jp/ or https://www.genome.jp/kegg/), together with the KOALA family tools for automatic assignment of KO (KEGG Orthology) identifiers used in the mapping.


Subjects
Computational Biology/methods; Proteins/genetics; Proteins/metabolism; Amino Acid Sequence; Databases, Protein; Molecular Sequence Annotation; Protein Interaction Mapping
6.
Bioinformatics ; 36(7): 2251-2252, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31742321

ABSTRACT

SUMMARY: KofamKOALA is a web server to assign KEGG Orthologs (KOs) to protein sequences by homology search against a database of profile hidden Markov models (KOfam) with pre-computed adaptive score thresholds. KofamKOALA is faster than existing KO assignment tools with its accuracy being comparable to the best performing tools. Function annotation by KofamKOALA helps linking genes to KEGG resources such as the KEGG pathway maps and facilitates molecular network reconstruction. AVAILABILITY AND IMPLEMENTATION: KofamKOALA, KofamScan and KOfam are freely available from GenomeNet (https://www.genome.jp/tools/kofamkoala/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
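
The idea of per-KO score thresholds can be illustrated with a minimal Python sketch. It is not the KofamKOALA implementation: the `Hit` type, the `assign_kos` function, the placeholder labels `KO_A` and `KO_B`, and every score and threshold below are hypothetical, and keeping only the best passing hit per gene is an assumption made for brevity.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One profile-HMM search hit (hypothetical record layout)."""
    gene: str
    ko: str
    score: float


def assign_kos(hits: Iterable[Hit], thresholds: Dict[str, float]) -> Dict[str, str]:
    """Keep, per gene, the best-scoring hit that meets its own KO's threshold."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        cutoff = thresholds.get(hit.ko)
        # Skip hits for KOs without a threshold or scoring below it.
        if cutoff is None or hit.score < cutoff:
            continue
        current = best.get(hit.gene)
        if current is None or hit.score > current.score:
            best[hit.gene] = hit
    return {gene: hit.ko for gene, hit in best.items()}


if __name__ == "__main__":
    # Made-up thresholds: each KO profile carries its own cutoff.
    thresholds = {"KO_A": 100.0, "KO_B": 250.0}
    hits = [
        Hit("gene1", "KO_A", 120.0),  # passes the KO_A cutoff
        Hit("gene1", "KO_B", 240.0),  # higher raw score, but below the KO_B cutoff
        Hit("gene2", "KO_B", 90.0),   # below cutoff, so gene2 stays unassigned
    ]
    print(assign_kos(hits, thresholds))
```

The point of the toy data is that a raw score is meaningful only relative to the threshold of the profile that produced it, so a single global cutoff would give a different assignment for `gene1`.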


Subjects
Computers; Amino Acid Sequence; Databases, Factual
7.
Protein Sci ; 28(11): 1947-1951, 2019 11.
Article in English | MEDLINE | ID: mdl-31441146

ABSTRACT

In this era of high-throughput biology, bioinformatics has become a major discipline for making sense out of large-scale datasets. Bioinformatics is usually considered as a practical field developing databases and software tools for supporting other fields, rather than a fundamental scientific discipline for uncovering principles of biology. The KEGG resource that we have been developing is a reference knowledge base for biological interpretation of genome sequences and other high-throughput data. It is now one of the most utilized biological databases because of its practical values. For me personally, KEGG is a step toward understanding the origin and evolution of cellular organisms.


Subjects
Computational Biology; Databases, Genetic; High-Throughput Screening Assays; Humans; Software
8.
Nucleic Acids Res ; 47(D1): D590-D595, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30321428

ABSTRACT

KEGG (Kyoto Encyclopedia of Genes and Genomes; https://www.kegg.jp/ or https://www.genome.jp/kegg/) is a reference knowledge base for biological interpretation of genome sequences and other high-throughput data. It is an integrated database consisting of three generic categories of systems information, genomic information and chemical information, and an additional human-specific category of health information. KEGG pathway maps, BRITE hierarchies and KEGG modules have been developed as generic molecular networks with KEGG Orthology nodes of functional orthologs so that KEGG pathway mapping and other procedures can be applied to any cellular organism. Unfortunately, however, this generic approach was inadequate for knowledge representation in the health information category, where variations of human genomes, especially disease-related variations, had to be considered. Thus, we have introduced a new approach where human gene variants are explicitly incorporated into what we call 'network variants' in the recently released KEGG NETWORK database. This allows accumulation of knowledge about disease-related perturbed molecular networks caused not only by gene variants, but also by viruses and other pathogens, environmental factors and drugs. We expect that KEGG NETWORK will become another reference knowledge base for the basic understanding of disease mechanisms and practical use in clinical sequencing and drug development.


Subjects
Databases, Genetic; Genetic Variation; Genome-Wide Association Study/methods; Genomics/methods; Genome; Humans; Software
9.
Methods Mol Biol ; 1807: 225-239, 2018.
Article in English | MEDLINE | ID: mdl-30030815

ABSTRACT

The KEGG database is widely used as a reference knowledge base for biological interpretation of genome sequences and other high-throughput data. It contains, among others, KEGG pathway maps and BRITE hierarchies (ontologies) representing high-level systemic functions of the cell and the organism. By the processes called pathway mapping and BRITE mapping, information encoded in the genome, especially the repertoire of genes, is converted to such high-level functional information. This general methodology can be applied to microbial genomes to infer antimicrobial resistance (AMR), which is becoming an increasingly serious threat to the global public health. Here we present how knowledge on AMR is accumulated in the KEGG Pathogen resource and how such knowledge can be utilized by BlastKOALA and other web tools.


Subjects
Anti-Bacterial Agents/pharmacology; Databases, Genetic; Drug Resistance, Bacterial/genetics; Genome, Bacterial; Carbapenems/pharmacology; Drug Resistance, Bacterial/drug effects; Phylogeny; beta-Lactamases/metabolism; beta-Lactams/pharmacology
10.
Methods Mol Biol ; 1611: 135-145, 2017.
Article in English | MEDLINE | ID: mdl-28451977

ABSTRACT

KEGG is an integrated database resource for linking sequences to biological functions from molecular to higher levels. Knowledge on molecular functions is stored in the KO (KEGG Orthology) database, while cellular- and organism-level functions are represented in the PATHWAY and MODULE databases. Genes in the complete genomes, which are stored in the GENES database, are given KO identifiers by the internal annotation procedure, enabling reconstruction of KEGG pathways and modules for interpretation of higher-level functions. This is possible because all the KEGG pathways and modules are represented as networks of KO nodes. Here we present knowledge-based prediction methods for functional characterization of amino acid sequences using the KEGG resource. Specifically we show how the tools available at the KEGG website including BlastKOALA and KEGG Mapper can be utilized for enzyme annotation and metabolic reconstruction.
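
Because pathways and modules are described here as networks of KO nodes, reconstruction reduces to a lookup from genes to KOs to pathways. The sketch below shows that lookup on toy data; the function name `map_genes_to_pathways` and all gene, KO and pathway labels are hypothetical placeholders, and real KEGG mapping handles details (such as module completeness) that are not modeled here.

```python
from collections import defaultdict
from typing import Dict, List, Set


def map_genes_to_pathways(
    gene_to_ko: Dict[str, str],
    pathway_to_kos: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """For each pathway, list the genes whose assigned KO is a node of that pathway."""
    # Invert the annotation: which genes carry each KO?
    ko_to_genes: Dict[str, Set[str]] = defaultdict(set)
    for gene, ko in gene_to_ko.items():
        ko_to_genes[ko].add(gene)

    reconstructed: Dict[str, List[str]] = {}
    for pathway, kos in pathway_to_kos.items():
        genes: Set[str] = set()
        for ko in kos:
            genes |= ko_to_genes.get(ko, set())
        if genes:  # report only pathways with at least one mapped gene
            reconstructed[pathway] = sorted(genes)
    return reconstructed


if __name__ == "__main__":
    # Toy annotation and toy pathway definitions (not real KEGG data).
    gene_to_ko = {"g1": "KO_1", "g2": "KO_2", "g3": "KO_9"}
    pathway_to_kos = {
        "pathway_X": {"KO_1", "KO_2"},
        "pathway_Y": {"KO_2", "KO_3"},
        "pathway_Z": {"KO_4"},
    }
    print(map_genes_to_pathways(gene_to_ko, pathway_to_kos))
```

Keeping the pathway definitions organism-independent (sets of KOs) is what lets the same lookup be reused for any annotated genome.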


Subjects
Computational Biology/methods; Databases, Genetic; Genomics/methods
11.
PLoS One ; 12(4): e0176530, 2017.
Article in English | MEDLINE | ID: mdl-28445522

ABSTRACT

Genome-wide scans for positive selection have become important for genomic medicine, and many studies aim to find genomic regions affected by positive selection that are associated with risk allele variations among populations. Most such studies are designed to detect recent positive selection. However, we hypothesize that ancient positive selection is also important for adaptation to pathogens, and has affected current immune-mediated common diseases. Based on this hypothesis, we developed a novel linkage disequilibrium-based pipeline, which aims to detect regions associated with ancient positive selection across populations from single nucleotide polymorphism (SNP) data. By applying this pipeline to the genotypes in the International HapMap project database, we show that genes in the detected regions are enriched in pathways related to the immune system and infectious diseases. The detected regions also contain SNPs reported to be associated with cancers and metabolic diseases, obesity-related traits, type 2 diabetes, and allergic sensitization. These SNPs were further mapped to biological pathways to determine the associations between phenotypes and molecular functions. Assessments of candidate regions to identify functions associated with variations in incidence rates of these diseases are needed in the future.


Subjects
Genome, Human; Genome-Wide Association Study; Databases, Genetic; Genetics, Population; Genotype; HapMap Project; Haplotypes; Humans; Linkage Disequilibrium; Metabolic Diseases/genetics; Metabolic Diseases/pathology; Monte Carlo Method; Multigene Family; Neoplasms/genetics; Neoplasms/pathology; Neurodegenerative Diseases/genetics; Neurodegenerative Diseases/pathology; Phenotype; Polymorphism, Single Nucleotide
12.
Nucleic Acids Res ; 45(D1): D353-D361, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899662

ABSTRACT

KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an encyclopedia of genes and genomes. Assigning functional meanings to genes and genomes both at the molecular and higher levels is the primary objective of the KEGG database project. Molecular-level functions are stored in the KO (KEGG Orthology) database, where each KO is defined as a functional ortholog of genes and proteins. Higher-level functions are represented by networks of molecular interactions, reactions and relations in the forms of KEGG pathway maps, BRITE hierarchies and KEGG modules. In the past the KO database was developed for the purpose of defining nodes of molecular networks, but now the content has been expanded and the quality improved irrespective of whether or not the KOs appear in the three molecular network databases. The newly introduced addendum category of the GENES database is a collection of individual proteins whose functions are experimentally characterized and from which an increasing number of KOs are defined. Furthermore, the DISEASE and DRUG databases have been improved by systematic analysis of drug labels for better integration of diseases and drugs with the KEGG molecular networks. KEGG is moving towards becoming a comprehensive knowledge base for both functional interpretation and practical application of genomic information.


Subjects
Computational Biology/methods; Databases, Genetic; Genomics/methods; Drug Discovery; Metabolic Networks and Pathways; Web Browser
13.
J Chem Inf Model ; 56(3): 510-6, 2016 Mar 28.
Article in English | MEDLINE | ID: mdl-26822930

ABSTRACT

Although there are several databases that contain data on many metabolites and reactions in biochemical pathways, there is still a big gap in the numbers between experimentally identified enzymes and metabolites. It is supposed that many catalytic enzyme genes are still unknown. Although there are previous studies that estimate the number of candidate enzyme genes, these studies required some additional information aside from the structures of metabolites such as gene expression and order in the genome. In this study, we developed a novel method to identify a candidate enzyme gene of a reaction using the chemical structures of the substrate-product pair (reactant pair). The proposed method is based on a search for similar reactant pairs in a reference database and offers ortholog groups that possibly mediate the given reaction. We applied the proposed method to two experimentally validated reactions. As a result, we confirmed that the histidine transaminase was correctly identified. Although our method could not directly identify the asparagine oxo-acid transaminase, we successfully found the paralog gene most similar to the correct enzyme gene. We also applied our method to infer candidate enzyme genes in the mesaconate pathway. The advantage of our method lies in the prediction of possible genes for orphan enzyme reactions where any associated gene sequences are not determined yet. We believe that this approach will facilitate experimental identification of genes for orphan enzymes.
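
The general shape of a similarity search over reactant pairs can be sketched as follows. The abstract does not say how reactant pairs are encoded or compared, so everything specific here is an assumption: each pair is reduced to a set of hypothetical transformation features, similarity is the Tanimoto (Jaccard) coefficient, and the ortholog group labels are invented.

```python
from typing import FrozenSet, List, Sequence, Tuple

Features = FrozenSet[str]


def tanimoto(a: Features, b: Features) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_ortholog_groups(
    query: Features,
    reference: Sequence[Tuple[Features, str]],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Rank reference reactant pairs by similarity to the query and return
    the ortholog groups attached to the most similar ones."""
    scored = sorted(
        ((tanimoto(query, feats), group) for feats, group in reference),
        key=lambda item: item[0],
        reverse=True,
    )
    # Drop reference pairs that share nothing with the query.
    return [(group, round(score, 3)) for score, group in scored[:top_n] if score > 0]


if __name__ == "__main__":
    # Hypothetical feature encoding of a transamination on an imidazole substrate.
    query = frozenset({"amino_group_removed", "keto_group_added", "imidazole_ring"})
    reference = [
        (frozenset({"amino_group_removed", "keto_group_added", "indole_ring"}), "OG_transaminase_1"),
        (frozenset({"hydroxyl_added"}), "OG_hydroxylase"),
        (frozenset({"amino_group_removed", "keto_group_added", "imidazole_ring"}), "OG_transaminase_2"),
    ]
    print(suggest_ortholog_groups(query, reference))
```

A ranked list rather than a single answer fits the outcome reported in the abstract, where the closest candidate for one reaction was a paralog of the correct enzyme gene rather than the gene itself.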


Subjects
Enzymes/genetics; Databases, Protein; Enzymes/metabolism; Substrate Specificity
14.
Methods Mol Biol ; 1374: 55-70, 2016.
Article in English | MEDLINE | ID: mdl-26519400

ABSTRACT

In the era of high-throughput biology it is necessary to develop not only elaborate computational methods but also well-curated databases that can be used as reference for data interpretation. KEGG (http://www.kegg.jp/) is such a reference knowledge base with two specific aims. One is to compile knowledge on high-level functions of the cell and the organism in terms of the molecular interaction and reaction networks, which is implemented in KEGG pathway maps, BRITE functional hierarchies, and KEGG modules. The other is to expand knowledge on genes and proteins involved in the molecular networks from experimentally observed organisms to other organisms using the concept of orthologs, which is implemented in the KEGG Orthology (KO) system. Thus, KEGG is a generic resource applicable to all organisms and enables interpretation of high-level functions from genomic and molecular data. Here we first present a brief overview of the entire KEGG resource, and then give an introduction of how to use KEGG in plant genomics and metabolomics research.


Subjects
Computational Biology/methods; Genomics/methods; Metabolomics/methods; Plants/genetics; Plants/metabolism; Databases, Genetic; Web Browser
15.
J Mol Biol ; 428(4): 726-731, 2016 Feb 22.
Article in English | MEDLINE | ID: mdl-26585406

ABSTRACT

BlastKOALA and GhostKOALA are automatic annotation servers for genome and metagenome sequences, which perform KO (KEGG Orthology) assignments to characterize individual gene functions and reconstruct KEGG pathways, BRITE hierarchies and KEGG modules to infer high-level functions of the organism or the ecosystem. Both servers are made freely available at the KEGG Web site (http://www.kegg.jp/blastkoala/). In BlastKOALA, the KO assignment is performed by a modified version of the internally used KOALA algorithm after the BLAST search against a non-redundant dataset of pangenome sequences at the species, genus or family level, which is generated from the KEGG GENES database by retaining the KO content of each taxonomic category. In GhostKOALA, which utilizes more rapid GHOSTX for database search and is suitable for metagenome annotation, the pangenome dataset is supplemented with Cd-hit clusters including those for viral genes. The result files may be downloaded and manipulated for further KEGG Mapper analysis, such as comparative pathway analysis using multiple BlastKOALA results.


Subjects
Computational Biology/methods; Genome; Metagenome; Sequence Analysis, DNA/methods; Internet
16.
Nucleic Acids Res ; 44(D1): D457-62, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26476454

ABSTRACT

KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, representing high-level functions of the cell and the organism. Currently, more than 4000 complete genomes are annotated with KOs in the KEGG GENES database, which can be used as a reference data set for KO assignment and subsequent reconstruction of KEGG pathways and other molecular networks. As an annotation resource, the following improvements have been made. First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization. Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes. Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database. As a resource for translational bioinformatics, various data sets are created for antimicrobial resistance and drug interaction networks.


Subjects
Amino Acid Sequence; Databases, Genetic; Genes; Molecular Sequence Annotation; Drug Resistance, Microbial; Genome; Metabolic Networks and Pathways; Plasmids/genetics; Proteins/genetics; Viruses/genetics
18.
Methods Mol Biol ; 1273: 97-107, 2015.
Article in English | MEDLINE | ID: mdl-25753705

ABSTRACT

This chapter describes the KEGG GLYCAN database of the KEGG resource, including descriptions of links to the other databases in KEGG. In particular, KEGG GLYCAN consists of glycan structures, with links to glycogenes, orthologs, reactions, pathways, drugs, diseases, and others, all within the KEGG resources. A number of analytical tools are also available, including the composite structure map (CSM), KegDraw, KCam, and GECS. These databases and tools will be described along with simple examples of their usage.


Subjects
Databases, Factual; Glycomics/methods; Polysaccharides/chemistry; Carbohydrate Sequence; Gene Expression
19.
J Bioinform Comput Biol ; 12(6): 1442001, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25385078

ABSTRACT

Genomics is faced with the issue of many partially annotated putative enzyme-encoding genes for which activities have not yet been verified, while metabolomics is faced with the issue of many putative enzyme reactions for which full equations have not been verified. Knowledge of enzymes has been collected by IUBMB, and has been made public as the Enzyme List. To date, however, the terminology of the Enzyme List has not been assessed comprehensively by bioinformatics studies. Instead, most of the bioinformatics studies simply use the identifiers of the enzymes, i.e. the Enzyme Commission (EC) numbers. We investigated the actual usage of terminology throughout the Enzyme List, and demonstrated that the partial characteristics of reactions cannot be retrieved by simply using EC numbers. Thus, we developed a novel ontology, named PIERO, for annotating biochemical transformations as follows. First, the terminology describing enzymatic reactions was retrieved from the Enzyme List, and was grouped into those related to overall reactions and biochemical transformations. Consequently, these terms were mapped onto the actual transformations taken from enzymatic reaction equations. This ontology was linked to Gene Ontology (GO) and EC numbers, allowing the extraction of common partial reaction characteristics from given sets of orthologous genes and the elucidation of possible enzymes from the given transformations. Further future development of the PIERO ontology should enhance the Enzyme List to promote the integration of genomics and metabolomics.


Subjects
Biological Ontologies; Databases, Protein; Enzymes/chemistry; Enzymes/classification; Information Storage and Retrieval/methods; Terminology as Topic; Enzymes/genetics; Natural Language Processing
20.
Nucleic Acids Res ; 42(Web Server issue): W39-45, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24838565

ABSTRACT

DINIES (drug-target interaction network inference engine based on supervised analysis) is a web server for predicting unknown drug-target interaction networks from various types of biological data (e.g. chemical structures, drug side effects, amino acid sequences and protein domains) in the framework of supervised network inference. The originality of DINIES lies in prediction with state-of-the-art machine learning methods, in the integration of heterogeneous biological data and in compatibility with the KEGG database. The DINIES server accepts any 'profiles' or precalculated similarity matrices (or 'kernels') of drugs and target proteins in tab-delimited file format. When a training data set is submitted to learn a predictive model, users can select either known interaction information in the KEGG DRUG database or their own interaction data. The user can also select an algorithm for supervised network inference, select various parameters in the method and specify weights for heterogeneous data integration. The server can provide integrative analyses with useful components in KEGG, such as biological pathways, functional hierarchy and human diseases. DINIES (http://www.genome.jp/tools/dinies/) is publicly available as one of the genome analysis tools in GenomeNet.


Subjects
Artificial Intelligence; Drug Discovery; Proteins/chemistry; Software; Algorithms; Humans; Internet; Pharmaceutical Preparations/chemistry; Protein Structure, Tertiary; Proteins/drug effects; Sequence Analysis, Protein