1.
Int J Hematol ; 112(4): 535-543, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32683598

ABSTRACT

A hemoglobin (Hb) threshold level of 7 g/dL has been proposed for red blood cell (RBC) transfusion in patients with chronic anemia in the Japanese guideline since 2005. However, Hb thresholds for hematological diseases in clinical practice and factors responsible for higher Hb thresholds remain unclear. Hb thresholds were collected for patients with hematological diseases from 32 Japanese teaching hospitals. Uni- and multivariate analyses were used to analyze relationships between Hb threshold level and various patient and hospital factors. In total, 4996 units of RBC were transfused to 1054 patients with hematological diseases in 2421 transfusions. Median age was 68 years. Myelodysplastic syndrome was the most frequent diagnosis. Overall median Hb threshold level was 6.9 g/dL. Multivariate linear regression analysis detected the following variables associated with Hb threshold level: hospital; cardiovascular disease; symptomatic anemia; and hematopoietic stem cell transplantation. Hospital was the most significant factor. Collectively, median Hb threshold level in clinical practice in Japan was similar to the guidelines. Higher Hb threshold level depended on the hospitals at which the transfusions were performed as well as patient condition. Educational approaches directed toward hospitals may be useful to promote transfusion guidelines.


Subject(s)
Erythrocyte Transfusion/standards , Hematologic Diseases/blood , Hemoglobins , Hospitals, Teaching , Aged , Differential Threshold , Female , Humans , Japan , Male , Middle Aged , Multivariate Analysis , Myelodysplastic Syndromes , Practice Guidelines as Topic , Surveys and Questionnaires
2.
Nucleic Acids Res ; 41(Database issue): D353-7, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193276

ABSTRACT

The identification of orthologous genes in an increasing number of fully sequenced genomes is a challenging issue in recent genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1 176 030 OCs, obtained by clustering 8 357 175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and 129 archaea). The OCs were constructed by applying the quasi-clique-based clustering method to all possible protein coding genes in all complete genomes, based on their amino acid sequence similarities. Calculating OCs is computationally efficient, which makes it possible to update the contents regularly. KEGG OC has the following two features: (i) It consists of all complete genomes of a wide variety of organisms from three domains of life, and the number of organisms is the largest among the existing databases; and (ii) It is compatible with the KEGG database by sharing the same sets of genes and identifiers, which leads to seamless integration of OCs with useful components in KEGG such as biological pathways, pathway modules, functional hierarchy, diseases and drugs. The KEGG OC resources are accessible via OC Viewer, which provides an interactive visualization of OCs at different taxonomic levels.


Subject(s)
Databases, Genetic , Genes, Archaeal , Genes, Bacterial , Genes , Algorithms , Classification/methods , Cluster Analysis , Eukaryota/genetics , Genome, Archaeal , Genome, Bacterial , Genomics/methods , Internet , Sequence Homology, Amino Acid
3.
Nat Commun ; 3: 1203, 2012.
Article in English | MEDLINE | ID: mdl-23149747

ABSTRACT

Microbial ecologists have investigated roles of species richness and diversity in a wide variety of ecosystems. Recently, metagenomics has been developed to measure functions in ecosystems, but this approach is cost-intensive. Here we describe a novel method for the rapid and efficient reconstruction of a virtual metagenome in environmental microbial communities without using large-scale genomic sequencing. We demonstrate this approach using 16S rRNA gene sequences obtained from denaturing gradient gel electrophoresis analysis, mapped to fully sequenced genomes, to reconstruct virtual metagenome-like organizations. Furthermore, we validate a virtual metagenome using a published metagenome for cocoa bean fermentation samples, and show that metagenomes reconstructed from biofilm formation samples allow for the study of the gene pool dynamics that are necessary for biofilm growth.
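
The reconstruction idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes each 16S rRNA sequence has already been assigned to its closest fully sequenced genome, and it simply sums that genome's gene content weighted by the sequence's relative abundance. The function name, the weighting scheme, and all identifiers and numbers are hypothetical.

```python
from collections import Counter


def virtual_metagenome(abundance, nearest_genome, gene_content):
    """Sum gene-family counts of reference genomes, weighted by 16S abundance.

    abundance:      {16S sequence id: relative abundance}
    nearest_genome: {16S sequence id: id of closest fully sequenced genome}
    gene_content:   {genome id: {gene family: copy number}}
    """
    profile = Counter()
    for seq_id, weight in abundance.items():
        genome = nearest_genome.get(seq_id)
        if genome is None:
            continue  # no sequenced relative was assigned; this taxon is left out
        for family, copies in gene_content[genome].items():
            profile[family] += weight * copies
    return dict(profile)


# Toy data, invented for illustration only.
abundance = {"band1": 0.5, "band2": 0.25, "band3": 0.25}
nearest_genome = {"band1": "genomeA", "band2": "genomeB"}
gene_content = {
    "genomeA": {"K00001": 2, "K00002": 1},
    "genomeB": {"K00002": 4},
}
profile = virtual_metagenome(abundance, nearest_genome, gene_content)
```

Here "band3" has no assigned genome and therefore contributes nothing, which shows one limitation of any mapping-based reconstruction: taxa without a sequenced relative are invisible in the result.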


Subject(s)
Metagenome/genetics , Metagenomics/methods , RNA, Ribosomal, 16S/genetics , User-Computer Interface , Base Sequence , Biofilms/growth & development , Cacao/genetics , Computational Biology , Denaturing Gradient Gel Electrophoresis , Fermentation/genetics , Molecular Sequence Data , Reproducibility of Results , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid
4.
J Biomed Semantics ; 2: 4, 2011 Aug 02.
Article in English | MEDLINE | ID: mdl-21806842

ABSTRACT

BACKGROUND: The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. RESULTS: Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. CONCLUSIONS: Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework.

5.
Bioinformatics ; 25(7): 958-9, 2009 Apr 01.
Article in English | MEDLINE | ID: mdl-19218352

ABSTRACT

SUMMARY: Comparative approach is one of the most essential methods for extracting functional and evolutionary information from genomic sequences. So far, a number of sequence comparison tools have been developed, and most are either for on-site use, requiring program installation but providing a wide variety of analyses, or for the online search of user's sequences against given databases on a server. We newly devised an Asynchronous JavaScript and XML (Ajax)-based system for comparative genomic analyses, CGAS, with highly interactive interface within a browser, requiring no software installation. The current version, CGAS version 1, provides functionality for viewing similarity relationships between user's sequences, including a multiple dot plot between sequences with their annotation information. The scrollbar-less 'draggable' interface of CGAS is implemented with Google Maps API version 2. The annotation information associated with the genomic sequences compared is synchronously displayed with the comparison view. The multiple-comparison viewer is one of the unique functionalities of this system to allow the users to compare the differences between different pairs of sequences. In this viewer, the system tells orthologous correspondences between the sequences compared interactively. This web-based tool is platform-independent and will provide biologists having no computational skills with opportunities to analyze their own data without software installation and customization of the computer system. AVAILABILITY AND IMPLEMENTATION: CGAS is available at http://cgas.ist.hokudai.ac.jp/.


Subject(s)
Genome , Genomics/methods , Software , Comparative Genomic Hybridization , Internet , User-Computer Interface
6.
Nucleic Acids Res ; 36(Web Server issue): W423-6, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18477636

ABSTRACT

KEGG Atlas is a new graphical interface to the KEGG suite of databases, especially to the systems information in the PATHWAY and BRITE databases. It currently consists of a single global map and an associated viewer for metabolism, covering about 120 KEGG metabolic pathway maps and about 10 BRITE hierarchies. The viewer allows the user to navigate and zoom the global map under the Ajax technology. The mapping of high-throughput experimental data onto the global map is the main use of KEGG Atlas. In the global metabolism map, the node (circle) is a chemical compound and the edge (line) is a set of reactions linked to a set of KEGG Orthology (KO) entries for enzyme genes. Once gene identifiers in different organisms are converted to the K number identifiers in the KO system, corresponding line segments can be highlighted in the global map, allowing the user to view genome sequence data as organism-specific pathways, gene expression data as up- or down-regulated pathways, etc. Once chemical compounds are converted to the C number identifiers in KEGG, metabolomics data can also be displayed in the global map. KEGG Atlas is available at http://www.genome.jp/kegg/atlas/.
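
The mapping step described in this abstract (organism-specific gene identifiers converted to K numbers, then matching line segments highlighted on the global map) can be illustrated with a small sketch. It assumes two lookup tables are available as plain dictionaries; the function name, the gene and edge identifiers, and the table contents are all made up for illustration and do not reflect the actual KEGG Atlas data model.

```python
def highlighted_edges(gene_ids, gene_to_ko, edge_to_kos):
    """Return map edges whose KO set contains a K number of any input gene.

    gene_to_ko:  {gene identifier: K number}
    edge_to_kos: {edge identifier: set of K numbers linked to that edge}
    """
    kos = {gene_to_ko[g] for g in gene_ids if g in gene_to_ko}
    return sorted(edge for edge, edge_kos in edge_to_kos.items() if kos & edge_kos)


# Hypothetical lookup tables.
gene_to_ko = {"geneA": "K00001", "geneB": "K00002", "geneC": "K00003"}
edge_to_kos = {
    "edge1": {"K00001"},
    "edge2": {"K00002", "K00004"},
    "edge3": {"K00005"},
}
edges = highlighted_edges(["geneA", "geneB", "geneX"], gene_to_ko, edge_to_kos)
```

Genes without a K number ("geneX" here) are silently skipped, so only annotated genes can light up an edge.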


Subject(s)
Metabolic Networks and Pathways , Software , Computer Graphics , Databases, Factual , Genomics , Internet , Metabolic Networks and Pathways/genetics
7.
Nucleic Acids Res ; 36(Database issue): D480-4, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18077471

ABSTRACT

KEGG (http://www.genome.jp/kegg/) is a database of biological systems that integrates genomic, chemical and systemic functional information. KEGG provides a reference knowledge base for linking genomes to life through the process of PATHWAY mapping, which is to map, for example, a genomic or transcriptomic content of genes to KEGG reference pathways to infer systemic behaviors of the cell or the organism. In addition, KEGG provides a reference knowledge base for linking genomes to the environment, such as for the analysis of drug-target relationships, through the process of BRITE mapping. KEGG BRITE is an ontology database representing functional hierarchies of various biological objects, including molecules, cells, organisms, diseases and drugs, as well as relationships among them. KEGG PATHWAY is now supplemented with a new global map of metabolic pathways, which is essentially a combined map of about 120 existing pathway maps. In addition, smaller pathway modules are defined and stored in KEGG MODULE that also contains other functional units and complexes. The KEGG resource is being expanded to suit the needs for practical applications. KEGG DRUG contains all approved drugs in the US and Japan, and KEGG DISEASE is a new database linking disease genes, pathways, drugs and diagnostic markers.


Subject(s)
Databases, Factual , Genomics , Systems Biology , Disease , Humans , Internet , Metabolic Networks and Pathways , Molecular Structure , Pharmaceutical Preparations/chemistry , Systems Integration , User-Computer Interface
8.
Genome Inform ; 20: 252-9, 2008.
Article in English | MEDLINE | ID: mdl-19425139

ABSTRACT

Harmful effects associated with use of drugs are caused as a result of their side effects and combined use of different drugs. These drug interactions result in increased or decreased drug effects, or produce other new unwanted effects and are serious problems for medical institutions and pharmaceutical companies. In this study, we created a drug-drug interaction network from drug package inserts and characterized drug interactions. The known information about the potential risk of drug interactions is described in drug package inserts. Japanese drug package inserts are stored in the JAPIC (Japan Pharmaceutical Information Center) database and GenomeNet provides the GenomeNet pharmaceutical products database, which integrate the JAPIC and KEGG databases. We extracted drug interaction data from GenomeNet, where interactions are classified according to risks, contraindications or cautions for coadministration, and some entries include information about enzymes metabolizing the drugs. We defined drug target and drug-metabolizing enzymes as interaction factors using information on them in KEGG DRUG, and classified drugs into pharmacological/chemical subgroups. In the resulting drug-drug interaction network, the drugs that are associated with the same interaction factors are closely interconnected. Mechanisms of these interactions were then identified by each interaction factor. To characterize other interactions without interaction factors, we used the ATC classification system and found an association between interaction mechanisms and pharmacological/chemical subgroups.


Subject(s)
Drug-Related Side Effects and Adverse Reactions/epidemiology , Cytochrome P-450 CYP3A/metabolism , Databases, Factual , Drug Interactions , Drug Therapy, Combination , Humans , Pharmaceutical Preparations/chemistry , Receptors, Biogenic Amine/drug effects
10.
Genome Biol ; 8(6): R121, 2007.
Article in English | MEDLINE | ID: mdl-17588271

ABSTRACT

BACKGROUND: In higher multicellular eukaryotes, complex protein domain combinations contribute to various cellular functions such as regulation of intercellular or intracellular signaling and interactions. To elucidate the characteristics and evolutionary mechanisms that underlie such domain combinations, it is essential to examine the different types of domains and their combinations among different groups of eukaryotes. RESULTS: We observed a large number of group-specific domain combinations in animals, especially in vertebrates. Examples include animal-specific combinations in tyrosine phosphorylation systems and vertebrate-specific combinations in complement and coagulation cascades. These systems apparently underwent extensive evolution in the ancestors of these groups. In extant animals, especially in vertebrates, animal-specific domains have greater connectivity than do other domains on average, and contribute to the varying number of combinations in each animal subgroup. In other groups, the connectivities of older domains were greater on average. To observe the global behavior of domain combinations during evolution, we traced the changes in domain combinations among animals and fungi in a network analysis. Our results indicate that there is a correlation between the differences in domain combinations among different phylogenetic groups and different global behaviors. CONCLUSION: Rapid emergence of animal-specific domains was observed in animals, contributing to specific domain combinations and functional diversification, but no such trends were observed in other clades of eukaryotes. We therefore suggest that the strategy for achieving complex multicellular systems in animals differs from that of other eukaryotes.


Subject(s)
Evolution, Molecular , Proteins/chemistry , Proteins/genetics , Animals , Eukaryotic Cells/chemistry , Eukaryotic Cells/metabolism , Humans , Phylogeny , Prokaryotic Cells/chemistry , Prokaryotic Cells/metabolism , Protein Structure, Tertiary , Proteome
11.
Nucleic Acids Res ; 35(Web Server issue): W182-5, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17526522

ABSTRACT

The number of complete and draft genomes is rapidly growing in recent years, and it has become increasingly important to automate the identification of functional properties and biological roles of genes in these genomes. In the KEGG database, genes in complete genomes are annotated with the KEGG orthology (KO) identifiers, or the K numbers, based on the best hit information using Smith-Waterman scores as well as by the manual curation. Each K number represents an ortholog group of genes, and it is directly linked to an object in the KEGG pathway map or the BRITE functional hierarchy. Here, we have developed a web-based server called KAAS (KEGG Automatic Annotation Server: http://www.genome.jp/kegg/kaas/) i.e. an implementation of a rapid method to automatically assign K numbers to genes in the genome, enabling reconstruction of KEGG pathways and BRITE hierarchies. The method is based on sequence similarities, bi-directional best hit information and some heuristics, and has achieved a high degree of accuracy when compared with the manually curated KEGG GENES database.
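
The bi-directional best hit criterion mentioned in this abstract can be sketched as below. This is only an illustration of the general idea: a query gene receives the K number of a reference gene when each is the other's top-scoring hit. The score thresholds and additional heuristics that KAAS applies are not reproduced, and the function names and toy scores are assumptions.

```python
def best_hits(scores):
    """Return {query: best-scoring target} from {(query, target): score}."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {q: t for q, (t, _) in best.items()}


def bidirectional_best_hits(forward, reverse):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


def assign_k_numbers(forward, reverse, k_number_of):
    """Transfer the K number of the reference gene in each bidirectional pair."""
    return {
        query: k_number_of[ref]
        for query, ref in bidirectional_best_hits(forward, reverse)
        if ref in k_number_of
    }


# Toy similarity scores (query genes q*, reference genes r*), invented for illustration.
forward = {("q1", "r1"): 200, ("q1", "r2"): 150, ("q2", "r2"): 90}
reverse = {("r1", "q1"): 200, ("r2", "q1"): 150, ("r2", "q2"): 90}
k_number_of = {"r1": "K00001", "r2": "K00002"}
assignments = assign_k_numbers(forward, reverse, k_number_of)
```

In the toy data, q2's best hit is r2, but r2's best hit is q1, so q2 receives no assignment. This is the kind of one-directional match the criterion is meant to filter out.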


Subject(s)
Chromosome Mapping/methods , Computational Biology/methods , Documentation/methods , Genome , Proteome/classification , Proteome/metabolism , Sequence Analysis/methods , Signal Transduction/physiology , Vocabulary, Controlled , Animals , Artificial Intelligence , Automation , Database Management Systems , Humans , Information Storage and Retrieval/methods , Internet
12.
Traffic ; 7(8): 1104-18, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16882042

ABSTRACT

The SNARE proteins are required for membrane fusion during intracellular vesicular transport and for its specificity. Only the unique combination of SNARE proteins (cognates) can be bound and can lead to membrane fusion, although the characteristics of the possible specificity of the binding combinations encoded in the SNARE sequences have not yet been determined. We discovered by whole genome sequence analysis that sequence motifs (conserved sequences) in the SNARE motif domains for each protein group correspond to localization sites or transport pathways. We claim that these motifs reflect the specificity of the binding combinations of SNARE motif domains. Using these motifs, we could classify SNARE proteins from 48 organisms into their localization sites or transport pathways. The classification result shows that more than 10 SNARE subgroups are kingdom specific and that the SNARE paralogs involved in the plasma membrane-related transport pathways have developed greater variations in higher animals and higher plants than those involved in the endoplasmic reticulum-related transport pathways throughout eukaryotic evolution.


Subject(s)
Amino Acid Motifs , Phylogeny , SNARE Proteins/physiology , Cluster Analysis , Protein Transport , SNARE Proteins/chemistry
13.
Nucleic Acids Res ; 34(Web Server issue): W459-62, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845049

ABSTRACT

Expressed sequence tag (EST) sequencing has proven to be an economically feasible alternative for gene discovery in species lacking a draft genome sequence. Ongoing large-scale EST sequencing projects feel the need for bioinformatics tools to facilitate uniform EST handling. This brings about a renewed importance for a universal tool for processing and functional annotation of large sets of ESTs. EGassembler (http://egassembler.hgc.jp/) is a web server, which provides an automated as well as a user-customized analysis tool for cleaning, repeat masking, vector trimming, organelle masking, clustering and assembling of ESTs and genomic fragments. The web server is publicly available and provides the community a unique all-in-one online application web service for large-scale ESTs and genomic DNA clustering and assembling. Running on a Sun Fire 15K supercomputer, a significantly large volume of data can be processed in a short period of time. The results can be used to functionally annotate genes, to facilitate splice alignment analysis, to link the transcripts to genetic and physical maps, design microarray chips, to perform transcriptome analysis and to map to KEGG metabolic pathways. The service provides an excellent bioinformatics tool to research groups in wet-lab as well as an all-in-one-tool for sequence handling to bioinformatics researchers.


Subject(s)
Computational Biology/methods , Expressed Sequence Tags , Genomics/methods , Software , Internet , Sequence Analysis, DNA , User-Computer Interface
14.
Nucleic Acids Res ; 34(Database issue): D354-7, 2006 Jan 01.
Article in English | MEDLINE | ID: mdl-16381885

ABSTRACT

The increasing amount of genomic and molecular information is the basis for understanding higher-order biological systems, such as the cell and the organism, and their interactions with the environment, as well as for medical, industrial and other practical applications. The KEGG resource (http://www.genome.jp/kegg/) provides a reference knowledge base for linking genomes to biological systems, categorized as building blocks in the genomic space (KEGG GENES) and the chemical space (KEGG LIGAND), and wiring diagrams of interaction networks and reaction networks (KEGG PATHWAY). A fourth component, KEGG BRITE, has been formally added to the KEGG suite of databases. This reflects our attempt to computerize functional interpretations as part of the pathway reconstruction process based on the hierarchically structured knowledge about the genomic, chemical and network spaces. In accordance with the new chemical genomics initiatives, the scope of KEGG LIGAND has been significantly expanded to cover both endogenous and exogenous molecules. Specifically, RPAIR contains curated chemical structure transformation patterns extracted from known enzymatic reactions, which would enable analysis of genome-environment interactions, such as the prediction of new reactions and new enzyme genes that would degrade new environmental compounds. Additionally, drug information is now stored separately and linked to new KEGG DRUG structure maps.


Subject(s)
Biotransformation , Chemistry , Databases, Factual , Databases, Genetic , Genomics , Chemical Phenomena , Environment , Enzymes/chemistry , Enzymes/genetics , Humans , Internet , Ligands , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/classification , Signal Transduction , Systems Integration , User-Computer Interface
15.
Genome Inform ; 17(1): 230-9, 2006.
Article in English | MEDLINE | ID: mdl-17503372

ABSTRACT

Recent evidence points to the existence of scale-free properties in many biological networks. By topological analysis, several models including preferential attachment and hierarchical modules have been proposed to explain how these networks are organized. On the other hand, analyses using dynamics have suggested that gene expression and metabolic networks have been organized with the scale-free property by the other models such as "rich-travel-more" and "log-normal dynamics." Because most of these approaches are based on comparative genomics of extant species, and did not consider evolutionary events such as horizontal gene transfer, gene loss and gene gain, we have analyzed transition of metabolic networks from the vertical point of view of evolution. First, to identify metabolic networks of common ancestors, we applied a parsimony algorithm for the enzymatic reaction set. Then by comparing the estimated metabolic networks among common ancestors, we investigated the transition of metabolic networks along the evolutionary process. As a result, we estimated enzymatic reaction contents of 227 common ancestors from 228 extant species, and found that links of several specific metabolites have frequently changed during the course of evolution.
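
The abstract states that a parsimony algorithm was applied to the enzymatic reaction set but does not say which one. As a hedged illustration, the sketch below uses Fitch parsimony (an assumption, swapped in here as a standard choice) to infer the presence or absence of a single reaction at every ancestor of a rooted binary tree. The tree encoding, the tie-breaking rule, and the species names are hypothetical.

```python
def fitch_sets(tree, leaf_state):
    """Bottom-up Fitch pass over a tree of nested 2-tuples with string leaves."""
    sets = {}

    def visit(node):
        if isinstance(node, str):
            sets[node] = {leaf_state[node]}
        else:
            left, right = node
            visit(left)
            visit(right)
            common = sets[left] & sets[right]
            # Intersection if the children agree on some state, otherwise union.
            sets[node] = common if common else sets[left] | sets[right]

    visit(tree)
    return sets


def fitch_assign(tree, leaf_state):
    """Top-down pass: give every node one state, changing as rarely as possible."""
    sets = fitch_sets(tree, leaf_state)
    states = {}

    def assign(node, parent_state):
        options = sets[node]
        # Keep the parent's state when allowed; otherwise pick deterministically
        # (min() prefers absence, an arbitrary tie-breaking assumption).
        state = parent_state if parent_state in options else min(options)
        states[node] = state
        if not isinstance(node, str):
            for child in node:
                assign(child, state)

    assign(tree, None)
    return states


# Four hypothetical extant species; True means the reaction is present.
tree = (("A", "B"), ("C", "D"))
leaf_state = {"A": True, "B": True, "C": True, "D": False}
states = fitch_assign(tree, leaf_state)
n_ancestors = sum(1 for node in states if not isinstance(node, str))
```

A rooted binary tree with n leaves has n - 1 internal nodes, which is consistent with the 227 common ancestors estimated from 228 extant species in the abstract. Repeating the procedure for every reaction would give an estimated reaction set per ancestor.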


Subject(s)
Eukaryotic Cells/metabolism , Metabolic Networks and Pathways/physiology , Models, Biological , Prokaryotic Cells/metabolism , Animals , Escherichia coli/metabolism , Evolution, Molecular , Methanococcaceae/metabolism , Phylogeny , Species Specificity
16.
Bioinformatics ; 21(7): 912-21, 2005 Apr 01.
Article in English | MEDLINE | ID: mdl-15509606

ABSTRACT

MOTIVATION: It is widely recognized that homology search and ortholog clustering are very useful for analyzing biological sequences. However, recent growth of sequence database size makes homolog detection difficult, and rapid and accurate methods are required. RESULTS: We present a novel method for fast and accurate homology detection, assuming that the Smith-Waterman (SW) scores between all similar sequence pairs in a target database are computed and stored. In this method, SW alignment is computed only if the upper bound, which is derived from our novel inequality, is higher than the given threshold. In contrast to other methods such as FASTA and BLAST, this method is guaranteed to find all sequences whose scores against the query are higher than the specified threshold. Results of computational experiments suggest that the method is dozens of times faster than SSEARCH if genome sequence data of closely related species are available.


Subject(s)
Algorithms , Databases, Protein , Information Storage and Retrieval/methods , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Amino Acid Sequence , Database Management Systems , Molecular Sequence Data , Proteins/analysis , Software
17.
Genome Inform ; 15(2): 266-75, 2004.
Article in English | MEDLINE | ID: mdl-15706512

ABSTRACT

We have studied the projection of protein family data onto a single translated bacterial genome as a solution to visualise relationships between families restricted to bacterial sequences. Any member of any type of family as defined in the Pfam database (domains, signatures, etc.) is considered as a protein module. Our first goal is to discover rules correlating the occurrence of modules with biochemical properties. To achieve this goal we have developed a platform to quantify information found in protein databases and to support the analysis of the nature of modules, their position and corresponding frequencies of occurrence (in isolation or in combination) in association with pathway knowledge as found in KEGG. This paper focuses on two pathways, the two-component system and the aminophosphonate metabolism, that are partially but not completely documented. Proteins involved in those pathways were listed separately in each organism to analyse module composition, and rules constraining pathway interactions were identified. It is shown how these results can be used to update KEGG pathways and orthologue tables.


Subject(s)
Databases, Genetic , Databases, Protein , Genome , Proteins , Animals , Computational Biology , Computer Graphics , Gene Expression Profiling , Humans , Information Storage and Retrieval , Multigene Family , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Sequence Homology
18.
Genome Inform ; 15(1): 93-104, 2004.
Article in English | MEDLINE | ID: mdl-15712113

ABSTRACT

Homology data are among the most important information used to predict the functions of unknown proteins and thus fast and accurate methods are needed. In this paper, we propose a new approach for fast and accurate homology search using pre-computed all-against-all similarity scores in a target database. We previously developed a method for derivation of an upper bound of the Smith-Waterman score (SW-score) between a query and a homolog candidate sequence using the SW-score between the candidate and a sequence similar to the query. In this paper, by using this upper bound, we first cluster the sequences in the target database so that upper bounds of SW-scores for all the members in the clusters are less than a given value and select representative sequences for respective clusters. Then, the query sequence is searched against the representative sequences and the upper bounds of SW-scores for respective clusters are estimated. Only if the upper bound is higher than a given threshold, SW-alignments are computed for all the sequences in the cluster. We performed computational experiments to test efficiency of the proposed method for the KEGG/GENES database using the KEGG/SSDB. The results suggest that our method is efficient for redundant databases that include multiple closely related species.
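
The cluster-level pruning described in this abstract can be sketched structurally as follows. The abstract does not state the inequality used to bound the SW-score, so the sketch takes the scoring function and the upper-bound function as parameters. The toy stand-ins below (integers as "sequences", negative distance as similarity, and a triangle-inequality bound) are assumptions chosen only because they yield a provably valid bound; they are not the authors' method.

```python
def search_with_pruning(query, clusters, score, upper_bound, threshold):
    """Return (hits, n_scored): members scoring above threshold, and how many
    member comparisons were actually computed.

    clusters:    list of (representative, members)
    score:       callable(a, b) -> similarity (stand-in for an SW-score)
    upper_bound: callable(rep_score, representative, members) -> a bound on
                 score(query, m) for every member m of the cluster
    """
    hits, n_scored = [], 0
    for rep, members in clusters:
        rep_score = score(query, rep)
        if upper_bound(rep_score, rep, members) <= threshold:
            continue  # no member can exceed the threshold; skip the cluster
        for m in members:
            n_scored += 1
            s = score(query, m)
            if s > threshold:
                hits.append((m, s))
    return hits, n_scored


# Toy stand-ins (assumptions): similarity is negative distance on the number line.
def toy_score(a, b):
    return -abs(a - b)


def toy_upper_bound(rep_score, rep, members):
    # Triangle inequality: |q - m| >= |q - rep| - |rep - m|, hence
    # score(q, m) <= score(q, rep) + cluster radius.
    radius = max(abs(rep - m) for m in members)
    return rep_score + radius


clusters = [(10, [9, 10, 12]), (100, [98, 100, 101])]
hits, n_scored = search_with_pruning(11, clusters, toy_score, toy_upper_bound, -3)
```

Because the bound is never smaller than any member's true score, skipping a cluster cannot lose a hit. The saving comes from clusters such as the second one, which is dismissed after a single comparison against its representative.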


Subject(s)
Databases, Factual , Sequence Homology, Nucleic Acid , Algorithms , Base Sequence , Cluster Analysis , Databases, Nucleic Acid , Escherichia coli/classification , Escherichia coli/genetics , Models, Genetic , Phylogeny , Salmonella/classification , Salmonella/genetics , Shigella flexneri/classification , Shigella flexneri/genetics , Templates, Genetic
19.
Genome Inform ; 13: 61-70, 2002.
Article in English | MEDLINE | ID: mdl-14571375

ABSTRACT

In recent years, the analysis of orthologous genes based on phylogenetic profiles has gained popularity in bioinformatics. We propose a new method to extract organism groups and their hierarchy from phylogenetic profiles using independent component analysis (ICA). The method involves first finding independent axes in the projected space from the multivariate data matrix representing phylogenetic profiles for a number of orthologous genes. Then the extracted axes are correlated with major organism groups, according to the extent of affiliation of axes scores for all the genes to specific organisms. The ICA was applied to the phylogenetic profiles created for 2,875 orthologs in 77 organisms by using the KEGG/GENES database. The 9 extracted components out of 18 predefined components represented the organism groups categorized in KEGG well. Furthermore, we performed a cluster analysis and obtained the hierarchy of organism groups.
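
The input to the method in this abstract is a data matrix of phylogenetic profiles. A minimal sketch of how such a matrix could be assembled is given below, assuming a binary presence/absence encoding (the abstract does not specify the encoding). The function name, ortholog identifiers, and organism codes are illustrative only. The resulting matrix could then be passed to an ICA implementation, for example FastICA in scikit-learn, which is not shown here.

```python
def phylogenetic_profiles(ortholog_members, organisms):
    """Build a binary matrix: rows are ortholog groups, columns are organisms.

    ortholog_members: {ortholog group id: set of organisms that have a member}
    organisms:        ordered list of organism codes (column order)
    """
    rows = sorted(ortholog_members)
    matrix = [
        [1 if org in ortholog_members[group] else 0 for org in organisms]
        for group in rows
    ]
    return rows, matrix


# Toy example with invented membership data.
organisms = ["eco", "bsu", "sce", "hsa"]
ortholog_members = {
    "K00001": {"eco", "bsu", "sce", "hsa"},  # present everywhere
    "K00002": {"eco", "bsu"},                # only in the two bacteria
    "K00003": {"sce", "hsa"},                # only in the two eukaryotes
}
rows, matrix = phylogenetic_profiles(ortholog_members, organisms)
```

In the toy matrix, the second and third rows have complementary patterns that separate the bacterial columns from the eukaryotic ones, which is the kind of group structure the extracted components are meant to capture.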


Subject(s)
Computational Biology/methods , Data Interpretation, Statistical , Phylogeny , Sequence Analysis, DNA/methods , Animals , Genome , Humans