H-InvDB in 2009: extended database and data mining resources for human genes and transcripts.

Yamasaki, Chisato; Murakami, Katsuhiko; Takeda, Jun-ichi; Sato, Yoshiharu; Noda, Akiko; Sakate, Ryuichi; Habara, Takuya; Nakaoka, Hajime; Todokoro, Fusano; Matsuya, Akihiro; Imanishi, Tadashi; Gojobori, Takashi.

Nucleic Acids Res ; 38(Database issue): D626-32, 2010 Jan.

Article in English | MEDLINE | ID: mdl-19933760

ABSTRACT

We report the extended database and data mining resources newly released in the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). H-InvDB is a comprehensive annotation resource of human genes and transcripts, and consists of two main views and six sub-databases. The latest release of H-InvDB (release 6.2) provides the annotation for 219,765 human transcripts in 43,159 human gene clusters based on human full-length cDNAs and mRNAs. H-InvDB now provides several new annotation features, such as mapping of microarray probes, new gene models, relation to known ncRNAs and information from the Glycogene database. H-InvDB also provides useful data mining resources: 'Navigation search', 'H-InvDB Enrichment Analysis Tool (HEAT)' and web service APIs. 'Navigation search' is an extended search system that enables complicated searches by combining 16 different search options. HEAT is a data mining tool for automatically identifying features specific to a given human gene set. HEAT searches for H-InvDB annotations that are significantly enriched in a user-defined gene set, as compared with the entire H-InvDB representative transcripts. H-InvDB now has SOAP and REST web service APIs that allow H-InvDB data to be used in programs, giving users extended data accessibility.

Subject(s)

Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Multigene Family , Computational Biology/trends , DNA, Complementary/metabolism , Genome, Human , Humans , Information Storage and Retrieval/methods , Internet , Models, Genetic , Oligonucleotide Array Sequence Analysis , Protein Structure, Tertiary , RNA, Messenger/metabolism , Software , User-Computer Interface
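
The enrichment idea described for HEAT in the abstract above (an annotation that is over-represented in a user-defined gene set relative to the full background set) can be illustrated with a one-sided hypergeometric test. This is only a minimal sketch: the abstract does not say which statistical test HEAT uses, so the choice of the hypergeometric test, the function name and the toy numbers below are all assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(total, annotated, set_size, hits):
    """Return P(X >= hits) for a hypergeometric draw.

    total     -- number of background items (hypothetical: all representative transcripts)
    annotated -- background items carrying the annotation of interest
    set_size  -- size of the user-defined gene set
    hits      -- items in the gene set that carry the annotation
    """
    upper = min(annotated, set_size)
    denom = comb(total, set_size)
    # Sum the probability mass of observing `hits` or more annotated items.
    tail = sum(
        comb(annotated, k) * comb(total - annotated, set_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy numbers, not taken from H-InvDB: 1000 background items, 50 annotated,
    # a gene set of 20 items of which 6 carry the annotation.
    p = hypergeom_enrichment_p(total=1000, annotated=50, set_size=20, hits=6)
    print(f"one-sided enrichment p-value: {p:.3g}")
```

In practice such a test would be repeated for every annotation term and followed by a multiple-testing correction; whether and how HEAT does this is not stated in the abstract.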

The Rice Annotation Project Database (RAP-DB): 2008 update.

Tanaka, Tsuyoshi; Antonio, Baltazar A; Kikuchi, Shoshi; Matsumoto, Takashi; Nagamura, Yoshiaki; Numa, Hisataka; Sakai, Hiroaki; Wu, Jianzhong; Itoh, Takeshi; Sasaki, Takuji; Aono, Ryo; Fujii, Yasuyuki; Habara, Takuya; Harada, Erimi; Kanno, Masako; Kawahara, Yoshihiro; Kawashima, Hiroaki; Kubooka, Hiromi; Matsuya, Akihiro; Nakaoka, Hajime; Saichi, Naomi; Sanbonmatsu, Ryoko; Sato, Yoshiharu; Shinso, Yuji; Suzuki, Mami; Takeda, Jun-ichi; Tanino, Motohiko; Todokoro, Fusano; Yamaguchi, Kaori; Yamamoto, Naoyuki; Yamasaki, Chisato; Imanishi, Tadashi; Okido, Toshihisa; Tada, Masahito; Ikeo, Kazuho; Tateno, Yoshio; Gojobori, Takashi; Lin, Yao-Cheng; Wei, Fu-Jin; Hsing, Yue-ie; Zhao, Qiang; Han, Bin; Kramer, Melissa R; McCombie, Richard W; Lonsdale, David; O'Donovan, Claire C; Whitfield, Eleanor J; Apweiler, Rolf; Koyanagi, Kanako O; Khurana, Jitendra P.

Nucleic Acids Res ; 36(Database issue): D1028-33, 2008 Jan.

Article in English | MEDLINE | ID: mdl-18089549

ABSTRACT

The Rice Annotation Project Database (RAP-DB) was created to provide the genome sequence assembly of the International Rice Genome Sequencing Project (IRGSP), manually curated annotation of the sequence, and other genomics information that could be useful for comprehensive understanding of the rice biology. Since the last publication of the RAP-DB, the IRGSP genome has been revised and reassembled. In addition, a large number of rice-expressed sequence tags have been released, and functional genomics resources have been produced worldwide. Thus, we have thoroughly updated our genome annotation by manual curation of all the functional descriptions of rice genes. The latest version of the RAP-DB contains a variety of annotation data as follows: clone positions, structures and functions of 31 439 genes validated by cDNAs, RNA genes detected by massively parallel signature sequencing (MPSS) technology and sequence similarity, flanking sequences of mutant lines, transposable elements, etc. Other annotation data such as Gnomon can be displayed along with those of RAP for comparison. We have also developed a new keyword search system to allow the user to access useful information. The RAP-DB is available at: http://rapdb.dna.affrc.go.jp/ and http://rapdb.lab.nig.ac.jp/.

Subject(s)

Databases, Nucleic Acid , Genome, Plant , Oryza/genetics , Genes, Plant , Genomics , Internet , MicroRNAs/genetics , RNA, Small Interfering/genetics , User-Computer Interface

The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts.

Yamasaki, Chisato; Murakami, Katsuhiko; Fujii, Yasuyuki; Sato, Yoshiharu; Harada, Erimi; Takeda, Jun-ichi; Taniya, Takayuki; Sakate, Ryuichi; Kikugawa, Shingo; Shimada, Makoto; Tanino, Motohiko; Koyanagi, Kanako O; Barrero, Roberto A; Gough, Craig; Chun, Hong-Woo; Habara, Takuya; Hanaoka, Hideki; Hayakawa, Yosuke; Hilton, Phillip B; Kaneko, Yayoi; Kanno, Masako; Kawahara, Yoshihiro; Kawamura, Toshiyuki; Matsuya, Akihiro; Nagata, Naoki; Nishikata, Kensaku; Noda, Akiko Ogura; Nurimoto, Shin; Saichi, Naomi; Sakai, Hiroaki; Sanbonmatsu, Ryoko; Shiba, Rie; Suzuki, Mami; Takabayashi, Kazuhiko; Takahashi, Aiko; Tamura, Takuro; Tanaka, Masayuki; Tanaka, Susumu; Todokoro, Fusano; Yamaguchi, Kaori; Yamamoto, Naoyuki; Okido, Toshihisa; Mashima, Jun; Hashizume, Aki; Jin, Lihua; Lee, Kyung-Bum; Lin, Yi-Chueh; Nozaki, Asami; Sakai, Katsunaga; Tada, Masahito.

Nucleic Acids Res ; 36(Database issue): D793-9, 2008 Jan.

Article in English | MEDLINE | ID: mdl-18089548

ABSTRACT

Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release H-InvDB_4.6. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein-protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views (Transcript view and Locus view) and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group.

Subject(s)

Databases, Genetic , Genes , RNA, Messenger/chemistry , Animals , Chromosome Mapping , DNA, Complementary/chemistry , Humans , Internet , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , RNA, Messenger/genetics , User-Computer Interface

Evola: Ortholog database of all human genes in H-InvDB with manual curation of phylogenetic trees.

Matsuya, Akihiro; Sakate, Ryuichi; Kawahara, Yoshihiro; Koyanagi, Kanako O; Sato, Yoshiharu; Fujii, Yasuyuki; Yamasaki, Chisato; Habara, Takuya; Nakaoka, Hajime; Todokoro, Fusano; Yamaguchi, Kaori; Endo, Toshinori; Oota, Satoshi; Makalowski, Wojciech; Ikeo, Kazuho; Suzuki, Yoshiyuki; Hanada, Kousuke; Hashimoto, Katsuyuki; Hirai, Momoki; Iwama, Hisakazu; Saitou, Naruya; Hiraki, Aiko T; Jin, Lihua; Kaneko, Yayoi; Kanno, Masako; Murakami, Katsuhiko; Noda, Akiko Ogura; Saichi, Naomi; Sanbonmatsu, Ryoko; Suzuki, Mami; Takeda, Jun-ichi; Tanaka, Masayuki; Gojobori, Takashi; Imanishi, Tadashi; Itoh, Takeshi.

Nucleic Acids Res ; 36(Database issue): D787-92, 2008 Jan.

Article in English | MEDLINE | ID: mdl-17982176

ABSTRACT

Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Currently, with the rapid growth of transcriptome data of various species, more reliable orthology information is a prerequisite for further studies. However, detection of orthologs could be erroneous if pairwise distance-based methods, such as reciprocal BLAST searches, are utilized. Thus, as a sub-database of H-InvDB, an integrated database of annotated human genes (http://h-invitational.jp/), we constructed a fully curated database of evolutionary features of human genes, called 'Evola'. In the process of the ortholog detection, computational analysis based on conserved genome synteny and transcript sequence similarity was followed by manual curation by researchers examining phylogenetic trees. In total, 18 968 human genes have orthologs among 11 vertebrates (chimpanzee, mouse, cow, chicken, zebrafish, etc.), either computationally detected or manually curated. Evola provides amino acid sequence alignments and phylogenetic trees of orthologs and homologs. In 'd(N)/d(S) view', natural selection on genes can be analyzed between human and other species. In 'Locus maps', all transcript variants and their exon/intron structures can be compared among orthologous gene loci. We expect Evola to serve as a comprehensive and reliable database to be utilized in comparative analyses for obtaining new knowledge about human genes. Evola is available at http://www.h-invitational.jp/evola/.

Subject(s)

Databases, Genetic , Genes , Genome, Human , Phylogeny , Animals , Computational Biology , Genomics , Humans , Internet , RNA, Messenger/chemistry , Selection, Genetic , Sequence Alignment , Sequence Analysis, Protein , Synteny
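
For readers unfamiliar with the pairwise approach that the Evola abstract above contrasts itself with, the following is a minimal sketch of reciprocal-best-hit pairing. It is not Evola's method (Evola combines synteny, sequence similarity and manual curation of phylogenetic trees); the score tables, gene identifiers and function names are hypothetical, and real BLAST output parsing is deliberately left out.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject (ties broken arbitrarily)."""
    return {
        query: max(subjects, key=subjects.get)
        for query, subjects in scores.items()
        if subjects
    }


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical similarity scores between two species' genes.
    human_vs_mouse = {"h1": {"m1": 90, "m2": 50}, "h2": {"m1": 85, "m2": 80}}
    mouse_vs_human = {"m1": {"h1": 88, "h2": 70}, "m2": {"h2": 75}}
    print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))
```

In this toy input, h2's best hit is m1 but m1's best hit is h1, so h2 is left unpaired even though m2 points back to it. Cases like this are one reason tree-based curation can give a different answer from purely pairwise scoring.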

TACT: Transcriptome Auto-annotation Conducting Tool of H-InvDB.

Yamasaki, Chisato; Kawashima, Hiroaki; Todokoro, Fusano; Imamizu, Yasuhiro; Ogawa, Makoto; Tanino, Motohiko; Itoh, Takeshi; Gojobori, Takashi; Imanishi, Tadashi.

Nucleic Acids Res ; 34(Web Server issue): W345-9, 2006 Jul 01.

Article in English | MEDLINE | ID: mdl-16845023

ABSTRACT

Transcriptome Auto-annotation Conducting Tool (TACT) is a newly developed web-based automated tool for conducting functional annotation of transcripts by the integration of sequence similarity searches and functional motif predictions. We developed the TACT system by integrating two kinds of similarity searches, FASTY and BLASTX, against protein sequence databases, UniProtKB (Swiss-Prot/TrEMBL) and RefSeq, and a unified motif prediction program, InterProScan, into the ORF-prediction pipeline originally designed for the 'H-Invitational' human transcriptome annotation project. This system successively applies these constituent programs to an mRNA sequence in order to predict the most plausible ORF and the function of the protein encoded. In this study, we applied the TACT system to 19 574 non-redundant human transcripts registered in H-InvDB and evaluated its predictive power by the degree of agreement with human-curated functional annotation in H-InvDB. As a result, the TACT system could assign functional description to 12 559 transcripts (64.2%), the remainder being hypothetical proteins. Furthermore, the overall agreement of functional annotation with H-InvDB, including those transcripts annotated as hypothetical proteins, was 83.9% (16 432/19 574). These results show that the TACT system is useful for functional annotation and that the prediction of ORFs and protein functions is highly accurate and close to the results of human curation. TACT is freely available at http://www.jbirc.aist.go.jp/tact/.

Subject(s)

RNA, Messenger/chemistry , Sequence Analysis/methods , Software , Amino Acid Motifs , Computational Biology/methods , DNA, Complementary/chemistry , Databases, Protein , Expressed Sequence Tags/chemistry , Humans , Internet , Open Reading Frames , Proteins/genetics , Proteins/physiology , Sequence Analysis, DNA , Sequence Analysis, RNA , Systems Integration , User-Computer Interface
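
The percentages quoted in the TACT abstract above can be reproduced from the reported counts. The short check below uses only numbers stated in the abstract; the helper name `percent` is an illustrative choice.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


TOTAL_TRANSCRIPTS = 19574   # non-redundant human transcripts evaluated
WITH_DESCRIPTION = 12559    # transcripts assigned a functional description
IN_AGREEMENT = 16432        # transcripts whose annotation agreed with H-InvDB

if __name__ == "__main__":
    print(percent(WITH_DESCRIPTION, TOTAL_TRANSCRIPTS))  # 64.2
    print(percent(IN_AGREEMENT, TOTAL_TRANSCRIPTS))      # 83.9
    # Remainder annotated as hypothetical proteins.
    print(TOTAL_TRANSCRIPTS - WITH_DESCRIPTION)          # 7015
```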
