Search | BVS Regional Portal

Positional correlation analysis improves reconstruction of full-length transcripts and alternative isoforms from noisy array signals or short reads.

Kawaguchi, Shuji; Iida, Kei; Harada, Erimi; Hanada, Kousuke; Matsui, Akihiro; Okamoto, Masanori; Shinozaki, Kazuo; Seki, Motoaki; Toyoda, Tetsuro.

Bioinformatics; 28(7): 929-37, 2012 Apr 01.

Article in English | MEDLINE | ID: mdl-22332235

ABSTRACT

MOTIVATION: Reconstructing full-length transcripts observed by next-generation sequencers or tiling arrays is an essential technique for characterizing all phenomena of the transcriptome. Several reconstruction techniques have been developed. However, high levels of noise and bias remain a problem and interfere with reconstruction. A method is required that is robust against noise and bias and correctly reconstructs transcripts regardless of the equipment used.

RESULTS: We propose a completely new statistical method that reconstructs full-length transcripts and can be applied to both next-generation sequencers and tiling arrays. The method, called ARTADE2, analyzes 'positional correlation', meaning correlations of expression values for every combination of genomic positions across multiple transcriptional datasets. ARTADE2 then reconstructs full-length transcripts using a logistic model based on the positional correlation and a Markov model. ARTADE2 elucidated 17 591 full-length transcripts from 55 transcriptome datasets and showed notable performance compared with other recent prediction methods. Moreover, 1489 novel transcripts were discovered. We experimentally tested 16 novel transcripts, among which 14 were confirmed by reverse transcription-polymerase chain reaction and sequence mapping. The method also showed notable performance in reconstructing mRNA observed by a next-generation sequencer. Moreover, the positional correlation and factor analysis embedded in ARTADE2 successfully detected regions at which alternative isoforms may exist, and thus are expected to be applied to discovering transcript biomarkers in a wide range of disciplines, including preemptive medicine.

AVAILABILITY: http://matome.base.riken.jp

CONTACT: toyoda@base.riken.jp

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Computational Biology/methods; Gene Expression Profiling/methods; Sequence Analysis, DNA/methods; Transcriptome; Algorithms; Logistic Models; Markov Chains; Protein Isoforms/genetics; RNA, Messenger/genetics
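
The 'positional correlation' named in the ARTADE2 abstract above (correlations of expression values between genomic positions across multiple datasets) can be illustrated with a small sketch. This is not the ARTADE2 implementation: the function name `positional_correlation`, the simulated data, and the use of Pearson correlation are assumptions made for illustration, and the logistic and Markov model steps are not shown.

```python
import numpy as np

def positional_correlation(expr):
    """Pearson correlation between every pair of genomic positions.

    expr: array of shape (n_datasets, n_positions), one row per
    transcriptome dataset (hypothetical layout, assumed here).
    Returns an (n_positions, n_positions) correlation matrix.
    """
    expr = np.asarray(expr, dtype=float)
    # rowvar=False treats each column (position) as one variable.
    return np.corrcoef(expr, rowvar=False)

# Simulated toy data: positions 0-3 belong to one transcript and
# positions 4-6 to another. Each transcript has its own expression
# level per dataset, observed with additive noise.
rng = np.random.default_rng(0)
n_datasets = 55
level_a = rng.normal(size=n_datasets)
level_b = rng.normal(size=n_datasets)
signal = np.column_stack([level_a] * 4 + [level_b] * 3)
expr = signal + rng.normal(scale=0.3, size=signal.shape)

corr = positional_correlation(expr)

# Mean correlation among positions of the first transcript versus
# mean correlation between positions of different transcripts.
within_a = corr[:4, :4][np.triu_indices(4, k=1)].mean()
between = corr[:4, 4:].mean()
print(f"within transcript A: {within_a:.2f}, between A and B: {between:.2f}")
```

In this toy setting, positions that share a transcript rise and fall together across datasets, so their pairwise correlations are high, while positions from independently expressed transcripts correlate only weakly. That block structure is the kind of signal a boundary-finding model could exploit.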

ARTADE2DB: improved statistical inferences for Arabidopsis gene functions and structure predictions by dynamic structure-based dynamic expression (DSDE) analyses.

Iida, Kei; Kawaguchi, Shuji; Kobayashi, Norio; Yoshida, Yuko; Ishii, Manabu; Harada, Erimi; Hanada, Kousuke; Matsui, Akihiro; Okamoto, Masanori; Ishida, Junko; Tanaka, Maho; Morosawa, Taeko; Seki, Motoaki; Toyoda, Tetsuro.

Plant Cell Physiol; 52(2): 254-64, 2011 Feb.

Article in English | MEDLINE | ID: mdl-21227933

ABSTRACT

Recent advances in technologies for observing high-resolution genomic activities, such as whole-genome tiling arrays and high-throughput sequencers, provide detailed information for understanding genome functions. However, the functions of 50% of known Arabidopsis thaliana genes remain unknown or are annotated only on the basis of static analyses such as protein motifs or similarities. In this paper, we describe dynamic structure-based dynamic expression (DSDE) analysis, which sequentially predicts both structural and functional features of transcripts. We show that DSDE analysis inferred gene functions 12% more precisely than static structure-based dynamic expression (SSDE) analysis or conventional co-expression analysis based on previously determined gene structures of A. thaliana. This result suggests that more precise structural information than the fixed conventional annotated structures is crucial for co-expression analysis in systems biology of transcriptional regulation and dynamics. Our DSDE method, ARabidopsis Tiling-Array-based Detection of Exons version 2 and over-representation analysis (ARTADE2-ORA), precisely predicts each gene structure by combining two statistical analyses: a probe-wise co-expression analysis of multiple transcriptome measurements and a Markov model analysis of genome sequences. ARTADE2-ORA successfully identified the true functions of about 90% of functionally annotated genes, inferred the functions of 98% of functionally unknown genes and predicted 1,489 new gene structures and functions. We developed a database ARTADE2DB that integrates not only the information predicted by ARTADE2-ORA but also annotations and other functional information, such as phenotypes and literature citations, and is expected to contribute to the study of the functional genomics of A. thaliana. URL: http://artade.org.

Subjects

Arabidopsis/genetics; Databases, Genetic; Genomics/methods; Exons; Gene Expression Profiling; Genome, Plant; Markov Chains; Models, Statistical; Sequence Analysis, DNA/methods; Structure-Activity Relationship; Systems Biology; User-Computer Interface
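
The ARTADE2DB abstract above mentions over-representation analysis (ORA) but does not say which statistical test is used. A common choice for ORA in general is a one-sided hypergeometric test, and the sketch below assumes that choice. The function name `ora_p_value` and all counts in the example are hypothetical.

```python
from math import comb

def ora_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background set
    K: background genes annotated with the term of interest
    n: genes in the test set (e.g. a co-expressed group)
    k: test-set genes annotated with the term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probabilities of observing k or more annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical counts: 1000 background genes, 40 carry the term,
# and 8 of the 50 genes co-expressed with a query gene carry it.
p = ora_p_value(k=8, n=50, K=40, N=1000)
print(f"p = {p:.3g}")
```

By chance, about 2 of the 50 genes would be expected to carry the term (50 × 40 / 1000), so observing 8 gives a small p-value. A real analysis would also correct for testing many terms.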

The Rice Annotation Project Database (RAP-DB): 2008 update.

Tanaka, Tsuyoshi; Antonio, Baltazar A; Kikuchi, Shoshi; Matsumoto, Takashi; Nagamura, Yoshiaki; Numa, Hisataka; Sakai, Hiroaki; Wu, Jianzhong; Itoh, Takeshi; Sasaki, Takuji; Aono, Ryo; Fujii, Yasuyuki; Habara, Takuya; Harada, Erimi; Kanno, Masako; Kawahara, Yoshihiro; Kawashima, Hiroaki; Kubooka, Hiromi; Matsuya, Akihiro; Nakaoka, Hajime; Saichi, Naomi; Sanbonmatsu, Ryoko; Sato, Yoshiharu; Shinso, Yuji; Suzuki, Mami; Takeda, Jun-ichi; Tanino, Motohiko; Todokoro, Fusano; Yamaguchi, Kaori; Yamamoto, Naoyuki; Yamasaki, Chisato; Imanishi, Tadashi; Okido, Toshihisa; Tada, Masahito; Ikeo, Kazuho; Tateno, Yoshio; Gojobori, Takashi; Lin, Yao-Cheng; Wei, Fu-Jin; Hsing, Yue-ie; Zhao, Qiang; Han, Bin; Kramer, Melissa R; McCombie, Richard W; Lonsdale, David; O'Donovan, Claire C; Whitfield, Eleanor J; Apweiler, Rolf; Koyanagi, Kanako O; Khurana, Jitendra P.

Nucleic Acids Res; 36(Database issue): D1028-33, 2008 Jan.

Article in English | MEDLINE | ID: mdl-18089549

ABSTRACT

The Rice Annotation Project Database (RAP-DB) was created to provide the genome sequence assembly of the International Rice Genome Sequencing Project (IRGSP), manually curated annotation of the sequence, and other genomics information that could be useful for a comprehensive understanding of rice biology. Since the last publication of the RAP-DB, the IRGSP genome has been revised and reassembled. In addition, a large number of rice expressed sequence tags have been released, and functional genomics resources have been produced worldwide. Thus, we have thoroughly updated our genome annotation by manual curation of all the functional descriptions of rice genes. The latest version of the RAP-DB contains a variety of annotation data, including clone positions, structures and functions of 31 439 genes validated by cDNAs, RNA genes detected by massively parallel signature sequencing (MPSS) technology and sequence similarity, flanking sequences of mutant lines, transposable elements, etc. Other annotation data such as Gnomon can be displayed along with those of RAP for comparison. We have also developed a new keyword search system to allow the user to access useful information. The RAP-DB is available at: http://rapdb.dna.affrc.go.jp/ and http://rapdb.lab.nig.ac.jp/.

Subjects

Databases, Nucleic Acid; Genome, Plant; Oryza/genetics; Genes, Plant; Genomics; Internet; MicroRNAs/genetics; RNA, Small Interfering/genetics; User-Computer Interface

The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts.

Yamasaki, Chisato; Murakami, Katsuhiko; Fujii, Yasuyuki; Sato, Yoshiharu; Harada, Erimi; Takeda, Jun-ichi; Taniya, Takayuki; Sakate, Ryuichi; Kikugawa, Shingo; Shimada, Makoto; Tanino, Motohiko; Koyanagi, Kanako O; Barrero, Roberto A; Gough, Craig; Chun, Hong-Woo; Habara, Takuya; Hanaoka, Hideki; Hayakawa, Yosuke; Hilton, Phillip B; Kaneko, Yayoi; Kanno, Masako; Kawahara, Yoshihiro; Kawamura, Toshiyuki; Matsuya, Akihiro; Nagata, Naoki; Nishikata, Kensaku; Noda, Akiko Ogura; Nurimoto, Shin; Saichi, Naomi; Sakai, Hiroaki; Sanbonmatsu, Ryoko; Shiba, Rie; Suzuki, Mami; Takabayashi, Kazuhiko; Takahashi, Aiko; Tamura, Takuro; Tanaka, Masayuki; Tanaka, Susumu; Todokoro, Fusano; Yamaguchi, Kaori; Yamamoto, Naoyuki; Okido, Toshihisa; Mashima, Jun; Hashizume, Aki; Jin, Lihua; Lee, Kyung-Bum; Lin, Yi-Chueh; Nozaki, Asami; Sakai, Katsunaga; Tada, Masahito.

Nucleic Acids Res; 36(Database issue): D793-9, 2008 Jan.

Article in English | MEDLINE | ID: mdl-18089548

ABSTRACT

Here we report the new features and improvements in our latest release of the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/), a comprehensive annotation resource for human genes and transcripts. H-InvDB, originally developed as an integrated database of the human transcriptome based on extensive annotation of large sets of full-length cDNA (FLcDNA) clones, now provides annotation for 120 558 human mRNAs extracted from the International Nucleotide Sequence Databases (INSD), in addition to 54 978 human FLcDNAs, in the latest release H-InvDB_4.6. We mapped those human transcripts onto the human genome sequences (NCBI build 36.1) and determined 34 699 human gene clusters, which could define 34 057 (98.1%) protein-coding and 642 (1.9%) non-protein-coding loci; 858 (2.5%) transcribed loci overlapped with predicted pseudogenes. For all these transcripts and genes, we provide comprehensive annotation including gene structures, gene functions, alternative splicing variants, functional non-protein-coding RNAs, functional domains, predicted subcellular localizations, metabolic pathways, predictions of protein 3D structure, mapping of SNPs and microsatellite repeat motifs, co-localization with orphan diseases, gene expression profiles, orthologous genes, protein-protein interactions (PPI) and annotation for gene families. The current H-InvDB annotation resources consist of two main views (Transcript view and Locus view) and eight sub-databases: the DiseaseInfo Viewer, H-ANGEL, the Clustering Viewer, G-integra, the TOPO Viewer, Evola, the PPI view and the Gene family/group.

Subjects

Databases, Genetic; Genes; RNA, Messenger/chemistry; Animals; Chromosome Mapping; DNA, Complementary/chemistry; Humans; Internet; Proteins/chemistry; Proteins/genetics; Proteins/metabolism; RNA, Messenger/genetics; User-Computer Interface
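
The locus counts and percentages quoted in the H-InvDB abstract above can be checked with simple arithmetic. The short sketch below uses only the figures given in the abstract; the variable names and the `percent` helper are illustrative.

```python
# Locus counts quoted in the H-InvDB_4.6 abstract.
total_clusters = 34_699
protein_coding = 34_057
non_protein_coding = 642
pseudogene_overlap = 858

def percent(part, whole):
    """Format part/whole as a percentage with one decimal place."""
    return f"{100 * part / whole:.1f}%"

# The two locus classes should account for every gene cluster.
assert protein_coding + non_protein_coding == total_clusters

print("protein-coding:    ", percent(protein_coding, total_clusters))
print("non-protein-coding:", percent(non_protein_coding, total_clusters))
print("pseudogene overlap:", percent(pseudogene_overlap, total_clusters))
```

The two locus classes sum exactly to the 34 699 clusters, and each percentage is taken relative to that total.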

Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana.

Itoh, Takeshi; Tanaka, Tsuyoshi; Barrero, Roberto A; Yamasaki, Chisato; Fujii, Yasuyuki; Hilton, Phillip B; Antonio, Baltazar A; Aono, Hideo; Apweiler, Rolf; Bruskiewich, Richard; Bureau, Thomas; Burr, Frances; Costa de Oliveira, Antonio; Fuks, Galina; Habara, Takuya; Haberer, Georg; Han, Bin; Harada, Erimi; Hiraki, Aiko T; Hirochika, Hirohiko; Hoen, Douglas; Hokari, Hiroki; Hosokawa, Satomi; Hsing, Yue-ie; Ikawa, Hiroshi; Ikeo, Kazuho; Imanishi, Tadashi; Ito, Yukiyo; Jaiswal, Pankaj; Kanno, Masako; Kawahara, Yoshihiro; Kawamura, Toshiyuki; Kawashima, Hiroaki; Khurana, Jitendra P; Kikuchi, Shoshi; Komatsu, Setsuko; Koyanagi, Kanako O; Kubooka, Hiromi; Lieberherr, Damien; Lin, Yao-Cheng; Lonsdale, David; Matsumoto, Takashi; Matsuya, Akihiro; McCombie, W Richard; Messing, Joachim; Miyao, Akio; Mulder, Nicola; Nagamura, Yoshiaki; Nam, Jongmin; Namiki, Nobukazu.

Genome Res; 17(2): 175-83, 2007 Feb.

Article in English | MEDLINE | ID: mdl-17210932

ABSTRACT

We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is approximately 32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene.

Subjects

Arabidopsis/genetics; Genome, Plant; Oryza/genetics; Arabidopsis Proteins/genetics; Codon/genetics; DNA, Complementary/genetics; DNA, Plant/genetics; Databases, Protein; Evolution, Molecular; Genetic Variation; Mutagenesis, Insertional; Open Reading Frames; Plant Proteins/genetics; RNA, Messenger/genetics; RNA, Plant/genetics; RNA, Transfer/genetics; Species Specificity
