Search | VHL Regional Portal

1.

High sensitivity TSS prediction: estimates of locations where TSS cannot occur.

Schaefer, Ulf; Kodzius, Rimantas; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Bajic, Vladimir B.

PLoS One ; 5(11): e13934, 2010 Nov 15.

Article in English | MEDLINE | ID: mdl-21085627

ABSTRACT

BACKGROUND: Although transcription in mammalian genomes can initiate from various genomic positions (e.g., 3'UTR, coding exons, etc.), most locations on genomes are not prone to transcription initiation. It is of practical and theoretical interest to be able to estimate such collections of non-TSS locations (NTLs). The identification of large portions of NTLs can contribute to better focusing the search for TSS locations and thus contribute to promoter and gene finding. It can help in the assessment of 5' completeness of expressed sequences, contribute to more successful experimental designs, as well as more accurate gene annotation. METHODOLOGY: Using comprehensive collections of Cap Analysis of Gene Expression (CAGE) and other transcript data from mouse and human genomes, we developed a methodology that allows us, by performing computational TSS prediction with very high sensitivity, to annotate, with a high accuracy in a strand specific manner, locations of mammalian genomes that are highly unlikely to harbor transcription start sites (TSSs). The properties of the immediate genomic neighborhood of 98,682 accurately determined mouse and 113,814 human TSSs are used to determine features that distinguish genomic transcription initiation locations from those that are not likely to initiate transcription. In our algorithm we utilize various constraining properties of features identified in the upstream and downstream regions around TSSs, as well as statistical analyses of these surrounding regions. CONCLUSIONS: Our analysis of human chromosomes 4, 21 and 22 estimates ~46%, ~41% and ~27% of these chromosomes, respectively, as being NTLs. This suggests that on average more than 40% of the human genome can be expected to be highly unlikely to initiate transcription. Our method represents the first one that utilizes high-sensitivity TSS prediction to identify, with high accuracy, large portions of mammalian genomes as NTLs. 
The server with our algorithm implemented is available at http://cbrc.kaust.edu.sa/ddm/.

Subject(s)

Algorithms , Computational Biology/methods , Promoter Regions, Genetic/genetics , Transcription Initiation Site , Animals , Base Sequence , Chromosomes, Human, Pair 21/genetics , Chromosomes, Human, Pair 22/genetics , Chromosomes, Human, Pair 4/genetics , Genome/genetics , Genome, Human/genetics , Humans , Internet , Mice , Molecular Sequence Data , Receptors, Opioid, mu/genetics , Reproducibility of Results , Transcription, Genetic

2.

An atlas of combinatorial transcriptional regulation in mouse and man.

Ravasi, Timothy; Suzuki, Harukazu; Cannistraci, Carlo Vittorio; Katayama, Shintaro; Bajic, Vladimir B; Tan, Kai; Akalin, Altuna; Schmeier, Sebastian; Kanamori-Katayama, Mutsumi; Bertin, Nicolas; Carninci, Piero; Daub, Carsten O; Forrest, Alistair R R; Gough, Julian; Grimmond, Sean; Han, Jung-Hoon; Hashimoto, Takehiro; Hide, Winston; Hofmann, Oliver; Kamburov, Atanas; Kaur, Mandeep; Kawaji, Hideya; Kubosaki, Atsutaka; Lassmann, Timo; van Nimwegen, Erik; MacPherson, Cameron Ross; Ogawa, Chihiro; Radovanovic, Aleksandar; Schwartz, Ariel; Teasdale, Rohan D; Tegnér, Jesper; Lenhard, Boris; Teichmann, Sarah A; Arakawa, Takahiro; Ninomiya, Noriko; Murakami, Kayoko; Tagami, Michihira; Fukuda, Shiro; Imamura, Kengo; Kai, Chikatoshi; Ishihara, Ryoko; Kitazume, Yayoi; Kawai, Jun; Hume, David A; Ideker, Trey; Hayashizaki, Yoshihide.

Cell ; 140(5): 744-52, 2010 Mar 05.

Article in English | MEDLINE | ID: mdl-20211142

ABSTRACT

Combinatorial interactions among transcription factors are critical to directing tissue-specific gene expression. To build a global atlas of these combinations, we have screened for physical interactions among the majority of human and mouse DNA-binding transcription factors (TFs). The complete networks contain 762 human and 877 mouse interactions. Analysis of the networks reveals that highly connected TFs are broadly expressed across tissues, and that roughly half of the measured interactions are conserved between mouse and human. The data highlight the importance of TF combinations for determining cell fate, and they lead to the identification of a SMAD3/FLI1 complex expressed during development of immunity. The availability of large TF combinatorial networks in both human and mouse will provide many opportunities to study gene regulation, tissue differentiation, and mammalian evolution.

Subject(s)

Gene Expression Regulation , Gene Regulatory Networks , Transcription Factors/metabolism , Animals , Cell Differentiation , Evolution, Molecular , Humans , Mice , Monocytes/cytology , Organ Specificity , Smad3 Protein/metabolism , Trans-Activators/metabolism

3.

Assessment of adaptive evolution between wheat and rice as deduced from full-length common wheat cDNA sequence data and expression patterns.

Kawaura, Kanako; Mochida, Keiichi; Enju, Akiko; Totoki, Yasushi; Toyoda, Atsushi; Sakaki, Yoshiyuki; Kai, Chikatoshi; Kawai, Jun; Hayashizaki, Yoshihide; Seki, Motoaki; Shinozaki, Kazuo; Ogihara, Yasunari.

BMC Genomics ; 10: 271, 2009 Jun 18.

Article in English | MEDLINE | ID: mdl-19534823

ABSTRACT

BACKGROUND: Wheat is an allopolyploid plant that harbors a huge, complex genome. Therefore, accumulation of expressed sequence tags (ESTs) for wheat is becoming particularly important for functional genomics and molecular breeding. We prepared a comprehensive collection of ESTs from the various tissues that develop during the wheat life cycle and from tissues subjected to stress. We also examined their expression profiles in silico. As full-length cDNAs are indispensable to certify the collected ESTs and annotate the genes in the wheat genome, we performed a systematic survey and sequencing of the full-length cDNA clones. This sequence information is a valuable genetic resource for functional genomics and will enable carrying out comparative genomics in cereals. RESULTS: As part of the functional genomics and development of genomic wheat resources, we have generated a collection of full-length cDNAs from common wheat. By grouping the ESTs of recombinant clones randomly selected from the full-length cDNA library, we were able to sequence 6,162 independent clones with high accuracy. About 10% of the clones were wheat-unique genes, without any counterparts within the DNA database. Wheat clones that showed high homology to those of rice were selected in order to investigate their expression patterns in various tissues throughout the wheat life cycle and in response to abiotic-stress treatments. To assess the variability of genes that have evolved differently in wheat and rice, we calculated the substitution rate (Ka/Ks) of the counterparts in wheat and rice. Genes that were preferentially expressed in certain tissues or treatments had higher Ka/Ks values than those in other tissues and treatments, which suggests that the genes with the higher variability expressed in these tissues are under adaptive selection. 
CONCLUSION: We have generated a high-quality full-length cDNA resource for common wheat, which is essential for continuation of the ongoing curation and annotation of the wheat genome. The data for each clone's expression in various tissues and stress treatments and its variability in wheat and rice as a result of their diversification are valuable tools for functional genomics in wheat and for comparative genomics in cereals.

Subject(s)

Adaptation, Biological/genetics , Evolution, Molecular , Oryza/genetics , Salt-Tolerant Plants/genetics , Triticum/genetics , DNA, Complementary/genetics , DNA, Plant/genetics , Expressed Sequence Tags , Gene Expression Profiling , Gene Expression Regulation, Plant , Gene Library , Genes, Plant , Genomics , Sequence Analysis, DNA , Stress, Physiological

4.

Hidden layers of human small RNAs.

Kawaji, Hideya; Nakamura, Mari; Takahashi, Yukari; Sandelin, Albin; Katayama, Shintaro; Fukuda, Shiro; Daub, Carsten O; Kai, Chikatoshi; Kawai, Jun; Yasuda, Jun; Carninci, Piero; Hayashizaki, Yoshihide.

BMC Genomics ; 9: 157, 2008 Apr 10.

Article in English | MEDLINE | ID: mdl-18402656

ABSTRACT

BACKGROUND: Small RNA attracts increasing interest based on the discovery of RNA silencing and the rapid progress of our understanding of these phenomena. Although recent studies suggest the possible existence of yet undiscovered types of small RNAs in higher organisms, many studies to profile small RNA have focused on miRNA and/or siRNA rather than on the exploration of additional classes of RNAs. RESULTS: Here, we explored human small RNAs by unbiased sequencing of RNAs with sizes of 19-40 nt. We provide substantial evidence for the existence of independent classes of small RNAs. Our data show that well-characterized non-coding RNAs, such as tRNA, snoRNA, and snRNA, are cleaved at sites specific to the class of ncRNA. In particular, tRNA cleavage is regulated depending on tRNA type and tissue expression. We also found small RNAs mapped to genomic regions that are transcribed in both directions by bidirectional promoters, indicating that the small RNAs are a product of dsRNA formation and their subsequent cleavage. Their partial similarity with ribosomal RNAs (rRNAs) suggests unrevealed functions of ribosomal DNA or interstitial rRNA. Further examination revealed six novel miRNAs. CONCLUSION: Our results underscore the complexity of the small RNA world and the biogenesis of small RNAs.

Subject(s)

Evolution, Molecular , RNA/genetics , RNA/metabolism , Base Pairing , Base Sequence , Blotting, Northern , Gene Library , Humans , Molecular Sequence Data , Multigene Family/genetics , RNA/classification , Sequence Alignment , Sequence Analysis, RNA

5.

Towards defining the nuclear proteome.

Fink, J Lynn; Karunaratne, Seetha; Mittal, Amit; Gardiner, Donald M; Hamilton, Nicholas; Mahony, Donna; Kai, Chikatoshi; Suzuki, Harukazu; Hayashizaki, Yoshihide; Teasdale, Rohan D.

Genome Biol ; 9(1): R15, 2008 Jan 23.

Article in English | MEDLINE | ID: mdl-18211718

ABSTRACT

BACKGROUND: The nucleus is a complex cellular organelle and accurately defining its protein content is essential before any systematic characterization can be considered. RESULTS: We report direct evidence for 2,568 mammalian proteins within the nuclear proteome: the nuclear subcellular localization of 1,529 proteins based on a high-throughput subcellular localization protocol of full-length proteins and an additional 1,039 proteins for which clear experimental evidence is documented in published literature. This is direct evidence that the nuclear proteome consists of at least 14% of the entire proteome. This dataset was used to evaluate computational approaches designed to identify additional nuclear proteins. CONCLUSION: This represents direct experimental evidence that the nuclear proteome consists of at least 14% of the entire proteome. This high-quality nuclear proteome dataset was used to evaluate computational approaches designed to identify additional nuclear proteins. Based on this analysis, researchers can determine the stringency and types of lines of evidence they consider to infer the size and complement of the nuclear proteome.

Subject(s)

Cell Nucleus/chemistry , Proteome , Animals , Computational Biology/methods , Humans , Nuclear Proteins

6.

Splicing bypasses 3' end formation signals to allow complex gene architectures.

Frith, Martin C; Carninci, Piero; Kai, Chikatoshi; Kawai, Jun; Bailey, Timothy L; Hayashizaki, Yoshihide; Mattick, John S.

Gene ; 403(1-2): 188-93, 2007 Nov 15.

Article in English | MEDLINE | ID: mdl-17897791

ABSTRACT

Many genes are arranged in complex overlapping and interlaced patterns in eukaryotic genomes. It is unclear whether or how such genes can avoid interference from each other's RNA processing signals and retain distinct identities. This puzzle applies particularly to 3' end formation sites, which inherently terminate the transcript, and thus act as boundaries between adjacent genes. We hypothesise that the transcript processing machinery can bypass 3' end formation sites by splicing out an intron surrounding the site. We confirm a prediction of this hypothesis: the likelihood of transcripts extending beyond 3' end sites depends on the strength of 3' end formation signals located in exons in the mature transcript, but not of those in introns that are spliced out of the transcript. This bypassing mechanism permits nested and interleaved gene architectures, as well as fusion transcripts that combine exons from adjacent genes.

Subject(s)

3' Untranslated Regions/genetics , Alternative Splicing/genetics , Models, Genetic , Animals , Chromosomes, Mammalian , DNA, Complementary , Exons , Expressed Sequence Tags , Genome , Introns , Mice , Polyadenylation/genetics , RNA, Messenger/metabolism , Transcription, Genetic

7.

Gemin2 plays an important role in stabilizing the survival of motor neuron complex.

Ogawa, Chihiro; Usui, Kengo; Aoki, Makoto; Ito, Fuyu; Itoh, Masayoshi; Kai, Chikatoshi; Kanamori-Katayama, Mutsumi; Hayashizaki, Yoshihide; Suzuki, Harukazu.

J Biol Chem ; 282(15): 11122-34, 2007 Apr 13.

Article in English | MEDLINE | ID: mdl-17308308

ABSTRACT

The survival of motor neuron (SMN) protein, responsible for the neurodegenerative disease spinal muscular atrophy (SMA), oligomerizes and forms a stable complex with seven other major components, the Gemin proteins. Besides the SMN protein, Gemin2 is a core protein that is essential for the formation of the SMN complex, although the mechanism by which it drives formation is unclear. We have found a novel interaction, a Gemin2 self-association, using the mammalian two-hybrid system and the in vitro pull-down assays. Using in vitro dissociation assays, we also found that the self-interaction of the amino-terminal SMN protein, which was confirmed in this study, became stable in the presence of Gemin2. In addition, Gemin2 knockdown using small interference RNA treatment revealed a drastic decrease in SMN oligomer formation and in the assembly activity of spliceosomal small nuclear ribonucleoprotein (snRNP). Taken together, these results indicate that Gemin2 plays an important role in snRNP assembly through the stabilization of the SMN oligomer/complex via novel self-interaction. Applying the results/techniques to amino-terminal SMN missense mutants that were recently identified from SMA patients, we successfully showed that amino-terminal self-association, Gemin2 binding, the stabilization effect of Gemin2, and snRNP assembly activity were all lowered in the mutant SMN(D44V), suggesting that instability of the amino-terminal SMN self-association may cause SMA in patients carrying this allele.

Subject(s)

Cyclic AMP Response Element-Binding Protein/metabolism , Nerve Tissue Proteins/metabolism , RNA-Binding Proteins/metabolism , Animals , Cyclic AMP Response Element-Binding Protein/genetics , HeLa Cells , Humans , Mice , Mutation/genetics , Nerve Tissue Proteins/genetics , Protein Binding , RNA-Binding Proteins/genetics , Ribonucleoproteins, Small Nuclear/metabolism , SMN Complex Proteins

8.

Diversity of Ca2+-activated K+ channel transcripts in inner ear hair cells.

Beisel, Kirk W; Rocha-Sanchez, Sonia M; Ziegenbein, Sylvia J; Morris, Ken A; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Davis, Robin L.

Gene ; 386(1-2): 11-23, 2007 Jan 15.

Article in English | MEDLINE | ID: mdl-17097837

ABSTRACT

Hair cells express a complement of ion channels, representing shared and distinct channels that confer distinct electrophysiological signatures for each cell. This diversity is generated by the use of alternative splicing in the alpha subunit, formation of heterotetrameric channels, and combinatorial association with beta subunits. These channels are thought to play a role in the tonotopic gradient observed in the mammalian cochlea. Mouse Kcnma1 transcripts, 5' and 3' ESTs, and genomic sequences were examined for the utilization of alternative splicing in the mouse transcriptome. Comparative genomic analyses investigated the conservation of KCNMA1 splice sites. Genomes of mouse, rat, human, opossum, chicken, frog and zebrafish established that the exon-intron structure and mechanism of KCNMA1 alternative splicing were highly conserved with 6-7 splice sites being utilized. The murine Kcnma1 utilized 6 out of 7 potential splice sites. RT-PCR experiments using murine gene-specific oligonucleotide primers analyzed the scope and variety of Kcnma1 and Kcnmb1-4 expression profiles in the cochlea and inner ear hair cells. In the cochlea splice variants were present representing sites 3, 4, 6, and 7, while site 1 was insertionless and site 2 utilized only exon 10. However, site 5 was not present. Detection of KCNMA1 transcripts and protein exhibited a quantitative longitudinal gradient with a reciprocal gradient found between inner and outer hair cells. Differential expression was also observed in the usage of the long form of the carboxy-terminus tail. These results suggest that a diversity of splice variants exist in rodent cochlear hair cells and this diversity is similar to that observed for non-mammalian vertebrate hair cells, such as chicken and turtle.

Subject(s)

Gene Expression Profiling , Genetic Variation , Hair Cells, Auditory, Inner/metabolism , Large-Conductance Calcium-Activated Potassium Channel alpha Subunits/genetics , Transcription, Genetic , Alternative Splicing/genetics , Animals , Conserved Sequence , Humans , In Situ Hybridization , Large-Conductance Calcium-Activated Potassium Channel alpha Subunits/biosynthesis , Mice , Rats

9.

Dynamic usage of transcription start sites within core promoters.

Kawaji, Hideya; Frith, Martin C; Katayama, Shintaro; Sandelin, Albin; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide.

Genome Biol ; 7(12): R118, 2006.

Article in English | MEDLINE | ID: mdl-17156492

ABSTRACT

BACKGROUND: Mammalian promoters do not initiate transcription at single, well defined base pairs, but rather at multiple, alternative start sites spread across a region. We previously characterized the static structures of transcription start site usage within promoters at the base pair level, based on large-scale sequencing of transcript 5' ends. RESULTS: In the present study we begin to explore the internal dynamics of mammalian promoters, and demonstrate that start site selection within many mouse core promoters varies among tissues. We also show that this dynamic usage of start sites is associated with CpG islands, broad and multimodal promoter structures, and imprinting. CONCLUSION: Our results reveal a new level of biologic complexity within promoters--fine-scale regulation of transcription starting events at the base pair level. These events are likely to be related to epigenetic transcriptional regulation.

Subject(s)

Promoter Regions, Genetic , Transcription, Genetic , Animals , CpG Islands , DNA Methylation , Mice , Multigene Family

10.

Discrimination of non-protein-coding transcripts from protein-coding mRNA.

Frith, Martin C; Bailey, Timothy L; Kasukawa, Takeya; Mignone, Flavio; Kummerfeld, Sarah K; Madera, Martin; Sunkara, Sirisha; Furuno, Masaaki; Bult, Carol J; Quackenbush, John; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Pesole, Graziano; Mattick, John S.

RNA Biol ; 3(1): 40-8, 2006.

Article in English | MEDLINE | ID: mdl-17114936

ABSTRACT

Several recent studies indicate that mammals and other organisms produce large numbers of RNA transcripts that do not correspond to known genes. It has been suggested that these transcripts do not encode proteins, but may instead function as RNAs. However, discrimination of coding and non-coding transcripts is not straightforward, and different laboratories have used different methods, whose ability to perform this discrimination is unclear. In this study, we examine ten bioinformatic methods that assess protein-coding potential and compare their ability and congruency in the discrimination of non-coding from coding sequences, based on four underlying principles: open reading frame size, sequence similarity to known proteins or protein domains, statistical models of protein-coding sequence, and synonymous versus non-synonymous substitution rates. Despite these different approaches, the methods show broad concordance, suggesting that coding and non-coding transcripts can, in general, be reliably discriminated, and that many of the recently discovered extra-genic transcripts are indeed non-coding. Comparison of the methods indicates reasons for unreliable predictions, and approaches to increase confidence further. Conversely and surprisingly, our analyses also provide evidence that as much as approximately 10% of entries in the manually curated protein database Swiss-Prot are erroneous translations of actually non-coding transcripts.

Subject(s)

Biochemistry/methods , Genetic Techniques , RNA, Messenger/chemistry , RNA, Untranslated/chemistry , Algorithms , Animals , Computational Biology , DNA, Complementary/metabolism , Data Interpretation, Statistical , Databases, Protein , Expressed Sequence Tags , Mice , Open Reading Frames , Protein Structure, Tertiary , Proteins/chemistry , RNA, Messenger/genetics , RNA, Untranslated/genetics
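
Of the four principles named in this abstract, open reading frame size is the simplest to illustrate. The following is a minimal sketch, not any of the ten methods the study evaluated: it scans the three forward reading frames for the longest ATG-to-stop ORF and compares its length with a cutoff. The function names and the 100-codon cutoff are assumptions made for illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq):
    """Length in codons (ATG up to, but excluding, the stop codon) of the
    longest ORF in the three forward frames. ORFs without a stop are ignored."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def looks_coding(seq, min_codons=100):
    """Crude ORF-size test; min_codons is a hypothetical cutoff, not from the source."""
    return longest_orf_length(seq) >= min_codons
```

A length cutoff alone misclassifies short genuine proteins and long chance ORFs, which is presumably one reason the study also compares similarity-based, statistical, and substitution-rate criteria.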

11.

Transcriptional and structural impact of TATA-initiation site spacing in mammalian core promoters.

Ponjavic, Jasmina; Lenhard, Boris; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Sandelin, Albin.

Genome Biol ; 7(8): R78, 2006.

Article in English | MEDLINE | ID: mdl-16916456

ABSTRACT

BACKGROUND: The TATA box, one of the most well studied core promoter elements, is associated with induced, context-specific expression. The lack of precise transcription start site (TSS) locations linked with expression information has impeded genome-wide characterization of the interaction between TATA and the pre-initiation complex. RESULTS: Using a comprehensive set of 5.66 × 10^6 sequenced 5' cDNA ends from diverse tissues mapped to the mouse genome, we found that the TATA-TSS distance is correlated with the tissue specificity of the downstream transcript. To achieve tissue-specific regulation, the TATA box position relative to the TSS is constrained to a narrow window (-32 to -29), where positions -31 and -30 are the optimal positions for achieving high tissue specificity. Slightly larger spacings can be accommodated only when there is no optimally spaced initiation signal; in contrast, the TATA box-like motifs found downstream of position -28 are generally nonfunctional. The strength of the TATA binding protein-DNA interaction plays a subordinate role to spacing in terms of tissue specificity. Furthermore, promoters with different TATA-TSS spacings have distinct features in terms of consensus sequence around the initiation site and distribution of alternative TSSs. Unexpectedly, promoters that have two dominant, consecutive TSSs are TATA depleted and have a novel GGG initiation site consensus. CONCLUSION: In this report we present the most comprehensive characterization of TATA-TSS spacing and functionality to date. The coupling of spacing to tissue specificity at the transcriptome level provides important clues as to the function of core promoters and the choice of TSS by the pre-initiation complex.

Subject(s)

Gene Expression Regulation/genetics , Promoter Regions, Genetic/genetics , TATA Box/genetics , Transcription Initiation Site , Animals , Computer Simulation , Expressed Sequence Tags/metabolism , Gene Library , Genomics , Mice , Models, Genetic
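
The spacing window reported in this abstract can be made concrete with a small sketch. It rests on several assumptions that the abstract does not state: the TATA box is matched as the literal string TATAAA rather than with a weight matrix, its position is taken to be that of its first base, and the base immediately upstream of the TSS is numbered -1. The function names are hypothetical.

```python
def tata_positions(upstream, motif="TATAAA"):
    """Positions (relative to the TSS) of every motif occurrence.

    `upstream` is the sequence ending immediately before the TSS, so its
    last base is position -1. The reported position is that of the first
    base of the motif (an assumed convention).
    """
    upstream = upstream.upper()
    n = len(upstream)
    hits = []
    i = upstream.find(motif)
    while i != -1:
        hits.append(i - n)
        i = upstream.find(motif, i + 1)  # allow overlapping occurrences
    return hits


def in_constrained_window(pos, lo=-32, hi=-29):
    """True if a TATA position lies in the -32 to -29 window from the abstract."""
    return lo <= pos <= hi
```

For example, a 50-base upstream sequence with TATAAA starting at its 21st base yields position -30, which falls inside the window.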

12.

Evolutionary turnover of mammalian transcription start sites.

Frith, Martin C; Ponjavic, Jasmina; Fredman, David; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Sandelin, Albin.

Genome Res ; 16(6): 713-22, 2006 Jun.

Article in English | MEDLINE | ID: mdl-16687732

ABSTRACT

Alignments of homologous genomic sequences are widely used to identify functional genetic elements and study their evolution. Most studies tacitly equate homology of functional elements with sequence homology. This assumption is violated by the phenomenon of turnover, in which functionally equivalent elements reside at locations that are nonorthologous at the sequence level. Turnover has been demonstrated previously for transcription-factor-binding sites. Here, we show that transcription start sites of equivalent genes do not always reside at equivalent locations in the human and mouse genomes. We also identify two types of partial turnover, illustrating evolutionary pathways that could lead to complete turnover. These findings suggest that the signals encoding transcription start sites are highly flexible and evolvable, and have cautionary implications for the use of sequence-level conservation to detect gene regulatory elements.

Subject(s)

Evolution, Molecular , Transcription Initiation Site , Animals , CpG Islands/genetics , Gene Library , Genome , Humans , Mice , Promoter Regions, Genetic , Sequence Alignment

13.

Pseudo-messenger RNA: phantoms of the transcriptome.

Frith, Martin C; Wilming, Laurens G; Forrest, Alistair; Kawaji, Hideya; Tan, Sin Lam; Wahlestedt, Claes; Bajic, Vladimir B; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Bailey, Timothy L; Huminiecki, Lukasz.

PLoS Genet ; 2(4): e23, 2006 Apr.

Article in English | MEDLINE | ID: mdl-16683022

ABSTRACT

The mammalian transcriptome harbours shadowy entities that resist classification and analysis. In analogy with pseudogenes, we define pseudo-messenger RNA to be RNA molecules that resemble protein-coding mRNA, but cannot encode full-length proteins owing to disruptions of the reading frame. Using a rigorous computational pipeline, which rules out sequencing errors, we identify 10,679 pseudo-messenger RNAs (approximately half of which are transposon-associated) among the 102,801 FANTOM3 mouse cDNAs: just over 10% of the FANTOM3 transcriptome. These comprise not only transcribed pseudogenes, but also disrupted splice variants of otherwise protein-coding genes. Some may encode truncated proteins, only a minority of which appear subject to nonsense-mediated decay. The presence of an excess of transcripts whose only disruptions are opal stop codons suggests that there are more selenoproteins than currently estimated. We also describe compensatory frameshifts, where a segment of the gene has changed frame but remains translatable. In summary, we survey a large class of non-standard but potentially functional transcripts that are likely to encode genetic information and effect biological processes in novel ways. Many of these transcripts do not correspond cleanly to any identifiable object in the genome, implying fundamental limits to the goal of annotating all functional elements at the genome sequence level.

Subject(s)

RNA, Messenger/genetics , Transcription, Genetic , Animals , DNA Transposable Elements , Evolution, Molecular , Humans , Mice , Promoter Regions, Genetic , Proteins/genetics , Pseudogenes , Reproducibility of Results , Sequence Alignment

14.

Heterotachy in mammalian promoter evolution.

Taylor, Martin S; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Semple, Colin A M.

PLoS Genet ; 2(4): e30, 2006 Apr.

Article in English | MEDLINE | ID: mdl-16683025

ABSTRACT

We have surveyed the evolutionary trends of mammalian promoters and upstream sequences, utilising large sets of experimentally supported transcription start sites (TSSs). With 30,969 well-defined TSSs from mouse and 26,341 from human, there are sufficient numbers to draw statistically meaningful conclusions and to consider differences between promoter types. Unlike previous smaller studies, we have considered the effects of insertions, deletions, and transposable elements as well as nucleotide substitutions. The rate of promoter evolution relative to that of control sequences has not been consistent between lineages nor within lineages over time. The most pronounced manifestation of this heterotachy is the increased rate of evolution in primate promoters. This increase is seen across different classes of mutation, including substitutions and micro-indel events. We investigated the relationship between promoter and coding sequence selective constraint and suggest that they are generally uncorrelated. This analysis also identified a small number of mouse promoters associated with the immune response that are under positive selection in rodents. We demonstrate significant differences in divergence between functional promoter categories and identify a category of promoters, not associated with conventional protein-coding genes, that has the highest rates of divergence across mammals. We find that evolutionary rates vary both on a fine scale within mammalian promoters and also between different functional classes of promoters. The discovery of heterotachy in promoter evolution, in particular the accelerated evolution of primate promoters, has important implications for our understanding of human evolution and for strategies to detect primate-specific regulatory elements.

Subject(s)

Evolution, Molecular , Primates/genetics , Promoter Regions, Genetic , Transcription, Genetic , Animals , Base Sequence , Chromosome Mapping , DNA Transposable Elements , Genetic Engineering , Genetic Variation , Genome , Humans , Mice , Primates/anatomy & histology , Proteins/genetics , Sequence Analysis, DNA , Sequence Deletion

15.

Clusters of internally primed transcripts reveal novel long noncoding RNAs.

Furuno, Masaaki; Pang, Ken C; Ninomiya, Noriko; Fukuda, Shiro; Frith, Martin C; Bult, Carol; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Mattick, John S; Suzuki, Harukazu.

PLoS Genet ; 2(4): e37, 2006 Apr.

Article in English | MEDLINE | ID: mdl-16683026

ABSTRACT

Non-protein-coding RNAs (ncRNAs) are increasingly being recognized as having important regulatory roles. Although much recent attention has focused on tiny 22- to 25-nucleotide microRNAs, several functional ncRNAs are orders of magnitude larger in size. Examples of such macro ncRNAs include Xist and Air, which in mouse are 18 and 108 kilobases (Kb), respectively. We surveyed the 102,801 FANTOM3 mouse cDNA clones and found that Air and Xist were present not as single, full-length transcripts but as a cluster of multiple, shorter cDNAs, which were unspliced, had little coding potential, and were most likely primed from internal adenine-rich regions within longer parental transcripts. We therefore conducted a genome-wide search for regional clusters of such cDNAs to find novel macro ncRNA candidates. Sixty-six regions were identified, each of which mapped outside known protein-coding loci and which had a mean length of 92 Kb. We detected several known long ncRNAs within these regions, supporting the basic rationale of our approach. In silico analysis showed that many regions had evidence of imprinting and/or antisense transcription. These regions were significantly associated with microRNAs and transcripts from the central nervous system. We selected eight novel regions for experimental validation by northern blot and RT-PCR and found that the majority represent previously unrecognized noncoding transcripts that are at least 10 Kb in size and predominantly localized in the nucleus. Taken together, the data not only identify multiple new ncRNAs but also suggest the existence of many more macro ncRNAs like Xist and Air.

Subject(s)

RNA, Untranslated/genetics; Transcription, Genetic; Animals; Computational Biology; DNA, Complementary/genetics; Expressed Sequence Tags; Gene Expression Regulation; Genome; Genome, Human; Humans; Mice; Multigene Family; RNA, Long Noncoding; Reverse Transcriptase Polymerase Chain Reaction

16.

A method for similarity search of genomic positional expression using CAGE.

Seno, Shigeto; Takenaka, Yoichi; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Matsuda, Hideo.

PLoS Genet ; 2(4): e44, 2006 Apr.

Article in English | MEDLINE | ID: mdl-16683027

ABSTRACT

With the advancement of genome research, it is becoming clear that genes are not distributed on the genome in random order. Clusters of genes distributed at localized genome positions have been reported in several eukaryotes. Various correlations have been observed between the expressions of genes in adjacent or nearby positions along the chromosomes depending on tissue type and developmental stage. Moreover, in several cases, their transcripts, which control epigenetic transcription via processes such as transcriptional interference and genomic imprinting, occur in clusters. It is reasonable that genomic regions that have similar mechanisms show similar expression patterns and that the characteristics of expression in the same genomic regions differ depending on tissue type and developmental stage. In this study, we analyzed gene expression patterns using the cap analysis gene expression (CAGE) method for exploring systematic views of the mouse transcriptome. Counting the number of mapped CAGE tags for fixed-length regions allowed us to determine genomic expression levels. These expression levels were normalized, quantified, and converted into four types of descriptors, allowing the expression patterns along the genome to be represented by character strings. We analyzed them using dynamic programming in the same manner as for sequence analysis. We have developed a novel algorithm that provides a novel view of the genome from the perspective of genomic positional expression. In a similarity search of expression patterns across chromosomes and tissues, we found regions that had clusters of genes that showed expression patterns similar to each other depending on tissue type. Our results suggest the possibility that the regions that have sense-antisense transcription show similar expression patterns between forward and reverse strands.
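The pipeline outlined in this abstract (count tags per fixed-length region, normalize, convert to four descriptors, compare the resulting strings by dynamic programming) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The bin width, the log2 normalization, the cut-offs, the letters A to D, and the scoring values are all hypothetical, and Smith-Waterman local alignment stands in for whatever dynamic-programming variant the paper uses.

```python
import math


def bin_counts(tag_positions, region_length, bin_width):
    """Count mapped CAGE tags in consecutive fixed-length bins."""
    n_bins = math.ceil(region_length / bin_width)
    counts = [0] * n_bins
    for pos in tag_positions:
        if 0 <= pos < region_length:
            counts[pos // bin_width] += 1
    return counts


def to_descriptors(counts, cutoffs=(0.0, 1.0, 2.0)):
    """Map each bin to one of four letters by log2(count + 1).

    The cut-offs are illustrative assumptions.
    """
    letters = []
    for c in counts:
        level = math.log2(c + 1)
        if level <= cutoffs[0]:
            letters.append("A")  # no tags in this bin
        elif level <= cutoffs[1]:
            letters.append("B")
        elif level <= cutoffs[2]:
            letters.append("C")
        else:
            letters.append("D")
    return "".join(letters)


def local_alignment_score(s, t, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between two strings."""
    prev = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


counts = bin_counts([0, 5, 12, 13, 14, 29], region_length=30, bin_width=10)
pattern = to_descriptors([0, 1, 3, 7])
```

Two expression strings from different chromosomes or tissues can then be compared with `local_alignment_score`, in the same way two nucleotide sequences would be.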

Subject(s)

Chromosome Mapping/methods; Genome; Mice/genetics; Transcription, Genetic; Algorithms; Animals; Base Composition; Gene Expression Regulation; Genome, Human; Humans; Macrophages/physiology; MicroRNAs/genetics; Models, Genetic; RNA, Untranslated/genetics

17.

A simple physical model predicts small exon length variations.

Chern, Tzu-Ming; van Nimwegen, Erik; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Zavolan, Mihaela.

PLoS Genet ; 2(4): e45, 2006 Apr.

Article in English | MEDLINE | ID: mdl-16683028

ABSTRACT

One of the most common splice variations are small exon length variations caused by the use of alternative donor or acceptor splice sites that are in very close proximity on the pre-mRNA. Among these, three-nucleotide variations at so-called NAGNAG tandem acceptor sites have recently attracted considerable attention, and it has been suggested that these variations are regulated and serve to fine-tune protein forms by the addition or removal of a single amino acid. In this paper we first show that in-frame exon length variations are generally overrepresented and that this overrepresentation can be quantitatively explained by the effect of nonsense-mediated decay. Our analysis allows us to estimate that about 50% of frame-shifted coding transcripts are targeted by nonsense-mediated decay. Second, we show that a simple physical model that assumes that the splicing machinery stochastically binds to nearby splice sites in proportion to the affinities of the sites correctly predicts the relative abundances of different small length variations at both boundaries. Finally, using the same simple physical model, we show that for NAGNAG sites, the difference in affinities of the neighboring sites for the splicing machinery accurately predicts whether splicing will occur only at the first site, splicing will occur only at the second site, or three-nucleotide splice variants are likely to occur. Our analysis thus suggests that small exon length variations are the result of stochastic binding of the spliceosome at neighboring splice sites. Small exon length variations occur when there are nearby alternative splice sites that have similar affinity for the splicing machinery.
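The physical model summarized in this abstract, in which the splicing machinery binds nearby sites in proportion to their affinities, reduces to a simple normalization. The sketch below is a hedged illustration: the function names and the `cutoff` used to call a site "exclusive" are hypothetical, and the affinity values would in practice come from a splice-site scoring model that the abstract does not specify.

```python
def usage_fractions(affinities):
    """Predicted usage of each site: affinity divided by the total affinity."""
    total = sum(affinities)
    if total <= 0:
        raise ValueError("affinities must sum to a positive value")
    return [a / total for a in affinities]


def classify_nagnag(aff_first, aff_second, cutoff=0.95):
    """Predict the splicing outcome at a NAGNAG tandem acceptor.

    cutoff is an illustrative threshold (an assumption, not from the paper).
    """
    first, second = usage_fractions([aff_first, aff_second])
    if first >= cutoff:
        return "first site only"
    if second >= cutoff:
        return "second site only"
    return "both variants"


fractions = usage_fractions([3, 1])
```

When the two affinities are similar, neither fraction reaches the cut-off and both three-nucleotide variants are predicted, which matches the abstract's closing statement.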

Subject(s)

Exons/genetics; Genetic Variation; Models, Genetic; Animals; Chromosome Mapping; Gene Expression Regulation; Male; Mice; Muscle, Skeletal/physiology; Organ Specificity; Prostate/physiology; Transcription, Genetic

18.

Differential use of signal peptides and membrane domains is a common occurrence in the protein output of transcriptional units.

Davis, Melissa J; Hanson, Kelly A; Clark, Francis; Fink, J Lynn; Zhang, Fasheng; Kasukawa, Takeya; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Teasdale, Rohan D.

PLoS Genet ; 2(4): e46, 2006 Apr.

Article in English | MEDLINE | ID: mdl-16683029

ABSTRACT

Membrane organization describes the orientation of a protein with respect to the membrane and can be determined by the presence, or absence, and organization within the protein sequence of two features: endoplasmic reticulum signal peptides and alpha-helical transmembrane domains. These features allow protein sequences to be classified into one of five membrane organization categories: soluble intracellular proteins, soluble secreted proteins, type I membrane proteins, type II membrane proteins, and multi-spanning membrane proteins. Generation of protein isoforms with variable membrane organizations can change a protein's subcellular localization or association with the membrane. Application of MemO, a membrane organization annotation pipeline, to the FANTOM3 Isoform Protein Sequence mouse protein set revealed that within the 8,032 transcriptional units (TUs) with multiple protein isoforms, 573 had variation in their use of signal peptides, 1,527 had variation in their use of transmembrane domains, and 615 generated protein isoforms from distinct membrane organization classes. The mechanisms underlying these transcript variations were analyzed. While TUs were identified encoding all pairwise combinations of membrane organization categories, the most common was conversion of membrane proteins to soluble proteins. Observed within our high-confidence set were 156 TUs predicted to generate both extracellular soluble and membrane proteins, and 217 TUs generating both intracellular soluble and membrane proteins. The differential use of endoplasmic reticulum signal peptides and transmembrane domains is a common occurrence within the variable protein output of TUs. The generation of protein isoforms that are targeted to multiple subcellular locations represents a major functional consequence of transcript variation within the mouse transcriptome.
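The five membrane organization categories named in this abstract can be derived from the two features it mentions. The mapping below is a conventional simplification and an assumption on our part; the abstract does not state MemO's exact decision rules, and the function names are hypothetical.

```python
def membrane_organization(has_signal_peptide, n_tm_domains):
    """Assign one of five categories from signal peptide and TM domain count.

    The rules are an assumed simplification, not MemO's documented logic.
    """
    if n_tm_domains < 0:
        raise ValueError("n_tm_domains must be non-negative")
    if n_tm_domains == 0:
        return "soluble secreted" if has_signal_peptide else "soluble intracellular"
    if n_tm_domains == 1:
        return "type I membrane" if has_signal_peptide else "type II membrane"
    return "multi-spanning membrane"


def variable_tus(isoforms):
    """Return transcriptional units whose isoforms fall into different categories.

    isoforms: iterable of (tu_id, has_signal_peptide, n_tm_domains).
    """
    classes = {}
    for tu_id, has_sp, n_tm in isoforms:
        classes.setdefault(tu_id, set()).add(membrane_organization(has_sp, n_tm))
    return sorted(tu for tu, found in classes.items() if len(found) > 1)


isoforms = [
    ("TU1", True, 1),   # type I membrane
    ("TU1", True, 0),   # soluble secreted
    ("TU2", False, 0),  # soluble intracellular
    ("TU2", False, 0),  # soluble intracellular
    ("TU3", False, 7),  # multi-spanning membrane
]
changed = variable_tus(isoforms)
```

Here only `TU1` produces isoforms from distinct categories, the kind of membrane-to-soluble conversion the abstract reports as most common.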

Subject(s)

Membrane Proteins/genetics; Protein Sorting Signals/genetics; Transcription, Genetic; Animals; Genetic Variation; Protein Isoforms/genetics

19.

Complex Loci in human and mouse genomes.

Engström, Pär G; Suzuki, Harukazu; Ninomiya, Noriko; Akalin, Altuna; Sessa, Luca; Lavorgna, Giovanni; Brozzi, Alessandro; Luzi, Lucilla; Tan, Sin Lam; Yang, Liang; Kunarso, Galih; Ng, Edwin Lian-Chong; Batalov, Serge; Wahlestedt, Claes; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Wells, Christine; Bajic, Vladimir B; Orlando, Valerio; Reid, James F; Lenhard, Boris; Lipovich, Leonard.

PLoS Genet ; 2(4): e47, 2006 Apr.

Article in English | MEDLINE | ID: mdl-16683030

ABSTRACT

Mammalian genomes harbor a larger than expected number of complex loci, in which multiple genes are coupled by shared transcribed regions in antisense orientation and/or by bidirectional core promoters. To determine the incidence, functional significance, and evolutionary context of mammalian complex loci, we identified and characterized 5,248 cis-antisense pairs, 1,638 bidirectional promoters, and 1,153 chains of multiple cis-antisense and/or bidirectionally promoted pairs from 36,606 mouse transcriptional units (TUs), along with 6,141 cis-antisense pairs, 2,113 bidirectional promoters, and 1,480 chains from 42,887 human TUs. In both human and mouse, 25% of TUs resided in cis-antisense pairs, only 17% of which were conserved between the two organisms, indicating frequent species specificity of antisense gene arrangements. A sampling approach indicated that over 40% of all TUs might actually be in cis-antisense pairs, and that only a minority of these arrangements are likely to be conserved between human and mouse. Bidirectional promoters were characterized by variable transcriptional start sites and an identifiable midpoint at which overall sequence composition changed strand and the direction of transcriptional initiation switched. In microarray data covering a wide range of mouse tissues, genes in cis-antisense and bidirectionally promoted arrangement showed a higher probability of being coordinately expressed than random pairs of genes. In a case study on homeotic loci, we observed extensive transcription of nonconserved sequences on the noncoding strand, implying that the presence rather than the sequence of these transcripts is of functional importance. Complex loci are ubiquitous, host numerous nonconserved gene structures and lineage-specific exonification events, and may have a cis-regulatory impact on the member genes.
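As an illustration of the cis-antisense arrangement defined in this abstract, the sketch below reports pairs of transcriptional units that lie on opposite strands of the same chromosome and share transcribed sequence. The `TU` record, its coordinate convention, and the whole-interval overlap test are assumptions made for the example. The paper's own criteria (for instance exon-level overlap) and its detection of bidirectional promoters and chains are not reproduced here.

```python
from itertools import combinations
from typing import NamedTuple


class TU(NamedTuple):
    """Hypothetical transcriptional unit record."""
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def cis_antisense_pairs(tus):
    """Return name pairs of TUs overlapping on opposite strands."""
    pairs = []
    for a, b in combinations(tus, 2):
        same_chrom = a.chrom == b.chrom
        opposite_strands = a.strand != b.strand
        overlapping = a.start < b.end and b.start < a.end
        if same_chrom and opposite_strands and overlapping:
            pairs.append((a.name, b.name))
    return pairs


tus = [
    TU("A", "chr1", 100, 500, "+"),
    TU("B", "chr1", 400, 900, "-"),
    TU("C", "chr1", 450, 600, "+"),
    TU("D", "chr2", 100, 500, "-"),
]
pairs = cis_antisense_pairs(tus)
```

The pairwise loop is quadratic and only suitable for small inputs; for tens of thousands of TUs a sorted sweep or an interval tree would be the usual choice.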

Subject(s)

Chromosome Mapping; Genome; Mice; Animals; Mice/genetics; Base Pairing; DNA Primers; Genome, Human; Promoter Regions, Genetic; Reverse Transcriptase Polymerase Chain Reaction; Humans

20.

The abundance of short proteins in the mammalian proteome.

Frith, Martin C; Forrest, Alistair R; Nourbakhsh, Ehsan; Pang, Ken C; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Bailey, Timothy L; Grimmond, Sean M.

PLoS Genet ; 2(4): e52, 2006 Apr.

Article in English | MEDLINE | ID: mdl-16683031

ABSTRACT

Short proteins play key roles in cell signalling and other processes, but their abundance in the mammalian proteome is unknown. Current catalogues of mammalian proteins exhibit an artefactual discontinuity at a length of 100 aa, so that protein abundance peaks just above this length and falls off sharply below it. To clarify the abundance of short proteins, we identify proteins in the FANTOM collection of mouse cDNAs by analysing synonymous and non-synonymous substitutions with the computer program CRITICA. This analysis confirms that there is no real discontinuity at length 100. Roughly 10% of mouse proteins are shorter than 100 aa, although the majority of these are variants of proteins longer than 100 aa. We identify many novel short proteins, including a "dark matter" subset containing ones that lack detectable homology to other known proteins. Translation assays confirm that some of these novel proteins can be translated and localised to the secretory pathway.

Subject(s)

Mice/genetics; Proteins/genetics; Proteome; Amino Acid Sequence; Animals; Artifacts; DNA, Complementary/genetics; Genetic Variation; Molecular Weight; Open Reading Frames; Protein Biosynthesis; Reproducibility of Results; Sequence Homology, Amino Acid
