1 - 14 of 14
1.
Sci Data ; 4: 170112, 2017 08 29.
Article En | MEDLINE | ID: mdl-28850106

In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at single-base-pair resolution, and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousand samples, comprising a variety of primary cells, tissues, cell lines, and time-series samples taken during cell activation and development, were processed through a uniform CAGE data production pipeline. The pipeline began with quality assessment of the RNA extracts and continued with CAGE library production using either a robotic or a manual workflow, single-molecule sequencing, and computational processing to generate frequencies of transcription initiation. The resulting data reflect the outcome of transcriptional regulation in each analyzed state of mammalian cells. Non-overlapping peaks over the CAGE profiles, approximately 200,000 for the human genome and 150,000 for the mouse genome, were identified and annotated to provide the precise locations of known as well as novel promoters, and to quantify their activities.


Gene Expression Profiling , Genome , Animals , Gene Expression Regulation , Humans , Mice , Promoter Regions, Genetic , Species Specificity
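As a rough illustration of the peak identification step described in the FANTOM5 entry, the sketch below groups CAGE tag 5' ends on the same chromosome and strand into non-overlapping clusters and scales each cluster's count to tags per million. It is a minimal sketch under stated assumptions: the input tuple format, the `max_gap` parameter and both function names are hypothetical, and simple gap-based merging is not necessarily the peak-calling method used in the study.

```python
from collections import defaultdict


def cluster_ctss(ctss, max_gap=20):
    """Group CAGE tag 5' ends into non-overlapping clusters.

    ctss: iterable of (chrom, strand, pos, count) tuples (hypothetical input format).
    A position within max_gap bp (hypothetical parameter) of the previous position
    on the same chromosome and strand is merged into the current cluster.
    """
    by_key = defaultdict(list)
    for chrom, strand, pos, count in ctss:
        by_key[(chrom, strand)].append((pos, count))

    clusters = []
    for (chrom, strand), sites in sorted(by_key.items()):
        sites.sort()  # walk positions left to right on this strand
        start, end, total = sites[0][0], sites[0][0], sites[0][1]
        for pos, count in sites[1:]:
            if pos - end <= max_gap:
                # Close enough: extend the current cluster.
                end = pos
                total += count
            else:
                # Gap too large: close the cluster and start a new one.
                clusters.append((chrom, strand, start, end, total))
                start, end, total = pos, pos, count
        clusters.append((chrom, strand, start, end, total))
    return clusters


def tags_per_million(clusters):
    """Scale each cluster's tag count by the library size (tags per million)."""
    library_size = sum(c[4] for c in clusters)
    return [(c[0], c[1], c[2], c[3], c[4] * 1e6 / library_size) for c in clusters]
```

For example, tags at positions 100 and 110 on the plus strand of `chr1` fall into one cluster, while a tag at position 500 starts a new one; tags on the minus strand are always clustered separately.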
2.
Genome Res ; 24(4): 708-17, 2014 Apr.
Article En | MEDLINE | ID: mdl-24676093

CAGE (cap analysis of gene expression) and RNA-seq are two major technologies used to identify transcript abundances as well as structures. They measure expression by sequencing either from the 5' end of capped molecules (CAGE) or tags randomly distributed along the length of a transcript (RNA-seq). Library protocols for clonally amplified, second-generation sequencing platforms (Illumina, SOLiD, 454 Life Sciences [Roche], Ion Torrent) typically employ PCR preamplification prior to clonal amplification, whereas third-generation, single-molecule sequencers can sequence unamplified libraries. Although each of these transcriptome profiling platforms has been shown to be reproducible individually, no systematic comparison between them has been carried out. Here we compare CAGE, using both second- and third-generation sequencers, and RNA-seq, using a second-generation sequencer, on a panel of RNA mixtures from two human cell lines to examine discrimination of biological states, detection of differentially expressed genes, linearity of measurements, and reproducibility of quantification. We found that quantified gene expression levels are largely comparable across platforms and conclude that CAGE and RNA-seq are complementary technologies that can be used to improve incomplete gene models. We also found systematic biases in the second- and third-generation platforms, which are likely due to steps such as linker ligation, cleavage by restriction enzymes, and PCR amplification. This study provides a perspective on the performance of these platforms that can serve as a baseline for designing further experiments to tackle the complex transcriptomes uncovered in a wide range of cell types.


High-Throughput Nucleotide Sequencing/methods , RNA/genetics , Transcriptome/genetics , Gene Expression Profiling , Humans , Sequence Analysis, RNA/methods
3.
PLoS One ; 7(1): e30809, 2012.
Article En | MEDLINE | ID: mdl-22303458

BACKGROUND: Cap analysis of gene expression (CAGE) is a 5' sequence tag technology that globally determines transcription start sites in the genome and their expression levels; it has most recently been adapted to the HeliScope single-molecule sequencer. Despite significant simplifications, the CAGE protocol has until now remained labour-intensive. METHODOLOGY: In this study we set out to adapt the protocol to a robotic workflow that would increase throughput and reduce handling. The automated CAGE cDNA preparation system presented here can prepare 96 'HeliScope ready' CAGE cDNA libraries in 8 days, compared with 6 weeks for a manual operator. We compare the results obtained from the same RNA in manual libraries and across multiple automation batches to assess reproducibility. CONCLUSIONS: We show that the sequencing was highly reproducible and comparable to that of manual libraries, with an 8-fold increase in productivity. The automated CAGE cDNA preparation system can prepare 96 CAGE sequencing samples simultaneously. Finally, we discuss how the system could be used for CAGE on Illumina/SOLiD platforms, RNA-seq and full-length cDNA generation.


DNA, Complementary/metabolism , Gene Expression Regulation , Sequence Analysis, DNA/instrumentation , Sequence Analysis, DNA/methods , Workflow , Animals , Automation , Base Sequence , DNA, Complementary/genetics , Gene Library , Genome, Human/genetics , Humans , Mice , Reproducibility of Results
4.
PLoS One ; 6(10): e25391, 2011.
Article En | MEDLINE | ID: mdl-21984916

BACKGROUND: Mesothelioma is a highly malignant tumor that is primarily caused by occupational or environmental exposure to asbestos fibers. Despite worldwide restrictions on asbestos usage, further cases are expected, as diagnosis typically occurs 20-40 years after exposure. Once the disease is diagnosed, the prognosis is very poor, with a median survival of 9 months. The development of early, preclinical diagnostic markers may therefore help improve clinical outcomes. METHODOLOGY: Expression microarrays on mesothelium and other tissues dissected from mice were used to identify candidate mesothelial lineage markers. Candidates were further tested by qRT-PCR and in situ hybridization across a mouse tissue panel. Two candidate biomarkers with the potential for secretion, uroplakin 3B (UPK3B) and leucine-rich repeat neuronal 4 (LRRN4), and one commercialized mesothelioma marker, mesothelin (MSLN), were then chosen for validation across a panel of normal human primary cells, 16 established mesothelioma cell lines, 10 lung cancer lines, and a further set of 8 unrelated cancer cell lines. CONCLUSIONS: Within the primary cell panel, LRRN4 was detected only in primary mesothelial cells, whereas MSLN and UPK3B were also detected in other cell types: MSLN in bronchial and alveolar epithelial cells, and UPK3B in retinal pigment epithelial cells and urothelial cells. In the cell line panel, MSLN was detected in 15 of the 16 mesothelioma cell lines, whereas LRRN4 was detected in only 8 and UPK3B in 6. Interestingly, MSLN levels appear to be upregulated in the mesothelioma lines compared with primary mesothelial cells, while LRRN4 and UPK3B are either lost or downregulated. Despite the higher fraction of mesothelioma lines positive for MSLN, it was also detected at high levels in 2 lung cancer lines and 3 other unrelated cancer lines derived from papillotubular adenocarcinoma, signet ring carcinoma and transitional cell carcinoma.


Epithelial Cells/metabolism , Membrane Proteins/metabolism , Nerve Tissue Proteins/metabolism , Animals , Antibodies, Neoplasm/immunology , Biomarkers/metabolism , Cell Lineage , Cells, Cultured , Epithelial Cells/pathology , Epithelium/metabolism , Gene Expression Regulation , Humans , Immunohistochemistry , In Situ Hybridization , Lung/cytology , Lung/metabolism , Male , Membrane Proteins/genetics , Mesothelin , Mesothelioma/genetics , Mesothelioma/immunology , Mesothelioma/pathology , Mice , Mice, Inbred C57BL , Nerve Tissue Proteins/genetics , Oligonucleotide Array Sequence Analysis , Organ Specificity , Reverse Transcriptase Polymerase Chain Reaction , Uroplakin III/genetics , Uroplakin III/metabolism
5.
Genome Res ; 21(7): 1150-9, 2011 Jul.
Article En | MEDLINE | ID: mdl-21596820

We report the development of a simplified cap analysis of gene expression (CAGE) protocol adapted for single-molecule sequencers that avoids second-strand synthesis, ligation, digestion, and PCR. HeliScopeCAGE directly sequences the 3' end of cap-trapped first-strand cDNAs. As with previous versions of CAGE, we define transcription start sites (TSSs) more precisely than known gene models, identify novel regions of transcription and alternative promoters, and find two major classes of TSS signal: sharp peaks and broad regions. However, with this protocol we also observe reproducible evidence of regulation at the much finer level of individual TSS positions. The libraries are quantitative over 5 orders of magnitude and highly reproducible (Pearson's correlation coefficient of 0.987). We have also scaled down the sample requirement to 5 µg of total RNA for a standard HeliScopeCAGE library and 100 ng for a low-quantity version. When the same RNA was run as both 5-µg and 100-ng libraries, the 100-ng library still detected expression for ∼60% of the 13,468 loci detected by the 5-µg library at the same threshold, allowing comparative analysis of even rare cell populations. Testing the protocol for differential gene expression measurements on triplicate HeLa and THP-1 samples, we find that the log fold changes are highly correlated with those from Illumina microarray measurements (0.871). In addition, HeliScopeCAGE detects differential expression for thousands more loci, including loci with probes on the array. Finally, although the majority of tags are 5' associated, we also observe a low level of signal on exons that is useful for defining gene structures.


Gene Expression Profiling/methods , Gene Expression , Oligonucleotide Array Sequence Analysis/methods , Chromosome Mapping , DNA, Complementary/genetics , Exons , Gene Library , HeLa Cells , Humans , Polymerase Chain Reaction , Promoter Regions, Genetic , Sequence Analysis, RNA/methods , Transcription Initiation Site , Transcription, Genetic
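The reproducibility (0.987) and fold-change (0.871) figures quoted in the HeliScopeCAGE entry are correlation coefficients. A minimal sketch of how such numbers can be computed from per-locus tag counts follows; the log transforms, the pseudocount of 1 and the function names are assumptions for illustration, not the authors' exact procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def log_counts(counts, pseudocount=1.0):
    """log10-transform per-locus tag counts; the pseudocount (assumed) avoids log(0)."""
    return [math.log10(c + pseudocount) for c in counts]


def log2_fold_changes(a, b, pseudocount=1.0):
    """Per-locus log2 fold change of expression in condition b relative to condition a."""
    return [math.log2((mb + pseudocount) / (ma + pseudocount)) for ma, mb in zip(a, b)]
```

Replicate agreement would then be `pearson(log_counts(rep1), log_counts(rep2))`, and cross-platform agreement would be the `pearson` of two lists of fold changes computed on the same loci. Working on log-scaled values keeps a handful of very highly expressed loci from dominating the coefficient.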
6.
Mol Immunol ; 47(14): 2295-302, 2010 Aug.
Article En | MEDLINE | ID: mdl-20573402

Gene regulatory networks in living cells are controlled by the interaction of multiple cell type-specific transcription regulators with DNA binding sites in target genes. Interferon regulatory factor 8 (IRF8), also known as interferon consensus sequence binding protein (ICSBP), is a transcription factor expressed predominantly in myeloid and lymphoid cell lineages. To find the functional direct target genes of IRF8, the gene expression profiles of siRNA knockdown samples and genome-wide binding locations determined by ChIP-chip were analyzed in THP-1 myelomonocytic leukemia cells. As a result, 84 genes were identified as functional direct targets. The ETS family transcription factor PU.1, also known as SPI1, binds to IRF8 and regulates basal transcription in macrophages. Using the same approach, we identified 53 direct target genes of PU.1, 19 of which overlapped with the IRF8 targets. These 19 genes included key molecules of IFN signaling, such as OAS1 and IRF9, but excluded other IFN-related genes among the IRF8 functional direct targets. We suggest that IRF8 and PU.1 can act both in combination and independently on different promoters in myeloid cells.


Interferon Regulatory Factors/genetics , Interferon Regulatory Factors/metabolism , Base Sequence , Binding Sites/genetics , Cell Line , Chromatin Immunoprecipitation , Gene Expression Profiling , Gene Knockdown Techniques , Gene Regulatory Networks , Genetic Techniques , Humans , Models, Biological , Myeloid Cells/metabolism , Promoter Regions, Genetic , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins/metabolism , RNA, Small Interfering/genetics , Signal Transduction , Trans-Activators/genetics , Trans-Activators/metabolism
7.
Cell ; 140(5): 744-52, 2010 Mar 05.
Article En | MEDLINE | ID: mdl-20211142

Combinatorial interactions among transcription factors are critical to directing tissue-specific gene expression. To build a global atlas of these combinations, we have screened for physical interactions among the majority of human and mouse DNA-binding transcription factors (TFs). The complete networks contain 762 human and 877 mouse interactions. Analysis of the networks reveals that highly connected TFs are broadly expressed across tissues, and that roughly half of the measured interactions are conserved between mouse and human. The data highlight the importance of TF combinations for determining cell fate, and they lead to the identification of a SMAD3/FLI1 complex expressed during development of immunity. The availability of large TF combinatorial networks in both human and mouse will provide many opportunities to study gene regulation, tissue differentiation, and mammalian evolution.


Gene Expression Regulation , Gene Regulatory Networks , Transcription Factors/metabolism , Animals , Cell Differentiation , Evolution, Molecular , Humans , Mice , Monocytes/cytology , Organ Specificity , Smad3 Protein/metabolism , Trans-Activators/metabolism
8.
Genome Res ; 20(2): 257-64, 2010 Feb.
Article En | MEDLINE | ID: mdl-20051556

MicroRNAs (miRNAs) are short (20-23 nt) RNAs that are sequence-specific mediators of transcriptional and post-transcriptional regulation of gene expression. Modern high-throughput technologies enable deep sequencing of such RNA species on an unprecedented scale. We find that the analysis of small RNA deep-sequencing libraries can be affected by cross-mapping, in which RNA sequences originating from one locus are inadvertently mapped to another. Similar to cross-hybridization on microarrays, cross-mapping is prevalent among miRNAs, as they tend to occur in families, are similar to or derived from repeat or structural RNAs, or are post-transcriptionally modified. Here, we develop a strategy to correct for cross-mapping, and apply it to the analysis of RNA editing in mature miRNAs. In contrast to previous reports, our analysis suggests that RNA editing in mature miRNAs is rare in animals.


Gene Library , MicroRNAs/genetics , RNA Editing/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Animals , Base Sequence , High-Throughput Screening Assays , Humans , Mice , MicroRNAs/metabolism
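Cross-mapping is easiest to picture for reads that align equally well to several loci, as often happens within miRNA families. The sketch below shows one generic way to handle such reads: each multi-mapping read is split across its candidate loci in proportion to their uniquely mapped support. This is a simpler, swapped-in technique and not the correction strategy developed in the paper; the input format and the function name are hypothetical.

```python
from collections import Counter


def allocate_multimappers(alignments):
    """Distribute each read's count over the loci it maps to.

    alignments: dict mapping read_id -> list of loci with equally good hits
    (hypothetical input format). Uniquely mapped reads are counted first; each
    multi-mapping read is then split in proportion to the unique counts of its
    candidate loci, or evenly if none of them has unique support.
    """
    unique = Counter()
    for loci in alignments.values():
        if len(loci) == 1:
            unique[loci[0]] += 1

    counts = Counter(unique)  # copy of the unique counts; fractional shares are added below
    for loci in alignments.values():
        if len(loci) < 2:
            continue
        support = [unique[locus] for locus in loci]  # missing loci read as 0
        total = sum(support)
        for locus, s in zip(loci, support):
            counts[locus] += s / total if total else 1 / len(loci)
    return counts
```

Because every read contributes exactly one count in total, the allocated counts still sum to the number of reads; what changes is only which locus receives the ambiguous ones.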
9.
Nat Genet ; 41(5): 553-62, 2009 May.
Article En | MEDLINE | ID: mdl-19377474

Using deep sequencing (deepCAGE), the FANTOM4 study measured the genome-wide dynamics of transcription-start-site usage in the human monocytic cell line THP-1 throughout a time course of growth arrest and differentiation. Modeling the expression dynamics in terms of predicted cis-regulatory sites, we identified the key transcription regulators, their time-dependent activities and target genes. Systematic siRNA knockdown of 52 transcription factors confirmed the roles of individual factors in the regulatory network. Our results indicate that cellular states are constrained by complex networks involving both positive and negative regulatory interactions among substantial numbers of transcription factors and that no single transcription factor is both necessary and sufficient to drive the differentiation process.


Cell Differentiation/genetics , Cell Proliferation , Gene Regulatory Networks , Transcription, Genetic , Base Sequence , Cell Line , Gene Expression Profiling , Humans , Leukemia, Myeloid/genetics , Leukemia, Myeloid/metabolism , Models, Genetic , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis , Promoter Regions, Genetic , RNA, Small Interfering/metabolism
10.
J Biol Chem ; 282(15): 11122-34, 2007 Apr 13.
Article En | MEDLINE | ID: mdl-17308308

The survival of motor neuron (SMN) protein, responsible for the neurodegenerative disease spinal muscular atrophy (SMA), oligomerizes and forms a stable complex with seven other major components, the Gemin proteins. Besides the SMN protein, Gemin2 is a core protein essential for the formation of the SMN complex, although the mechanism by which it drives this formation is unclear. We have found a novel interaction, Gemin2 self-association, using the mammalian two-hybrid system and in vitro pull-down assays. Using in vitro dissociation assays, we also found that the self-interaction of the amino-terminal region of SMN, which was confirmed in this study, became stable in the presence of Gemin2. In addition, Gemin2 knockdown by small interfering RNA treatment revealed a drastic decrease in SMN oligomer formation and in the assembly activity of spliceosomal small nuclear ribonucleoprotein (snRNP). Taken together, these results indicate that Gemin2 plays an important role in snRNP assembly by stabilizing the SMN oligomer/complex via a novel self-interaction. Applying these results and techniques to amino-terminal SMN missense mutants recently identified in SMA patients, we showed that amino-terminal self-association, Gemin2 binding, the stabilizing effect of Gemin2, and snRNP assembly activity were all reduced in the mutant SMN(D44V), suggesting that instability of the amino-terminal SMN self-association may cause SMA in patients carrying this allele.


Cyclic AMP Response Element-Binding Protein/metabolism , Nerve Tissue Proteins/metabolism , RNA-Binding Proteins/metabolism , Animals , Cyclic AMP Response Element-Binding Protein/genetics , HeLa Cells , Humans , Mice , Mutation/genetics , Nerve Tissue Proteins/genetics , Protein Binding , RNA-Binding Proteins/genetics , Ribonucleoproteins, Small Nuclear/metabolism , SMN Complex Proteins
12.
PLoS Genet ; 2(4): e62, 2006 Apr.
Article En | MEDLINE | ID: mdl-16683036

The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To work toward a complete gene catalog covering all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To achieve accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface that simplifies the annotation procedures and reduces manual annotation errors. Automated coding sequence and function prediction was followed by manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Of these 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing, to our knowledge, the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome and could be applied to the transcriptomes of other species.


DNA, Complementary/genetics , Databases, Genetic , Mice/genetics , Transcription, Genetic , Animals , Automation , DNA, Complementary/chemistry , Genome
13.
Nat Genet ; 38(6): 626-35, 2006 Jun.
Article En | MEDLINE | ID: mdl-16645617

Mammalian promoters can be separated into two classes: conserved, TATA box-enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and gene families make differential use of distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature of protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3' UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.


Evolution, Molecular , Promoter Regions, Genetic , 3' Untranslated Regions , Animals , Base Sequence , DNA , Genome , Proteome , TATA Box
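The contrast drawn in the entry above between promoters that initiate at a well-defined site and broad promoters can be quantified from TSS tag counts. One common summary is the width of the region holding the central 80% of a cluster's tags; the sketch below computes it and applies a cutoff. The 10%/90% quantiles, the 10-bp threshold and the function names are hypothetical choices for illustration, not values taken from this study.

```python
def interquantile_width(positions, counts, lower=0.1, upper=0.9):
    """Width (bp) of the region holding the tags between the lower and upper quantiles.

    positions: sorted TSS coordinates within one cluster; counts: tags at each position.
    The quantile cut-offs are hypothetical defaults.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("cluster must contain at least one tag")
    cumulative = 0
    q_low = q_high = None
    for pos, c in zip(positions, counts):
        cumulative += c
        if q_low is None and cumulative >= lower * total:
            q_low = pos  # first position reaching the lower quantile
        if q_high is None and cumulative >= upper * total:
            q_high = pos  # first position reaching the upper quantile
            break
    return q_high - q_low + 1


def classify_promoter(positions, counts, sharp_max_width=10):
    """Label a TSS cluster 'sharp' or 'broad' (the width threshold is hypothetical)."""
    width = interquantile_width(positions, counts)
    return "sharp" if width <= sharp_max_width else "broad"
```

A cluster with nearly all tags on one base yields a width of 1 bp and is labelled sharp, while tags spread evenly over 100 bp give a width of about 80 bp and are labelled broad. Using quantiles rather than the full span keeps a few stray tags at the edges from making a sharp promoter look broad.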
14.
Genome Biol ; 6(12): R98, 2005.
Article En | MEDLINE | ID: mdl-16356270

BACKGROUND: Although 2,061 proteins of Pyrococcus horikoshii OT3, a hyperthermophilic archaeon, have been predicted from the recently completed genome sequence, the majority show no similarity to proteins from other organisms and are thus hypothetical proteins of unknown function. Because most proteins operate as parts of complexes to regulate biological processes, we systematically analyzed protein-protein interactions in Pyrococcus using the mammalian two-hybrid system to determine the functions of the hypothetical proteins. RESULTS: We examined 960 soluble proteins from Pyrococcus and selected 107 interactions based on luciferase reporter activity; these were then evaluated using a computational approach to assess their reliability. We also analyzed the expression of the assay samples by western blot, and tested a few interactions by in vitro pull-down assays. We identified 11 hetero-interactions between proteins that we considered to be encoded in the same operon, as observed in Helicobacter pylori. We annotated and classified the proteins in the selected interactions according to their orthologous proteins. Many enzymes showed self-interactions, similar to those seen in other organisms. CONCLUSION: We found 13 unannotated proteins that interacted with annotated proteins; this information is useful for predicting the functions of the hypothetical Pyrococcus proteins from the annotations of their interacting partners. Among the hetero-interactions, proteins were more likely to interact with proteins within the same ortholog class than with proteins of different classes. The analysis described here can provide global insights into the biological features of protein-protein interactions in P. horikoshii.


Protein Interaction Mapping , Pyrococcus horikoshii/metabolism , Genes, Archaeal/genetics , Genome, Archaeal/genetics , Multigene Family/genetics , Open Reading Frames/genetics , Protein Binding , Pyrococcus horikoshii/classification , Pyrococcus horikoshii/genetics
...