Results 1 - 15 of 15
1.
Genome Res ; 24(1): 14-24, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24092820

ABSTRACT

Understanding the consequences of regulatory variation in the human genome remains a major challenge, with important implications for understanding gene regulation and interpreting the many disease-risk variants that fall outside of protein-coding regions. Here, we provide a direct window into the regulatory consequences of genetic variation by sequencing RNA from 922 genotyped individuals. We present a comprehensive description of the distribution of regulatory variation--by the specific expression phenotypes altered, the properties of affected genes, and the genomic characteristics of regulatory variants. We detect variants influencing expression of over ten thousand genes, and through the enhanced resolution offered by RNA-sequencing, for the first time we identify thousands of variants associated with specific phenotypes including splicing and allelic expression. Evaluating the effects of both long-range intra-chromosomal and trans (cross-chromosomal) regulation, we observe modularity in the regulatory network, with three-dimensional chromosomal configuration playing a particular role in regulatory modules within each chromosome. We also observe a significant depletion of regulatory variants affecting central and critical genes, along with a trend of reduced effect sizes as variant frequency increases, providing evidence that purifying selection and buffering have limited the deleterious impact of regulatory variation on the cell. Further, generalizing beyond observed variants, we have analyzed the genomic properties of variants associated with expression and splicing and developed a Bayesian model to predict regulatory consequences of genetic variants, applicable to the interpretation of individual genomes and disease studies. Together, these results represent a critical step toward characterizing the complete landscape of human regulatory variation.


Subject(s)
Genetic Variation , Quantitative Trait Loci , Sequence Analysis, RNA , Transcriptome , Bayes Theorem , Chromosomes, Human , Genome, Human , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide , Regulatory Sequences, Ribonucleic Acid
2.
Nature ; 463(7278): 191-6, 2010 Jan 14.
Article in English | MEDLINE | ID: mdl-20016485

ABSTRACT

All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.


Subject(s)
Genes, Neoplasm/genetics , Genome, Human/genetics , Mutation/genetics , Neoplasms/genetics , Adult , Cell Line, Tumor , DNA Damage/genetics , DNA Mutational Analysis , DNA Repair/genetics , Gene Dosage/genetics , Humans , Loss of Heterozygosity/genetics , Male , Melanoma/etiology , Melanoma/genetics , MicroRNAs/genetics , Mutagenesis, Insertional/genetics , Neoplasms/etiology , Polymorphism, Single Nucleotide/genetics , Precision Medicine , Sequence Deletion/genetics , Ultraviolet Rays
3.
Nature ; 452(7184): 215-9, 2008 Mar 13.
Article in English | MEDLINE | ID: mdl-18278030

ABSTRACT

Cytosine DNA methylation is important in regulating gene expression and in silencing transposons and other repetitive sequences. Recent genomic studies in Arabidopsis thaliana have revealed that many endogenous genes are methylated either within their promoters or within their transcribed regions, and that gene methylation is highly correlated with transcription levels. However, plants have different types of methylation controlled by different genetic pathways, and detailed information on the methylation status of each cytosine in any given genome is lacking. To this end, we generated a map at single-base-pair resolution of methylated cytosines for Arabidopsis, by combining bisulphite treatment of genomic DNA with ultra-high-throughput sequencing using the Illumina 1G Genome Analyser and Solexa sequencing technology. This approach, termed BS-Seq, unlike previous microarray-based methods, allows one to sensitively measure cytosine methylation on a genome-wide scale within specific sequence contexts. Here we describe methylation on previously inaccessible components of the genome and analyse the DNA methylation sequence composition and distribution. We also describe the effect of various DNA methylation mutants on genome-wide methylation patterns, and demonstrate that our newly developed library construction and computational methods can be applied to large genomes such as that of mouse.


Subject(s)
Arabidopsis/genetics , DNA Methylation , Genome, Plant/genetics , Sequence Analysis, DNA/methods , Sulfites/metabolism , 5-Methylcytosine/metabolism , Animals , Base Sequence , Computational Biology , Cytosine/metabolism , Gene Expression Regulation, Plant/genetics , Gene Library , Mice , Mutation/genetics , Promoter Regions, Genetic/genetics , Reproducibility of Results , Uracil/metabolism
4.
Nat Methods ; 5(3): 247-52, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18297082

ABSTRACT

High-density single-nucleotide polymorphism (SNP) arrays have revolutionized the ability of genome-wide association studies to detect genomic regions harboring sequence variants that affect complex traits. Extensive numbers of validated SNPs with known allele frequencies are essential to construct genotyping assays with broad utility. We describe an economical, efficient, single-step method for SNP discovery, validation and characterization that uses deep sequencing of reduced representation libraries (RRLs) from specified target populations. Using nearly 50 million sequences generated on an Illumina Genome Analyzer from DNA of 66 cattle representing three populations, we identified 62,042 putative SNPs and predicted their allele frequencies. Genotype data for these 66 individuals validated 92% of 23,357 selected genome-wide SNPs, with a genotypic and sequence allele frequency correlation of r = 0.67. This approach for simultaneous de novo discovery of high-quality SNPs and population characterization of allele frequencies may be applied to any species with at least a partially sequenced genome.


Subject(s)
Computational Biology/methods , Gene Frequency , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Animals , Cattle , Genomic Library , Genotype
5.
Methods Mol Biol ; 354: 105-19, 2007.
Article in English | MEDLINE | ID: mdl-17172749

ABSTRACT

Massively parallel signature sequencing is a sequencing-based method that provides quantitative gene expression data for nearly all transcripts in a particular ribonucleic acid sample. Although the sequencing technology is practiced as a service by a California-based company, we have developed methods for the handling and analysis of these data. This chapter describes the steps involved in obtaining data from massively parallel signature sequencing, aligning the signatures to genomic sequence, identifying novel transcripts, and performing quantitative analyses of genes expressed under conditions such as disease treatments.


Subject(s)
Arabidopsis/genetics , Arabidopsis/immunology , Gene Expression Regulation, Plant , Genes, Plant/genetics , Plant Diseases/genetics , Sequence Analysis, DNA/methods , Databases, Genetic , Gene Library , Plant Diseases/immunology , RNA, Messenger/analysis , RNA, Messenger/genetics , RNA, Plant/analysis , RNA, Plant/genetics , User-Computer Interface
6.
Nat Biotechnol ; 22(8): 1006-11, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15247925

ABSTRACT

Large-scale sequencing of short mRNA-derived tags can establish the qualitative and quantitative characteristics of a complex transcriptome. We sequenced 12,304,362 tags from five diverse libraries of Arabidopsis thaliana using massively parallel signature sequencing (MPSS). A total of 48,572 distinct signatures, each representing a different transcript, were expressed at significant levels. These signatures were compared to the annotation of the A. thaliana genomic sequence; in the five libraries, this comparison yielded between 17,353 and 18,361 genes with sense expression, and between 5,487 and 8,729 genes with antisense expression. An additional 6,691 MPSS signatures mapped to unannotated regions of the genome. Expression was demonstrated for 1,168 genes for which expression data were previously unknown. Alternative polyadenylation was observed for more than 25% of A. thaliana genes transcribed in these libraries. The MPSS expression data suggest that the A. thaliana transcriptome is complex and contains many as-yet uncharacterized variants of normal coding transcripts.


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Transcription, Genetic/genetics , Computing Methodologies , Expressed Sequence Tags , Gene Expression Profiling/methods , Gene Expression Regulation, Plant/genetics , Genome, Plant , Peptide Library
7.
BMC Genomics ; 7: 310, 2006 Dec 08.
Article in English | MEDLINE | ID: mdl-17156450

ABSTRACT

BACKGROUND: Rice blast, caused by the fungal pathogen Magnaporthe grisea, is a devastating disease causing tremendous yield loss in rice production. The public availability of the complete genome sequence of M. grisea provides ample opportunities to understand the molecular mechanism of its pathogenesis on rice plants at the transcriptome level. To identify all the expressed genes encoded in the fungal genome, we have analyzed the mycelium and appressorium transcriptomes using massively parallel signature sequencing (MPSS), robust-long serial analysis of gene expression (RL-SAGE) and oligoarray methods. RESULTS: The MPSS analyses identified 12,531 and 12,927 distinct significant tags from mycelia and appressoria, respectively, while the RL-SAGE analysis identified 16,580 distinct significant tags from the mycelial library. When matching these 12,531 mycelial and 12,927 appressorial significant tags to the annotated CDS, 500 bp upstream and 500 bp downstream of CDS, 6,735 unique genes in mycelia and 7,686 unique genes in appressoria were identified. A total of 7,135 mycelium-specific and 7,531 appressorium-specific significant MPSS tags were identified, which correspond to 2,088 and 1,784 annotated genes, respectively, when matching to the same set of reference sequences. Nearly 85% of the significant MPSS tags from mycelia and appressoria and 65% of the significant tags from the RL-SAGE mycelium library matched to the M. grisea genome. MPSS and RL-SAGE methods supported the expression of more than 9,000 genes, representing over 80% of the predicted genes in M. grisea. About 40% of the MPSS tags and 55% of the RL-SAGE tags represent novel transcripts since they had no matches in the existing M. grisea EST collections. Over 19% of the annotated genes were found to produce both sense and antisense tags in the protein-coding region. The oligoarray analysis identified the expression of 3,793 mycelium-specific and 4,652 appressorium-specific genes. A total of 2,430 mycelial genes and 1,886 appressorial genes were identified by both MPSS and oligoarray. CONCLUSION: The comprehensive and deep transcriptome analysis by MPSS and RL-SAGE methods identified many novel sense and antisense transcripts in the M. grisea genome at two important growth stages. The differentially expressed transcripts that were identified, especially those specifically expressed in appressoria, represent a genomic resource useful for gaining a better understanding of the molecular basis of M. grisea pathogenicity. Further analysis of the novel antisense transcripts will provide new insights into the regulation and function of these genes in fungal growth, development and pathogenesis in the host plants.


Subject(s)
Gene Expression Regulation, Fungal , Magnaporthe/genetics , Oligonucleotide Array Sequence Analysis , Transcription, Genetic , DNA, Fungal/genetics , Expressed Sequence Tags , Genetic Techniques , Magnaporthe/pathogenicity , Mycelium/genetics , RNA, Antisense/genetics
8.
Methods Mol Biol ; 331: 285-311, 2006.
Article in English | MEDLINE | ID: mdl-16881523

ABSTRACT

Massively parallel signature sequencing is an ultra-high throughput sequencing technology. It can simultaneously sequence millions of sequence tags, and, therefore, is ideal for whole genome analysis. When applied to expression profiling, it reveals almost every transcript in the sample and provides its accurate expression level. This chapter describes the technology and its application in establishing stem cell transcriptome databases.


Subject(s)
Databases, Genetic , Gene Expression Profiling/methods , Genomics/methods , Pluripotent Stem Cells/physiology , Transcription, Genetic , Cell Culture Techniques/methods , Gene Library , Genome, Human , Humans , Pluripotent Stem Cells/cytology , Sequence Analysis, DNA/methods
9.
Nat Genet ; 44(7): 751-9, 2012 Jun 10.
Article in English | MEDLINE | ID: mdl-22683710

ABSTRACT

The molecular pathogenesis of renal cell carcinoma (RCC) is poorly understood. Whole-genome and exome sequencing followed by innovative tumorgraft analyses (to accurately determine mutant allele ratios) identified several putative two-hit tumor suppressor genes, including BAP1. The BAP1 protein, a nuclear deubiquitinase, is inactivated in 15% of clear cell RCCs. BAP1 cofractionates with and binds to HCF-1 in tumorgrafts. Mutations disrupting the HCF-1 binding motif impair BAP1-mediated suppression of cell proliferation but not deubiquitination of monoubiquitinated histone 2A lysine 119 (H2AK119ub1). BAP1 loss sensitizes RCC cells in vitro to genotoxic stress. Notably, mutations in BAP1 and PBRM1 anticorrelate in tumors (P = 3 × 10⁻⁵), [corrected] and combined loss of BAP1 and PBRM1 in a few RCCs was associated with rhabdoid features (q = 0.0007). BAP1 and PBRM1 regulate seemingly different gene expression programs, and BAP1 loss was associated with high tumor grade (q = 0.0005). Our results establish the foundation for an integrated pathological and molecular genetic classification of RCC, paving the way for subtype-specific treatments exploiting genetic vulnerabilities.


Subject(s)
Carcinoma, Renal Cell/genetics , Carcinoma, Renal Cell/pathology , Kidney Neoplasms/genetics , Kidney Neoplasms/pathology , Tumor Suppressor Proteins/deficiency , Tumor Suppressor Proteins/genetics , Ubiquitin Thiolesterase/deficiency , Ubiquitin Thiolesterase/genetics , Aged , Carcinoma, Renal Cell/metabolism , Cell Growth Processes/physiology , Cells, Cultured , DNA-Binding Proteins , Exome , Female , Gene Expression/genetics , Host Cell Factor C1/genetics , Host Cell Factor C1/metabolism , Humans , Kidney Neoplasms/metabolism , Male , Middle Aged , Mutation , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Protein Interaction Domains and Motifs , Transcription Factors/genetics , Transcription Factors/metabolism , Tumor Suppressor Proteins/metabolism , Ubiquitin Thiolesterase/metabolism
10.
Genome Biol ; 11(10): R102, 2010.
Article in English | MEDLINE | ID: mdl-20961407

ABSTRACT

BACKGROUND: A comprehensive transcriptome survey, or gene atlas, provides information essential for a complete understanding of the genomic biology of an organism. We present an atlas of RNA abundance for 92 adult, juvenile and fetal cattle tissues and three cattle cell lines. RESULTS: The Bovine Gene Atlas was generated from 7.2 million unique digital gene expression tag sequences (300.2 million total raw tag sequences), from which 1.59 million unique tag sequences were identified that mapped to the draft bovine genome accounting for 85% of the total raw tag abundance. Filtering these tags yielded 87,764 unique tag sequences that unambiguously mapped to 16,517 annotated protein-coding loci in the draft genome accounting for 45% of the total raw tag abundance. Clustering of tissues based on tag abundance profiles generally confirmed ontology classification based on anatomy. There were 5,429 constitutively expressed loci and 3,445 constitutively expressed unique tag sequences mapping outside annotated gene boundaries that represent a resource for enhancing current gene models. Physical measures such as inferred transcript length or antisense tag abundance identified tissues with atypical transcriptional tag profiles. We report for the first time the tissue-specific variation in the proportion of mitochondrial transcriptional tag abundance. CONCLUSIONS: The Bovine Gene Atlas is the deepest and broadest transcriptome survey of any livestock genome to date. Commonalities and variation in sense and antisense transcript tag profiles identified in different tissues facilitate the examination of the relationship between gene expression, tissue, and gene function.


Subject(s)
Cattle/genetics , Expressed Sequence Tags , Genome , Molecular Sequence Annotation , Animals , Cattle/classification , Cell Line , Chromosome Mapping , Female , Gene Expression , Gene Expression Profiling , Genes, Mitochondrial , Male , Molecular Sequence Annotation/methods , Proteomics
11.
Proc Natl Acad Sci U S A ; 104(7): 2313-8, 2007 Feb 13.
Article in English | MEDLINE | ID: mdl-17277080

ABSTRACT

Compared with understanding of biological shape and form, knowledge is sparse regarding what regulates growth and body size of a species. For example, the genetic and physiological causes of heterosis (hybrid vigor) have remained elusive for nearly a century. Here, we investigate gene-expression patterns underlying growth heterosis in the Pacific oyster (Crassostrea gigas) in two partially inbred (f = 0.375) and two hybrid larval populations produced by a reciprocal cross between the two inbred families. We cloned cDNA and generated 4.5 M sequence tags with massively parallel signature sequencing. The sequences contain 23,274 distinct signatures that are expressed at statistically nonzero levels and show a highly positively skewed distribution with median and modal counts of 9.25 and 3 transcripts per million, respectively. For nearly half of these signatures, expression level depends on genotype and is predominantly nonadditive (hybrids deviate from the inbred average). Statistical contrasts suggest approximately 350 candidate genes for growth heterosis that exhibit concordant nonadditive expression in reciprocal hybrids; this represents only approximately 1.5% of the >20,000 transcripts. Patterns of gene expression, which include dominance for low expression and even underdominance of expression, are more complex than predicted from classical dominant or overdominant explanations of heterosis. Preliminary identification of ribosomal proteins among candidate genes supports the suggestion from previous studies that efficiency of protein metabolism plays a role in growth heterosis.


Subject(s)
Crassostrea/genetics , Gene Expression Regulation/physiology , Growth/genetics , Hybrid Vigor , Larva/genetics , RNA, Messenger/analysis , Animals , Genome , Molecular Sequence Data , Ribosomal Proteins/analysis , Ribosomal Proteins/genetics
12.
Proc Natl Acad Sci U S A ; 104(41): 16245-50, 2007 Oct 09.
Article in English | MEDLINE | ID: mdl-17913878

ABSTRACT

Transcription factors play a key role in integrating and modulating biological information. In this study, we comprehensively measured the changing abundances of mRNAs over a time course of activation of human peripheral-blood-derived mononuclear cells ("macrophages") with lipopolysaccharide. Global and dynamic analysis of transcription factors in response to a physiological stimulus has yet to be achieved in a human system, and our efforts significantly advanced this goal. We used multiple global high-throughput technologies for measuring mRNA levels, including massively parallel signature sequencing and GeneChip microarrays. We identified 92 of 1,288 known human transcription factors as having significantly measurable changes during our 24-h time course. At least 42 of these changes were previously unidentified in this system. Our data demonstrate that some transcription factors operate in a functional range below 10 transcripts per cell, whereas others operate in a range three orders of magnitude greater. The highly reproducible response of many mRNAs indicates feedback control. A broad range of activation kinetics was observed; thus, combinatorial regulation by small subsets of transcription factors would permit almost any timing input to cis-regulatory elements controlling gene transcription.


Subject(s)
Leukocytes, Mononuclear/drug effects , Leukocytes, Mononuclear/metabolism , Lipopolysaccharides/pharmacology , Transcription Factors/genetics , Gene Expression/drug effects , Humans , In Vitro Techniques , Macrophages/drug effects , Macrophages/metabolism , Oligonucleotide Array Sequence Analysis , RNA, Messenger/genetics , Systems Biology
13.
Science ; 309(5740): 1567-9, 2005 Sep 02.
Article in English | MEDLINE | ID: mdl-16141074

ABSTRACT

Small RNAs play important regulatory roles in most eukaryotes, but only a small proportion of these molecules have been identified. We sequenced more than two million small RNAs from seedlings and the inflorescence of the model plant Arabidopsis thaliana. Known and new microRNAs (miRNAs) were among the most abundant of the nonredundant set of more than 75,000 sequences, whereas more than half represented lower abundance small interfering RNAs (siRNAs) that match repetitive sequences, intergenic regions, and genes. Individual or clusters of highly regulated small RNAs were readily observed. Targets of antisense RNA or miRNA did not appear to be preferentially associated with siRNAs. Many genomic regions previously considered featureless were found to be sites of numerous small RNAs.


Subject(s)
Arabidopsis/genetics , Genome, Plant , MicroRNAs/biosynthesis , RNA, Plant/biosynthesis , RNA, Small Interfering/biosynthesis , Arabidopsis/metabolism , Chromosome Mapping , Gene Expression Regulation, Plant , MicroRNAs/chemistry , MicroRNAs/genetics , RNA, Plant/chemistry , RNA, Plant/genetics , RNA, Small Interfering/chemistry , RNA, Small Interfering/genetics , Sequence Analysis, RNA , Transcription, Genetic
14.
Genome Res ; 15(7): 1007-14, 2005 Jul.
Article in English | MEDLINE | ID: mdl-15998913

ABSTRACT

We have used massively parallel signature sequencing (MPSS) to sample the transcriptomes of 32 normal human tissues to an unprecedented depth, thus documenting the patterns of expression of almost 20,000 genes with high sensitivity and specificity. The data confirm the widely held belief that differences in gene expression between cell and tissue types are largely determined by transcripts derived from a limited number of tissue-specific genes, rather than by combinations of more promiscuously expressed genes. Expression of a little more than half of all known human genes seems to account for both the common requirements and the specific functions of the tissues sampled. A classification of tissues based on patterns of gene expression largely reproduces classifications based on anatomical and biochemical properties. The unbiased sampling of the human transcriptome achieved by MPSS supports the idea that most human genes have been mapped, if not functionally characterized. This data set should prove useful for the identification of tissue-specific genes, for the study of global changes induced by pathological conditions, and for the definition of a minimal set of genes necessary for basic cell maintenance. The data are available on the Web at http://mpss.licr.org and http://sgb.lynxgen.com.


Subject(s)
Gene Expression , Algorithms , Expressed Sequence Tags , Gene Expression Profiling/methods , Genetic Techniques , Humans , Organ Specificity , RNA, Messenger/genetics
15.
Genome Res ; 14(8): 1641-53, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15289482

ABSTRACT

We have generated 36,991,173 17-base sequence "signatures" representing transcripts from the model plant Arabidopsis. These data were derived by massively parallel signature sequencing (MPSS) from 14 libraries and comprised 268,132 distinct sequences. Comparable data were also obtained with 20-base signatures. We developed a method for handling these data and for comparing these signatures to the annotated Arabidopsis genome. As part of this procedure, 858,019 potential or "genomic" signatures were extracted from the Arabidopsis genome and classified based on the position and orientation of the signatures relative to annotated genes. A comparison of genomic and expressed signatures matched 67,735 signatures predicted to be derived from distinct transcripts and expressed at significant levels. Expressed signatures were derived from the sense strand of at least 19,088 of 29,084 annotated genes. A comparison of the genomic and expression signatures demonstrated that approximately 7.7% of genomic signatures were underrepresented in the expression data. These genomic signatures contained one of 20 four-base words that were consistently associated with reduced MPSS abundances. More than 89% of the sum of the expressed signature abundances matched the Arabidopsis genome, and many of the unmatched signatures found in high abundances were predicted to match to previously uncharacterized transcripts.


Subject(s)
Arabidopsis/genetics , Gene Expression Profiling/methods , Genome, Plant , Transcription, Genetic , Base Sequence , Computational Biology , Expressed Sequence Tags , Genomic Library , RNA, Messenger/genetics