Results 1 - 20 of 118
1.
Cell ; 182(1): 145-161.e23, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32553272

ABSTRACT

Structural variants (SVs) underlie important crop improvement and domestication traits. However, resolving the extent, diversity, and quantitative impact of SVs has been challenging. We used long-read nanopore sequencing to capture 238,490 SVs in 100 diverse tomato lines. This panSV genome, along with 14 new reference assemblies, revealed large-scale intermixing of diverse genotypes, as well as thousands of SVs intersecting genes and cis-regulatory regions. Hundreds of SV-gene pairs exhibit subtle and significant expression changes, which could broadly influence quantitative trait variation. By combining quantitative genetics with genome editing, we show how multiple SVs that changed gene dosage and expression levels modified fruit flavor, size, and production. In the last example, higher order epistasis among four SVs affecting three related transcription factors allowed introduction of an important harvesting trait in modern tomato. Our findings highlight the underexplored role of SVs in genotype-to-phenotype relationships and their widespread importance and utility in crop improvement.


Subject(s)
Crops, Agricultural/genetics , Gene Expression Regulation, Plant , Genomic Structural Variation , Solanum lycopersicum/genetics , Alleles , Cytochrome P-450 Enzyme System/genetics , Ecotype , Epistasis, Genetic , Fruit/genetics , Gene Duplication , Genome, Plant , Genotype , Inbreeding , Molecular Sequence Annotation , Phenotype , Plant Breeding , Quantitative Trait Loci/genetics
2.
Cell ; 146(6): 1029-41, 2011 Sep 16.
Article in English | MEDLINE | ID: mdl-21925323

ABSTRACT

During germ cell and preimplantation development, mammalian cells undergo nearly complete reprogramming of DNA methylation patterns. We profiled the methylomes of human and chimp sperm as a basis for comparison to methylation patterns of ESCs. Although the majority of promoters escape methylation in both ESCs and sperm, the corresponding hypomethylated regions show substantial structural differences. Repeat elements are heavily methylated in both germ and somatic cells; however, retrotransposons from several subfamilies evade methylation more effectively during male germ cell development, whereas other subfamilies show the opposite trend. Comparing methylomes of human and chimp sperm revealed a subset of differentially methylated promoters and strikingly divergent methylation in retrotransposon subfamilies, with an evolutionary impact that is apparent in the underlying genomic sequence. Thus, the features that determine DNA methylation patterns differ between male germ cells and somatic cells, and elements of these features have diverged between humans and chimpanzees.


Subject(s)
DNA Methylation , Epigenesis, Genetic , Pan troglodytes/genetics , Animals , Centromere/metabolism , Embryonic Stem Cells/metabolism , Genomics , Humans , Male , Primates/genetics , Promoter Regions, Genetic , Spermatozoa/metabolism
3.
Am J Hum Genet ; 109(4): 631-646, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35290762

ABSTRACT

Studies of de novo mutation (DNM) have typically excluded some of the most repetitive and complex regions of the genome because these regions cannot be unambiguously mapped with short-read sequencing data. To better understand the genome-wide pattern of DNM, we generated long-read sequence data from an autism parent-child quad with an affected female where no pathogenic variant had been discovered in short-read Illumina sequence data. We deeply sequenced all four individuals by using three sequencing platforms (Illumina, Oxford Nanopore, and Pacific Biosciences) and three complementary technologies (Strand-seq, optical mapping, and 10X Genomics). Using long-read sequencing, we initially discovered and validated 171 DNMs across two children, a 20% increase in the number of de novo single-nucleotide variants (SNVs) and indels when compared to short-read callsets. The number of DNMs further increased by 5% when considering a more complete human reference (T2T-CHM13) because of the recovery of events in regions absent from GRCh38 (e.g., three DNMs in heterochromatic satellites). In total, we validated 195 de novo germline mutations and 23 potential post-zygotic mosaic mutations across both children; the overall true substitution rate based on this integrated callset is at least 1.41 × 10⁻⁸ substitutions per nucleotide per generation. We also identified six de novo insertions and deletions in tandem repeats, two of which represent structural variants. We demonstrate that long-read sequencing and assembly, especially when combined with a more complete reference genome, increases the number of DNMs by >25% compared to previous studies, providing a more complete catalog of DNM compared to short-read data alone.


Subject(s)
Genomics , High-Throughput Nucleotide Sequencing , Female , Humans , Mutation/genetics , Nucleotides , Sequence Analysis, DNA , Software
4.
Mol Psychiatry ; 2024 May 04.
Article in English | MEDLINE | ID: mdl-38704507

ABSTRACT

Schizophrenia affects approximately 1% of the world population. Genetics, epigenetics, and environmental factors are known to play a role in this psychiatric disorder. While there is a high concordance in monozygotic twins, about half of twin pairs are discordant for schizophrenia. To address the question of how and when concordance in monozygotic twins occurs, we have obtained fibroblasts from two pairs of schizophrenia discordant twins (one sibling with schizophrenia while the second one is unaffected by schizophrenia) and three pairs of healthy twins (both of the siblings are healthy). We have prepared iPSC models for these 3 groups of patients with schizophrenia, unaffected co-twins, and the healthy twins. When the study started the co-twins were considered healthy and unaffected but both the co-twins were later diagnosed with a depressive disorder. The reprogrammed iPSCs were differentiated into hippocampal neurons to measure the neurophysiological abnormalities in the patients. We found that the neurons derived from the schizophrenia patients were less arborized, were hypoexcitable with immature spike features, and exhibited a significant reduction in synaptic activity with dysregulation in synapse-related genes. Interestingly, the neurons derived from the co-twin siblings who did not have schizophrenia formed another distinct group that was different from the neurons in the group of the affected twin siblings but also different from the neurons in the group of the control twins. Importantly, their synaptic activity was not affected. Our measurements that were obtained from schizophrenia patients and their monozygotic twin and compared also to control healthy twins point to hippocampal synaptic deficits as a central mechanism in schizophrenia.

5.
Cell ; 137(3): 522-35, 2009 May 01.
Article in English | MEDLINE | ID: mdl-19395010

ABSTRACT

In Drosophila gonads, Piwi proteins and associated piRNAs collaborate with additional factors to form a small RNA-based immune system that silences mobile elements. Here, we analyzed nine Drosophila piRNA pathway mutants for their impacts on both small RNA populations and the subcellular localization patterns of Piwi proteins. We find that distinct piRNA pathways with differing components function in ovarian germ and somatic cells. In the soma, Piwi acts singularly with the conserved flamenco piRNA cluster to enforce silencing of retroviral elements that may propagate by infecting neighboring germ cells. In the germline, silencing programs encoded within piRNA clusters are optimized via a slicer-dependent amplification loop to suppress a broad spectrum of elements. The classes of transposons targeted by germline and somatic piRNA clusters, though not the precise elements, are conserved among Drosophilids, demonstrating that the architecture of piRNA clusters has coevolved with the transposons that they are tasked to control.


Subject(s)
Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Ovary/metabolism , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism , Animals , Female , Gene Silencing , Mutation , Ovary/cytology , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Retroelements
6.
Genome Res ; 30(9): 1258-1273, 2020 09.
Article in English | MEDLINE | ID: mdl-32887686

ABSTRACT

Improved identification of structural variants (SVs) in cancer can lead to more targeted and effective treatment options as well as advance our basic understanding of the disease and its progression. We performed whole-genome sequencing of the SKBR3 breast cancer cell line and patient-derived tumor and normal organoids from two breast cancer patients using Illumina/10x Genomics, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT) sequencing. We then inferred SVs and large-scale allele-specific copy number variants (CNVs) using an ensemble of methods. Our findings show that long-read sequencing allows for substantially more accurate and sensitive SV detection, with between 90% and 95% of variants supported by each long-read technology also supported by the other. We also report high accuracy for long reads even at relatively low coverage (25×-30×). Furthermore, we integrated SV and CNV data into a unifying karyotype-graph structure to present a more accurate representation of the mutated cancer genomes. We find hundreds of variants within known cancer-related genes detectable only through long-read sequencing. These findings highlight the need for long-read sequencing of cancer genomes for the precise analysis of their genetic instability.


Subject(s)
Breast Neoplasms/genetics , Genomic Structural Variation , Whole Genome Sequencing/methods , Cell Line, Tumor , DNA Copy Number Variations , DNA Methylation , DNA, Neoplasm , Female , Humans , Nanopores , Organoids , RNA-Seq
7.
Cell ; 135(5): 852-64, 2008 Nov 28.
Article in English | MEDLINE | ID: mdl-19012953

ABSTRACT

Cancers are highly heterogeneous and contain many passenger and driver mutations. To functionally identify tumor suppressor genes relevant to human cancer, we compiled pools of short hairpin RNAs (shRNAs) targeting the mouse orthologs of genes recurrently deleted in a series of human hepatocellular carcinomas and tested their ability to promote tumorigenesis in a mosaic mouse model. In contrast to randomly selected shRNA pools, many deletion-specific pools accelerated hepatocarcinogenesis in mice. Through further analysis, we identified and validated 13 tumor suppressor genes, 12 of which had not been linked to cancer before. One gene, XPO4, encodes a nuclear export protein whose substrate, EIF5A2, is amplified in human tumors, is required for proliferation of XPO4-deficient tumor cells, and promotes hepatocellular carcinoma in mice. Our results establish the feasibility of in vivo RNAi screens and illustrate how combining cancer genomics, RNA interference, and mosaic mouse models can facilitate the functional annotation of the cancer genome.


Subject(s)
Carcinoma, Hepatocellular/genetics , Genes, Tumor Suppressor , Genomics , Liver Neoplasms/genetics , RNA Interference , Animals , Humans , Karyopherins/genetics , Karyopherins/metabolism , Mice , Peptide Initiation Factors/genetics , RNA, Untranslated/genetics , RNA-Binding Proteins/genetics , Smad3 Protein/metabolism , Eukaryotic Translation Initiation Factor 5A
8.
Nature ; 546(7659): 524-527, 2017 06 22.
Article in English | MEDLINE | ID: mdl-28605751

ABSTRACT

Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.


Subject(s)
Genome, Plant/genetics , High-Throughput Nucleotide Sequencing/methods , Single Molecule Imaging/methods , Zea mays/genetics , Centromere/genetics , Chromosomes, Plant/genetics , Contig Mapping , Crops, Agricultural/genetics , DNA Transposable Elements/genetics , DNA, Intergenic/genetics , Genes, Plant/genetics , Molecular Sequence Annotation , Optics and Photonics , Phylogeny , RNA, Messenger/analysis , RNA, Messenger/genetics , Reference Standards , Sorghum/genetics
9.
Nat Rev Genet ; 17(6): 333-51, 2016 05 17.
Article in English | MEDLINE | ID: mdl-27184599

ABSTRACT

Since the completion of the human genome project in 2003, extraordinary progress has been made in genome sequencing technologies, which has led to a decreased cost per megabase and an increase in the number and diversity of sequenced genomes. An astonishing complexity of genome architecture has been revealed, bringing these sequencing technologies to even greater advancements. Some approaches maximize the number of bases sequenced in the least amount of time, generating a wealth of data that can be used to understand increasingly complex phenotypes. Alternatively, other approaches now aim to sequence longer contiguous pieces of DNA, which are essential for resolving structurally complex regions. These and other strategies are providing researchers and clinicians a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.


Subject(s)
Genetic Variation/genetics , Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Phenotype
10.
Genes Dev ; 28(14): 1544-9, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-25030694

ABSTRACT

During development, mammalian germ cells reprogram their epigenomes via a genome-wide erasure and de novo rewriting of DNA methylation marks. We know little of how methylation patterns are specifically determined. The piRNA pathway is thought to target the bulk of retrotransposon methylation. Here we show that most retrotransposon sequences are modified by default de novo methylation. However, potentially active retrotransposon copies evade this initial wave, likely mimicking features of protein-coding genes. These elements remain transcriptionally active and become targets of piRNA-mediated methylation. Thus, we posit that these two waves play essential roles in resetting germ cell epigenomes at each generation.


Subject(s)
DNA Methylation , Retroelements/genetics , Spermatocytes/cytology , Spermatogenesis/genetics , Animals , Cellular Reprogramming/genetics , Epigenesis, Genetic/genetics , Male , Mice , RNA, Small Interfering/metabolism , Spermatocytes/metabolism , Transcription, Genetic
11.
Genome Res ; 28(6): 921-932, 2018 06.
Article in English | MEDLINE | ID: mdl-29712755

ABSTRACT

Maize and sorghum are both important crops with similar overall plant architectures, but they have key differences, especially in regard to their inflorescences. To better understand these two organisms at the molecular level, we compared expression profiles of both protein-coding and noncoding transcripts in 11 matched tissues using single-molecule, long-read, deep RNA sequencing. This comparative analysis revealed large numbers of novel isoforms in both species. Evolutionarily young genes were likely to be generated in reproductive tissues and usually had fewer isoforms than old genes. We also observed similarities and differences in alternative splicing patterns and activities, both among tissues and between species. The maize subgenomes exhibited no bias in isoform generation; however, genes in the B genome were more highly expressed in pollen tissue, whereas genes in the A genome were more highly expressed in endosperm. We also identified a number of splicing events conserved between maize and sorghum. In addition, we generated comprehensive and high-resolution maps of poly(A) sites, revealing similarities and differences in mRNA cleavage between the two species. Overall, our results reveal considerable splicing and expression diversity between sorghum and maize, well beyond what was reported in previous studies, likely reflecting the differences in architecture between these two species.


Subject(s)
Alternative Splicing/genetics , Sorghum/genetics , Zea mays/genetics , Endosperm/genetics , Endosperm/growth & development , Gene Expression Regulation, Plant , Genome, Plant/genetics
12.
Genome Res ; 28(8): 1126-1135, 2018 08.
Article in English | MEDLINE | ID: mdl-29954844

ABSTRACT

The SK-BR-3 cell line is one of the most important models for HER2+ breast cancers, which affect one in five breast cancer patients. SK-BR-3 is known to be highly rearranged, although much of the variation is in complex and repetitive regions that may be underreported. Addressing this, we sequenced SK-BR-3 using long-read single molecule sequencing from Pacific Biosciences and develop one of the most detailed maps of structural variations (SVs) in a cancer genome available, with nearly 20,000 variants present, most of which were missed by short-read sequencing. Surrounding the important ERBB2 oncogene (also known as HER2), we discover a complex sequence of nested duplications and translocations, suggesting a punctuated progression. Full-length transcriptome sequencing further revealed several novel gene fusions within the nested genomic variants. Combining long-read genome and transcriptome sequencing enables an in-depth analysis of how SVs disrupt the genome and sheds new light on the complex mechanisms involved in cancer genome evolution.


Subject(s)
Breast Neoplasms/genetics , Gene Amplification/genetics , Gene Rearrangement/genetics , Oncogenes/genetics , Breast Neoplasms/pathology , Female , Genome, Human , Genomic Structural Variation , High-Throughput Nucleotide Sequencing , Humans , MCF-7 Cells , Receptor, ErbB-2/genetics , Repetitive Sequences, Nucleic Acid/genetics , Transcriptome/genetics
13.
Plant Physiol ; 182(1): 215-227, 2020 01.
Article in English | MEDLINE | ID: mdl-31641075

ABSTRACT

Chromatin modification has gained increased attention for its role in the regulation of plant responses to environmental changes, but the specific mechanisms and molecular players remain elusive. Here, we show that the Arabidopsis (Arabidopsis thaliana) histone methyltransferase SET DOMAIN GROUP8 (SDG8) mediates genome-wide changes in H3K36 methylation at specific genomic loci functionally relevant to nitrate treatments. Moreover, we show that the specific H3K36 methyltransferase encoded by SDG8 is required for canonical RNA processing, and that RNA isoform switching is more prominent in the sdg8-5 deletion mutant than in the wild type. To demonstrate that SDG8-mediated regulation of RNA isoform expression is functionally relevant, we examined a putative regulatory gene, CONSTANS, CO-like, and TOC1 101 (CCT101), whose nitrogen-responsive isoform-specific RNA expression is mediated by SDG8. We show by functional expression in shoot cells that the different RNA isoforms of CCT101 encode distinct regulatory proteins with different effects on genome-wide transcription. We conclude that SDG8 is involved in plant responses to environmental nitrogen supply, affecting multiple gene regulatory processes including genome-wide histone modification, transcriptional regulation, and RNA processing, and thereby mediating developmental and metabolic processes related to nitrogen use.


Subject(s)
Arabidopsis Proteins/metabolism , Arabidopsis/metabolism , Histone-Lysine N-Methyltransferase/metabolism , Histones/metabolism , Nitrates/pharmacology , RNA, Plant/metabolism , Arabidopsis/drug effects , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Gene Expression Regulation, Plant/drug effects , Gene Expression Regulation, Plant/genetics , Histone-Lysine N-Methyltransferase/genetics , Methylation/drug effects , RNA, Plant/genetics
14.
Proc Natl Acad Sci U S A ; 115(25): 6494-6499, 2018 06 19.
Article in English | MEDLINE | ID: mdl-29769331

ABSTRACT

This study exploits time, the relatively unexplored fourth dimension of gene regulatory networks (GRNs), to learn the temporal transcriptional logic underlying dynamic nitrogen (N) signaling in plants. Our "just-in-time" analysis of time-series transcriptome data uncovered a temporal cascade of cis elements underlying dynamic N signaling. To infer transcription factor (TF)-target edges in a GRN, we applied a time-based machine learning method to 2,174 dynamic N-responsive genes. We experimentally determined a network precision cutoff, using TF-regulated genome-wide targets of three TF hubs (CRF4, SNZ, and CDF1), used to "prune" the network to 155 TFs and 608 targets. This network precision was reconfirmed using genome-wide TF-target regulation data for four additional TFs (TGA1, HHO5/6, and PHL1) not used in network pruning. These higher-confidence edges in the GRN were further filtered by independent TF-target binding data, used to calculate a TF "N-specificity" index. This refined GRN identifies the temporal relationship of known/validated regulators of N signaling (NLP7/8, TGA1/4, NAC4, HRS1, and LBD37/38/39) and 146 additional regulators. Six TFs (CRF4, SNZ, CDF1, HHO5/6, and PHL1) validated herein regulate a significant number of genes in the dynamic N response, targeting 54% of N-uptake/assimilation pathway genes. Phenotypically, inducible overexpression of CRF4 in planta regulates genes resulting in altered biomass, root development, and ¹⁵NO₃⁻ uptake, specifically under low-N conditions. This dynamic N-signaling GRN now provides the temporal "transcriptional logic" for 155 candidate TFs to improve nitrogen use efficiency with potential agricultural applications. Broadly, these time-based approaches can uncover the temporal transcriptional logic for any biological response system in biology, agriculture, or medicine.


Subject(s)
Arabidopsis/genetics , Arabidopsis/metabolism , Gene Expression Regulation, Plant/genetics , Gene Regulatory Networks/genetics , Nitrogen/metabolism , Transcription, Genetic/genetics , Arabidopsis Proteins/genetics , Gene Expression Profiling/methods , Logic , Protein Binding/genetics , Signal Transduction/genetics , Transcription Factors/genetics
15.
Nature ; 515(7526): 216-21, 2014 Nov 13.
Article in English | MEDLINE | ID: mdl-25363768

ABSTRACT

Whole exome sequencing has proven to be a powerful tool for understanding the genetic architecture of human disease. Here we apply it to more than 2,500 simplex families, each having a child with an autistic spectrum disorder. By comparing affected to unaffected siblings, we show that 13% of de novo missense mutations and 43% of de novo likely gene-disrupting (LGD) mutations contribute to 12% and 9% of diagnoses, respectively. Including copy number variants, coding de novo mutations contribute to about 30% of all simplex and 45% of female diagnoses. Almost all LGD mutations occur opposite wild-type alleles. LGD targets in affected females significantly overlap the targets in males of lower intelligence quotient (IQ), but neither overlaps significantly with targets in males of higher IQ. We estimate that LGD mutation in about 400 genes can contribute to the joint class of affected females and males of lower IQ, with an overlapping and similar number of genes vulnerable to contributory missense mutation. LGD targets in the joint class overlap with published targets for intellectual disability and schizophrenia, and are enriched for chromatin modifiers, FMRP-associated genes and embryonically expressed genes. Most of the significance for the latter comes from affected females.


Subject(s)
Child Development Disorders, Pervasive/genetics , Genetic Predisposition to Disease/genetics , Mutation/genetics , Open Reading Frames/genetics , Child , Cluster Analysis , Exome/genetics , Female , Genes , Humans , Intelligence Tests , Male , Reproducibility of Results
16.
Mol Psychiatry ; 23(12): 2254-2265, 2018 12.
Article in English | MEDLINE | ID: mdl-29880880

ABSTRACT

Psychiatric disorders are a group of genetically related diseases with highly polygenic architectures. Genome-wide association analyses have made substantial progress towards understanding the genetic architecture of these disorders. More recently, exome- and whole-genome sequencing of cases and families have identified rare, high penetrant variants that provide direct functional insight. There remains, however, a gap in the heritability explained by these complementary approaches. To understand how multiple genetic variants combine to modify both severity and penetrance of a highly penetrant variant, we sequenced 48 whole genomes from a family with a high loading of psychiatric disorder linked to a balanced chromosomal translocation. The (1;11)(q42;q14.3) translocation directly disrupts three genes: DISC1, DISC2, DISC1FP and has been linked to multiple brain imaging and neurocognitive outcomes in the family. Using DNA sequence-level linkage analysis, functional annotation and population-based association, we identified common and rare variants in GRM5 (minor allele frequency (MAF) > 0.05), PDE4D (MAF > 0.2) and CNTN5 (MAF < 0.01) that may help explain the individual differences in phenotypic expression in the family. We suggest that whole-genome sequencing in large families will improve the understanding of the combined effects of the rare and common sequence variation underlying psychiatric phenotypes.


Subject(s)
Mental Disorders/genetics , Sequence Analysis, DNA/methods , Adult , Alleles , Contactins/genetics , Cyclic Nucleotide Phosphodiesterases, Type 4/genetics , Family/psychology , Female , Gene Frequency/genetics , Genetic Linkage/genetics , Genetic Predisposition to Disease/genetics , Genetic Testing , Genome-Wide Association Study , Genomics , Genotype , Humans , Lod Score , Male , Mental Disorders/physiopathology , Middle Aged , Mood Disorders/genetics , Multifactorial Inheritance , Nerve Tissue Proteins/genetics , Pedigree , Phenotype , RNA, Long Noncoding , RNA, Messenger/genetics , Receptor, Metabotropic Glutamate 5/genetics , Recombinant Fusion Proteins/genetics , Translocation, Genetic
17.
Mol Cell ; 44(1): 17-28, 2011 Oct 07.
Article in English | MEDLINE | ID: mdl-21924933

ABSTRACT

DNA methylation has been implicated as an epigenetic component of mechanisms that stabilize cell-fate decisions. Here, we have characterized the methylomes of human female hematopoietic stem/progenitor cells (HSPCs) and mature cells from the myeloid and lymphoid lineages. Hypomethylated regions (HMRs) associated with lineage-specific genes were often methylated in the opposing lineage. In HSPCs, these sites tended to show intermediate, complex patterns that resolve to uniformity upon differentiation, by increased or decreased methylation. Promoter HMRs shared across diverse cell types typically display a constitutive core that expands and contracts in a lineage-specific manner to fine-tune the expression of associated genes. Many newly identified intergenic HMRs, both constitutive and lineage specific, were enriched for factor binding sites with an implied role in genome organization and regulation of gene expression, respectively. Overall, our studies represent an important reference data set and provide insights into directional changes in DNA methylation as cells adopt terminal fates.


Subject(s)
DNA Methylation , Hematopoietic Stem Cells/cytology , Adult , Binding Sites , Cell Differentiation , Cell Lineage , Comparative Genomic Hybridization , Epigenesis, Genetic , Female , Gene Expression Regulation , Genome, Human , Hematopoietic System , Humans , Models, Biological , Promoter Regions, Genetic
18.
Genome Res ; 25(11): 1750-6, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26447147

ABSTRACT

Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used this for sequencing the Saccharomyces cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm Nanocorr specifically for Oxford Nanopore reads, because existing packages were incapable of assembling the long read lengths (5-50 kbp) at such high error rates (between ∼5% and 40% error). With this new method, we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: The contig N50 length is more than ten times greater than an Illumina-only assembly (678 kbp versus 59.9 kbp) and has >99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.


Subject(s)
DNA, Fungal/isolation & purification , Nanopores , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA/methods , DNA Transposable Elements , DNA, Fungal/genetics , Escherichia coli/genetics , Genomics , Sequence Alignment
19.
Nat Methods ; 12(8): 780-6, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26121404

ABSTRACT

We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.


Subject(s)
Computational Biology/methods , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide , Algorithms , Chromosome Mapping , Diploidy , Gene Library , Genetic Variation , Genome , Haplotypes , Humans , Nucleotides/genetics , Reproducibility of Results , Sequence Analysis, DNA , Tandem Repeat Sequences
20.
Nature ; 491(7426): 705-10, 2012 Nov 29.
Article in English | MEDLINE | ID: mdl-23192148

ABSTRACT

Bread wheat (Triticum aestivum) is a globally important crop, accounting for 20 per cent of the calories consumed by humans. Major efforts are underway worldwide to increase wheat production by extending genetic diversity and analysing key traits, and genomic resources can accelerate progress. But so far the very large size and polyploid complexity of the bread wheat genome have been substantial barriers to genome analysis. Here we report the sequencing of its large, 17-gigabase-pair, hexaploid genome using 454 pyrosequencing, and comparison of this with the sequences of diploid ancestral and progenitor genomes. We identified between 94,000 and 96,000 genes, and assigned two-thirds to the three component genomes (A, B and D) of hexaploid wheat. High-resolution synteny maps identified many small disruptions to conserved gene order. We show that the hexaploid genome is highly dynamic, with significant loss of gene family members on polyploidization and domestication, and an abundance of gene fragments. Several classes of genes involved in energy harvesting, metabolism and growth are among expanded gene families that could be associated with crop productivity. Our analyses, coupled with the identification of extensive genetic variation, provide a resource for accelerating gene discovery and improving this major crop.


Subject(s)
Bread , Genome, Plant/genetics , Triticum/genetics , Brachypodium/genetics , Chromosomes, Plant/genetics , Crops, Agricultural/genetics , DNA, Complementary/genetics , DNA, Plant/genetics , Evolution, Molecular , Genes, Plant/genetics , Genomics , Multigene Family/genetics , Oryza/genetics , Polymorphism, Single Nucleotide/genetics , Polyploidy , Pseudogenes/genetics , Sequence Alignment , Sequence Analysis, DNA , Triticum/classification , Zea mays/genetics