Results 1 - 20 of 85
1.
Cell ; 162(2): 375-390, 2015 Jul 16.
Article in English | MEDLINE | ID: mdl-26186191

ABSTRACT

Autism spectrum disorder (ASD) is a disorder of brain development. Most cases lack a clear etiology or genetic basis, and the difficulty of re-enacting human brain development has precluded understanding of ASD pathophysiology. Here we use three-dimensional neural cultures (organoids) derived from induced pluripotent stem cells (iPSCs) to investigate neurodevelopmental alterations in individuals with severe idiopathic ASD. While no known underlying genomic mutation could be identified, transcriptome and gene network analyses revealed upregulation of genes involved in cell proliferation, neuronal differentiation, and synaptic assembly. ASD-derived organoids exhibit an accelerated cell cycle and overproduction of GABAergic inhibitory neurons. Using RNA interference, we show that overexpression of the transcription factor FOXG1 is responsible for the overproduction of GABAergic neurons. Altered expression of gene network modules and FOXG1 are positively correlated with symptom severity. Our data suggest that a shift toward GABAergic neuron fate caused by FOXG1 is a developmental precursor of ASD.


Subject(s)
Child Development Disorders, Pervasive/genetics , Child Development Disorders, Pervasive/pathology , Forkhead Transcription Factors/metabolism , Nerve Tissue Proteins/metabolism , Neurogenesis , Telencephalon/embryology , Female , Gene Expression Profiling , Humans , Induced Pluripotent Stem Cells , Male , Megalencephaly/genetics , Megalencephaly/pathology , Models, Biological , Neurons/cytology , Neurons/metabolism , Organoids/pathology , Telencephalon/pathology
2.
Nucleic Acids Res ; 51(10): e57, 2023 06 09.
Article in English | MEDLINE | ID: mdl-37026484

ABSTRACT

Mosaic mutations can be used to track cell ancestries and reconstruct high-resolution lineage trees during cancer progression and during development, starting from the first cell divisions of the zygote. However, this approach requires sampling and analyzing the genomes of multiple cells, which can be redundant in lineage representation, limiting the scalability of the approach. We describe a strategy for cost- and time-efficient lineage reconstruction using clonal induced pluripotent stem cell lines from human skin fibroblasts. The approach leverages shallow sequencing coverage to assess the clonality of the lines, clusters redundant lines and sums their coverage to accurately discover mutations in the corresponding lineages. Only a fraction of lines needs to be sequenced to high coverage. We demonstrate the effectiveness of this approach for reconstructing lineage trees during development and in hematologic malignancies. We discuss and propose an optimal experimental design for reconstructing lineage trees.


Subject(s)
Cell Lineage , Neoplasms , Software , Humans , Germ Cells , Mutation , Neoplasms/pathology
3.
Annu Rev Genomics Hum Genet ; 21: 101-116, 2020 08 31.
Article in English | MEDLINE | ID: mdl-32413272

ABSTRACT

Tracing cell lineages is fundamental for understanding the rules governing development in multicellular organisms and delineating complex biological processes involving the differentiation of multiple cell types with distinct lineage hierarchies. In humans, experimental lineage tracing is unethical, and one has to rely on natural-mutation markers that are created within cells as they proliferate and age. Recent studies have demonstrated that it is now possible to trace lineages in normal, noncancerous cells with a variety of data types using natural variations in the nuclear and mitochondrial DNA as well as variations in DNA methylation status. It is also apparent that the scientific community is on the verge of being able to make a comprehensive and detailed cell lineage map of human embryonic and fetal development. In this review, we discuss the advantages and disadvantages of different approaches and markers for lineage tracing. We also describe the general conceptual design for how to derive a lineage map for humans.


Subject(s)
Cell Differentiation , Cell Lineage , Cell Nucleus/genetics , DNA Methylation , DNA, Mitochondrial/analysis , Embryo, Mammalian/cytology , DNA, Mitochondrial/genetics , Developmental Biology , Embryo, Mammalian/metabolism , Humans , Single-Cell Analysis
4.
Genome Res ; 30(12): 1695-1704, 2020 12.
Article in English | MEDLINE | ID: mdl-33122304

ABSTRACT

Somatic mosaicism, manifesting as single nucleotide variants (SNVs), mobile element insertions, and structural changes in the DNA, is a common phenomenon in human brain cells, with potential functional consequences. Using a clonal approach, we previously detected 200-400 mosaic SNVs per cell in three human fetal brains (15-21 wk postconception). However, structural variation in the human fetal brain has not yet been investigated. Here, we discover and validate four mosaic structural variants (SVs) in the same brains and resolve their precise breakpoints. The SVs were of kilobase scale and complex, consisting of deletion(s) and rearranged genomic fragments, which sometimes originated from different chromosomes. Sequences at the breakpoints of these rearrangements had microhomologies, suggesting their origin from replication errors. One SV was found in two clones, and we timed its origin to ∼14 wk postconception. No large scale mosaic copy number variants (CNVs) were detectable in normal fetal human brains, suggesting that previously reported megabase-scale CNVs in neurons arise at later stages of development. By reanalysis of public single nuclei data from adult brain neurons, we detected an extrachromosomal circular DNA event. Our study reveals the existence of mosaic SVs in the developing human brain, likely arising from cell proliferation during mid-neurogenesis. Although relatively rare compared to SNVs and present in ∼10% of neurons, SVs in developing human brain affect a comparable number of bases in the genome (∼6200 vs. ∼4000 bp), implying that they may have similar functional consequences.


Subject(s)
Brain/embryology , DNA, Circular/genetics , Genomic Structural Variation , Sequence Analysis, DNA/methods , Clonal Evolution , Female , Genotyping Techniques , Gestational Age , Humans , Mosaicism , Neurogenesis , Pregnancy
5.
PLoS Comput Biol ; 18(4): e1009487, 2022 04.
Article in English | MEDLINE | ID: mdl-35442945

ABSTRACT

Accurate discovery of somatic mutations in a cell is a challenge that partly lies in the immaturity of dedicated analytical approaches. Approaches comparing a cell's genome to a control bulk sample miss common mutations, while approaches to find such mutations from bulk suffer from low sensitivity. We developed a tool, All2, which enables accurate filtering of mutations in a cell without the need for data from bulk(s). It is based on pair-wise comparisons of all cells to each other where every call for base pair substitution and indel is classified as either a germline variant, mosaic mutation, or false positive. As All2 allows for considering dropped-out regions, it is applicable to whole genome and exome analysis of cloned and amplified cells. By applying the approach to a variety of available data, we showed that its application reduces false positives, enables sensitive discovery of high frequency mutations, and is indispensable for conducting high resolution cell lineage tracing.


Subject(s)
Exome , Software , High-Throughput Nucleotide Sequencing , INDEL Mutation/genetics , Mutation/genetics , Exome Sequencing
7.
Genome Res ; 29(3): 472-484, 2019 03.
Article in English | MEDLINE | ID: mdl-30737237

ABSTRACT

K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and also most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high-resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, structural variants (SVs), including small and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.


Subject(s)
Genome, Human , Humans , K562 Cells , Karyotype , Polymorphism, Genetic , Whole Genome Sequencing
8.
Bioinformatics ; 37(7): 1015-1017, 2021 05 17.
Article in English | MEDLINE | ID: mdl-32777815

ABSTRACT

SUMMARY: Defining the precise location of structural variations (SVs) at single-nucleotide breakpoint resolution is a challenging problem due to large gaps in alignment. Previously, Alignment with Gap Excision (AGE) enabled us to define breakpoints of SVs at single-nucleotide resolution; however, AGE requires a vast amount of memory when aligning a pair of long sequences. To address this, we developed a memory-efficient implementation, LongAGE, based on the classical Hirschberg algorithm. We demonstrate an application of LongAGE for resolving breakpoints of SVs embedded into segmental duplications on Pacific Biosciences (PacBio) reads that can be longer than 10 kb. Furthermore, we observed different breakpoints for a deletion and a duplication in the same locus, providing direct evidence that such multi-allelic copy number variants (mCNVs) arise from two or more independent ancestral mutations. AVAILABILITY AND IMPLEMENTATION: LongAGE is implemented in C++ and available on Github at https://github.com/Coaxecva/LongAGE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomic Structural Variation , Software , Algorithms , DNA Copy Number Variations , High-Throughput Nucleotide Sequencing , Segmental Duplications, Genomic , Sequence Analysis, DNA
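Note: the abstract above attributes LongAGE's memory savings to the classical Hirschberg algorithm. As a rough illustration only (this is not LongAGE's code, and it uses plain global alignment rather than AGE's gap-excision scoring), the sketch below shows the linear-memory scoring pass that Hirschberg's method is built on: it keeps just two rows of the dynamic-programming matrix at a time. The function name and the scoring parameters are assumptions chosen for the example.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-1):
    """Return the last row of the global-alignment score matrix for a vs b.

    Only two rows are held in memory at any time, so memory use is
    O(len(b)) instead of O(len(a) * len(b)).
    Scoring values are illustrative assumptions, not LongAGE defaults.
    """
    # Row 0: every prefix of b aligned against the empty string.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: the first i characters of a aligned against the empty string.
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur  # discard the older row
    return prev


if __name__ == "__main__":
    # The final entry of the row is the optimal global alignment score.
    print(nw_last_row("GATTACA", "GATTACA")[-1])
```

Hirschberg's algorithm runs this pass forward on one half of the first sequence and backward on the other half to find where the optimal alignment crosses the middle, then recurses on the two sub-problems, which recovers the full alignment without ever storing the whole matrix.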
9.
Nature ; 526(7571): 75-81, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26432246

ABSTRACT

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.


Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Physical Chromosome Mapping , Amino Acid Sequence , Genetic Predisposition to Disease , Genetics, Medical , Genetics, Population , Genome-Wide Association Study , Genomics , Genotype , Haplotypes/genetics , Homozygote , Humans , Molecular Sequence Data , Mutation Rate , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Sequence Analysis, DNA , Sequence Deletion/genetics
10.
Nucleic Acids Res ; 47(6): 2766-2777, 2019 04 08.
Article in English | MEDLINE | ID: mdl-30773596

ABSTRACT

Structural variations (SVs) in the human genome originate from different mechanisms related to DNA repair, replication errors, and retrotransposition. Our analyses of 26 927 SVs from the 1000 Genomes Project revealed differential distributions and consequences of SVs of different origin, e.g. deletions from non-allelic homologous recombination (NAHR) are more prone to disrupt chromatin organization while processed pseudogenes can create accessible chromatin. Spontaneous double-stranded breaks (DSBs) are the best predictor of enrichment of NAHR deletions in open chromatin. This evidence, along with strong physical interaction of NAHR breakpoints belonging to the same deletion, suggests that the majority of NAHR deletions are non-meiotic, i.e. originate from errors during homology directed repair (HDR) of spontaneous DSBs. In turn, the origin of the spontaneous DSBs is associated with transcription factor binding in accessible chromatin, revealing the vulnerability of functional, open chromatin. The chromatin itself is enriched with repeats, particularly fixed Alu elements that provide the homology required to maintain stability via HDR. Through co-localization of fixed Alus and NAHR deletions in open chromatin we hypothesize that old Alu expansion had a stabilizing role on the human genome.


Subject(s)
Chromatin/chemistry , Genome, Human , Genomic Structural Variation/genetics , Quantitative Trait, Heritable , Chromatin/metabolism , Chromosome Mapping , Computational Biology , DNA Breaks, Double-Stranded , DNA Damage/physiology , DNA Repair , Homologous Recombination , Humans , Recombinational DNA Repair
11.
Nucleic Acids Res ; 47(8): 3846-3861, 2019 05 07.
Article in English | MEDLINE | ID: mdl-30864654

ABSTRACT

HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and Indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions and structural variants (SVs) including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence assembled and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.


Subject(s)
Chromosome Mapping/methods , Genome, Human , Genomics/methods , Haplotypes , Sequence Analysis, DNA/statistics & numerical data , Alleles , Aneuploidy , DNA Methylation , Genomic Structural Variation , Hep G2 Cells , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Karyotyping , Loss of Heterozygosity , Polymorphism, Single Nucleotide , Retroelements
12.
BMC Bioinformatics ; 21(1): 521, 2020 Nov 12.
Article in English | MEDLINE | ID: mdl-33183232

ABSTRACT

BACKGROUND: The study of mosaic mutation is important since it has been linked to cancer and various disorders. Single cell sequencing has become a powerful tool to study the genome of individual cells for the detection of mosaic mutations. The amount of DNA in a single cell needs to be amplified before sequencing and multiple displacement amplification (MDA) is widely used owing to its low error rate and long fragment length of amplified DNA. However, the phi29 polymerase used in MDA is sensitive to template fragmentation and presence of sites with DNA damage that can lead to biases such as allelic imbalance, uneven coverage and overrepresentation of C to T mutations. It is therefore important to select cells with uniform amplification to decrease false positives and increase sensitivity for mosaic mutation detection. RESULTS: We propose a method, Scellector (single cell selector), which uses haplotype information to detect amplification quality in shallow coverage sequencing data. We tested Scellector on single human neuronal cells, obtained in vitro and amplified by MDA. Qualities were estimated from shallow sequencing with coverage as low as 0.3× per cell and then confirmed using 30× deep coverage sequencing. The high concordance between shallow and high coverage data validated the method. CONCLUSION: Scellector can potentially be used to rank amplifications obtained from single cell platforms relying on an MDA-like amplification step, such as Chromium Single Cell profiling solution.


Subject(s)
Nucleic Acid Amplification Techniques/methods , Single-Cell Analysis/methods , Cell Differentiation , DNA/chemistry , DNA/metabolism , High-Throughput Nucleotide Sequencing , Humans , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/metabolism , Neurons/cytology , Neurons/metabolism , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
13.
Genome Res ; 27(4): 512-523, 2017 04.
Article in English | MEDLINE | ID: mdl-28235832

ABSTRACT

Few studies have been conducted to understand post-zygotic accumulation of mutations in cells of the healthy human body. We reprogrammed 32 skin fibroblast cells from families of donors into human induced pluripotent stem cell (hiPSC) lines. The clonal nature of hiPSC lines allows a high-resolution analysis of the genomes of the founder fibroblast cells without being confounded by the artifacts of single-cell whole-genome amplification. We estimate that on average a fibroblast cell in children has 1035 mostly benign mosaic SNVs. On average, 235 SNVs could be directly confirmed in the original fibroblast population by ultradeep sequencing, down to an allele frequency (AF) of 0.1%. More sensitive droplet digital PCR experiments confirmed more SNVs as mosaic with AF as low as 0.01%, suggesting that 1035 mosaic SNVs per fibroblast cell is the true average. Similar analyses in adults revealed no significant increase in the number of SNVs per cell, suggesting that a major fraction of mosaic SNVs in fibroblasts arises during development. Mosaic SNVs were distributed uniformly across the genome and were enriched in a mutational signature previously observed in cancers and in de novo variants and which, we hypothesize, is a hallmark of normal cell proliferation. Finally, AF distribution of mosaic SNVs had distinct narrow peaks, which could be a characteristic of clonal cell selection, clonal expansion, or both. These findings reveal a large degree of somatic mosaicism in healthy human tissues, link de novo and cancer mutations to somatic mosaicism, and couple somatic mosaicism with cell proliferation.


Subject(s)
Clonal Evolution , DNA Copy Number Variations , Fibroblasts/cytology , Mosaicism , Mutation Accumulation , Cell Proliferation , Cells, Cultured , Fibroblasts/metabolism , Humans , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/metabolism , Skin/cytology
14.
Gynecol Oncol ; 156(2): 387-392, 2020 02.
Article in English | MEDLINE | ID: mdl-31787246

ABSTRACT

OBJECTIVE: We aimed to assess whether endometrial cancer (EC) can be detected in shed DNA collected with vaginal tampon by analyzing copy number, methylation markers, and mutations. METHODS: Tampons were collected prior to hysterectomy from 38 EC patients and 28 women with benign indications. Extracted tampon DNA underwent the following: 1) low-coverage whole genome sequencing (LC-WGS) to assess copy number, 2) pyrosequencing to measure percent promoter methylation of HOXA9, RASSF1, and CDH13 and 3) next generation sequencing (NGS) to identify mutations in 19 genes associated with EC identified through The Cancer Genome Atlas. Sensitivity and specificity for each test and test combinations were calculated. RESULTS: Methylation analysis yielded the highest specificities but lowest sensitivities (37-40% sensitivity; 100% specificity for HOXA9, RASSF1 and HTR1B) while mutation analysis had improved sensitivity (50% sensitivity; 83% specificity). Only one "false positive" result for copy number variants was identified among women with benign surgical indications, which was based on detection of copy number changes, and associated with a leiomyosarcoma that was only recognized at hysterectomy. Considering any of the 3 biomarker classes as a positive resulted in a sensitivity of 92% and specificity of 86%. Mutation analysis did not add sensitivity to the combination of analysis of copy number and methylation. CONCLUSIONS: This study demonstrates a proof-of-principle for non-invasive yet precise detection of endometrial cancer. We propose that with improved biomarker testing, it may be possible to develop a clinically useful test for detecting EC.


Subject(s)
DNA Methylation , Endometrial Neoplasms/genetics , Gene Dosage , Menstrual Hygiene Products , Biomarkers, Tumor/genetics , Diagnosis, Differential , Endometrial Neoplasms/diagnosis , Endometrial Neoplasms/pathology , Female , Humans , Middle Aged , Mutation , Uterine Diseases/diagnosis , Uterine Diseases/genetics , Uterine Diseases/pathology , Vaginal Smears/methods
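Note: the abstract above reports sensitivity and specificity for a combined rule in which a sample counts as positive if any of the three biomarker classes is positive. The sketch below illustrates how such figures are computed. The per-patient counts (35 of 38 cancers detected, 24 of 28 benign cases negative) are not stated in the abstract; they are assumptions chosen only because they round to the reported 92% and 86%, and all function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of diseased cases that the test calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-diseased cases that the test calls negative."""
    return true_neg / (true_neg + false_pos)


def combined_call(copy_number_pos, methylation_pos, mutation_pos):
    """'Any positive' rule: one positive biomarker class makes the sample positive."""
    return copy_number_pos or methylation_pos or mutation_pos


if __name__ == "__main__":
    # Assumed counts, consistent with (but not taken from) the abstract.
    ec_total, ec_detected = 38, 35
    benign_total, benign_negative = 28, 24

    sens = sensitivity(ec_detected, ec_total - ec_detected)
    spec = specificity(benign_negative, benign_total - benign_negative)
    print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

An "any positive" rule can only raise sensitivity and lower (or hold) specificity relative to each single test, which matches the pattern in the reported results: the individual methylation markers reach 100% specificity, while the combination trades some specificity for much higher sensitivity.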
15.
Genome Res ; 26(7): 874-81, 2016 07.
Article in English | MEDLINE | ID: mdl-27216746

ABSTRACT

Copy number variants (CNVs) are a class of structural variants that may involve complex genomic rearrangements (CGRs) and are hypothesized to have additional mutations around their breakpoints. Understanding the mechanisms underlying CNV formation is fundamental for understanding the repair and mutation mechanisms in cells, thereby shedding light on evolution, genomic disorders, cancer, and complex human traits. In this study, we used data from the 1000 Genomes Project to analyze hundreds of loci harboring heterozygous germline deletions in the subjects NA12878 and NA19240. By utilizing synthetic long-read data (longer than 2 kbp) in combination with high coverage short-read data and, in parallel, by comparing with parental genomes, we interrogated the phasing of these deletions with the flanking tens of thousands of heterozygous SNPs and indels. We found that the density of SNPs/indels flanking the breakpoints of deletions (in-phase variants) is approximately twice as high as the corresponding density for the variants on the haplotype without deletion (out-of-phase variants). This fold change was even larger for the subset of deletions with signatures of replication-based mechanism of formation. The allele frequency (AF) spectrum for deletions is enriched for rare events; and the AF spectrum for in-phase SNPs is shifted toward this deletion spectrum, thus offering evidence consistent with the concomitance of the in-phase SNPs/indels with the deletion events. These findings therefore lend support to the hypothesis that the mutational mechanisms underlying CNV formation are error prone. Our results could also be relevant for resolving mutation-rate discrepancies in human and to explain kataegis.


Subject(s)
Chromosome Breakpoints , DNA Copy Number Variations , Mutagenesis , DNA Replication , Female , Gene Frequency , Genome, Human , Haplotypes , Humans , INDEL Mutation , Male , Models, Genetic , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Sequence Deletion
16.
PLoS Comput Biol ; 13(6): e1005567, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28662076

ABSTRACT

Retroduplications come from reverse transcription of mRNAs and their insertion back into the genome. Here, we performed comprehensive discovery and analysis of retroduplications in a large cohort of 2,535 individuals from 26 human populations, as part of 1000 Genomes Phase 3. We developed an integrated approach to discover novel retroduplications combining high-coverage exome and low-coverage whole-genome sequencing data, utilizing information from both exon-exon junctions and discordant paired-end reads. We found 503 parent genes having novel retroduplications absent from the reference genome. Based solely on retroduplication variation, we built phylogenetic trees of human populations; these represent superpopulation structure well and indicate that variable retroduplications are effective population markers. We further identified 43 retroduplication parent genes differentiating superpopulations. This group contains several interesting insertion events, including a SLMO2 retroduplication and insertion into CAV3, which has a potential disease association. We also found retroduplications to be associated with a variety of genomic features: (1) Insertion sites were correlated with regular nucleosome positioning. (2) They, predictably, tend to avoid conserved functional regions, such as exons, but, somewhat surprisingly, also avoid introns. (3) Retroduplications tend to be co-inserted with young L1 elements, indicating recent retrotranspositional activity, and (4) they have a weak tendency to originate from highly expressed parent genes. Our investigation provides insight into the functional impact and association with genomic elements of retroduplications. We anticipate our approach and analytical methodology to have application in a more clinical context, where exome sequencing data is abundant and the discovery of retroduplications can potentially improve the accuracy of SNP calling.


Subject(s)
Gene Duplication/genetics , Genetic Variation/genetics , Genome, Human/genetics , High-Throughput Nucleotide Sequencing/methods , Retroelements/genetics , Sequence Analysis, DNA/methods , DNA Transposable Elements/genetics , Exome/genetics , Humans , Species Specificity
17.
Nature ; 492(7429): 438-42, 2012 Dec 20.
Article in English | MEDLINE | ID: mdl-23160490

ABSTRACT

Reprogramming somatic cells into induced pluripotent stem cells (iPSCs) has been suspected of causing de novo copy number variation. To explore this issue, here we perform a whole-genome and transcriptome analysis of 20 human iPSC lines derived from the primary skin fibroblasts of seven individuals using next-generation sequencing. We find that, on average, an iPSC line manifests two copy number variants (CNVs) not apparent in the fibroblasts from which the iPSC was derived. Using PCR and digital droplet PCR, we show that at least 50% of those CNVs are present as low-frequency somatic genomic variants in parental fibroblasts (that is, the fibroblasts from which each corresponding human iPSC line is derived), and are manifested in iPSC lines owing to their clonal origin. Hence, reprogramming does not necessarily lead to de novo CNVs in iPSCs, because most of the line-manifested CNVs reflect somatic mosaicism in the human skin. Moreover, our findings demonstrate that clonal expansion, and iPSC lines in particular, can be used as a discovery tool to reliably detect low-frequency CNVs in the tissue of origin. Overall, we estimate that approximately 30% of the fibroblast cells have somatic CNVs in their genomes, suggesting widespread somatic mosaicism in the human body. Our study paves the way to understanding the fundamental question of the extent to which cells of the human body normally acquire structural alterations in their DNA post-zygotically.


Subject(s)
DNA Copy Number Variations/genetics , Induced Pluripotent Stem Cells/metabolism , Mosaicism , Skin/metabolism , Cell Differentiation , Cells, Cultured , Cellular Reprogramming , Clone Cells , Fibroblasts/cytology , Gene Expression Profiling , Genome, Human/genetics , Humans , Induced Pluripotent Stem Cells/cytology , Male , Neurons/cytology , Polymerase Chain Reaction , Reproducibility of Results , Skin/cytology
18.
Nature ; 489(7414): 91-100, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955619

ABSTRACT

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.


Subject(s)
DNA/genetics , Encyclopedias as Topic , Gene Regulatory Networks/genetics , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Alleles , Cell Line , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , Genomics , Humans , K562 Cells , Organ Specificity , Phosphorylation/genetics , Polymorphism, Single Nucleotide/genetics , Protein Interaction Maps , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Selection, Genetic/genetics , Transcription Initiation Site
19.
BMC Genomics ; 18(1): 321, 2017 04 24.
Article in English | MEDLINE | ID: mdl-28438122

ABSTRACT

BACKGROUND: High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome. A new generation of arrays combining high probe densities with optimized designs will comprise essential tools for genome analysis in the coming years. We systematically compared the genome-wide CNV detection power of all 17 available array designs from the Affymetrix, Agilent, and Illumina platforms by hybridizing the well-characterized genome of 1000 Genomes Project subject NA12878 to all arrays, and performing data analysis using both manufacturer-recommended and platform-independent software. We benchmarked the resulting CNV call sets from each array using a gold standard set of CNVs for this genome derived from 1000 Genomes Project whole genome sequencing data. RESULTS: The arrays tested comprise both SNP and aCGH platforms with varying designs and contain between ~0.5 and ~4.6 million probes. Across the arrays CNV detection varied widely in number of CNV calls (4-489), CNV size range (~40 bp to ~8 Mbp), and percentage of non-validated CNVs (0-86%). We discovered strikingly strong effects of specific array design principles on performance. For example, some SNP array designs with the largest numbers of probes and extensive exonic coverage produced a considerable number of CNV calls that could not be validated, compared to designs with probe numbers that are sometimes an order of magnitude smaller. This effect was only partially ameliorated using different analysis software and optimizing data analysis parameters. CONCLUSIONS: High-resolution microarrays will continue to be used as reliable, cost- and time-efficient tools for CNV analysis. However, different applications tolerate different limitations in CNV detection. Our study quantified how these arrays differ in total number and size range of detected CNVs as well as sensitivity, and determined how each array balances these attributes. This analysis will inform appropriate array selection for future CNV studies, and allow better assessment of the CNV-analytical power of both published and ongoing array-based genomics studies. Furthermore, our findings emphasize the importance of concurrent use of multiple analysis algorithms and independent experimental validation in array-based CNV detection studies.


Subject(s)
DNA Copy Number Variations , Genome, Human/genetics , Oligonucleotide Array Sequence Analysis/methods , Cytogenetics , Genomics , Humans
20.
Nature ; 470(7332): 59-65, 2011 Feb 03.
Article in English | MEDLINE | ID: mdl-21293372

ABSTRACT

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serves as a resource for sequencing-based association studies.


Subject(s)
DNA Copy Number Variations/genetics , Genetics, Population , Genome, Human/genetics , Genomics , Gene Duplication/genetics , Genetic Predisposition to Disease/genetics , Genotype , Humans , Mutagenesis, Insertional/genetics , Reproducibility of Results , Sequence Analysis, DNA , Sequence Deletion/genetics