ABSTRACT
Autism spectrum disorder (ASD) is a disorder of brain development. Most cases lack a clear etiology or genetic basis, and the difficulty of re-enacting human brain development has precluded understanding of ASD pathophysiology. Here we use three-dimensional neural cultures (organoids) derived from induced pluripotent stem cells (iPSCs) to investigate neurodevelopmental alterations in individuals with severe idiopathic ASD. While no known underlying genomic mutation could be identified, transcriptome and gene network analyses revealed upregulation of genes involved in cell proliferation, neuronal differentiation, and synaptic assembly. ASD-derived organoids exhibit an accelerated cell cycle and overproduction of GABAergic inhibitory neurons. Using RNA interference, we show that overexpression of the transcription factor FOXG1 is responsible for the overproduction of GABAergic neurons. Altered expression of gene network modules and FOXG1 are positively correlated with symptom severity. Our data suggest that a shift toward GABAergic neuron fate caused by FOXG1 is a developmental precursor of ASD.
Subject(s)
Child Development Disorders, Pervasive/genetics, Child Development Disorders, Pervasive/pathology, Forkhead Transcription Factors/metabolism, Nerve Tissue Proteins/metabolism, Neurogenesis, Telencephalon/embryology, Female, Gene Expression Profiling, Humans, Induced Pluripotent Stem Cells, Male, Megalencephaly/genetics, Megalencephaly/pathology, Models, Biological, Neurons/cytology, Neurons/metabolism, Organoids/pathology, Telencephalon/pathology
ABSTRACT
We developed a generally applicable method, CRISPR/Cas9-targeted long-read sequencing (CTLR-Seq), to resolve, in a haplotype-specific manner, large and complex regions of the human genome that were previously impenetrable to sequencing analysis, such as large segmental duplications (SegDups) and their associated genome rearrangements. CTLR-Seq combines in vitro Cas9-mediated cutting of the genome with pulsed-field gel electrophoresis to isolate intact large (i.e., up to 2,000 kb) genomic regions that encompass previously unresolvable genomic sequences. These targets are then sequenced without amplification, at high on-target coverage, using long-read sequencing, allowing their complete sequence assembly. We applied CTLR-Seq to the SegDup-mediated rearrangements that form the boundaries of, and give rise to, 22q11.2 deletion syndrome (22q11DS), the most common human microdeletion disorder. We then performed de novo assembly to resolve, at base-pair resolution, the full sequence rearrangements and exact chromosomal breakpoints of 22q11DS (including all common subtypes). Across multiple patients, we found a high degree of variability in both the rearranged SegDup sequences and the exact chromosomal breakpoint locations, which coincide with various transposons within the 22q11.2 SegDups, suggesting that 22q11DS can be driven by transposon-mediated genome recombination. Guided by CTLR-Seq results from two 22q11DS patients, we performed three-dimensional chromosomal folding analysis of the 22q11.2 SegDups in patient-derived neurons and astrocytes and found that chromosome interactions anchored within the SegDups are both cell-type-specific and patient-specific. Lastly, we demonstrated that CTLR-Seq enables cell-type-specific analysis of DNA methylation patterns within the deletion haplotype of 22q11DS.
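CTLR-Seq depends on in vitro Cas9 cuts placed around the region of interest. The abstract does not describe how guides were chosen, so the snippet below is only a generic, hypothetical illustration of the first step any such design needs: scanning a sequence for SpCas9 protospacers (20 nt followed by an NGG PAM) on both strands and reporting where the blunt cut, 3 bp upstream of the PAM, would fall. The function name and return format are assumptions, not part of CTLR-Seq.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_spcas9_sites(seq, protospacer_len=20):
    """List candidate SpCas9 sites (NGG PAM) on both strands of `seq`.

    Returns (strand, cut_index, protospacer) tuples. cut_index is the 0-based
    forward-strand index of the first base after the blunt cut, which SpCas9
    makes 3 bp upstream of the PAM. Hypothetical helper for illustration.
    """
    seq = seq.upper()
    n = protospacer_len
    sites = []
    # Forward strand: N{20} followed by NGG; the lookahead reports overlaps.
    for m in re.finditer(r"(?=([ACGT]{%d})[ACGT]GG)" % n, seq):
        sites.append(("+", m.start() + n - 3, m.group(1)))
    # Reverse strand: CCN followed by the reverse complement of N{20}.
    for m in re.finditer(r"(?=CC[ACGT]([ACGT]{%d}))" % n, seq):
        protospacer = m.group(1).translate(COMPLEMENT)[::-1]
        sites.append(("-", m.start() + 6, protospacer))
    return sorted(sites, key=lambda site: site[1])
```

In practice, candidate sites inside SegDups would also need uniqueness checks, since a guide matching several paralogous copies would cut the genome in more than one place.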
Subject(s)
DiGeorge Syndrome, Humans, DiGeorge Syndrome/genetics, CRISPR-Cas Systems, Chromosome Breakpoints, Chromosomes, Human, Pair 22/genetics, Genome, Human, Gene Rearrangement, Sequence Analysis, DNA/methods, Chromosome Deletion
ABSTRACT
SUMMARY: Copy number variation (CNV) and alteration (CNA) analysis is a crucial component of many genomic studies, with applications ranging from basic research to clinical diagnostics and personalized medicine. CNVpytor is a tool featuring a read depth-based caller and a combined read depth and B-allele frequency (BAF)-based 2D caller to find CNVs and CNAs. The tool stores processed intermediate data and CNV/CNA calls in a compact HDF5 file, the pytor file. Here, we describe a new track in igv.js that uses pytor files and whole-genome variant files as input for on-the-fly read depth and BAF visualization, CNV/CNA calling, and analysis. Embedding the track in HTML pages and Jupyter Notebooks enables convenient remote data access and visualization, simplifying the interpretation and analysis of omics data. AVAILABILITY AND IMPLEMENTATION: The CNVpytor track is integrated with igv.js and available at https://github.com/igvteam/igv.js. The documentation is available at https://github.com/igvteam/igv.js/wiki/cnvpytor. Usage can be tested in the IGV-Web app at https://igv.org/app and also on https://github.com/abyzovlab/CNVpytor.
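The value of combining read depth (RD) with BAF can be shown with a toy model: a region's depth relative to the genome mean suggests a copy number, and heterozygous SNPs in a region with `cn` copies, `k` of them carrying the alt allele, should show BAF near `k / cn`. The sketch below scores every (copy number, alt copies) state against both signals and keeps the best one. It is a hypothetical illustration only; CNVpytor's own callers are more elaborate and are not reproduced here, and the unweighted squared-error cost is an arbitrary choice.

```python
from statistics import mean


def baf(ref_count, alt_count):
    """B-allele frequency at one SNP: fraction of reads carrying the alt allele."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return alt_count / total


def rd_copy_number(bin_depths, genome_mean_depth, ploidy=2):
    """Naive read-depth copy number: region depth relative to the genome mean."""
    return ploidy * mean(bin_depths) / genome_mean_depth


def call_state(bin_depths, snp_counts, genome_mean_depth, max_cn=6):
    """Return the (copy number, alt copies) state that best fits RD and BAF.

    snp_counts: list of (ref_reads, alt_reads) at heterozygous SNPs.
    BAF is folded as |BAF - 0.5| so that k and cn - k alt copies look alike;
    only k >= cn / 2 is enumerated. k == cn means loss of heterozygosity.
    """
    cn_obs = rd_copy_number(bin_depths, genome_mean_depth)
    dev_obs = mean(abs(baf(r, a) - 0.5) for r, a in snp_counts)
    candidates = [
        (cn, k)
        for cn in range(1, max_cn + 1)
        for k in range((cn + 1) // 2, cn + 1)
    ]

    def cost(state):
        cn, k = state
        return (cn - cn_obs) ** 2 + (abs(k / cn - 0.5) - dev_obs) ** 2

    return min(candidates, key=cost)
```

For example, 1.5x the genome-mean depth with BAFs near 1/3 and 2/3 points to a single-copy gain (3 copies, 2 of one allele), whereas normal depth with BAFs near 0 and 1 points to copy-neutral loss of heterozygosity, which RD alone could not detect.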
Subject(s)
DNA Copy Number Variations, Genomics, Software, Genomics/methods, Humans
ABSTRACT
Mosaic mutations can be used to track cell ancestries and reconstruct high-resolution lineage trees during cancer progression and during development, starting from the first cell divisions of the zygote. However, this approach requires sampling and analyzing the genomes of multiple cells, which can be redundant in lineage representation, limiting the scalability of the approach. We describe a strategy for cost- and time-efficient lineage reconstruction using clonal induced pluripotent stem cell lines from human skin fibroblasts. The approach leverages shallow sequencing coverage to assess the clonality of the lines, clusters redundant lines and sums their coverage to accurately discover mutations in the corresponding lineages. Only a fraction of lines needs to be sequenced to high coverage. We demonstrate the effectiveness of this approach for reconstructing lineage trees during development and in hematologic malignancies. We discuss and propose an optimal experimental design for reconstructing lineage trees.
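The "cluster redundant lines and sum their coverage" step can be sketched as follows, under the assumption (ours, not necessarily the paper's criterion) that redundancy is measured as the Jaccard overlap between the variant sets detected in each line at shallow coverage. Lines are grouped by single linkage with a union-find structure, and their coverages are pooled per cluster; the function names and the 0.5 threshold are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two variant sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_lines(variants, threshold=0.5):
    """Single-linkage clustering of lines whose variant sets overlap strongly.

    variants: dict line_id -> set of variants seen in shallow sequencing.
    Returns a sorted list of sorted clusters (lists of line IDs).
    """
    names = list(variants)
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(variants[a], variants[b]) >= threshold:
                parent[find(a)] = find(b)
    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


def pooled_coverage(clusters, coverage):
    """Sum per-line coverage within each cluster."""
    return {tuple(c): sum(coverage[n] for n in c) for c in clusters}
```

Pooling is what allows only a fraction of lines to be sequenced deeply: a cluster of redundant lines can jointly reach the coverage needed for mutation discovery in the shared lineage.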
Subject(s)
Cell Lineage, Neoplasms, Software, Humans, Germ Cells, Mutation, Neoplasms/pathology
ABSTRACT
Tracing cell lineages is fundamental for understanding the rules governing development in multicellular organisms and delineating complex biological processes involving the differentiation of multiple cell types with distinct lineage hierarchies. In humans, experimental lineage tracing is unethical, and one has to rely on natural-mutation markers that are created within cells as they proliferate and age. Recent studies have demonstrated that it is now possible to trace lineages in normal, noncancerous cells with a variety of data types using natural variations in the nuclear and mitochondrial DNA as well as variations in DNA methylation status. It is also apparent that the scientific community is on the verge of being able to make a comprehensive and detailed cell lineage map of human embryonic and fetal development. In this review, we discuss the advantages and disadvantages of different approaches and markers for lineage tracing. We also describe the general conceptual design for how to derive a lineage map for humans.
Subject(s)
Cell Differentiation, Cell Lineage, Cell Nucleus/genetics, DNA Methylation, DNA, Mitochondrial/analysis, Embryo, Mammalian/cytology, DNA, Mitochondrial/genetics, Developmental Biology, Embryo, Mammalian/metabolism, Humans, Single-Cell Analysis
ABSTRACT
Somatic mosaicism, manifesting as single nucleotide variants (SNVs), mobile element insertions, and structural changes in the DNA, is a common phenomenon in human brain cells, with potential functional consequences. Using a clonal approach, we previously detected 200-400 mosaic SNVs per cell in three human fetal brains (15-21 wk postconception). However, structural variation in the human fetal brain has not yet been investigated. Here, we discover and validate four mosaic structural variants (SVs) in the same brains and resolve their precise breakpoints. The SVs were of kilobase scale and complex, consisting of deletion(s) and rearranged genomic fragments, which sometimes originated from different chromosomes. Sequences at the breakpoints of these rearrangements had microhomologies, suggesting their origin from replication errors. One SV was found in two clones, and we timed its origin to ~14 wk postconception. No large-scale mosaic copy number variants (CNVs) were detectable in normal fetal human brains, suggesting that previously reported megabase-scale CNVs in neurons arise at later stages of development. By reanalysis of public single-nucleus data from adult brain neurons, we detected an extrachromosomal circular DNA event. Our study reveals the existence of mosaic SVs in the developing human brain, likely arising from cell proliferation during mid-neurogenesis. Although relatively rare compared to SNVs and present in ~10% of neurons, SVs in the developing human brain affect a comparable number of bases in the genome (~6200 vs. ~4000 bp), implying that they may have similar functional consequences.
Subject(s)
Brain/embryology, DNA, Circular/genetics, Genomic Structural Variation, Sequence Analysis, DNA/methods, Clonal Evolution, Female, Genotyping Techniques, Gestational Age, Humans, Mosaicism, Neurogenesis, Pregnancy
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pcbi.1009487.].
ABSTRACT
Accurate discovery of somatic mutations in a cell is a challenge that lies partly in the immaturity of dedicated analytical approaches. Approaches that compare a cell's genome to a control bulk sample miss common mutations, while approaches that find such mutations from bulk data suffer from low sensitivity. We developed a tool, All2, which enables accurate filtering of mutations in a cell without the need for bulk data. It is based on pairwise comparisons of all cells with each other, in which every base-pair substitution and indel call is classified as a germline variant, a mosaic mutation, or a false positive. Because All2 accounts for dropped-out regions, it is applicable to whole-genome and exome analysis of cloned and amplified cells. By applying the approach to a variety of available data, we showed that it reduces false positives, enables sensitive discovery of high-frequency mutations, and is indispensable for high-resolution cell lineage tracing.
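To make the three labels concrete, the toy classifier below looks at one variant call across a set of clonal cells. The rules are hypothetical and chosen only for illustration; they are not All2's published criteria and do not reproduce its pairwise procedure. They assume that in clonal data a real heterozygous variant shows a variant allele fraction (VAF) near 0.5, so a low but nonzero VAF is treated as an artifact, and that cells with no coverage at the site (dropout) are simply excluded.

```python
def classify_call(vafs, min_vaf=0.3):
    """Classify one variant call across cells (hypothetical toy rules).

    vafs: dict cell_id -> variant allele fraction, or None when the site is
    not covered in that cell (dropout); such cells are ignored.
    """
    covered = {cell: v for cell, v in vafs.items() if v is not None}
    if not covered:
        return "uninformative"
    carriers = [cell for cell, v in covered.items() if v >= min_vaf]
    weak = [cell for cell, v in covered.items() if 0 < v < min_vaf]
    if len(carriers) == len(covered):
        return "germline"          # present in every informative cell
    if carriers and not weak:
        return "mosaic"            # clean presence in some cells, absence in others
    return "false positive"        # no clear carrier, or artifact-like low VAF
```

Excluding dropout cells matters: counting an uncovered cell as "absent" would turn germline variants in poorly amplified regions into apparent mosaic mutations.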
Subject(s)
Exome, Software, High-Throughput Nucleotide Sequencing, INDEL Mutation/genetics, Mutation/genetics, Exome Sequencing
ABSTRACT
K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and also the cell line most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, structural variants (SVs), including small and large-scale complex SVs, and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.
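One common way to make a CRISPR guide allele-specific is to exploit a heterozygous SNP that disrupts the GG of the NGG PAM or falls in the PAM-proximal "seed" of the protospacer, so that only one haplotype is efficiently cut. The sketch below flags such forward-strand SpCas9 sites from phased heterozygous SNP positions. It is an assumption-laden illustration, not the procedure used to build the K562 map: the 10-nt seed length is a convention we picked, and reverse-strand sites are omitted for brevity.

```python
def allele_specific_sites(site_starts, het_snps, protospacer_len=20, seed_len=10):
    """Return forward-strand SpCas9 sites overlapped by a heterozygous SNP
    in the PAM's GG or in the PAM-proximal seed (hypothetical rule).

    site_starts: 0-based start of each protospacer; the NGG PAM follows it.
    het_snps: iterable of 0-based heterozygous SNP positions.
    Returns (site_start, [overlapping SNP positions]) tuples.
    """
    snps = set(het_snps)
    hits = []
    for start in site_starts:
        pam = start + protospacer_len        # position of the 'N' in NGG
        critical = set(range(pam - seed_len, pam)) | {pam + 1, pam + 2}
        overlap = sorted(snps & critical)
        if overlap:
            hits.append((start, overlap))
    return hits
```

A SNP at the PAM's `N` position is deliberately ignored, because that base does not constrain SpCas9 recognition.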
Subject(s)
Genome, Human, Humans, K562 Cells, Karyotype, Polymorphism, Genetic, Whole Genome Sequencing
ABSTRACT
SUMMARY: Defining the precise location of structural variations (SVs) at single-nucleotide breakpoint resolution is a challenging problem due to large gaps in alignment. Previously, Alignment with Gap Excision (AGE) enabled us to define breakpoints of SVs at single-nucleotide resolution; however, AGE requires a vast amount of memory when aligning a pair of long sequences. To address this, we developed a memory-efficient implementation, LongAGE, based on the classical Hirschberg algorithm. We demonstrate an application of LongAGE for resolving breakpoints of SVs embedded in segmental duplications on Pacific Biosciences (PacBio) reads that can be longer than 10 kb. Furthermore, we observed different breakpoints for a deletion and a duplication in the same locus, providing direct evidence that such multi-allelic copy number variants (mCNVs) arise from two or more independent ancestral mutations. AVAILABILITY AND IMPLEMENTATION: LongAGE is implemented in C++ and available on GitHub at https://github.com/Coaxecva/LongAGE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
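Hirschberg's algorithm is what turns quadratic-memory dynamic programming into linear memory: it computes only the last row of the score matrix for the first half of one sequence (forward) and for the second half (on reversed sequences), picks the column where the two halves' scores sum to the optimum, and recurses on the two subproblems. The sketch below applies the idea to plain global alignment with a linear gap penalty. It shows only the divide-and-conquer memory trick; AGE's gap-excision scoring and LongAGE's C++ implementation are not reproduced, and the scoring parameters are arbitrary.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-2):
    """Last row of the Needleman-Wunsch matrix, kept in O(len(b)) memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev


def _align_one_char(c, b, match, mismatch, gap):
    """Optimal global alignment of a single character against string b."""
    best_j, best = None, (len(b) + 1) * gap      # c placed opposite a gap
    for j, ch in enumerate(b):
        score = (match if c == ch else mismatch) + (len(b) - 1) * gap
        if score > best:
            best_j, best = j, score
    if best_j is None:
        return c + "-" * len(b), "-" + b
    return "-" * best_j + c + "-" * (len(b) - best_j - 1), b


def hirschberg(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment of a and b using linear memory."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        return _align_one_char(a, b, match, mismatch, gap)
    mid = len(a) // 2
    left = nw_last_row(a[:mid], b, match, mismatch, gap)
    right = nw_last_row(a[mid:][::-1], b[::-1], match, mismatch, gap)
    n = len(b)
    # Split b where forward and reverse half-scores sum to the optimum.
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    top1, bot1 = hirschberg(a[:mid], b[:split], match, mismatch, gap)
    top2, bot2 = hirschberg(a[mid:], b[split:], match, mismatch, gap)
    return top1 + top2, bot1 + bot2


def score_alignment(top, bot, match=1, mismatch=-1, gap=-2):
    """Score a finished alignment column by column."""
    return sum(
        gap if "-" in (x, y) else (match if x == y else mismatch)
        for x, y in zip(top, bot)
    )
```

The recursion roughly doubles the arithmetic of a single full-matrix pass but never stores more than two rows at a time, which is the trade-off that makes aligning reads longer than 10 kb feasible.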
Subject(s)
Genomic Structural Variation, Software, Algorithms, DNA Copy Number Variations, High-Throughput Nucleotide Sequencing, Segmental Duplications, Genomic, Sequence Analysis, DNA
ABSTRACT
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.
Subject(s)
Genetic Variation/genetics, Genome, Human/genetics, Physical Chromosome Mapping, Amino Acid Sequence, Genetic Predisposition to Disease, Genetics, Medical, Genetics, Population, Genome-Wide Association Study, Genomics, Genotype, Haplotypes/genetics, Homozygote, Humans, Molecular Sequence Data, Mutation Rate, Polymorphism, Single Nucleotide/genetics, Quantitative Trait Loci/genetics, Sequence Analysis, DNA, Sequence Deletion/genetics
ABSTRACT
Structural variations (SVs) in the human genome originate from different mechanisms related to DNA repair, replication errors, and retrotransposition. Our analyses of 26,927 SVs from the 1000 Genomes Project revealed differential distributions and consequences of SVs of different origin; e.g., deletions from non-allelic homologous recombination (NAHR) are more prone to disrupt chromatin organization, while processed pseudogenes can create accessible chromatin. Spontaneous double-stranded breaks (DSBs) are the best predictor of enrichment of NAHR deletions in open chromatin. This evidence, along with strong physical interaction of NAHR breakpoints belonging to the same deletion, suggests that the majority of NAHR deletions are non-meiotic, i.e., they originate from errors during homology-directed repair (HDR) of spontaneous DSBs. In turn, the origin of spontaneous DSBs is associated with transcription factor binding in accessible chromatin, revealing the vulnerability of functional, open chromatin. This open chromatin is itself enriched with repeats, particularly fixed Alu elements that provide the homology required to maintain stability via HDR. Based on the co-localization of fixed Alus and NAHR deletions in open chromatin, we hypothesize that the old Alu expansion had a stabilizing role in the human genome.
Subject(s)
Chromatin/chemistry, Genome, Human, Genomic Structural Variation/genetics, Quantitative Trait, Heritable, Chromatin/metabolism, Chromosome Mapping, Computational Biology, DNA Breaks, Double-Stranded, DNA Damage/physiology, DNA Repair, Homologous Recombination, Humans, Recombinational DNA Repair
ABSTRACT
HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions, and structural variants (SVs) including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence assembled, and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.
Subject(s)
Chromosome Mapping/methods, Genome, Human, Genomics/methods, Haplotypes, Sequence Analysis, DNA/statistics & numerical data, Alleles, Aneuploidy, DNA Methylation, Genomic Structural Variation, Hep G2 Cells, High-Throughput Nucleotide Sequencing, Humans, INDEL Mutation, Karyotyping, Loss of Heterozygosity, Polymorphism, Single Nucleotide, Retroelements
ABSTRACT
BACKGROUND: The study of mosaic mutations is important because they have been linked to cancer and various disorders. Single-cell sequencing has become a powerful tool for studying the genomes of individual cells and detecting mosaic mutations. The DNA in a single cell must be amplified before sequencing, and multiple displacement amplification (MDA) is widely used owing to its low error rate and the long fragments of amplified DNA it produces. However, the phi29 polymerase used in MDA is sensitive to template fragmentation and to sites with DNA damage, which can lead to biases such as allelic imbalance, uneven coverage, and overrepresentation of C-to-T mutations. It is therefore important to select cells with uniform amplification to decrease false positives and increase sensitivity for mosaic mutation detection. RESULTS: We propose a method, Scellector (single cell selector), which uses haplotype information to assess amplification quality from shallow-coverage sequencing data. We tested Scellector on single human neuronal cells, obtained in vitro and amplified by MDA. Quality was estimated from shallow sequencing with coverage as low as 0.3× per cell and then confirmed using 30× deep-coverage sequencing. The high concordance between shallow- and high-coverage data validated the method. CONCLUSION: Scellector can potentially be used to rank amplifications obtained from single-cell platforms relying on an MDA-like amplification step, such as the Chromium Single Cell profiling solution.
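Why haplotype information helps at 0.3× coverage: a single heterozygous SNP is covered by zero or one read, so its allele ratio says almost nothing, but pooling reads over all phased SNPs in a haplotype block gives enough reads to see whether MDA over-amplified one parental copy. The sketch below is a hypothetical quality metric in that spirit (fraction of blocks whose haplotype fraction is far from 0.5) used to rank cells; the metric, the 0.2 tolerance, and all names are our assumptions, not Scellector's actual statistic.

```python
def haplotype_fraction(snp_reads):
    """Fraction of reads from haplotype A within one phased block.

    snp_reads: list of (hap_a_reads, hap_b_reads) at heterozygous SNPs.
    Returns None when the block has no reads.
    """
    a = sum(x for x, _ in snp_reads)
    b = sum(y for _, y in snp_reads)
    return a / (a + b) if a + b else None


def imbalance_score(blocks, tolerance=0.2):
    """Fraction of informative blocks whose haplotype fraction deviates from
    0.5 by more than `tolerance`; lower suggests more uniform amplification."""
    fractions = [f for f in map(haplotype_fraction, blocks) if f is not None]
    if not fractions:
        return None
    skewed = sum(1 for f in fractions if abs(f - 0.5) > tolerance)
    return skewed / len(fractions)


def rank_cells(cells, tolerance=0.2):
    """Order cell IDs from most to least uniform; uninformative cells dropped."""
    scores = {cid: imbalance_score(b, tolerance) for cid, b in cells.items()}
    ranked = [cid for cid, s in scores.items() if s is not None]
    return sorted(ranked, key=lambda cid: scores[cid])
```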
Subject(s)
Nucleic Acid Amplification Techniques/methods, Single-Cell Analysis/methods, Cell Differentiation, DNA/chemistry, DNA/metabolism, High-Throughput Nucleotide Sequencing, Humans, Induced Pluripotent Stem Cells/cytology, Induced Pluripotent Stem Cells/metabolism, Neurons/cytology, Neurons/metabolism, Polymorphism, Single Nucleotide, Sequence Analysis, DNA
ABSTRACT
Few studies have been conducted to understand post-zygotic accumulation of mutations in cells of the healthy human body. We reprogrammed 32 skin fibroblast cells from families of donors into human induced pluripotent stem cell (hiPSC) lines. The clonal nature of hiPSC lines allows a high-resolution analysis of the genomes of the founder fibroblast cells without being confounded by the artifacts of single-cell whole-genome amplification. We estimate that on average a fibroblast cell in children has 1035 mostly benign mosaic SNVs. On average, 235 SNVs could be directly confirmed in the original fibroblast population by ultradeep sequencing, down to an allele frequency (AF) of 0.1%. More sensitive droplet digital PCR experiments confirmed more SNVs as mosaic with AF as low as 0.01%, suggesting that 1035 mosaic SNVs per fibroblast cell is the true average. Similar analyses in adults revealed no significant increase in the number of SNVs per cell, suggesting that a major fraction of mosaic SNVs in fibroblasts arises during development. Mosaic SNVs were distributed uniformly across the genome and were enriched in a mutational signature previously observed in cancers and in de novo variants and which, we hypothesize, is a hallmark of normal cell proliferation. Finally, AF distribution of mosaic SNVs had distinct narrow peaks, which could be a characteristic of clonal cell selection, clonal expansion, or both. These findings reveal a large degree of somatic mosaicism in healthy human tissues, link de novo and cancer mutations to somatic mosaicism, and couple somatic mosaicism with cell proliferation.
Subject(s)
Clonal Evolution, DNA Copy Number Variations, Fibroblasts/cytology, Mosaicism, Mutation Accumulation, Cell Proliferation, Cells, Cultured, Fibroblasts/metabolism, Humans, Induced Pluripotent Stem Cells/cytology, Induced Pluripotent Stem Cells/metabolism, Skin/cytology
ABSTRACT
OBJECTIVE: We aimed to assess whether endometrial cancer (EC) can be detected in shed DNA collected with a vaginal tampon by analyzing copy number, methylation markers, and mutations. METHODS: Tampons were collected prior to hysterectomy from 38 EC patients and 28 women with benign indications. Extracted tampon DNA underwent: 1) low-coverage whole-genome sequencing (LC-WGS) to assess copy number, 2) pyrosequencing to measure percent promoter methylation of HOXA9, RASSF1, and CDH13, and 3) next-generation sequencing (NGS) to identify mutations in 19 genes associated with EC identified through The Cancer Genome Atlas. Sensitivity and specificity were calculated for each test and test combination. RESULTS: Methylation analysis yielded the highest specificities but the lowest sensitivities (37-40% sensitivity; 100% specificity for HOXA9, RASSF1 and HTR1B), while mutation analysis had improved sensitivity (50% sensitivity; 83% specificity). Only one "false positive" copy number result was identified among women with benign surgical indications; it was associated with a leiomyosarcoma that was recognized only at hysterectomy. Counting a sample as positive when any of the 3 biomarker classes was positive resulted in a sensitivity of 92% and a specificity of 86%. Mutation analysis did not add sensitivity to the combination of copy number and methylation analysis. CONCLUSIONS: This study demonstrates a proof-of-principle for non-invasive yet precise detection of endometrial cancer. We propose that with improved biomarker testing, it may be possible to develop a clinically useful test for detecting EC.
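The "any of the 3 biomarker classes" rule is a logical OR across tests: it can only add true positives, but every false positive of any single class also becomes a false positive of the combination, which is why combined sensitivity rises while specificity falls below the 100% of methylation alone. The sketch below computes sensitivity and specificity for an OR-combined call; the per-sample calls in the usage lines are made-up toy values, not study data. Assuming all 38 patients and 28 controls were evaluable, 35/38 and 24/28 are the only counts that round to the reported 92% and 86%.

```python
def sens_spec(calls, truth):
    """Sensitivity and specificity from parallel lists of booleans.

    calls: test result per sample (True = test positive).
    truth: disease status per sample (True = has cancer).
    """
    pairs = list(zip(calls, truth))
    tp = sum(c and t for c, t in pairs)
    fn = sum((not c) and t for c, t in pairs)
    tn = sum((not c) and (not t) for c, t in pairs)
    fp = sum(c and (not t) for c, t in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def any_positive(*marker_calls):
    """Combine per-sample calls from several biomarker classes with logical OR."""
    return [any(sample) for sample in zip(*marker_calls)]


# Toy example (hypothetical calls for four samples):
copy_number = [True, False, False, False]
methylation = [False, True, False, False]
mutation = [False, False, False, True]
truth = [True, True, True, False]
combined = any_positive(copy_number, methylation, mutation)
sensitivity, specificity = sens_spec(combined, truth)
```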
Subject(s)
DNA Methylation, Endometrial Neoplasms/genetics, Gene Dosage, Menstrual Hygiene Products, Biomarkers, Tumor/genetics, Diagnosis, Differential, Endometrial Neoplasms/diagnosis, Endometrial Neoplasms/pathology, Female, Humans, Middle Aged, Mutation, Uterine Diseases/diagnosis, Uterine Diseases/genetics, Uterine Diseases/pathology, Vaginal Smears/methods
ABSTRACT
Copy number variants (CNVs) are a class of structural variants that may involve complex genomic rearrangements (CGRs) and are hypothesized to have additional mutations around their breakpoints. Understanding the mechanisms underlying CNV formation is fundamental for understanding repair and mutation mechanisms in cells, thereby shedding light on evolution, genomic disorders, cancer, and complex human traits. In this study, we used data from the 1000 Genomes Project to analyze hundreds of loci harboring heterozygous germline deletions in the subjects NA12878 and NA19240. By utilizing synthetic long-read data (longer than 2 kbp) in combination with high-coverage short-read data and, in parallel, by comparing with parental genomes, we interrogated the phasing of these deletions with the flanking tens of thousands of heterozygous SNPs and indels. We found that the density of SNPs/indels flanking the breakpoints of deletions (in-phase variants) is approximately twice as high as the corresponding density for variants on the haplotype without the deletion (out-of-phase variants). This fold change was even larger for the subset of deletions with signatures of a replication-based mechanism of formation. The allele frequency (AF) spectrum for deletions is enriched for rare events, and the AF spectrum for in-phase SNPs is shifted toward this deletion spectrum, offering evidence consistent with the concomitance of the in-phase SNPs/indels with the deletion events. These findings therefore lend support to the hypothesis that the mutational mechanisms underlying CNV formation are error prone. Our results could also be relevant for resolving mutation-rate discrepancies in humans and for explaining kataegis.
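The in-phase versus out-of-phase comparison reduces to counting phased heterozygous variants in equal-length windows around each deletion's breakpoints, separately for the haplotype carrying the deletion and the other haplotype. Because both haplotypes are examined over the same flanking intervals, the ratio of counts equals the ratio of densities (assuming equal callable length on both haplotypes). The window size and data layout below are hypothetical choices for illustration, not the study's parameters.

```python
def flank_counts(deletion, variants, window):
    """Count phased het variants within `window` bp outside either breakpoint.

    deletion: (start, end, hap) with hap in {0, 1} the haplotype carrying it.
    variants: iterable of (pos, hap) for phased heterozygous SNPs/indels.
    Returns (in_phase, out_of_phase) counts.
    """
    start, end, hap = deletion
    in_phase = out_phase = 0
    for pos, vhap in variants:
        near = (start - window <= pos < start) or (end < pos <= end + window)
        if not near:
            continue
        if vhap == hap:
            in_phase += 1
        else:
            out_phase += 1
    return in_phase, out_phase


def density_fold_change(deletions, variants, window=10_000):
    """In-phase / out-of-phase flank density ratio pooled over deletions."""
    variants = list(variants)
    total_in = total_out = 0
    for d in deletions:
        i, o = flank_counts(d, variants, window)
        total_in += i
        total_out += o
    return total_in / total_out if total_out else float("inf")
```

A fold change near 1 would be expected if deletions arose independently of nearby point mutations; the roughly twofold excess reported above is the signal that the two are linked.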
Subject(s)
Chromosome Breakpoints, DNA Copy Number Variations, Mutagenesis, DNA Replication, Female, Gene Frequency, Genome, Human, Haplotypes, Humans, INDEL Mutation, Male, Models, Genetic, Polymorphism, Single Nucleotide, Sequence Analysis, DNA, Sequence Deletion
ABSTRACT
Retroduplications come from reverse transcription of mRNAs and their insertion back into the genome. Here, we performed comprehensive discovery and analysis of retroduplications in a large cohort of 2,535 individuals from 26 human populations, as part of 1000 Genomes Phase 3. We developed an integrated approach to discover novel retroduplications combining high-coverage exome and low-coverage whole-genome sequencing data, utilizing information from both exon-exon junctions and discordant paired-end reads. We found 503 parent genes having novel retroduplications absent from the reference genome. Based solely on retroduplication variation, we built phylogenetic trees of human populations; these represent superpopulation structure well and indicate that variable retroduplications are effective population markers. We further identified 43 retroduplication parent genes differentiating superpopulations. This group contains several interesting insertion events, including a SLMO2 retroduplication and insertion into CAV3, which has a potential disease association. We also found retroduplications to be associated with a variety of genomic features: (1) Insertion sites were correlated with regular nucleosome positioning. (2) They, predictably, tend to avoid conserved functional regions, such as exons, but, somewhat surprisingly, also avoid introns. (3) Retroduplications tend to be co-inserted with young L1 elements, indicating recent retrotranspositional activity, and (4) they have a weak tendency to originate from highly expressed parent genes. Our investigation provides insight into the functional impact and association with genomic elements of retroduplications. We anticipate our approach and analytical methodology to have application in a more clinical context, where exome sequencing data is abundant and the discovery of retroduplications can potentially improve the accuracy of SNP calling.
Subject(s)
Gene Duplication/genetics, Genetic Variation/genetics, Genome, Human/genetics, High-Throughput Nucleotide Sequencing/methods, Retroelements/genetics, Sequence Analysis, DNA/methods, DNA Transposable Elements/genetics, Exome/genetics, Humans, Species Specificity
ABSTRACT
Reprogramming somatic cells into induced pluripotent stem cells (iPSCs) has been suspected of causing de novo copy number variation. To explore this issue, here we perform a whole-genome and transcriptome analysis of 20 human iPSC lines derived from the primary skin fibroblasts of seven individuals using next-generation sequencing. We find that, on average, an iPSC line manifests two copy number variants (CNVs) not apparent in the fibroblasts from which the iPSC was derived. Using PCR and digital droplet PCR, we show that at least 50% of those CNVs are present as low-frequency somatic genomic variants in parental fibroblasts (that is, the fibroblasts from which each corresponding human iPSC line is derived), and are manifested in iPSC lines owing to their clonal origin. Hence, reprogramming does not necessarily lead to de novo CNVs in iPSCs, because most of the line-manifested CNVs reflect somatic mosaicism in the human skin. Moreover, our findings demonstrate that clonal expansion, and iPSC lines in particular, can be used as a discovery tool to reliably detect low-frequency CNVs in the tissue of origin. Overall, we estimate that approximately 30% of the fibroblast cells have somatic CNVs in their genomes, suggesting widespread somatic mosaicism in the human body. Our study paves the way to understanding the fundamental question of the extent to which cells of the human body normally acquire structural alterations in their DNA post-zygotically.
Subject(s)
DNA Copy Number Variations/genetics, Induced Pluripotent Stem Cells/metabolism, Mosaicism, Skin/metabolism, Cell Differentiation, Cells, Cultured, Cellular Reprogramming, Clone Cells, Fibroblasts/cytology, Gene Expression Profiling, Genome, Human/genetics, Humans, Induced Pluripotent Stem Cells/cytology, Male, Neurons/cytology, Polymerase Chain Reaction, Reproducibility of Results, Skin/cytology
ABSTRACT
Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
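A feed-forward loop is the three-node motif in which X regulates Y, Y regulates Z, and X also regulates Z directly. The sketch below counts such loops in a directed regulatory graph given as (regulator, target) edges. It is a generic illustration, not the study's pipeline: establishing that the motif is enriched would additionally require comparing the count against randomized networks with the same degree distribution, which is not shown.

```python
def count_feed_forward_loops(edges):
    """Count feed-forward loops (X->Y, Y->Z and X->Z) in a directed graph.

    edges: iterable of (regulator, target) pairs. Self-loops are ignored.
    """
    targets = {}
    for src, dst in edges:
        if src != dst:
            targets.setdefault(src, set()).add(dst)
    count = 0
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, ()):
                # Z must also be a direct target of X, and distinct from X.
                if z != x and z in x_targets:
                    count += 1
    return count
```

For instance, the edges A->B, B->C and A->C form one loop, while the cycle A->B, B->C, C->A forms none, since no node regulates another both directly and through an intermediate.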