Results 1 - 20 of 89
1.
Cell ; 162(2): 375-390, 2015 Jul 16.
Article in English | MEDLINE | ID: mdl-26186191

ABSTRACT

Autism spectrum disorder (ASD) is a disorder of brain development. Most cases lack a clear etiology or genetic basis, and the difficulty of re-enacting human brain development has precluded understanding of ASD pathophysiology. Here we use three-dimensional neural cultures (organoids) derived from induced pluripotent stem cells (iPSCs) to investigate neurodevelopmental alterations in individuals with severe idiopathic ASD. While no known underlying genomic mutation could be identified, transcriptome and gene network analyses revealed upregulation of genes involved in cell proliferation, neuronal differentiation, and synaptic assembly. ASD-derived organoids exhibit an accelerated cell cycle and overproduction of GABAergic inhibitory neurons. Using RNA interference, we show that overexpression of the transcription factor FOXG1 is responsible for the overproduction of GABAergic neurons. Altered expression of gene network modules and FOXG1 are positively correlated with symptom severity. Our data suggest that a shift toward GABAergic neuron fate caused by FOXG1 is a developmental precursor of ASD.


Subjects
Child Development Disorders, Pervasive/genetics , Child Development Disorders, Pervasive/pathology , Forkhead Transcription Factors/metabolism , Nerve Tissue Proteins/metabolism , Neurogenesis , Telencephalon/embryology , Female , Gene Expression Profiling , Humans , Induced Pluripotent Stem Cells , Male , Megalencephaly/genetics , Megalencephaly/pathology , Models, Biological , Neurons/cytology , Neurons/metabolism , Organoids/pathology , Telencephalon/pathology
2.
Proc Natl Acad Sci U S A ; 121(31): e2322834121, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39042694

ABSTRACT

We developed a generally applicable method, CRISPR/Cas9-targeted long-read sequencing (CTLR-Seq), to resolve, haplotype-specifically, the large and complex regions in the human genome that had been previously impenetrable to sequencing analysis, such as large segmental duplications (SegDups) and their associated genome rearrangements. CTLR-Seq combines in vitro Cas9-mediated cutting of the genome and pulsed-field gel electrophoresis to isolate intact large (i.e., up to 2,000 kb) genomic regions that encompass previously unresolvable genomic sequences. These targets are then sequenced (amplification-free) at high on-target coverage using long-read sequencing, allowing for their complete sequence assembly. We applied CTLR-Seq to the SegDup-mediated rearrangements that constitute the boundaries of, and give rise to, the 22q11.2 Deletion Syndrome (22q11DS), the most common human microdeletion disorder. We then performed de novo assembly to resolve, at base-pair resolution, the full sequence rearrangements and exact chromosomal breakpoints of 22q11DS (including all common subtypes). Across multiple patients, we found a high degree of variability for both the rearranged SegDup sequences and the exact chromosomal breakpoint locations, which coincide with various transposons within the 22q11.2 SegDups, suggesting that 22q11DS can be driven by transposon-mediated genome recombination. Guided by CTLR-Seq results from two 22q11DS patients, we performed three-dimensional chromosomal folding analysis for the 22q11.2 SegDups from patient-derived neurons and astrocytes and found chromosome interactions anchored within the SegDups to be both cell type-specific and patient-specific. Lastly, we demonstrated that CTLR-Seq enables cell type-specific analysis of DNA methylation patterns within the deletion haplotype of 22q11DS.


Subjects
DiGeorge Syndrome , Humans , DiGeorge Syndrome/genetics , CRISPR-Cas Systems , Chromosome Breakpoints , Chromosomes, Human, Pair 22/genetics , Genome, Human , Gene Rearrangement , Sequence Analysis, DNA/methods , Chromosome Deletion
3.
Bioinformatics ; 40(8)2024 08 02.
Article in English | MEDLINE | ID: mdl-39018173

ABSTRACT

SUMMARY: Copy number variation (CNV) and alteration (CNA) analysis is a crucial component in many genomic studies and its applications span from basic research to clinical diagnostics and personalized medicine. CNVpytor is a tool featuring a read depth-based caller and combined read depth and B-allele frequency (BAF) based 2D caller to find CNVs and CNAs. The tool stores processed intermediate data and CNV/CNA calls in a compact HDF5 file (the pytor file). Here, we describe a new track in igv.js that utilizes pytor and whole genome variant files as input for on-the-fly read depth and BAF visualization, CNV/CNA calling and analysis. Embedding into HTML pages and Jupyter Notebooks enables convenient remote data access and visualization, simplifying interpretation and analysis of omics data. AVAILABILITY AND IMPLEMENTATION: The CNVpytor track is integrated with igv.js and available at https://github.com/igvteam/igv.js. The documentation is available at https://github.com/igvteam/igv.js/wiki/cnvpytor. Usage can be tested in the IGV-Web app at https://igv.org/app and also on https://github.com/abyzovlab/CNVpytor.
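
The read-depth idea mentioned in this abstract can be illustrated with a minimal sketch. This is not CNVpytor's algorithm or API: the function name `call_cnv_bins`, the per-bin depths, and the gain/loss thresholds are hypothetical, chosen only to show how a depth ratio against a baseline can flag candidate gains and losses.

```python
from statistics import median


def call_cnv_bins(bin_depths, ploidy=2, gain=1.25, loss=0.75):
    """Label each bin 'gain', 'loss', or 'neutral' from its depth ratio.

    bin_depths: mean read depth per fixed-size genomic bin (hypothetical input).
    gain/loss:  illustrative ratio thresholds, not values from any real tool.
    """
    baseline = median(bin_depths)  # robust estimate of the normal-copy depth
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    calls = []
    for depth in bin_depths:
        ratio = depth / baseline
        copy_number = round(ratio * ploidy)  # nearest integer copy number
        if ratio >= gain:
            label = "gain"
        elif ratio <= loss:
            label = "loss"
        else:
            label = "neutral"
        calls.append((label, copy_number))
    return calls


# Toy depths: two bins at roughly half depth, two at roughly 1.5x depth.
depths = [30, 31, 29, 15, 14, 30, 46, 45, 30]
calls = call_cnv_bins(depths)
```

A real caller would also correct for GC content and merge adjacent bins into segments; the sketch omits both steps.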


Subjects
DNA Copy Number Variations , Genomics , Software , Genomics/methods , Humans
4.
Nucleic Acids Res ; 51(10): e57, 2023 06 09.
Article in English | MEDLINE | ID: mdl-37026484

ABSTRACT

Mosaic mutations can be used to track cell ancestries and reconstruct high-resolution lineage trees during cancer progression and during development, starting from the first cell divisions of the zygote. However, this approach requires sampling and analyzing the genomes of multiple cells, which can be redundant in lineage representation, limiting the scalability of the approach. We describe a strategy for cost- and time-efficient lineage reconstruction using clonal induced pluripotent stem cell lines from human skin fibroblasts. The approach leverages shallow sequencing coverage to assess the clonality of the lines, clusters redundant lines and sums their coverage to accurately discover mutations in the corresponding lineages. Only a fraction of lines needs to be sequenced to high coverage. We demonstrate the effectiveness of this approach for reconstructing lineage trees during development and in hematologic malignancies. We discuss and propose an optimal experimental design for reconstructing lineage trees.


Subjects
Cell Lineage , Neoplasms , Software , Humans , Germ Cells , Mutation , Neoplasms/pathology
5.
Annu Rev Genomics Hum Genet ; 21: 101-116, 2020 08 31.
Article in English | MEDLINE | ID: mdl-32413272

ABSTRACT

Tracing cell lineages is fundamental for understanding the rules governing development in multicellular organisms and delineating complex biological processes involving the differentiation of multiple cell types with distinct lineage hierarchies. In humans, experimental lineage tracing is unethical, and one has to rely on natural-mutation markers that are created within cells as they proliferate and age. Recent studies have demonstrated that it is now possible to trace lineages in normal, noncancerous cells with a variety of data types using natural variations in the nuclear and mitochondrial DNA as well as variations in DNA methylation status. It is also apparent that the scientific community is on the verge of being able to make a comprehensive and detailed cell lineage map of human embryonic and fetal development. In this review, we discuss the advantages and disadvantages of different approaches and markers for lineage tracing. We also describe the general conceptual design for how to derive a lineage map for humans.


Subjects
Cell Differentiation , Cell Lineage , Cell Nucleus/genetics , DNA Methylation , DNA, Mitochondrial/analysis , Embryo, Mammalian/cytology , DNA, Mitochondrial/genetics , Developmental Biology , Embryo, Mammalian/metabolism , Humans , Single-Cell Analysis
6.
Genome Res ; 30(12): 1695-1704, 2020 12.
Article in English | MEDLINE | ID: mdl-33122304

ABSTRACT

Somatic mosaicism, manifesting as single nucleotide variants (SNVs), mobile element insertions, and structural changes in the DNA, is a common phenomenon in human brain cells, with potential functional consequences. Using a clonal approach, we previously detected 200-400 mosaic SNVs per cell in three human fetal brains (15-21 wk postconception). However, structural variation in the human fetal brain has not yet been investigated. Here, we discover and validate four mosaic structural variants (SVs) in the same brains and resolve their precise breakpoints. The SVs were of kilobase scale and complex, consisting of deletion(s) and rearranged genomic fragments, which sometimes originated from different chromosomes. Sequences at the breakpoints of these rearrangements had microhomologies, suggesting their origin from replication errors. One SV was found in two clones, and we timed its origin to ∼14 wk postconception. No large scale mosaic copy number variants (CNVs) were detectable in normal fetal human brains, suggesting that previously reported megabase-scale CNVs in neurons arise at later stages of development. By reanalysis of public single nuclei data from adult brain neurons, we detected an extrachromosomal circular DNA event. Our study reveals the existence of mosaic SVs in the developing human brain, likely arising from cell proliferation during mid-neurogenesis. Although relatively rare compared to SNVs and present in ∼10% of neurons, SVs in developing human brain affect a comparable number of bases in the genome (∼6200 vs. ∼4000 bp), implying that they may have similar functional consequences.


Subjects
Brain/embryology , DNA, Circular/genetics , Genomic Structural Variation , Sequence Analysis, DNA/methods , Clonal Evolution , Female , Genotyping Techniques , Gestational Age , Humans , Mosaicism , Neurogenesis , Pregnancy
8.
PLoS Comput Biol ; 18(4): e1009487, 2022 04.
Article in English | MEDLINE | ID: mdl-35442945

ABSTRACT

Accurate discovery of somatic mutations in a cell is a challenge that partially lies in the immaturity of dedicated analytical approaches. Approaches comparing a cell's genome to a control bulk sample miss common mutations, while approaches to find such mutations from bulk suffer from low sensitivity. We developed a tool, All2, which enables accurate filtering of mutations in a cell without the need for data from bulk(s). It is based on pair-wise comparisons of all cells to each other where every call for base pair substitution and indel is classified as either a germline variant, mosaic mutation, or false positive. As All2 allows for considering dropped-out regions, it is applicable to whole genome and exome analysis of cloned and amplified cells. By applying the approach to a variety of available data, we showed that its application reduces false positives, enables sensitive discovery of high frequency mutations, and is indispensable for conducting high resolution cell lineage tracing.


Subjects
Exome , Software , High-Throughput Nucleotide Sequencing , INDEL Mutation/genetics , Mutation/genetics , Exome Sequencing
9.
Genome Res ; 29(3): 472-484, 2019 03.
Article in English | MEDLINE | ID: mdl-30737237

ABSTRACT

K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and also most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, structural variants (SVs), including small and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.


Subjects
Genome, Human , Humans , K562 Cells , Karyotype , Polymorphism, Genetic , Whole Genome Sequencing
10.
Bioinformatics ; 37(7): 1015-1017, 2021 05 17.
Article in English | MEDLINE | ID: mdl-32777815

ABSTRACT

SUMMARY: Defining the precise location of structural variations (SVs) at single-nucleotide breakpoint resolution is a challenging problem due to large gaps in alignment. Previously, Alignment with Gap Excision (AGE) enabled us to define breakpoints of SVs at single-nucleotide resolution; however, AGE requires a vast amount of memory when aligning a pair of long sequences. To address this, we developed a memory-efficient implementation, LongAGE, based on the classical Hirschberg algorithm. We demonstrate an application of LongAGE for resolving breakpoints of SVs embedded into segmental duplications on Pacific Biosciences (PacBio) reads that can be longer than 10 kb. Furthermore, we observed different breakpoints for a deletion and a duplication in the same locus, providing direct evidence that such multi-allelic copy number variants (mCNVs) arise from two or more independent ancestral mutations. AVAILABILITY AND IMPLEMENTATION: LongAGE is implemented in C++ and available on GitHub at https://github.com/Coaxecva/LongAGE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
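
For readers unfamiliar with the classical Hirschberg algorithm named in this abstract, the sketch below shows its divide-and-conquer structure for plain global alignment. It is not LongAGE and does not implement gap excision; the scoring values (match +1, mismatch -1, gap -1) and all function names are assumptions made for illustration. The point is that only two score rows are kept at a time, so working memory grows linearly with sequence length instead of quadratically.

```python
def nw_score_row(a, b, match=1, mismatch=-1, gap=-1):
    """Return the last row of the Needleman-Wunsch score matrix for a vs b."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr  # keep only one previous row
    return prev


def hirschberg(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment of a and b using linear working memory."""
    if len(a) == 0:
        return "-" * len(b), b
    if len(b) == 0:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: place the single character opposite its best partner in b.
        scores = [match if a == ch else mismatch for ch in b]
        j = max(range(len(b)), key=scores.__getitem__)
        if scores[j] >= 2 * gap:
            return "-" * j + a + "-" * (len(b) - j - 1), b
        return a + "-" * len(b), "-" + b
    mid = len(a) // 2
    # Scores for the first half of a against every prefix of b ...
    left = nw_score_row(a[:mid], b, match, mismatch, gap)
    # ... and for the second half of a against every suffix of b.
    right = nw_score_row(a[mid:][::-1], b[::-1], match, mismatch, gap)
    n = len(b)
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    top_l, bot_l = hirschberg(a[:mid], b[:split], match, mismatch, gap)
    top_r, bot_r = hirschberg(a[mid:], b[split:], match, mismatch, gap)
    return top_l + top_r, bot_l + bot_r


def alignment_score(top, bot, match=1, mismatch=-1, gap=-1):
    """Score a gapped alignment under the same linear-gap scheme."""
    total = 0
    for x, y in zip(top, bot):
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


top, bot = hirschberg("AGTACGCA", "TATGC")
```

Comparing `alignment_score(top, bot)` with `nw_score_row("AGTACGCA", "TATGC")[-1]` is a quick way to check that the linear-memory result matches the optimal score.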


Subjects
Genomic Structural Variation , Software , Algorithms , DNA Copy Number Variations , High-Throughput Nucleotide Sequencing , Segmental Duplications, Genomic , Sequence Analysis, DNA
11.
Nature ; 526(7571): 75-81, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26432246

ABSTRACT

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.


Subjects
Genetic Variation/genetics , Genome, Human/genetics , Physical Chromosome Mapping , Amino Acid Sequence , Genetic Predisposition to Disease , Genetics, Medical , Genetics, Population , Genome-Wide Association Study , Genomics , Genotype , Haplotypes/genetics , Homozygote , Humans , Molecular Sequence Data , Mutation Rate , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Sequence Analysis, DNA , Sequence Deletion/genetics
12.
Nucleic Acids Res ; 47(6): 2766-2777, 2019 04 08.
Article in English | MEDLINE | ID: mdl-30773596

ABSTRACT

Structural variations (SVs) in the human genome originate from different mechanisms related to DNA repair, replication errors, and retrotransposition. Our analyses of 26 927 SVs from the 1000 Genomes Project revealed differential distributions and consequences of SVs of different origin, e.g., deletions from non-allelic homologous recombination (NAHR) are more prone to disrupt chromatin organization while processed pseudogenes can create accessible chromatin. Spontaneous double-stranded breaks (DSBs) are the best predictor of enrichment of NAHR deletions in open chromatin. This evidence, along with strong physical interaction of NAHR breakpoints belonging to the same deletion, suggests that the majority of NAHR deletions are non-meiotic, i.e., originate from errors during homology-directed repair (HDR) of spontaneous DSBs. In turn, the origin of the spontaneous DSBs is associated with transcription factor binding in accessible chromatin, revealing the vulnerability of functional, open chromatin. The chromatin itself is enriched with repeats, particularly fixed Alu elements that provide the homology required to maintain stability via HDR. Through co-localization of fixed Alus and NAHR deletions in open chromatin we hypothesize that old Alu expansion had a stabilizing role on the human genome.


Subjects
Chromatin/chemistry , Genome, Human , Genomic Structural Variation/genetics , Quantitative Trait, Heritable , Chromatin/metabolism , Chromosome Mapping , Computational Biology , DNA Breaks, Double-Stranded , DNA Damage/physiology , DNA Repair , Homologous Recombination , Humans , Recombinational DNA Repair
13.
Nucleic Acids Res ; 47(8): 3846-3861, 2019 05 07.
Article in English | MEDLINE | ID: mdl-30864654

ABSTRACT

HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and Indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions and structural variants (SVs) including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence assembled and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.


Subjects
Chromosome Mapping/methods , Genome, Human , Genomics/methods , Haplotypes , Sequence Analysis, DNA/statistics & numerical data , Alleles , Aneuploidy , DNA Methylation , Genomic Structural Variation , Hep G2 Cells , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Karyotyping , Loss of Heterozygosity , Polymorphism, Single Nucleotide , Retroelements
14.
BMC Bioinformatics ; 21(1): 521, 2020 Nov 12.
Article in English | MEDLINE | ID: mdl-33183232

ABSTRACT

BACKGROUND: The study of mosaic mutation is important since it has been linked to cancer and various disorders. Single cell sequencing has become a powerful tool to study the genome of individual cells for the detection of mosaic mutations. The amount of DNA in a single cell needs to be amplified before sequencing and multiple displacement amplification (MDA) is widely used owing to its low error rate and long fragment length of amplified DNA. However, the phi29 polymerase used in MDA is sensitive to template fragmentation and presence of sites with DNA damage that can lead to biases such as allelic imbalance, uneven coverage and overrepresentation of C to T mutations. It is therefore important to select cells with uniform amplification to decrease false positives and increase sensitivity for mosaic mutation detection. RESULTS: We propose a method, Scellector (single cell selector), which uses haplotype information to detect amplification quality in shallow coverage sequencing data. We tested Scellector on single human neuronal cells, obtained in vitro and amplified by MDA. Qualities were estimated from shallow sequencing with coverage as low as 0.3× per cell and then confirmed using 30× deep coverage sequencing. The high concordance between shallow and high coverage data validated the method. CONCLUSION: Scellector can potentially be used to rank amplifications obtained from single cell platforms relying on an MDA-like amplification step, such as Chromium Single Cell profiling solution.


Subjects
Nucleic Acid Amplification Techniques/methods , Single-Cell Analysis/methods , Cell Differentiation , DNA/chemistry , DNA/metabolism , High-Throughput Nucleotide Sequencing , Humans , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/metabolism , Neurons/cytology , Neurons/metabolism , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
15.
Genome Res ; 27(4): 512-523, 2017 04.
Article in English | MEDLINE | ID: mdl-28235832

ABSTRACT

Few studies have been conducted to understand post-zygotic accumulation of mutations in cells of the healthy human body. We reprogrammed 32 skin fibroblast cells from families of donors into human induced pluripotent stem cell (hiPSC) lines. The clonal nature of hiPSC lines allows a high-resolution analysis of the genomes of the founder fibroblast cells without being confounded by the artifacts of single-cell whole-genome amplification. We estimate that on average a fibroblast cell in children has 1035 mostly benign mosaic SNVs. On average, 235 SNVs could be directly confirmed in the original fibroblast population by ultradeep sequencing, down to an allele frequency (AF) of 0.1%. More sensitive droplet digital PCR experiments confirmed more SNVs as mosaic with AF as low as 0.01%, suggesting that 1035 mosaic SNVs per fibroblast cell is the true average. Similar analyses in adults revealed no significant increase in the number of SNVs per cell, suggesting that a major fraction of mosaic SNVs in fibroblasts arises during development. Mosaic SNVs were distributed uniformly across the genome and were enriched in a mutational signature previously observed in cancers and in de novo variants and which, we hypothesize, is a hallmark of normal cell proliferation. Finally, AF distribution of mosaic SNVs had distinct narrow peaks, which could be a characteristic of clonal cell selection, clonal expansion, or both. These findings reveal a large degree of somatic mosaicism in healthy human tissues, link de novo and cancer mutations to somatic mosaicism, and couple somatic mosaicism with cell proliferation.


Subjects
Clonal Evolution , DNA Copy Number Variations , Fibroblasts/cytology , Mosaicism , Mutation Accumulation , Cell Proliferation , Cells, Cultured , Fibroblasts/metabolism , Humans , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/metabolism , Skin/cytology
16.
Gynecol Oncol ; 156(2): 387-392, 2020 02.
Article in English | MEDLINE | ID: mdl-31787246

ABSTRACT

OBJECTIVE: We aimed to assess whether endometrial cancer (EC) can be detected in shed DNA collected with vaginal tampon by analyzing copy number, methylation markers, and mutations. METHODS: Tampons were collected prior to hysterectomy from 38 EC patients and 28 women with benign indications. Extracted tampon DNA underwent the following: 1) low-coverage whole genome sequencing (LC-WGS) to assess copy number, 2) pyrosequencing to measure percent promoter methylation of HOXA9, RASSF1, and CDH13 and 3) next generation sequencing (NGS) to identify mutations in 19 genes associated with EC identified through The Cancer Genome Atlas. Sensitivity and specificity for each test and test combinations were calculated. RESULTS: Methylation analysis yielded the highest specificities but lowest sensitivities (37-40% sensitivity; 100% specificity for HOXA9, RASSF1 and HTR1B) while mutation analysis had improved sensitivity (50% sensitivity; 83% specificity). Only one "false positive" result for copy number variants was identified among women with benign surgical indications, which was based on detection of copy number changes, and associated with a leiomyosarcoma that was only recognized at hysterectomy. Considering any of the 3 biomarker classes as a positive resulted in a sensitivity of 92% and specificity of 86%. Mutation analysis did not add sensitivity to the combination of analysis of copy number and methylation. CONCLUSIONS: This study demonstrates a proof-of-principle for non-invasive yet precise detection of endometrial cancer. We propose that with improved biomarker testing, it may be possible to develop a clinically useful test for detecting EC.
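
The "any of the 3 biomarker classes" rule described in this abstract can be sketched as a simple OR-combination followed by the usual sensitivity and specificity formulas. The sample records below are invented toy data, not the study's results, and the names `any_positive` and `sensitivity_specificity` are hypothetical.

```python
def any_positive(results):
    """A sample is called positive if any biomarker class is positive."""
    return any(results.values())


def sensitivity_specificity(samples):
    """Compute (sensitivity, specificity) for the OR-combined test.

    samples: list of (has_cancer, results) pairs, where results maps a
    biomarker class name to True/False.
    """
    tp = fn = tn = fp = 0
    for has_cancer, results in samples:
        called = any_positive(results)
        if has_cancer and called:
            tp += 1
        elif has_cancer:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def record(copy_number, methylation, mutation):
    """Build one toy result record for the three biomarker classes."""
    return {"copy_number": copy_number, "methylation": methylation, "mutation": mutation}


# Invented toy data: four cancer samples, four benign samples.
toy_samples = [
    (True, record(True, False, False)),
    (True, record(False, True, True)),
    (True, record(False, False, True)),
    (True, record(False, False, False)),   # missed by all three classes
    (False, record(False, False, False)),
    (False, record(False, False, True)),   # false positive from one class
    (False, record(False, False, False)),
    (False, record(False, False, False)),
]
sens, spec = sensitivity_specificity(toy_samples)
```

Combining tests with OR can only raise sensitivity and can only lower specificity relative to each single test, which is the trade-off the reported figures reflect.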


Subjects
DNA Methylation , Endometrial Neoplasms/genetics , Gene Dosage , Menstrual Hygiene Products , Biomarkers, Tumor/genetics , Diagnosis, Differential , Endometrial Neoplasms/diagnosis , Endometrial Neoplasms/pathology , Female , Humans , Middle Aged , Mutation , Uterine Diseases/diagnosis , Uterine Diseases/genetics , Uterine Diseases/pathology , Vaginal Smears/methods
17.
Genome Res ; 26(7): 874-81, 2016 07.
Article in English | MEDLINE | ID: mdl-27216746

ABSTRACT

Copy number variants (CNVs) are a class of structural variants that may involve complex genomic rearrangements (CGRs) and are hypothesized to have additional mutations around their breakpoints. Understanding the mechanisms underlying CNV formation is fundamental for understanding the repair and mutation mechanisms in cells, thereby shedding light on evolution, genomic disorders, cancer, and complex human traits. In this study, we used data from the 1000 Genomes Project to analyze hundreds of loci harboring heterozygous germline deletions in the subjects NA12878 and NA19240. By utilizing synthetic long-read data (longer than 2 kbp) in combination with high coverage short-read data and, in parallel, by comparing with parental genomes, we interrogated the phasing of these deletions with the flanking tens of thousands of heterozygous SNPs and indels. We found that the density of SNPs/indels flanking the breakpoints of deletions (in-phase variants) is approximately twice as high as the corresponding density for the variants on the haplotype without deletion (out-of-phase variants). This fold change was even larger for the subset of deletions with signatures of replication-based mechanism of formation. The allele frequency (AF) spectrum for deletions is enriched for rare events, and the AF spectrum for in-phase SNPs is shifted toward this deletion spectrum, thus offering evidence consistent with the concomitance of the in-phase SNPs/indels with the deletion events. These findings therefore lend support to the hypothesis that the mutational mechanisms underlying CNV formation are error prone. Our results could also be relevant for resolving mutation-rate discrepancies in humans and to explain kataegis.


Subjects
Chromosome Breakpoints , DNA Copy Number Variations , Mutagenesis , DNA Replication , Female , Gene Frequency , Genome, Human , Haplotypes , Humans , INDEL Mutation , Male , Models, Genetic , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Sequence Deletion
18.
PLoS Comput Biol ; 13(6): e1005567, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28662076

ABSTRACT

Retroduplications come from reverse transcription of mRNAs and their insertion back into the genome. Here, we performed comprehensive discovery and analysis of retroduplications in a large cohort of 2,535 individuals from 26 human populations, as part of 1000 Genomes Phase 3. We developed an integrated approach to discover novel retroduplications combining high-coverage exome and low-coverage whole-genome sequencing data, utilizing information from both exon-exon junctions and discordant paired-end reads. We found 503 parent genes having novel retroduplications absent from the reference genome. Based solely on retroduplication variation, we built phylogenetic trees of human populations; these represent superpopulation structure well and indicate that variable retroduplications are effective population markers. We further identified 43 retroduplication parent genes differentiating superpopulations. This group contains several interesting insertion events, including a SLMO2 retroduplication and insertion into CAV3, which has a potential disease association. We also found retroduplications to be associated with a variety of genomic features: (1) Insertion sites were correlated with regular nucleosome positioning. (2) They, predictably, tend to avoid conserved functional regions, such as exons, but, somewhat surprisingly, also avoid introns. (3) Retroduplications tend to be co-inserted with young L1 elements, indicating recent retrotranspositional activity, and (4) they have a weak tendency to originate from highly expressed parent genes. Our investigation provides insight into the functional impact and association with genomic elements of retroduplications. We anticipate our approach and analytical methodology to have application in a more clinical context, where exome sequencing data is abundant and the discovery of retroduplications can potentially improve the accuracy of SNP calling.


Subjects
Gene Duplication/genetics , Genetic Variation/genetics , Genome, Human/genetics , High-Throughput Nucleotide Sequencing/methods , Retroelements/genetics , Sequence Analysis, DNA/methods , DNA Transposable Elements/genetics , Exome/genetics , Humans , Species Specificity
19.
Nature ; 492(7429): 438-42, 2012 Dec 20.
Article in English | MEDLINE | ID: mdl-23160490

ABSTRACT

Reprogramming somatic cells into induced pluripotent stem cells (iPSCs) has been suspected of causing de novo copy number variation. To explore this issue, here we perform a whole-genome and transcriptome analysis of 20 human iPSC lines derived from the primary skin fibroblasts of seven individuals using next-generation sequencing. We find that, on average, an iPSC line manifests two copy number variants (CNVs) not apparent in the fibroblasts from which the iPSC was derived. Using PCR and digital droplet PCR, we show that at least 50% of those CNVs are present as low-frequency somatic genomic variants in parental fibroblasts (that is, the fibroblasts from which each corresponding human iPSC line is derived), and are manifested in iPSC lines owing to their clonal origin. Hence, reprogramming does not necessarily lead to de novo CNVs in iPSCs, because most of the line-manifested CNVs reflect somatic mosaicism in the human skin. Moreover, our findings demonstrate that clonal expansion, and iPSC lines in particular, can be used as a discovery tool to reliably detect low-frequency CNVs in the tissue of origin. Overall, we estimate that approximately 30% of the fibroblast cells have somatic CNVs in their genomes, suggesting widespread somatic mosaicism in the human body. Our study paves the way to understanding the fundamental question of the extent to which cells of the human body normally acquire structural alterations in their DNA post-zygotically.


Subjects
DNA Copy Number Variations/genetics , Induced Pluripotent Stem Cells/metabolism , Mosaicism , Skin/metabolism , Cell Differentiation , Cells, Cultured , Cellular Reprogramming , Clone Cells , Fibroblasts/cytology , Gene Expression Profiling , Genome, Human/genetics , Humans , Induced Pluripotent Stem Cells/cytology , Male , Neurons/cytology , Polymerase Chain Reaction , Reproducibility of Results , Skin/cytology
20.
Nature ; 489(7414): 91-100, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955619

ABSTRACT

Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.


Subjects
DNA/genetics , Encyclopedias as Topic , Gene Regulatory Networks/genetics , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Alleles , Cell Line , GATA1 Transcription Factor/metabolism , Gene Expression Profiling , Genomics , Humans , K562 Cells , Organ Specificity , Phosphorylation/genetics , Polymorphism, Single Nucleotide/genetics , Protein Interaction Maps , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Selection, Genetic/genetics , Transcription Initiation Site