Results 1 - 20 of 45
1.
Genome Res ; 34(1): 7-19, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38176712

ABSTRACT

High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥50 bp) has improved to near base pair precision. Despite these advances, many SV breakpoint locations are subject to systematic bias affecting variant representation. To understand why SV breakpoints are inconsistent across samples, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identify 882 SV insertions and 180 SV deletions with variable breakpoints not anchored in tandem repeats (TRs) or segmental duplications (SDs). SVs called from aligned sequencing reads increase breakpoint disagreements by 2×-16×. Sequence accuracy had a minimal impact on breakpoints, but we observe a strong effect of ancestry. We confirm that SNP and indel polymorphisms are enriched at shifted breakpoints and are also absent from variant callsets. Breakpoint homology increases the likelihood of imprecise SV calls and the distance they are shifted, and tandem duplications are the most heavily affected SVs. Because graph genome methods normalize SV calls across samples, we investigated graphs generated by two different methods and find the resulting breakpoints are subject to other technical biases affecting breakpoint accuracy. The breakpoint inconsistencies we characterize affect ∼5% of the SVs called in a human genome and can impact variant interpretation and annotation. These limitations underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoints, and increase the value of callsets for investigating breakpoint features.


Subjects
Algorithms , Human Genome , Humans , Sequence Analysis , Genomic Structural Variation , Bias , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing
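
The abstract above notes that breakpoint homology makes SV placement ambiguous. The toy sketch below illustrates why, assuming an invented reference string and hypothetical helper names (`delete`, `equivalent_deletion_starts`); it is not the authors' method. When the bases just after a deleted segment match the start of that segment, several start positions produce the same haplotype sequence, so a caller can legitimately report any of them.

```python
def delete(ref: str, start: int, length: int) -> str:
    """Return ref with `length` bases removed, beginning at 0-based `start`."""
    return ref[:start] + ref[start + length:]


def equivalent_deletion_starts(ref: str, start: int, length: int) -> list[int]:
    """List every start position whose deletion yields the same haplotype."""
    target = delete(ref, start, length)
    last_start = len(ref) - length
    return [s for s in range(last_start + 1) if delete(ref, s, length) == target]


# Invented toy reference (illustration only): left flank, a 7 bp deleted
# segment, then 3 bp ("GAT") that match the start of the deleted segment.
ref = "CCTTG" + "GATTACA" + "GAT" + "CGG"

# The deletion is "called" at position 5, but 3 bp of homology allow
# it to be shifted right by up to 3 bp without changing the result.
starts = equivalent_deletion_starts(ref, 5, 7)
print(starts)
```

In this toy case, 3 bp of homology gives four equivalent placements; longer homology tracts widen the range over which a breakpoint can shift.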
2.
bioRxiv ; 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38077078

ABSTRACT

Starch digestion is a cornerstone of human nutrition. The amylase enzyme, which digests starch, plays a key role in starch metabolism. Indeed, the copy number of the human amylase gene has been associated with metabolic diseases and adaptation to agricultural diets. Previous studies suggested that duplications of the salivary amylase gene are of recent origin. In the course of characterizing 51 distinct amylase haplotypes across 98 individuals employing long-read DNA sequencing and optical mapping methods, we detected four 31mers linked to duplication of the amylase locus. Analyses with these 31mers suggest that the first duplication of the amylase locus occurred more than 700,000 years ago before the split between modern humans and Neanderthals. After the original duplication events, amplification of the AMY1 genes likely occurred via nonallelic homologous recombination in a manner that consistently results in an odd number of copies per chromosome. These findings suggest that amylase haplotypes may have been primed for bursts of natural-selection associated duplications that coincided with the incorporation of starch into human diets.

3.
Nature ; 621(7978): 355-364, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612510

ABSTRACT

The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date [1] and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes [2]. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established [1] boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.


Subjects
Human Y Chromosomes , Molecular Evolution , Humans , Male , Human Y Chromosomes/genetics , Human Genome/genetics , Genomics , Mutation Rate , Phenotype , Euchromatin/genetics , Pseudogenes , Genetic Variation/genetics , Human X Chromosomes/genetics , Pseudoautosomal Regions/genetics
4.
bioRxiv ; 2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37425850

ABSTRACT

High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥ 50 bp) has improved to near basepair precision. Despite these advances, many SVs in unique regions of the genome are subject to systematic bias that affects breakpoint location. This ambiguity leads to less accurate variant comparisons across samples, and it obscures true breakpoint features needed for mechanistic inferences. To understand why SVs are not consistently placed, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identified variable breakpoints for 882 SV insertions and 180 SV deletions not anchored in tandem repeats (TRs) or segmental duplications (SDs). While this is unexpectedly high for genome assemblies in unique loci, we find read-based callsets from the same sequencing data yielded 1,566 insertions and 986 deletions with inconsistent breakpoints also not anchored in TRs or SDs. When we investigated causes for breakpoint inaccuracy, we found sequence and assembly errors had minimal impact, but we observed a strong effect of ancestry. We confirmed that polymorphic mismatches and small indels are enriched at shifted breakpoints and that these polymorphisms are generally lost when breakpoints shift. Long tracts of homology, such as SVs mediated by transposable elements, increase the likelihood of imprecise SV calls and the distance they are shifted. Tandem Duplication (TD) breakpoints are the most heavily affected SV class with 14% of TDs placed at different locations across haplotypes. While graph genome methods normalize SV calls across many samples, the resulting breakpoints are sometimes incorrect, highlighting a need to tune graph methods for breakpoint accuracy. 
The breakpoint inconsistencies we characterize collectively affect ~5% of the SVs called in a human genome and underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoint placement, and increase the value of callsets for investigating mutational processes.

5.
Cell Rep ; 42(6): 112625, 2023 06 27.
Article in English | MEDLINE | ID: mdl-37294634

ABSTRACT

Endogenous retroviruses (ERVs) have rewired host gene networks. To explore the origins of co-option, we employed an active murine ERV, IAPEz, and an embryonic stem cell (ESC) to neural progenitor cell (NPC) differentiation model. Transcriptional silencing via TRIM28 maps to a 190 bp sequence encoding the intracisternal A-type particle (IAP) signal peptide, which confers retrotransposition activity. A subset of "escapee" IAPs (∼15%) exhibits significant genetic divergence from this sequence. Canonical repressed IAPs succumb to a previously undocumented demarcation by H3K9me3 and H3K27me3 in NPCs. Escapee IAPs, in contrast, evade repression in both cell types, resulting in their transcriptional derepression, particularly in NPCs. We validate the enhancer function of a 47 bp sequence within the U3 region of the long terminal repeat (LTR) and show that escapee IAPs convey an activating effect on nearby neural genes. In sum, co-opted ERVs stem from genetic escapees that have lost vital sequences required for both TRIM28 restriction and autonomous retrotransposition.


Subjects
Endogenous Retroviruses , Tripartite Motif-Containing Protein 28 , Animals , Mice , Cell Differentiation , Embryonic Stem Cells/metabolism , Endogenous Retroviruses/genetics , Endogenous Retroviruses/metabolism , Histones/metabolism , Tripartite Motif-Containing Protein 28/metabolism , Terminal Repeat Sequences/genetics
6.
Cell Genom ; 3(5): 100291, 2023 May 10.
Article in English | MEDLINE | ID: mdl-37228752

ABSTRACT

Diverse inbred mouse strains are important biomedical research models, yet genome characterization of many strains is fundamentally lacking in comparison with humans. In particular, catalogs of structural variants (SVs) (variants ≥ 50 bp) are incomplete, limiting the discovery of causative alleles for phenotypic variation. Here, we resolve genome-wide SVs in 20 genetically distinct inbred mice with long-read sequencing. We report 413,758 site-specific SVs affecting 13% (356 Mbp) of the mouse reference assembly, including 510 previously unannotated coding variants. We substantially improve the Mus musculus transposable element (TE) callset, and we find that TEs comprise 39% of SVs and account for 75% of altered bases. We further utilize this callset to investigate how TE heterogeneity affects mouse embryonic stem cells and find multiple TE classes that influence chromatin accessibility. Our work provides a comprehensive analysis of SVs found in diverse mouse genomes and illustrates the role of TEs in epigenetic differences.

7.
bioRxiv ; 2023 May 04.
Article in English | MEDLINE | ID: mdl-37205567

ABSTRACT

Advances in long-read sequencing (LRS) technology continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant calling precision and recall of Oxford Nanopore Technologies (ONT) and PacBio HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant calling precision and recall of SVs and indels in HiFi datasets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant callsets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.

8.
Genome Res ; 33(12): 2029-2040, 2023 Dec 27.
Article in English | MEDLINE | ID: mdl-38190646

ABSTRACT

Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.


Subjects
Genomics , Nanopores , INDEL Mutation , Whole Genome Sequencing
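
The abstract above reports callset quality as an F1 score, which is the harmonic mean of precision and recall. A minimal sketch of that standard formula follows; the function name and the example counts are assumptions chosen for illustration, not values from the study.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Compute F1 from variant-call counts against a truth set."""
    if true_pos == 0:
        return 0.0  # no correct calls: precision and recall are both zero
    precision = true_pos / (true_pos + false_pos)  # fraction of calls that are correct
    recall = true_pos / (true_pos + false_neg)     # fraction of true variants recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 correct calls, 20 spurious calls, 20 missed variants.
score = f1_score(true_pos=80, false_pos=20, false_neg=20)
print(round(score, 3))
```

Because the harmonic mean is pulled toward the lower of the two values, a callset needs both reasonable precision and reasonable recall to reach a high F1.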
9.
Nat Commun ; 13(1): 7115, 2022 11 19.
Article in English | MEDLINE | ID: mdl-36402840

ABSTRACT

Transposable elements constitute about half of human genomes, and their role in generating human variation through retrotransposition is broadly studied and appreciated. Structural variants mediated by transposons, which we call transposable element-mediated rearrangements (TEMRs), are less well studied, and the mechanisms leading to their formation as well as their broader impact on human diversity are poorly understood. Here, we identify 493 unique TEMRs across the genomes of three individuals. While homology directed repair is the dominant driver of TEMRs, our sequence-resolved TEMR resource allows us to identify complex inversion breakpoints, triplications or other high copy number polymorphisms, and additional complexities. TEMRs are enriched in genic loci and can create potentially important risk alleles such as a deletion in TRIM65, a known cancer biomarker and therapeutic target. These findings expand our understanding of this important class of structural variation, the mechanisms responsible for their formation, and establish them as an important driver of human diversity.


Subjects
DNA Transposable Elements , Human Genome , Humans , DNA Transposable Elements/genetics , Human Genome/genetics , Gene Rearrangement/genetics , DNA Copy Number Variations , Tripartite Motif Proteins/genetics , Ubiquitin-Protein Ligases/genetics
10.
Nat Methods ; 19(10): 1230-1233, 2022 10.
Article in English | MEDLINE | ID: mdl-36109679

ABSTRACT

Complex structural variants (CSVs) encompass multiple breakpoints and are often missed or misinterpreted. We developed SVision, a deep-learning-based multi-object-recognition framework, to automatically detect and characterize CSVs from long-read sequencing data. SVision outperforms current callers at identifying the internal structure of complex events and has revealed 80 high-quality CSVs with 25 distinct structures from an individual genome. SVision directly detects CSVs without matching known structures, allowing sensitive detection of both common and previously uncharacterized complex rearrangements.


Subjects
Deep Learning , Genome , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis
11.
Cell ; 185(11): 1986-2005.e26, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35525246

ABSTRACT

Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.


Subjects
Chromosome Inversion , Genomic Segmental Duplications , Chromosome Inversion/genetics , DNA Copy Number Variations/genetics , Human Genome , Genomics , Humans
12.
Genome Med ; 14(1): 44, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35484572

ABSTRACT

Structural variants (SVs) are implicated in the etiology of Mendelian diseases but have been systematically underascertained owing to sequencing technology limitations. Long-read sequencing enables comprehensive detection of SVs, but approaches for prioritization of candidate SVs are needed. Structural variant Annotation and analysis (SvAnna) assesses all classes of SVs and their intersection with transcripts and regulatory sequences, relating predicted effects on gene function with clinical phenotype data. SvAnna places 87% of deleterious SVs in the top ten ranks. The interpretable prioritizations offered by SvAnna will facilitate the widespread adoption of long-read sequencing in diagnostic genomics. SvAnna is available at https://github.com/TheJacksonLaboratory/SvAnna.


Subjects
Genomics , Base Sequence , Chromosome Mapping , Humans , DNA Sequence Analysis , Virulence
13.
Sci Adv ; 8(3): eabg6711, 2022 01 21.
Article in English | MEDLINE | ID: mdl-35044822

ABSTRACT

Tumors display widespread transcriptome alterations, but the full repertoire of isoform-level alternative splicing in cancer is unknown. We developed a long-read (LR) RNA sequencing and analytical platform that identifies and annotates full-length isoforms and infers tumor-specific splicing events. Application of this platform to breast cancer samples identifies thousands of previously unannotated isoforms; ~30% affect protein coding exons and are predicted to alter protein localization and function. We performed extensive cross-validation with -omics datasets to support transcription and translation of novel isoforms. We identified 3059 breast tumor-specific splicing events, including 35 that are significantly associated with patient survival. Of these, 21 are absent from GENCODE and 10 are enriched in specific breast cancer subtypes. Together, our results demonstrate the complexity, cancer subtype specificity, and clinical relevance of previously unidentified isoforms and splicing events in breast cancer that are only annotatable by LR-seq and provide a rich resource of immuno-oncology therapeutic targets.


Subjects
Breast Neoplasms , Alternative Splicing , Breast Neoplasms/genetics , Female , High-Throughput Nucleotide Sequencing/methods , Humans , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA Sequence Analysis/methods , Transcriptome
14.
Trends Genet ; 37(8): 717-729, 2021 08.
Article in English | MEDLINE | ID: mdl-33199048

ABSTRACT

Mutation of the human genome results in three classes of genomic variation: single nucleotide variants; short insertions or deletions; and large structural variants (SVs). Some mutations occur during normal processes, such as meiotic recombination or B cell development, and others result from DNA replication or aberrant repair of breaks in sequence-specific contexts. Regardless of mechanism, mutations are subject to selection, and some hotspots can manifest in disease. Here, we discuss genomic regions prone to mutation, mechanisms contributing to mutation susceptibility, and the processes leading to their accumulation in normal and somatic genomes. With further, more accurate human genome sequencing, additional mutation hotspots, mechanistic details of their formation, and the relevance of hotspots to evolution and disease are likely to be discovered.


Subjects
Human Genome/genetics , Genomics , Mutation/genetics , DNA Replication/genetics , Genomic Structural Variation/genetics , Humans , Single Nucleotide Polymorphism/genetics , Genetic Recombination/genetics
15.
Nat Genet ; 52(9): 891-897, 2020 09.
Article in English | MEDLINE | ID: mdl-32807987

ABSTRACT

Extrachromosomal DNA (ecDNA) amplification promotes intratumoral genetic heterogeneity and accelerated tumor evolution [1-3]; however, its frequency and clinical impact are unclear. Using computational analysis of whole-genome sequencing data from 3,212 cancer patients, we show that ecDNA amplification frequently occurs in most cancer types but not in blood or normal tissue. Oncogenes were highly enriched on amplified ecDNA, and the most common recurrent oncogene amplifications arose on ecDNA. EcDNA amplifications resulted in higher levels of oncogene transcription compared to copy number-matched linear DNA, coupled with enhanced chromatin accessibility, and more frequently resulted in transcript fusions. Patients whose cancers carried ecDNA had significantly shorter survival, even when controlled for tissue type, than patients whose cancers were not driven by ecDNA-based oncogene amplification. The results presented here demonstrate that ecDNA-based oncogene amplification is common in cancer, is different from chromosomal amplification and drives poor outcome for patients across many cancer types.


Subjects
Chromosomes/genetics , DNA/genetics , Gene Amplification/genetics , Neoplasms/genetics , Oncogenes/genetics , Tumor Cell Line , Chromatin/genetics , Humans
16.
Viruses ; 12(7)2020 07 21.
Article in English | MEDLINE | ID: mdl-32708087

ABSTRACT

Insertions of endogenous retroviruses cause a significant fraction of mutations in inbred mice but not all strains are equally susceptible. Notably, most new Intracisternal A particle (IAP) ERV mutagenic insertions have occurred in C3H mice. We show here that strain-specific insertional polymorphic IAPs accumulate faster in C3H/HeJ mice, relative to other sequenced strains, and that IAP transcript levels are higher in C3H/HeJ embryonic stem (ES) cells compared to other ES cells. To investigate the mechanism for high IAP activity in C3H mice, we identified 61 IAP copies in C3H/HeJ ES cells enriched with H3K4me3 (a mark of active promoters) and, among those tested, all are unmethylated in C3H/HeJ ES cells. Notably, 13 of the 61 are specific to C3H/HeJ and are members of the non-autonomous 1Δ1 IAP subfamily that is responsible for nearly all new insertions in C3H. One copy is full length with intact open reading frames and hence potentially capable of providing proteins in trans to other 1Δ1 elements. This potential "master copy" is present in other strains, including 129, but its 5' long terminal repeat (LTR) is methylated in 129 ES cells. Thus, the unusual IAP activity in C3H may be due to reduced epigenetic repression coupled with the presence of a master copy.


Subjects
Epigenomics , Intracisternal A-Particle Genes/genetics , Intracisternal A-Particle Genes/physiology , Inbred C3H Mice/genetics , Animals , Cultured Cells , Embryonic Stem Cells , Methylation , Mice , Inbred C57BL Mice/genetics , Genetic Promoter Regions , Species Specificity , Terminal Repeat Sequences
17.
Chromosome Res ; 28(1): 31-47, 2020 03.
Article in English | MEDLINE | ID: mdl-31907725

ABSTRACT

Structural variant (SV) differences between human genomes can cause germline and mosaic disease as well as inter-individual variation. De-regulation of accurate DNA repair and genomic surveillance mechanisms results in a large number of SVs in cancer. Analysis of the DNA sequences at SV breakpoints can help identify pathways of mutagenesis and regions of the genome that are more susceptible to rearrangement. Large-scale SV analyses have been enabled by high-throughput genome-level sequencing on humans in the past decade. These studies have shed light on the mechanisms and prevalence of complex genomic rearrangements. Recent advancements in both sequencing and other mapping technologies as well as calling algorithms for detection of genomic rearrangements have helped propel SV detection into population-scale studies, and have begun to elucidate previously inaccessible regions of the genome. Here, we discuss the genomic organization of simple and complex SVs, the molecular mechanisms of their formation, and various ways to detect them. We also introduce methods for characterizing SVs and their consequences on human genomes.


Subjects
Human Genome , Genomic Structural Variation , Genomics/methods , Chromosome Aberrations , Chromosome Banding , Chromosome Mapping , Comparative Genomic Hybridization , Computational Biology/methods , Genomics/standards , High-Throughput Nucleotide Sequencing , Humans , Reproducibility of Results
18.
Genome Med ; 11(1): 80, 2019 12 09.
Article in English | MEDLINE | ID: mdl-31818324

ABSTRACT

BACKGROUND: We investigated the features of the genomic rearrangements in a cohort of 50 male individuals with proteolipid protein 1 (PLP1) copy number gain events who were ascertained with Pelizaeus-Merzbacher disease (PMD; MIM: 312080). We then compared our new data to previous structural variant mutagenesis studies involving the Xq22 region of the human genome. The aggregate data from 159 sequenced join-points (discontinuous sequences in the reference genome that are joined during the rearrangement process) were studied. Analysis of these data from 150 individuals enabled the spectrum and relative distribution of the underlying genomic mutational signatures to be delineated. METHODS: Genomic rearrangements in PMD individuals with PLP1 copy number gain events were investigated by high-density customized array or clinical chromosomal microarray analysis and breakpoint junction sequence analysis. RESULTS: High-density customized array showed that the majority of cases (33/50; ~ 66%) present with single duplications, although complex genomic rearrangements (CGRs) are also frequent (17/50; ~ 34%). Breakpoint mapping to nucleotide resolution revealed further previously unknown structural and sequence complexities, even in single duplications. Meta-analysis of all studied rearrangements that occur at the PLP1 locus showed that single duplications were found in ~ 54% of individuals and that, among all CGR cases, triplication flanked by duplications is the most frequent CGR array CGH pattern observed. Importantly, in ~ 32% of join-points, there is evidence for a mutational signature of microhomeology (highly similar yet imperfect sequence matches). CONCLUSIONS: These data reveal a high frequency of CGRs at the PLP1 locus and support the assertion that replication-based mechanisms are prominent contributors to the formation of CGRs at Xq22. 
We propose that microhomeology can facilitate template switching, by stabilizing strand annealing of the primer using W-C base complementarity, and is a mutational signature for replicative repair.


Subjects
DNA Copy Number Variations , Gene Rearrangement , Mutation , Myelin Proteolipid Protein/genetics , Chromosome Breakpoints , Comparative Genomic Hybridization , Gene Duplication , Genetic Association Studies , Genetic Predisposition to Disease , Human Genome , Genomic Instability , Genomics/methods , Humans , Single Nucleotide Polymorphism
19.
Cell ; 176(6): 1310-1324.e10, 2019 03 07.
Article in English | MEDLINE | ID: mdl-30827684

ABSTRACT

DNA rearrangements resulting in human genome structural variants (SVs) are caused by diverse mutational mechanisms. We used long- and short-read sequencing technologies to investigate end products of de novo chromosome 17p11.2 rearrangements and query the molecular mechanisms underlying both recurrent and non-recurrent events. Evidence for an increased rate of clustered single-nucleotide variant (SNV) mutation in cis with non-recurrent rearrangements was found. Indel and SNV formation are associated with both copy-number gains and losses of 17p11.2, occur up to ∼1 Mb away from the breakpoint junctions, and favor C > G transversion substitutions; results suggest that single-stranded DNA is formed during the genesis of the SV and provide compelling support for a microhomology-mediated break-induced replication (MMBIR) mechanism for SV formation. Our data show an additional mutational burden of MMBIR consisting of hypermutation confined to the locus and manifesting as SNVs and indels predominantly within genes.


Subjects
Human Chromosome Pair 17 , Mutation , Multiple Abnormalities/genetics , Chromosome Breakpoints , Chromosome Disorders/genetics , Chromosome Duplication/genetics , DNA Copy Number Variations , DNA Repair/genetics , DNA Replication , Gene Rearrangement , Human Genome , Genomic Structural Variation , Humans , INDEL Mutation , Genetic Models , Single Nucleotide Polymorphism , Genetic Recombination , DNA Sequence Analysis/methods , Smith-Magenis Syndrome/genetics
20.
Genome Res ; 28(8): 1228-1242, 2018 08.
Article in English | MEDLINE | ID: mdl-29907612

ABSTRACT

Alu elements, the short interspersed element numbering more than 1 million copies per human genome, can mediate the formation of copy number variants (CNVs) between substrate pairs. These Alu/Alu-mediated rearrangements (AAMRs) can result in pathogenic variants that cause diseases. To investigate the impact of AAMR on gene variation and human health, we first characterized Alus that are involved in mediating CNVs (CNV-Alus) and observed that these Alus tend to be evolutionarily younger. We then computationally generated, with the assistance of a supercomputer, a test data set consisting of 78 million Alu pairs and predicted ∼18% of them are potentially susceptible to AAMR. We further determined the relative risk of AAMR in 12,074 OMIM genes using the count of predicted CNV-Alu pairs and experimentally validated the predictions with 89 samples selected by correlating predicted hotspots with a database of CNVs identified by clinical chromosomal microarrays (CMAs) on the genomes of approximately 54,000 subjects. We fine-mapped 47 duplications, 40 deletions, and two complex rearrangements and examined a total of 52 breakpoint junctions of simple CNVs. Overall, 94% of the candidate breakpoints were at least partially Alu mediated. We successfully predicted all (100%) of Alu pairs that mediated deletions (n = 21) and achieved an 87% positive predictive value overall when including AAMR-generated deletions and duplications. We provided a tool, AluAluCNVpredictor, for assessing AAMR hotspots and their role in human disease. These results demonstrate the utility of our predictive model and provide insights into the genomic features and molecular mechanisms underlying AAMR.


Subjects
Alu Elements/genetics , DNA Copy Number Variations/genetics , Genomic Instability/genetics , Gene Duplication/genetics , Human Genome/genetics , Humans , Sequence Deletion