Results 1 - 20 of 45
1.
Genome Res ; 34(1): 7-19, 2024 02 07.
Article in English | MEDLINE | ID: mdl-38176712

ABSTRACT

High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥50 bp) has improved to near base pair precision. Despite these advances, many SV breakpoint locations are subject to systematic bias affecting variant representation. To understand why SV breakpoints are inconsistent across samples, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identify 882 SV insertions and 180 SV deletions with variable breakpoints not anchored in tandem repeats (TRs) or segmental duplications (SDs). SVs called from aligned sequencing reads increase breakpoint disagreements by 2×-16×. Sequence accuracy had a minimal impact on breakpoints, but we observe a strong effect of ancestry. We confirm that SNP and indel polymorphisms are enriched at shifted breakpoints and are also absent from variant callsets. Breakpoint homology increases the likelihood of imprecise SV calls and the distance they are shifted, and tandem duplications are the most heavily affected SVs. Because graph genome methods normalize SV calls across samples, we investigated graphs generated by two different methods and find the resulting breakpoints are subject to other technical biases affecting breakpoint accuracy. The breakpoint inconsistencies we characterize affect ∼5% of the SVs called in a human genome and can impact variant interpretation and annotation. These limitations underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoints, and increase the value of callsets for investigating breakpoint features.
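
The effect of breakpoint homology described in this abstract can be illustrated with a toy example. The sketch below is not the method used in the study. It assumes a deletion given in 0-based, half-open coordinates on a plain reference string, and the names `apply_deletion` and `breakpoint_shift_range` are hypothetical. It counts how far a deletion can slide in either direction while still producing the same alternate sequence, which is why homology at the breakpoints leaves several equally valid placements.

```python
from typing import Tuple


def apply_deletion(ref: str, start: int, end: int) -> str:
    """Return the alternate sequence after deleting ref[start:end]."""
    return ref[:start] + ref[end:]


def breakpoint_shift_range(ref: str, start: int, end: int) -> Tuple[int, int]:
    """Count how far a deletion can slide left and right without changing
    the alternate sequence (coordinates are 0-based, half-open)."""
    left = 0
    # Sliding left by one base is neutral when the base entering the deleted
    # span on the left equals the base leaving it on the right.
    while start - left > 0 and ref[start - left - 1] == ref[end - left - 1]:
        left += 1
    right = 0
    # Sliding right is the mirror case.
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    return left, right


# Hypothetical toy reference: an "AC" deletion inside a short (AC)n tract.
ref = "GGACACACTT"
left, right = breakpoint_shift_range(ref, 2, 4)
```

In this toy case the deletion cannot move left but can move up to four bases right, so five placements describe the same allele. A deletion in unique sequence with no flanking homology would return `(0, 0)`.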


Subject(s)
Algorithms, Human Genome, Humans, Sequence Analysis, Genomic Structural Variation, Bias, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing
2.
bioRxiv ; 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38077078

ABSTRACT

Starch digestion is a cornerstone of human nutrition. The amylase enzyme, which digests starch, plays a key role in starch metabolism. Indeed, the copy number of the human amylase gene has been associated with metabolic diseases and adaptation to agricultural diets. Previous studies suggested that duplications of the salivary amylase gene are of recent origin. In the course of characterizing 51 distinct amylase haplotypes across 98 individuals employing long-read DNA sequencing and optical mapping methods, we detected four 31mers linked to duplication of the amylase locus. Analyses with these 31mers suggest that the first duplication of the amylase locus occurred more than 700,000 years ago before the split between modern humans and Neanderthals. After the original duplication events, amplification of the AMY1 genes likely occurred via nonallelic homologous recombination in a manner that consistently results in an odd number of copies per chromosome. These findings suggest that amylase haplotypes may have been primed for bursts of natural-selection associated duplications that coincided with the incorporation of starch into human diets.

3.
Nature ; 621(7978): 355-364, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612510

ABSTRACT

The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date1 and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes2. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established1 boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.


Subject(s)
Human Y Chromosomes, Molecular Evolution, Humans, Male, Human Y Chromosomes/genetics, Human Genome/genetics, Genomics, Mutation Rate, Phenotype, Euchromatin/genetics, Pseudogenes, Genetic Variation/genetics, Human X Chromosomes/genetics, Pseudoautosomal Regions/genetics
4.
bioRxiv ; 2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37425850

ABSTRACT

High-quality genome assemblies and sophisticated algorithms have increased sensitivity for a wide range of variant types, and breakpoint accuracy for structural variants (SVs, ≥ 50 bp) has improved to near basepair precision. Despite these advances, many SVs in unique regions of the genome are subject to systematic bias that affects breakpoint location. This ambiguity leads to less accurate variant comparisons across samples, and it obscures true breakpoint features needed for mechanistic inferences. To understand why SVs are not consistently placed, we reanalyzed 64 phased haplotypes constructed from long-read assemblies released by the Human Genome Structural Variation Consortium (HGSVC). We identified variable breakpoints for 882 SV insertions and 180 SV deletions not anchored in tandem repeats (TRs) or segmental duplications (SDs). While this is unexpectedly high for genome assemblies in unique loci, we find read-based callsets from the same sequencing data yielded 1,566 insertions and 986 deletions with inconsistent breakpoints also not anchored in TRs or SDs. When we investigated causes for breakpoint inaccuracy, we found sequence and assembly errors had minimal impact, but we observed a strong effect of ancestry. We confirmed that polymorphic mismatches and small indels are enriched at shifted breakpoints and that these polymorphisms are generally lost when breakpoints shift. Long tracts of homology, such as SVs mediated by transposable elements, increase the likelihood of imprecise SV calls and the distance they are shifted. Tandem Duplication (TD) breakpoints are the most heavily affected SV class with 14% of TDs placed at different locations across haplotypes. While graph genome methods normalize SV calls across many samples, the resulting breakpoints are sometimes incorrect, highlighting a need to tune graph methods for breakpoint accuracy. 
The breakpoint inconsistencies we characterize collectively affect ~5% of the SVs called in a human genome and underscore a need for algorithm development to improve SV databases, mitigate the impact of ancestry on breakpoint placement, and increase the value of callsets for investigating mutational processes.

5.
Cell Rep ; 42(6): 112625, 2023 06 27.
Article in English | MEDLINE | ID: mdl-37294634

ABSTRACT

Endogenous retroviruses (ERVs) have rewired host gene networks. To explore the origins of co-option, we employed an active murine ERV, IAPEz, and an embryonic stem cell (ESC) to neural progenitor cell (NPC) differentiation model. Transcriptional silencing via TRIM28 maps to a 190 bp sequence encoding the intracisternal A-type particle (IAP) signal peptide, which confers retrotransposition activity. A subset of "escapee" IAPs (∼15%) exhibits significant genetic divergence from this sequence. Canonical repressed IAPs succumb to a previously undocumented demarcation by H3K9me3 and H3K27me3 in NPCs. Escapee IAPs, in contrast, evade repression in both cell types, resulting in their transcriptional derepression, particularly in NPCs. We validate the enhancer function of a 47 bp sequence within the U3 region of the long terminal repeat (LTR) and show that escapee IAPs convey an activating effect on nearby neural genes. In sum, co-opted ERVs stem from genetic escapees that have lost vital sequences required for both TRIM28 restriction and autonomous retrotransposition.


Subject(s)
Endogenous Retroviruses, Tripartite Motif-Containing Protein 28, Animals, Mice, Cell Differentiation, Embryonic Stem Cells/metabolism, Endogenous Retroviruses/genetics, Endogenous Retroviruses/metabolism, Histones/metabolism, Tripartite Motif-Containing Protein 28/metabolism, Terminal Repeat Sequences/genetics
6.
bioRxiv ; 2023 May 04.
Article in English | MEDLINE | ID: mdl-37205567

ABSTRACT

Advances in long-read sequencing (LRS) technology continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant calling precision and recall of Oxford Nanopore Technologies (ONT) and PacBio HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant calling precision and recall of SVs and indels in HiFi datasets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant callsets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
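
For readers unfamiliar with the metric, the F1 score cited in this abstract is the harmonic mean of precision and recall. The sketch below shows the standard definitions only. The function name `precision_recall_f1` and the counts in the example are hypothetical and are not taken from the study.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from true-positive, false-positive,
    and false-negative variant counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical callset: 90 true calls, 10 false calls, 90 missed variants.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=90)
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values: here precision is 0.9 and recall is 0.5, and F1 comes out below their arithmetic mean of 0.7.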

7.
Cell Genom ; 3(5): 100291, 2023 May 10.
Article in English | MEDLINE | ID: mdl-37228752

ABSTRACT

Diverse inbred mouse strains are important biomedical research models, yet genome characterization of many strains is fundamentally lacking in comparison with humans. In particular, catalogs of structural variants (SVs) (variants ≥ 50 bp) are incomplete, limiting the discovery of causative alleles for phenotypic variation. Here, we resolve genome-wide SVs in 20 genetically distinct inbred mice with long-read sequencing. We report 413,758 site-specific SVs affecting 13% (356 Mbp) of the mouse reference assembly, including 510 previously unannotated coding variants. We substantially improve the Mus musculus transposable element (TE) callset, and we find that TEs comprise 39% of SVs and account for 75% of altered bases. We further utilize this callset to investigate how TE heterogeneity affects mouse embryonic stem cells and find multiple TE classes that influence chromatin accessibility. Our work provides a comprehensive analysis of SVs found in diverse mouse genomes and illustrates the role of TEs in epigenetic differences.

8.
Genome Res ; 33(12): 2029-2040, 2023 12 27.
Article in English | MEDLINE | ID: mdl-38190646

ABSTRACT

Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.


Subject(s)
Genomics, Nanopores, INDEL Mutation, Whole Genome Sequencing
9.
Nat Commun ; 13(1): 7115, 2022 11 19.
Article in English | MEDLINE | ID: mdl-36402840

ABSTRACT

Transposable elements constitute about half of human genomes, and their role in generating human variation through retrotransposition is broadly studied and appreciated. Structural variants mediated by transposons, which we call transposable element-mediated rearrangements (TEMRs), are less well studied, and the mechanisms leading to their formation as well as their broader impact on human diversity are poorly understood. Here, we identify 493 unique TEMRs across the genomes of three individuals. While homology directed repair is the dominant driver of TEMRs, our sequence-resolved TEMR resource allows us to identify complex inversion breakpoints, triplications or other high copy number polymorphisms, and additional complexities. TEMRs are enriched in genic loci and can create potentially important risk alleles such as a deletion in TRIM65, a known cancer biomarker and therapeutic target. These findings expand our understanding of this important class of structural variation, the mechanisms responsible for their formation, and establish them as an important driver of human diversity.


Subject(s)
DNA Transposable Elements, Human Genome, Humans, DNA Transposable Elements/genetics, Human Genome/genetics, Gene Rearrangement/genetics, DNA Copy Number Variations, Tripartite Motif Proteins/genetics, Ubiquitin-Protein Ligases/genetics
10.
Nat Methods ; 19(10): 1230-1233, 2022 10.
Article in English | MEDLINE | ID: mdl-36109679

ABSTRACT

Complex structural variants (CSVs) encompass multiple breakpoints and are often missed or misinterpreted. We developed SVision, a deep-learning-based multi-object-recognition framework, to automatically detect and characterize CSVs from long-read sequencing data. SVision outperforms current callers at identifying the internal structure of complex events and has revealed 80 high-quality CSVs with 25 distinct structures from an individual genome. SVision directly detects CSVs without matching known structures, allowing sensitive detection of both common and previously uncharacterized complex rearrangements.


Subject(s)
Deep Learning, Genome, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis
11.
Cell ; 185(11): 1986-2005.e26, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35525246

ABSTRACT

Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.


Subject(s)
Chromosome Inversion, Genomic Segmental Duplications, Chromosome Inversion/genetics, DNA Copy Number Variations/genetics, Human Genome, Genomics, Humans
12.
Genome Med ; 14(1): 44, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35484572

ABSTRACT

Structural variants (SVs) are implicated in the etiology of Mendelian diseases but have been systematically underascertained owing to sequencing technology limitations. Long-read sequencing enables comprehensive detection of SVs, but approaches for prioritization of candidate SVs are needed. Structural variant Annotation and analysis (SvAnna) assesses all classes of SVs and their intersection with transcripts and regulatory sequences, relating predicted effects on gene function with clinical phenotype data. SvAnna places 87% of deleterious SVs in the top ten ranks. The interpretable prioritizations offered by SvAnna will facilitate the widespread adoption of long-read sequencing in diagnostic genomics. SvAnna is available at https://github.com/TheJacksonLaboratory/SvAnna.


Subject(s)
Genomics, Base Sequence, Chromosome Mapping, Humans, DNA Sequence Analysis, Virulence
13.
Sci Adv ; 8(3): eabg6711, 2022 01 21.
Article in English | MEDLINE | ID: mdl-35044822

ABSTRACT

Tumors display widespread transcriptome alterations, but the full repertoire of isoform-level alternative splicing in cancer is unknown. We developed a long-read (LR) RNA sequencing and analytical platform that identifies and annotates full-length isoforms and infers tumor-specific splicing events. Application of this platform to breast cancer samples identifies thousands of previously unannotated isoforms; ~30% affect protein coding exons and are predicted to alter protein localization and function. We performed extensive cross-validation with -omics datasets to support transcription and translation of novel isoforms. We identified 3059 breast tumor-specific splicing events, including 35 that are significantly associated with patient survival. Of these, 21 are absent from GENCODE and 10 are enriched in specific breast cancer subtypes. Together, our results demonstrate the complexity, cancer subtype specificity, and clinical relevance of previously unidentified isoforms and splicing events in breast cancer that are only annotatable by LR-seq and provide a rich resource of immuno-oncology therapeutic targets.


Subject(s)
Breast Neoplasms, Alternative Splicing, Breast Neoplasms/genetics, Female, High-Throughput Nucleotide Sequencing/methods, Humans, Protein Isoforms/genetics, Protein Isoforms/metabolism, RNA Sequence Analysis/methods, Transcriptome
14.
Trends Genet ; 37(8): 717-729, 2021 08.
Article in English | MEDLINE | ID: mdl-33199048

ABSTRACT

Mutation of the human genome results in three classes of genomic variation: single nucleotide variants; short insertions or deletions; and large structural variants (SVs). Some mutations occur during normal processes, such as meiotic recombination or B cell development, and others result from DNA replication or aberrant repair of breaks in sequence-specific contexts. Regardless of mechanism, mutations are subject to selection, and some hotspots can manifest in disease. Here, we discuss genomic regions prone to mutation, mechanisms contributing to mutation susceptibility, and the processes leading to their accumulation in normal and somatic genomes. With further, more accurate human genome sequencing, additional mutation hotspots, mechanistic details of their formation, and the relevance of hotspots to evolution and disease are likely to be discovered.


Subject(s)
Human Genome/genetics, Genomics, Mutation/genetics, DNA Replication/genetics, Genomic Structural Variation/genetics, Humans, Single Nucleotide Polymorphism/genetics, Genetic Recombination/genetics
15.
Nat Genet ; 52(9): 891-897, 2020 09.
Article in English | MEDLINE | ID: mdl-32807987

ABSTRACT

Extrachromosomal DNA (ecDNA) amplification promotes intratumoral genetic heterogeneity and accelerated tumor evolution1-3; however, its frequency and clinical impact are unclear. Using computational analysis of whole-genome sequencing data from 3,212 cancer patients, we show that ecDNA amplification frequently occurs in most cancer types but not in blood or normal tissue. Oncogenes were highly enriched on amplified ecDNA, and the most common recurrent oncogene amplifications arose on ecDNA. EcDNA amplifications resulted in higher levels of oncogene transcription compared to copy number-matched linear DNA, coupled with enhanced chromatin accessibility, and more frequently resulted in transcript fusions. Patients whose cancers carried ecDNA had significantly shorter survival, even when controlled for tissue type, than patients whose cancers were not driven by ecDNA-based oncogene amplification. The results presented here demonstrate that ecDNA-based oncogene amplification is common in cancer, is different from chromosomal amplification and drives poor outcome for patients across many cancer types.


Subject(s)
Chromosomes/genetics, DNA/genetics, Gene Amplification/genetics, Neoplasms/genetics, Oncogenes/genetics, Tumor Cell Line, Chromatin/genetics, Humans
16.
Viruses ; 12(7), 2020 07 21.
Article in English | MEDLINE | ID: mdl-32708087

ABSTRACT

Insertions of endogenous retroviruses cause a significant fraction of mutations in inbred mice but not all strains are equally susceptible. Notably, most new Intracisternal A particle (IAP) ERV mutagenic insertions have occurred in C3H mice. We show here that strain-specific insertional polymorphic IAPs accumulate faster in C3H/HeJ mice, relative to other sequenced strains, and that IAP transcript levels are higher in C3H/HeJ embryonic stem (ES) cells compared to other ES cells. To investigate the mechanism for high IAP activity in C3H mice, we identified 61 IAP copies in C3H/HeJ ES cells enriched with H3K4me3 (a mark of active promoters) and, among those tested, all are unmethylated in C3H/HeJ ES cells. Notably, 13 of the 61 are specific to C3H/HeJ and are members of the non-autonomous 1Δ1 IAP subfamily that is responsible for nearly all new insertions in C3H. One copy is full length with intact open reading frames and hence potentially capable of providing proteins in trans to other 1Δ1 elements. This potential "master copy" is present in other strains, including 129, but its 5' long terminal repeat (LTR) is methylated in 129 ES cells. Thus, the unusual IAP activity in C3H may be due to reduced epigenetic repression coupled with the presence of a master copy.


Subject(s)
Epigenomics, Intracisternal A-Particle Genes/genetics, Intracisternal A-Particle Genes/physiology, Inbred C3H Mice/genetics, Animals, Cultured Cells, Embryonic Stem Cells, Methylation, Mice, Inbred C57BL Mice/genetics, Genetic Promoter Regions, Species Specificity, Terminal Repeat Sequences
17.
Chromosome Res ; 28(1): 31-47, 2020 03.
Article in English | MEDLINE | ID: mdl-31907725

ABSTRACT

Structural variant (SV) differences between human genomes can cause germline and mosaic disease as well as inter-individual variation. De-regulation of accurate DNA repair and genomic surveillance mechanisms results in a large number of SVs in cancer. Analysis of the DNA sequences at SV breakpoints can help identify pathways of mutagenesis and regions of the genome that are more susceptible to rearrangement. Large-scale SV analyses have been enabled by high-throughput genome-level sequencing on humans in the past decade. These studies have shed light on the mechanisms and prevalence of complex genomic rearrangements. Recent advancements in both sequencing and other mapping technologies as well as calling algorithms for detection of genomic rearrangements have helped propel SV detection into population-scale studies, and have begun to elucidate previously inaccessible regions of the genome. Here, we discuss the genomic organization of simple and complex SVs, the molecular mechanisms of their formation, and various ways to detect them. We also introduce methods for characterizing SVs and their consequences on human genomes.


Subject(s)
Human Genome, Genomic Structural Variation, Genomics/methods, Chromosome Aberrations, Chromosome Banding, Chromosome Mapping, Comparative Genomic Hybridization, Computational Biology/methods, Genomics/standards, High-Throughput Nucleotide Sequencing, Humans, Reproducibility of Results
18.
Genome Med ; 11(1): 80, 2019 12 09.
Article in English | MEDLINE | ID: mdl-31818324

ABSTRACT

BACKGROUND: We investigated the features of the genomic rearrangements in a cohort of 50 male individuals with proteolipid protein 1 (PLP1) copy number gain events who were ascertained with Pelizaeus-Merzbacher disease (PMD; MIM: 312080). We then compared our new data to previous structural variant mutagenesis studies involving the Xq22 region of the human genome. The aggregate data from 159 sequenced join-points (discontinuous sequences in the reference genome that are joined during the rearrangement process) were studied. Analysis of these data from 150 individuals enabled the spectrum and relative distribution of the underlying genomic mutational signatures to be delineated.
METHODS: Genomic rearrangements in PMD individuals with PLP1 copy number gain events were investigated by high-density customized array or clinical chromosomal microarray analysis and breakpoint junction sequence analysis.
RESULTS: High-density customized array showed that the majority of cases (33/50; ~66%) present with single duplications, although complex genomic rearrangements (CGRs) are also frequent (17/50; ~34%). Breakpoint mapping to nucleotide resolution revealed further previously unknown structural and sequence complexities, even in single duplications. Meta-analysis of all studied rearrangements that occur at the PLP1 locus showed that single duplications were found in ~54% of individuals and that, among all CGR cases, triplication flanked by duplications is the most frequent CGR array CGH pattern observed. Importantly, in ~32% of join-points, there is evidence for a mutational signature of microhomeology (highly similar yet imperfect sequence matches).
CONCLUSIONS: These data reveal a high frequency of CGRs at the PLP1 locus and support the assertion that replication-based mechanisms are prominent contributors to the formation of CGRs at Xq22. We propose that microhomeology can facilitate template switching, by stabilizing strand annealing of the primer using W-C base complementarity, and is a mutational signature for replicative repair.


Subject(s)
DNA Copy Number Variations, Gene Rearrangement, Mutation, Myelin Proteolipid Protein/genetics, Chromosome Breakpoints, Comparative Genomic Hybridization, Gene Duplication, Genetic Association Studies, Genetic Predisposition to Disease, Human Genome, Genomic Instability, Genomics/methods, Humans, Single Nucleotide Polymorphism
19.
Cell ; 176(6): 1310-1324.e10, 2019 03 07.
Article in English | MEDLINE | ID: mdl-30827684

ABSTRACT

DNA rearrangements resulting in human genome structural variants (SVs) are caused by diverse mutational mechanisms. We used long- and short-read sequencing technologies to investigate end products of de novo chromosome 17p11.2 rearrangements and query the molecular mechanisms underlying both recurrent and non-recurrent events. Evidence for an increased rate of clustered single-nucleotide variant (SNV) mutation in cis with non-recurrent rearrangements was found. Indel and SNV formation are associated with both copy-number gains and losses of 17p11.2, occur up to ∼1 Mb away from the breakpoint junctions, and favor C > G transversion substitutions; results suggest that single-stranded DNA is formed during the genesis of the SV and provide compelling support for a microhomology-mediated break-induced replication (MMBIR) mechanism for SV formation. Our data show an additional mutational burden of MMBIR consisting of hypermutation confined to the locus and manifesting as SNVs and indels predominantly within genes.


Subject(s)
Human Chromosome Pair 17, Mutation, Multiple Abnormalities/genetics, Chromosome Breakpoints, Chromosome Disorders/genetics, Chromosome Duplication/genetics, DNA Copy Number Variations, DNA Repair/genetics, DNA Replication, Gene Rearrangement, Human Genome, Genomic Structural Variation, Humans, INDEL Mutation, Genetic Models, Single Nucleotide Polymorphism, Genetic Recombination, DNA Sequence Analysis/methods, Smith-Magenis Syndrome/genetics
20.
Genome Res ; 28(8): 1228-1242, 2018 08.
Article in English | MEDLINE | ID: mdl-29907612

ABSTRACT

Alu elements, the short interspersed element numbering more than 1 million copies per human genome, can mediate the formation of copy number variants (CNVs) between substrate pairs. These Alu/Alu-mediated rearrangements (AAMRs) can result in pathogenic variants that cause diseases. To investigate the impact of AAMR on gene variation and human health, we first characterized Alus that are involved in mediating CNVs (CNV-Alus) and observed that these Alus tend to be evolutionarily younger. We then computationally generated, with the assistance of a supercomputer, a test data set consisting of 78 million Alu pairs and predicted ∼18% of them are potentially susceptible to AAMR. We further determined the relative risk of AAMR in 12,074 OMIM genes using the count of predicted CNV-Alu pairs and experimentally validated the predictions with 89 samples selected by correlating predicted hotspots with a database of CNVs identified by clinical chromosomal microarrays (CMAs) on the genomes of approximately 54,000 subjects. We fine-mapped 47 duplications, 40 deletions, and two complex rearrangements and examined a total of 52 breakpoint junctions of simple CNVs. Overall, 94% of the candidate breakpoints were at least partially Alu mediated. We successfully predicted all (100%) of Alu pairs that mediated deletions (n = 21) and achieved an 87% positive predictive value overall when including AAMR-generated deletions and duplications. We provided a tool, AluAluCNVpredictor, for assessing AAMR hotspots and their role in human disease. These results demonstrate the utility of our predictive model and provide insights into the genomic features and molecular mechanisms underlying AAMR.


Subject(s)
Alu Elements/genetics, DNA Copy Number Variations/genetics, Genomic Instability/genetics, Gene Duplication/genetics, Human Genome/genetics, Humans, Sequence Deletion