1.
Nature ; 621(7978): 344-354, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.


Subject(s)
Chromosomes, Human, Y , Genomics , Sequence Analysis, DNA , Humans , Base Sequence , Chromosomes, Human, Y/genetics , DNA, Satellite/genetics , Genetic Variation/genetics , Genetics, Population , Genomics/methods , Genomics/standards , Heterochromatin/genetics , Multigene Family/genetics , Reference Standards , Segmental Duplications, Genomic/genetics , Sequence Analysis, DNA/standards , Tandem Repeat Sequences/genetics , Telomere/genetics
2.
Genome Res ; 32(2): 242-257, 2022 02.
Article in English | MEDLINE | ID: mdl-35042723

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) enables molecular characterization of complex biological tissues at high resolution. The requirement of single-cell extraction, however, makes it challenging for profiling tissues such as adipose tissue, for which collection of intact single adipocytes is complicated by their fragile nature. For such tissues, single-nucleus extraction is often much more efficient and therefore single-nucleus RNA sequencing (snRNA-seq) presents an alternative to scRNA-seq. However, nuclear transcripts represent only a fraction of the transcriptome in a single cell, with snRNA-seq marked with inherent transcript enrichment and detection biases. Therefore, snRNA-seq may be inadequate for mapping important transcriptional signatures in adipose tissue. In this study, we compare the transcriptomic landscape of single nuclei isolated from preadipocytes and mature adipocytes across human white and brown adipocyte lineages, with whole-cell transcriptome. We show that snRNA-seq is capable of identifying the broad cell types present in scRNA-seq at all states of adipogenesis. However, we also explore how and why the nuclear transcriptome is biased and limited, as well as how it can be advantageous. We robustly characterize the enrichment of nuclear-localized transcripts and adipogenic regulatory lncRNAs in snRNA-seq, while also providing a detailed understanding for the preferential detection of long genes upon using this technique. To remove such technical detection biases, we propose a normalization strategy for a more accurate comparison of nuclear and cellular data. Finally, we show successful integration of scRNA-seq and snRNA-seq data sets with existing bioinformatic tools. Overall, our results illustrate the applicability of snRNA-seq for the characterization of cellular diversity in the adipose tissue.


Subject(s)
Adipocytes/cytology , Cell Lineage , Gene Expression Profiling , RNA-Seq , Single-Cell Analysis , Bias , Gene Expression Profiling/methods , Humans , RNA-Seq/methods , Single-Cell Analysis/methods , Transcriptome
3.
Nat Methods ; 19(6): 711-723, 2022 06.
Article in English | MEDLINE | ID: mdl-35396487

ABSTRACT

Studies of genome regulation routinely use high-throughput DNA sequencing approaches to determine where specific proteins interact with DNA, and they rely on DNA amplification and short-read sequencing, limiting their quantitative application in complex genomic regions. To address these limitations, we developed directed methylation with long-read sequencing (DiMeLo-seq), which uses antibody-tethered enzymes to methylate DNA near a target protein's binding sites in situ. These exogenous methylation marks are then detected simultaneously with endogenous CpG methylation on unamplified DNA using long-read, single-molecule sequencing technologies. We optimized and benchmarked DiMeLo-seq by mapping chromatin-binding proteins and histone modifications across the human genome. Furthermore, we identified where centromere protein A localizes within highly repetitive regions that were unmappable with short sequencing reads, and we estimated the density of centromere protein A molecules along single chromatin fibers. DiMeLo-seq is a versatile method that provides multimodal, genome-wide information for investigating protein-DNA interactions.


Subject(s)
DNA Methylation , High-Throughput Nucleotide Sequencing , Centromere Protein A/genetics , Chromatin/genetics , DNA/chemistry , DNA/genetics , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods
4.
Semin Cell Dev Biol ; 128: 2-14, 2022 08.
Article in English | MEDLINE | ID: mdl-35487859

ABSTRACT

The classical human satellite DNAs, also referred to as human satellites 1, 2 and 3 (HSat1, HSat2, HSat3, or collectively HSat1-3), occur on most human chromosomes as large, pericentromeric tandem repeat arrays, which together constitute roughly 3% of the human genome (100 megabases, on average). Even though HSat1-3 were among the first human DNA sequences to be isolated and characterized at the dawn of molecular biology, they have remained almost entirely missing from the human genome reference assembly for 20 years, hindering studies of their sequence, regulation, and potential structural roles in the nucleus. Recently, the Telomere-to-Telomere Consortium produced the first truly complete assembly of a human genome, paving the way for new studies of HSat1-3 with modern genomic tools. This review provides an account of the history and current understanding of HSat1-3, with a view towards future studies of their evolution and roles in health and disease.


Subject(s)
DNA, Satellite , Genomics , DNA, Satellite/genetics , Genome, Human/genetics , Humans
5.
Nature ; 530(7589): 171-176, 2016 Feb 11.
Article in English | MEDLINE | ID: mdl-26840484

ABSTRACT

The DNA-binding protein PRDM9 directs positioning of the double-strand breaks (DSBs) that initiate meiotic recombination in mice and humans. Prdm9 is the only mammalian speciation gene yet identified and is responsible for sterility phenotypes in male hybrids of certain mouse subspecies. To investigate PRDM9 binding and its role in fertility and meiotic recombination, we humanized the DNA-binding domain of PRDM9 in C57BL/6 mice. This change repositions DSB hotspots and completely restores fertility in male hybrids. Here we show that alteration of one Prdm9 allele impacts the behaviour of DSBs controlled by the other allele at chromosome-wide scales. These effects correlate strongly with the degree to which each PRDM9 variant binds both homologues at the DSB sites it controls. Furthermore, higher genome-wide levels of such 'symmetric' PRDM9 binding associate with increasing fertility measures, and comparisons of individual hotspots suggest binding symmetry plays a downstream role in the recombination process. These findings reveal that subspecies-specific degradation of PRDM9 binding sites by meiotic drive, which steadily increases asymmetric PRDM9 binding, has impacts beyond simply changing hotspot positions, and strongly support a direct involvement in hybrid infertility. Because such meiotic drive occurs across mammals, PRDM9 may play a wider, yet transient, role in the early stages of speciation.


Subject(s)
Genetic Speciation , Histone-Lysine N-Methyltransferase/chemistry , Histone-Lysine N-Methyltransferase/metabolism , Hybridization, Genetic/genetics , Infertility/genetics , Protein Engineering , Zinc Fingers/genetics , Alleles , Animals , Binding Sites , Chromosome Pairing/genetics , Chromosomes, Mammalian/genetics , Chromosomes, Mammalian/metabolism , DNA Breaks, Double-Stranded , Female , Histone-Lysine N-Methyltransferase/genetics , Humans , Male , Meiosis/genetics , Mice , Mice, Inbred C57BL , Protein Binding , Protein Structure, Tertiary/genetics , Recombination, Genetic/genetics
6.
Genome Res ; 24(4): 697-707, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24501022

ABSTRACT

The human genome sequence remains incomplete, with multimegabase-sized gaps representing the endogenous centromeres and other heterochromatic regions. Available sequence-based studies within these sites in the genome have demonstrated a role in centromere function and chromosome pairing, necessary to ensure proper chromosome segregation during cell division. A common genomic feature of these regions is the enrichment of long arrays of near-identical tandem repeats, known as satellite DNAs, which offer a limited number of variant sites to differentiate individual repeat copies across millions of bases. This substantial sequence homogeneity challenges available assembly strategies and, as a result, centromeric regions are omitted from ongoing genomic studies. To address this problem, we utilize monomer sequence and ordering information obtained from whole-genome shotgun reads to model two haploid human satellite arrays on chromosomes X and Y, resulting in an initial characterization of 3.83 Mb of centromeric DNA within an individual genome. To further expand the utility of each centromeric reference sequence model, we evaluate sites within the arrays for short-read mappability and chromosome specificity. Because satellite DNAs evolve in a concerted manner, we use these centromeric assemblies to assess the extent of sequence variation among 366 individuals from distinct human populations. We thus identify two satellite array variants in both X and Y centromeres, as determined by array length and sequence composition. This study provides an initial sequence characterization of a regional centromere and establishes a foundation to extend genomic characterization to these sites as well as to other repeat-rich regions within complex genomes.


Subject(s)
Centromere/genetics , DNA, Satellite/genetics , Sequence Analysis, DNA , Tandem Repeat Sequences/genetics , Chromosomes, Human, X/genetics , Chromosomes, Human, Y/genetics , Genome, Human , Humans , Molecular Sequence Data
7.
PLoS Genet ; 10(7): e1004503, 2014 Jul.
Article in English | MEDLINE | ID: mdl-25033397

ABSTRACT

The pseudoautosomal region (PAR) is a short region of homology between the mammalian X and Y chromosomes, which has undergone rapid evolution. A crossover in the PAR is essential for the proper disjunction of X and Y chromosomes in male meiosis, and PAR deletion results in male sterility. This leads the human PAR with the obligatory crossover, PAR1, to having an exceptionally high male crossover rate, which is 17-fold higher than the genome-wide average. However, the mechanism by which this obligatory crossover occurs remains unknown, as does the fine-scale positioning of crossovers across this region. Recent research in mice has suggested that crossovers in PAR may be mediated independently of the protein PRDM9, which localises virtually all crossovers in the autosomes. To investigate recombination in this region, we construct the most fine-scale genetic map containing directly observed crossovers to date using African-American pedigrees. We leverage recombination rates inferred from the breakdown of linkage disequilibrium in human populations and investigate the signatures of DNA evolution due to recombination. Further, we identify direct PRDM9 binding sites using ChIP-seq in human cells. Using these independent lines of evidence, we show that, in contrast with mouse, PRDM9 does localise peaks of recombination in the human PAR1. We find that recombination is a far more rapid and intense driver of sequence evolution in PAR1 than it is on the autosomes. We also show that PAR1 hotspot activities differ significantly among human populations. Finally, we find evidence that PAR1 hotspot positions have changed between human and chimpanzee, with no evidence of sharing among the hottest hotspots. We anticipate that the genetic maps built and validated in this work will aid research on this vital and fascinating region of the genome.


Subject(s)
Crossing Over, Genetic , Histone-Lysine N-Methyltransferase/genetics , Infertility, Male/genetics , Recombination, Genetic , Chromosomes, Human, X/genetics , Chromosomes, Human, Y/genetics , Female , Genetics, Population , HapMap Project , Humans , Linkage Disequilibrium , Male , Meiosis/genetics
8.
PLoS Comput Biol ; 10(5): e1003628, 2014 May.
Article in English | MEDLINE | ID: mdl-24831296

ABSTRACT

The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations.


Subject(s)
Chromosome Mapping/methods , Chromosomes, Human, Y/genetics , DNA, Satellite/genetics , Genome, Human/genetics , Heterochromatin/genetics , Sequence Analysis, DNA/methods , Base Sequence , Humans , Molecular Sequence Data
9.
Science ; 376(6588): eabj5089, 2022 04.
Article in English | MEDLINE | ID: mdl-35357915

ABSTRACT

The completion of a telomere-to-telomere human reference genome, T2T-CHM13, has resolved complex regions of the genome, including repetitive and homologous regions. Here, we present a high-resolution epigenetic study of previously unresolved sequences, representing entire acrocentric chromosome short arms, gene family expansions, and a diverse collection of repeat classes. This resource precisely maps CpG methylation (32.28 million CpGs), DNA accessibility, and short-read datasets (166,058 previously unresolved chromatin immunoprecipitation sequencing peaks) to provide evidence of activity across previously unidentified or corrected genes and reveals clinically relevant paralog-specific regulation. Probing CpG methylation across human centromeres from six diverse individuals generated an estimate of variability in kinetochore localization. This analysis provides a framework with which to investigate the most elusive regions of the human genome, granting insights into epigenetic regulation.


Subject(s)
CpG Islands , DNA Methylation , Epigenesis, Genetic , Genome, Human , Centromere/genetics , Centromere/metabolism , Disease/genetics , Genetic Loci , Genomics/standards , Humans , Reference Standards , Sequence Analysis, DNA
10.
Science ; 376(6588): eabk3112, 2022 04.
Article in English | MEDLINE | ID: mdl-35357925

ABSTRACT

Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located retroelement transduction events. We detected nascent transcription and delineated CpG methylation profiles to define the structure of transcriptionally active retroelements in humans, including those in centromeres. These data expand our insight into the diversity, distribution, and evolution of repetitive regions that have shaped the human genome.


Subject(s)
Epigenesis, Genetic , Genome, Human , Repetitive Sequences, Nucleic Acid , Telomere/genetics , Transcription, Genetic , Humans
11.
Science ; 376(6588): eabl4178, 2022 04.
Article in English | MEDLINE | ID: mdl-35357911

ABSTRACT

Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.


Subject(s)
Centromere/genetics , Chromosome Mapping , Epigenesis, Genetic , Genome, Human , Evolution, Molecular , Genomics , Humans , Repetitive Sequences, Nucleic Acid
12.
Science ; 376(6588): 44-53, 2022 04.
Article in English | MEDLINE | ID: mdl-35357919

ABSTRACT

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.


Subject(s)
Genome, Human , Human Genome Project , Sequence Analysis, DNA/standards , Cell Line , Chromosomes, Artificial, Bacterial/genetics , Chromosomes, Human/genetics , Humans , Reference Values
13.
Cell Syst ; 11(4): 354-366.e9, 2020 10 21.
Article in English | MEDLINE | ID: mdl-33099405

ABSTRACT

DNA adenine methyltransferase identification (DamID) measures a protein's DNA-binding history by methylating adenine bases near each protein-DNA interaction site and then selectively amplifying and sequencing these methylated regions. Additionally, these interactions can be visualized using m6A-Tracer, a fluorescent protein that binds to methyladenines. Here, we combine these imaging and sequencing technologies in an integrated microfluidic platform (µDamID) that enables single-cell isolation, imaging, and sorting, followed by DamID. We use µDamID and an improved m6A-Tracer protein to generate paired imaging and sequencing data from individual human cells. We validate interactions between Lamin-B1 protein and lamina-associated domains (LADs), observe variable 3D chromatin organization and broad gene regulation patterns, and jointly measure single-cell heterogeneity in Dam expression and background methylation. µDamID provides the unique ability to compare paired imaging and sequencing data for each cell and between cells, enabling the joint analysis of the nuclear localization, sequence identity, and variability of protein-DNA interactions. A record of this paper's transparent peer review process is included in the Supplemental Information.


Subject(s)
Microfluidics/methods , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Adenine/metabolism , Cell Nucleus/metabolism , Chromatin/metabolism , DNA/metabolism , DNA Methylation/genetics , DNA-Binding Proteins/genetics , Genomics/methods , HEK293 Cells , Humans , Lamin Type B/metabolism , Receptors, Purinergic/metabolism
14.
Sci Rep ; 10(1): 16902, 2020 10 09.
Article in English | MEDLINE | ID: mdl-33037294

ABSTRACT

Epidemiological studies have suggested differences in the rate of multiple sclerosis (MS) in individuals of European ancestry compared to African ancestry, motivating genetic scans to identify variants that could contribute to such patterns. In a whole-genome scan in 899 African-American cases and 1155 African-American controls, we confirm that African-Americans who inherit segments of the genome of European ancestry at a chromosome 1 locus are at increased risk for MS [logarithm of odds (LOD) = 9.8], although the signal weakens when adding an additional 406 cases, reflecting heterogeneity in the two sets of cases [logarithm of odds (LOD) = 2.7]. The association in the 899 individuals can be fully explained by two variants previously associated with MS in European ancestry individuals. These variants tag a MS susceptibility haplotype associated with decreased CD58 gene expression (odds ratio of 1.37; frequency of 84% in Europeans and 22% in West Africans for the tagging variant) as well as another haplotype near the FCRL3 gene (odds ratio of 1.07; frequency of 49% in Europeans and 8% in West Africans). Controlling for all other genetic and environmental factors, the two variants predict a 1.44-fold higher rate of MS in European-Americans compared to African-Americans.


Subject(s)
Black or African American/genetics , Genetic Predisposition to Disease/genetics , Multiple Sclerosis/genetics , Polymorphism, Single Nucleotide/genetics , White People/genetics , Female , Genome-Wide Association Study/methods , Haplotypes/genetics , Humans , Male , Odds Ratio
15.
Nat Commun ; 10(1): 3900, 2019 08 29.
Article in English | MEDLINE | ID: mdl-31467277

ABSTRACT

During meiotic recombination, homologue-templated repair of programmed DNA double-strand breaks (DSBs) produces relatively few crossovers and many difficult-to-detect non-crossovers. By intercrossing two diverged mouse subspecies over five generations and deep-sequencing 119 offspring, we detect thousands of crossover and non-crossover events genome-wide with unprecedented power and spatial resolution. We find that both crossovers and non-crossovers are strongly depleted at DSB hotspots where the DSB-positioning protein PRDM9 fails to bind to the unbroken homologous chromosome, revealing that PRDM9 also functions to promote homologue-templated repair. Our results show that complex non-crossovers are much rarer in mice than humans, consistent with complex events arising from accumulated non-programmed DNA damage. Unexpectedly, we also find that GC-biased gene conversion is restricted to non-crossover tracts containing only one mismatch. These results demonstrate that local genetic diversity profoundly alters meiotic repair pathway decisions via at least two distinct mechanisms, impacting genome evolution and Prdm9-related hybrid infertility.


Subject(s)
DNA Breaks, Double-Stranded , Genetic Variation , Homologous Recombination , Alleles , Animals , Cell Cycle Proteins/genetics , Chromosomes , Crossing Over, Genetic , DNA Damage , DNA Mismatch Repair , Female , Gene Conversion , Histone-Lysine N-Methyltransferase/genetics , Histones/genetics , Humans , Hybridization, Genetic , Male , Mice , Mice, Inbred C57BL , Models, Genetic , Phosphate-Binding Proteins/genetics , Polymorphism, Single Nucleotide , Recombinational DNA Repair
16.
Elife ; 6, 2017 10 26.
Article in English | MEDLINE | ID: mdl-29072575

ABSTRACT

PRDM9 binding localizes almost all meiotic recombination sites in humans and mice. However, most PRDM9-bound loci do not become recombination hotspots. To explore factors that affect binding and subsequent recombination outcomes, we mapped human PRDM9 binding sites in a transfected human cell line and measured PRDM9-induced histone modifications. These data reveal varied DNA-binding modalities of PRDM9. We also find that human PRDM9 frequently binds promoters, despite their low recombination rates, and it can activate expression of a small number of genes including CTCFL and VCX. Furthermore, we identify specific sequence motifs that predict consistent, localized meiotic recombination suppression around a subset of PRDM9 binding sites. These motifs strongly associate with KRAB-ZNF protein binding, TRIM28 recruitment, and specific histone modifications. Finally, we demonstrate that, in addition to binding DNA, PRDM9's zinc fingers also mediate its multimerization, and we show that a pair of highly diverged alleles preferentially form homo-multimers.


Subject(s)
DNA/metabolism , Histone-Lysine N-Methyltransferase/metabolism , Homologous Recombination , Meiosis , Binding Sites , Chromosome Mapping , HEK293 Cells , Humans , Protein Binding , Protein Multimerization
17.
Elife ; 4, 2015 Mar 25.
Article in English | MEDLINE | ID: mdl-25806687

ABSTRACT

Although the past decade has seen tremendous progress in our understanding of fine-scale recombination, little is known about non-crossover (NCO) gene conversion. We report the first genome-wide study of NCO events in humans. Using SNP array data from 98 meioses, we identified 103 sites affected by NCO, of which 50/52 were confirmed in sequence data. Overlap with double strand break (DSB) hotspots indicates that most of the events are likely of meiotic origin. We estimate that a site is involved in a NCO at a rate of 5.9 × 10⁻⁶/bp/generation, consistent with sperm-typing studies, and infer that tract lengths span at least an order of magnitude. Observed NCO events show strong allelic bias at heterozygous AT/GC SNPs, with 68% (58-78%) transmitting GC alleles (p = 5 × 10⁻⁴). Strikingly, in 4 of 15 regions with resequencing data, multiple disjoint NCO tracts cluster in close proximity (∼20-30 kb), a phenomenon not previously seen in mammals.


Subject(s)
Base Composition/genetics , Crossing Over, Genetic , Gene Conversion , Alleles , Base Sequence , Cluster Analysis , Female , Humans , Male , Pedigree , Polymorphism, Single Nucleotide/genetics
18.
Nat Genet ; 45(4): 406-14, 414e1-2, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23435088

ABSTRACT

Tens of millions of base pairs of euchromatic human genome sequence, including many protein-coding genes, have no known location in the human genome. We describe an approach for localizing the human genome's missing pieces using the patterns of genome sequence variation created by population admixture. We mapped the locations of 70 scaffolds spanning 4 million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified 8 new large interchromosomal segmental duplications. We find that most of these sequences are hidden in the genome's heterochromatin, particularly its pericentromeric regions. Many cryptic, pericentromeric genes are expressed at the RNA level and have been maintained intact for millions of years while their expression patterns diverged from those of paralogous genes elsewhere in the genome. We describe how knowledge of the locations of these sequences can inform disease association and genome biology studies.


Subject(s)
Chromosome Mapping , Euchromatin/genetics , Evolution, Molecular , Genetic Variation/genetics , Genetics, Population , Genome, Human , Heterochromatin/genetics , Computational Biology , Gene Duplication , Humans , In Situ Hybridization, Fluorescence