Abstract
Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture disease burdens and environmental factors impacting specific sub-populations. Here, we propose a framework for repurposing data from electronic health records (EHRs) in concert with genomic data to explore the demographic ties that can impact disease burdens. Using data from a diverse biobank in New York City, we identified 17 communities sharing recent genetic ancestry. We observed 1,177 health outcomes that were statistically associated with a specific group and demonstrated significant differences in the segregation of genetic variants contributing to Mendelian diseases. We also demonstrated that fine-scale population structure can impact the prediction of complex disease risk within groups. This work reinforces the utility of linking genomic data to EHRs and provides a framework toward fine-scale monitoring of population health.
Subject(s)
Ethnicity/genetics, Population Health, Genetic Databases, Electronic Health Records, Genomics, Humans, Self Report
Abstract
Mosaic loss of chromosome Y (LOY) in circulating white blood cells is the most common form of clonal mosaicism [1-5], yet our knowledge of the causes and consequences of this is limited. Here, using a computational approach, we estimate that 20% of the male population represented in the UK Biobank study (n = 205,011) has detectable LOY. We identify 156 autosomal genetic determinants of LOY, which we replicate in 757,114 men of European and Japanese ancestry. These loci highlight genes that are involved in cell-cycle regulation and cancer susceptibility, as well as somatic drivers of tumour growth and targets of cancer therapy. We demonstrate that genetic susceptibility to LOY is associated with non-haematological effects on health in both men and women, which supports the hypothesis that clonal haematopoiesis is a biomarker of genomic instability in other tissues. Single-cell RNA sequencing identifies dysregulated expression of autosomal genes in leukocytes with LOY and provides insights into why clonal expansion of these cells may occur. Collectively, these data highlight the value of studying clonal mosaicism to uncover fundamental mechanisms that underlie cancer and other ageing-related diseases.
Subject(s)
Chromosome Deletion, Human Y Chromosomes/genetics, Genetic Predisposition to Disease/genetics, Genomic Instability/genetics, Leukocytes/pathology, Mosaicism, Adult, Aged, Computational Biology, Genetic Databases, Female, Genetic Markers/genetics, Humans, Male, Middle Aged, Neoplasms/genetics, United Kingdom
Abstract
Pedigree inference from genotype data is a challenging problem, particularly when pedigrees are sparsely sampled and individuals may be distantly related to their closest genotyped relatives. We present a method that infers small pedigrees of close relatives and then assembles them into larger pedigrees. To assemble large pedigrees, we introduce several formulas and tools including a likelihood for the degree separating two small pedigrees, a generalization of the fast DRUID point estimate of the degree separating two pedigrees, a method for detecting individuals who share background identity-by-descent (IBD) that does not reflect recent common ancestry, and a method for identifying the ancestral branches through which distant relatives are connected. Our method also takes several approaches that help to improve the accuracy and efficiency of pedigree inference. In particular, we incorporate age information directly into the likelihood rather than using ages only for consistency checks and we employ a heuristic branch-and-bound-like approach to more efficiently explore the space of possible pedigrees. Together, these approaches make it possible to construct large pedigrees that are challenging or intractable for current inference methods.
Subject(s)
Genotype, Pedigree, Algorithms, Female, Humans, Likelihood Functions, Male, Genetic Models
Abstract
It is important to study the genetics of complex traits in diverse populations. Here, we introduce covariate-adjusted linkage disequilibrium (LD) score regression (cov-LDSC), a method to estimate SNP-heritability ($h_g^2$) and its enrichment in homogeneous and admixed populations with summary statistics and in-sample LD estimates. In-sample LD can be estimated from a subset of the genome-wide association study samples, allowing our method to be applied efficiently to very large cohorts. In simulations, we show that unadjusted LDSC underestimates $h_g^2$ by 10-60% in admixed populations; in contrast, cov-LDSC is robustly accurate. We apply cov-LDSC to genotyping data from 8124 individuals, mostly of admixed ancestry, from the Slim Initiative in Genomic Medicine for the Americas study, and to approximately 161 000 Latino-ancestry individuals, 47 000 African American-ancestry individuals and 135 000 European-ancestry individuals, as classified by 23andMe. We estimate $h_g^2$ and detect heritability enrichment in three quantitative and five dichotomous phenotypes, making this, to our knowledge, the most comprehensive heritability-based analysis of admixed individuals to date. Most traits have high concordance of $h_g^2$ and consistent tissue-specific heritability enrichment among different populations. However, for age at menarche, we observe population-specific heritability estimates of $h_g^2$. We observe consistent patterns of tissue-specific heritability enrichment across populations; for example, in the limbic system for BMI, the per-standardized-annotation effect size $\tau^*$ is 0.16 ± 0.04, 0.28 ± 0.11 and 0.18 ± 0.03 in the Latino-, African American- and European-ancestry populations, respectively.
Our approach is a powerful way to analyze genetic data for complex traits from admixed populations.
Subject(s)
Population Genetics, Genome-Wide Association Study/statistics & numerical data, Linkage Disequilibrium/genetics, Multifactorial Inheritance/genetics, Genotyping Techniques/statistics & numerical data, Humans, Phenotype, Single Nucleotide Polymorphism/genetics, Heritable Quantitative Trait
Abstract
Hypersensitivity reactions to drugs are often unpredictable and can be life threatening, underscoring a need for understanding their underlying mechanisms and risk factors. The extent to which germline genetic variation influences the risk of commonly reported drug allergies such as penicillin allergy remains largely unknown. We extracted data from the electronic health records of more than 600,000 participants from the UK, Estonian, and Vanderbilt University Medical Center's BioVU biobanks to study the role of genetic variation in the occurrence of self-reported penicillin hypersensitivity reactions. We used imputed SNP-to-HLA typing data from these cohorts to further fine map the human leukocyte antigen (HLA) association and replicated our results in 23andMe's research cohort involving a total of 1.12 million individuals. Genome-wide meta-analysis of penicillin allergy revealed two loci, including one located in the HLA region on chromosome 6. This signal was further fine-mapped to the HLA-B*55:01 allele (OR 1.41, 95% CI 1.33-1.49, p value 2.04 × 10⁻³¹) and confirmed by independent replication in 23andMe's research cohort (OR 1.30, 95% CI 1.25-1.34, p value 1.00 × 10⁻⁴⁷). The lead SNP was also associated with lower lymphocyte counts and in silico follow-up suggests a potential effect on T-lymphocytes at HLA-B*55:01. We also observed a significant hit in PTPN22 and the GWAS results correlated with the genetics of rheumatoid arthritis and psoriasis. We present robust evidence for the role of an allele of the major histocompatibility complex (MHC) I gene HLA-B in the occurrence of penicillin allergy.
Subject(s)
Rheumatoid Arthritis/genetics, Drug Hypersensitivity/genetics, HLA-B Antigens/genetics, Single Nucleotide Polymorphism, Non-Receptor Type 22 Protein Tyrosine Phosphatase/genetics, Psoriasis/genetics, Adult, Alleles, Rheumatoid Arthritis/complications, Rheumatoid Arthritis/immunology, Human Chromosome Pair 6/chemistry, Drug Hypersensitivity/complications, Drug Hypersensitivity/etiology, Drug Hypersensitivity/immunology, Electronic Health Records, Europe, Female, Gene Expression, Genetic Loci, Genetic Predisposition to Disease, Human Genome, Genome-Wide Association Study, HLA-B Antigens/immunology, Histocompatibility Testing, Humans, Male, Penicillins/adverse effects, Non-Receptor Type 22 Protein Tyrosine Phosphatase/immunology, Psoriasis/complications, Psoriasis/immunology, Self Report, T-Lymphocytes/immunology, T-Lymphocytes/pathology, United States
Abstract
Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows-Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors, we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally, we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale data sets with millions of samples. Furthermore, we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computation against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis, exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for noncommercial use in the code repository (https://github.com/23andMe/phasedibd, last accessed January 11, 2021).
Subject(s)
Human Genome, Haplotypes, Software, Algorithms, False Negative Reactions, False Positive Reactions, Humans, Mexico, Phylogeography
Abstract
Current approaches to detect and characterize mosaic chromosomal aneuploidy are limited by sensitivity, efficiency, cost, or the need to culture cells. We describe the mosaic aneuploidy detection by massively parallel sequencing (MAD-seq) capture assay and the MADSEQ analytical approach that allow low (<10%) levels of mosaicism for chromosomal aneuploidy or regional loss of heterozygosity to be detected, assigned to a meiotic or mitotic origin, and quantified as a proportion of the cells in the sample. We show results from a multi-ethnic MAD-seq (meMAD-seq) capture design that works equally well in populations of diverse racial and ethnic origins and how the MADSEQ analytical approach can be applied to exome or whole-genome sequencing data, revealing previously unrecognized aneuploidy or copy number neutral loss of heterozygosity in samples studied by the 1000 Genomes Project, cell lines from public repositories, and one of the Illumina Platinum Genomes samples. We have made the meMAD-seq capture design and MADSEQ analytical software open for unrestricted use, with the goal that they can be applied in clinical samples to allow new insights into the unrecognized prevalence of mosaic chromosomal aneuploidy in humans and its phenotypic associations.
Subject(s)
Chromosomes/genetics, High-Throughput Nucleotide Sequencing/methods, Aneuploidy, Exome/genetics, Female, Genome/genetics, Humans, Male, Mosaicism, Software
Abstract
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Subject(s)
Genetic Variation/genetics, Population Genetics/standards, Human Genome/genetics, Genomics/standards, Internationality, Datasets as Topic, Demography, Disease Susceptibility, Exome/genetics, Medical Genetics, Genome-Wide Association Study, Genotype, Haplotypes/genetics, High-Throughput Nucleotide Sequencing, Humans, INDEL Mutation/genetics, Physical Chromosome Mapping, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics, Rare Diseases/genetics, Reference Standards, DNA Sequence Analysis
Abstract
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.
Subject(s)
Genetic Variation/genetics, Human Genome/genetics, Physical Chromosome Mapping, Amino Acid Sequence, Genetic Predisposition to Disease, Medical Genetics, Population Genetics, Genome-Wide Association Study, Genomics, Genotype, Haplotypes/genetics, Homozygote, Humans, Molecular Sequence Data, Mutation Rate, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics, DNA Sequence Analysis, Sequence Deletion/genetics
Abstract
The ratio of the length of the index finger to that of the ring finger (2D:4D) is sexually dimorphic and is commonly used as a non-invasive biomarker of prenatal androgen exposure. Most association studies of 2D:4D ratio with a diverse range of sex-specific traits have typically involved small sample sizes and have been difficult to replicate, raising questions around the utility and precise meaning of the measure. In the largest genome-wide association meta-analysis of 2D:4D ratio to date (N = 15 661, with replication N = 75 821), we identified 11 loci (9 novel) explaining 3.8% of the variance in mean 2D:4D ratio. We also found weak evidence for association (β = 0.06; P = 0.02) between 2D:4D ratio and sensitivity to testosterone [length of the CAG microsatellite repeat in the androgen receptor (AR) gene] in females only. Furthermore, genetic variants associated with (adult) testosterone levels and/or sex hormone-binding globulin were not associated with 2D:4D ratio in our sample. Although we were unable to find strong evidence from our genetic study to support the hypothesis that 2D:4D ratio is a direct biomarker of prenatal exposure to androgens in healthy individuals, our findings do not explicitly exclude this possibility, and pathways involving testosterone may become apparent as the size of the discovery sample increases further. Our findings provide new insight into the underlying biology shaping 2D:4D variation in the general population.
Subject(s)
Fingers/anatomy & histology, Genome-Wide Association Study, Testosterone/metabolism, Adult, Androgens/metabolism, Biomarkers, Female, Fingers/growth & development, Genetic Variation, Humans, Male, Pregnancy, Retrospective Studies, Sex Characteristics, Testosterone/genetics
Abstract
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
Subject(s)
Genetic Variation/genetics, Population Genetics, Human Genome/genetics, Genomics, Alleles, Binding Sites/genetics, Conserved Sequence/genetics, Molecular Evolution, Medical Genetics, Genome-Wide Association Study, Haplotypes/genetics, Humans, Nucleotide Motifs, Single Nucleotide Polymorphism/genetics, Racial Groups/genetics, Sequence Deletion/genetics, Transcription Factors/metabolism
Abstract
DNA mutational events are increasingly being identified in autism spectrum disorder (ASD), but the potential additional role of dysregulation of the epigenome in the pathogenesis of the condition remains unclear. The epigenome is of interest as a possible mediator of environmental effects during development, encoding a cellular memory reflected by altered function of progeny cells. Advanced maternal age (AMA) is associated with an increased risk of having a child with ASD for reasons that are not understood. To explore whether AMA involves covert aneuploidy or epigenetic dysregulation leading to ASD in the offspring, we tested a homogeneous ectodermal cell type from 47 individuals with ASD compared with 48 typically developing (TD) controls born to mothers of ≥35 years, using a quantitative genome-wide DNA methylation assay. We show that DNA methylation patterns are dysregulated in ectodermal cells in these individuals, having accounted for confounding effects due to subject age, sex and ancestral haplotype. We did not find mosaic aneuploidy or copy number variability to occur at differentially-methylated regions in these subjects. Of note, the loci with distinctive DNA methylation were found at genes expressed in the brain and encoding protein products significantly enriched for interactions with those produced by known ASD-causing genes, representing a perturbation by epigenomic dysregulation of the same networks compromised by DNA mutational mechanisms. The results indicate the presence of a mosaic subpopulation of epigenetically-dysregulated, ectodermally-derived cells in subjects with ASD. The epigenetic dysregulation observed in these ASD subjects born to older mothers may be associated with aging parental gametes, environmental influences during embryogenesis or could be the consequence of mutations of the chromatin regulatory genes increasingly implicated in ASD. 
The results indicate that epigenetic dysregulatory mechanisms may complement and interact with DNA mutations in the pathogenesis of the disorder.
Subject(s)
Age Factors, Pervasive Child Development Disorders/genetics, DNA Methylation/genetics, Genetic Epigenesis, Mosaicism, Adult, Pervasive Child Development Disorders/pathology, Chromosome Aberrations, Female, Gene Expression Profiling, Human Genome, Haplotypes, Humans, Male, Maternal-Fetal Relations, Middle Aged, Pregnancy
Abstract
In humans, most meiotic crossover events are clustered into short regions of the genome known as recombination hot spots. We have previously identified DNA motifs that are enriched in hot spots, particularly the 7-mer CCTCCCT. Here we use the increased hot-spot resolution afforded by the Phase 2 HapMap and novel search methods to identify an extended family of motifs based around the degenerate 13-mer CCNCCNTNNCCNC, which is critical in recruiting crossover events to at least 40% of all human hot spots and which operates on diverse genetic backgrounds in both sexes. Furthermore, these motifs are found in hypervariable minisatellites and are clustered in the breakpoint regions of both disease-causing nonallelic homologous recombination hot spots and common mitochondrial deletion hot spots, implicating the motif as a driver of genome instability.
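As an illustration of what searching for a degenerate motif involves, the sketch below scans a DNA string for the 13-mer CCNCCNTNNCCNC, treating N as any base. It is a minimal example under stated assumptions, not the search method used in the study: the function names and the toy sequence are hypothetical, and only the forward strand is scanned.

```python
import re

# Minimal IUPAC table: only the codes needed for this motif (N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

MOTIF_13MER = "CCNCCNTNNCCNC"  # degenerate 13-mer quoted in the abstract


def motif_to_regex(motif):
    """Translate a degenerate motif into a compiled regex.

    The pattern is wrapped in a lookahead so overlapping hits are reported.
    """
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_motif(sequence, motif):
    """Return 0-based start positions of every match on the given strand."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence: the instance CCTCCCTAACCAC flanked by GG and TT.
example = "GGCCTCCCTAACCACTT"
print(find_motif(example, MOTIF_13MER))  # [2]
print(find_motif(example, "CCTCCCT"))    # [2]
```

The toy instance begins with the 7-mer CCTCCCT, so both searches report the same start position. A genome-wide scan would also need to check the reverse complement.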
Subject(s)
Base Sequence, Genetic Crossing Over, Genomic Instability, Genetic Recombination, Humans, Molecular Sequence Data, Mutation, Nucleic Acid Repetitive Sequences
Abstract
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
Subject(s)
Genetic Variation/genetics, Population Genetics/methods, Human Genome/genetics, Genomics/methods, DNA Sequence Analysis/methods, Calibration, Human Y Chromosomes/genetics, Computational Biology, DNA Mutational Analysis, Mitochondrial DNA/genetics, Molecular Evolution, Female, Genetic Association Studies, Genome-Wide Association Study, Genotype, Haplotypes/genetics, Humans, Male, Mutation/genetics, Pilot Projects, Single Nucleotide Polymorphism/genetics, Genetic Recombination/genetics, Sample Size, Genetic Selection/genetics, Sequence Alignment
Abstract
Advances in genome technology have facilitated a new understanding of the historical and genetic processes crucial to rapid phenotypic evolution under domestication. To understand the process of dog diversification better, we conducted an extensive genome-wide survey of more than 48,000 single nucleotide polymorphisms in dogs and their wild progenitor, the grey wolf. Here we show that dog breeds share a higher proportion of multi-locus haplotypes unique to grey wolves from the Middle East, indicating that they are a dominant source of genetic diversity for dogs rather than wolves from east Asia, as suggested by mitochondrial DNA sequence data. Furthermore, we find a surprising correspondence between genetic and phenotypic/functional breed groupings but there are exceptions that suggest phenotypic diversification depended in part on the repeated crossing of individuals with novel phenotypes. Our results show that Middle Eastern wolves were a critical source of genome diversity, although interbreeding with local wolf populations clearly occurred elsewhere in the early history of specific lineages. More recently, the evolution of modern dog breeds seems to have been an iterative process that drew on a limited genetic toolkit to create remarkable phenotypic diversity.
Subject(s)
Domestic Animals/genetics, Dogs/genetics, Genome/genetics, Haplotypes/genetics, Single Nucleotide Polymorphism/genetics, Animals, Domestic Animals/classification, Wild Animals/classification, Wild Animals/genetics, Breeding, Computational Biology, Dogs/classification, Molecular Evolution, East Asia/ethnology, Middle East/ethnology, Phenotype, Phylogeny, Wolves/classification, Wolves/genetics
Abstract
Copy number variants (CNVs) account for a major proportion of human genetic polymorphism and have been predicted to have an important role in genetic susceptibility to common disease. To address this we undertook a large, direct genome-wide study of association between CNVs and eight common human diseases. Using a purpose-designed array we typed approximately 19,000 individuals into distinct copy-number classes at 3,432 polymorphic CNVs, including an estimated 50% of all common CNVs larger than 500 base pairs. We identified several biological artefacts that lead to false-positive associations, including systematic CNV differences between DNAs derived from blood and cell lines. Association testing and follow-up replication analyses confirmed three loci where CNVs were associated with disease (IRGM for Crohn's disease, HLA for Crohn's disease, rheumatoid arthritis and type 1 diabetes, and TSPAN8 for type 2 diabetes), although in each case the locus had previously been identified in single nucleotide polymorphism (SNP)-based studies, reflecting our observation that most common CNVs that are well-typed on our array are well tagged by SNPs and so have been indirectly explored through SNP studies. We conclude that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases.
Subject(s)
DNA Copy Number Variations/genetics, Disease, Genetic Predisposition to Disease/genetics, Genome-Wide Association Study, Rheumatoid Arthritis/genetics, Case-Control Studies, Crohn Disease/genetics, Diabetes Mellitus/genetics, Gene Frequency/genetics, Humans, Nucleic Acid Hybridization, Oligonucleotide Array Sequence Analysis, Pilot Projects, Single Nucleotide Polymorphism/genetics, Quality Control
Abstract
The identification of the H3K4 trimethylase, PRDM9, as the gene responsible for recombination hotspot localization has provided considerable insight into the mechanisms by which recombination is initiated in mammals. However, uniquely amongst mammals, canids appear to lack a functional version of PRDM9 and may therefore provide a model for understanding recombination that occurs in the absence of PRDM9, and thus how PRDM9 functions to shape the recombination landscape. We have constructed a fine-scale genetic map from patterns of linkage disequilibrium assessed using high-throughput sequence data from 51 free-ranging dogs, Canis lupus familiaris. While broad-scale properties of recombination appear similar to other mammalian species, our fine-scale estimates indicate that, in dogs, highly elevated recombination rates are observed in the vicinity of CpG-rich regions, including gene promoter regions, but show little association with H3K4 trimethylation marks identified in spermatocytes. By comparison to genomic data from the Andean fox, Lycalopex culpaeus, we show that biased gene conversion is a plausible mechanism by which the high CpG content of the dog genome could have occurred.
Subject(s)
Molecular Evolution, Gene Conversion, Genetic Promoter Regions, Genetic Recombination, Animals, Chromosome Mapping, CpG Islands, Dogs, Genetic Association Studies, Genome, Histone-Lysine N-Methyltransferase/genetics, Linkage Disequilibrium
Abstract
Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago.
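As a quick arithmetic check, the 1.75-fold range quoted in this abstract follows directly from its two heterozygosity extremes (1 versus 0.57 heterozygous sites per kilobase). The snippet below is only an illustrative sketch. The variable names are hypothetical and not taken from the study.

```python
# Heterozygous sites per kilobase, as quoted in the abstract.
het_highest = 1.0   # west African genomes
het_lowest = 0.57   # segments with diploid Native American ancestry

# Fold range = highest value divided by lowest value.
fold_range = het_highest / het_lowest
print(f"{fold_range:.2f}-fold")  # 1.75-fold
```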
Subject(s)
Human Genome, Haplotypes/genetics, Population/genetics, Racial Groups/genetics, Population Genetics/methods, Heterozygote, Humans, Single Nucleotide Polymorphism
Abstract
Understanding the genetic structure of human populations is of fundamental interest to medical, forensic and anthropological sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation and suggest the potential to use large samples to uncover variation among closely spaced populations. Here we characterize genetic variation in a sample of 3,000 European individuals genotyped at over half a million variable DNA sites in the human genome. Despite low average levels of genetic differentiation among Europeans, we find a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. The results emphasize that when mapping the genetic basis of a disease phenotype, spurious associations can arise if genetic structure is not properly accounted for. In addition, the results are relevant to the prospects of genetic ancestry testing; an individual's DNA can be used to infer their geographic origin with surprising accuracy, often to within a few hundred kilometres.
Subject(s)
Genetic Variation/genetics, Population Genetics, Geography, Emigration and Immigration, Europe/ethnology, Human Genome/genetics, Genome-Wide Association Study, Genotype, Humans, Phylogeny, Single Nucleotide Polymorphism, Principal Component Analysis, Heritable Quantitative Trait, Sample Size
Abstract
Domestic dogs exhibit tremendous phenotypic diversity, including a greater variation in body size than any other terrestrial mammal. Here, we generate a high-density map of canine genetic variation by genotyping 915 dogs from 80 domestic dog breeds, 83 wild canids, and 10 outbred African shelter dogs across 60,968 single-nucleotide polymorphisms (SNPs). Coupling this genomic resource with external measurements from breed standards and individuals as well as skeletal measurements from museum specimens, we identify 51 regions of the dog genome associated with phenotypic variation among breeds in 57 traits. The complex traits include average breed body size and external body dimensions and cranial, dental, and long bone shape and size with and without allometric scaling. In contrast to the results from association mapping of quantitative traits in humans and domesticated plants, we find that across dog breeds, a small number of quantitative trait loci (≤3) explain the majority of phenotypic variation for most of the traits we studied. In addition, many genomic regions show signatures of recent selection, with most of the highly differentiated regions being associated with breed-defining traits such as body size, coat characteristics, and ear floppiness. Our results demonstrate the efficacy of mapping multiple traits in the domestic dog using a database of genotyped individuals and highlight the important role human-directed selection has played in altering the genetic architecture of key traits in this important species.