2.
Cell Genom ; 4(1): 100469, 2024 Jan 10.
Article in English | MEDLINE | ID: mdl-38190103

ABSTRACT

Epigenetics underpins the regulation of genes known to play a key role in the adaptive and innate immune system (AIIS). We developed a method, EpiNN, that leverages epigenetic data to detect AIIS-relevant genomic regions and used it to detect 2,765 putative AIIS loci. Experimental validation of one of these loci, DNMT1, provided evidence for a novel AIIS-specific transcription start site. We built a genome-wide AIIS annotation and used linkage disequilibrium (LD) score regression to test whether it predicts regional heritability using association statistics for 176 traits. We detected significant heritability effects (average |τ∗|=1.65) for 20 out of 26 immune-relevant traits. In a meta-analysis, immune-relevant traits and diseases were 4.45× more enriched for heritability than other traits. The EpiNN annotation was also depleted of trans-ancestry genetic correlation, indicating ancestry-specific effects. These results underscore the effectiveness of leveraging supervised learning algorithms and epigenetic data to detect loci implicated in specific classes of traits and diseases.


Subject(s)
Genomics , Quantitative Trait Loci , Phenotype , Linkage Disequilibrium/genetics , Epigenesis, Genetic/genetics
3.
Nat Commun ; 14(1): 7945, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38040695

ABSTRACT

Individuals sharing recent ancestors are likely to co-inherit large identical-by-descent (IBD) genomic regions. The distribution of these IBD segments in a population may be used to reconstruct past demographic events such as effective population size variation, but accurate IBD detection is difficult in ancient DNA data and in underrepresented populations with limited reference data. In this work, we introduce an accurate method for inferring effective population size variation during the past ~2000 years in both modern and ancient DNA data, called HapNe. HapNe infers recent population size fluctuations using either IBD sharing (HapNe-IBD) or linkage disequilibrium (HapNe-LD), which does not require phasing and can be computed in low coverage data, including data sets with heterogeneous sampling times. HapNe shows improved accuracy in a range of simulated demographic scenarios compared to currently available methods for IBD-based and LD-based inference of recent effective population size, while requiring fewer computational resources. We apply HapNe to several modern populations from the 1,000 Genomes Project, the UK Biobank, the Allen Ancient DNA Resource, and recently published samples from Iron Age Britain, detecting multiple instances of recent effective population size variation across these groups.


Subject(s)
DNA, Ancient , Genomics , Humans , Haplotypes/genetics , Population Density , Linkage Disequilibrium , Genetics, Population , Polymorphism, Single Nucleotide
4.
Mol Biol Evol ; 40(10), 2023 Oct 04.
Article in English | MEDLINE | ID: mdl-37738175

ABSTRACT

Accurate inference of the time to the most recent common ancestor (TMRCA) between pairs of individuals and of the age of genomic variants is key in several population genetic analyses. We developed a likelihood-free approach, called CoalNN, which uses a convolutional neural network to predict pairwise TMRCAs and allele ages from sequencing or SNP array data. CoalNN is trained through simulation and can be adapted to varying parameters, such as demographic history, using transfer learning. Across several simulated scenarios, CoalNN matched or outperformed the accuracy of model-based approaches for pairwise TMRCA and allele age prediction. We applied CoalNN to settings for which model-based approaches are under-developed and performed analyses to gain insights into the set of features it uses to perform TMRCA prediction. We next used CoalNN to analyze 2,504 samples from 26 populations in the 1,000 Genomes Project data set, inferring the age of ∼80 million variants. We observed substantial variation across populations and for variants predicted to be pathogenic, reflecting heterogeneous demographic histories and the action of negative selection. We used CoalNN's predicted allele ages to construct genome-wide annotations capturing the signature of past negative selection. We performed LD-score regression analysis of heritability using summary association statistics from 63 independent complex traits and diseases (average N=314k), observing increased annotation-specific effects on heritability compared to a previous allele age annotation. These results highlight the effectiveness of using likelihood-free, simulation-trained models to infer properties of gene genealogies in large genomic data sets.


Subject(s)
Genome , Neural Networks, Computer , Humans , Computer Simulation , Genomics , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Models, Genetic
5.
Bioinformatics ; 39(9), 2023 Sep 02.
Article in English | MEDLINE | ID: mdl-37647640

ABSTRACT

MOTIVATION: Existing methods for simulating synthetic genotype and phenotype datasets have limited scalability, constraining their usability for large-scale analyses. Moreover, a systematic approach for evaluating synthetic data quality and a benchmark synthetic dataset for developing and evaluating methods for polygenic risk scores are lacking. RESULTS: We present HAPNEST, a novel approach for efficiently generating diverse individual-level genotypic and phenotypic data. In comparison to alternative methods, HAPNEST shows faster computational speed and a lower degree of relatedness with reference panels, while generating datasets that preserve key statistical properties of real data. These desirable synthetic data properties enabled us to generate 6.8 million common variants and nine phenotypes with varying degrees of heritability and polygenicity across 1 million individuals. We demonstrate how HAPNEST can facilitate biobank-scale analyses by comparing seven methods for generating polygenic risk scores across multiple ancestry groups and different genetic architectures. AVAILABILITY AND IMPLEMENTATION: A synthetic dataset of 1 008 000 individuals and nine traits for 6.8 million common variants is available at https://www.ebi.ac.uk/biostudies/studies/S-BSST936. The HAPNEST software for generating synthetic datasets is available as Docker/Singularity containers and open source Julia and C code at https://github.com/intervene-EU-H2020/synthetic_data.


Subject(s)
Benchmarking , Data Accuracy , Humans , Genotype , Phenotype , Multifactorial Inheritance
6.
Nat Genet ; 55(5): 768-776, 2023 May.
Article in English | MEDLINE | ID: mdl-37127670

ABSTRACT

Genome-wide genealogies compactly represent the evolutionary history of a set of genomes and inferring them from genetic data has the potential to facilitate a wide range of analyses. We introduce a method, ARG-Needle, for accurately inferring biobank-scale genealogies from sequencing or genotyping array data, as well as strategies to utilize genealogies to perform association and other complex trait analyses. We use these methods to build genome-wide genealogies using genotyping data for 337,464 UK Biobank individuals and test for association across seven complex traits. Genealogy-based association detects more rare and ultra-rare signals (N = 134, frequency range 0.0007-0.1%) than genotype imputation using ~65,000 sequenced haplotypes (N = 64). In a subset of 138,039 exome sequencing samples, these associations strongly tag (average r = 0.72) underlying sequencing variants enriched (4.8×) for loss-of-function variation. These results demonstrate that inferred genome-wide genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.


Subject(s)
Genetics, Population , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Biological Specimen Banks , Genotype , Recombination, Genetic , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics
7.
Sci Adv ; 7(36): eabh0534, 2021 Sep 03.
Article in English | MEDLINE | ID: mdl-34516908

ABSTRACT

Multimodal, genome-wide characterization of epigenetic and genetic information in circulating cell-free DNA (cfDNA) could enable more sensitive early cancer detection, but it is technologically challenging. Recently, we developed TET-assisted pyridine borane sequencing (TAPS), which is a mild, bisulfite-free method for base-resolution direct DNA methylation sequencing. Here, we optimized TAPS for cfDNA (cfTAPS) to provide high-quality and high-depth whole-genome cell-free methylomes. We applied cfTAPS to 85 cfDNA samples from patients with hepatocellular carcinoma (HCC) or pancreatic ductal adenocarcinoma (PDAC) and noncancer controls. From only 10 ng of cfDNA (1 to 3 ml of plasma), we generated the most comprehensive cfDNA methylome to date. We demonstrated that cfTAPS provides multimodal information about cfDNA characteristics, including DNA methylation, tissue of origin, and DNA fragmentation. Integrated analysis of these epigenetic and genetic features enables accurate identification of early HCC and PDAC.

8.
Nat Commun ; 11(1): 6130, 2020 Nov 30.
Article in English | MEDLINE | ID: mdl-33257650

ABSTRACT

Detection of Identical-By-Descent (IBD) segments provides a fundamental measure of genetic relatedness and plays a key role in a wide range of analyses. We develop FastSMC, an IBD detection algorithm that combines a fast heuristic search with accurate coalescent-based likelihood calculations. FastSMC enables biobank-scale detection and dating of IBD segments within several thousands of years in the past. We apply FastSMC to 487,409 UK Biobank samples and detect ~214 billion IBD segments transmitted by shared ancestors within the past 1500 years, obtaining a fine-grained picture of genetic relatedness in the UK. Sharing of common ancestors strongly correlates with geographic distance, enabling the use of genomic data to localize a sample's birth coordinates with a median error of 45 km. We seek evidence of recent positive selection by identifying loci with unusually strong shared ancestry and detect 12 genome-wide significant signals. We devise an IBD-based test for association between phenotype and ultra-rare loss-of-function variation, identifying 29 association signals in 7 blood-related traits.


Subject(s)
Genetics, Population , White People/genetics , Algorithms , Genome, Human , Genome-Wide Association Study , Haplotypes , Humans , Phenotype , Quantitative Trait, Heritable , United Kingdom
9.
Mol Biol Evol ; 37(5): 1306-1316, 2020 May 01.
Article in English | MEDLINE | ID: mdl-31957793

ABSTRACT

Elucidating natural selection signatures and their relationships with phenotype spectra is important for understanding the adaptive evolution of modern humans. Here, we conducted a genome-wide scan of selection signatures in the Japanese population by estimating the locus-specific time to the most recent common ancestor using the ascertained sequentially Markovian coalescent (ASMC), from biobank-based large-scale genome-wide association study data of 170,882 subjects. We identified 29 genetic loci with selection signatures satisfying genome-wide significance. The signatures were most evident at the alcohol dehydrogenase (ADH) gene cluster locus at 4q23 (P_ASMC = 2.2 × 10⁻³⁶), followed by relatively strong selection at the FAM96A (15q22), MYOF (10q23), 13q21, GRIA2 (4q32), and ASAP2 (2p25) loci (P_ASMC < 1.0 × 10⁻¹⁰). An additional analysis interrogating extended haplotypes (integrated haplotype score) showed robust concordance with the detected signatures, contributed to fine-mapping of the genes, and provided allelic directional insights into selection pressure (e.g., positive selection for ADH1B-Arg48His and HLA-DPB1*04:01). The phenome-wide selection enrichment analysis with trait-associated variants identified a variety of modern human phenotypes involved in the adaptation of the Japanese population. We observed population-specific evidence of enrichment for alcohol-related phenotypes, anthropometric and biochemical clinical measurements, and immune-related diseases, in contrast to the findings in Europeans using the UK Biobank resource. Our study demonstrated population-specific features of the selection signatures in the Japanese population, highlighting the value of natural selection studies that use nation-wide biobank-scale genome and phenotype data.


Subject(s)
Asian People/genetics , Genome, Human , Selection, Genetic , Genome-Wide Association Study , Humans , Markov Chains , Phenotype
10.
Nat Genet ; 51(8): 1295, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31273336

ABSTRACT

In the version of the paper initially published, information on competing interests for author Benjamin M. Neale was missing. The 'Competing interests' statement should have included the sentence 'B.M.N. is on the Scientific Advisory Board of Deep Genomics'.

11.
Nat Genet ; 50(10): 1483-1493, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30177862

ABSTRACT

Biological interpretation of genome-wide association study data frequently involves assessing whether SNPs linked to a biological process, for example, binding of a transcription factor, show unsigned enrichment for disease signal. However, signed annotations quantifying whether each SNP allele promotes or hinders the biological process can enable stronger statements about disease mechanism. We introduce a method, signed linkage disequilibrium profile regression, for detecting genome-wide directional effects of signed functional annotations on disease risk. We validate the method via simulations and application to molecular quantitative trait loci in blood, recovering known transcriptional regulators. We apply the method to expression quantitative trait loci in 48 Genotype-Tissue Expression tissues, identifying 651 transcription factor-tissue associations including 30 with robust evidence of tissue specificity. We apply the method to 46 diseases and complex traits (average n = 290 K), identifying 77 annotation-trait associations representing 12 independent transcription factor-trait associations, and characterize the underlying transcriptional programs using gene-set enrichment analyses. Our results implicate new causal disease genes and new disease mechanisms.


Subject(s)
Disease/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Quantitative Trait Loci , Transcription Factors/metabolism , Binding Sites/genetics , Blood Cells/metabolism , Blood Cells/pathology , Blood Chemical Analysis , Gene Expression Regulation , Genetic Predisposition to Disease , Humans , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide , Protein Binding , Risk Factors
12.
Nat Genet ; 50(9): 1311-1317, 2018 Sep.
Article in English | MEDLINE | ID: mdl-30104759

ABSTRACT

Interest in reconstructing demographic histories has motivated the development of methods to estimate locus-specific pairwise coalescence times from whole-genome sequencing data. Here we introduce a powerful new method, ASMC, that can estimate coalescence times using only SNP array data, and is orders of magnitude faster than previous approaches. We applied ASMC to detect recent positive selection in 113,851 phased British samples from the UK Biobank, and detected 12 genome-wide significant signals, including 6 novel loci. We also applied ASMC to sequencing data from 498 Dutch individuals to detect background selection at deeper time scales. We detected strong heritability enrichment in regions of high background selection in an analysis of 20 independent diseases and complex traits using stratified linkage disequilibrium score regression, conditioned on a broad set of functional annotations (including other background selection annotations). These results underscore the widespread effects of background selection on the genetic architecture of complex traits.


Subject(s)
Disease/genetics , Linkage Disequilibrium/genetics , Genome-Wide Association Study/methods , High-Throughput Screening Assays/methods , Humans , Models, Genetic , Molecular Sequence Annotation/methods , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics
13.
Nature ; 559(7714): 350-355, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29995854

ABSTRACT

The selective pressures that shape clonal evolution in healthy individuals are largely unknown. Here we investigate 8,342 mosaic chromosomal alterations, from 50 kb to 249 Mb long, that we uncovered in blood-derived DNA from 151,202 UK Biobank participants using phase-based computational techniques (estimated false discovery rate, 6-9%). We found six loci at which inherited variants associated strongly with the acquisition of deletions or loss of heterozygosity in cis. At three such loci (MPL, TM2D3-TARSL2, and FRA10B), we identified a likely causal variant that acted with high penetrance (5-50%). Inherited alleles at one locus appeared to affect the probability of somatic mutation, and at three other loci to be objects of positive or negative clonal selection. Several specific mosaic chromosomal alterations were strongly associated with future haematological malignancies. Our results reveal a multitude of paths towards clonal expansions with a wide range of effects on human health.


Subject(s)
Chromosome Aberrations , Clone Cells/cytology , Clone Cells/metabolism , Hematopoiesis/genetics , Mosaicism , Adult , Aged , Alleles , Biological Specimen Banks , Chromosome Breakage , Chromosome Fragile Sites/genetics , Chromosomes, Human, Pair 10/genetics , Female , Health , Hematologic Neoplasms/genetics , Hematologic Neoplasms/mortality , Humans , Male , Middle Aged , Penetrance , United Kingdom
14.
Nat Genet ; 49(10): 1421-1427, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28892061

ABSTRACT

Recent work has hinted at the linkage disequilibrium (LD)-dependent architecture of human complex traits, where SNPs with low levels of LD (LLD) have larger per-SNP heritability. Here we analyzed summary statistics from 56 complex traits (average N = 101,401) by extending stratified LD score regression to continuous annotations. We determined that SNPs with low LLD have significantly larger per-SNP heritability and that roughly half of this effect can be explained by functional annotations negatively correlated with LLD, such as DNase I hypersensitivity sites (DHSs). The remaining signal is largely driven by our finding that more recent common variants tend to have lower LLD and to explain more heritability (P = 2.38 × 10⁻¹⁰⁴); the youngest 20% of common SNPs explain 3.9 times more heritability than the oldest 20%, consistent with the action of negative selection. We also inferred jointly significant effects of other LD-related annotations and confirmed via forward simulations that they jointly predict deleterious effects.


Subject(s)
Genetic Variation/genetics , Linkage Disequilibrium , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide , Selection, Genetic , Alleles , Chi-Square Distribution , Datasets as Topic , Genetic Fitness , Humans , Models, Genetic , Molecular Sequence Annotation
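
As background for the LD score regression mentioned in the abstract above, the following is a minimal sketch, not the authors' implementation, of an unwindowed LD score: for each SNP, the sum of its squared correlations with every SNP in the panel, itself included. The function name `ld_scores` and the toy correlation matrix are assumptions made for illustration; practical tools usually restrict the sum to a genomic window and use a bias-adjusted r².

```python
def ld_scores(r):
    """Return one LD score per SNP from a square correlation matrix.

    r is a hypothetical list of lists where r[j][k] is the correlation
    between SNP j and SNP k (so r[j][j] == 1.0). The score for SNP j is
    the sum over k of r[j][k] squared, with no windowing or bias
    adjustment (both are simplifying assumptions of this sketch).
    """
    size = len(r)
    for row in r:
        if len(row) != size:
            raise ValueError("correlation matrix must be square")
    return [sum(value * value for value in row) for row in r]


if __name__ == "__main__":
    # Two SNPs in moderate LD (r = 0.5): each score is 1 + 0.25.
    toy = [[1.0, 0.5],
           [0.5, 1.0]]
    print(ld_scores(toy))  # [1.25, 1.25]
```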
15.
Nat Genet ; 48(11): 1443-1448, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27694958

ABSTRACT

Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing in a genotyped cohort, an approach that can yield high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ∼20× speedup and ∼10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2× the accuracy of 1000 Genomes-based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.


Subject(s)
Algorithms , Haplotypes , Cohort Studies , Female , Genotype , Humans , Male , Reference Values
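
The abstract above refers to a data structure based on the positional Burrows-Wheeler transform (PBWT). The sketch below is not Eagle2's data structure; it only illustrates the basic PBWT idea of maintaining, site by site, an ordering of haplotypes sorted by their reversed prefixes, using a stable partition on the allele at each site. The function name `pbwt_prefix_arrays` and the toy haplotypes are assumptions, and only biallelic 0/1 data is handled.

```python
def pbwt_prefix_arrays(haplotypes):
    """Return the positional prefix array before and after each site.

    haplotypes is a list of equal-length lists of 0/1 alleles. The
    returned list has one ordering per site boundary: orders[k] sorts
    haplotype indices by their alleles at sites k-1, k-2, ..., 0
    (the reversed prefix), with ties kept in input order.
    """
    if not haplotypes:
        return [[]]
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")

    order = list(range(len(haplotypes)))
    orders = [order[:]]
    for site in range(n_sites):
        # Stable partition: haplotypes carrying 0 at this site come first,
        # each group keeping the order it had at the previous site.
        zeros = [i for i in order if haplotypes[i][site] == 0]
        ones = [i for i in order if haplotypes[i][site] == 1]
        order = zeros + ones
        orders.append(order[:])
    return orders


if __name__ == "__main__":
    toy = [[0, 1],
           [1, 0],
           [0, 0]]
    print(pbwt_prefix_arrays(toy))  # [[0, 1, 2], [0, 2, 1], [2, 1, 0]]
```

Because haplotypes that share a long match ending at a given site sit next to each other in that site's ordering, neighbours in the array are natural candidates for copying templates, which is the general property that reference-based methods exploit.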
16.
Bioinformatics ; 32(19): 3032-4, 2016 Oct 01.
Article in English | MEDLINE | ID: mdl-27312410

ABSTRACT

MOTIVATION: Simulation under the coalescent model is ubiquitous in the analysis of genetic data. The rapid growth of real data sets from multiple human populations led to increasing interest in simulating very large sample sizes at whole-chromosome scales. When the sample size is large, the coalescent model becomes an increasingly inaccurate approximation of the discrete time Wright-Fisher model (DTWF). Analytical and computational treatment of the DTWF, however, is generally harder. RESULTS: We present a simulator (ARGON) for the DTWF process that scales up to hundreds of thousands of samples and whole-chromosome lengths, with a time/memory performance comparable or superior to currently available methods for coalescent simulation. The simulator supports arbitrary demographic history, migration, Newick tree output, variable mutation/recombination rates and gene conversion, and efficiently outputs pairwise identical-by-descent sharing data. AVAILABILITY: ARGON (version 0.1) is written in Java, open source, and freely available at https://github.com/pierpal/ARGON . CONTACT: ppalama@hsph.harvard.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computer Simulation , Genome , Models, Theoretical , Animals , Gene Conversion , Humans , Mutation
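
To make the discrete-time Wright-Fisher (DTWF) process in the abstract above concrete, here is a minimal backward-in-time sketch. It is not ARGON and omits recombination, mutation, migration, and changing population size: it assumes a constant-size haploid population in which every sampled lineage picks its parent uniformly at random each generation, and it counts generations until all lineages share one ancestor. The function name `dtwf_tmrca` and its parameters are hypothetical.

```python
import random


def dtwf_tmrca(sample_size, pop_size, seed=0):
    """Generations until a sample coalesces under a haploid DTWF model.

    Each generation, every surviving lineage chooses a parent uniformly
    among pop_size individuals; lineages that choose the same parent
    merge. Unlike the standard coalescent, several lineages may merge
    in a single generation.
    """
    if sample_size < 1 or pop_size < 1:
        raise ValueError("sample_size and pop_size must be positive")
    rng = random.Random(seed)
    lineages = sample_size
    generations = 0
    while lineages > 1:
        # The set keeps one entry per distinct parent chosen this generation.
        parents = {rng.randrange(pop_size) for _ in range(lineages)}
        lineages = len(parents)
        generations += 1
    return generations


if __name__ == "__main__":
    # Average over replicate seeds for a small toy population.
    times = [dtwf_tmrca(20, 100, seed=s) for s in range(200)]
    print(sum(times) / len(times))
```

The simultaneous and multiple mergers permitted here are the feature that the coalescent approximation ignores, which is why the two models diverge when the sample is large relative to the population.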
17.
Nat Genet ; 48(7): 811-6, 2016 Jul.
Article in English | MEDLINE | ID: mdl-27270109

ABSTRACT

Recent work has leveraged the extensive genotyping of the Icelandic population to perform long-range phasing (LRP), enabling accurate imputation and association analysis of rare variants in target samples typed on genotyping arrays. Here we develop a fast and accurate LRP method, Eagle, that extends this paradigm to populations with much smaller proportions of genotyped samples by harnessing long (>4-cM) identical-by-descent (IBD) tracts shared among distantly related individuals. We applied Eagle to N ≈ 150,000 samples (0.2% of the British population) from the UK Biobank, and we determined that it is 1-2 orders of magnitude faster than existing methods while achieving similar or better phasing accuracy (switch error rate ≈ 0.3%, corresponding to perfect phase in a majority of 10-Mb segments). We also observed that, when used within an imputation pipeline, Eagle prephasing improved downstream imputation accuracy in comparison to prephasing in batches using existing methods, as necessary to achieve comparable computational cost.


Subject(s)
Algorithms , Biological Specimen Banks , Computational Biology/methods , Genetics, Population , Inheritance Patterns/genetics , Cohort Studies , Genome, Human , Genomics , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods , United Kingdom , White People
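
The abstract above reports phasing accuracy as a switch error rate. As a rough illustration only, and not the evaluation code used in the study, the sketch below computes one common definition: among consecutive heterozygous sites, the fraction of adjacent pairs at which the inferred haplotype changes from agreeing with the true haplotype to disagreeing, or back. The function name `switch_error_rate` and the input encoding (one 0/1 allele per heterozygous site, for a single haplotype of the individual) are assumptions.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Fraction of adjacent heterozygous-site pairs with a phase switch.

    Both arguments list the 0/1 allele carried by one haplotype at each
    heterozygous site, in genomic order. A globally flipped haplotype
    (every site disagreeing) scores 0.0, because the relative phase
    between neighbouring sites is still correct.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    if len(true_hap) < 2:
        raise ValueError("at least two heterozygous sites are required")

    agrees = [t == h for t, h in zip(true_hap, inferred_hap)]
    # A switch occurs wherever agreement status changes between neighbours.
    switches = sum(1 for a, b in zip(agrees, agrees[1:]) if a != b)
    return switches / (len(agrees) - 1)


if __name__ == "__main__":
    truth = [0, 1, 0, 1]
    print(switch_error_rate(truth, [0, 1, 0, 1]))  # 0.0
    print(switch_error_rate(truth, [1, 0, 1, 0]))  # 0.0 (global flip)
    print(switch_error_rate(truth, [0, 1, 1, 0]))  # one switch over three pairs
```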
18.
Am J Hum Genet ; 97(6): 775-89, 2015 Dec 03.
Article in English | MEDLINE | ID: mdl-26581902

ABSTRACT

The rate at which human genomes mutate is a central biological parameter that has many implications for our ability to understand demographic and evolutionary phenomena. We present a method for inferring mutation and gene-conversion rates by using the number of sequence differences observed in identical-by-descent (IBD) segments together with a reconstructed model of recent population-size history. This approach is robust to, and can quantify, the presence of substantial genotyping error, as validated in coalescent simulations. We applied the method to 498 trio-phased sequenced Dutch individuals and inferred a point mutation rate of 1.66 × 10⁻⁸ per base per generation and a rate of 1.26 × 10⁻⁹ for <20 bp indels. By quantifying how estimates varied as a function of allele frequency, we inferred the probability that a site is involved in non-crossover gene conversion as 5.99 × 10⁻⁶. We found that recombination does not have observable mutagenic effects after gene conversion is accounted for and that local gene-conversion rates reflect recombination rates. We detected a strong enrichment of recent deleterious variation among mismatching variants found within IBD regions and observed summary statistics of local sharing of IBD segments to closely match previously proposed metrics of background selection; however, we found no significant effects of selection on our mutation-rate estimates. We detected no evidence of strong variation of mutation rates in a number of genomic annotations obtained from several recent studies. Our analysis suggests that a mutation-rate estimate higher than that reported by recent pedigree-based studies should be adopted in the context of DNA-based demographic reconstruction.


Subject(s)
Genome, Human , Germ-Line Mutation , Models, Genetic , Mutation Rate , Alleles , Gene Frequency , Haplotypes , Humans , INDEL Mutation , Linear Models , Recombination, Genetic
19.
Bioinformatics ; 29(13): i180-8, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812983

ABSTRACT

SUMMARY: Pairs of individuals from a study cohort will often share long-range haplotypes identical-by-descent. Such haplotypes are transmitted from common ancestors that lived tens to hundreds of generations in the past, and they can now be efficiently detected in high-resolution genomic datasets, providing a novel source of information in several domains of genetic analysis. Recently, haplotype sharing distributions were studied in the context of demographic inference, and they were used to reconstruct recent demographic events in several populations. We here extend the framework to handle demographic models that contain multiple demes interacting through migration. We extensively test our formulation in several demographic scenarios, compare our approach with methods based on ancestry deconvolution and use this method to analyze Masai samples from the HapMap 3 dataset. AVAILABILITY: DoRIS, a Java implementation of the proposed method, and its source code are freely available at http://www.cs.columbia.edu/~pier/doris.


Subject(s)
Haplotypes , Human Migration , HapMap Project , Humans , Models, Genetic
20.
Genetics ; 193(3): 911-28, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23267057

ABSTRACT

Widespread sharing of long, identical-by-descent (IBD) genetic segments is a hallmark of populations that have experienced recent genetic drift. Detection of these IBD segments has recently become feasible, enabling a wide range of applications from phasing and imputation to demographic inference. Here, we study the distribution of IBD sharing in the Wright-Fisher model. Specifically, using coalescent theory, we calculate the variance of the total sharing between random pairs of individuals. We then investigate the cohort-averaged sharing: the average total sharing between one individual and the rest of the cohort. We find that for large cohorts, the cohort-averaged sharing is distributed approximately normally. Surprisingly, the variance of this distribution does not vanish even for large cohorts, implying the existence of "hypersharing" individuals. The presence of such individuals has consequences for the design of sequencing studies, since, if they are selected for whole-genome sequencing, a larger fraction of the cohort can be subsequently imputed. We calculate the expected gain in power of imputation by IBD and subsequently in power to detect an association, when individuals are either randomly selected or specifically chosen to be the hypersharing individuals. Using our framework, we also compute the variance of an estimator of the population size that is based on the mean IBD sharing and the variance in the sharing between inbred siblings. Finally, we study IBD sharing in an admixture pulse model and show that in the Ashkenazi Jewish population the admixture fraction is correlated with the cohort-averaged sharing.


Subject(s)
Genetic Variation , Models, Genetic , Pedigree , Population/genetics , Computer Simulation , Demography , Humans , Jews/genetics
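
The abstract above defines cohort-averaged sharing as the average total IBD sharing between one individual and the rest of the cohort. The sketch below simply restates that definition as code; it is not taken from the paper. The function name `cohort_averaged_sharing`, the symmetric input matrix of pairwise totals, and the toy values (in arbitrary length units) are assumptions.

```python
def cohort_averaged_sharing(pairwise):
    """Mean total IBD sharing of each individual with all others.

    pairwise[i][j] holds the total length shared IBD between
    individuals i and j; diagonal entries are ignored. Individuals
    whose value sits far above the cohort mean would be the
    "hypersharing" individuals described in the abstract.
    """
    n = len(pairwise)
    if n < 2:
        raise ValueError("a cohort needs at least two individuals")
    if any(len(row) != n for row in pairwise):
        raise ValueError("pairwise sharing matrix must be square")
    return [
        sum(pairwise[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


if __name__ == "__main__":
    toy = [[0, 2, 4],
           [2, 0, 6],
           [4, 6, 0]]
    print(cohort_averaged_sharing(toy))  # [3.0, 4.0, 5.0]
```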
...