1.
bioRxiv ; 2024 May 10.
Article En | MEDLINE | ID: mdl-38766004

Haplotype phasing, the process of determining which genetic variants are physically located on the same chromosome, is crucial for various genetic analyses. In this study, we first benchmark SHAPEIT and Beagle, two state-of-the-art phasing methods, on two large datasets: >8 million diverse, research-consented 23andMe, Inc. customers and the UK Biobank (UKB). We find that both perform exceptionally well. Beagle's median switch error rate (SER) (after excluding single SNP switches) in white British trios from UKB is 0.026% compared to 0.00% for European ancestry 23andMe research participants; 55.6% of European ancestry 23andMe research participants have zero non-single SNP switches, compared to 42.4% of white British trios. South Asian ancestry 23andMe research participants have the highest median SER amongst the 23andMe populations, but it is still remarkably low at 0.46%. We also investigate the relationship between identity-by-descent (IBD) and SER, finding that switch errors tend to occur in regions of little or no IBD segment coverage. SHAPEIT and Beagle excel at 'intra-chromosomal' phasing, but lack the ability to phase across chromosomes, motivating us to develop an inter-chromosomal phasing method, called HAPTIC (HAPlotype TIling and Clustering), that assigns paternal and maternal variants discretely genome-wide. Our approach uses identity-by-descent (IBD) segments to phase blocks of variants on different chromosomes. HAPTIC represents the segments a focal individual shares with their relatives as nodes in a signed graph and performs bipartite clustering on the signed graph using spectral clustering. We test HAPTIC on 1022 UKB trios, yielding a median phase error of 0.08% in regions covered by IBD segments (33.5% of sites). We also ran HAPTIC in the 23andMe database and found a median phase error rate (the rate of mismatching alleles between the inferred and true phase) of 0.92% in Europeans (93.8% of sites) and 0.09% in admixed Africans (92.7% of sites). HAPTIC's precision depends heavily on data from relatives, so will increase as datasets grow larger and more diverse. HAPTIC enables analyses that require the parent-of-origin of variants, such as association studies and ancestry inference of untyped parents.
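
The two error measures named in this abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names are hypothetical, the inputs are assumed to be the alleles (0 or 1) carried by one haplotype at each heterozygous site in genomic order, single-SNP switches are not excluded as they are in the abstract's SER, and taking the better of the two parental labelings in the phase error is an assumption made here.

```python
from typing import Sequence


def switch_error_rate(true_hap: Sequence[int], inferred_hap: Sequence[int]) -> float:
    """Fraction of adjacent heterozygous-site pairs whose relative phase flips."""
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    if len(true_hap) < 2:
        return 0.0
    # 1 where the inferred haplotype carries the true allele, 0 where it is flipped
    agree = [int(t == i) for t, i in zip(true_hap, inferred_hap)]
    # A switch occurs whenever the agreement state changes between neighbouring sites
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return switches / (len(agree) - 1)


def phase_error_rate(true_hap: Sequence[int], inferred_hap: Sequence[int]) -> float:
    """Rate of mismatching alleles, using whichever haplotype labeling fits better."""
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    n = len(true_hap)
    if n == 0:
        return 0.0
    mismatches = sum(t != i for t, i in zip(true_hap, inferred_hap))
    # Assumption: the two haplotype labels are interchangeable, so take the minimum
    return min(mismatches, n - mismatches) / n


truth = [0, 1, 1, 0, 1, 0]
inferred = [0, 1, 0, 1, 0, 1]  # phase flips once, after the second site
ser = switch_error_rate(truth, inferred)
per = phase_error_rate(truth, inferred)
```

In this toy input a single switch flips the last four sites, so one switch error produces several allele mismatches. This is why the two measures can differ substantially for the same phasing.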

2.
bioRxiv ; 2024 May 14.
Article En | MEDLINE | ID: mdl-38798596

Reconstructing the DNA of ancestors from their descendants has the potential to empower phenotypic analyses (including association and genetic nurture studies), improve pedigree reconstruction, and shed light on the ancestral population and phenotypes of ancestors. We developed HAPI-RECAP, a method that reconstructs the DNA of parents from full siblings and their relatives. This tool leverages HAPI2's output, a new phasing approach that applies to siblings (and optionally one or both parents) and reliably infers parent haplotypes but does not link the ungenotyped parents' DNA across chromosomes or between segments flanking ambiguities. By combining IBD between the reconstructed parents and the relatives, HAPI-RECAP resolves the source parent of these segments. Moreover, the method exploits crossovers the children inherited and sex-specific genetic maps to infer the reconstructed parents' sexes. We validated these methods on research participants from both 23andMe, Inc. and the San Antonio Mexican American Family Studies. Given data for one parent, HAPI2 reconstructs large fractions of the missing parent's DNA, between 77.6% and 99.97% among all families, and 90.3% on average in three- and four-child families. When reconstructing both parents, HAPI-RECAP inferred between 33.2% and 96.6% of the parents' genotypes, averaging 70.6% in four-child families. Reconstructed genotypes have average error rates < 10⁻³, or comparable to those from direct genotyping. HAPI-RECAP inferred the parent sexes 100% correctly given IBD-linked segments and can also reconstruct parents without any IBD. As datasets grow in size, more families will be implicitly collected; HAPI-RECAP holds promise to enable high quality parent genotype reconstruction.

3.
bioRxiv ; 2023 Dec 04.
Article En | MEDLINE | ID: mdl-38106003

Local ancestry inference (LAI) is an indispensable component of a variety of analyses in medical and population genetics, from admixture mapping to characterizing demographic history. However, the accuracy of LAI depends on a number of factors such as phase quality (for phase-based LAI methods), time since admixture of the population under study, and other factors. Here we present an empirical analysis of four LAI methods using simulated individuals of mixed African and European ancestry, examining the impact of variable phase quality and a range of demographic scenarios. We found that regardless of phasing options, calls from LAI methods that operate on unphased genotypes (phase-free LAI) have 2.6-4.6% higher Pearson correlation with the ground truth than methods that operate on phased genotypes (phase-based LAI). Applying the TRACTOR phase-correction algorithm led to modest improvements in phase-based LAI, but despite this, the Pearson correlation of phase-free LAI remained 2.4-3.8% higher than phase-corrected phase-based approaches (considering the best performing methods in each category). Phase-free and phase-based LAI accuracy differences can dramatically impact downstream analyses: estimates of the time since admixture using phase-based LAI tracts are upwardly biased by ≈10 generations using our highest quality phased data but have virtually no bias using phase-free LAI calls. Our study underscores the strong dependence of phase-based LAI accuracy on phase quality and highlights the merits of LAI approaches that analyze unphased genetic data.

4.
Birth Defects Res ; 115(7): 797-800, 2023 04 15.
Article En | MEDLINE | ID: mdl-36855851

BACKGROUND: The sixth Strategic Planning Session of the Society for Birth Defects Research and Prevention (BDRP) was held on April 24-25, 2022, in Alexandria, VA. METHODS: This effort built upon previous strategic planning sessions, conducted every 5 years. RESULTS: The overall process was designed to identify BDRP's vision, purpose, culture, and potential, as well as to communicate the value that BDRP brings to its members, volunteers, partners, and the greater community. CONCLUSIONS: The BDRP 2022-2027 Strategic Plan provides the BDRP leadership, members, and staff with a clearly articulated framework and direction to support long-term sustainability and growth of the society.


Leadership , Societies , Humans , Research Design
5.
Bioinformatics ; 39(3)2023 03 01.
Article En | MEDLINE | ID: mdl-36847450

SUMMARY: Leveraging local ancestry and haplotype information in genome-wide association studies and downstream analyses can improve the utility of genomics for individuals from diverse and recently admixed ancestries. However, most existing simulation, visualization and variant analysis frameworks are based on variant-level analysis and do not automatically handle these features. We present haptools, an open-source toolkit for performing local ancestry aware and haplotype-based analysis of complex traits. Haptools supports fast simulation of admixed genomes, visualization of admixture tracks, simulation of haplotype- and local ancestry-specific phenotype effects and a variety of file operations and statistics computed in a haplotype-aware manner. AVAILABILITY AND IMPLEMENTATION: Haptools is freely available at https://github.com/cast-genomics/haptools. DOCUMENTATION: Detailed documentation is available at https://haptools.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Genome-Wide Association Study , Software , Haplotypes , Genomics , Genome
6.
Am J Hum Genet ; 109(8): 1405-1420, 2022 08 04.
Article En | MEDLINE | ID: mdl-35908549

Population genetic analyses of local ancestry tracts routinely assume that the ancestral admixture process is identical for both parents of an individual, an assumption that may be invalid when considering recent admixture. Here, we present Parental Admixture Proportion Inference (PAPI), a Bayesian tool for inferring the admixture proportions and admixture times for each parent of a single admixed individual. PAPI analyzes unphased local ancestry tracts and has two components: a binomial model that leverages genome-wide ancestry fractions to infer parental admixture proportions and a hidden Markov model (HMM) that infers admixture times from tract lengths. Crucially, the HMM accounts for unobserved within-ancestry recombination by approximating the pedigree crossover dynamics, enabling inference of parental admixture times. In simulations, we find that PAPI's admixture proportion estimates deviate from the truth by 0.047 on average, outperforming ANCESTOR and PedMix by 46.0% and 57.6%, respectively. Moreover, PAPI's admixture time estimates were strongly correlated with the truth (R = 0.76) but have an average downward bias of 1.01 generations that is partly attributable to inaccuracies in local ancestry inference. As an illustration of its utility, we ran PAPI on African American genotypes from the PAGE study (N = 5,786) and found strong evidence of assortative mating by ancestry proportion: couples' ancestry proportions are highly correlated (R = 0.87) and are closer to each other than expected under random mating (p < 10⁻⁶). We anticipate that PAPI will be useful in studying the population dynamics of admixture and will also be of interest to individuals seeking to learn about their personal genealogies.


Black or African American , Genetics, Population , Bayes Theorem , Humans , Parents , Pedigree
7.
G3 (Bethesda) ; 12(6)2022 05 30.
Article En | MEDLINE | ID: mdl-35348675

Despite decades of methods development for classifying relatives in genetic studies, pairwise relatedness methods' recalls are above 90% only for first through third-degree relatives. The top-performing approaches, which leverage identity-by-descent segments, often use only kinship coefficients, while others, including estimation of recent shared ancestry (ERSA), use the number of segments relatives share. To quantify the potential for using segment numbers in relatedness inference, we leveraged information theory measures to analyze exact (i.e. produced by a simulator) identity-by-descent segments from simulated relatives. Over a range of settings, we found that the mutual information between the relatives' degree of relatedness and a tuple of their kinship coefficient and segment number is on average 4.6% larger than between the degree and the kinship coefficient alone. We further evaluated identity-by-descent segment number utility by building a Bayes classifier to predict first through sixth-degree relationships using different feature sets. When trained and tested with exact segments, the inclusion of segment numbers improves the recall by between 0.28% and 3% for second through sixth-degree relatives. However, the recalls improve by less than 1.8% per degree when using inferred segments, suggesting limitations due to identity-by-descent detection accuracy. Last, we compared our Bayes classifier that includes segment numbers with both ERSA and IBIS and found comparable recalls, with the Bayes classifier and ERSA slightly outperforming each other across different degrees. Overall, this study shows that identity-by-descent segment numbers can improve relatedness inference, but errors from current SNP array-based detection methods yield dampened signals in practice.
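
The comparison described above, mutual information between the degree of relatedness and the kinship coefficient alone versus a (kinship, segment number) tuple, can be sketched with a plug-in estimator on discretized values. This is an illustrative sketch only, not the study's analysis: the function name, the binning into labels such as "high" or "many", and the four toy records are all hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Tuple


def mutual_information(pairs: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Plug-in estimate of I(X; Y) in bits from observed (x, y) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy records: (degree, kinship bin, segment-number bin)
records = [
    (2, "high", "many"),
    (2, "high", "many"),
    (3, "high", "few"),
    (3, "low", "few"),
]

mi_kinship = mutual_information((deg, kin) for deg, kin, _ in records)
mi_tuple = mutual_information((deg, (kin, seg)) for deg, kin, seg in records)
```

In the toy data the kinship bin alone cannot separate the second-degree pairs from one of the third-degree pairs, while the tuple can, so `mi_tuple` exceeds `mi_kinship`. Adding a feature can never lower the mutual information of the empirical distribution, so the size of the gain is the informative quantity.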


Genome, Human , Information Theory , Bayes Theorem , Humans , Pedigree , Polymorphism, Single Nucleotide
8.
G3 (Bethesda) ; 12(1)2022 01 04.
Article En | MEDLINE | ID: mdl-34791172

Recombination has essential functions in meiosis, evolution, and breeding. The frequency and distribution of crossovers dictate the generation of new allele combinations and can vary across species and between sexes. Here, we examine recombination landscapes across the 18 chromosomes of cassava (Manihot esculenta Crantz) with respect to male and female meioses and known introgressions from the wild relative Manihot glaziovii. We used SHAPEIT2 and duoHMM to infer crossovers from genotyping-by-sequencing data and a validated multigenerational pedigree from the International Institute of Tropical Agriculture cassava breeding germplasm consisting of 7020 informative meioses. We then constructed new genetic maps and compared them to an existing map previously constructed by the International Cassava Genetic Map Consortium. We observed higher recombination rates in females compared to males, and lower recombination rates in M. glaziovii introgression segments on chromosomes 1 and 4, with suppressed recombination along the entire length of the chromosome in the case of the chromosome 4 introgression. Finally, we discuss hypothesized mechanisms underlying our observations of heterochiasmy and crossover suppression and discuss the broader implications for plant breeding.


Manihot , Alleles , Manihot/genetics , Plant Breeding , Recombination, Genetic , Sex Characteristics
9.
Am J Hum Genet ; 108(9): 1792-1806, 2021 09 02.
Article En | MEDLINE | ID: mdl-34411538

The Finnish population is a unique example of a genetic isolate affected by a recent founder event. Previous studies have suggested that the ancestors of Finnic-speaking Finns and Estonians reached the circum-Baltic region by the 1st millennium BC. However, high linguistic similarity points to a more recent split of their languages. To study genetic connectedness between Finns and Estonians directly, we first assessed the efficacy of imputation of low-coverage ancient genomes by sequencing a medieval Estonian genome to high depth (23×) and evaluated the performance of its down-sampled replicas. We find that ancient genomes imputed from >0.1× coverage can be reliably used in principal-component analyses without projection. By searching for long shared allele intervals (LSAIs; similar to identity-by-descent segments) in unphased data for >143,000 present-day Estonians, 99 Finns, and 14 imputed ancient genomes from Estonia, we find unexpectedly high levels of individual connectedness between Estonians and Finns for the last eight centuries in contrast to their clear differentiation by allele frequencies. High levels of sharing of these segments between Estonians and Finns predate the demographic expansion and late settlement process of Finland. One plausible source of this extensive sharing is the 8th-10th centuries AD migration event from North Estonia to Finland that has been proposed to explain uniquely shared linguistic features between the Finnish language and the northern dialect of Estonian and shared Christianity-related loanwords from Slavic. These results suggest that LSAI detection provides a computationally tractable way to detect fine-scale structure in large cohorts.


Alleles , DNA, Ancient/analysis , Genome, Human , Human Migration/history , Pedigree , Estonia , Female , Finland , Gene Frequency , Genealogy and Heraldry , High-Throughput Nucleotide Sequencing , History, 21st Century , History, Ancient , History, Medieval , Humans , Language/history , Male
10.
Mutat Res Rev Mutat Res ; 787: 108364, 2021.
Article En | MEDLINE | ID: mdl-34083043

The purpose of this review is to evaluate the literature on the genotoxicity of cumene (CAS # 98-82-8) and to assess the role of mutagenicity, if any, in the mode of action for cumene-induced rodent tumors. The studies reviewed included microbial mutagenicity, DNA damage/repair, cytogenetic effects, and gene mutations. In reviewing these studies, attention was paid to their conformance to applicable OECD test guidelines, which are considered internationally recognized standards for performing these assays. Cumene was not a bacterial mutagen and did not induce Hprt mutations in CHO cell cultures. In primary rat hepatocyte cultures, cumene induced unscheduled DNA synthesis in one study, but this response could not be reproduced in an independent study using a similar protocol. In a study that is not fully compliant with the current OECD guideline, no increase in chromosomal aberrations was observed in CHO cells treated with cumene. The weight of the evidence (WoE) from multiple in vivo studies indicates that cumene is not a clastogen or aneugen. The weak positive response in an in vivo comet assay in the rat liver and mouse lung tissues is of questionable significance due to several study deficiencies. The genotoxicity profile of cumene does not match that of a classic DNA-reactive molecule, and the available data do not support a conclusion that cumene is an in vivo mutagen. As such, mutagenicity does not appear to be an early key event in cumene-induced rodent tumors, and alternate hypothesized non-mutagenic modes of action (MoA) are presented. Further data are necessary to rule in or rule out a particular MoA.


DNA Damage/physiology , Animals , CHO Cells , Comet Assay , Cricetulus , DNA Damage/genetics , Humans , Mutagenesis/genetics , Mutagenesis/physiology , Mutagenicity Tests , Mutation/genetics , Rats
11.
Am J Hum Genet ; 108(1): 68-83, 2021 01 07.
Article En | MEDLINE | ID: mdl-33385324

The proportion of samples with one or more close relatives in a genetic dataset increases rapidly with sample size, necessitating relatedness modeling and enabling pedigree-based analyses. Despite this, relatives are generally unreported and current inference methods typically detect only the degree of relatedness of sample pairs and not pedigree relationships. We developed CREST, an accurate and fast method that identifies the pedigree relationships of close relatives. CREST utilizes identity by descent (IBD) segments shared between a pair of samples and their mutual relatives, leveraging the fact that sharing rates among these individuals differ across pedigree configurations. Furthermore, CREST exploits the profound differences in sex-specific genetic maps to classify pairs as maternally or paternally related (e.g., paternal half-siblings) using the locations of autosomal IBD segments shared between the pair. In simulated data, CREST correctly classifies 91.5%-100% of grandparent-grandchild (GP) pairs, 80.0%-97.5% of avuncular (AV) pairs, and 75.5%-98.5% of half-sibling (HS) pairs compared to PADRE's rates of 38.5%-76.0% of GP, 60.5%-92.0% of AV, and 73.0%-95.0% of HS pairs. Turning to the real 20,032-sample Generation Scotland (GS) dataset, CREST identified seven pedigrees with incorrect relationship types or maternal/paternal parent sexes, five of which we confirmed as mistakes, and two with uncertain relationships. After correcting these, CREST correctly determines relationship types for 93.5% of GP, 97.7% of AV, and 92.2% of HS pairs that have sufficient mutual relative data; the parent sex in 100% of HS and 99.6% of GP pairs; and it completes this analysis in 2.8 h including IBD detection in eight threads.


Genome, Human/genetics , Female , Genetic Linkage/genetics , Genotype , Humans , Male , Models, Genetic , Pedigree , Scotland
12.
PLoS Genet ; 16(8): e1008895, 2020 08.
Article En | MEDLINE | ID: mdl-32760067

The sequencing of Neanderthal and Denisovan genomes has yielded many new insights about interbreeding events between extinct hominins and the ancestors of modern humans. While much attention has been paid to the relatively recent gene flow from Neanderthals and Denisovans into modern humans, other instances of introgression leave more subtle genomic evidence and have received less attention. Here, we present a major extension of the ARGweaver algorithm, called ARGweaver-D, which can infer local genetic relationships under a user-defined demographic model that includes population splits and migration events. This Bayesian algorithm probabilistically samples ancestral recombination graphs (ARGs) that specify not only tree topologies and branch lengths along the genome, but also indicate migrant lineages. The sampled ARGs can therefore be parsed to produce probabilities of introgression along the genome. We show that this method is well powered to detect the archaic migration into modern humans, even with only a few samples. We then show that the method can also detect introgressed regions stemming from older migration events, or from unsampled populations. We apply it to human, Neanderthal, and Denisovan genomes, looking for signatures of older proposed migration events, including ancient humans into Neanderthal, and unknown archaic hominins into Denisovans. We identify 3% of the Neanderthal genome that is putatively introgressed from ancient humans, and estimate that the gene flow occurred between 200-300 kya. We find no convincing evidence that negative selection acted against these regions. Finally, we predict that 1% of the Denisovan genome was introgressed from an unsequenced, but highly diverged, archaic hominin ancestor. About 15% of these "super-archaic" regions (comprising at least about 4 Mb) were, in turn, introgressed into modern humans and continue to exist in the genomes of people alive today.


Gene Flow , Models, Genetic , Neanderthals/genetics , Population/genetics , Recombination, Genetic , Animals , Evolution, Molecular , Human Migration , Humans
13.
Am J Hum Genet ; 106(4): 453-466, 2020 04 02.
Article En | MEDLINE | ID: mdl-32197076

Identity-by-descent (IBD) segments are a useful tool for applications ranging from demographic inference to relationship classification, but most detection methods rely on phasing information and therefore require substantial computation time. As genetic datasets grow, methods for inferring IBD segments that scale well will be critical. We developed IBIS, an IBD detector that locates long regions of allele sharing between unphased individuals, and benchmarked it with Refined IBD, GERMLINE, and TRUFFLE on 3,000 simulated individuals. Phasing these with Beagle 5 takes 4.3 CPU days, followed by either Refined IBD or GERMLINE segment detection in 2.9 or 1.1 h, respectively. By comparison, IBIS finishes in 6.8 min or 7.8 min with IBD2 functionality enabled: speedups of 805-946× including phasing time. TRUFFLE takes 2.6 h, corresponding to IBIS speedups of 20.2-23.3×. IBIS is also accurate, inferring ≥7 cM IBD segments at quality comparable to Refined IBD and GERMLINE. With these segments, IBIS classifies first through third degree relatives in real Mexican American samples at rates meeting or exceeding other methods tested and identifies fourth through sixth degree pairs at rates within 0.0%-2.0% of the top method. While allele frequency-based approaches that do not detect segments can infer relationship degrees faster than IBIS, the fastest are biased in admixed samples, with KING inferring 30.8% fewer fifth degree Mexican American relatives correctly compared with IBIS. Finally, we ran IBIS on chromosome 2 of the UK Biobank dataset and estimate its runtime on the autosomes to be 3.3 days parallelized across 128 cores.


Sequence Analysis/methods , Alleles , Chromosomes, Human, Pair 2/genetics , Gene Frequency/genetics , Genome, Human/genetics , Humans , Models, Genetic , Polymorphism, Single Nucleotide/genetics
14.
PLoS Genet ; 15(12): e1007979, 2019 12.
Article En | MEDLINE | ID: mdl-31860654

Simulations of close relatives and identical by descent (IBD) segments are common in genetic studies, yet most past efforts have utilized sex averaged genetic maps and ignored crossover interference, thus omitting features known to affect the breakpoints of IBD segments. We developed Ped-sim, a method for simulating relatives that can utilize either sex-specific or sex averaged genetic maps and also either a model of crossover interference or the traditional Poisson model for inter-crossover distances. To characterize the impact of previously ignored mechanisms, we simulated data for all four combinations of these factors. We found that modeling crossover interference decreases the standard deviation of pairwise IBD proportions by 10.4% on average in full siblings through second cousins. By contrast, sex-specific maps increase this standard deviation by 4.2% on average, and also impact the number of segments relatives share. Most notably, using sex-specific maps, the number of segments half-siblings share is bimodal; and when combined with interference modeling, the probability that sixth cousins have non-zero IBD sharing ranges from 9.0 to 13.1%, depending on the sexes of the individuals through which they are related. We present new analytical results for the distributions of IBD segments under these models and show they match results from simulations. Finally, we compared IBD sharing rates between simulated and real relatives and find that the combination of sex-specific maps and interference modeling most accurately captures IBD rates in real data. Ped-sim is open source and available from https://github.com/williamslab/ped-sim.
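
The "traditional Poisson model for inter-crossover distances" mentioned above can be sketched as follows. This is a minimal illustration and not Ped-sim's implementation: the function name and the chromosome lengths are hypothetical, positions are expressed in genetic distance (Morgans), and crossover interference is deliberately not modeled.

```python
import random
from typing import List


def simulate_crossovers_poisson(length_morgans: float, rng: random.Random) -> List[float]:
    """Crossover positions (in Morgans) on one chromosome in one meiosis.

    Under the Poisson (no-interference) model, distances between successive
    crossovers are exponentially distributed with a mean of 1 Morgan.
    """
    positions: List[float] = []
    pos = rng.expovariate(1.0)
    while pos < length_morgans:
        positions.append(pos)
        pos += rng.expovariate(1.0)
    return positions


rng = random.Random(0)
# Hypothetical sex-specific genetic lengths for the same chromosome
female_length, male_length = 3.0, 2.0
female_counts = [len(simulate_crossovers_poisson(female_length, rng)) for _ in range(20000)]
male_counts = [len(simulate_crossovers_poisson(male_length, rng)) for _ in range(20000)]
mean_female = sum(female_counts) / len(female_counts)
mean_male = sum(male_counts) / len(male_counts)
```

Because the number of crossovers under this model is Poisson with mean equal to the genetic length, `mean_female` and `mean_male` should land close to 3 and 2. A sex-specific map changes the simulation only through the length (and, on a physical scale, the placement) of each chromosome.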


Chromosome Mapping/methods , Computer Simulation , Sex Characteristics , Female , Genetic Variation , Genetics, Population , Genome, Human , Humans , Male , Models, Genetic , Pedigree , Poisson Distribution
15.
Food Chem Toxicol ; 131: 110554, 2019 Sep.
Article En | MEDLINE | ID: mdl-31207305

The results of a large 2-year bisphenol A (BPA) rat study conducted by the NTP, called the CLARITY-BPA Core Study, were recently released. This study addressed some of the toxicological issues associated with BPA, including endocrine disruption and non-monotonic dose responses (NMDR). The study involved oral gavage treatment of rats to BPA at doses of 2.5-25,000 µg/kg-bw/day. To address NMDR, the 81 statistically significant findings (based on the primary statistical tests) from the Core Study were evaluated using a recently published methodology that relies upon six checkpoints to determine if there is evidence for a NMDR. Failure to meet the majority of the checkpoints indicates limited evidence of NMDR. The analysis found that only 2 of the 81 findings met at least 5 of the checkpoints: an increase in percent basophils in stop-dose females and decreased total bile acids in stop-dose males. However, these findings are not concordant or consistent with those of other BPA data. Importantly, none of the endocrine-related or reproductive endpoints fulfilled at least 5 of the checkpoints. This analysis found limited evidence for NMDR associated with BPA treatment in the study. These results are consistent with the conclusions reached in the Core Study report.


Benzhydryl Compounds/toxicity , Endocrine Disruptors/toxicity , Phenols/toxicity , Animals , Basophils/metabolism , Bile Acids and Salts/metabolism , Dose-Response Relationship, Drug , Female , Male , Maternal Exposure/adverse effects , Pregnancy , Prenatal Exposure Delayed Effects , Rats, Sprague-Dawley
17.
BMC Bioinformatics ; 19(1): 478, 2018 Dec 12.
Article En | MEDLINE | ID: mdl-30541436

BACKGROUND: Researchers typically sequence a given individual multiple times, either re-sequencing the same DNA sample (technical replication) or sequencing different DNA samples collected on the same individual (biological replication) or both. Before merging the data from these replicate sequence runs, it is important to verify that no errors, such as DNA contamination or mix-ups, occurred during the data collection pipeline. Methods to detect such errors exist but are often ad hoc and cannot handle missing data, and several require phased data. Because they require some combination of genotype calling, imputation, and haplotype phasing, these methods are unsuitable for error detection in low- to moderate-depth sequence data where such tasks are difficult to perform accurately. Additionally, because most existing methods employ a pairwise-comparison approach for error detection rather than joint analysis of the putative replicates, results may be difficult to interpret. RESULTS: We introduce a new method for error detection suitable for shallow-, moderate-, and high-depth sequence data. Using Bayes' theorem, we calculate the posterior probability distribution over the set of relations describing the putative replicates and infer which of the samples originated from an identical genotypic source. CONCLUSIONS: Our method addresses key limitations of existing approaches and produced highly accurate results in simulation experiments. Our method is implemented as an R package called BIGRED (Bayes Inferred Genotype Replicate Error Detector), which is freely available for download: https://github.com/ac2278/BIGRED.
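
The joint (rather than pairwise) analysis described above can be sketched generically: enumerate every way of grouping the putative replicates into identical-source sets, then apply Bayes' theorem across those hypotheses. This sketch is in Python and is not the BIGRED R package or its API. The function names are hypothetical, the log-likelihood values are made up for illustration (a real analysis would derive them from read data), and the uniform prior is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple

Partition = Tuple[Tuple[str, ...], ...]


def set_partitions(items: Sequence[str]) -> List[Partition]:
    """All ways to group the samples into blocks sharing one genotypic source."""
    if not items:
        return [()]
    first, rest = items[0], items[1:]
    result: List[Partition] = []
    for part in set_partitions(rest):
        # Either `first` forms its own block ...
        result.append(((first,),) + part)
        # ... or it joins one of the existing blocks
        for i, block in enumerate(part):
            result.append(part[:i] + ((first,) + block,) + part[i + 1:])
    return result


def posterior(log_lik: Dict[Partition, float],
              log_prior: Dict[Partition, float]) -> Dict[Partition, float]:
    """Bayes' theorem in log space, normalized with the log-sum-exp trick."""
    log_post = {h: log_lik[h] + log_prior[h] for h in log_lik}
    m = max(log_post.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in log_post.values()))
    return {h: math.exp(v - log_z) for h, v in log_post.items()}


hypotheses = set_partitions(["A", "B", "C"])  # 5 relations for 3 putative replicates
uniform = {h: -math.log(len(hypotheses)) for h in hypotheses}
# Made-up log-likelihoods: the data favor "A and B match, C is a different source"
toy_log_lik = {h: -100.0 for h in hypotheses}
toy_log_lik[(("A", "B"), ("C",))] = -90.0
post = posterior(toy_log_lik, uniform)
best = max(post, key=post.get)
```

With these toy numbers the posterior concentrates on the relation in which sample C does not share a source with A and B, which is the kind of result that would flag C as a possible mix-up. Working in log space avoids underflow when likelihoods are products over many sites.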


Databases, Nucleic Acid/standards , Sequence Analysis, DNA/methods , Humans
18.
Am J Hum Genet ; 103(1): 30-44, 2018 07 05.
Article En | MEDLINE | ID: mdl-29937093

As genetic datasets increase in size, the fraction of samples with one or more close relatives grows rapidly, resulting in sets of mutually related individuals. We present DRUID (deep relatedness utilizing identity by descent), a method that works by inferring the identical-by-descent (IBD) sharing profile of an ungenotyped ancestor of a set of close relatives. Using this IBD profile, DRUID infers relatedness between unobserved ancestors and more distant relatives, thereby combining information from multiple samples to remove one or more generations between the deep relationships to be identified. DRUID constructs sets of close relatives by detecting full siblings and also uses an approach to identify the aunts/uncles of two or more siblings, recovering 92.2% of real aunts/uncles with zero false positives. In real and simulated data, DRUID correctly infers up to 10.5% more relatives than PADRE when using data from two sets of distantly related siblings, and 10.7%-31.3% more relatives given two sets of siblings and their aunts/uncles. DRUID frequently infers relationships either correctly or within one degree of the truth, with PADRE classifying 43.3%-58.3% of tenth degree relatives in this way compared to 79.6%-96.7% using DRUID.


Genome, Human/genetics , Polymorphism, Single Nucleotide/genetics , Female , Genetics, Population/methods , Humans , Male , Pedigree , Siblings
19.
Proc Natl Acad Sci U S A ; 115(2): 379-384, 2018 01 09.
Article En | MEDLINE | ID: mdl-29279374

A major challenge in evaluating the contribution of rare variants to complex disease is identifying enough copies of the rare alleles to permit informative statistical analysis. To investigate the contribution of rare variants to the risk of type 2 diabetes (T2D) and related traits, we performed deep whole-genome analysis of 1,034 members of 20 large Mexican-American families with high prevalence of T2D. If rare variants of large effect accounted for much of the diabetes risk in these families, our experiment was powered to detect association. Using gene expression data on 21,677 transcripts for 643 pedigree members, we identified evidence for large-effect rare-variant cis-expression quantitative trait loci that could not be detected in population studies, validating our approach. However, we did not identify any rare variants of large effect associated with T2D, or the related traits of fasting glucose and insulin, suggesting that large-effect rare variants account for only a modest fraction of the genetic risk of these traits in this sample of families. Reliable identification of large-effect rare variants will require larger samples of extended pedigrees or different study designs that further enrich for such variants.


Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation , Mexican Americans/genetics , Diabetes Mellitus, Type 2/ethnology , Diabetes Mellitus, Type 2/pathology , Family Health , Female , Gene Frequency , Genetic Predisposition to Disease/ethnology , Genome-Wide Association Study/methods , Genotype , Humans , Male , Pedigree , Phenotype , Quantitative Trait Loci/genetics , Whole Genome Sequencing/methods
20.
Diabetes ; 66(11): 2903-2914, 2017 11.
Article En | MEDLINE | ID: mdl-28838971

Type 2 diabetes (T2D) affects more than 415 million people worldwide, and its costs to the health care system continue to rise. To identify common or rare genetic variation with potential therapeutic implications for T2D, we analyzed and replicated genome-wide protein coding variation in a total of 8,227 individuals with T2D and 12,966 individuals without T2D of Latino descent. We identified a novel genetic variant in the IGF2 gene associated with ∼20% reduced risk for T2D. This variant, which has an allele frequency of 17% in the Mexican population but is rare in Europe, prevents splicing between IGF2 exons 1 and 2. We show in vitro and in human liver and adipose tissue that the variant is associated with a specific, allele-dosage-dependent reduction in the expression of IGF2 isoform 2. In individuals who do not carry the protective allele, expression of IGF2 isoform 2 in adipose is positively correlated with both incidence of T2D and increased plasma glycated hemoglobin in individuals without T2D, providing support that the protective effects are mediated by reductions in IGF2 isoform 2. Broad phenotypic examination of carriers of the protective variant revealed no association with other disease states or impaired reproductive health. These findings suggest that reducing IGF2 isoform 2 expression in relevant tissues has potential as a new therapeutic strategy for T2D, even beyond the Latin American population, with no major adverse effects on health or reproduction.


Diabetes Mellitus, Type 2/genetics , Insulin-Like Growth Factor II/metabolism , RNA Splice Sites/genetics , Adipose Tissue , Cell Line , Gene Expression Regulation/physiology , Genetic Variation , Genotype , Humans , Insulin-Like Growth Factor II/genetics , Liver , Mexican Americans/genetics , Mexico , Protein Isoforms , Stem Cells , White People
...