Results 1 - 20 of 18,779
1.
Cell ; 182(2): 317-328.e10, 2020 07 23.
Article in English | MEDLINE | ID: mdl-32526205

ABSTRACT

Hepatocellular carcinoma (HCC) is an aggressive malignancy with its global incidence and mortality rate continuing to rise, although early detection and surveillance are suboptimal. We performed serological profiling of the viral infection history in 899 individuals from an NCI-UMD case-control study using a synthetic human virome, VirScan. We developed a viral exposure signature and validated the results in a longitudinal cohort with 173 at-risk patients who had long-term follow-up for HCC development. Our viral exposure signature significantly associated with HCC status among at-risk individuals in the validation cohort (area under the curve: 0.91 [95% CI 0.87-0.96] at baseline and 0.98 [95% CI 0.97-1] at diagnosis). The signature identified cancer patients prior to a clinical diagnosis and was superior to alpha-fetoprotein. In summary, we established a viral exposure signature that can predict HCC among at-risk patients prior to a clinical diagnosis, which may be useful in HCC surveillance.


Subject(s)
Carcinoma, Hepatocellular/pathology , Liver Neoplasms/pathology , Virus Diseases/pathology , Adult , Aged , Area Under Curve , Carcinoma, Hepatocellular/genetics , Carcinoma, Hepatocellular/metabolism , Case-Control Studies , Cohort Studies , Databases, Genetic , Female , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Liver Neoplasms/genetics , Liver Neoplasms/metabolism , Male , Middle Aged , Polymorphism, Single Nucleotide , ROC Curve , Risk Factors , Virus Diseases/complications , Young Adult , alpha-Fetoproteins/analysis
2.
Cell ; 175(6): 1679-1687.e7, 2018 11 29.
Article in English | MEDLINE | ID: mdl-30343897

ABSTRACT

Multiple sclerosis is a complex neurological disease, with ∼20% of risk heritability attributable to common genetic variants, including >230 identified by genome-wide association studies. Multiple strands of evidence suggest that much of the remaining heritability is also due to additive effects of common variants rather than epistasis between these variants or mutations exclusive to individual families. Here, we show in 68,379 cases and controls that up to 5% of this heritability is explained by low-frequency variation in gene coding sequence. We identify four novel genes driving MS risk independently of common-variant signals, highlighting key pathogenic roles for regulatory T cell homeostasis and regulation, IFNγ biology, and NFκB signaling. As low-frequency variants do not show substantial linkage disequilibrium with other variants, and as coding variants are more interpretable and experimentally tractable than non-coding variation, our discoveries constitute a rich resource for dissecting the pathobiology of MS.


Subject(s)
Epistasis, Genetic , Genetic Predisposition to Disease , Linkage Disequilibrium , Multiple Sclerosis/genetics , Mutation , Open Reading Frames , Female , Genome-Wide Association Study , Humans , Male , Multiple Sclerosis/immunology , Risk Factors
3.
Cell ; 175(3): 848-858.e6, 2018 10 18.
Article in English | MEDLINE | ID: mdl-30318150

ABSTRACT

In familial searching in forensic genetics, a query DNA profile is tested against a database to determine whether it represents a relative of a database entrant. We examine the potential for using linkage disequilibrium to identify pairs of profiles as belonging to relatives when the query and database rely on nonoverlapping genetic markers. Considering data on individuals genotyped with both microsatellites used in forensic applications and genome-wide SNPs, we find that ∼30%-32% of parent-offspring pairs and ∼35%-36% of sib pairs can be identified from the SNPs of one member of the pair and the microsatellites of the other. The method suggests the possibility of performing familial searches of microsatellite databases using query SNP profiles, or vice versa. It also reveals that privacy concerns arising from computations across multiple databases that share no genetic markers in common entail risks, not only for database entrants, but for their close relatives as well.


Subject(s)
Family , Forensic Genetics/methods , Genetics, Population/methods , Genotyping Techniques/methods , Polymorphism, Single Nucleotide , Female , Humans , Linkage Disequilibrium , Male , Microsatellite Repeats , Models, Genetic , Models, Statistical , Pedigree
4.
Nat Immunol ; 20(7): 824-834, 2019 07.
Article in English | MEDLINE | ID: mdl-31209403

ABSTRACT

Multiple genome-wide studies have identified associations between outcome of human immunodeficiency virus (HIV) infection and polymorphisms in and around the gene encoding the HIV co-receptor CCR5, but the functional basis for the strongest of these associations, rs1015164A/G, is unknown. We found that rs1015164 marks variation in an activating transcription factor 1 binding site that controls expression of the antisense long noncoding RNA (lncRNA) CCR5AS. Knockdown or enhancement of CCR5AS expression resulted in a corresponding change in CCR5 expression on CD4+ T cells. CCR5AS interfered with interactions between the RNA-binding protein Raly and the CCR5 3' untranslated region, protecting CCR5 messenger RNA from Raly-mediated degradation. Reduction in CCR5 expression through inhibition of CCR5AS diminished infection of CD4+ T cells with CCR5-tropic HIV in vitro. These data represent a rare determination of the functional importance of a genome-wide disease association where expression of a lncRNA affects HIV infection and disease progression.


Subject(s)
Gene Expression Regulation , Genetic Variation , HIV Infections/genetics , HIV Infections/virology , HIV-1 , RNA, Antisense/genetics , RNA, Long Noncoding/genetics , Receptors, CCR5/genetics , 3' Untranslated Regions , Alleles , Biomarkers , CD4-Positive T-Lymphocytes/immunology , CD4-Positive T-Lymphocytes/metabolism , CD4-Positive T-Lymphocytes/virology , Cell Membrane/metabolism , Genes, Reporter , Genotype , HIV Infections/metabolism , Humans , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Population Groups/genetics , Prognosis , RNA Stability , RNA, Messenger/genetics , RNA, Messenger/metabolism , Receptors, CCR5/metabolism , Viral Load
5.
Nature ; 617(7962): 755-763, 2023 05.
Article in English | MEDLINE | ID: mdl-37198480

ABSTRACT

Despite broad agreement that Homo sapiens originated in Africa, considerable uncertainty surrounds specific models of divergence and migration across the continent [1]. Progress is hampered by a shortage of fossil and genomic data, as well as variability in previous estimates of divergence times [1]. Here we seek to discriminate among such models by considering linkage disequilibrium and diversity-based statistics, optimized for rapid, complex demographic inference [2]. We infer detailed demographic models for populations across Africa, including eastern and western representatives, and newly sequenced whole genomes from 44 Nama (Khoe-San) individuals from southern Africa. We infer a reticulated African population history in which present-day population structure dates back to Marine Isotope Stage 5. The earliest population divergence among contemporary populations occurred 120,000 to 135,000 years ago and was preceded by links between two or more weakly differentiated ancestral Homo populations connected by gene flow over hundreds of thousands of years. Such weakly structured stem models explain patterns of polymorphism that had previously been attributed to contributions from archaic hominins in Africa [2-7]. In contrast to models with archaic introgression, we predict that fossil remains from coexisting ancestral populations should be genetically and morphologically similar, and that only an inferred 1-4% of genetic differentiation among contemporary human populations can be attributed to genetic drift between stem populations. We show that model misspecification explains the variation in previous estimates of divergence times, and argue that studying a range of models is key to making robust inferences about deep history.


Subject(s)
Genetics, Population , Human Migration , Phylogeny , Humans , Africa/ethnology , Fossils , Gene Flow , Genetic Drift , Genetic Introgression , Genome, Human , History, Ancient , Human Migration/history , Linkage Disequilibrium/genetics , Polymorphism, Genetic , Time Factors
6.
Nature ; 606(7914): 527-534, 2022 06.
Article in English | MEDLINE | ID: mdl-35676474

ABSTRACT

Missing heritability in genome-wide association studies defines a major problem in genetic analyses of complex biological traits [1,2]. The solution to this problem is to identify all causal genetic variants and to measure their individual contributions [3,4]. Here we report a graph pangenome of tomato constructed by precisely cataloguing more than 19 million variants from 838 genomes, including 32 new reference-level genome assemblies. This graph pangenome was used for genome-wide association study analyses and heritability estimation of 20,323 gene-expression and metabolite traits. The average estimated trait heritability is 0.41 compared with 0.33 when using the single linear reference genome. This 24% increase in estimated heritability is largely due to resolving incomplete linkage disequilibrium through the inclusion of additional causal structural variants identified using the graph pangenome. Moreover, by resolving allelic and locus heterogeneity, structural variants improve the power to identify genetic factors underlying agronomically important traits leading to, for example, the identification of two new genes potentially contributing to soluble solid content. The newly identified structural variants will facilitate genetic improvement of tomato through both marker-assisted selection and genomic selection. Our study advances the understanding of the heritability of complex traits and demonstrates the power of the graph pangenome in crop breeding.
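
As a quick check of the arithmetic in the abstract above, the reported 24% appears to be the relative (not absolute) gain in mean estimated heritability. The symbols $h^2_{\text{graph}}$ and $h^2_{\text{linear}}$ are labels introduced here for the two reported averages; they are not notation taken from the paper.

```latex
\[
\frac{h^2_{\text{graph}} - h^2_{\text{linear}}}{h^2_{\text{linear}}}
  = \frac{0.41 - 0.33}{0.33}
  \approx 0.24
\]
```

The absolute difference between the two averages is 0.08.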


Subject(s)
Genetic Variation , Genome, Plant , Genome-Wide Association Study , Plant Breeding , Solanum lycopersicum , Alleles , Crops, Agricultural/genetics , Genome, Plant/genetics , Linkage Disequilibrium , Solanum lycopersicum/genetics , Solanum lycopersicum/metabolism
7.
Nature ; 610(7933): 704-712, 2022 10.
Article in English | MEDLINE | ID: mdl-36224396

ABSTRACT

Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes [1]. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel [2]) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10-20% (14-24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries.


Subject(s)
Body Height , Chromosome Mapping , Polymorphism, Single Nucleotide , Humans , Body Height/genetics , Gene Frequency/genetics , Genome, Human/genetics , Genome-Wide Association Study , Haplotypes/genetics , Linkage Disequilibrium/genetics , Polymorphism, Single Nucleotide/genetics , Europe/ethnology , Sample Size , Phenotype
8.
Nature ; 602(7895): 106-111, 2022 02.
Article in English | MEDLINE | ID: mdl-34883497

ABSTRACT

Host genetic factors can confer resistance against malaria [1], raising the question of whether this has led to evolutionary adaptation of parasite populations. Here we searched for association between candidate host and parasite genetic variants in 3,346 Gambian and Kenyan children with severe malaria caused by Plasmodium falciparum. We identified a strong association between sickle haemoglobin (HbS) in the host and three regions of the parasite genome, which is not explained by population structure or other covariates, and which is replicated in additional samples. The HbS-associated alleles include nonsynonymous variants in the gene for the acyl-CoA synthetase family member [2-4] PfACS8 on chromosome 2, in a second region of chromosome 2, and in a region containing structural variation on chromosome 11. The alleles are in strong linkage disequilibrium and have frequencies that covary with the frequency of HbS across populations, in particular being much more common in Africa than other parts of the world. The estimated protective effect of HbS against severe malaria, as determined by comparison of cases with population controls, varies greatly according to the parasite genotype at these three loci. These findings open up a new avenue of enquiry into the biological and epidemiological significance of the HbS-associated polymorphisms in the parasite genome and the evolutionary forces that have led to their high frequency and strong linkage disequilibrium in African P. falciparum populations.


Subject(s)
Genotype , Hemoglobin, Sickle/genetics , Host Adaptation/genetics , Malaria, Falciparum/blood , Malaria, Falciparum/parasitology , Parasites/genetics , Plasmodium falciparum/genetics , Alleles , Animals , Child , Female , Gambia/epidemiology , Genes, Protozoan/genetics , Humans , Kenya/epidemiology , Linkage Disequilibrium , Malaria, Falciparum/epidemiology , Male , Polymorphism, Genetic
9.
Am J Hum Genet ; 111(5): 966-978, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38701746

ABSTRACT

Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.


Subject(s)
Asthma , Genome-Wide Association Study , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Humans , Asthma/genetics , Markov Chains , Colitis, Ulcerative/genetics , Reproducibility of Results , Phenotype , Genotype
10.
Am J Hum Genet ; 111(5): 990-995, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38636510

ABSTRACT

Since genotype imputation was introduced, researchers have been relying on the estimated imputation quality from imputation software to perform post-imputation quality control (QC). However, this quality estimate (denoted as Rsq) performs less well for lower-frequency variants. We recently published MagicalRsq, a machine-learning-based imputation quality calibration, which leverages additional typed markers from the same cohort and outperforms Rsq as a QC metric. In this work, we extended the original MagicalRsq to allow cross-cohort model training and named the new model MagicalRsq-X. We removed the cohort-specific estimated minor allele frequency and included linkage disequilibrium scores and recombination rates as additional features. Leveraging whole-genome sequencing data from TOPMed, specifically participants in the BioMe, JHS, WHI, and MESA studies, we performed comprehensive cross-cohort evaluations for predominantly European and African ancestral individuals based on their inferred global ancestry with the 1000 Genomes and Human Genome Diversity Project data as reference. Our results suggest MagicalRsq-X outperforms Rsq in almost every setting, with 7.3%-14.4% improvement in squared Pearson correlation with true R², corresponding to 85-218 K variant gains. We further developed a metric to quantify the genetic distances of a target cohort relative to a reference cohort and showed that such metric largely explained the performance of MagicalRsq-X models. Finally, we found MagicalRsq-X saved up to 53 known genome-wide significant variants in one of the largest blood cell trait GWASs that would be missed using the original Rsq for QC. In conclusion, MagicalRsq-X shows superiority for post-imputation QC and benefits genetic studies by distinguishing well and poorly imputed lower-frequency variants.


Subject(s)
Gene Frequency , Genotype , Polymorphism, Single Nucleotide , Software , Humans , Cohort Studies , Linkage Disequilibrium , Genome-Wide Association Study/methods , Genome, Human , Quality Control , Machine Learning , Whole Genome Sequencing/standards , Whole Genome Sequencing/methods
11.
Genome Res ; 34(2): 300-309, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38355307

ABSTRACT

Expression and splicing quantitative trait loci (e/sQTL) are large contributors to phenotypic variability. Achieving sufficient statistical power for e/sQTL mapping requires large cohorts with both genotypes and molecular phenotypes, and so, the genomic variation is often called from short-read alignments, which are unable to comprehensively resolve structural variation. Here we build a pangenome from 16 HiFi haplotype-resolved cattle assemblies to identify small and structural variation and genotype them with PanGenie in 307 short-read samples. We find high (>90%) concordance of PanGenie-genotyped and DeepVariant-called small variation and confidently genotype close to 21 million small and 43,000 structural variants in the larger population. We validate 85% of these structural variants (with MAF > 0.1) directly with a subset of 25 short-read samples that also have medium coverage HiFi reads. We then conduct e/sQTL mapping with this comprehensive variant set in a subset of 117 cattle that have testis transcriptome data, and find 92 structural variants as causal candidates for eQTL and 73 for sQTL. We find that roughly half of the top associated structural variants affecting expression or splicing are transposable elements, such as SV-eQTL for STN1 and MYH7 and SV-sQTL for CEP89 and ASAH2. Extensive linkage disequilibrium between small and structural variation results in only 28 additional eQTL and 17 sQTL discovered when including SVs, although many top associated SVs are compelling candidates.


Subject(s)
Quantitative Trait Loci , RNA Splicing , Male , Cattle/genetics , Animals , Genotype , Phenotype , Linkage Disequilibrium , Genomic Structural Variation
12.
Genome Res ; 34(1): 70-84, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38071472

ABSTRACT

Meiotic recombination is crucial for human genetic diversity and chromosome segregation accuracy. Understanding its variation across individuals and the processes by which it goes awry are long-standing goals in human genetics. Current approaches for inferring recombination landscapes rely either on population genetic patterns of linkage disequilibrium (LD), which capture a time-averaged view, or on direct detection of crossovers in gametes or multigeneration pedigrees, which limits data set scale and availability. Here, we introduce an approach for inferring sex-specific recombination landscapes using data from preimplantation genetic testing for aneuploidy (PGT-A). This method relies on low-coverage (<0.05×) whole-genome sequencing of in vitro fertilized (IVF) embryo biopsies. To overcome the data sparsity, our method exploits its inherent relatedness structure, knowledge of haplotypes from external population reference panels, and the frequent occurrence of monosomies in embryos, whereby the remaining chromosome is phased by default. Extensive simulations show our method's high accuracy, even at coverages as low as 0.02×. Applying this method to PGT-A data from 18,967 embryos, we mapped 70,660 recombination events with ∼150 kbp resolution, replicating established sex-specific recombination patterns. We observed a reduced total length of the female genetic map in trisomies compared with disomies, as well as chromosome-specific alterations in crossover distributions. Based on haplotype configurations in pericentromeric regions, our data indicate chromosome-specific propensities for different mechanisms of meiotic error. Our results provide a comprehensive view of the role of aberrant meiotic recombination in the origins of human aneuploidies and offer a versatile tool for mapping crossovers in low-coverage sequencing data from multiple siblings.


Subject(s)
Aneuploidy , Genetic Testing , Male , Humans , Female , Genetic Testing/methods , Chromosome Aberrations , Linkage Disequilibrium , Pedigree
13.
Nat Immunol ; 21(2): 112-114, 2020 02.
Article in English | MEDLINE | ID: mdl-31959978
14.
Nature ; 600(7890): 675-679, 2021 12.
Article in English | MEDLINE | ID: mdl-34887591

ABSTRACT

Increased blood lipid levels are heritable risk factors of cardiovascular disease with varied prevalence worldwide owing to different dietary patterns and medication use [1]. Despite advances in prevention and treatment, in particular through reducing low-density lipoprotein cholesterol levels [2], heart disease remains the leading cause of death worldwide [3]. Genome-wide association studies (GWAS) of blood lipid levels have led to important biological and clinical insights, as well as new drug targets, for cardiovascular disease. However, most previous GWAS [4-23] have been conducted in European ancestry populations and may have missed genetic variants that contribute to lipid-level variation in other ancestry groups. These include differences in allele frequencies, effect sizes and linkage-disequilibrium patterns [24]. Here we conduct a multi-ancestry, genome-wide genetic discovery meta-analysis of lipid levels in approximately 1.65 million individuals, including 350,000 of non-European ancestries. We quantify the gain in studying non-European ancestries and provide evidence to support the expansion of recruitment of additional ancestries, even with relatively small sample sizes. We find that increasing diversity rather than studying additional individuals of European ancestry results in substantial improvements in fine-mapping functional variants and portability of polygenic prediction (evaluated in approximately 295,000 individuals from 7 ancestry groupings). Modest gains in the number of discovered loci and ancestry-specific variants were also achieved. As GWAS expand emphasis beyond the identification of genes and fundamental biology towards the use of genetic variants for preventive and precision medicine [25], we anticipate that increased diversity of participants will lead to more accurate and equitable [26] application of polygenic scores in clinical practice.


Subject(s)
Cardiovascular Diseases , Genome-Wide Association Study , Cardiovascular Diseases/genetics , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study/methods , Humans , Linkage Disequilibrium , Multifactorial Inheritance , Polymorphism, Single Nucleotide/genetics , Population Groups
15.
PLoS Genet ; 20(1): e1010929, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38271473

ABSTRACT

Genome-wide association studies (GWASs) have achieved remarkable success in associating thousands of genetic variants with complex traits. However, the presence of linkage disequilibrium (LD) makes it challenging to identify the causal variants. To address this critical gap from association to causation, many fine-mapping methods have been proposed to assign well-calibrated probabilities of causality to candidate variants, taking into account the underlying LD pattern. In this manuscript, we introduce a statistical framework that incorporates expression quantitative trait locus (eQTL) information into fine-mapping, built on the sum of single-effects (SuSiE) regression model. Our new method, SuSiE2, connects two SuSiE models, one for eQTL analysis and one for genetic fine-mapping. This is achieved by first computing the posterior inclusion probabilities (PIPs) from an eQTL-based SuSiE model with the expression level of the candidate gene as the phenotype. These calculated PIPs are then utilized as prior inclusion probabilities for risk variants in another SuSiE model for the trait of interest. By prioritizing functional variants within the candidate region using eQTL information, SuSiE2 improves SuSiE by increasing the detection rate of causal SNPs and reducing the average size of credible sets. We compared the performance of SuSiE2 with other multi-trait fine-mapping methods with respect to power, coverage, and precision through simulations and applications to the GWAS results of Alzheimer's disease (AD) and body mass index (BMI). Our results demonstrate the better performance of SuSiE2, both when the in-sample LD matrix is used in inference and when an external reference panel is used.


Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci , Quantitative Trait Loci/genetics , Genome-Wide Association Study/methods , Chromosome Mapping/methods , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide
16.
PLoS Genet ; 20(4): e1011212, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38630784

ABSTRACT

Population differences in risk of disease are common, but the potential genetic basis for these differences is not well understood. A standard approach is to compare genetic risk across populations by testing for mean differences in polygenic scores, but existing studies that use this approach do not account for statistical noise in effect estimates (i.e., the GWAS betas) that arises due to the finite sample size of GWAS training data. Here, we show using Bayesian polygenic score methods that the level of uncertainty in estimates of genetic risk differences across populations is highly dependent on the GWAS training sample size, the polygenicity (number of causal variants), and genetic distance (FST) between the populations considered. We derive a Wald test for formally assessing the difference in genetic risk across populations, which we show to have calibrated type 1 error rates under a simplified assumption that all SNPs are independent, which we achieve in practice using linkage disequilibrium (LD) pruning. We further provide closed-form expressions for assessing the uncertainty in estimates of relative genetic risk across populations under the special case of an infinitesimal genetic architecture. We suggest that for many complex traits and diseases, particularly those with more polygenic architectures, current GWAS sample sizes are insufficient to detect moderate differences in genetic risk across populations, though more substantial differences in relative genetic risk (relative risk > 1.5) can be detected. We show that conventional approaches that do not account for sampling error from the training sample, such as using a simple t-test, have very high type 1 error rates. When applying our approach to prostate cancer, we demonstrate a higher genetic risk in men of African ancestry, with lower risk in men of European followed by East Asian ancestry.


Subject(s)
Multifactorial Inheritance , Prostatic Neoplasms , Male , Humans , Bayes Theorem , Risk Factors , Linkage Disequilibrium , Genome-Wide Association Study , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide
17.
Am J Hum Genet ; 110(4): 575-591, 2023 04 06.
Article in English | MEDLINE | ID: mdl-37028392

ABSTRACT

Leveraging linkage disequilibrium (LD) patterns as representative of population substructure enables the discovery of additive association signals in genome-wide association studies (GWASs). Standard GWASs are well-powered to interrogate additive models; however, new approaches are required for investigating other modes of inheritance such as dominance and epistasis. Epistasis, or non-additive interaction between genes, exists across the genome but often goes undetected because of a lack of statistical power. Furthermore, the adoption of LD pruning as customary in standard GWASs excludes detection of sites that are in LD but might underlie the genetic architecture of complex traits. We hypothesize that uncovering long-range interactions between loci with strong LD due to epistatic selection can elucidate genetic mechanisms underlying common diseases. To investigate this hypothesis, we tested for associations between 23 common diseases and 5,625,845 epistatic SNP-SNP pairs (determined by Ohta's D statistics) in long-range LD (>0.25 cM). Across five disease phenotypes, we identified one significant and four near-significant associations that replicated in two large genotype-phenotype datasets (UK Biobank and eMERGE). The genes that were most likely involved in the replicated associations were (1) members of highly conserved gene families with complex roles in multiple pathways, (2) essential genes, and/or (3) genes that were associated in the literature with complex traits that display variable expressivity. These results support the highly pleiotropic and conserved nature of variants in long-range LD under epistatic selection. Our work supports the hypothesis that epistatic interactions regulate diverse clinical mechanisms and might especially be driving factors in conditions with a wide range of phenotypic outcomes.


Subject(s)
Epistasis, Genetic , Genome-Wide Association Study , Linkage Disequilibrium/genetics , Genotype , Biological Specimen Banks , United Kingdom , Polymorphism, Single Nucleotide/genetics
18.
Am J Hum Genet ; 110(1): 30-43, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36608683

ABSTRACT

Gene-based association tests aggregate multiple SNP-trait associations into sets defined by gene boundaries and are widely used in post-GWAS analysis. A common approach for gene-based tests is to combine SNP associations by computing the sum of χ² statistics. However, this strategy ignores the directions of SNP effects, which could result in a loss of power for SNPs with masking effects, e.g., when the product of two SNP effects and the linkage disequilibrium (LD) correlation is negative. Here, we introduce "mBAT-combo," a set-based test that is better powered than other methods to detect multi-SNP associations in the context of masking effects. We validate the method through simulations and applications to real data. We find that of 35 blood and urine biomarker traits in the UK Biobank, 34 traits show evidence for masking effects in a total of 4,273 gene-trait pairs, indicating that masking effects are common in complex traits. We further validate the improved power of our method in height, body mass index, and schizophrenia with different GWAS sample sizes and show that on average 95.7% of the genes detected only by mBAT-combo with smaller sample sizes can be identified by the single-SNP approach with a 1.7-fold increase in sample sizes. Eleven genes significant only in mBAT-combo for schizophrenia are confirmed by functionally informed fine-mapping or Mendelian randomization integrating gene expression data. The framework of mBAT-combo can be applied to any set of SNPs to refine trait-association signals hidden in genomic regions with complex LD structures.
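
The masking condition mentioned in the abstract above can be illustrated with a small two-SNP sketch. This is not the mBAT-combo test, whose details are not given here. It assumes standardized genotypes and phenotype, in which case the expected marginal (single-SNP) effect of SNP 1 is `b1 + r * b2`, and it approximates each single-SNP z-score as `sqrt(n)` times the marginal effect. The function names and all numeric values are hypothetical.

```python
# Illustrative sketch only: this is NOT the mBAT-combo test, just the
# masking arithmetic for two SNPs under standardized genotypes.


def marginal_effects(b1: float, b2: float, r: float) -> tuple[float, float]:
    """Expected marginal (single-SNP) effects, given joint effects b1, b2
    and the LD correlation r between the two SNPs."""
    return b1 + r * b2, b2 + r * b1


def is_masking(b1: float, b2: float, r: float) -> bool:
    """Masking as described in the abstract: the product of the two SNP
    effects and the LD correlation is negative."""
    return b1 * b2 * r < 0


def sum_chi2(effects, n: int) -> float:
    """Approximate sum of single-SNP chi-square statistics, assuming each
    z-score is roughly sqrt(n) * marginal effect (small effects)."""
    return sum(n * e * e for e in effects)


joint = (0.10, 0.10)  # hypothetical joint effects, same direction
r = -0.8              # hypothetical negative LD correlation
n = 10_000            # hypothetical sample size

masked = marginal_effects(*joint, r)      # effects seen one SNP at a time
unlinked = marginal_effects(*joint, 0.0)  # same joint effects, no LD

print(is_masking(*joint, r))
print(sum_chi2(masked, n), sum_chi2(unlinked, n))
```

With these hypothetical values each marginal effect shrinks from 0.10 to about 0.02, so the summed chi-square statistic falls from roughly 200 to roughly 8 even though the joint effects are unchanged. That loss of signal is what a test ignoring effect directions cannot recover.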


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Genome-Wide Association Study/methods , Phenotype , Linkage Disequilibrium , Genomics , Polymorphism, Single Nucleotide/genetics
19.
Genome Res ; 33(7): 1015-1022, 2023 07.
Article in English | MEDLINE | ID: mdl-37349109

ABSTRACT

Although rates of recombination events across the genome (genetic maps) are fundamental to genetic research, the majority of current studies only use one standard map. There is evidence suggesting population differences in genetic maps, and thus estimating population-specific maps is of interest. Although the recent availability of biobank-scale data offers such opportunities, current methods are not efficient at leveraging very large sample sizes. The most accurate methods are still linkage disequilibrium (LD)-based methods that are only tractable for a few hundred samples. In this work, we propose a fast and memory-efficient method for estimating genetic maps from population genotyping data. Our method, FastRecomb, leverages the efficient positional Burrows-Wheeler transform (PBWT) data structure for counting IBD segment boundaries as potential recombination events. We used PBWT blocks to avoid redundant counting of pairwise matches. Moreover, we used a panel-smoothing technique to reduce the noise from errors and recent mutations. Using simulation, we found that FastRecomb achieves state-of-the-art performance at 10-kb resolution, in terms of correlation coefficients between the estimated map and the ground truth. This is mainly because FastRecomb can effectively take advantage of large panels comprising more than hundreds of thousands of haplotypes. At the same time, other methods lack the efficiency to handle such data. We believe further refinement of FastRecomb would deliver more accurate genetic maps for the genetics community.
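
For readers unfamiliar with the PBWT, the sketch below shows the standard prefix-array and divergence-array update on a toy binary haplotype panel. It is a minimal illustration of the data structure only and makes no attempt to reproduce FastRecomb's block-based counting of IBD segment boundaries or its panel smoothing; the function name `pbwt` and the toy panel are hypothetical.

```python
def pbwt(haplotypes):
    """Build the final PBWT prefix and divergence arrays.

    haplotypes: list of equal-length lists of 0/1 alleles.
    Returns (a, d): a orders haplotypes by their reversed prefixes;
    d[i] is the first site from which haplotype a[i] matches a[i-1]
    up to the last site (d[i] == number of sites means no match).
    """
    n_haps = len(haplotypes)
    n_sites = len(haplotypes[0]) if n_haps else 0
    a = list(range(n_haps))   # initial order: as given
    d = [0] * n_haps          # initial divergence: matches start at site 0

    for k in range(n_sites):
        a0, a1, d0, d1 = [], [], [], []
        p = q = k + 1         # match start for the next 0-allele / 1-allele entry
        for idx, div in zip(a, d):
            # Track the latest match start seen since the previous
            # haplotype carrying the same allele at site k.
            if div > p:
                p = div
            if div > q:
                q = div
            if haplotypes[idx][k] == 0:
                a0.append(idx)
                d0.append(p)
                p = 0
            else:
                a1.append(idx)
                d1.append(q)
                q = 0
        # Stable partition: 0-allele haplotypes first, then 1-allele ones.
        a, d = a0 + a1, d0 + d1
    return a, d


if __name__ == "__main__":
    panel = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
    print(pbwt(panel))
```

Because haplotypes sharing a long match ending at the current site sit next to each other in the sorted order, long shared segments can be found by scanning neighbours rather than comparing all pairs, which is what makes PBWT-based approaches attractive for very large panels.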


Subject(s)
Biological Specimen Banks , Genome , Haplotypes , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Recombination, Genetic
20.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38888457

ABSTRACT

Large sample datasets have been regarded as the primary basis for innovative discoveries and the solution to missing heritability in genome-wide association studies. However, their computational complexity cannot consider all comprehensive effects and all polygenic backgrounds, which reduces the effectiveness of large datasets. To address these challenges, we included all effects and polygenic backgrounds in a mixed logistic model for binary traits and compressed four variance components into two. The compressed model combined three computational algorithms to develop an innovative method, called FastBiCmrMLM, for large data analysis. These algorithms were tailored to sample size, computational speed, and reduced memory requirements. To mine additional genes, linkage disequilibrium markers were replaced by bin-based haplotypes, which are analyzed by FastBiCmrMLM, named FastBiCmrMLM-Hap. Simulation studies highlighted the superiority of FastBiCmrMLM over GMMAT, SAIGE and fastGWA-GLMM in identifying dominant, small α (allele substitution effect), and rare variants. In the UK Biobank-scale dataset, we demonstrated that FastBiCmrMLM could detect variants as small as 0.03% and with α ≈ 0. In re-analyses of seven diseases in the WTCCC datasets, 29 candidate genes, with both functional and TWAS evidence, around 36 variants identified only by the new methods, strongly validated the new methods. These methods offer a new way to decipher the genetic architecture of binary traits and address the challenges outlined above.


Subject(s)
Algorithms , Genome-Wide Association Study , Genome-Wide Association Study/methods , Humans , Logistic Models , Case-Control Studies , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Genomics/methods , Computer Simulation , Haplotypes , Models, Genetic