1.
Cell ; 184(18): 4784-4818.e17, 2021 09 02.
Article in English | MEDLINE | ID: mdl-34450027

ABSTRACT

Osteoarthritis affects over 300 million people worldwide. Here, we conduct a genome-wide association study meta-analysis across 826,690 individuals (177,517 with osteoarthritis) and identify 100 independently associated risk variants across 11 osteoarthritis phenotypes, 52 of which have not been associated with the disease before. We report thumb and spine osteoarthritis risk variants and identify differences in genetic effects between weight-bearing and non-weight-bearing joints. We identify sex-specific and early age-at-onset osteoarthritis risk loci. We integrate functional genomics data from primary patient tissues (including articular cartilage, subchondral bone, and osteophytic cartilage) and identify high-confidence effector genes. We provide evidence for genetic correlation with phenotypes related to pain, the main disease symptom, and identify likely causal genes linked to neuronal processes. Our results provide insights into key molecular players in disease processes and highlight attractive drug targets to accelerate translation.


Subject(s)
Genetic Predisposition to Disease , Genetics, Population , Osteoarthritis/genetics , Female , Genome-Wide Association Study , Humans , Osteoarthritis/drug therapy , Phenotype , Polymorphism, Single Nucleotide/genetics , Risk Factors , Sex Characteristics , Signal Transduction/genetics
3.
Am J Hum Genet ; 111(2): 213-226, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38171363

ABSTRACT

The aim of fine mapping is to identify genetic variants causally contributing to complex traits or diseases. Existing fine-mapping methods employ Bayesian discrete mixture priors and depend on a pre-specified maximum number of causal variants, which may lead to sub-optimal solutions. In this work, we propose a Bayesian fine-mapping method called h2-D2, utilizing a continuous global-local shrinkage prior. We also present an approach to define credible sets of causal variants in continuous prior settings. Simulation studies demonstrate that h2-D2 outperforms current state-of-the-art fine-mapping methods such as SuSiE and FINEMAP in accurately identifying causal variants and estimating their effect sizes. We further applied h2-D2 to prostate cancer analysis and discovered some previously unknown causal variants. In addition, we inferred 369 target genes associated with the detected causal variants and several pathways that were significantly over-represented by these genes, shedding light on their potential roles in prostate cancer development and progression.


Subject(s)
Prostatic Neoplasms , Quantitative Trait Loci , Male , Humans , Bayes Theorem , Polymorphism, Single Nucleotide/genetics , Computer Simulation , Prostatic Neoplasms/genetics , Genome-Wide Association Study/methods
4.
PLoS Genet ; 20(3): e1011189, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38484017

ABSTRACT

RNA sequencing (RNA-Seq) is widely used to capture transcriptome dynamics across tissues, biological entities, and conditions. Currently, few or no methods can handle multiple biological variables (e.g., tissues/phenotypes) and their interactions simultaneously, while also achieving dimension reduction (DR). We propose INSIDER, a general and flexible statistical framework based on matrix factorization, which is freely available at https://github.com/kai0511/insider. INSIDER decomposes variation from different biological variables and their interactions into a shared low-rank latent space. Particularly, it introduces the elastic net penalty to induce sparsity while considering the grouping effects of genes. It can achieve DR of high-dimensional data (of ≥ 3 dimensions), as opposed to conventional methods (e.g., PCA/NMF) which generally only handle 2D data (e.g., sample × expression). Besides, it enables computing 'adjusted' expression profiles for specific biological variables while controlling variation from other variables. INSIDER is computationally efficient and accommodates missing data. INSIDER also performed similarly or outperformed a close competing method, SDA, as shown in simulations and can handle complex missing data in RNA-Seq data. Moreover, unlike SDA, it can be used when the data cannot be structured into a tensor. Lastly, we demonstrate its usefulness via real data analysis, including clustering donors for disease subtyping, revealing neuro-development trajectory using the BrainSpan data, and uncovering biological processes contributing to variables of interest (e.g., disease status and tissue) and their interactions.


Subject(s)
Algorithms , Transcriptome , Transcriptome/genetics , Sequence Analysis, RNA , Data Analysis , RNA/genetics , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Cluster Analysis
5.
Am J Hum Genet ; 110(9): 1534-1548, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37633278

ABSTRACT

Despite extensive research on global heritability estimation for complex traits, few methods accurately dissect local heritability. A precise local heritability estimate is crucial for high-resolution mapping in genetics. Here, we report the effective heritability estimator (EHE) that can use p values from genome-wide association studies (GWASs) for local heritability estimation by directly converting marginal heritability estimates of SNPs to a non-redundant heritability estimate of a gene or a small genomic region. EHE provides higher accuracy and precision for local heritability estimation among seven compared methods. Importantly, EHE can be applied to estimate the conditional heritability of nearby genes, where redundant heritability among the genes can also be removed further. The conditional estimation can be guided by tissue-specific expression profiles (or other functional scores) to prioritize and quantify more functionally important genes of complex phenotypes. Applying EHE to 42 complex phenotypes from the UK Biobank, we revealed the existence of two types of distinct genetic architectures for various complex phenotypes and found that highly pleiotropic genes are not enriched for more heritability compared to other candidate susceptibility genes. EHE provides an accurate and robust way to dissect the genetic architecture of complex phenotypes.


Subject(s)
Genome-Wide Association Study , Genomics , Multifactorial Inheritance/genetics , Phenotype , Polymorphism, Single Nucleotide/genetics
6.
Mol Psychiatry ; 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38491343

ABSTRACT

A primary goal of psychiatry is to better understand the pathways that link genetic risk to psychiatric symptoms. Here, we tested association of diagnosis and endophenotypes with overall and neurotransmitter pathway-specific polygenic risk in patients with early-stage psychosis. Subjects included 205 demographically diverse cases with a psychotic disorder who underwent comprehensive psychiatric and neurological phenotyping and 115 matched controls. Following genotyping, we calculated polygenic scores (PGSs) for schizophrenia (SZ) and bipolar disorder (BP) using Psychiatric Genomics Consortium GWAS summary statistics. To test if overall genetic risk can be partitioned into affected neurotransmitter pathways, we calculated pathway PGSs (pPGSs) for SZ risk affecting each of four major neurotransmitter systems: glutamate, GABA, dopamine, and serotonin. Psychosis subjects had elevated SZ PGS versus controls; cases with SZ or BP diagnoses had stronger SZ or BP risk, respectively. There was no significant association within psychosis cases between individual symptom measures and overall PGS. However, neurotransmitter-specific pPGSs were moderately associated with specific endophenotypes; notably, glutamate was associated with SZ diagnosis and with deficits in cognitive control during task-based fMRI, while dopamine was associated with global functioning. Finally, unbiased endophenotype-driven clustering identified three diagnostically mixed case groups that separated on primary deficits of positive symptoms, negative symptoms, global functioning, and cognitive control. All clusters showed strong genome-wide risk. Cluster 2, characterized by deficits in cognitive control and negative symptoms, additionally showed specific risk concentrated in glutamatergic and GABAergic pathways. Due to the intensive characterization of our subjects, the present study was limited to a relatively small cohort. 
As such, results should be followed up with additional research at the population and mechanism level. Our study suggests pathway-based PGS analysis may be a powerful path forward to study genetic mechanisms driving psychiatric endophenotypes.
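
As a rough illustration of the pathway-restricted scoring described in this abstract, a polygenic score is conventionally a weighted sum of risk-allele dosages, and a pathway score applies the same sum to the subset of variants assigned to one pathway. The sketch below is a minimal example under that assumption only; the variant names, weights, and pathway membership are hypothetical placeholders, not values from the study, and the authors' actual pipeline may differ.

```python
def polygenic_score(weights, dosages, variant_subset=None):
    """Weighted sum of allele dosages.

    weights: dict mapping variant ID -> GWAS effect size (e.g. log odds ratio)
    dosages: dict mapping variant ID -> risk-allele dosage (0, 1, or 2)
    variant_subset: optional set of variant IDs (e.g. one pathway's variants);
        when given, only those variants contribute to the score.
    """
    total = 0.0
    for variant, weight in weights.items():
        if variant_subset is not None and variant not in variant_subset:
            continue
        # Variants missing from the genotype data contribute nothing.
        total += weight * dosages.get(variant, 0)
    return total


# Hypothetical inputs, for illustration only.
example_weights = {"snp_a": 0.10, "snp_b": -0.20, "snp_c": 0.05}
example_dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
hypothetical_pathway = {"snp_a", "snp_c"}

overall_pgs = polygenic_score(example_weights, example_dosages)
pathway_pgs = polygenic_score(example_weights, example_dosages, hypothetical_pathway)
```

In practice scores are usually standardized against a reference sample before cases and controls are compared; that step is omitted here.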

7.
Ann Hum Genet ; 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38624263

ABSTRACT

To investigate the association of attention-deficit/hyperactivity disorder (ADHD) with the 48-base pair (bp) variable number of tandem repeats (VNTR) in exon 3 of the dopamine receptor D4 (DRD4) gene, we genotyped 240 ADHD patients and their parents from Hong Kong. The 4R allele was most common, followed by 2R. We examined association between the 2R allele (relative to 4R) and ADHD by Transmission Disequilibrium Test (TDT). The odds ratio (OR) (95% confidence interval) was 0.90 (0.64-1.3). The p-value was 0.6. Examining subgroups revealed nominally significant association of 2R with inattentive ADHD: OR = 0.33 (0.12-0.92) and p = 0.03. Because our study used TDT analysis, we meta-analyzed the association of 2R with ADHD in Asians (1329 patient alleles), revealing results similar to ours: OR = 0.97 (0.80-1.2) and p = 0.8. To examine the association of 2R with inattentive ADHD, we meta-analyzed all studies (regardless of analysis type or ethnicity, in order to increase statistical power): 702 patient alleles, 1420 control alleles, OR = 0.81 (0.57-1.1) and p = 0.2. Overall, there is no evidence of association between ADHD and the 2R allele, but the suggestive association with the inattentive type warrants further investigation.
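
The odds ratio, confidence interval, and p-value reported by a Transmission Disequilibrium Test can be sketched from two counts: how often heterozygous parents transmit the allele of interest and how often they transmit the other allele. The example below uses the standard McNemar-type TDT statistic with a Wald interval on the log odds ratio. The function name and the counts are hypothetical and are not taken from the study.

```python
from math import erfc, exp, log, sqrt


def tdt_summary(transmitted, not_transmitted, z=1.96):
    """Summarize a TDT from heterozygous-parent transmission counts.

    transmitted: times the allele of interest was passed to an affected child
    not_transmitted: times the alternative allele was passed instead
    Returns (odds_ratio, (ci_low, ci_high), chi_square, p_value).
    """
    b, c = transmitted, not_transmitted
    if b <= 0 or c <= 0:
        raise ValueError("both counts must be positive")
    odds_ratio = b / c
    # Wald interval on the log scale; SE of log(b/c) is sqrt(1/b + 1/c).
    se_log_or = sqrt(1 / b + 1 / c)
    ci = (exp(log(odds_ratio) - z * se_log_or),
          exp(log(odds_ratio) + z * se_log_or))
    # McNemar-type statistic, chi-square with 1 degree of freedom.
    chi_square = (b - c) ** 2 / (b + c)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi_square / 2))
    return odds_ratio, ci, chi_square, p_value


# Hypothetical counts, for illustration only.
example_or, example_ci, example_chi2, example_p = tdt_summary(45, 50)
```

An odds ratio below 1 means the allele is transmitted less often than expected under the null of equal transmission.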

8.
Psychol Med ; 54(8): 1461-1474, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38639006

ABSTRACT

Mendelian randomization (MR) leverages genetic information to examine the causal relationship between phenotypes allowing for the presence of unmeasured confounders. MR has been widely applied to unresolved questions in epidemiology, making use of summary statistics from genome-wide association studies on an increasing number of human traits. However, an understanding of essential concepts is necessary for the appropriate application and interpretation of MR. This review aims to provide a non-technical overview of MR and demonstrate its relevance to psychiatric research. We begin with the origins of MR and the reasons for its recent expansion, followed by an overview of its statistical methodology. We then describe the limitations of MR, and how these are being addressed by recent methodological advances. We showcase the practical use of MR in psychiatry through three illustrative examples - the connection between cannabis use and psychosis, the link between intelligence and schizophrenia, and the search for modifiable risk factors for depression. The review concludes with a discussion of the prospects of MR, focusing on the integration of multi-omics data and its extension to delineating complex causal networks.


Subject(s)
Genome-Wide Association Study , Mendelian Randomization Analysis , Schizophrenia , Humans , Schizophrenia/genetics , Causality , Psychotic Disorders/genetics , Psychotic Disorders/epidemiology , Intelligence/genetics , Mental Disorders/genetics , Mental Disorders/epidemiology
9.
Psychol Med ; : 1-9, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38445386

ABSTRACT

BACKGROUND: Over the past several decades, research has increasingly focused on the inflammation/immune hypothesis of schizophrenia. Building upon the synaptic plasticity hypothesis, inflammation may contribute to the underlying pathophysiology of schizophrenia. Yet, pinpointing the specific inflammatory agents responsible for schizophrenia remains a complex challenge, mainly due to medication and metabolic status. Multiple lines of evidence point to a widespread genetic association across the genome underlying the phenotypic variations of schizophrenia. METHOD: We collected the latest genome-wide association analysis (GWAS) summary data of schizophrenia, cytokines, and longitudinal change of the brain. We utilized the omnigenic model, which takes into account all genomic SNPs included in the GWAS of a trait, instead of traditional Mendelian randomization (MR) methods. We conducted two rounds of MR to investigate the inflammatory triggers of schizophrenia and the resulting longitudinal changes in the brain. RESULTS: We identified seven inflammation markers linked to schizophrenia onset, all of which passed the Bonferroni correction for multiple comparisons (bNGF, GROA (CXCL1), IL-8, M-CSF, MCP-3 (CCL7), TNF-β, CRP). Moreover, CRP was found to significantly influence the linear rate of brain morphology changes, predominantly in the white matter of the cerebrum and cerebellum. CONCLUSION: With an omnigenic approach, our study sheds light on the immune pathology of schizophrenia. Although these findings need confirmation from future studies employing different methodologies, our work provides substantial evidence that pervasive, low-level neuroinflammation may play a pivotal role in schizophrenia, potentially leading to notable longitudinal changes in brain morphology.

10.
Brain Behav Immun ; 118: 22-30, 2024 May.
Article in English | MEDLINE | ID: mdl-38355025

ABSTRACT

BACKGROUND: Schizophrenia and white blood cell counts (WBC) are both complex and polygenic traits. Previous evidence suggests that increased WBC are associated with higher all-cause mortality, and other studies have found elevated WBC in first-episode psychosis and chronic schizophrenia. However, these observational findings may be confounded by antipsychotic exposures and their effects on WBC. Mendelian randomization (MR) is a useful method for examining the directions of genetically-predicted relationships between schizophrenia and WBC. METHODS: We performed a two-sample MR using summary statistics from genome-wide association studies (GWAS) conducted by the Psychiatric Genomics Consortium Schizophrenia Workgroup (N = 130,644) and the Blood Cell Consortium (N = 563,946). The MR methods included inverse variance weighted (IVW), MR Egger, weighted median, MR-PRESSO, contamination mixture, and a novel approach called mixture model reciprocal causal inference (MRCI). False discovery rate was employed to correct for multiple testing. RESULTS: Multiple MR methods supported bidirectional genetically-predicted relationships between lymphocyte count and schizophrenia: IVW (b = 0.026; FDR p-value = 0.008), MR Egger (b = 0.026; FDR p-value = 0.008), weighted median (b = 0.013; FDR p-value = 0.049), and MR-PRESSO (b = 0.014; FDR p-value = 0.010) in the forward direction, and IVW (OR = 1.100; FDR p-value = 0.021), MR Egger (OR = 1.231; FDR p-value < 0.001), weighted median (OR = 1.136; FDR p-value = 0.006) and MRCI (OR = 1.260; FDR p-value = 0.026) in the reverse direction. MR Egger (OR = 1.171; FDR p-value < 0.001) and MRCI (OR = 1.154; FDR p-value = 0.026) both suggested genetically-predicted eosinophil count is associated with schizophrenia, but MR Egger (b = 0.060; FDR p-value = 0.010) and contamination mixture (b = -0.013; FDR p-value = 0.045) gave ambiguous results on whether genetically predicted liability to schizophrenia would be associated with eosinophil count. 
MR Egger (b = 0.044; FDR p-value = 0.010) and MR-PRESSO (b = 0.009; FDR p-value = 0.045) supported that genetically predicted liability to schizophrenia is associated with elevated monocyte count, and the opposite direction was also indicated by MR Egger (OR = 1.231; FDR p-value = 0.045). Lastly, a unidirectional effect from genetic liability for schizophrenia to neutrophil count was proposed by the MR-PRESSO (b = 0.011; FDR p-value = 0.028) and contamination mixture (b = 0.011; FDR p-value = 0.045) methods. CONCLUSION: This MR study utilised multiple MR methods to obtain results suggesting bidirectional genetically-predicted relationships between elevated lymphocyte counts and schizophrenia risk. In addition, moderate evidence also showed bidirectional genetically-predicted relationships between schizophrenia and monocyte counts, and unidirectional effects from genetic liability for eosinophil count to schizophrenia and from genetic liability for schizophrenia to neutrophil count. The influence of schizophrenia on eosinophil count is less certain. Our findings support the role of WBC in schizophrenia and concur with the hypothesis of neuroinflammation in schizophrenia.
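
Of the MR estimators listed in this abstract, inverse variance weighted (IVW) is the simplest to write down: each variant gives a ratio of its outcome effect to its exposure effect, and the ratios are averaged with weights proportional to their precision. The sketch below uses the common first-order weights (exposure effect squared over outcome variance) and a fixed-effect standard error. The input numbers are hypothetical and unrelated to the study's data, and the other methods named above (MR Egger, weighted median, MR-PRESSO, contamination mixture, MRCI) are not covered.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-variant GWAS summary statistics.

    beta_exposure: variant effects on the exposure
    beta_outcome: the same variants' effects on the outcome
    se_outcome: standard errors of the outcome effects
    Returns (causal_estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have the same length")
    # First-order weight for each ratio estimate: bx^2 / se_y^2.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    # Per-variant ratio (Wald) estimates.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = sqrt(1 / total_weight)
    return estimate, standard_error


# Hypothetical summary statistics in which every variant implies a ratio of 0.5.
ivw_beta, ivw_se = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.3],
    beta_outcome=[0.05, 0.10, 0.15],
    se_outcome=[0.01, 0.02, 0.01],
)
```

When all variants agree, as in this toy input, the IVW estimate equals the shared ratio; disagreement among the ratios is what the pleiotropy-robust methods are designed to handle.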


Subject(s)
Psychotic Disorders , Schizophrenia , Humans , Schizophrenia/genetics , Genome-Wide Association Study , Mendelian Randomization Analysis , Leukocyte Count
11.
Mol Psychiatry ; 28(7): 2913-2921, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37340172

ABSTRACT

Clinical epidemiological studies have found high co-occurrence between suicide attempts (SA) and opioid use disorder (OUD). However, the patterns of correlation and causation between them are still not clear due to psychiatric confounding. To investigate their cross-phenotype relationship, we utilized raw phenotypes and genotypes from >150,000 UK Biobank samples, and genome-wide association summary statistics from >600,000 individuals with European ancestry. Pairwise association and a potential bidirectional relationship between OUD and SA were evaluated with and without controlling for major psychiatric disease status (e.g., schizophrenia, major depressive disorder, and alcohol use disorder). Multiple statistical and genetics tools were used to perform epidemiological association, genetic correlation, polygenic risk score prediction, and Mendelian randomization (MR) analyses. Strong associations between OUD and SA were observed at both the phenotypic level (overall samples [OR = 2.94, P = 1.59 × 10⁻¹⁴]; non-psychiatric subgroup [OR = 2.15, P = 1.07 × 10⁻³]) and the genetic level (genetic correlation rg = 0.38 and 0.5 with or without conditioning on psychiatric traits, respectively). Consistently, increasing polygenic susceptibility to SA is associated with increasing risk of OUD (OR = 1.08, false discovery rate [FDR] = 1.71 × 10⁻³), and similarly, increasing polygenic susceptibility to OUD is associated with increasing risk of SA (OR = 1.09, FDR = 1.73 × 10⁻⁶). However, these polygenic associations were much attenuated after controlling for comorbid psychiatric diseases. A combination of MR analyses suggested a possible causal association from genetic liability for SA to OUD risk (2-sample univariable MR: OR = 1.14, P = 0.001; multivariable MR: OR = 1.08, P = 0.001). This study provided new genetic evidence to explain the observed OUD-SA comorbidity. Future prevention strategies for each phenotype need to take screening for the other into consideration.


Subject(s)
Depressive Disorder, Major , Suicide, Attempted , Humans , Depressive Disorder, Major/genetics , Depressive Disorder, Major/psychology , Genome-Wide Association Study , Mendelian Randomization Analysis , Phenotype
12.
Mol Psychiatry ; 2023 Jul 13.
Article in English | MEDLINE | ID: mdl-37443193

ABSTRACT

Across the major psychiatric disorders (MPDs), a shared disruption in brain physiology is suspected. Here we investigate the neural variability at rest, a well-established behavior-relevant marker of brain function, and probe its basis in gene expression and neurotransmitter receptor profiles across the MPDs. We recruited 219 healthy controls and 279 patients with schizophrenia, major depressive disorder, or bipolar disorders (manic or depressive state). The standard deviation of blood oxygenation level-dependent signal (SDBOLD) obtained from resting-state fMRI was used to characterize neural variability. Transdiagnostic disruptions in SDBOLD patterns and their relationships with clinical symptoms and cognitive functions were tested by partial least-squares correlation. Moving beyond the clinical sample, spatial correlations between the observed patterns of SDBOLD disruption and postmortem gene expressions, Neurosynth meta-analytic cognitive functions, and neurotransmitter receptor profiles were estimated. Two transdiagnostic patterns of disrupted SDBOLD were discovered. Pattern 1 is exhibited in all diagnostic groups and is most pronounced in schizophrenia, characterized by higher SDBOLD in the language/auditory networks but lower SDBOLD in the default mode/sensorimotor networks. In comparison, pattern 2 is only exhibited in unipolar and bipolar depression, characterized by higher SDBOLD in the default mode/salience networks but lower SDBOLD in the sensorimotor network. The expression of pattern 1 related to the severity of clinical symptoms and cognitive deficits across MPDs. The two disrupted patterns had distinct spatial correlations with gene expressions (e.g., neuronal projections/cellular processes), meta-analytic cognitive functions (e.g., language/memory), and neurotransmitter receptor expression profiles (e.g., D2/serotonin/opioid receptors). 
In conclusion, neural variability is a potential transdiagnostic biomarker of MPDs with a substantial amount of its spatial distribution explained by gene expressions and neurotransmitter receptor profiles. The pathophysiology of MPDs can be traced through the measures of neural variability at rest, with varying clinical-cognitive profiles arising from differential spatial patterns of aberrant variability.

13.
Mol Psychiatry ; 28(5): 2095-2106, 2023 May.
Article in English | MEDLINE | ID: mdl-37062770

ABSTRACT

Studies conducted in psychotic disorders have shown that DNA methylation (DNAm) is sensitive to the impact of Childhood Adversity (CA). However, whether it mediates the association between CA and psychosis is yet to be explored. An epigenome-wide association study (EWAS) using the Illumina Infinium-Methylation EPIC array in peripheral blood tissue from 366 first-episode psychosis cases and 517 healthy controls was performed. Adversity scores were created for abuse, neglect and composite adversity with the Childhood Trauma Questionnaire (CTQ). Regressions examining (I) CTQ scores with psychosis; (II) CTQ scores with DNAm at the EWAS level; and (III) DNAm with caseness, adjusted for a variety of confounders, were conducted. The Divide-Aggregate Composite-null Test for the composite null hypothesis of no mediation effect was conducted. Enrichment analyses were conducted with the missMethyl package and the KEGG database. Our results show that CA was associated with psychosis (composite: OR = 1.68, p < 0.001; abuse: OR = 2.16, p < 0.001; neglect: OR = 2.27, p < 0.001). None of the CpG sites significantly mediated the adversity-psychosis association after Bonferroni correction (p < 8.1 × 10⁻⁸). However, 28, 34 and 29 differentially methylated probes associated with 21, 27 and 20 genes passed a less stringent discovery threshold (p < 5 × 10⁻⁵) for composite, abuse and neglect respectively, with a lack of overlap between abuse and neglect. These included genes previously associated with psychosis in EWAS studies, such as PANK1, SPEG, TBKBP1, TSNARE1 or H2R. Downstream gene ontology analyses did not reveal any biological pathways that survived false discovery rate correction. Although at a non-significant level, DNAm changes in genes previously associated with schizophrenia in EWAS studies may mediate the CA-psychosis association. 
These results, and the associated processes involved, such as mitochondrial or histaminergic dysfunction, immunity or neural signalling, require replication in well-powered samples. The lack of overlap between mediating genes associated with abuse and neglect suggests differential biological trajectories linking CA subtypes and psychosis.


Subject(s)
Adverse Childhood Experiences , Psychological Tests , Psychotic Disorders , Self Report , Humans , Child , DNA Methylation/genetics , Epigenome , Psychotic Disorders/genetics
14.
Nucleic Acids Res ; 50(D1): D1408-D1416, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34570217

ABSTRACT

Interpreting the molecular mechanisms of genomic variations and their causal relationships with diseases/traits is an important and challenging problem in human genetics. To provide comprehensive and context-specific variant annotations for biologists and clinicians, here, by systematically integrating over 4TB of genomic/epigenomic profiles and frequently-used annotation databases from various biological domains, we develop a variant annotation database called VannoPortal. In general, the database has the following major features: (i) systematically integrates 40 genome-wide variant annotations and prediction scores regarding allele frequency, linkage disequilibrium, evolutionary signature, disease/trait association, tissue/cell type-specific epigenome, base-wise functional prediction, allelic imbalance and pathogenicity; (ii) is equipped with our recent novel index system and parallel random-sweep searching algorithms for efficient management of backend databases and information extraction; (iii) greatly expands context-dependent variant annotation to incorporate large-scale epigenomic maps and regulatory profiles (such as EpiMap) across over 33 tissue/cell types; (iv) compiles many genome-scale base-wise prediction scores for regulatory/pathogenic variant classification beyond protein-coding regions; (v) enables fast retrieval and direct comparison of functional evidence among linked variants using a highly interactive web panel in addition to a plain table; (vi) introduces many visualization functions for more efficient identification and interpretation of functional variants in a single web page. VannoPortal is freely available at http://mulinlab.org/vportal.


Subject(s)
Databases, Genetic , Genetic Diseases, Inborn/genetics , Genetic Variation/genetics , Molecular Sequence Annotation , Algorithms , Epigenome/genetics , Genetic Diseases, Inborn/classification , Genome, Human/genetics , Genome-Wide Association Study , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Software
15.
Nucleic Acids Res ; 50(6): e34, 2022 04 08.
Article in English | MEDLINE | ID: mdl-34931221

ABSTRACT

Identifying rare variants that contribute to complex diseases is challenging because of the low statistical power in current tests comparing cases with controls. Here, we propose a novel and powerful rare variants association test based on the deviation of the observed mutation burden of a gene in cases from a baseline predicted by a weighted recursive truncated negative-binomial regression (RUNNER) on genomic features available from public data. Simulation studies show that RUNNER is substantially more powerful than state-of-the-art rare variant association tests and has reasonable type 1 error rates even for stratified populations or in small samples. Applied to real case-control data, RUNNER recapitulates known genes of Hirschsprung disease and Alzheimer's disease missed by current methods and detects promising new candidate genes for both disorders. In a case-only study, RUNNER successfully detected a known causal gene of amyotrophic lateral sclerosis. The present study provides a powerful and robust method to identify susceptibility genes with rare risk variants for complex diseases.


Subject(s)
Genetic Predisposition to Disease , Genetic Variation , Models, Genetic , Software , Case-Control Studies , Computer Simulation , Humans , Mutation
16.
Int J Mol Sci ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279346

ABSTRACT

Genome-wide association studies (GWAS) are commonly employed to study the genetic basis of complex traits/diseases, and a key question is how much heritability could be explained by all single nucleotide polymorphisms (SNPs) in GWAS. One widely used approach that relies on summary statistics only is linkage disequilibrium score regression (LDSC); however, this approach requires certain assumptions about the effects of SNPs (e.g., all SNPs contribute to heritability and each SNP contributes equal variance). More flexible modeling methods may be useful. We previously developed an approach recovering the "true" effect sizes from a set of observed z-statistics with an empirical Bayes approach, using only summary statistics. However, methods for standard error (SE) estimation are not available yet, limiting the interpretation of our results and the applicability of the approach. In this study, we developed several resampling-based approaches to estimate the SE of SNP-based heritability, including two jackknife and three parametric bootstrap methods. The resampling procedures are performed at the SNP level as it is most common to estimate heritability from GWAS summary statistics alone. Simulations showed that the delete-d-jackknife and parametric bootstrap approaches provide good estimates of the SE. In particular, the parametric bootstrap approaches yield the lowest root-mean-squared-error (RMSE) of the true SE. We also explored various methods for constructing confidence intervals (CIs). In addition, we applied our method to estimate the SNP-based heritability of 12 immune-related traits (levels of cytokines and growth factors) to shed light on their genetic architecture. We also implemented the methods to compute the sum of heritability explained and the corresponding SE in an R package SumVg. 
In conclusion, SumVg may provide a useful alternative tool for calculating SNP heritability and estimating SE/CI, which does not rely on distributional assumptions of SNP effects.
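
The delete-d jackknife mentioned in this abstract can be illustrated on a generic estimator: drop every subset of d observations, recompute the estimate on the remainder, and scale the spread of those replicates by (n - d) / d. The sketch below is a textbook version that enumerates all subsets, so it only suits small inputs. It is not the SumVg implementation, which resamples SNPs and may use randomly drawn subsets, and the function name and data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def delete_d_jackknife_se(data, estimator, d):
    """Delete-d jackknife standard error of estimator(data).

    Enumerates every way of removing d observations, so the cost grows as
    C(n, d); practical uses typically sample a subset of these removals.
    """
    n = len(data)
    if not 0 < d < n:
        raise ValueError("d must satisfy 0 < d < len(data)")
    replicates = []
    for removed in combinations(range(n), d):
        removed_set = set(removed)
        kept = [x for i, x in enumerate(data) if i not in removed_set]
        replicates.append(estimator(kept))
    center = mean(replicates)
    sum_sq = sum((r - center) ** 2 for r in replicates)
    # Variance = (n - d) / (d * number_of_replicates) * sum of squared deviations.
    variance = (n - d) / (d * len(replicates)) * sum_sq
    return sqrt(variance)


# Hypothetical data; for the sample mean the jackknife SE matches s / sqrt(n).
example_values = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
jackknife_se = delete_d_jackknife_se(example_values, mean, d=2)
classical_se = stdev(example_values) / sqrt(len(example_values))
```

The agreement with the classical standard error of the mean is a convenient sanity check; the appeal of the jackknife is that the same procedure applies to estimators with no closed-form standard error.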


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Genome-Wide Association Study/methods , Bayes Theorem , Phenotype , Polymorphism, Single Nucleotide
17.
Genet Epidemiol ; 46(7): 372-389, 2022 10.
Article in English | MEDLINE | ID: mdl-35652173

ABSTRACT

As research in genetics has advanced, some findings have been unexpected or shown to be inconsistent between studies or datasets. The reasons these inconsistencies arise are complex. Results from genetic studies can be affected by various factors including statistical power, linkage disequilibrium, quality control, confounding and selection bias, as well as real differences from interactions and effect modifiers, which may be informative about the mechanisms of traits and disease. Statistical artefacts can manifest as differences between results but they can also conceal underlying differences, which implies that their critical examination is important for understanding the underpinnings of traits. In this review, we examine these factors and outline how they can be identified and conceptualised with structural causal models. We explain the consequences they have on genetic estimates, such as genetic associations, polygenic scores, family- and genome-wide heritability, and describe methods to address them to aid in the estimation of true effects of genetic variation. Clarifying these factors can help researchers anticipate when results are likely to diverge and aid researchers' understanding of causal relationships between genes and complex traits.


Subject(s)
Genome-Wide Association Study , Models, Genetic , Humans , Linkage Disequilibrium , Multifactorial Inheritance , Phenotype , Polymorphism, Single Nucleotide
18.
Hum Mol Genet ; 30(9): 836-842, 2021 05 28.
Article in English | MEDLINE | ID: mdl-33693786

ABSTRACT

Genomic discovery efforts for hematological traits have been successfully conducted through genome-wide association studies on samples of predominantly European ancestry. We sought to conduct unbiased genetic discovery for coding variants that influence hematological traits in a Han Chinese population. A total of 5257 Han Chinese subjects from Beijing, China were included in the discovery cohort and analyzed by an Illumina ExomeChip array. Replication analyses were conducted in 3827 independent Chinese subjects. We analyzed 12 hematological traits and identified 22 exome-wide significant single-nucleotide polymorphism (SNP)-trait associations with 15 independent SNPs. Our study provides replication for two associations previously reported but not replicated. Further, one association was identified and replicated in the current study, of a coding variant in the myeloproliferative leukemia (MPL) gene, c.793C>T, p.Leu265Phe (L265F), with increased platelet count (β = 20.6 × 10⁹ cells/l, P_meta-analysis = 2.6 × 10⁻¹³). This variant is observed at ~2% population frequency in East Asians, whereas it has not been reported in gnomAD European or African populations. Functional analysis demonstrated that expression of MPL L265F in Ba/F3 cells resulted in enhanced phosphorylation of Stat3 and ERK1/2 as compared with the reference MPL allele, supporting altered activation of the JAK-STAT signal transduction pathway as the mechanism underlying the novel association between MPL L265F and platelet count.


Subject(s)
Genome-Wide Association Study , Asian People/genetics , Humans , Platelet Count , Polymorphism, Single Nucleotide/genetics , Receptors, Thrombopoietin/genetics , Signal Transduction/genetics
19.
Genome Res ; 30(12): 1789-1801, 2020 12.
Article in English | MEDLINE | ID: mdl-33060171

ABSTRACT

The advances of large-scale genomics studies have enabled compilation of cell type-specific, genome-wide DNA functional elements at high resolution. With the growing volume of functional annotation data and sequencing variants, existing variant annotation algorithms lack the efficiency and scalability to process big genomic data, particularly when annotating whole-genome sequencing variants against a huge database with billions of genomic features. Here, we develop VarNote to rapidly annotate genome-scale variants in large and complex functional annotation resources. Equipped with a novel index system and a parallel random-sweep searching algorithm, VarNote shows substantial performance improvements (two to three orders of magnitude) over existing algorithms at different scales. It supports both region-based and allele-specific annotations and introduces advanced functions for the flexible extraction of annotations. By integrating massive base-wise and context-dependent annotations in the VarNote framework, we introduce three efficient and accurate pipelines to prioritize the causal regulatory variants for common diseases, Mendelian disorders, and cancers.


Subject(s)
Computational Biology/methods , Genetic Predisposition to Disease/genetics , Algorithms , Databases, Genetic , Genetic Variation , Genome, Human , Humans , Molecular Sequence Annotation , Whole Genome Sequencing
20.
Genome Res ; 30(11): 1618-1632, 2020 11.
Article in English | MEDLINE | ID: mdl-32948616

ABSTRACT

It is widely recognized that noncoding genetic variants play important roles in many human diseases, but there are multiple challenges that hinder the identification of functional disease-associated noncoding variants. The number of noncoding variants can be many times that of coding variants; many of them are not functional but in linkage disequilibrium with the functional ones; different variants can have epistatic effects; different variants can affect the same genes or pathways in different individuals; and some variants are related to each other not by affecting the same gene but by affecting the binding of the same upstream regulator. To overcome these difficulties, we propose a novel analysis framework that considers convergent impacts of different genetic variants on protein binding, which provides multiscale information about disease-associated perturbations of regulatory elements, genes, and pathways. Applying it to our whole-genome sequencing data of 918 short-segment Hirschsprung disease patients and matched controls, we identify various novel genes not detected by standard single-variant and region-based tests, functionally centering on neural crest migration and development. Our framework also identifies upstream regulators whose binding is influenced by the noncoding variants. Using human neural crest cells, we confirm cell stage-specific regulatory roles of three top novel regulatory elements on our list, respectively in the RET, RASGEF1A, and PIK3C2B loci. In the PIK3C2B regulatory element, we further show that a noncoding variant found only in the patients affects the binding of the gliogenesis regulator NFIA, with a corresponding up-regulation of multiple genes in the same topologically associating domain.


Subject(s)
Enhancer Elements, Genetic , Hirschsprung Disease/genetics , Promoter Regions, Genetic , Class II Phosphatidylinositol 3-Kinases/genetics , Class II Phosphatidylinositol 3-Kinases/metabolism , Genetic Variation , Humans , Introns , NFI Transcription Factors/metabolism , Proto-Oncogene Proteins c-ret/genetics , Whole Genome Sequencing , ras Guanine Nucleotide Exchange Factors/genetics