Abstract
The aim of fine mapping is to identify genetic variants causally contributing to complex traits or diseases. Existing fine-mapping methods employ Bayesian discrete mixture priors and depend on a pre-specified maximum number of causal variants, which may lead to sub-optimal solutions. In this work, we propose a Bayesian fine-mapping method called h2-D2, utilizing a continuous global-local shrinkage prior. We also present an approach to define credible sets of causal variants in continuous prior settings. Simulation studies demonstrate that h2-D2 outperforms current state-of-the-art fine-mapping methods such as SuSiE and FINEMAP in accurately identifying causal variants and estimating their effect sizes. We further applied h2-D2 to prostate cancer analysis and discovered some previously unknown causal variants. In addition, we inferred 369 target genes associated with the detected causal variants and several pathways that were significantly over-represented by these genes, shedding light on their potential roles in prostate cancer development and progression.
Subjects
Prostatic Neoplasms, Quantitative Trait Loci, Male, Humans, Bayes Theorem, Single Nucleotide Polymorphism/genetics, Computer Simulation, Prostatic Neoplasms/genetics, Genome-Wide Association Study/methods
Abstract
MOTIVATION: Transcriptome-wide association study (TWAS) aims to identify trait-associated genes regulated by significant variants to explore the underlying biological mechanisms at a tissue-specific level. Despite the advancement of current TWAS methods to cover diverse traits, traditional approaches still face two main challenges: (i) the lack of methods that can guarantee finite-sample false discovery rate (FDR) control in identifying trait-associated genes; and (ii) the requirement for individual-level data, which is often inaccessible. RESULTS: To address these challenges, we propose a powerful knockoff inference method termed TWAS-GKF to identify candidate trait-associated genes with guaranteed finite-sample FDR control. TWAS-GKF adopts the main idea of Ghostknockoff inference to generate knockoff variables using only summary statistics instead of individual-level data. In extensive studies, we demonstrate that TWAS-GKF successfully controls the finite-sample FDR under a pre-specified FDR level across all settings. We further apply TWAS-GKF to identify genes in brain cerebellum tissue from the Genotype-Tissue Expression (GTEx) v8 project associated with schizophrenia (SCZ) from the Psychiatric Genomics Consortium (PGC), and genes in liver tissue related to low-density lipoprotein cholesterol (LDL-C) from the UK Biobank. The results reveal that the majority of the identified genes are validated by the Open Targets Validation Platform. AVAILABILITY AND IMPLEMENTATION: The R package TWAS.GKF is publicly available at https://github.com/AnqiWang2021/TWAS.GKF.
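The selection step that gives knockoff methods their FDR guarantee can be illustrated with the generic knockoff+ threshold. The sketch below is not taken from the TWAS.GKF package: the function name `knockoff_plus_select` and the example statistics are hypothetical, and it assumes the feature statistics W_j (large positive values favoring the original gene over its knockoff) have already been computed.

```python
def knockoff_plus_select(w, q):
    """Return indices selected by the knockoff+ filter at target FDR level q.

    w: list of feature statistics W_j. Large positive values are evidence
       for feature j; under the null the sign of W_j is symmetric.
    """
    # Candidate thresholds are the distinct nonzero magnitudes, smallest first.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        # Conservative estimate of the false discovery proportion at t.
        fdp_hat = (1 + negatives) / max(1, positives)
        if fdp_hat <= q:
            return [j for j, x in enumerate(w) if x >= t]
    return []  # no threshold reaches the target level


# Made-up statistics for illustration only.
w_example = [5.0, 4.2, 3.9, 3.1, 2.8, 2.5, -0.4, 0.3, -0.2, 1.9]
selected = knockoff_plus_select(w_example, q=0.2)
```

With these made-up values the smallest threshold whose estimated false discovery proportion falls below 0.2 is 1.9, so the seven features with statistics of at least 1.9 are selected. A stricter level such as q = 0.1 selects nothing, because the "+1" in the numerator requires at least ten positives to reach that level.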
Subjects
Genome-Wide Association Study, Transcriptome, Transcriptome/genetics, Humans, Genome-Wide Association Study/methods, Schizophrenia/genetics, Gene Expression Profiling/methods, Algorithms, Single Nucleotide Polymorphism, Quantitative Trait Loci
Abstract
Biological networks, which summarize the regulatory interactions and other relationships between different molecules, are important for the analysis of human diseases. Understanding and constructing networks for molecules such as DNA, RNA and proteins can help elucidate the mechanisms of complex biological systems. Gaussian graphical models (GGMs) are popular tools for estimating biological networks. Nonetheless, reconstructing GGMs from high-dimensional datasets remains challenging, and current methods do not handle the sparsity and high dimensionality of such datasets well. Here, we developed a new GGM, called the GR2D2 (Graphical $R^2$-induced Dirichlet Decomposition) model, based on the R2D2 priors for linear models. We also provide a data-augmented block Gibbs sampler algorithm. The R code is available at https://github.com/RavenGan/GR2D2. The GR2D2 estimator shows superior performance in estimating precision matrices compared with existing techniques in various simulation settings. When the true precision matrix is sparse and high-dimensional, GR2D2 provides the estimates with the smallest information divergence from the underlying truth. We also compare the GR2D2 estimator with the graphical horseshoe estimator in five cancer RNA-seq gene expression datasets grouped by three cancer types. Our results show that GR2D2 successfully identifies common cancer pathways and cancer-specific pathways for each dataset.
Subjects
Algorithms, Oncogenes, Humans, Linear Models, Computer Simulation, RNA
Abstract
While genome-wide association studies have identified susceptibility variants for numerous traits, their combined utility for predicting broad measures of health, such as mortality, remains poorly understood. We used data from the UK Biobank to combine polygenic risk scores (PRS) for 13 diseases and 12 mortality risk factors into sex-specific composite PRS (cPRS). These cPRS were moderately associated with all-cause mortality in independent data within the UK Biobank: the estimated hazard ratios per standard deviation were 1.10 (95% confidence interval: 1.05, 1.16) and 1.15 (1.10, 1.19) for women and men, respectively. Differences in life expectancy between the top and bottom 5% of the cPRS were estimated to be 4.79 (1.76, 7.81) years and 6.75 (4.16, 9.35) years for women and men, respectively. These associations were substantially attenuated after adjusting for non-genetic mortality risk factors measured at study entry (i.e., middle age for most participants). The cPRS may be useful in counseling younger individuals at higher genetic risk of mortality on modification of non-genetic factors.
Subjects
Inborn Genetic Diseases/mortality, Genetic Predisposition to Disease, Multifactorial Inheritance/genetics, Risk Assessment/statistics & numerical data, Biological Specimen Banks, Female, Inborn Genetic Diseases/genetics, Inborn Genetic Diseases/pathology, Genome-Wide Association Study, Humans, Male, Middle Aged, Phenotype, Single Nucleotide Polymorphism/genetics, Proportional Hazards Models, Risk Factors, United Kingdom
Abstract
MOTIVATION: Variable selection is a common statistical approach to identifying genes associated with clinical outcomes of scientific interest. There are thousands of genes in genomic studies, while only a limited number of individual samples are available. Therefore, it is important to develop a method to identify genes associated with outcomes of interest that can control finite-sample false discovery rate (FDR) in high-dimensional data settings. RESULTS: This article proposes a novel method named Grace-AKO for graph-constrained estimation (Grace), which incorporates aggregation of multiple knockoffs (AKO) with the network-constrained penalty. Grace-AKO can control FDR in finite-sample settings and improve model stability simultaneously. Simulation studies show that Grace-AKO has better performance in finite-sample FDR control than the original Grace model. We apply Grace-AKO to the prostate cancer data in The Cancer Genome Atlas program by incorporating prostate-specific antigen (PSA) pathways in the Kyoto Encyclopedia of Genes and Genomes as the prior information. Grace-AKO finally identifies 47 candidate genes associated with PSA level, and more than 75% of the detected genes can be validated.
Subjects
Gene Regulatory Networks, Prostate-Specific Antigen, Humans, Male, Computer Simulation, Genomics, Genome
Abstract
Functional enrichment results typically implicate tissue or cell-type-specific biological pathways in disease pathogenesis and as therapeutic targets. We propose generalized linkage disequilibrium score regression (g-LDSC) that requires only genome-wide association studies (GWASs) summary-level data to estimate functional enrichment. The method adopts the same assumptions and regression model formulation as stratified linkage disequilibrium score regression (s-LDSC). Although s-LDSC only partially uses LD information, our method uses the whole LD matrix, which accounts for possible correlated error structure via a feasible generalized least-squares estimation. We demonstrate through simulation studies under various scenarios that g-LDSC provides more precise estimates of functional enrichment than s-LDSC, regardless of model misspecification. In an application to GWAS summary statistics of 15 traits from the UK Biobank, estimates of functional enrichment using g-LDSC were lower and more realistic than those obtained from s-LDSC. In addition, g-LDSC detected more significantly enriched functional annotations among 24 functional annotations for the 15 traits than s-LDSC (118 vs. 51).
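For orientation, the regression that s-LDSC fits and the generic form of a feasible generalized least-squares estimator can be written as follows. The first line is the standard s-LDSC model. The symbols $y$, $X$ and $\widehat{\Omega}$ in the second line are notation assumed here for illustration, and how g-LDSC constructs $\widehat{\Omega}$ from the full LD matrix is not specified in the abstract.

```latex
% s-LDSC regression model: chi-square statistic of variant j, sample size N,
% LD score \ell(j, c) of variant j with annotation c, per-variant
% heritability contribution \tau_c, and confounding term a.
\mathbb{E}\left[\chi_j^2\right] = N \sum_{c} \tau_c \, \ell(j, c) + N a + 1

% Generic feasible GLS estimator: y stacks the chi-square statistics,
% X stacks the LD scores (with an intercept column), and \widehat{\Omega}
% is an estimate of the error covariance across variants.
\widehat{\tau}_{\mathrm{FGLS}}
  = \left( X^{\top} \widehat{\Omega}^{-1} X \right)^{-1}
    X^{\top} \widehat{\Omega}^{-1} y
```

When $\widehat{\Omega}$ is diagonal, the second expression reduces to weighted least squares, which treats the regression errors as uncorrelated across variants.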
Subjects
Genome-Wide Association Study, Single Nucleotide Polymorphism, Least-Squares Analysis, Genome-Wide Association Study/methods, Single Nucleotide Polymorphism/genetics, Computer Simulation, Phenotype
Abstract
AIM: Myocarditis is a recognized safety concern following COVID-19 mRNA vaccination. However, there is limited research quantifying the risk associated with the third dose or comparing the risk between the three doses. The US Vaccine Adverse Event Reporting System (VAERS) is a passive surveillance system that monitors rare adverse events after US-licensed vaccination. However, studies analyzing VAERS data have often faced criticism for underreporting cases and lacking a control group to assess the increase in baseline risk. METHODS: The temporal association between myocarditis onset and COVID-19 vaccination was studied. To overcome limitations, a novel modified self-controlled case series method was employed, explicitly modeling the case reporting process in VAERS data. RESULTS: We found an increased risk of myocarditis during the 1- to 3-day period following the second and third doses of both the BNT162b2 vaccine and the mRNA-1273 vaccine. Following the second dose, the relative incidence (RI) was 4.89 (95% confidence interval (CI), 2.39-10.08) for the BNT162b2 vaccine and 2.86 (95% CI: 1.18-7.03) for the mRNA-1273 vaccine. Similarly, following the third dose, the RI was 9.04 (95% CI: 2.79-40.99) for the BNT162b2 vaccine and 4.71 (95% CI: 1.42-19.09) for the mRNA-1273 vaccine. No significant increase in risk was observed during other periods. Notably, our analysis also identified a similar increased risk of myocarditis among individuals aged below 30. CONCLUSIONS: These findings raise safety concerns regarding COVID-19 mRNA vaccines, provide insights into the quantification of myocarditis risk at different postvaccination periods, and offer a novel approach to interpreting passive surveillance system data.
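The relative incidence reported above can be illustrated with the simplest form of a self-controlled case series. The sketch below is not the modified method of the study, which additionally models the VAERS reporting process. It assumes a single risk window, a constant baseline rate, and identical window lengths for every case; the function name and the counts are hypothetical.

```python
import math


def sccs_relative_incidence(events_risk, events_control,
                            days_risk, days_control, z=1.96):
    """Relative incidence (RI) and Wald interval for a one-window SCCS.

    Assumes every case shares the same risk-window and control-window
    lengths and that the baseline event rate is constant over time.
    """
    if min(events_risk, events_control) <= 0:
        raise ValueError("both windows need at least one event")
    # Ratio of events per day in the risk window to events per day
    # in the control window.
    ri = (events_risk / days_risk) / (events_control / days_control)
    # Standard error of log(RI) from the conditional binomial likelihood.
    se = math.sqrt(1.0 / events_risk + 1.0 / events_control)
    lower = math.exp(math.log(ri) - z * se)
    upper = math.exp(math.log(ri) + z * se)
    return ri, lower, upper


# Made-up counts for illustration only, not VAERS data.
ri, lower, upper = sccs_relative_incidence(12, 10, days_risk=3, days_control=39)
```

Because each case serves as its own control, fixed characteristics of the individual cancel out of the comparison. The wide intervals typical of this design arise from the small event counts that enter the standard error.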
Subjects
COVID-19 Vaccines, COVID-19, Myocarditis, Humans, 2019-nCoV Vaccine mRNA-1273, BNT162 Vaccine, COVID-19/prevention & control, COVID-19 Vaccines/adverse effects, mRNA Vaccines, Myocarditis/epidemiology, Myocarditis/etiology, Research Design, United States/epidemiology
Abstract
In oncology, it is commonplace to treat patients with a combination of drugs that act on the disease through different mechanisms. Such drug combinations can often achieve higher efficacy than single-drug treatment because of synergy or non-overlapping toxicity. Because sample sizes are small, there is a growing need for efficient designs for phase I clinical trials, especially for drug-combination trials. Among existing experimental designs for phase I drug-combination trials, most proposed methods are parametric and model-based, requiring either tuning parameters or prior knowledge of the drug toxicity probabilities. We propose a two-dimensional calibration-free odds (2dCFO) design for drug-combination trials, which utilizes not only the current dose information but also that from all neighboring doses (i.e., along the left, right, up and down directions). In contrast to interval-based designs, which use only the current dose information, the 2dCFO is more efficient and makes more accurate decisions because it draws on the richer information from neighboring doses. Because our design makes decisions based entirely on odds ratios, it does not rely on any dose-toxicity curve assumption. Simulations show that the 2dCFO performs satisfactorily in terms of accuracy and efficiency and is highly robust owing to its non-parametric, model-free nature. More importantly, the 2dCFO requires only the specification of the target toxicity probability, which greatly eases the design process for clinicians.
Abstract
In the coronavirus disease 2019 (COVID-19) pandemic, there is an urgent need to develop effective treatments. Through a network-based drug repurposing approach, several effective drug candidates are identified for treating COVID-19 patients in different clinical stages. The proposed approach takes advantage of computational prediction methods by integrating publicly available clinical transcriptome and experimental data. We identify 51 drugs that regulate proteins that interact with SARS-CoV-2 protein through biological pathways against COVID-19, some of which have been tested in clinical trials. Among the repurposed drug candidates, lovastatin leads to differential gene expression in the clinical transcriptome of mild COVID-19 patients, and estradiol cypionate mainly regulates hormone-related biological functions to treat severe COVID-19 patients. Multi-target mechanisms of drug candidates are also explored. Erlotinib targets the viral protein that interacts with cytokines and cytokine receptors to affect SARS-CoV-2 attachment and invasion. Lovastatin and testosterone block the angiotensin system to suppress SARS-CoV-2 infection. In summary, our study has identified effective drug candidates against COVID-19 for patients in different clinical stages and provides a comprehensive understanding of potential drug mechanisms.
Abstract
Mendelian randomization using GWAS summary statistics has become a popular method to infer causal relationships across complex diseases. However, the widespread pleiotropy observed in GWAS has made the selection of valid instrumental variables problematic, leading to possible violations of Mendelian randomization assumptions and thus potentially invalid inferences concerning causation. Furthermore, current MR methods can examine causation in only one direction, so that two separate analyses are required for bi-directional analysis. In this study, we propose a statistical framework, MRCI (Mixture model Reciprocal Causation Inference), to estimate reciprocal causation between two phenotypes simultaneously using the genome-scale summary statistics of the two phenotypes and reference linkage disequilibrium information. Simulation studies, including scenarios with strong correlated pleiotropy, showed that MRCI obtained nearly unbiased estimates of causation in both directions and correct Type I error rates under the null hypothesis. In applications to real GWAS data, MRCI detected significant bi-directional and uni-directional causal influences between common diseases and putative risk factors.
Subjects
Mendelian Randomization Analysis, Causality, Risk Factors, Computer Simulation, Linkage Disequilibrium
Abstract
COVID-19 mRNA vaccination is one of the most effective strategies used to fight COVID-19. Recently, venous thromboembolism (VTE) events after COVID-19 mRNA vaccination have been reported in various studies. Such concerns may hamper the ongoing COVID-19 vaccination campaign. Based on the US Vaccine Adverse Event Reporting System data, this modified self-controlled case series study investigated the association of COVID-19 mRNA vaccination with VTE events among US adults. We found that the VTE incidence rate in the recommended dose interval did not change significantly after receipt of COVID-19 mRNA vaccines. This conclusion still holds when the analysis is stratified by age and gender. VTE onset may not be significantly associated with COVID-19 mRNA vaccination.
Abstract
Recurrent event data analysis plays an important role in many fields, e.g., medicine, social science, and economics. Because existing approaches under the proportional rates or mean model perform poorly when the underlying model is misspecified, we propose a novel model-free approach by introducing a lower bound on the concordance index (C-Index). We develop an estimation method by deriving a continuous lower bound on the C-Index based on the log-sigmoid function, and we also provide a variable selection procedure for high-dimensional settings. In both low- and high-dimensional settings, simulation results show that the proposed methods outperform the gamma frailty recurrent event model when the proportional mean assumption is violated. Moreover, an application to the hospital readmission dataset shows results in line with previous studies, and a higher C-Index value further supports the adequacy of the model.
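The idea of replacing the discontinuous C-Index with a smooth lower bound can be sketched with the generic log-sigmoid inequality 1{z > 0} >= 1 + log(sigmoid(z)) / log(2). The code below is an illustration under stated assumptions, not the authors' estimator: it treats a pair as comparable when one subject has a strictly larger outcome (for example, a larger event count), and all function names and data are hypothetical.

```python
import math


def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def smooth_lower_bound(z):
    # 1 + log(sigmoid(z)) / log(2) never exceeds the indicator 1{z > 0}
    # and equals it (both are 0) at z = 0.
    return 1.0 + log_sigmoid(z) / math.log(2.0)


def comparable_pairs(outcomes):
    # Ordered pairs (i, j) in which subject i has the strictly larger outcome.
    n = len(outcomes)
    return [(i, j) for i in range(n) for j in range(n)
            if outcomes[i] > outcomes[j]]


def c_index(scores, outcomes):
    # Fraction of comparable pairs that the scores rank in the same order.
    pairs = comparable_pairs(outcomes)
    return sum(1.0 for i, j in pairs if scores[i] > scores[j]) / len(pairs)


def c_index_lower_bound(scores, outcomes):
    # Differentiable surrogate: average the smooth bound over the same pairs.
    pairs = comparable_pairs(outcomes)
    return sum(smooth_lower_bound(scores[i] - scores[j])
               for i, j in pairs) / len(pairs)


# Made-up risk scores and event counts for four subjects.
scores = [2.0, 1.2, 1.0, -1.0]
outcomes = [3, 1, 2, 0]
c_exact = c_index(scores, outcomes)
c_bound = c_index_lower_bound(scores, outcomes)
```

Because the surrogate is smooth in the score differences, it can be maximized with gradient-based methods, and any increase in the bound cannot overstate the true concordance.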
Subjects
Statistical Models, Patient Readmission, Computer Simulation, Humans
Abstract
Given the considerable cost of drug discovery, drug repurposing is becoming attractive as it can effectively shorten the development timeline and reduce the development cost. However, most existing drug-repurposing methods overlook the heterogeneous health conditions of different COVID-19 patients. In this study, we evaluated the adverse effect (AE) profiles of 106 COVID-19 drugs. We extracted four AE signatures to characterize the AE distribution of the 106 COVID-19 drugs by non-negative matrix factorization (NMF). By integrating the information from four distinct databases (AE, bioassay, chemical structure, and gene expression information), we predicted the AE profiles of 91 drugs with inadequate AE feedback. For each of the drug clusters, discriminant genes accounting for mechanisms of different AE signatures were identified by sparse linear discriminant analysis. Our findings can be divided into three parts. First, drugs rich in AE-signature 1 (for example, remdesivir) should be used with caution in patients with poor liver, renal, or cardiac function, where the functional genes accumulate in the RHO GTPases Activate NADPH Oxidases pathway. Second, drugs featuring AE-signature 2 (for example, hydroxychloroquine) are unsuitable for patients with vascular disorders, with relevant genes enriched in signal transduction pathways. Third, drugs characterized by AE signatures 3 and 4 have relatively mild AEs. Our study showed that NMF and network-based frameworks contribute to more precise drug recommendations.
Abstract
Polygenic risk scores (PRS) summarize the genetic contribution of an individual's genotype to a complex trait as an estimate of disease risk. Traditional PRS prediction methods have been developed predominantly for the European population. The accuracy of PRS prediction in non-European populations is diminished because of the much smaller sample sizes of genome-wide association studies (GWAS). In this article, we introduced a novel method to construct PRS for non-European populations, abbreviated as TL-Multi, which uses a transfer learning framework to learn useful knowledge from the European population and correct the bias for non-European populations. We considered non-European GWAS data as the target data and European GWAS data as the informative auxiliary data. TL-Multi borrows useful information from the auxiliary data to improve the learning accuracy of the target data while preserving efficiency and accuracy. To demonstrate the practical applicability of the proposed method, we applied TL-Multi to predict the risk of systemic lupus erythematosus (SLE) in the Asian population and the risk of asthma in the Indian population by borrowing information from the European population. TL-Multi achieved better prediction accuracy than the competing methods, including Lassosum and meta-analysis, in both simulations and real applications.
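As a point of reference for what any PRS method ultimately produces, the basic score is a weighted sum of risk-allele dosages. The sketch below shows only that final scoring step, with hypothetical effect-size weights. It does not implement TL-Multi, whose contribution lies in how the weights are estimated for the target population.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages for one individual.

    dosages: risk-allele counts per variant (0, 1 or 2, or imputed values).
    weights: per-variant effect sizes, for example log odds ratios.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(g * b for g, b in zip(dosages, weights))


# Made-up genotype and effect sizes for three variants.
example_score = polygenic_risk_score([0, 1, 2], [0.10, -0.05, 0.20])
```

In this toy example the first variant contributes nothing, the second lowers the score by 0.05, and the third raises it by 0.40, for a total of 0.35.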
Abstract
It has been nearly 2 years since the first case of COVID-19 was reported. Governments worldwide have introduced numerous non-pharmaceutical interventions (NPIs) to combat this disease. Many of these NPIs were designed in response to initial outbreaks but are unsustainable in the long term. Governments are exploring how to adjust their current NPIs to resume normal activities while effectively protecting their population. As one of the most controversial NPIs, the implementation of travel restrictions varies across regions. Some governments have abandoned their previous travel restrictions because of the induced costs to society and on the economy. Other areas, including Hong Kong (Special Administrative Region of China) and Singapore, continue employing these NPIs as a long-term disease prevention tactic. However, the multidimensional impacts of travel restrictions require careful consideration of how to apply restrictions more appropriately. We have proposed an adapted framework to examine Hong Kong and Singapore's travel restrictions. We aimed to study these two regions' experiences in balancing disease control efforts with easing the burden on lives and livelihoods. Based on the experiences of Hong Kong and Singapore, we have outlined six policy recommendations to serve as the cornerstone for future research and policy practices.
Subjects
COVID-19, Hong Kong/epidemiology, Humans, SARS-CoV-2, Singapore/epidemiology, Travel
Abstract
Genome-wide association studies (GWAS) have led to the identification of hundreds of susceptibility loci across cancers, but the impact of further studies remains uncertain. Here we analyse summary-level data from GWAS of European ancestry across fourteen cancer sites to estimate the number of common susceptibility variants (polygenicity) and underlying effect-size distribution. All cancers show a high degree of polygenicity, involving at a minimum thousands of loci. We project that sample sizes required to explain 80% of GWAS heritability vary from 60,000 cases for testicular to over 1,000,000 cases for lung cancer. The maximum relative risk achievable for subjects at the 99th risk percentile of underlying polygenic risk scores (PRS), compared to average risk, ranges from 12 for testicular to 2.5 for ovarian cancer. We show that PRS have potential for risk stratification for cancers of the breast, colon and prostate, but less so for others because of modest heritability and lower incidence.