Results 1 - 20 of 76
1.
Am J Hum Genet ; 110(5): 741-761, 2023 05 04.
Article in English | MEDLINE | ID: mdl-37030289

ABSTRACT

The advent of large-scale genome-wide association studies (GWASs) has motivated the development of statistical methods for phenotype prediction with single-nucleotide polymorphism (SNP) array data. These polygenic risk score (PRS) methods use a multiple linear regression framework to infer the joint effect sizes of all genetic variants on the trait. Among the subset of PRS methods that operate on GWAS summary statistics, sparse Bayesian methods have shown competitive predictive ability. However, most existing Bayesian approaches employ Markov chain Monte Carlo (MCMC) algorithms for posterior inference, which are computationally inefficient and do not scale favorably to higher dimensions. Here, we introduce variational inference of polygenic risk scores (VIPRS), a Bayesian summary statistics-based PRS method that uses variational inference techniques to approximate the posterior distribution of the effect sizes. Our experiments with 36 simulation configurations and 12 real phenotypes from the UK Biobank dataset demonstrated that VIPRS is consistently competitive with the state of the art in prediction accuracy while being more than twice as fast as popular MCMC-based approaches. This performance advantage is robust across a variety of genetic architectures, SNP heritabilities, and independent GWAS cohorts. In addition to its competitive accuracy on the "White British" samples, VIPRS showed improved transferability when applied to other ethnic groups, with up to a 1.7-fold increase in R² among individuals of Nigerian ancestry for low-density lipoprotein (LDL) cholesterol. To illustrate its scalability, we applied VIPRS to a dataset of 9.6 million genetic markers, which yielded further improvements in prediction accuracy for highly polygenic traits such as height.
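
As a rough illustration of the variational idea (not VIPRS itself, which works from GWAS summary statistics rather than individual-level data), the sketch below runs coordinate-ascent mean-field variational inference for a spike-and-slab linear regression on simulated genotype-like data. The function name `cavi_spike_slab`, the fixed hyperparameters `pi`, `tau2` and `sigma2`, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def cavi_spike_slab(X, y, pi=0.1, tau2=1.0, sigma2=1.0, n_iter=50):
    """Mean-field coordinate-ascent VI for y = X b + e, e ~ N(0, sigma2 I),
    with prior b_j ~ pi * N(0, tau2) + (1 - pi) * delta_0.
    Hyperparameters pi, tau2 and sigma2 are held fixed (an assumption)."""
    p = X.shape[1]
    xtx = np.sum(X ** 2, axis=0)             # x_j' x_j for every column
    s2 = 1.0 / (xtx / sigma2 + 1.0 / tau2)   # slab posterior variances
    alpha = np.full(p, pi)                   # posterior inclusion probabilities
    mu = np.zeros(p)                         # slab posterior means
    fitted = X @ (alpha * mu)                # current E[X b] under q
    logit_pi = np.log(pi / (1.0 - pi))
    for _ in range(n_iter):
        for j in range(p):
            fitted -= X[:, j] * (alpha[j] * mu[j])   # remove variant j's contribution
            mu[j] = s2[j] / sigma2 * (X[:, j] @ (y - fitted))
            logit = logit_pi + 0.5 * np.log(s2[j] / tau2) + mu[j] ** 2 / (2.0 * s2[j])
            alpha[j] = 1.0 / (1.0 + np.exp(-np.clip(logit, -500.0, 500.0)))
            fitted += X[:, j] * (alpha[j] * mu[j])   # add the updated contribution back
    return alpha, mu, s2


# Hypothetical simulated data: 2 causal columns out of 20
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
b_true = np.zeros(20)
b_true[:2] = 2.0
y = X @ b_true + rng.standard_normal(200)
alpha, mu, _ = cavi_spike_slab(X, y)
```

Each coordinate update touches a single column of `X`, which is one reason variational schemes of this kind tend to be cheaper than MCMC; a practical method would also estimate the hyperparameters and monitor convergence.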


Subjects
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Bayes Theorem , Polymorphism, Single Nucleotide/genetics , Risk Factors , Genetic Predisposition to Disease
2.
Biostatistics ; 25(4): 1233-1253, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-38400753

ABSTRACT

Determining the causes of deaths (CODs) that occur outside civil registration and vital statistics systems is challenging. A technique called verbal autopsy (VA) is widely adopted in practice to gather information on such deaths. A VA consists of interviewing relatives of a deceased person about the deceased's symptoms in the period leading up to death, often resulting in multivariate binary responses. While statistical methods have been devised for estimating the cause-specific mortality fractions (CSMFs) of a study population, the continued expansion of VA to new populations (or "domains") necessitates approaches that recognize between-domain differences while capitalizing on potential similarities. In this article, we propose such a domain-adaptive method that integrates external between-domain similarity information encoded by a prespecified rooted weighted tree. Given a cause, we use latent class models to characterize the conditional distributions of the responses that may vary by domain. We specify a logistic stick-breaking Gaussian diffusion process prior along the tree for class mixing weights with node-specific spike-and-slab priors to pool information between the domains in a data-driven way. Posterior inference is conducted via a scalable variational Bayes algorithm. Simulation studies show that the domain adaptation enabled by the proposed method improves CSMF estimation and individual COD assignment. We also illustrate and evaluate the method using a validation dataset. The article concludes with a discussion of limitations and future directions.
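
The logistic stick-breaking construction mentioned above turns unconstrained real numbers into mixing weights that sum to one. The sketch below shows only that deterministic map; the tree-structured Gaussian diffusion prior on the logits and the node-specific spike-and-slab priors are omitted, and the function name is hypothetical.

```python
import numpy as np


def logistic_stick_breaking(eta):
    """Map K-1 real logits to K mixing weights that sum to 1.

    Logit eta_k becomes a stick fraction v_k = sigmoid(eta_k); class k takes
    fraction v_k of the stick left over by classes before it, and the last
    class receives whatever remains."""
    eta = np.asarray(eta, dtype=float)
    v = 1.0 / (1.0 + np.exp(-eta))                             # fractions in (0, 1)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)))   # stick left before each break
    return np.concatenate((v, [1.0])) * remaining


print(logistic_stick_breaking([0.0, 0.0]))   # three classes from two logits
```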


Subjects
Autopsy , Bayes Theorem , Cause of Death , Humans , Autopsy/methods , Models, Statistical , Biostatistics/methods
3.
Biostatistics ; 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38916966

ABSTRACT

Bayesian graphical models are powerful tools to infer complex relationships in high dimension, yet are often fraught with computational and statistical challenges. If exploited in a principled way, the increasing information collected alongside the data of primary interest constitutes an opportunity to mitigate these difficulties by guiding the detection of dependence structures. For instance, gene network inference may be informed by the use of publicly available summary statistics on the regulation of genes by genetic variants. Here we present a novel Gaussian graphical modeling framework to identify and leverage information on the centrality of nodes in conditional independence graphs. Specifically, we consider a fully joint hierarchical model to simultaneously infer (i) sparse precision matrices and (ii) the relevance of node-level information for uncovering the sought-after network structure. We encode such information as candidate auxiliary variables using a spike-and-slab submodel on the propensity of nodes to be hubs, which allows hypothesis-free selection and interpretation of a sparse subset of relevant variables. As efficient exploration of large posterior spaces is needed for real-world applications, we develop a variational expectation conditional maximization algorithm that scales inference to hundreds of samples, nodes and auxiliary variables. We illustrate and exploit the advantages of our approach in simulations and in a gene network study which identifies hub genes involved in biological pathways relevant to immune-mediated diseases.
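
In a Gaussian graphical model, a zero off-diagonal entry of the precision matrix means that the two variables are conditionally independent given all the others, and the number of non-zero entries in a node's row (its degree) is a simple way to flag hubs. The sketch below assumes the precision matrix is already known and uses a hypothetical tolerance `tol`; it does not perform the hierarchical inference described in the abstract.

```python
import numpy as np


def partial_correlations(omega):
    """Partial correlations implied by a precision matrix:
    rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)."""
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho


def graph_and_degrees(omega, tol=1e-8):
    """Edges of the conditional independence graph (non-zero off-diagonal
    precision entries) and each node's degree as a crude hub score."""
    adj = np.abs(omega) > tol
    np.fill_diagonal(adj, False)
    return adj, adj.sum(axis=1)


# Hypothetical star-shaped network with node 0 as the hub
omega = 2.0 * np.eye(4)
omega[0, 1:] = omega[1:, 0] = -0.5
rho = partial_correlations(omega)
adj, degree = graph_and_degrees(omega)
```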

4.
BMC Bioinformatics ; 25(1): 104, 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38459430

ABSTRACT

The identification of tumor-specific molecular dependencies is essential for the development of effective cancer therapies. Genetic and chemical perturbations are powerful tools for discovering these dependencies. Even though chemical perturbations can be applied to primary cancer samples at large scale, the interpretation of experiment outcomes is often complicated by the fact that one chemical compound can affect multiple proteins. To overcome this challenge, Batzilla et al. (PLoS Comput Biol 18(8): e1010438, 2022) proposed DepInfeR, a regularized multi-response regression model designed to identify and estimate specific molecular dependencies of individual cancers from their ex-vivo drug sensitivity profiles. Inspired by their work, we propose a Bayesian extension to DepInfeR. Our proposed approach offers several advantages over DepInfeR, including the ability to handle missing values in both protein-drug affinity and drug sensitivity profiles without data pre-processing steps such as imputation. Moreover, our approach uses Gaussian processes to capture more complex molecular dependency structures, and provides probabilistic statements about whether a protein in the protein-drug affinity profiles is informative to the drug sensitivity profiles. Simulation studies demonstrate that our proposed approach achieves better prediction accuracy and is able to discover unreported dependency structures.


Subjects
Neoplasms , Humans , Bayes Theorem , Neoplasms/drug therapy , Neoplasms/metabolism , Computer Simulation
5.
BMC Bioinformatics ; 25(1): 119, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38509499

ABSTRACT

BACKGROUND: High-dimensional omics data are increasingly used in clinical and public health research for disease risk prediction. Many sparse methods have been proposed that use prior knowledge, e.g., biological group structure information, to guide the model-building process. However, these methods are still based on a single model, often leading to overconfident inferences and inferior generalization. RESULTS: We proposed a novel stacking strategy based on a non-negative spike-and-slab Lasso (nsslasso) generalized linear model (GLM) for disease risk prediction with high-dimensional omics data. Briefly, we used prior biological knowledge to segment omics data into a set of sub-datasets. Each sub-model was trained separately on the features from its group using an appropriate base learner. Then, the predictions of the sub-models were ensembled by a super learner using the nsslasso GLM. The proposed method was compared with several competitors, such as the Lasso, grlasso, and gsslasso, using simulated data and two open-access breast cancer datasets. The proposed method showed robustly superior prediction performance to the optimal single-model method in high-noise simulated data and real-world data. Furthermore, compared with the traditional stacking method, the proposed nsslasso stacking method can efficiently handle redundant sub-models and identify important sub-models. CONCLUSIONS: The proposed nsslasso method demonstrated favorable predictive accuracy, stability, and biological interpretability. Additionally, the proposed method can be used to detect new biomarkers and key group structures.
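
A minimal sketch of the stacking idea, under stated simplifications: each feature group gets an ordinary least-squares sub-model, out-of-fold predictions are used so the super learner does not see in-sample fits, and the super learner is plain non-negative least squares solved by projected gradient descent. This simpler non-negative fit stands in for the nsslasso GLM super learner; all function names and data are hypothetical.

```python
import numpy as np


def oof_group_predictions(X, y, groups, n_folds=5, seed=0):
    """Out-of-fold predictions from one least-squares sub-model per feature group."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    Z = np.zeros((n, len(groups)))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        for g, cols in enumerate(groups):
            coef, *_ = np.linalg.lstsq(X[np.ix_(train_idx, cols)], y[train_idx], rcond=None)
            Z[test_idx, g] = X[np.ix_(test_idx, cols)] @ coef
    return Z


def nonneg_stacking_weights(Z, y, n_iter=5000):
    """Non-negative least-squares stacking weights via projected gradient
    descent with step 1/L, where L is the largest eigenvalue of Z'Z."""
    ZtZ, Zty = Z.T @ Z, Z.T @ y
    step = 1.0 / np.linalg.eigvalsh(ZtZ)[-1]
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        w = np.maximum(0.0, w - step * (ZtZ @ w - Zty))
    return w


# Hypothetical data: only the first feature group carries signal
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 6))
y = X[:, :3] @ np.array([1.0, 2.0, 3.0]) + 0.5 * rng.standard_normal(300)
Z = oof_group_predictions(X, y, groups=[[0, 1, 2], [3, 4, 5]])
w = nonneg_stacking_weights(Z, y)
```

The non-negativity constraint lets an uninformative sub-model receive a weight of exactly zero, which loosely mirrors the abstract's point about handling redundant sub-models.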


Subjects
Breast Neoplasms , Humans , Female , Linear Models , Breast Neoplasms/genetics
6.
Ann Hum Genet ; 88(3): 212-246, 2024 May.
Article in English | MEDLINE | ID: mdl-38161273

ABSTRACT

OBJECTIVE: Genome-wide association study (GWAS) analysis, the most successful technique for discovering disease-related genetic variation, raises some statistical concerns, including multiple testing, the correlation among variants (single-nucleotide polymorphisms) due to linkage disequilibrium, and the omission of important variants when the model is fitted with just one variant at a time. To address these problems in a small-sample study, we used a sparse Bayesian learning model to find bipolar disorder (BD) genetic variants. METHODS: This study used the Wellcome Trust Case Control Consortium dataset, including 1998 BD cases and 1500 control samples; after quality control, 380,628 variants were analysed. In this GWAS, a Bayesian logistic model with hierarchical shrinkage spike-and-slab priors was used, with all variants considered simultaneously in one model. To decrease the computational burden, an alternative inferential method, Bayesian variational inference, was used. RESULTS: Thirteen variants were selected as associated with BD. Three of them (rs7572953, rs1378850 and rs4148944) were reported in previous GWASs. Eight of the thirteen were related to hemogram parameters, such as lymphocyte percentage, plateletcrit and haemoglobin concentration. Among the selected related genes, GABPA, ELF3 and JAM2 were enriched in the platelet-derived growth factor pathway. These three genes, along with APP, ARL8A, CDH23 and GPR37L1, could be differential diagnostic variants for BD. CONCLUSIONS: By relaxing the statistical restrictions of GWAS analysis, Bayesian variational spike-and-slab models can offer insight into the genetic links with BD even with a small sample size. This model needs further examination to uncover variants related to other traits.


Subjects
Bipolar Disorder , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Bipolar Disorder/genetics , Bipolar Disorder/metabolism , Bayes Theorem , Genetic Predisposition to Disease , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Receptors, G-Protein-Coupled/genetics
7.
Hum Brain Mapp ; 45(10): e26763, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-38943369

ABSTRACT

In this article, we develop an analytical approach for estimating brain connectivity networks that accounts for subject heterogeneity. More specifically, we consider a novel extension of a multi-subject Bayesian vector autoregressive model that estimates group-specific directed brain connectivity networks and accounts for the effects of covariates on the network edges. We adopt a flexible approach, allowing for (possibly) nonlinear effects of the covariates on edge strength via a novel Bayesian nonparametric prior that employs a weighted mixture of Gaussian processes. For posterior inference, we achieve computational scalability by implementing a variational Bayes scheme. Our approach enables simultaneous estimation of group-specific networks and selection of relevant covariate effects. We show improved performance over competing two-stage approaches on simulated data. We apply our method to resting-state functional magnetic resonance imaging data from children with a history of traumatic brain injury (TBI) and healthy controls to estimate the effects of age and sex on the group-level connectivities. Our results highlight differences in the distribution of parent nodes. They also suggest an altered relation with age, with peak edge strength in children with TBI, and differences in effective connectivity strength between males and females.
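
To make directed connectivity concrete, the sketch below simulates a first-order vector autoregression, in which a non-zero entry `A[i, j]` means that activity in region j at time t-1 predicts region i at time t, and recovers `A` by ordinary least squares. This is a single-subject, covariate-free stand-in, not the multi-subject Bayesian model in the abstract; the matrix, noise level and series length are made up for illustration.

```python
import numpy as np


def simulate_var1(A, n_steps, noise_sd=0.1, seed=0):
    """Simulate x_t = A x_{t-1} + e_t; A[i, j] != 0 means region j drives region i."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_steps, A.shape[0]))
    for t in range(1, n_steps):
        x[t] = A @ x[t - 1] + noise_sd * rng.standard_normal(A.shape[0])
    return x


def fit_var1(x):
    """Ordinary least-squares estimate of A from consecutive time points."""
    past, present = x[:-1], x[1:]
    # Row-wise, present is approximately past @ A.T, so solve for A.T and transpose
    A_T, *_ = np.linalg.lstsq(past, present, rcond=None)
    return A_T.T


# Hypothetical 3-region network in which region 0 drives region 1
A = np.array([[0.5, 0.0, 0.0],
              [0.4, 0.5, 0.0],
              [0.0, 0.0, 0.3]])
A_hat = fit_var1(simulate_var1(A, n_steps=5000))
```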


Subjects
Bayes Theorem , Brain Injuries, Traumatic , Connectome , Magnetic Resonance Imaging , Humans , Brain Injuries, Traumatic/diagnostic imaging , Brain Injuries, Traumatic/physiopathology , Female , Male , Child , Adolescent , Connectome/methods , Brain/diagnostic imaging , Brain/physiopathology , Nerve Net/diagnostic imaging , Nerve Net/physiopathology , Models, Neurological
8.
Biometrics ; 80(4)2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39377518

ABSTRACT

In this paper, we propose Varying Effects Regression with Graph Estimation (VERGE), a novel Bayesian method for feature selection in regression. Our model has key aspects that allow it to leverage the complex structure of data sets arising from genomics or imaging studies. We distinguish between the predictors, which are the features utilized in the outcome prediction model, and the subject-level covariates, which modulate the effects of the predictors on the outcome. We construct a varying coefficients modeling framework where we infer a network among the predictor variables and utilize this network information to encourage the selection of related predictors. We employ variable selection spike-and-slab priors that enable the selection of both network-linked predictor variables and covariates that modify the predictor effects. We demonstrate through simulation studies that our method outperforms existing alternative methods in terms of both feature selection and predictive accuracy. We illustrate VERGE with an application to characterizing the influence of gut microbiome features on obesity, where we identify a set of microbial taxa and their ecological dependence relations. We allow subject-level covariates, including sex and dietary intake variables, to modify the coefficients of the microbiome predictors, providing additional insight into the interplay between these factors.


Subjects
Bayes Theorem , Computer Simulation , Gastrointestinal Microbiome , Obesity , Humans , Regression Analysis , Models, Statistical
9.
Stat Med ; 43(18): 3484-3502, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-38857904

ABSTRACT

The rise of cutting-edge precision cancer treatments has increased the importance of the optimal biological dose (OBD) in modern oncology trials. These trials now consider toxicity and efficacy simultaneously when determining the most desirable dosage for treatment. Traditional approaches in early-phase oncology trials have relied on the assumption of a monotone relationship between treatment efficacy and dosage. However, this assumption may not hold for novel oncology therapies. In practice, the dose-efficacy curve of such treatments may reach a plateau at a specific dose, making it difficult for conventional methods to identify the OBD accurately. Furthermore, reliable identification of the OBD is typically not possible from a single small-sample trial. Using data from multiple phase I and phase I/II trials, we propose a novel Bayesian random-effects dose-optimization meta-analysis (REDOMA) approach that identifies the OBD by synthesizing toxicity and efficacy data from each trial. The REDOMA method can accommodate trials with heterogeneous characteristics. We adopt a curve-free approach based on a Gamma process prior to model the average dose-toxicity relationship. In addition, we use a Bayesian model selection framework with a spike-and-slab prior as an automatic variable selection technique to remove monotonicity constraints on the dose-efficacy curve. Extensive simulation studies confirm the good performance of the REDOMA method.


Subjects
Bayes Theorem , Dose-Response Relationship, Drug , Humans , Neoplasms/drug therapy , Meta-Analysis as Topic , Computer Simulation , Clinical Trials, Phase I as Topic/methods , Antineoplastic Agents/therapeutic use , Antineoplastic Agents/administration & dosage , Clinical Trials, Phase II as Topic/methods , Models, Statistical
10.
Stat Med ; 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39260448

ABSTRACT

Data irregularity in cancer genomics studies has been widely observed in the form of outliers and heavy-tailed distributions in complex traits. In the past decade, robust variable selection methods have emerged as powerful alternatives to nonrobust ones for identifying important genes associated with heterogeneous disease traits and building superior predictive models. In this study, to retain the remarkable features of the quantile LASSO and fully Bayesian regularized quantile regression while overcoming their disadvantages in the analysis of high-dimensional genomics data, we propose the spike-and-slab quantile LASSO through a fully Bayesian spike-and-slab formulation under the robust likelihood obtained by adopting the asymmetric Laplace distribution (ALD). The proposed robust method inherits the prominent properties of selective shrinkage and self-adaptivity to the sparsity pattern from the spike-and-slab LASSO (Ročková and George, J Am Stat Assoc, 2018, 113(521): 431-444). Furthermore, the spike-and-slab quantile LASSO has a computational advantage in locating the posterior modes via soft-thresholding-rule-guided Expectation-Maximization (EM) steps within the coordinate descent framework, a property rarely observed for robust regularization with nondifferentiable loss functions. We have conducted comprehensive simulation studies with a variety of heavy-tailed errors in both homogeneous and heterogeneous model settings to demonstrate the superiority of the spike-and-slab quantile LASSO over competing methods. The advantage of the proposed method has been further demonstrated in case studies of the lung adenocarcinoma (LUAD) and skin cutaneous melanoma (SKCM) data from The Cancer Genome Atlas (TCGA).
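
The building blocks named in this abstract can each be written in a few lines. The sketch below defines the quantile check loss (the kernel of the asymmetric Laplace negative log-likelihood, up to scale), the soft-thresholding operator used in coordinate descent, and the adaptive penalty of the spike-and-slab LASSO, in which a coefficient's penalty moves between the spike rate `lam0` and the slab rate `lam1` according to the conditional probability that it belongs to the slab. It assumes `lam0 > lam1` and a fixed mixing weight `theta`, and it is not the authors' EM algorithm.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def ssl_adaptive_penalty(beta, theta, lam0, lam1):
    """Spike-and-slab LASSO adaptive penalty p* lam1 + (1 - p*) lam0, where p* is
    the conditional probability that beta comes from the slab Laplace(lam1)
    rather than the spike Laplace(lam0). Assumes lam0 > lam1 and fixed theta."""
    a = np.abs(np.asarray(beta, dtype=float))
    # psi0 / psi1 = (lam0 / lam1) * exp(-(lam0 - lam1) |beta|), written to avoid 0/0
    odds_spike = (1.0 - theta) / theta * (lam0 / lam1) * np.exp(-(lam0 - lam1) * a)
    p_star = 1.0 / (1.0 + odds_spike)
    return p_star * lam1 + (1.0 - p_star) * lam0
```

Small coefficients are penalized at close to the spike rate `lam0` and shrunk hard, while large coefficients are penalized at close to the slab rate `lam1`, which is the selective shrinkage the abstract refers to.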

11.
Stat Med ; 43(21): 4013-4026, 2024 Sep 20.
Article in English | MEDLINE | ID: mdl-38963094

ABSTRACT

In addition to considering the main effects, understanding gene-environment (G × E) interactions is imperative for determining the etiology of diseases and the factors that affect their prognosis. In the existing statistical framework for censored survival outcomes, there are several challenges in detecting G × E interactions, such as handling high-dimensional omics data, diverse environmental factors, and algorithmic complications in survival analysis. The effect heredity principle has been widely used in studies involving interaction identification because it incorporates the dependence of the main and interaction effects. However, Bayesian survival models that incorporate this principle have not been developed. Therefore, we propose Bayesian heredity-constrained accelerated failure time (BHAFT) models for identifying main and interaction (M-I) effects with novel spike-and-slab or regularized horseshoe priors that incorporate the effect heredity principle. The R package rstan was used to fit the proposed models. Extensive simulations demonstrated that BHAFT models outperformed other existing models in terms of signal identification, coefficient estimation, and prognosis prediction. Biologically plausible G × E interactions associated with the prognosis of lung adenocarcinoma were identified using our proposed model. Notably, BHAFT models incorporating the effect heredity principle could identify both main and interaction effects, which is highly useful for exploring G × E interactions in high-dimensional survival analysis. The code and data used in our paper are available at https://github.com/SunNa-bayesian/BHAFT.
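
The effect heredity principle can be stated as a rule about which interaction terms are eligible given the selected main effects. The sketch below applies it as a hard filter in its usual strong and weak variants; BHAFT instead encodes heredity through its priors, and the function name and the gene and exposure labels in the example are hypothetical.

```python
from itertools import product


def allowed_interactions(genes, envs, selected_genes, selected_envs, mode="strong"):
    """G x E pairs eligible under an effect heredity rule (hard-constraint version).

    strong: both the gene and the environmental main effect are selected;
    weak:   at least one of the two main effects is selected."""
    if mode not in ("strong", "weak"):
        raise ValueError("mode must be 'strong' or 'weak'")
    allowed = []
    for g, e in product(genes, envs):
        in_g, in_e = g in selected_genes, e in selected_envs
        if (in_g and in_e) if mode == "strong" else (in_g or in_e):
            allowed.append((g, e))
    return allowed


print(allowed_interactions(["G1", "G2", "G3"], ["smoking", "age"], {"G1"}, {"smoking"}))
```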


Subjects
Bayes Theorem , Computer Simulation , Gene-Environment Interaction , Lung Neoplasms , Humans , Survival Analysis , Lung Neoplasms/genetics , Lung Neoplasms/mortality , Models, Statistical , Prognosis , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/mortality , Algorithms
12.
Stat Med ; 2024 Oct 18.
Article in English | MEDLINE | ID: mdl-39422157

ABSTRACT

We propose a phase I/II trial design to support dose-finding when the optimal biological dose (OBD) may differ in two prespecified patient subgroups. The proposed design uses a utility function to quantify efficacy-toxicity trade-offs, and a Bayesian model with spike and slab prior distributions for the subgroup effect on toxicity and efficacy to guide dosing and to facilitate identifying either subgroup-specific OBDs or a common OBD depending on the resulting trial data. In a simulation study, we find the proposed design performs nearly as well as a design that ignores subgroups when the dose-toxicity and dose-efficacy relationships are the same in both subgroups, and nearly as well as a design with independent dose-finding within each subgroup when these relationships differ across subgroups. In other words, the proposed adaptive design performs similarly to the design that would be chosen if investigators possessed foreknowledge about whether the dose-toxicity and/or dose-efficacy relationship differs across two prespecified subgroups. Thus, the proposed design may be effective for OBD selection when uncertainty exists about whether the OBD differs in two prespecified subgroups.
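
A utility function of the kind described here can be as simple as rewarding efficacy and penalizing toxicity. The sketch below uses a hypothetical linear utility with a toxicity cap and applies it separately to two subgroups' estimated dose-response probabilities; the weight, the cap of 0.3 and all probabilities are assumptions, and neither the paper's utility nor its spike-and-slab subgroup model is reproduced.

```python
import numpy as np


def select_obd(p_eff, p_tox, tox_limit=0.3, weight=1.0):
    """Index of the dose maximizing the hypothetical utility
    U(d) = P(efficacy | d) - weight * P(toxicity | d), among doses with
    P(toxicity | d) <= tox_limit. Returns None if no dose is admissible."""
    p_eff, p_tox = np.asarray(p_eff, float), np.asarray(p_tox, float)
    utility = p_eff - weight * p_tox
    admissible = p_tox <= tox_limit
    if not admissible.any():
        return None
    utility[~admissible] = -np.inf
    return int(np.argmax(utility))   # ties resolve to the lowest dose


# Hypothetical estimates for two subgroups over four doses
print(select_obd([0.2, 0.45, 0.5, 0.52], [0.05, 0.1, 0.2, 0.35]))   # subgroup A
print(select_obd([0.1, 0.2, 0.4, 0.6], [0.02, 0.05, 0.08, 0.15]))   # subgroup B
```

When the two subgroups' curves differ, a rule like this can return different doses per subgroup, which is the situation the proposed design is meant to detect.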

13.
Funct Integr Genomics ; 23(1): 62, 2023 Feb 20.
Article in English | MEDLINE | ID: mdl-36805328

ABSTRACT

Exosome-related long non-coding RNAs (lncRNAs) have been reported to play significant roles in clear cell renal cell carcinoma (ccRCC). However, little is known about the relationship between exosome-related lncRNAs and ccRCC. This study aimed to select an optimal prognostic model based on exosome-related lncRNAs and to provide a methodological reference for high-dimensional data. Based on 515 ccRCC patients from The Cancer Genome Atlas (TCGA) database, two risk score models were generated using Bayesian spike-and-slab lasso and lasso regression. The optimal model was determined by calculating the area under time-dependent receiver operating characteristic (ROC) curves in the TCGA and ArrayExpress databases. The immune patterns and immunotherapy sensitivity of the high- and low-risk groups were further explored. We constructed two risk score models containing 11 and 7 exosome-related lncRNAs using Bayesian spike-and-slab lasso and lasso regression, respectively. ROC curves revealed that the model constructed by Bayesian spike-and-slab lasso regression was more reliable in predicting survival at 1, 3, and 5 years, yielding areas under the curve (AUCs) of 0.796, 0.732, and 0.742, respectively. Kaplan-Meier (K-M) curves showed that prognosis was poorer in the high-risk score group (P < 0.001). Additionally, patients in the high-risk score group were enriched in immune-activating phenotypes and were more sensitive to immunotherapy. The exosome-related lncRNA model constructed with Bayesian spike-and-slab lasso regression has higher predictive power for ccRCC patients' prognosis, provides a methodological reference for the analysis of high-dimensional data in bioinformatics, and guides tailored treatment of ccRCC patients.
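
A linear risk score of the kind used in this study is the weighted sum of each patient's lncRNA expression values, with the fitted regression coefficients as weights. The sketch below assumes the coefficients are already estimated and splits patients at the median score, a common convention that the abstract does not specify; the function names and numbers are illustrative.

```python
import numpy as np


def risk_scores(expr, coefs):
    """Linear risk score per patient: sum over lncRNAs of coef_j * expression_j."""
    return np.asarray(expr, float) @ np.asarray(coefs, float)


def split_by_median(scores):
    """Label patients 'high' if their score exceeds the median, otherwise 'low'
    (the median cutoff is an assumption)."""
    med = np.median(scores)
    return np.where(scores > med, "high", "low")


# Hypothetical: 4 patients, 2 lncRNAs, coefficients from a fitted model
scores = risk_scores([[1, 0], [0, 1], [2, 1], [0, 0]], [0.5, -0.2])
groups = split_by_median(scores)
```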


Subjects
Carcinoma, Renal Cell , Exosomes , Kidney Neoplasms , RNA, Long Noncoding , Humans , Carcinoma, Renal Cell/genetics , Exosomes/genetics , RNA, Long Noncoding/genetics , Bayes Theorem , Kidney Neoplasms/genetics
14.
Biometrics ; 79(1): 264-279, 2023 03.
Article in English | MEDLINE | ID: mdl-34658017

ABSTRACT

This paper is concerned with using multivariate binary observations to estimate the probabilities of unobserved classes with scientific meanings. We focus on the setting where additional information about sample similarities is available and represented by a rooted weighted tree. Every leaf in the given tree contains multiple samples. Shorter distances over the tree between the leaves indicate a priori higher similarity in class probability vectors. We propose a novel data integrative extension to classical latent class models with tree-structured shrinkage. The proposed approach enables (1) borrowing of information across leaves, (2) estimating data-driven leaf groups with distinct vectors of class probabilities, and (3) individual-level probabilistic class assignment given the observed multivariate binary measurements. We derive and implement a scalable posterior inference algorithm in a variational Bayes framework. Extensive simulations show more accurate estimation of class probabilities than alternatives that suboptimally use the additional sample similarity information. A zoonotic infectious disease application is used to illustrate the proposed approach. The paper concludes with a brief discussion of model limitations and extensions.
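
Individual-level probabilistic class assignment in a latent class model follows from Bayes' rule: each class's weight is multiplied by the product of item-wise Bernoulli likelihoods for the observed binary responses, then the results are normalized. The sketch below assumes the class weights and item probabilities are known; the tree-structured shrinkage and variational estimation of those quantities are not shown, and the function name is hypothetical.

```python
import numpy as np


def class_posterior(x, weights, theta):
    """Posterior class probabilities for one binary response vector x, where
    P(k | x) is proportional to weights[k] * prod_j theta[k, j]**x_j * (1 - theta[k, j])**(1 - x_j)."""
    x = np.asarray(x, float)
    theta = np.asarray(theta, float)
    # Work on the log scale to avoid underflow when there are many items
    log_lik = x @ np.log(theta).T + (1.0 - x) @ np.log1p(-theta).T
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()


# Hypothetical two-class model with three binary items
print(class_posterior([1, 1, 1], [0.5, 0.5], [[0.9, 0.9, 0.9], [0.1, 0.1, 0.1]]))
```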


Subjects
Algorithms , Bayes Theorem , Probability
15.
Biometrics ; 79(2): 1370-1382, 2023 06.
Article in English | MEDLINE | ID: mdl-35191539

ABSTRACT

Recent advancements in miniaturized fluorescence microscopy have made it possible to investigate neuronal responses to external stimuli in awake behaving animals through the analysis of intracellular calcium signals. An ongoing challenge is deconvolving the temporal signals to extract the spike trains from the noisy calcium signals' time series. In this article, we propose a nested Bayesian finite mixture specification that allows estimation of spiking activity while simultaneously reconstructing the distributions of the calcium transient spikes' amplitudes under different experimental conditions. The proposed model leverages two nested layers of random discrete mixture priors to borrow information between experiments and discover similarities in the distributional patterns of neuronal responses to different stimuli. Furthermore, the spikes' intensity values are also clustered within and between experimental conditions to determine the existence of common (recurring) response amplitudes. Simulation studies and the analysis of a dataset from the Allen Brain Observatory show the effectiveness of the method in clustering and detecting neuronal activities.
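
A common simplifying model for calcium imaging treats the calcium trace as an autoregressive filter of spikes, c_t = gamma * c_{t-1} + s_t, observed with noise. The sketch below simulates this AR(1) model and inverts it naively; the decay `gamma = 0.9` is an assumed value, the inversion is exact only without noise (differencing amplifies measurement noise), and this is not the nested mixture model proposed in the paper.

```python
import numpy as np


def simulate_calcium(spikes, gamma=0.9, noise_sd=0.0, seed=0):
    """AR(1) calcium model: c_t = gamma * c_{t-1} + s_t, observed as y_t = c_t + noise."""
    rng = np.random.default_rng(seed)
    c = np.zeros(len(spikes))
    prev = 0.0
    for t, s in enumerate(spikes):
        prev = gamma * prev + s
        c[t] = prev
    return c + noise_sd * rng.standard_normal(len(spikes))


def naive_deconvolve(y, gamma=0.9):
    """Invert the AR(1) filter: s_t = y_t - gamma * y_{t-1}."""
    y = np.asarray(y, float)
    s = y.copy()
    s[1:] -= gamma * y[:-1]
    return s


# Hypothetical spike train with two events of different amplitude
spikes = np.array([0.0, 0.0, 1.5, 0.0, 0.0, 0.8, 0.0, 0.0])
recovered = naive_deconvolve(simulate_calcium(spikes))
```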


Subjects
Brain , Calcium , Animals , Bayes Theorem , Computer Simulation , Cluster Analysis
16.
Stat Med ; 42(30): 5616-5629, 2023 12 30.
Article in English | MEDLINE | ID: mdl-37806971

ABSTRACT

A wealth of gene expression data generated by high-throughput techniques provides exciting opportunities for studying gene-gene interactions systematically. Gene-gene interactions in a biological system are tightly regulated and are often highly dynamic. The interactions can change flexibly under various internal cellular signals or external stimuli. Previous studies have developed statistical methods to examine these dynamic changes in gene-gene interactions. However, due to the massive number of possible gene combinations that need to be considered in a typical genomic dataset, intensive computation is a common challenge for exploring gene-gene interactions. On the other hand, oftentimes only a small proportion of gene combinations exhibit dynamic co-expression changes. To solve this problem, we propose Bayesian variable selection approaches based on spike-and-slab priors. The proposed algorithms reduce the computational intensity by focusing on identifying subsets of promising gene combinations in the search space. We also adopt a Bayesian multiple hypothesis testing procedure to identify strong dynamic gene co-expression changes. Simulation studies are performed to compare the proposed approaches with existing exhaustive search heuristics. We demonstrate the implementation of our proposed approach to study the association between gene co-expression patterns and overall survival using the RNA-sequencing dataset from The Cancer Genome Atlas breast cancer BRCA-US project.
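
To see why exhaustive search is costly, note that each candidate consists of a gene pair plus a modulating gene, so the number of candidates grows roughly as the cube of the number of genes. The sketch below counts these combinations and computes a crude, hypothetical dynamic co-expression score (the change in correlation between samples with high versus low values of the modulator); it is not the statistic or the Bayesian procedure used in the paper.

```python
import numpy as np


def coexpression_change(x, y, z):
    """Difference in Pearson correlation of genes x and y between samples with
    z above its median and samples at or below it (a crude, hypothetical score)."""
    x, y, z = (np.asarray(a, float) for a in (x, y, z))
    high = z > np.median(z)
    r_high = np.corrcoef(x[high], y[high])[0, 1]
    r_low = np.corrcoef(x[~high], y[~high])[0, 1]
    return r_high - r_low


def n_gene_triplets(p):
    """Number of (gene pair, modulator gene) combinations among p genes."""
    return p * (p - 1) // 2 * (p - 2)


# Hypothetical data in which the x-y association flips sign with the modulator z
rng = np.random.default_rng(2)
z = rng.standard_normal(400)
x = rng.standard_normal(400)
y = np.where(z > 0, x, -x) + 0.3 * rng.standard_normal(400)
score = coexpression_change(x, y, z)
print(n_gene_triplets(20000))   # candidates for a 20,000-gene panel
```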


Subjects
Algorithms , Genomics , Humans , Bayes Theorem , Computer Simulation , Heuristics
17.
Stat Med ; 42(26): 4867-4885, 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-37643728

ABSTRACT

Polygenicity refers to the phenomenon that multiple genetic variants have a nonzero effect on a complex trait. It is defined as the proportion of genetic variants with a nonzero effect on the trait. Evaluation of polygenicity can provide valuable insights into the genetic architecture of the trait. Several recent works have attempted to estimate polygenicity at the single nucleotide polymorphism level. However, evaluating polygenicity at the gene level can be biologically more meaningful. We propose the notion of gene-level polygenicity, defined as the proportion of genes having a nonzero effect on the trait under the framework of a transcriptome-wide association study. We introduce a Bayesian approach, genepoly, to estimate this quantity for a trait. The method is based on a spike-and-slab prior and simultaneously estimates the subset of non-null genes. Our simulation study shows that genepoly efficiently estimates gene-level polygenicity. The method produces a downward bias for small choices of trait heritability due to a non-null gene, which diminishes rapidly with an increase in the genome-wide association study (GWAS) sample size. While identifying the subset of non-null genes, genepoly offers a high level of specificity and an overall good level of sensitivity; the sensitivity increases as the sample sizes of the reference panel expression data and the GWAS data increase. We applied the method to seven phenotypes in the UK Biobank, integrating expression data. We find height to be the most polygenic and asthma to be the least polygenic.
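
Under a spike-and-slab model, the posterior mean of the realized fraction of non-null genes equals the average posterior inclusion probability (PIP), and thresholding PIPs yields a gene set whose sensitivity and specificity can be checked in simulation. The sketch below assumes PIPs are already available; the 0.5 threshold and the example values are hypothetical, and genepoly's actual estimator may differ.

```python
import numpy as np


def gene_polygenicity(pip):
    """Posterior mean of the fraction of non-null genes: the average PIP."""
    return float(np.mean(pip))


def sensitivity_specificity(pip, truth, threshold=0.5):
    """Sensitivity and specificity of calling genes with PIP > threshold non-null."""
    called = np.asarray(pip) > threshold
    truth = np.asarray(truth, bool)
    sens = (called & truth).sum() / truth.sum()
    spec = (~called & ~truth).sum() / (~truth).sum()
    return float(sens), float(spec)


# Hypothetical PIPs for six genes, the first three truly non-null
pip = [0.9, 0.8, 0.3, 0.1, 0.05, 0.6]
truth = [1, 1, 1, 0, 0, 0]
print(gene_polygenicity(pip), sensitivity_specificity(pip, truth))
```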

18.
Clin Trials ; 20(6): 681-688, 2023 12.
Article in English | MEDLINE | ID: mdl-37485950

ABSTRACT

BACKGROUND/AIMS: The motivating randomized controlled phase I trial evaluates three sodium nitroprusside doses in a novel sodium nitroprusside-enhanced cardiopulmonary resuscitation strategy for improved end-organ perfusion relative to the local standard of care. Sodium nitroprusside is a vasodilator with an established safety profile in other indications, whereas the local standard of care uses vasoconstrictors, typically epinephrine. The purpose of the proposed trial is to identify the highest safe dose of sodium nitroprusside in this new context, as excessive doses may cause severe hypotension with compromised end-organ perfusion. METHODS: The proposed phase I trial design expands upon traditional dose-finding designs to include a randomized control arm, which is needed to assess safety through the relative increase in serum lactate on hospital admission. For guiding dose escalation, we propose and compare six Bayesian models that characterize expected serum lactate as a function of sodium nitroprusside dose and randomization group. Each model makes a different assumption about the expected change in serum lactate across the control cohorts concurrently randomized with each dose. Model selection aims to minimize the expected number of times that a dose is incorrectly classified as safe or unsafe, while sample size selection targets an expected number of incorrectly classified doses. Randomization is 1:1 for the initial cohort; for subsequent cohorts, the ratio is chosen to maximize the lower confidence bound. RESULTS: The spike-and-slab model minimizes the expected number of times that a dose is incorrectly classified as safe or unsafe in the most scenarios of the motivating three-dose trial, but all six models exhibit relatively similar performance. A 2:1 randomization ratio for the second and third cohorts maximizes the lower confidence bound when using the spike-and-slab model. With the optimal design, on average, 70 individuals will ensure 1 incorrectly classified dose in 6 opportunities. CONCLUSION: We recommend that the motivating trial use the spike-and-slab model with a 1:1 randomization ratio for the initial cohort and a 2:1 randomization ratio for subsequent cohorts; however, the simpler fixed effects approaches performed similarly well.
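To make the spike-and-slab idea concrete, here is a minimal Python sketch, not the trial's actual model. It assumes the dose-minus-control difference in mean serum lactate is normally distributed with a known standard error, places a spike at zero and a normal slab N(0, slab_sd^2) on the true difference, and returns the posterior probability of the slab together with the slab-conditional posterior for the difference. The function names, the parameters `slab_sd` and `prior_slab`, and the safety threshold `delta` are hypothetical illustrations.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def spike_slab_posterior(diff, se, slab_sd, prior_slab=0.5):
    """Spike-and-slab update for an observed dose-minus-control lactate difference.

    diff: observed difference in mean serum lactate (dose arm minus concurrent control)
    se: standard error of diff, treated as known (an assumption of this sketch)
    slab_sd: prior SD of the true difference under the slab (hypothetical)
    prior_slab: prior probability that the true difference is nonzero (hypothetical)
    Returns (prob_slab, post_mean, post_var); the mean and variance are conditional on the slab.
    """
    # Spike: true difference is exactly 0, so diff ~ N(0, se^2)
    log_m0 = normal_logpdf(diff, 0.0, se ** 2)
    # Slab: true difference ~ N(0, slab_sd^2), so marginally diff ~ N(0, se^2 + slab_sd^2)
    log_m1 = normal_logpdf(diff, 0.0, se ** 2 + slab_sd ** 2)
    log_odds = math.log(prior_slab) - math.log1p(-prior_slab) + log_m1 - log_m0
    prob_slab = 1.0 / (1.0 + math.exp(-log_odds))
    # Conjugate normal update of the true difference given that it came from the slab
    shrink = slab_sd ** 2 / (slab_sd ** 2 + se ** 2)
    return prob_slab, shrink * diff, shrink * se ** 2


def prob_effect_exceeds(delta, prob_slab, post_mean, post_var):
    """Posterior probability that the true lactate difference exceeds delta."""
    # Slab part: upper tail of N(post_mean, post_var) beyond delta
    tail = 0.5 * math.erfc((delta - post_mean) / math.sqrt(2.0 * post_var))
    # Spike part: a difference of exactly 0 exceeds delta only when delta < 0
    spike = 1.0 if delta < 0.0 else 0.0
    return prob_slab * tail + (1.0 - prob_slab) * spike
```

For example, spike_slab_posterior(2.0, 1.0, 1.0) gives a slab-conditional posterior mean of 1.0 and variance of 0.5. A dose-escalation rule could then, hypothetically, label a dose unsafe when prob_effect_exceeds(delta, ...) passes a chosen cutoff; the cutoff and delta are design choices not specified in the abstract.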


Subjects
Cardiopulmonary Resuscitation, Humans, Nitroprusside/therapeutic use, Bayes Theorem, Research Design, Lactates
19.
Behav Res Methods ; 55(4): 2125-2142, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35830000

ABSTRACT

This research introduces fully and partially exploratory factor analysis (EFA) models with bi-level Bayesian regularization. The proposed models enable factor selection with a sparse model by treating factors and loadings as the group and individual levels, respectively. They offer a series of benefits, such as factor extraction and parameter estimation in one step, simultaneous estimation of the model and tuning parameters, and the availability of interval estimates. Moreover, in the partially EFA, partial knowledge can be incorporated together with an unknown number of factors. Simulation studies and real-data analyses demonstrated that both models performed satisfactorily under reasonable conditions and were robust to interference from local dependence, while the partially EFA with appropriate information can outperform the fully EFA and work well under more extreme conditions. The proposed models have been implemented in the R package LAWBL.
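The group/individual structure can be illustrated with its penalized (sparse group lasso) counterpart. The Python sketch below is an assumption-laden illustration, not the Bayesian model implemented in LAWBL: it applies the proximal operator of lam_group * sum_k ||L[:, k]||_2 + lam_indiv * sum_jk |L[j, k]| to a loading matrix whose columns are factors. Element-wise soft-thresholding sparsifies individual loadings, and column-wise shrinkage can remove an entire factor, which is the sense in which factor selection and loading sparsity happen together. The function name and both penalty weights are hypothetical.

```python
import numpy as np


def bilevel_prox(loadings, lam_group, lam_indiv):
    """Proximal operator of a sparse-group-lasso penalty on a loading matrix.

    Columns of `loadings` are factors (group level); entries are individual
    loadings (individual level). Both penalty weights are hypothetical inputs.
    """
    # Individual level: soft-threshold every loading toward zero
    soft = np.sign(loadings) * np.maximum(np.abs(loadings) - lam_indiv, 0.0)
    # Group level: shrink each factor's column; drop it entirely if its norm is small
    norms = np.linalg.norm(soft, axis=0)
    scale = np.where(norms > lam_group, 1.0 - lam_group / np.maximum(norms, 1e-12), 0.0)
    return soft * scale


# Toy 4-item, 3-factor loading matrix in which the third factor is weak
L = np.array([[0.8, 0.1, 0.05],
              [0.7, 0.05, 0.0],
              [0.0, 0.9, 0.1],
              [0.1, 0.6, 0.02]])
print(bilevel_prox(L, lam_group=0.2, lam_indiv=0.15))
```

With these toy weights, the weak third column is zeroed out as a whole, while the first two factors keep only their larger loadings.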


Subjects
Bayes Theorem, Humans, Computer Simulation
20.
Entropy (Basel) ; 25(9)2023 Sep 08.
Article in English | MEDLINE | ID: mdl-37761609

ABSTRACT

Developing an efficient computational scheme for high-dimensional Bayesian variable selection in generalised linear models and survival models has always been a challenging problem due to the absence of closed-form solutions to the marginal likelihood. The Reversible Jump Markov Chain Monte Carlo (RJMCMC) approach can be employed to jointly sample models and coefficients, but the effective design of the trans-dimensional jumps of RJMCMC can be challenging, making it hard to implement. Alternatively, the marginal likelihood can be derived conditional on latent variables using a data-augmentation scheme (e.g., Pólya-gamma data augmentation for logistic regression) or using other estimation methods. However, suitable data-augmentation schemes are not available for every generalised linear model and survival model, and estimating the marginal likelihood using a Laplace approximation or a correlated pseudo-marginal method can be computationally expensive. In this paper, three main contributions are presented. Firstly, we present an extended Point-wise implementation of Adaptive Random Neighbourhood Informed proposal (PARNI) to efficiently sample models directly from the marginal posterior distributions of generalised linear models and survival models. Secondly, in light of the recently proposed approximate Laplace approximation, we describe an efficient and accurate estimation method for marginal likelihood that involves adaptive parameters. Additionally, we describe a new method to adapt the algorithmic tuning parameters of the PARNI proposal by replacing Rao-Blackwellised estimates with the combination of a warm-start estimate and the ergodic average. We present numerous numerical results from simulated data and eight high-dimensional genetic mapping data-sets to showcase the efficiency of the novel PARNI proposal compared with the baseline add-delete-swap proposal.
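For orientation, the baseline add-delete-swap proposal mentioned in the abstract can be sketched as a Metropolis-Hastings sampler over inclusion vectors gamma in {0,1}^p. This is a minimal Python illustration under stated assumptions, not the authors' code and not PARNI. The user supplies `log_score(gamma)` = log marginal likelihood + log model prior, up to a constant. In the GLM and survival settings discussed above, that marginal likelihood would itself have to be approximated. Add and delete moves are implemented as flipping one uniformly chosen coordinate. Swap moves exchange one included and one excluded variable. Both moves are symmetric, so acceptance depends only on the score difference. The function and argument names are hypothetical.

```python
import math
import random


def add_delete_swap_mcmc(log_score, p, n_iter, seed=0):
    """Metropolis-Hastings over inclusion vectors with add/delete/swap moves.

    log_score(gamma) must return log p(y | gamma) + log p(gamma) up to a constant.
    Returns Monte Carlo estimates of each variable's posterior inclusion probability.
    """
    rng = random.Random(seed)
    gamma = [0] * p                      # start from the empty model
    current = log_score(gamma)
    counts = [0] * p
    for _ in range(n_iter):
        proposal = list(gamma)
        if rng.random() < 0.5:
            # Add or delete: flip one uniformly chosen coordinate (symmetric proposal)
            j = rng.randrange(p)
            proposal[j] = 1 - proposal[j]
        else:
            # Swap: exchange one included and one excluded variable (symmetric proposal)
            included = [j for j in range(p) if gamma[j] == 1]
            excluded = [j for j in range(p) if gamma[j] == 0]
            if included and excluded:
                proposal[rng.choice(included)] = 0
                proposal[rng.choice(excluded)] = 1
            # If no swap is possible, the chain stays put, keeping the kernel symmetric
        new = log_score(proposal)
        # Symmetric proposals: accept with probability min(1, exp(new - current))
        if rng.random() < math.exp(min(0.0, new - current)):
            gamma, current = proposal, new
        for j in range(p):
            counts[j] += gamma[j]
    return [c / n_iter for c in counts]
```

As a sanity check, a toy score of sum_j w_j * gamma_j makes the variables independent a posteriori, so the estimated inclusion probabilities should approach 1 / (1 + exp(-w_j)).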
