Results 1 - 10 of 10
1.
Biometrics; 79(4): 3624-3636, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37553770

ABSTRACT

Missing data are a pervasive issue in observational studies using electronic health records or patient registries. They present unique challenges for statistical inference, especially causal inference, and handling them inappropriately can bias causal estimates. Besides missing data problems, observational health data typically contain mixed-type variables (continuous and categorical covariates) whose joint distribution is often too complex to be modeled by simple parametric models. Missing values in covariates and outcomes make causal inference even more challenging, yet most standard causal inference approaches either assume fully observed data or begin only after missing values have been imputed in a separate preprocessing stage. To address these problems, we introduce a Bayesian nonparametric causal model to estimate causal effects with missing data. The proposed approach can simultaneously impute missing values, account for multiple outcomes, and estimate causal effects under the potential outcomes framework. We provide three simulation studies to show the performance of our proposed method under complicated data settings whose features are similar to our case studies. For example, Simulation Study 3 assumes that missing values exist in both outcomes and covariates. Two case studies were conducted applying our method to evaluate the comparative effectiveness of treatments for chronic disease management in juvenile idiopathic arthritis and cystic fibrosis.


Subjects
Models, Statistical; Humans; Bayes Theorem; Data Interpretation, Statistical; Computer Simulation; Causality
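
The potential outcomes framework named in the abstract of entry 1 is usually introduced through the average treatment effect. The expression below is a textbook sketch that assumes a binary treatment, which the abstract does not state; the symbols are illustrative notation and may differ from the paper's own estimands, which also cover multiple outcomes.

```latex
% Y(1), Y(0): potential outcomes under treatment and under control
% (a binary treatment is an assumption made here for illustration)
\tau_{\mathrm{ATE}} = \mathbb{E}\bigl[\, Y(1) - Y(0) \,\bigr]
```

Only one of Y(1) and Y(0) is observed for any unit, so causal estimation already involves unobserved quantities before any covariate or outcome value is missing.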
2.
Front Genet; 14: 1079198, 2023.
Article in English | MEDLINE | ID: mdl-37501720

ABSTRACT

Genome-wide association studies (GWAS) have successfully identified a large number of genetic variants associated with traits and diseases. However, it remains challenging to fully understand the functional mechanisms underlying many associated variants, especially variants shared across multiple phenotypes. To address this challenge, we propose graph-GPA 2.0 (GGPA 2.0), a statistical framework that integrates GWAS datasets for multiple phenotypes and incorporates functional annotations within a unified framework. Our simulation studies showed that incorporating functional annotation data using GGPA 2.0 not only improves the detection of disease-associated variants, but also provides a more accurate estimation of relationships among diseases. Next, we analyzed five autoimmune diseases and five psychiatric disorders with the functional annotations derived from GenoSkyline and GenoSkyline-Plus, along with the prior disease graph generated by biomedical literature mining. For autoimmune diseases, GGPA 2.0 identified enrichment for blood-related epigenetic marks, especially B cells and regulatory T cells, across multiple diseases. Psychiatric disorders were enriched for brain-related epigenetic marks, especially the prefrontal cortex and the inferior temporal lobe for bipolar disorder and schizophrenia, respectively. In addition, pleiotropy between bipolar disorder and schizophrenia was detected. Finally, we found that GGPA 2.0 is robust to the use of irrelevant and/or incorrect functional annotations. These results demonstrate that GGPA 2.0 can be a powerful tool to identify genetic variants associated with each phenotype or those shared across multiple phenotypes, while also promoting an understanding of functional mechanisms underlying the associated variants.

3.
Biometrics; 79(3): 1775-1787, 2023 Sep.
Article in English | MEDLINE | ID: mdl-35895854

ABSTRACT

High throughput spatial transcriptomics (HST) is a rapidly emerging class of experimental technologies that allow for profiling gene expression in tissue samples at or near single-cell resolution while retaining the spatial location of each sequencing unit within the tissue sample. Through analyzing HST data, we seek to identify sub-populations of cells within a tissue sample that may inform biological phenomena. Existing computational methods either ignore the spatial heterogeneity in gene expression profiles, fail to account for important statistical features such as skewness, or are heuristic-based network clustering methods that lack the inferential benefits of statistical modeling. To address this gap, we develop SPRUCE: a Bayesian spatial multivariate finite mixture model based on multivariate skew-normal distributions, which is capable of identifying distinct cellular sub-populations in HST data. We further implement a novel combination of Pólya-Gamma data augmentation and spatial random effects to infer spatially correlated mixture component membership probabilities without relying on approximate inference techniques. Via a simulation study, we demonstrate the detrimental inferential effects of ignoring skewness or spatial correlation in HST data. Using publicly available human brain HST data, SPRUCE outperforms existing methods in recovering expertly annotated brain layers. Finally, our application of SPRUCE to human breast cancer HST data indicates that SPRUCE can distinguish distinct cell populations within the tumor microenvironment. An R package spruce for fitting the proposed models is available through The Comprehensive R Archive Network.


Subjects
Models, Statistical; Transcriptome; Humans; Bayes Theorem; Computer Simulation; Gene Expression Profiling
4.
Bioinformatics; 38(4): 1067-1074, 2022 Jan 27.
Article in English | MEDLINE | ID: mdl-34849578

ABSTRACT

MOTIVATION: Despite the great success of genome-wide association studies (GWAS), multiple challenges remain. First, complex traits are often associated with many single nucleotide polymorphisms (SNPs), each with small or moderate effect sizes. Second, our understanding of the functional mechanisms through which genetic variants are associated with complex traits is still limited. To address these challenges, we propose GPA-Tree, which simultaneously performs association mapping and identifies key combinations of functional annotations related to risk-associated SNPs by combining a decision tree algorithm with a hierarchical modeling framework. RESULTS: First, we conducted simulation studies to evaluate the proposed GPA-Tree method and compared its performance with existing statistical approaches. The results indicate that GPA-Tree outperforms existing statistical approaches in detecting risk-associated SNPs and identifies the true combinations of functional annotations with high accuracy. Second, we applied GPA-Tree to a systemic lupus erythematosus (SLE) GWAS and functional annotation data including GenoSkyline and GenoSkylinePlus. The results from GPA-Tree highlight the dysregulation of blood immune cells, including but not limited to primary B, memory helper T, regulatory T, neutrophils and CD8+ memory T cells in SLE. These results demonstrate that GPA-Tree can be a powerful tool that improves association mapping while facilitating understanding of the underlying genetic architecture of complex traits and potential mechanisms linking risk-associated SNPs with complex traits. AVAILABILITY AND IMPLEMENTATION: The GPATree software is available at https://dongjunchung.github.io/GPATree/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study; Software; Genome-Wide Association Study/methods; Algorithms; Computer Simulation; Polymorphism, Single Nucleotide
5.
Nat Commun; 11(1): 346, 2020 Jan 14.
Article in English | MEDLINE | ID: mdl-31937790

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

6.
Nat Commun; 10(1): 4352, 2019 Sep 25.
Article in English | MEDLINE | ID: mdl-31554810

ABSTRACT

Circadian clock mechanisms have been extensively investigated, but the main rate-limiting step that determines circadian period remains unclear. Formation of a stable complex between clock proteins and CK1 is a conserved feature in eukaryotic circadian mechanisms. Here we show that the FRQ-CK1 interaction, but not FRQ stability, correlates with circadian period in Neurospora circadian clock mutants. Mutations that specifically affect the FRQ-CK1 interaction lead to severe alterations in circadian period. The FRQ-CK1 interaction has two roles in the circadian negative feedback loop. First, it determines the FRQ phosphorylation profile, which regulates FRQ stability and also feeds back to either promote or reduce the interaction itself. Second, it determines the efficiency of the circadian negative feedback process by mediating FRQ-dependent WC phosphorylation. Our conclusions are further supported by mathematical modeling and in silico experiments. Together, these results suggest that the FRQ-CK1 interaction is a major rate-limiting step in circadian period determination.


Subjects
Casein Kinase I/genetics; Circadian Rhythm/genetics; Fungal Proteins/genetics; Neurospora crassa/genetics; Casein Kinase I/metabolism; Circadian Clocks/genetics; Feedback, Physiological; Fungal Proteins/metabolism; Mutation; Neurospora crassa/metabolism; Phosphorylation; Protein Binding; Time Factors
7.
Stat Med; 38(3): 339-353, 2019 Feb 10.
Article in English | MEDLINE | ID: mdl-30232820

ABSTRACT

Individuals may vary in their responses to treatment, and identification of subgroups differentially affected by a treatment is an important issue in medical research. The risk of misleading subgroup analyses has become well known, and some exploratory analyses can be helpful in clarifying how covariates potentially interact with the treatment. Motivated by a real data study of pediatric kidney transplant, we consider a semiparametric Bayesian latent model and examine its utility for an exploratory subgroup effect analysis using secondary data. The proposed method is concerned with a clinical setting where the number of subgroups is much smaller than the number of potential predictors and subgroups are only latently associated with observed covariates. The semiparametric model is flexible in capturing the latent structure driven by data rather than dictated by parametric modeling assumptions. Since it is difficult to correctly specify the conditional relationship between the response and a large number of confounders, we use propensity score matching to improve model robustness by balancing the covariate distribution. Simulation studies show that the proposed analysis can find the latent subgrouping structure and, with propensity score matching adjustment, yield robust estimates even when the outcome model is misspecified. In the real data analysis, the proposed analysis reports significant subgroup effects on steroid avoidance in kidney transplant patients, whereas standard proportional hazards regression analysis does not.


Subjects
Observational Studies as Topic; Treatment Outcome; Bayes Theorem; Child; Data Interpretation, Statistical; Female; Graft Rejection/prevention & control; Humans; Immunosuppression Therapy/methods; Kidney Transplantation; Male; Models, Statistical; Observational Studies as Topic/methods; Propensity Score
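
The abstract of entry 7 relies on propensity score matching to balance covariates between treatment groups. A minimal sketch of the matching step is given below, assuming the propensity scores have already been estimated (for example by logistic regression). The function name `match_nearest`, the greedy 1:1 strategy, and the optional caliper are illustrative assumptions, not the procedure used in the paper.

```python
from typing import List, Optional, Tuple


def match_nearest(
    treated: List[float],
    control: List[float],
    caliper: Optional[float] = None,
) -> List[Tuple[int, int]]:
    """Greedy 1:1 nearest-neighbour matching on propensity scores.

    Hypothetical helper for illustration. Matching is without replacement:
    each control unit is used at most once. Returns (treated_index,
    control_index) pairs; treated units whose closest available control
    lies outside the caliper are left unmatched.
    """
    available = set(range(len(control)))
    pairs: List[Tuple[int, int]] = []
    for t_idx, t_score in enumerate(treated):
        if not available:
            break  # no control units left to match
        # Closest still-unused control; ties are broken by the lower index.
        c_idx = min(available, key=lambda j: (abs(control[j] - t_score), j))
        if caliper is not None and abs(control[c_idx] - t_score) > caliper:
            continue  # nearest control is too far away, skip this unit
        pairs.append((t_idx, c_idx))
        available.remove(c_idx)
    return pairs


if __name__ == "__main__":
    treated_scores = [0.80, 0.55]
    control_scores = [0.50, 0.78, 0.20]
    print(match_nearest(treated_scores, control_scores, caliper=0.1))
```

Greedy matching depends on the order of the treated units; optimal matching or matching with replacement are common alternatives when that order dependence matters.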
8.
Bioinformatics; 34(12): 2139-2141, 2018 Jun 15.
Article in English | MEDLINE | ID: mdl-29432514

ABSTRACT

Summary: Integration of genetic studies for multiple phenotypes is a powerful approach to improving the identification of genetic variants associated with complex traits. Although it has been shown that leveraging the shared genetic basis among phenotypes, namely pleiotropy, can increase statistical power to identify risk variants, it remains challenging to effectively integrate genome-wide association study (GWAS) datasets for a large number of phenotypes. We previously developed graph-GPA, a Bayesian hierarchical model that integrates multiple GWAS datasets to boost statistical power for the identification of risk variants and to estimate pleiotropic architecture within a unified framework. Here we propose a novel improvement of graph-GPA that incorporates external knowledge about phenotype-phenotype relationships to guide the estimation of genetic correlation and the association mapping. The application of graph-GPA to GWAS datasets for 12 complex diseases, with a prior disease graph obtained by text mining of biomedical literature, illustrates its power to improve the identification of risk genetic variants and to facilitate understanding of genetic relationships among complex diseases. Availability and implementation: graph-GPA is implemented as an R package 'GGPA', which is publicly available at http://dongjunchung.github.io/GGPA/. DDNet, a web interface to query diseases of interest and download a prior disease graph obtained by text mining of biomedical literature, is publicly available at http://www.chunglab.io/ddnet/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study/methods; Polymorphism, Single Nucleotide; Software; Bayes Theorem; Computational Biology/methods; Data Mining; Data Visualization
9.
PLoS One; 13(1): e0190949, 2018.
Article in English | MEDLINE | ID: mdl-29309429

ABSTRACT

Despite accumulating evidence that different complex traits share a common risk basis, namely pleiotropy, effective investigation of pleiotropic architecture remains challenging. To address this challenge, we developed ShinyGPA, an interactive and dynamic visualization toolkit to investigate pleiotropic structure. ShinyGPA requires only the summary statistics from genome-wide association studies (GWAS), which reduces the burden on researchers using this tool. ShinyGPA allows users to effectively investigate genetic relationships among phenotypes using a flexible low-dimensional visualization and an intuitive user interface. In addition, ShinyGPA provides joint association mapping functionality that can facilitate biological understanding of the pleiotropic architecture. We analyzed GWAS summary statistics for 12 phenotypes using ShinyGPA and obtained visualization results and joint association mapping results that are well supported by the literature. The visualization produced by ShinyGPA can also be used as a hypothesis-generating tool for relationships between phenotypes, which might in turn improve the design of future genetic studies. ShinyGPA is currently available at https://dongjunchung.github.io/GPA/.


Subjects
Databases, Genetic; Genetic Pleiotropy; Genome-Wide Association Study; Algorithms; Humans; Polymorphism, Single Nucleotide; User-Computer Interface
10.
PLoS Comput Biol; 13(2): e1005388, 2017 Feb.
Article in English | MEDLINE | ID: mdl-28212402

ABSTRACT

Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients through novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging, as these diseases are often affected by many genetic variants with small or moderate effects. There has been accumulating evidence suggesting that different complex traits share a common risk basis, namely pleiotropy. Recently, several statistical methods have been developed to improve statistical power to identify risk variants for complex traits through a joint analysis of multiple GWAS datasets by leveraging pleiotropy. While these methods were shown to improve statistical power for association mapping compared to separate analyses, they are still limited in the number of phenotypes that can be integrated. To address this challenge, we propose a novel statistical framework, graph-GPA, to integrate a large number of GWAS datasets for multiple phenotypes using a hidden Markov random field approach. Application of graph-GPA to a joint analysis of GWAS datasets for 12 phenotypes shows that graph-GPA improves statistical power to identify risk variants compared to statistical methods based on a smaller number of GWAS datasets. In addition, graph-GPA promotes better understanding of genetic mechanisms shared among phenotypes, which can potentially be useful for the development of improved diagnosis and therapeutics. The R implementation of graph-GPA is currently available at https://dongjunchung.github.io/GGPA/.


Subjects
Chromosome Mapping/methods; Computer Graphics; Genetic Association Studies/methods; Models, Genetic; Polymorphism, Single Nucleotide/genetics; User-Computer Interface; Algorithms; Computer Simulation; Genetic Pleiotropy; Models, Statistical; Programming Languages