Results 1 - 15 of 15
1.
Stat Med ; 41(23): 4716-4743, 2022 10 15.
Article in English | MEDLINE | ID: mdl-35908775

ABSTRACT

Causal discovery algorithms estimate causal graphs from observational data. This can provide a valuable complement to analyses focusing on the causal relation between individual treatment-outcome pairs. Constraint-based causal discovery algorithms rely on conditional independence testing when building the graph. Until recently, these algorithms have been unable to handle missing values. In this article, we investigate two alternative solutions: test-wise deletion and multiple imputation. We establish necessary and sufficient conditions for the recoverability of causal structures under test-wise deletion, and argue that multiple imputation is more challenging in the context of causal discovery than for estimation. We conduct an extensive comparison by simulating from benchmark causal graphs: as one might expect, we find that test-wise deletion and multiple imputation both clearly outperform list-wise deletion and single imputation. Crucially, our results further suggest that multiple imputation is especially useful in settings with a small number of either Gaussian or discrete variables, but when the dataset contains a mix of both neither method is uniformly best. The methods we compare include random forest imputation and a hybrid procedure combining test-wise deletion and multiple imputation. An application to data from the IDEFICS cohort study on diet- and lifestyle-related diseases in European children serves as an illustrating example.


Subjects
Algorithms , Research Design , Causality , Child , Cohort Studies , Humans
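
The test-wise deletion idea from the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes missing values are encoded as None, uses a first-order partial correlation as a stand-in for the conditional independence statistic, and all function and variable names are hypothetical. The point it shows is that each test keeps every row that is complete in the variables of that test only, so different tests may use different numbers of rows.

```python
import math

def complete_rows(data, columns):
    """Test-wise deletion: keep rows with no missing value (None) in `columns` only."""
    return [row for row in data if all(row[c] is not None for c in columns)]

def pearson(xs, ys):
    """Plain Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr_testwise(data, x, y, z=None):
    """Return (correlation of x and y given optional z, number of rows used)."""
    cols = [x, y] + ([z] if z is not None else [])
    rows = complete_rows(data, cols)  # deletion is specific to this test
    col = lambda c: [row[c] for row in rows]
    r_xy = pearson(col(x), col(y))
    if z is None:
        return r_xy, len(rows)
    r_xz, r_yz = pearson(col(x), col(z)), pearson(col(y), col(z))
    r = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return r, len(rows)

# Toy data (made up for illustration): one row misses "c", another misses "b".
data = [
    {"a": 1.0, "b": 2.1, "c": 0.5},
    {"a": 2.0, "b": 3.9, "c": None},
    {"a": 3.0, "b": 6.2, "c": 1.5},
    {"a": 4.0, "b": 8.1, "c": 1.0},
    {"a": 5.0, "b": None, "c": 2.5},
    {"a": 6.0, "b": 11.8, "c": 2.0},
]
r_ab, n_ab = partial_corr_testwise(data, "a", "b")          # uses 5 rows
r_ab_c, n_ab_c = partial_corr_testwise(data, "a", "b", "c")  # uses 4 rows
```

List-wise deletion would instead discard both incomplete rows up front and run every test on the same four rows.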
2.
Stat Med ; 41(24): 4924-4940, 2022 10 30.
Article in English | MEDLINE | ID: mdl-35968913

ABSTRACT

Causal relationships are of crucial importance for biological and medical research. Algorithms have been proposed for causal structure learning with graphical visualizations. While much of the literature focuses on biological studies where data often follow the same distribution, for example, the normal distribution for all variables, challenges emerge from epidemiological and clinical studies where data are often mixed with continuous, binary, and ordinal variables. We propose to use a mixed latent Gaussian copula model to estimate the underlying correlation structure via the rank correlation for mixed data. This correlation structure is then incorporated into a popular causal discovery algorithm, the PC algorithm, to identify causal structures. The proposed algorithm, called the latent-PC algorithm, is able to discover the true causal structure consistently under mild conditions in high dimensional settings. From simulation studies, the latent-PC algorithm delivers a competitive performance in terms of a similar or higher true positive rate and a similar or lower false positive rate, compared with other variants of the PC algorithm. In the high dimensional settings where the number of variables is more than the number of observations, the causal graphs identified by the latent-PC algorithm are closer to the true causal structures, compared to other competing algorithms. Further, we demonstrate the utility of the latent-PC algorithm in a real dataset for hepatocellular carcinoma. Causal structures for patient survival are visualized and connected with clinical interpretations in the literature.


Subjects
Algorithms , Causality , Computer Simulation , Humans , Normal Distribution
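
To illustrate how a rank correlation can recover a latent correlation, here is a small sketch for the simplest case only. It assumes two continuous variables under a Gaussian copula, where the latent correlation equals sin(pi/2 * tau) for Kendall's tau; the bridge functions needed for binary and ordinal variables in the mixed model are not shown, and the function names are hypothetical rather than taken from the paper.

```python
import math

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs (assumes no ties)."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)

def latent_correlation(xs, ys):
    """Continuous-continuous case under a Gaussian copula: r = sin(pi/2 * tau)."""
    return math.sin(math.pi / 2 * kendall_tau(xs, ys))

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
tau = kendall_tau(x, y)        # 8 concordant and 2 discordant pairs
r = latent_correlation(x, y)
# Ranks are unchanged by a monotone transformation, so the estimate is too.
r_transformed = latent_correlation(x, [math.exp(v) for v in y])
```

A matrix of such estimates could then stand in for the sample correlation matrix in partial-correlation-based independence tests, which is the general idea the abstract describes.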
3.
Entropy (Basel) ; 24(3)2022 Feb 28.
Article in English | MEDLINE | ID: mdl-35327862

ABSTRACT

The PC and FCI algorithms are popular constraint-based methods for learning the structure of directed acyclic graphs (DAGs) in the absence and presence of latent and selection variables, respectively. These algorithms (and their order-independent variants, PC-stable and FCI-stable) have been shown to be consistent for learning sparse high-dimensional DAGs based on partial correlations. However, inferring conditional independences from partial correlations is valid if the data are jointly Gaussian or generated from a linear structural equation model, an assumption that may be violated in many applications. To broaden the scope of high-dimensional causal structure learning, we propose nonparametric variants of the PC-stable and FCI-stable algorithms that employ the conditional distance covariance (CdCov) to test for conditional independence relationships. As the key theoretical contribution, we prove that the high-dimensional consistency of the PC-stable and FCI-stable algorithms carries over to general distributions over DAGs when we implement CdCov-based nonparametric tests for conditional independence. Numerical studies demonstrate that our proposed algorithms perform nearly as well as PC-stable and FCI-stable for Gaussian distributions, and offer advantages in non-Gaussian graphical models.

4.
Biometrics ; 75(1): 36-47, 2019 03.
Article in English | MEDLINE | ID: mdl-30081434

ABSTRACT

The directed acyclic graph (DAG) is a powerful tool to model the interactions of high-dimensional variables. While estimating edge directions in a DAG often requires interventional data, one can estimate the skeleton of a DAG (i.e., an undirected graph formed by removing the direction of each edge in a DAG) using observational data. In real data analyses, the samples of the high-dimensional variables may be collected from a mixture of multiple populations. Each population has its own DAG while the DAGs across populations may have significant overlap. In this article, we propose a two-step approach to jointly estimate the DAG skeletons of multiple populations while the population origin of each sample may or may not be labeled. In particular, our method allows a probabilistic soft label for each sample, which can be easily computed and often leads to more accurate skeleton estimation than hard labels. Compared with separate estimation of skeletons for each population, our method is more accurate and robust to labeling errors. We study the estimation consistency for our method, and demonstrate its performance using simulation studies in different settings. Finally, we apply our method to analyze gene expression data from breast cancer patients of multiple cancer subtypes.


Subjects
Computer Graphics/statistics & numerical data , Epidemiologic Research Design , Statistical Models , Breast Neoplasms/genetics , Computer Simulation , Female , Gene Expression , Neoplasm Genes , Humans
5.
BMC Med Res Methodol ; 18(1): 67, 2018 07 03.
Article in English | MEDLINE | ID: mdl-29969993

ABSTRACT

BACKGROUND: Recently, the intervention calculus when the DAG is absent (IDA) method was developed to estimate lower bounds of causal effects from observational high-dimensional data. Originally it was introduced to assess the effect of baseline biomarkers which do not vary over time. However, in many clinical settings, measurements of biomarkers are repeated at fixed time points during treatment and, therefore, this method needs to be extended. The purpose of this paper is to extend the first step of the IDA, the Peter-Clark (PC) algorithm, to a time-dependent exposure in the context of a binary outcome. METHODS: We generalised the so-called "PC-algorithm" to take into account the chronological order of repeated measurements of the exposure and proposed to apply the IDA with our new version, the chronologically ordered PC-algorithm (COPC-algorithm). The extension includes Firth's correction. A simulation study was performed before applying the method for estimating causal effects of time-dependent immunological biomarkers on toxicity, death and progression in patients with metastatic melanoma. RESULTS: The simulation study showed that the completed partially directed acyclic graphs (CPDAGs) obtained using the COPC-algorithm were structurally closer to the true CPDAG than CPDAGs obtained using the PC-algorithm. Also, causal effects were more accurate when they were estimated based on CPDAGs obtained using the COPC-algorithm. Moreover, CPDAGs obtained by the COPC-algorithm allowed removing non-chronological arrows with a variable measured at a time t pointing to a variable measured at a time t′ where t′ < t. Bidirected edges were less frequent in CPDAGs obtained with the COPC-algorithm, supporting the fact that there was less variability in causal effects estimated from these CPDAGs. In the example, a threshold of the per-comparison error rate of 0.5% led to the selection of an interpretable set of biomarkers.
CONCLUSIONS: The COPC-algorithm provided CPDAGs that keep the chronological structure present in the data and thus allowed estimation of lower bounds of the causal effect of time-dependent immunological biomarkers on early toxicity, premature death and progression.


Subjects
Algorithms , Computer Simulation , Statistical Data Interpretation , Statistical Models , Biomarkers/analysis , Humans , Health Care Outcome Assessment/methods , Health Care Outcome Assessment/statistics & numerical data , Time Factors
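
The chronological constraint described in the abstract above (no arrow from a variable measured at time t to one measured at an earlier time t′) can be illustrated with a simple filter. This is only a sketch of the constraint itself, applied after the fact to a list of directed edges; it is not the COPC-algorithm, and the variable names and measurement times are made up.

```python
def drop_non_chronological(edges, time_of):
    """Keep a directed edge (u, v) only if v is not measured before u."""
    return [(u, v) for u, v in edges if time_of[v] >= time_of[u]]

# Hypothetical measurement times: a biomarker at two visits, then an outcome.
time_of = {"marker_t0": 0, "marker_t1": 1, "toxicity": 2}

edges = [
    ("marker_t0", "marker_t1"),  # forward in time: kept
    ("marker_t1", "marker_t0"),  # points backward in time: dropped
    ("marker_t1", "toxicity"),   # forward in time: kept
    ("toxicity", "marker_t0"),   # points backward in time: dropped
]
kept = drop_non_chronological(edges, time_of)
```

Edges between variables measured at the same time are left untouched here, since chronology alone cannot orient them.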
6.
Biometrics ; 72(1): 146-55, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26406114

ABSTRACT

Estimation of the skeleton of a directed acyclic graph (DAG) is of great importance for understanding the underlying DAG, and causal effects can be assessed from the skeleton when the DAG is not identifiable. We propose a novel method named PenPC to estimate the skeleton of a high-dimensional DAG by a two-step approach. We first estimate the nonzero entries of a concentration matrix using penalized regression, and then fix the difference between the concentration matrix and the skeleton by evaluating a set of conditional independence hypotheses. For high-dimensional problems where the number of vertices p is in polynomial or exponential scale of the sample size n, we study the asymptotic property of PenPC on two types of graphs: traditional random graphs where all the vertices have the same expected number of neighbors, and scale-free graphs where a few vertices may have a large number of neighbors. As illustrated by extensive simulations and applications on gene expression data of cancer patients, PenPC has higher sensitivity and specificity than the state-of-the-art method, the PC-stable algorithm.


Subjects
Tumor Biomarkers/genetics , Breast Neoplasms/epidemiology , Breast Neoplasms/genetics , Gene Expression Profiling/methods , Genetic Predisposition to Disease/genetics , Statistical Models , Computer Simulation , Statistical Data Interpretation , Female , Genetic Markers/genetics , Genetic Predisposition to Disease/epidemiology , Humans , Neoplasm Proteins/genetics , Prevalence , Reproducibility of Results , Risk Factors , Sensitivity and Specificity
7.
Sci Rep ; 14(1): 6822, 2024 03 21.
Article in English | MEDLINE | ID: mdl-38514750

ABSTRACT

Childhood obesity is a complex disorder that appears to be influenced by an interacting system of many factors. Taking this complexity into account, we aim to investigate the causal structure underlying childhood obesity. Our focus is on identifying potential early, direct or indirect, causes of obesity which may be promising targets for prevention strategies. Using a causal discovery algorithm, we estimate a cohort causal graph (CCG) over the life course from childhood to adolescence. We adapt a popular method, the so-called PC-algorithm, to deal with missing values by multiple imputation, with mixed discrete and continuous variables, and that takes background knowledge such as the time-structure of cohort data into account. The algorithm is then applied to learn the causal structure among 51 variables including obesity, early life factors, diet, lifestyle, insulin resistance, puberty stage and cultural background of 5112 children from the European IDEFICS/I.Family cohort across three waves (2007-2014). The robustness of the learned causal structure is addressed in a series of alternative and sensitivity analyses; in particular, we use bootstrap resamples to assess the stability of aspects of the learned CCG. Our results suggest some but only indirect possible causal paths from early modifiable risk factors, such as audio-visual media consumption and physical activity, to obesity (measured by age- and sex-adjusted BMI z-scores) 6 years later.


Subjects
Insulin Resistance , Pediatric Obesity , Humans , Child , Adolescent , Pediatric Obesity/epidemiology , Longitudinal Studies , Risk Factors , Diet , Body Mass Index
8.
Microorganisms ; 12(4)2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38674742

ABSTRACT

The global dissemination of SARS-CoV-2 resulted in the emergence of several variants, including Alpha, Alpha + E484K, Beta, and Omicron. Our research integrated the study of eukaryotic translation factors and fundamental components in general protein synthesis with the analysis of SARS-CoV-2 variants and vaccination status. Utilizing statistical methods, we successfully differentiated between variants in infected individuals and, to a lesser extent, between vaccinated and non-vaccinated infected individuals, relying on the expression profiles of translation factors. Additionally, our investigation identified common causal relationships among the translation factors, shedding light on the interplay between SARS-CoV-2 variants and the host's translation machinery.

9.
Environ Sci Pollut Res Int ; 29(9): 13504-13522, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34595709

ABSTRACT

Energy intensity reduction is an exigent issue for Iran, where energy consumption is very high. Therefore, finding effective policies to reduce energy intensity is essential. With this in mind, the impact of financial development, government investment, oil revenues, and trade openness on energy intensity is assessed in this study. We combined a structural vector error correction model (SVECM) with the directed acyclic graph (DAG) technique to examine the relationships between study variables. The results of DAG prove that financial development, government investment, oil revenues, and trade openness influence the intensity of energy. Besides, the significant and long-run relationships among variables allowed us to apply SVECM. Impulse response functions and variance decomposition analysis indicate that government investment, oil revenues, and trade openness are negatively associated with the intensity of energy. Also, financial development positively influences energy intensity. Meanwhile, the impact of government investment is more significant than the impacts of oil revenues, trade openness, and financial development. So, government investment is the most effective policy regarding optimizing the consumption of energy and reducing energy intensity. We also advise policymakers to use oil revenues to increase government investment, enhance trade openness, and tax the private sector to improve the level of energy intensity.


Subjects
Carbon Dioxide , Economic Development , Carbon Dioxide/analysis , Investments , Iran , Time Factors
10.
Cells ; 11(1)2021 12 29.
Article in English | MEDLINE | ID: mdl-35011654

ABSTRACT

Genome-wide transcriptome analysis is a method that produces important data on plant biology at a systemic level. The lack of understanding of the relationships between proteins and genes in plants necessitates a further thorough analysis at the proteogenomic level. Recently, our group generated a quantitative proteogenomic atlas of 15 sweet cherry (Prunus avium L.) cv. 'Tragana Edessis' tissues represented by 29,247 genes and 7584 proteins. The aim of the current study was to perform a targeted analysis at the gene/protein level to assess the structure of their relation, and the biological implications. Weighted correlation network analysis and causal modeling were employed to, respectively, cluster the gene/protein pairs, and reveal their cause-effect relations, aiming to assess the associated biological functions. To the best of our knowledge, this is the first time that causal modeling has been employed within the proteogenomics concept in plants. The analysis revealed the complex nature of causal relations among genes/proteins that are important for traits of interest in perennial fruit trees, particularly regarding the fruit softening and ripening process in sweet cherry. Causal discovery could be used to highlight persistent relations at the gene/protein level, stimulating biological interpretation and facilitating further study of the proteogenomic atlas in plants.


Subjects
Fruit/genetics , Plant Genes , Biological Models , Plant Proteins/genetics , Proteogenomics , Prunus avium/genetics , Trees/genetics , Fruit/growth & development , Developmental Gene Expression Regulation , Plant Gene Expression Regulation , Gene Ontology , Gene Regulatory Networks , Plant Proteins/metabolism , Prunus avium/growth & development , Trees/growth & development
11.
J Bioinform Comput Biol ; 18(4): 2050023, 2020 08.
Article in English | MEDLINE | ID: mdl-32706288

ABSTRACT

Many biological and biomedical research areas such as drug design require analyzing the Gene Regulatory Networks (GRNs) to provide clear insight and understanding of the cellular processes in live cells. Under a normality assumption for the genes, GRNs can be constructed by assessing the nonzero elements of the inverse covariance matrix. Nevertheless, such techniques are unable to deal with non-normality, multi-modality and heavy-tailedness that are commonly seen in current massive genetic data. To relax this restrictive constraint, one can apply a copula function, which is a multivariate cumulative distribution function with uniform marginal distributions. However, since the dependency structures of different pairs of genes in a multivariate problem are very different, the regular multivariate copula will not allow for the construction of an appropriate model. The solution to this problem is using Pair-Copula Constructions (PCCs), which are decompositions of a multivariate density into a cascade of bivariate copulas and therefore assign a different bivariate copula function to each local term. In this paper, we construct the inverse covariance matrix based on the use of PCCs when the normality assumption can be moderately or severely violated, capturing a wide range of distributional features and complex dependency structures. To learn the non-Gaussian model for the considered GRN with non-Gaussian genomic data, we apply a modified version of the copula-based PC algorithm in which the normality assumption on the marginal densities is dropped. This paper also considers the Dynamic Time Warping (DTW) algorithm to determine the existence of a time delay relation between two genes. Breast cancer is one of the most common diseases in the world, and GRN analysis of its subtypes is considerably important, since by revealing the differences in the GRNs of these subtypes, new therapies and drugs can be found. The findings of our research are used to construct GRNs with high performance for various subtypes of breast cancer rather than simply using previous models.


Subjects
Breast Neoplasms/genetics , Gene Regulatory Networks , Microarray Analysis/methods , Algorithms , Bayes Theorem , Breast Neoplasms/pathology , Computational Biology/methods , Female , Humans , Genetic Models
12.
Int J Biostat ; 14(2)2018 09 01.
Article in English | MEDLINE | ID: mdl-30173203

ABSTRACT

One of the basic aims of science is to unravel the chain of cause and effect of particular systems. Especially for large systems, this can be a daunting task. Detailed interventional and randomized data sampling approaches can be used to resolve the causality question, but for many systems, such interventions are impossible or too costly to obtain. Recently, Maathuis et al. (2010), following ideas from Spirtes et al. (2000), introduced a framework to estimate causal effects in large-scale Gaussian systems. By describing the causal network as a directed acyclic graph it is possible to estimate a class of Markov equivalent systems that describe the underlying causal interactions consistently, even for non-Gaussian systems. In these systems, causal effects stop being linear and cannot be described any more by a single coefficient. In this paper, we derive the general functional form of a causal effect in a large subclass of non-Gaussian distributions, called the non-paranormal. We also derive a convenient approximation, which can be used effectively in estimation. We show that the estimate is consistent under certain conditions and we apply the method to an observational gene expression dataset of the Arabidopsis thaliana circadian clock system.


Subjects
Biostatistics/methods , Statistical Data Interpretation , Statistical Models , Statistical Distributions , Arabidopsis/physiology , Circadian Rhythm/physiology , Plant Gene Expression Regulation/physiology
13.
Cancer Inform ; 14(Suppl 1): 23-35, 2015.
Article in English | MEDLINE | ID: mdl-25861215

ABSTRACT

Identification of molecular-based signatures is one of the critical steps toward finding therapeutic targets in cancer. In this paper, we propose methods to discover prognostic gene signatures under a causal structure learning framework across the whole genome. The causal structures are represented by directed acyclic graphs (DAGs), wherein we construct gene-specific network modules that constitute a gene and its corresponding regulators. The modules are then subsequently used to correlate with survival times, thus, allowing for a network-oriented approach to gene selection to adjust for potential confounders, as opposed to univariate (gene-by-gene) approaches. Our methods are motivated by and applied to a clear cell renal cell carcinoma (ccRCC) study from The Cancer Genome Atlas (TCGA) where we find several prognostic genes associated with cancer progression - some of which are novel while others confirm existing findings.

14.
Stat Methods Med Res ; 22(5): 466-92, 2013 Oct.
Article in English | MEDLINE | ID: mdl-22116340

ABSTRACT

Guarding against false positive selections is important in many applications. We discuss methods based on subsampling and sample splitting for controlling the expected number of false positives and assigning p-values. They are generic and especially useful for high-dimensional settings. We review encouraging results for regression, and we discuss new adaptations and remaining challenges for selecting relevant variables, based on observational data, having a causal or interventional effect on a response of interest.


Subjects
Causality , False Positive Reactions , Statistical Models
15.
Proc IEEE Int Symp Biomed Imaging ; 2012: 1551-1554, 2012 May.
Article in English | MEDLINE | ID: mdl-26753056

ABSTRACT

Insight into brain development and organization can be gained by computing correlations between structural and functional measures in parcellated cortex. Partial correlations can often reduce ambiguity in correlation data by identifying those pairs of regions whose similarity cannot be explained by the influence of other regions with which they may both interact. Consequently, a graph with edges indicating nonzero partial correlations may reveal important subnetworks obscured in the correlation data. Here we describe and investigate PC*, a graph pruning algorithm for identification of the partial correlation network in comparison to direct calculation of partial correlations from the inverse of the sample correlation matrix. We show that PC* is far more robust and illustrate its use in the study of covariation in cortical thickness in ROIs defined on a parcellated cortex.
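
The baseline that the abstract above compares against, direct calculation of partial correlations from the inverse of a correlation matrix, can be sketched in a few lines. This illustrates only that baseline, not PC*; the helper names and the toy matrix are assumptions made for the example. With precision matrix P (the inverse of the correlation matrix), the partial correlation of variables i and j given all others is -P[i][j] / sqrt(P[i][i] * P[j][j]).

```python
import math

def invert(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def partial_correlations(corr):
    """Partial correlation of each pair given all remaining variables."""
    prec = invert(corr)
    n = len(corr)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]

# Toy chain 0 - 1 - 2: variables 0 and 2 are correlated only through 1.
corr = [[1.0, 0.5, 0.25],
        [0.5, 1.0, 0.5],
        [0.25, 0.5, 1.0]]
pc = partial_correlations(corr)
```

In this toy matrix the marginal correlation between variables 0 and 2 is 0.25, but their partial correlation is zero, so the corresponding edge would be absent from the partial correlation graph. With a noisy sample correlation matrix, or one that is close to singular, this direct inversion becomes unstable, which is the robustness concern the abstract raises.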
