Results 1 - 20 of 28
1.
Proc Mach Learn Res ; 97: 2901-2910, 2019 Jun.
Article in English | MEDLINE | ID: mdl-31497778

ABSTRACT

In many scientific fields, such as economics and neuroscience, we are often faced with nonstationary time series, and concerned with both finding causal relations and forecasting the values of variables of interest, both of which are particularly challenging in such nonstationary environments. In this paper, we study causal discovery and forecasting for nonstationary time series. By exploiting a particular type of state-space model to represent the processes, we show that nonstationarity helps to identify causal structure and that forecasting naturally benefits from learned causal knowledge. Specifically, we allow changes in both causal strengths and noise variances in the nonlinear state-space models, which, interestingly, renders both the causal structure and model parameters identifiable. Given the causal model, we treat forecasting as a problem in Bayesian inference in the causal model, which exploits the time-varying property of the data and adapts to new observations in a principled manner. Experimental results on synthetic and real-world data sets demonstrate the efficacy of the proposed methods.
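
To make the idea of changing causal strengths and noise variances concrete, the following is a minimal simulation sketch and not the model from the paper. It assumes a linear toy system with a single cause X and effect Y, and it assumes that both the causal strength and the log noise standard deviation drift as random walks; the function name, initial values, and step sizes are all hypothetical.

```python
import math
import random


def simulate_time_varying_pair(n_steps, seed=0):
    """Simulate X -> Y where the causal strength and noise scale drift over time."""
    rng = random.Random(seed)
    strength = 1.0        # hypothetical initial causal strength of X on Y
    log_noise_sd = -1.0   # hypothetical initial log standard deviation of Y's noise
    xs, ys, strengths = [], [], []
    for _ in range(n_steps):
        # Both latent quantities follow slow random walks (an assumed dynamic).
        strength += rng.gauss(0, 0.05)
        log_noise_sd += rng.gauss(0, 0.02)
        x = rng.gauss(0, 1)
        y = strength * x + rng.gauss(0, math.exp(log_noise_sd))
        xs.append(x)
        ys.append(y)
        strengths.append(strength)
    return xs, ys, strengths


xs, ys, strengths = simulate_time_varying_pair(500)
print(min(strengths), max(strengths))  # range covered by the drifting strength
```

Data generated this way are nonstationary because the relation between X and Y changes from step to step, which is the kind of setting the abstract addresses.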

2.
Nat Commun ; 10(1): 2553, 2019 06 14.
Article in English | MEDLINE | ID: mdl-31201306

ABSTRACT

The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In large-scale complex dynamical systems such as the Earth system, real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated data opens up the use of novel data-driven causal methods beyond the commonly adopted correlation techniques. Here, we give an overview of causal inference frameworks and identify promising generic application cases common in Earth system sciences and beyond. We discuss challenges and initiate the benchmark platform causeme.net to close the gap between method users and developers.

3.
Front Genet ; 10: 524, 2019.
Article in English | MEDLINE | ID: mdl-31214249

ABSTRACT

A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be seen if interventions are properly applied; however, in many cases they are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing statistical properties of purely observational data, which is known as causal discovery or causal structure search. This paper aims to give an introduction to and a brief review of the computational methods for causal discovery that were developed in the past three decades, including constraint-based and score-based methods and those based on functional causal models, supplemented by some illustrations and applications.
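
Constraint-based methods rest on conditional independence tests. The sketch below is an illustration under stated assumptions, not code from the review: it simulates a linear Gaussian chain X -> Z -> Y and compares the marginal correlation of X and Y with their partial correlation given Z. The helper names, sample size, and noise scales are hypothetical.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(a, b, c):
    """Correlation of a and b after linearly controlling for c."""
    r_ab, r_ac, r_bc = correlation(a, b), correlation(a, c), correlation(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Simulate a causal chain X -> Z -> Y with independent Gaussian noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(5000)]
z = [xi + rng.gauss(0, 1) for xi in x]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = correlation(x, y)                      # X and Y are marginally dependent
r_xy_given_z = partial_correlation(x, y, z)   # but close to independent given Z
print(round(r_xy, 3), round(r_xy_given_z, 3))
```

A constraint-based search would use the near-zero partial correlation as evidence that there is no direct edge between X and Y. Partial correlation only captures linear dependence, so this test is appropriate only under the linear Gaussian assumption made here.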

4.
Netw Neurosci ; 3(2): 274-306, 2019.
Article in English | MEDLINE | ID: mdl-30793083

ABSTRACT

We test the adequacy of several proposed and two new statistical methods for recovering the causal structure of systems with feedback from synthetic BOLD time series. We compare an adaptation of the first correct method for recovering cyclic linear systems; Granger causal regression; a multivariate autoregressive model with a permutation test; the Group Iterative Multiple Model Estimation (GIMME) algorithm; the Ramsey et al. non-Gaussian methods; two non-Gaussian methods by Hyvärinen and Smith; a method due to Patel et al.; and the GlobalMIT algorithm. We introduce and also compare two new methods, Fast Adjacency Skewness (FASK) and Two-Step, both of which exploit non-Gaussian features of the BOLD signal. We give theoretical justifications for the latter two algorithms. Our test models include feedback structures with and without direct feedback (2-cycles), excitatory and inhibitory feedback, models using experimentally determined structural connectivities of macaques, and empirical human resting-state and task data. We find that averaged over all of our simulations, including those with 2-cycles, several of these methods have a better than 80% orientation precision (i.e., the probability that a directed edge is in the true structure given that a procedure estimates it to be so) and the two new methods also have better than 80% recall (probability of recovering an orientation in the true structure).
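
The precision and recall definitions above can be sketched as follows. This is one simple reading of the abstract's wording, with graphs represented as sets of (cause, effect) pairs; the paper may compute the scores differently, and the function name and example graphs are hypothetical.

```python
def orientation_precision_recall(true_edges, estimated_edges):
    """Return (precision, recall) for sets of directed edges (cause, effect)."""
    true_edges, estimated_edges = set(true_edges), set(estimated_edges)
    correct = true_edges & estimated_edges
    # Precision: share of estimated directed edges that are in the true graph.
    precision = len(correct) / len(estimated_edges) if estimated_edges else 0.0
    # Recall: share of true directed edges that the procedure recovered.
    recall = len(correct) / len(true_edges) if true_edges else 0.0
    return precision, recall


# Hypothetical true graph with a 2-cycle between A and B.
true_graph = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "D")}
# Hypothetical estimate: two correct orientations and one reversed edge.
estimate = {("A", "B"), ("B", "C"), ("D", "C")}

precision, recall = orientation_precision_recall(true_graph, estimate)
print(precision, recall)
```

Here two of the three estimated edges are correct and two of the four true edges are recovered, so precision is 2/3 and recall is 1/2.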

5.
Bioinformatics ; 35(7): 1204-1212, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30192904

ABSTRACT

MOTIVATION: Integration of data from different modalities is a necessary step for multi-scale data analysis in many fields, including biomedical research and systems biology. Directed graphical models offer an attractive tool for this problem because they can represent both the complex, multivariate probability distributions and the causal pathways influencing the system. Graphical models learned from biomedical data can be used for classification, biomarker selection and functional analysis, while revealing the underlying network structure and thus allowing for arbitrary likelihood queries over the data. RESULTS: In this paper, we present and test new methods for finding directed graphs over mixed data types (continuous and discrete variables). We used this new algorithm, CausalMGM, to identify variables directly linked to disease diagnosis and progression in various multi-modal datasets, including clinical datasets from chronic obstructive pulmonary disease (COPD). COPD is the third leading cause of death and a major cause of disability and thus determining the factors that cause longitudinal lung function decline is very important. Applied to a COPD dataset, mixed graphical models were able to confirm and extend previously described causal effects and provide new insights on the factors that potentially affect the longitudinal lung function decline of COPD patients. AVAILABILITY AND IMPLEMENTATION: The CausalMGM package is available at http://www.causalmgm.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Biological Models, Chronic Obstructive Pulmonary Disease, Algorithms, Humans, Prognosis, Chronic Obstructive Pulmonary Disease/diagnosis, Systems Biology

6.
KDD ; 2018: 1551-1560, 2018 Aug.
Article in English | MEDLINE | ID: mdl-30191079

ABSTRACT

Discovery of causal relationships from observational data is a fundamental problem. Roughly speaking, there are two types of methods for causal discovery, constraint-based ones and score-based ones. Score-based methods avoid the multiple testing problem and enjoy certain advantages compared to constraint-based ones. However, most of them need strong assumptions on the functional forms of causal mechanisms, as well as on data distributions, which limit their applicability. In practice the precise information of the underlying model class is usually unknown. If the above assumptions are violated, both spurious and missing edges may result. In this paper, we introduce generalized score functions for causal discovery based on the characterization of general (conditional) independence relationships between random variables, without assuming particular model classes. In particular, we exploit regression in RKHS to capture the dependence in a non-parametric way. The resulting causal discovery approach produces asymptotically correct results in rather general cases, which may have nonlinear causal mechanisms, a wide class of data distributions, mixed continuous and discrete data, and multidimensional variables. Experimental results on both synthetic and real-world data demonstrate the efficacy of our proposed approach.

7.
Int J Data Sci Anal ; 6(1): 33-45, 2018.
Article in English | MEDLINE | ID: mdl-30148202

ABSTRACT

Modern technologies allow large, complex biomedical datasets to be collected from patient cohorts. These datasets are comprised of both continuous and categorical data ("Mixed Data"), and essential variables may be unobserved in this data due to the complex nature of biomedical phenomena. Causal inference algorithms can identify important relationships from biomedical data; however, handling the challenges of causal inference over mixed data with unmeasured confounders in a scalable way is still an open problem. Despite recent advances in causal discovery strategies that could potentially handle these challenges individually, no study currently exists that comprehensively compares these approaches in this setting. In this paper, we present a comparative study that addresses this problem by comparing the accuracy and efficiency of different strategies in large, mixed datasets with latent confounders. We experiment with two extensions of the Fast Causal Inference algorithm: a maximum probability search procedure we recently developed to identify causal orientations more accurately, and a strategy which quickly eliminates unlikely adjacencies in order to achieve scalability to high-dimensional data. We demonstrate that these methods significantly outperform the state of the art in the field by achieving both accurate edge orientations and tractable running time in simulation experiments on datasets with up to 500 variables. Finally, we demonstrate the usability of the best performing approach on real data by applying it to a biomedical dataset of HIV-infected individuals.

9.
IJCAI (U S) ; 2017: 1347-1353, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28966540

ABSTRACT

It is commonplace to encounter nonstationary or heterogeneous data, of which the underlying generating process changes over time or across data sets (the data sets may have different experimental conditions or data collection conditions). Such a distribution shift feature presents both challenges and opportunities for causal discovery. In this paper we develop a principled framework for causal discovery from such data, called Constraint-based causal Discovery from Nonstationary/heterogeneous Data (CD-NOD), which addresses two important questions. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and recover the skeleton of the causal structure over observed variables. Second, we present a way to determine causal orientations by making use of independence changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. Experimental results on various synthetic and real-world data sets are presented to demonstrate the efficacy of our methods.

10.
Int J Data Sci Anal ; 3(2): 121-129, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28393106

ABSTRACT

We describe two modifications that parallelize and reorganize caching in the well-known Greedy Equivalence Search (GES) algorithm for discovering directed acyclic graphs on random variables from sample values. We apply one of these modifications, the Fast Greedy Search (FGS) assuming faithfulness, to an i.i.d. sample of 1,000 units to recover with high precision and good recall an average degree 2 directed acyclic graph (DAG) with one million Gaussian variables. We describe a modification of the algorithm to rapidly find the Markov Blanket of any variable in a high dimensional system. Using 51,000 voxels that parcellate an entire human cortex, we apply the FGS algorithm to Blood Oxygenation Level Dependent (BOLD) time series obtained from resting state fMRI.
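
The Markov blanket mentioned above consists of a variable's parents, its children, and the other parents of its children. The sketch below reads the blanket off a DAG that is already known; it illustrates the definition only and is not the FGS-based search described in the abstract. The dictionary representation and the example graph are hypothetical.

```python
def markov_blanket(dag, target):
    """Return the Markov blanket of target; dag maps each node to its set of parents."""
    parents = set(dag.get(target, ()))
    children = {node for node, node_parents in dag.items() if target in node_parents}
    # Spouses are the other parents of the target's children.
    spouses = {p for child in children for p in dag[child]} - {target}
    return parents | children | spouses


# Hypothetical DAG: A -> B -> D <- C, D -> E.
dag = {
    "A": set(),
    "B": {"A"},
    "C": set(),
    "D": {"B", "C"},
    "E": {"D"},
}

blanket_of_b = markov_blanket(dag, "B")
print(sorted(blanket_of_b))
```

For B the blanket is its parent A, its child D, and D's other parent C. Conditional on these three variables, B is independent of every other variable in the graph.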

11.
Uncertain Artif Intell ; 2017, 2017 Aug.
Article in English | MEDLINE | ID: mdl-29899680

ABSTRACT

Discovering causal structure of a dynamical system from observed time series is a traditional and important problem. In many practical applications, observed data are obtained by applying subsampling or temporal aggregation to the original causal processes, making it difficult to discover the underlying causal relations. Subsampling refers to the procedure that for every k consecutive observations, one is kept, the rest being skipped, and recently some advances have been made in causal discovery from such data. With temporal aggregation, the local averages or sums of k consecutive, non-overlapping observations in the causal process are computed as new observations, and causal discovery from such data is even harder. In this paper, we investigate how to recover causal relations at the original causal frequency from temporally aggregated data when k is known. Assuming the time series at the causal frequency follows a vector autoregressive (VAR) model, we show that the causal structure at the causal frequency is identifiable from aggregated time series if the noise terms are independent and non-Gaussian and some other technical conditions hold. We then present an estimation method based on non-Gaussian state-space modeling and evaluate its performance on both synthetic and real data.
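
The two observation schemes described above can be sketched on a univariate series as follows. This is a minimal illustration of the definitions, not the estimation method from the paper. It assumes that subsampling keeps the first observation of each block and that an incomplete trailing block is dropped during aggregation; the function names are hypothetical.

```python
def subsample(series, k):
    """Keep one observation (the first) out of every k consecutive ones."""
    return series[::k]


def aggregate(series, k, average=True):
    """Average or sum k consecutive, non-overlapping observations.

    An incomplete trailing block is dropped.
    """
    blocks = [series[i:i + k] for i in range(0, len(series) - k + 1, k)]
    return [sum(block) / k if average else sum(block) for block in blocks]


series = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(subsample(series, 3))                 # [1, 4, 7]
print(aggregate(series, 3))                 # [2.0, 5.0, 8.0]
print(aggregate(series, 3, average=False))  # [6, 15, 24]
```

Both schemes reduce the sampling rate by a factor of k, but aggregation mixes information from all k observations in a block, which is one way to see why the abstract calls that setting harder.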

12.
Proc IEEE Int Conf Data Min ; 2017: 913-918, 2017 Nov.
Article in English | MEDLINE | ID: mdl-31068766

ABSTRACT

We address two important issues in causal discovery from nonstationary or heterogeneous data, where parameters associated with a causal structure may change over time or across data sets. First, we investigate how to efficiently estimate the "driving force" of the nonstationarity of a causal mechanism. That is, given a causal mechanism that varies over time or across data sets and whose qualitative structure is known, we aim to extract from data a low-dimensional and interpretable representation of the main components of the changes. For this purpose we develop a novel kernel embedding of nonstationary conditional distributions that does not rely on sliding windows. Second, the embedding also leads to a measure of dependence between the changes of causal modules that can be used to determine the directions of many causal arrows. We demonstrate the power of our methods with experiments on both synthetic and real data.

13.
JMLR Workshop Conf Proc ; 48: 2839-2848, 2016 Jun.
Article in English | MEDLINE | ID: mdl-28239433

ABSTRACT

Domain adaptation arises in supervised learning when the training (source domain) and test (target domain) data have different distributions. Let X and Y denote the features and target, respectively. Previous work on domain adaptation mainly considers the covariate shift situation where the distribution of the features P(X) changes across domains while the conditional distribution P(Y∣X) stays the same. To reduce domain discrepancy, recent methods try to find invariant components [Formula: see text] that have similar [Formula: see text] on different domains by explicitly minimizing a distribution discrepancy measure. However, it is not clear if [Formula: see text] in different domains is also similar when P(Y∣X) changes. Furthermore, transferable components do not necessarily have to be invariant. If the change in some components is identifiable, we can make use of such components for prediction in the target domain. In this paper, we focus on the case where P(X∣Y) and P(Y) both change in a causal system in which Y is the cause for X. Under appropriate assumptions, we aim to extract conditional transferable components whose conditional distribution [Formula: see text] is invariant after proper location-scale (LS) transformations, and identify how P(Y) changes between domains simultaneously. We provide theoretical analysis and empirical evaluation on both synthetic and real-world data to show the effectiveness of our method.

14.
J Am Med Inform Assoc ; 22(6): 1132-6, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26138794

ABSTRACT

The Big Data to Knowledge (BD2K) Center for Causal Discovery is developing and disseminating an integrated set of open source tools that support causal modeling and discovery of biomedical knowledge from large and complex biomedical datasets. The Center integrates teams of biomedical and data scientists focused on the refinement of existing and the development of new constraint-based and Bayesian algorithms based on causal Bayesian networks, the optimization of software for efficient operation in a supercomputing environment, and the testing of algorithms and software developed using real data from 3 representative driving biomedical projects: cancer driver mutations, lung disease, and the functional connectome of the human brain. Associated training activities provide both biomedical and data scientists with the knowledge and skills needed to apply and extend these tools. Collaborative activities with the BD2K Consortium further advance causal discovery tools and integrate tools and resources developed by other centers.


Subjects
Algorithms, Datasets as Topic, Translational Biomedical Research, Biomedical Research, Humans, United States

15.
Philos Sci ; 82(4): 556-586, 2015 Oct.
Article in English | MEDLINE | ID: mdl-27313331

ABSTRACT

Using Gebharter's (2014) representation, we consider aspects of the problem of discovering the structure of unmeasured sub-mechanisms when the variables in those sub-mechanisms have not been measured. Exploiting an early insight of Sober's (1998), we provide a correct algorithm for identifying latent, endogenous structure (sub-mechanisms) for a restricted class of structures. The algorithm can be merged with other methods for discovering causal relations among unmeasured variables, and feedback relations between measured variables and unobserved causes can sometimes be learned.

17.
Neuroimage ; 84: 986-1006, 2014 Jan 01.
Article in English | MEDLINE | ID: mdl-24099845

ABSTRACT

We consider several alternative ways of exploiting non-Gaussian distributional features, including some that can in principle identify direct, positive feedback relations (graphically, 2-cycles) and combinations of methods that can identify high dimensional graphs. All of the procedures are implemented in the TETRAD freeware (Ramsey et al., 2013). We show that in most cases the limited accuracy of the several non-Gaussian methods in the Smith et al. (2011) simulations can be attributed to the high-pass Butterworth filter used in that study. Without that filter, or with the filter in the widely used FSL program (Jenkinson et al., 2012), the directional accuracies of several of the non-Gaussian methods are at or near ceiling in many conditions of the Smith et al. simulation. We show that the improvement of an apparently Gaussian method (Patel et al., 2006) when filtering is removed is due to non-Gaussian features of that method introduced by the Smith et al. implementation. We also investigate some conditions in which multi-subject data help with causal structure identification using higher moments, notably with non-stationary time series or with 2-cycles. We illustrate the accuracy of the methods with more complex graphs with and without 2-cycles, and with a 500 node graph; to illustrate applicability and provide a further test we apply the methods to an empirical case for which aspects of the causal structure are known. Finally, we note a number of cautions and issues that remain to be investigated, and some outstanding problems for determining the structure of effective connections from fMRI data.


Subjects
Algorithms, Brain/physiology, Computer-Assisted Image Processing/methods, Magnetic Resonance Imaging, Neurological Models, Humans, Neural Pathways/physiology

18.
Brain Connect ; 3(6): 578-89, 2013.
Article in English | MEDLINE | ID: mdl-24093627

ABSTRACT

Failing to engage in joint attention is a strong marker of impaired social cognition associated with autism spectrum disorder (ASD). The goal of this study was to localize the source of impaired joint attention in individuals with ASD by examining both behavioral and fMRI data collected during various tasks involving eye gaze, directional cuing, and face processing. The tasks were designed to engage three brain networks associated with social cognition [face processing, theory of mind (TOM), and action understanding]. The behavioral results indicate that even high-functioning individuals with ASD perform less accurately and more slowly than neurotypical (NT) controls when processing eyes, but not when processing a directional cue (an arrow) that did not involve eyes. Behavioral differences between the NT and ASD groups were consistent with differences in the effective connectivity of FACE, TOM, and ACTION networks. An independent multiple-sample greedy equivalence search was used to examine these social brain networks and found that whereas NTs produced stable patterns of response across tasks designed to engage a given brain network, ASD participants did not. Moreover, ASD participants recruited all three networks in a manner highly dissimilar to that of NTs. These results extend a growing literature that describes disruptions in general brain connectivity in individuals with autism by targeting specific networks hypothesized to underlie the social cognitive impairments observed in these individuals.


Subjects
Attention/physiology, Autistic Disorder/physiopathology, Brain/physiopathology, Cues, Eye, Facial Expression, Adolescent, Adult, Brain Mapping/methods, Case-Control Studies, Humans, Magnetic Resonance Imaging/methods, Reaction Time, Social Behavior, Visual Cortex/physiopathology, Young Adult

19.
Neuroimage ; 76: 450-1, 2013 Aug 01.
Article in English | MEDLINE | ID: mdl-21835247

ABSTRACT

Lindquist and Sobel claim that the graphical causal models they call "agnostic" do not imply any counterfactual conditionals. They doubt that "causal effects" can be discovered using graphical causal models typical of SEMs, DCMs, Bayes nets, Granger causal models, etc. Each of these claims is false or exaggerated. They recommend instead that investigators adopt the "potential outcomes" framework. The potential outcomes framework is an obstacle rather than an aid to discovering causal relations in fMRI contexts.


Subjects
Artifacts, Brain/physiology, Computer Simulation, Computer-Assisted Image Interpretation/methods, Meta-Analysis as Topic, Humans

20.
Neuroimage ; 58(3): 838-48, 2011 Oct 01.
Article in English | MEDLINE | ID: mdl-21745580

ABSTRACT

Smith et al. report a large study of the accuracy of 38 search procedures for recovering effective connections in simulations of DCM models under 28 different conditions. Their results are disappointing: no method reliably finds and directs connections without large false negatives, large false positives, or both. Using multiple subject inputs, we apply a previously published search algorithm, IMaGES, and novel orientation algorithms, LOFS, in tandem to all of the simulations of DCM models described by Smith et al. (2011). We find that the procedures accurately identify effective connections in almost all of the conditions that Smith et al. simulated and, in most conditions, direct causal connections with precision greater than 90% and recall greater than 80%.


Subjects
Brain Mapping/methods, Brain/anatomy & histology, Brain/physiology, Computer-Assisted Image Interpretation/methods, Magnetic Resonance Imaging/methods, Neurological Models, Algorithms, Humans