Causal Inference by Direction of Information

Abstract. We focus on non-parametric causal inference in multivariate data. That is, given two multivariate random variables which are known to be correlated, we aim to efficiently and reliably infer their causal direction – without having to assume a distribution. To this end, we propose a new information-theoretic approach for causal inference based on Kolmogorov complexity. In a nutshell, we consider how much information one variable gives about the other, and vice versa, in order to determine the strongest direction of information.

To apply this in practice, we propose Ergo, an efficient instantiation based on cumulative and Shannon entropy. We do not restrict the type of correlation between random variables, nor do we have to assume any distribution. Extensive empirical evaluation on synthetic, benchmark, and real-world data shows that Ergo is robust against both noise and dimensionality, efficient, and outperforms the state of the art by a wide margin.
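
As a rough illustration of the two entropy notions named above, the following minimal sketch computes the standard empirical estimators for a univariate sample: cumulative entropy, h(X) = -∫ P(X ≤ x) log P(X ≤ x) dx, for real-valued data, and Shannon entropy for discrete data. This is not the Ergo algorithm and is not taken from the Java implementation; the function names `cumulative_entropy` and `shannon_entropy` are assumptions chosen for this example, and the way Ergo combines such quantities into a causal score is not shown.

```python
import math
from collections import Counter


def cumulative_entropy(samples):
    """Empirical cumulative entropy of a real-valued sample (natural log).

    Uses the textbook estimator: between consecutive sorted points the
    empirical CDF is constant at i/n, so the integral becomes a finite sum.
    Illustrative only; not the Ergo implementation.
    """
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i in range(1, n):
        p = i / n                      # empirical P(X <= x) on [xs[i-1], xs[i])
        gap = xs[i] - xs[i - 1]        # width of that interval
        total -= gap * p * math.log(p)
    return total


def shannon_entropy(labels):
    """Empirical Shannon entropy (in nats) of a discrete sample."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


if __name__ == "__main__":
    print(cumulative_entropy([0.1, 0.4, 0.5, 0.9, 1.3]))
    print(shannon_entropy(["a", "b", "a", "c"]))
```

Cumulative entropy works directly on the sorted sample, so it needs neither discretization nor a density estimate, which is what makes it a natural fit for a setting where no distribution is assumed.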

Implementation

The Java source code (October 2015), by Hoang-Vu Nguyen and Jilles Vreeken.

Related Publications

Vreeken, J. Causal Inference by Direction of Information. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp. 909-917, SIAM, 2015.