Reliably Testing Conditional Independence on Discrete Data using Stochastic Complexity

Abstract. Testing for conditional independence is a core part of constraint-based causal discovery. We focus specifically on discrete data. Although commonly used tests are perfect in theory, data sparsity often causes them to fail to reject independence in practice, especially when conditioning on multiple variables.

We propose a new conditional independence test based on the notion of algorithmic independence. To instantiate this ideal formulation in practice, we use stochastic complexity. We show that our proposed test, Sci, is an asymptotically unbiased and \(L_2\) consistent estimator of conditional mutual information (CMI). Further, we show that Sci can be reformulated to find a sensible threshold for CMI that works well given only limited data.
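To make the threshold idea concrete, here is a minimal Python sketch. It is not the authors' implementation and not the API of the SCCI package. It assumes a factorized normalized maximum likelihood form of stochastic complexity, in which each conditioning cell costs its empirical entropy times its sample count plus the log of a multinomial normalizer, and it scores only one direction (X given Y). All function names (`entropy_bits`, `multinomial_regret`, `code_length`, `plugin_cmi`, `sc_score`) are hypothetical.

```python
from collections import Counter, defaultdict
from math import comb, log2


def entropy_bits(counts):
    """Empirical Shannon entropy, in bits, of a list of category counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)


def multinomial_regret(k, n):
    """log2 of the multinomial NML normalizer for k categories and n samples.

    Uses the recurrence C(k+2, n) = C(k+1, n) + (n / k) * C(k, n).
    Intended for small n only; large n would need log-space arithmetic.
    """
    if n == 0 or k <= 1:
        return 0.0
    c_prev = 1.0  # C(1, n)
    c_curr = sum(comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
                 for h in range(n + 1))  # C(2, n)
    for j in range(1, k - 1):
        c_prev, c_curr = c_curr, c_curr + (n / j) * c_prev
    return log2(c_curr)


def code_length(xs, cond, k_x, with_regret=True):
    """Bits to encode xs given cond: per cell, n_c * H(X | c) plus optional regret."""
    cells = defaultdict(Counter)
    for x, c in zip(xs, cond):
        cells[c][x] += 1
    total = 0.0
    for counts in cells.values():
        n_c = sum(counts.values())
        total += n_c * entropy_bits(list(counts.values()))
        if with_regret:
            total += multinomial_regret(k_x, n_c)
    return total


def plugin_cmi(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in bits per sample."""
    k_x = len(set(xs))
    yz = list(zip(ys, zs))
    diff = code_length(xs, zs, k_x, False) - code_length(xs, yz, k_x, False)
    return diff / len(xs)


def sc_score(xs, ys, zs):
    """SC(X | Z) - SC(X | Y, Z) in bits; a value <= 0 is read as independence."""
    k_x = len(set(xs))
    yz = list(zip(ys, zs))
    return code_length(xs, zs, k_x) - code_length(xs, yz, k_x)


# Toy data: eight samples, Z constant, X only weakly associated with Y.
xs = [0, 0, 0, 1, 0, 0, 1, 1]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
zs = [0] * 8
print("plug-in CMI:", plugin_cmi(xs, ys, zs))
print("SC score:   ", sc_score(xs, ys, zs))
```

On this toy sample the plug-in CMI is slightly positive while the score is negative: under the stated assumptions, the difference between the regret terms acts as a sample-size-dependent threshold that the empirical CMI must exceed before dependence is declared.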

Empirical evaluation shows that Sci has a lower type II error than commonly used tests, which leads to higher recall when we use it in causal discovery algorithms, without compromising precision.

Implementation

SCCI on CRAN
The R source code (March 2019) by Alexander Marx.

Related Publications

Marx, A. & Vreeken, J. Testing Conditional Independence on Discrete Data using Stochastic Complexity. In: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, 2019. (31% acceptance rate)
Marx, A. & Vreeken, J. Stochastic Complexity for Testing Conditional Independence on Discrete Data. In: Proceedings of the NeurIPS 2018 workshop on Causal Learning, pp. 1-12, 2018.