Description
Causal discovery from observational data is a challenging problem, and many proposed methods lack thorough real-world evaluation. Most studies rely on synthetic data or on a small number of real-world examples evaluated under idealized assumptions, neither of which accurately reflects the complexity of real-world systems.
To address this issue, we introduce CausalRivers, a comprehensive benchmarking kit for causal discovery in time series data. Our dataset consists of extensive river discharge measurements from 1,160 stations in eastern Germany and Bavaria, spanning 2019-2023 with 15-minute temporal resolution. We also include data from a recent flood event around the Elbe River, which exhibits a pronounced distributional shift. By leveraging multiple data sources and time-series metadata, we constructed two distinct causal ground truth graphs for Bavaria and eastern Germany, which can be sampled to generate thousands of subgraphs for benchmarking.
We demonstrate the utility of CausalRivers through multiple experiments and introduce effective baselines, highlighting areas for improvement in causal discovery methods. The benchmarking kit is intended to support robust evaluation and comparison of causal discovery approaches, and we anticipate that it will also be relevant to related areas such as time series forecasting and anomaly detection. With this, we hope to establish benchmark-driven method development in the field of causal discovery, as is already the case in many other areas of machine learning.
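To make the idea of sampling subgraphs from a ground truth graph concrete, the following is a minimal sketch in plain Python. It is not the CausalRivers sampling code: the function name `sample_connected_subgraph`, the edge-list representation (upstream station, downstream station), and the toy station labels are all assumptions made for illustration. The sketch grows a weakly connected node set from a random start station and returns the edges of the ground truth graph induced on that set.

```python
import random


def sample_connected_subgraph(edges, n_nodes, rng):
    """Sample one weakly connected subgraph with n_nodes nodes.

    edges is a list of (upstream, downstream) pairs (assumed format).
    Returns (nodes, induced_edges), or None if the component reached
    from the random start node has fewer than n_nodes nodes.
    """
    # Undirected adjacency, so growth can follow an edge in either direction.
    neighbours = {}
    for up, down in edges:
        neighbours.setdefault(up, set()).add(down)
        neighbours.setdefault(down, set()).add(up)

    start = rng.choice(sorted(neighbours))
    nodes = {start}
    frontier = set(neighbours[start])

    # Repeatedly add a random node adjacent to the current node set.
    while len(nodes) < n_nodes and frontier:
        nxt = rng.choice(sorted(frontier))
        nodes.add(nxt)
        frontier |= neighbours[nxt]
        frontier -= nodes

    if len(nodes) < n_nodes:
        return None

    # Keep only ground truth edges with both endpoints in the sample.
    induced = [(u, d) for u, d in edges if u in nodes and d in nodes]
    return nodes, induced


# Hypothetical toy river network: A and B flow into C, C into D,
# D and E into F.
toy_edges = [("A", "C"), ("B", "C"), ("C", "D"), ("D", "F"), ("E", "F")]
rng = random.Random(0)
result = sample_connected_subgraph(toy_edges, 3, rng)
nodes, induced = result
print(sorted(nodes), induced)
```

Calling the function repeatedly with different seeds and sizes yields many small benchmark instances, each with a known set of causal edges against which a discovery method's output can be scored.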