June 18, 2024 • 4 min read

Using Causal Impact For Deeper Insights Beyond A/B Testing

Written by Julien Perichon


Let me tell you a story to explain why Causal Impact matters. There is probably no better way to convince a project sponsor than measuring the impact of your solution! Such a step is paramount, as it helps you find out as quickly as possible whether you have good results or whether your project needs a redirection.

There are lots of statistical testing methods for such a measurement, A/B testing probably being the most widely known of them all. Almost all of them use some sort of control group for comparison: some use a group that receives no intervention, others use past data and often make assumptions about the data distribution.

But what if you’re in a project where you can't make these assumptions? For example, one of my projects made recommendations on which items to sell in grocery stores. Here, testing methods fell short because:

  • using only raw past data is inefficient: sales are highly seasonal for some items, and very sensitive to policy changes and shortages;
  • using a real control group is impossible, as other stores have very different conditions and sales volumes, which directly impact sales.

There were other difficulties:

  • we don’t want to compare just how the mean evolves, but how a time series evolves;
  • we need detailed results: what is the impact and how confident are we in measuring this impact?

Methods such as A/B testing or Student’s t-test aren’t fit for such a case, and this is where Causal Impact came to the rescue!

What is Causal Impact?

Causal Impact is an approach built by Google for “estimating the causal effect of a designed intervention on a time series”, such as a marketing campaign for instance.

All it takes as input is:

  • a response time series: the one on which you want to see if there is a causal effect;
  • some set of control time series;
  • a date of intervention: the model will train on data before this date, and evaluate the causal effect after this date.

All the model does is estimate a counterfactual for the response: what the response time series would have looked like if the intervention hadn’t happened. This estimation is made with a Bayesian structural time series model trained on the input set of control time series. Since the model is Bayesian, it gives a whole distribution for each point in the time series, which tells you how confident you can be in the prediction. Finally, comparing this prediction with the real observations gives the distribution of the causal effect, as shown in the following graph from the original paper:

Output of Causal Impact, from [1] (figure 1)

The important part from a business perspective is the third subgraph. It shows the cumulative effect (the impact you want to know about) by summing the data from the second subgraph, keeping in mind that the cumulative effect prior to the intervention date is zero by definition. The blue line is the mean cumulative effect, and the blue area is its credible interval.

Most implementations also give you another measure: the p-value for the hypothesis test that there is a genuine signal in the observed data. It is obtained by measuring the proportion of simulations from the posterior distribution that yield the same type of impact (positive or negative). This p-value can vary from 0% to 50%: the latter occurs when the impact is indistinguishable from noise, the former when the impact is clear-cut.
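To make that concrete, here is a minimal sketch (not the library’s actual implementation) of how such a tail probability could be computed from posterior draws of the effect:

import numpy as np

def impact_p_value(effect_draws: np.ndarray) -> float:
    # Share of posterior draws that disagree with the dominant sign of the effect:
    # close to 50% when the effect is indistinguishable from noise,
    # close to 0% when almost all draws agree on the sign.
    share_positive = np.mean(effect_draws > 0)
    return float(min(share_positive, 1 - share_positive))

# Example: draws centred only slightly above zero give weak evidence of an impact.
draws = np.random.default_rng(0).normal(loc=0.5, scale=1.0, size=1000)
print(f"p-value: {impact_p_value(draws):.1%}")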

Basic study on a toy dataset

The original Google package for Causal Impact is written in R, but there are Python implementations such as tfp-causalimpact, which you can install with the following commands:

pip install tfp-causalimpact
pip install 'tensorflow-probability[tf]'

Let’s generate some random time series with a control time series!

Code generating a toy dataset for the example here
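The original notebook is not reproduced here, but a minimal sketch of a comparable toy dataset could look like this (the +3 shift matches the ground-truth impact discussed below; column names and exact values are illustrative):

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range("2023-01-01", periods=200, freq="D")
intervention_date = "2023-06-01"

# Control series: an autocorrelated random walk around 100.
control = 100 + np.cumsum(rng.normal(0, 1, size=len(dates)))

# Response series: driven by the control series, plus noise...
response = 1.2 * control + rng.normal(0, 2, size=len(dates))

# ...and a +3 causal effect after the intervention date.
response[dates >= intervention_date] += 3

# CausalImpact convention: the response goes in the first column,
# the control series in the following columns.
data = pd.DataFrame({"y": response, "x": control}, index=dates)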

Let’s have a look at our dataset:

From this alone, it is pretty hard to see if there is any shift in the response time series post-intervention, because we forced the control time series to be smaller. Let’s run Causal impact to evaluate the impact:
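With tfp-causalimpact, fitting and plotting looks roughly like this (a sketch, assuming the data, dates and intervention_date variables from the toy dataset above; argument formats may differ slightly depending on the library version):

import causalimpact  # installed via the tfp-causalimpact package

# Train on everything before the intervention, evaluate the effect after it.
pre_period = [str(dates[0].date()), "2023-05-31"]
post_period = [intervention_date, str(dates[-1].date())]

impact = causalimpact.fit_causalimpact(
    data=data,
    pre_period=pre_period,
    post_period=post_period,
)

# Three panels: observed vs. counterfactual, pointwise effect, cumulative effect.
causalimpact.plot(impact)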

Output of causal impact

The Causal impact graph shows that there is a positive impact in our case, as expected. You can also generate a detailed report with the following code:
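A sketch of what this looks like with tfp-causalimpact, reusing the impact object fitted above:

# Compact table: average and cumulative effects with their intervals.
print(causalimpact.summary(impact, output_format="summary"))

# Full written report, including the posterior probability of a causal effect.
print(causalimpact.summary(impact, output_format="report"))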

The 95% credible interval for the average absolute effect is pretty large, but its mean (+3.7) is close to the ground truth (+3). Moreover, the report gives the posterior probability of a causal effect, which is 96.82%, meaning that Causal Impact is confident that the impact is non-zero.

This shows something which could be obvious to the reader: the smaller the real impact, the more difficult it is for Causal impact to tell you that the impact is not random.

Tips for a good Causal Impact study in the real world

Causal Impact is very easy to use once all inputs are defined, but choosing the right control series can be hard. This is actually the hardest part of the study. Here are the checks a good control series should pass:

  • it is correlated to the response time series, i.e. it is useful for the prediction;
  • it is not impacted by the intervention;
  • when studying the pre-intervention period, you should not see any significant impact (see the placebo-check sketch after this list);
  • when studying the post-intervention period on a response time series which you know is not impacted by the intervention (when possible), you should not see any significant impact.
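One way to run the pre-intervention check is a placebo study: pretend the intervention happened at an earlier date, entirely inside the pre-intervention period, and verify that no effect is found. A minimal sketch with the toy data from above (the placebo date is arbitrary):

# Placebo check: a fake intervention date inside the real pre-intervention period.
placebo_date = "2023-04-01"

placebo_impact = causalimpact.fit_causalimpact(
    data=data.loc[:"2023-05-31"],  # keep only real pre-intervention data
    pre_period=["2023-01-01", "2023-03-31"],
    post_period=[placebo_date, "2023-05-31"],
)

# The estimated effect should be close to zero and its interval should contain
# zero; otherwise the control series cannot be trusted.
print(causalimpact.summary(placebo_impact, output_format="summary"))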

Finally, as Causal Impact uses a Bayesian model, it has to infer the posterior distribution before showing any result. This is done by drawing posterior samples to approximate the posterior distribution. The higher the number of samples, the better the estimation of the posterior distribution, but the higher the compute time. Fortunately, as it can affect the stability of your results, this number is easy to tweak: it is exposed as an option in most implementations.

Now you have all you need to start running Causal Impact analyses on your projects. If you want to go further, you can read this article on Machine Learning metrics to see why custom metrics are important for stakeholders, or this article about LightGBM if you are more interested in predicting a time series.

Are you looking for machine learning experts? Don’t hesitate to contact us!

References

[1] Brodersen KH, Gallusser F, Koehler J, Remy N, Scott SL. Inferring causal impact using Bayesian structural time-series models. Annals of Applied Statistics, 2015, Vol. 9, No. 1, 247-274. http://research.google.com/pubs/pub41854.html
