A/B Testing Email Campaign
You’re on the data science team at Uber and want to test whether an email campaign for a new feature, UberEats, increases the conversion rate on signups for the feature. Assume you have a very large sample to work with. Describe, from a statistical standpoint, how you would run the experiment. Explain what tests you would use, how to assess statistical significance, and potential shortcomings of the experiment for informing the product decision.
Remember that signups are a binary event per user, and the experiment will be run across many users.
What is the null hypothesis in question? Remember that, in general, it states that there is no significant difference between groups.
There exist many kinds of statistical tests. Which is most relevant, given that the sample size is very large?
Does your experiment have specific distribution assumptions about the underlying variable of interest?
What specific statistical test are you running based on those variables of interest?
Did you make sure to describe how to assess significance given the experimental assumptions?
We can start out with a control group and a test group (which receives the email campaign). User attributes (demographics, location, etc.) should be controlled for, typically via random assignment, to ensure there are no systematic differences between the two sets of users.
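As a minimal sketch of that setup in Python (the user table, the region column, and the group size are all hypothetical stand-ins): random assignment balances user attributes in expectation, and a quick chi-square test on an observed attribute can flag accidental imbalance.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(42)

# Hypothetical user table; in practice this comes from the user database
users = pd.DataFrame({
    "user_id": np.arange(100_000),
    "region": rng.choice(["west", "east", "south"], size=100_000),
})

# Randomly assign each user to control (A) or test (B)
users["group"] = rng.choice(["A", "B"], size=len(users))

# Sanity check: a chi-square test on the group x region table should
# NOT be significant if assignment is balanced on this attribute
table = pd.crosstab(users["group"], users["region"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # a large p suggests no imbalance
```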
After ensuring that the two groups are balanced with respect to those variables, we can run an A/B test as follows. Our variable of interest will be the conversion rate for UberEats within each group. Since signups are a binary event per user, we can model our variable of interest per user as a Bernoulli random variable (a coin flip, with probability p of a signup happening). Therefore, the total number of signups in each group, assuming a group size of n users, will follow a Binomial distribution with parameters n and p. By the Central Limit Theorem, the sampling distribution of each group's overall conversion rate will be approximately normal given enough users.
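A short simulation makes the distributional claim concrete; this is a sketch with made-up values for the group size and the true signup probability.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, p_true = 50_000, 0.05  # hypothetical group size and signup probability

# Each user's signup is a Bernoulli(p) draw; the group total is Binomial(n, p)
signups = rng.binomial(n=1, p=p_true, size=n_users)
print("observed conversion rate:", signups.mean())

# Repeating the experiment many times shows the CLT at work: the
# conversion rate across simulated groups is approximately normal
rates = rng.binomial(n=n_users, p=p_true, size=10_000) / n_users
print("std of simulated rates:", rates.std())
print("normal-approximation SE:", np.sqrt(p_true * (1 - p_true) / n_users))
```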
The null hypothesis here is that the two groups have equal conversion rates. Let group A denote the control group and group B denote the test group. To compare whether the two population proportions are equal (i.e. the conversion rate of group A versus that of group B), we can run the experiment for a set amount of time and use a two-proportion z-test on the observed conversion rates. The z-test is most appropriate since we assume the sample size is large enough. For a chosen significance level, say 5% (a 95% confidence level), we compute the sample z-statistic and compare it to the critical value for that level. If the resulting p-value falls below the significance threshold, we can reject the null hypothesis. If the null hypothesis is rejected, and group B's conversion rate is statistically significantly higher than group A's, then we can say the email campaign has increased the conversion rate for UberEats.
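To make the test itself concrete, below is a minimal sketch of the pooled two-proportion z-test; the signup counts and group sizes are hypothetical. (statsmodels' `proportions_ztest` implements the same computation.)

```python
import numpy as np
from scipy import stats

def two_proportion_ztest(signups_a, n_a, signups_b, n_b):
    """Pooled two-proportion z-test of H0: p_A == p_B."""
    p_a, p_b = signups_a / n_a, signups_b / n_b
    p_pool = (signups_a + signups_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # One-sided p-value: did the campaign *increase* the conversion rate?
    p_value = 1 - stats.norm.cdf(z)
    return z, p_value

# Hypothetical counts for illustration only
z, p = two_proportion_ztest(signups_a=4_800, n_a=100_000,
                            signups_b=5_200, n_b=100_000)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")  # reject H0 if p < 0.05
```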
There are several caveats worth discussing here as edge cases and nuances to the answer:
- Modelling signups as a Bernoulli random variable works for assessing whether or not a user signs up, but it doesn't capture users who unsubscribe / remove themselves from the listserv. It's worth understanding both the positive uptick in metrics and the potential loss in lifetime value from subscribers who drop off because of the campaign. These unsubscribe numbers could be measured with a separate Bernoulli random variable and used to track the "cost" of running the new campaign (a minimal sketch of this bookkeeping appears after this list).
- This answer ignores time delay. It may be worth diving into how the experiment should be run over a longer time period, especially for something like email marketing, which can be a delayed process (perhaps a user doesn't convert until the 4th email in the campaign, and doesn't see that email until 20 days after it was sent). These experiments may need to play out over an extended period for accuracy.
- Just because a particular email campaign outperforms on conversion doesn't necessarily mean it should be launched. For instance, return rates may be higher, which would counteract the benefit of the increase in conversions. There are also qualitative considerations (e.g. perhaps the email campaign is aggressive: it converts, but decreases the lifetime value of the customer). This A/B test should be considered in context when making the product/marketing decision.
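As a minimal sketch of the unsubscribe-cost bookkeeping mentioned in the first caveat (every number here, including the value per signup and the lifetime-value loss per unsubscribe, is a made-up assumption):

```python
def net_campaign_value(extra_signups, extra_unsubscribes,
                       value_per_signup=20.0, ltv_loss_per_unsub=50.0):
    """Net dollar impact of the campaign relative to control."""
    return (extra_signups * value_per_signup
            - extra_unsubscribes * ltv_loss_per_unsub)

# Hypothetical lift of 400 signups at the cost of 120 extra unsubscribes
print(net_campaign_value(extra_signups=400, extra_unsubscribes=120))
# => 2000.0; a conversion lift can still be a net loss if unsubscribes are costly
```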