Non-Normal Distribution
Question: What if your test metric is not normally distributed?
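Not all A/B test metrics follow a normal distribution, especially when sample sizes are small or when the metric is inherently skewed, such as revenue, session length, or time to conversion. Strictly speaking, the t-test does not require the raw data to be normal. It requires the sampling distribution of the mean (or of the difference in means) to be approximately normal. The Central Limit Theorem (CLT) says this holds for large enough samples, but what counts as "large enough" depends on the data: with sparse or heavy-tailed metrics, the approximation can remain poor even at sample sizes that would be ample for well-behaved data. In such cases, parametric tests like the t-test may produce inaccurate p-values and confidence intervals, and therefore incorrect conclusions.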
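To make the CLT point concrete, here is a small simulation sketch. It assumes a hypothetical lognormal metric (a common stand-in for revenue-like data, not real data) and measures how skewed the distribution of the sample mean is at several sample sizes. The skewness of the mean of n independent draws shrinks roughly as 1/sqrt(n), so the normal approximation behind the t-test improves with n, but only slowly when the underlying metric is very skewed.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: third central moment divided by variance**1.5."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m3 = statistics.fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5

def skew_of_sample_means(n, reps=1000, seed=42):
    """Skewness of the sampling distribution of the mean of n lognormal draws.

    The lognormal(0, 1) metric is a hypothetical, heavily right-skewed example.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.lognormvariate(0.0, 1.0) for _ in range(n))
        for _ in range(reps)
    ]
    return skewness(means)

for n in (5, 50, 500):
    print(f"n = {n:>3}: skewness of sample mean = {skew_of_sample_means(n):.2f}")
```

The skewness of the sample mean should drop substantially between n = 5 and n = 500; the exact values depend on the random seed and number of repetitions.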
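To address this, consider non-parametric alternatives such as the Wilcoxon rank-sum test (also known as the Mann-Whitney U test), which does not assume normality and compares two groups using the rank order of their pooled observations. Note that it tests whether values in one group tend to be larger than values in the other, not whether the means differ; for a metric like revenue, where the mean is usually the quantity of business interest, the two questions can have different answers. Another robust approach is bootstrapping: resample each group's observed data with replacement many times, recompute the statistic of interest (for example, the difference in means) on each resample, and use the resulting empirical distribution to estimate confidence intervals or perform hypothesis tests without relying on parametric assumptions.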
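A minimal, dependency-free sketch of both approaches follows, assuming two lists of per-user values; the lognormal "revenue" data is simulated and hypothetical. The Mann-Whitney function uses the large-sample normal approximation with a tie correction and no continuity correction, so it is not meant for very small samples. The bootstrap uses the simple percentile method to build a confidence interval for the difference in means. In practice, `scipy.stats.mannwhitneyu` and `scipy.stats.bootstrap` offer more complete implementations.

```python
import math
import random
import statistics

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (Wilcoxon rank-sum) via the normal approximation.

    Returns (U for sample a, p-value). Ties get average ranks and the variance
    is tie-corrected; no continuity correction. Intended for moderately large
    samples, not for exact small-sample p-values.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    # Pool both samples, tagging each value with its group (0 = a, 1 = b).
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over blocks of t tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sd_u == 0:  # every value tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

def bootstrap_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(b) - mean(a), resampling each group separately."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(a, k=len(a))  # sample with replacement
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resample_b) - statistics.fmean(resample_a))
    diffs.sort()
    lower = diffs[int(alpha / 2 * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical, simulated revenue-like data for a control and a treatment group.
rng = random.Random(7)
control = [rng.lognormvariate(3.0, 1.0) for _ in range(300)]
treatment = [rng.lognormvariate(3.1, 1.0) for _ in range(300)]

u, p = mann_whitney_u(control, treatment)
lower, upper = bootstrap_diff_ci(control, treatment)
print(f"Mann-Whitney U = {u:.0f}, p = {p:.3f}")
print(f"95% bootstrap CI for mean(treatment) - mean(control): [{lower:.2f}, {upper:.2f}]")
```

The percentile interval is the simplest bootstrap interval; bias-corrected variants such as BCa tend to behave better for strongly skewed metrics, at the cost of extra computation.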
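Additionally, you can apply data transformations, such as log, square root, or Box-Cox, to reduce skewness and approximate normality. These make the distribution of the metric more symmetric and amenable to standard statistical testing, with two caveats. First, log and Box-Cox require strictly positive values (a log(1 + x) transform is a common workaround when the metric contains zeros). Second, a test on transformed data compares groups on the transformed scale, for example the mean of log revenue rather than mean revenue. Increasing your sample size also helps, because with more data the CLT approximation becomes more accurate and parametric tests become more robust to skewness.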
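The sketch below applies a `log1p` transform to hypothetical, simulated lognormal data, compares skewness before and after, and runs Welch's t-test (unequal variances) on the transformed values. To stay within the standard library, the p-value uses a normal approximation to the t distribution, which is reasonable only when the degrees of freedom are large; `scipy.stats.ttest_ind(a, b, equal_var=False)` gives exact t-based p-values, and `scipy.stats.boxcox` fits a Box-Cox transform for strictly positive data.

```python
import math
import random
import statistics

def skewness(xs):
    """Sample skewness: third central moment divided by variance**1.5."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m3 = statistics.fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5

def welch_t_test(a, b):
    """Welch's t-test for mean(b) - mean(a).

    Returns (t, df, p). The two-sided p-value uses a normal approximation
    to the t distribution, so it is only reliable when df is large.
    """
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(b) - statistics.fmean(a)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p

# Hypothetical, simulated revenue-like data (strictly positive, right-skewed).
rng = random.Random(11)
control = [rng.lognormvariate(3.0, 1.0) for _ in range(1000)]
treatment = [rng.lognormvariate(3.1, 1.0) for _ in range(1000)]

# log1p(x) = log(1 + x) stays finite at x = 0, unlike log(x).
log_control = [math.log1p(x) for x in control]
log_treatment = [math.log1p(x) for x in treatment]

print(f"skewness raw: {skewness(control):.2f}, after log1p: {skewness(log_control):.2f}")
t, df, p = welch_t_test(log_control, log_treatment)
print(f"Welch t on log1p scale: t = {t:.2f}, df = {df:.0f}, p = {p:.4f}")
```

Because this test runs on the log scale, a significant result says the treatment shifted log revenue. With a plain log transform, exponentiating the difference in means gives a ratio of geometric means, which is not the same as a change in average revenue per user.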