Alternatives to A/B Testing

A/B tests aren’t ideal or appropriate for certain scenarios. To assess your understanding of alternatives to A/B testing, your interviewer may ask follow-up questions (e.g. “How will you measure the impact if running an A/B test is not possible?”) or mini case-study questions.

While you’re not expected to describe these methods in detail, you should be prepared to discuss which factors to consider before implementing an A/B test and which alternatives are available when one isn’t feasible.

When A/B tests aren’t appropriate

Potential reasons for not running an A/B test include:

  • Low traffic
  • Technical limitations
  • Complex user behavior
  • Ethical constraints

Low traffic

For websites, apps, or features with low visitor traffic, an A/B test might take an impractically long time to gather sufficient data, making the process inefficient and the results stale from a business perspective.
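To make this concrete, here is a rough back-of-the-envelope sketch of why low traffic is a problem, using the standard two-proportion sample-size approximation. The baseline conversion rate, effect size, and traffic figures are hypothetical:

```python
import math

def required_sample_size(p_base, mde_abs, alpha=0.05, power=0.8):
    """Approximate per-variant sample size for a two-proportion z-test."""
    z_alpha = 1.96  # two-sided test at alpha = 0.05
    z_beta = 0.84   # power = 0.8
    p_new = p_base + mde_abs
    variance = p_base * (1 - p_base) + p_new * (1 - p_new)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)

# Hypothetical low-traffic site: 4% baseline conversion, want to detect
# a +0.5 percentage-point lift, with ~400 eligible visitors per day.
n_per_variant = required_sample_size(0.04, 0.005)
days_needed = math.ceil(2 * n_per_variant / 400)
```

With these (assumed) numbers, each variant needs tens of thousands of visitors, so the test would have to run for months — exactly the “impractically long” situation described above.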

Technical limitations

Sometimes, implementing A/B tests can be technically challenging, especially in complex systems or with limited resources and a lack of infrastructure.

For example, running an A/B test to compare the effectiveness of two checkout processes, one single-page and one multi-step, could be technically challenging: the checkout flow likely involves interactions with various backend systems, such as inventory management, payment processing, and order fulfillment.

Complex user behavior

If user behavior is influenced by numerous factors that are difficult to control or measure, A/B testing might not accurately capture the nuances of user interaction.

For example, if a travel platform wants to measure the impact of a feature on repeat bookings, running an A/B test might be difficult because most users take only two or three vacations per year, and bookings also depend on many external factors.

Ethical constraints

Sometimes it may not be ethical to run an A/B test, especially without participant consent.

If a social media platform wants to understand the user impact of showing negative content in its home feed vs. positive content, it may not be ethical to deliberately show primarily negative content to a group of users.

Alternative test methods

Although A/B testing is the gold standard for causal inference, there are alternative causal inference methods using observational data. Each method has its own assumptions and limitations. Careful consideration should be given to the appropriateness of the method, given the data and hypotheses.

Common causal inference methods include:

  • Propensity score matching
  • Instrumental variables (IV) analysis
  • Difference-in-differences (DID) analysis
  • Synthetic control methods

Propensity score matching

Propensity score matching estimates the probability of receiving the treatment (e.g. exposure to a new feature) based on observed covariates or confounding variables. Treated and control units are then matched based on their propensity scores, creating balanced comparison groups. The average treatment effect can be estimated by comparing outcomes between the matched groups.
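The workflow above — estimate propensity scores, match treated units to similar controls, then compare outcomes — can be sketched on synthetic data. Everything here is simulated for illustration (the confounder, the true effect of 1.0, and the simple gradient-descent logistic fit are all assumptions, not a production recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a single confounder x drives both treatment uptake and outcome.
n = 2000
x = rng.normal(size=n)
treated = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(int)  # selection on x
y = 2.0 * x + 1.0 * treated + rng.normal(size=n)              # true effect = 1.0

# Naive comparison is biased upward: treated users have higher x to begin with.
naive = y[treated == 1].mean() - y[treated == 0].mean()

# 1. Estimate propensity scores with a small logistic regression (gradient ascent).
X = np.column_stack([np.ones(n), x])
w = np.zeros(2)
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.1 * X.T @ (treated - p) / n
scores = 1 / (1 + np.exp(-X @ w))

# 2. Greedy nearest-neighbor matching on the propensity score (with replacement).
t_idx = np.flatnonzero(treated == 1)
c_idx = np.flatnonzero(treated == 0)
matches = c_idx[np.abs(scores[c_idx][None, :] - scores[t_idx][:, None]).argmin(axis=1)]

# 3. Average treatment effect on the treated: outcome gap across matched pairs.
att = (y[t_idx] - y[matches]).mean()
```

The naive difference lands well above the true effect, while the matched estimate recovers something close to 1.0, because matching balances the confounder across the comparison groups.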

IV analysis

IV analysis identifies instrumental variables, which are correlated with the treatment assignment but not directly with the outcome of interest. These instruments isolate the causal effect of the treatment by removing bias due to unobserved confounding factors. IV analysis requires the identification of valid instruments and assumptions about their relationship with the treatment and outcome.
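A minimal two-stage least squares (2SLS) sketch illustrates the idea, again on synthetic data (the instrument, the unobserved confounder, and the true effect of 1.5 are all assumptions made for the example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: u is an unobserved confounder; the instrument z affects
# the outcome y only through the treatment d (the exclusion restriction).
n = 5000
z = rng.normal(size=n)                      # instrument, e.g. random encouragement
u = rng.normal(size=n)                      # unobserved confounder
d = 0.8 * z + u + rng.normal(size=n)        # treatment intensity
y = 1.5 * d + 2.0 * u + rng.normal(size=n)  # true causal effect = 1.5

def ols(X, target):
    """Ordinary least squares coefficients via numpy."""
    return np.linalg.lstsq(X, target, rcond=None)[0]

ones = np.ones(n)
# Naive OLS of y on d is biased upward because u drives both d and y.
naive = ols(np.column_stack([ones, d]), y)[1]

# 2SLS: first predict d from z, then regress y on the predicted treatment.
first_stage = ols(np.column_stack([ones, z]), d)
d_hat = np.column_stack([ones, z]) @ first_stage
iv = ols(np.column_stack([ones, d_hat]), y)[1]
```

The naive slope absorbs the confounding through u, while the 2SLS estimate isolates the variation in d that comes only from z and so lands near the true 1.5.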

DID analysis

DID analysis compares changes in outcomes over time between a treatment group and a control group, before and after the introduction of the treatment. By controlling for time-invariant confounders and trends affecting both groups, DID estimates the causal effect of the treatment on the outcome.

DID assumes parallel trends, i.e. that the outcomes of the treatment and control groups would have followed the same trajectory over time in the absence of the treatment.
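The simplest 2×2 version of this calculation can be written out directly; the group averages below are hypothetical numbers chosen for illustration:

```python
# Average outcome (e.g. weekly bookings) per group and period; illustrative only.
pre = {"treatment": 10.0, "control": 8.0}
post = {"treatment": 13.0, "control": 9.5}

# Each group's change over time; the control's change proxies the shared trend.
change_treatment = post["treatment"] - pre["treatment"]  # 3.0
change_control = post["control"] - pre["control"]        # 1.5

# Difference-in-differences: the treatment's extra change beyond the trend.
did_estimate = change_treatment - change_control  # 1.5
```

Under the parallel-trends assumption, the control group’s 1.5-unit rise reflects what would have happened to the treatment group anyway, so the remaining 1.5 units of its 3.0-unit rise are attributed to the treatment.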

Synthetic control methods

Synthetic control methods construct a synthetic control unit that closely matches the characteristics of the treated unit(s) before the treatment. By comparing outcomes between the treated unit and its synthetic control, the causal effect of the treatment can be estimated.

Synthetic control methods are particularly useful for evaluating policy interventions or changes at the aggregate level.
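A toy sketch of the weighting idea follows, with made-up pre- and post-period numbers for one treated unit and four untreated “donor” units. In the full method the donor weights are explicitly constrained to be non-negative and sum to one; here the data are constructed so a plain least-squares fit happens to satisfy both, which keeps the sketch short:

```python
import numpy as np

# Pre-period metric (e.g. weekly sales) for 4 untreated donor regions
# over 5 periods; all numbers are illustrative.
donors = np.array([
    [10.0, 11.0, 12.0, 13.0, 14.0],
    [20.0, 19.0, 21.0, 20.0, 22.0],
    [ 5.0,  6.0,  5.5,  6.5,  7.0],
    [15.0, 15.5, 16.0, 16.5, 17.0],
])
treated_pre = np.array([14.0, 14.3, 15.5, 15.8, 17.0])  # treated region, pre-period

# Fit donor weights so the weighted donors track the treated unit pre-treatment.
w, *_ = np.linalg.lstsq(donors.T, treated_pre, rcond=None)

# Post-period: the weighted donor outcomes act as the counterfactual.
donors_post = np.array([15.0, 23.0, 7.5, 17.5])
treated_post = 19.0
counterfactual = w @ donors_post
effect = treated_post - counterfactual
```

The synthetic control is the weighted combination of donors that best reproduces the treated unit’s pre-treatment trajectory; after the intervention, the gap between the treated unit and that combination is the estimated effect.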

Senior candidates should also be aware of more advanced experimentation techniques, which typically reach conclusions faster than standard A/B tests. Examples include sequential testing, multi-armed bandits, and variance reduction techniques such as Controlled-experiment Using Pre-Experiment Data (CUPED).
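Of these, CUPED is the easiest to sketch: subtract from each user’s metric the part predicted by a pre-experiment covariate, which shrinks variance without biasing the lift estimate. The simulated data below (pre-period spend predicting in-experiment spend, true lift 0.5) is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic A/B test: pre-experiment metric x strongly predicts metric y.
n = 10000
x = rng.normal(50, 10, size=n)                   # pre-period spend per user
assign = rng.integers(0, 2, size=n)              # 0 = control, 1 = treatment
y = x + rng.normal(0, 5, size=n) + 0.5 * assign  # true lift = 0.5

# CUPED adjustment: remove the component of y explained by pre-experiment data.
theta = np.cov(x, y)[0, 1] / np.var(x)
y_cuped = y - theta * (x - x.mean())

lift_raw = y[assign == 1].mean() - y[assign == 0].mean()
lift_adj = y_cuped[assign == 1].mean() - y_cuped[assign == 0].mean()
var_reduction = 1 - np.var(y_cuped) / np.var(y)  # large when x predicts y well
```

Both estimators target the same lift, but the adjusted metric has far lower variance, so the same experiment reaches significance with less traffic or in less time — which is exactly why these techniques matter in the low-traffic scenarios discussed earlier.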