How to Answer Root Cause Analysis Questions

The most common type of execution question is root cause analysis.

Root cause analysis (RCA) questions present a scenario. Something unexpected has happened with a product, and as a PM, it’s your responsibility to figure out what’s going on. For example, an interviewer might ask:

"Imagine you are a PM at Lyft and there’s been a 20% increase in ride cancellations. What would you do about this?"

Unlike most other interviews in the PM loop, these are conducted in a “role-play” style. Your interviewer expects you to ask questions about the context of the problem, the product, and any relevant metrics. Using the information you gather, you’ll iterate through increasingly precise hypotheses until you identify a root cause. Often, your interviewer will play the role of an uninformed data analyst, requiring you to ask concrete questions about reasonably available data.

For example, one of your first clarifying questions might be “Is there anything unusual going on with drivers?” In other PM interviews your interviewer would answer directly, but in execution interviews, they'll likely respond with something like “As a data analyst, I can help you if you clarify what you want to know in terms of data we might have about drivers?”

The interviewer will push you to be more specific and concrete about your requests for data throughout the interview.

What interviewers are looking for

Your ultimate goal in working through RCA questions is to identify the root cause of the problem, but your interviewer will care most about your approach.

RCA questions are different from product design, strategy, and analytical questions in that they do have a correct answer, but a logical and thoughtful approach to problem-solving can land you a "strong hire" recommendation even if you don’t arrive at the exact right solution. Be sure to demonstrate your ability to:

  • Make sense of a situation given limited data
  • Generate and continually refine reasonable hypotheses
  • Gather information iteratively
  • Identify a root cause

A framework for answering RCA questions

RCA questions at most companies are designed to be solved in 25 minutes, so a framework to keep you on track is critical. Follow this 6-step process.

  • Step 1: Clarify and gather context
  • Step 2: Form high-level hypotheses
  • Step 3: Gather data
  • Step 4: Refine hypotheses and repeat
  • Step 5: Identify the root cause
  • Step 6: Evaluate

Let’s continue the Lyft RCA question as an example. Assume you’ve been asked:

"Imagine you are a PM at Lyft and there’s been a 20% increase in ride cancellations. What would you do about this?"

Step 1: Clarify and gather context

First, ask clarifying questions about anything that will help you make sense of the issue, scope the problem, and align with your interviewer.

Don’t hesitate to ask broad questions like “How does this affect our core business?” but be aware that your interviewer may not give you direct answers. Instead, they’re likely to prompt you to ask for specific data that you’d reasonably have access to as a PM.

A helpful tip for getting into the right mindset is to envision what it’d really be like to be in the situation you’ve been given. What would your priorities be as a PM? What kind of data would you need to begin to diagnose the issue?

Take our example question: “Imagine you are a PM at Lyft and there’s been a 20% increase in ride cancellations. What would you do about this?”

You’d want to clarify what "ride cancellations" means since this could be interpreted in multiple ways. You'd also want to know what the baseline is for comparison when considering a “20% increase.” Ask:

  • “What counts as a ride cancellation?”
  • “Are we seeing cancellations both before matching with a driver and after?”
  • “Are we only seeing riders canceling, or does this number include drivers too?”
  • “What is this 20% increase in comparison to? Week over week, for example?”
  • “Have cancellations been steadily increasing, or are we seeing a sudden spike?”
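
The baseline questions above matter because the same "20% increase" can describe quite different situations. Here is a minimal Python sketch, using made-up weekly numbers (illustrative only, not real Lyft data) and a hypothetical helper named relative_change, that contrasts a change in the raw number of cancellations with a change in the cancellation rate:

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction (0.2 means 20%)."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before


# Hypothetical week-over-week figures, for illustration only.
last_week = {"requests": 100_000, "cancellations": 10_000}
this_week = {"requests": 120_000, "cancellations": 12_000}

# Change in the raw number of cancellations.
count_change = relative_change(last_week["cancellations"], this_week["cancellations"])

# Change in the cancellation rate (cancellations per ride request).
rate_before = last_week["cancellations"] / last_week["requests"]
rate_after = this_week["cancellations"] / this_week["requests"]
rate_change = relative_change(rate_before, rate_after)

print(f"Cancellation count change: {count_change:+.0%}")
print(f"Cancellation rate change:  {rate_change:+.0%}")
```

In this invented scenario the number of cancellations is up 20% while the rate is flat, because ride requests grew by the same proportion. That is exactly the kind of ambiguity the clarifying questions are meant to resolve before you start forming hypotheses.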

Note that this question is very specific. Completed rides are central to Lyft’s business, so it’s obvious that a 20% increase in ride cancellations is bad. You may get a more ambiguous question, like “10% of Netflix users are inactive. What would you do?” In this case, ask clarifying questions about how this affects the business. You may find that what’s been framed as a problem may not be the core issue.

Step 2: Form high-level hypotheses

Once you’ve gathered some context, you can start forming high-level hypotheses about what may have happened. Keep in mind that you have limited time to narrow your scope, so a good strategy is to first eliminate broad areas from consideration.

At this stage, you’re simply breaking up the universe of possible root causes into meaningful chunks that you can dive into later and eventually prioritize or discard. Here are a few common sources of problems:

  • An unintended technical issue or a product bug
  • A product change — possibly intended — but with unintended consequences
  • An operational change within the company
  • An external event

Remember that communicating your process is critical, so share your full list of hypotheses with your interviewer before you begin prioritizing in the next step.

For our example question regarding Lyft cancellations, your broad hypotheses might be:

  • A product bug or technical issue such as rides being automatically canceled or an issue with the app not showing users that a driver is on the way.
  • A product change in the Lyft app such as making the “cancel” button more prominent.
  • A change to rider/driver operations which is a key component of our service. An example might be a change in driver pay that leads drivers to cancel less profitable rides.
  • An external factor like a Taylor Swift concert has created massive traffic causing users to cancel when they see long wait times.

Step 3: Gather data

At this point, you’ve got a solid set of high-level hypotheses. The next step is to determine which of these is worth expanding on. You’ll do that by gathering more data.

One helpful strategy for identifying what’s driving the problem is to isolate key variables. For the sake of an execution interview, a key variable is a variable that correlates highly with the issue at hand.

For example, say you’re diagnosing a problem with a smartphone app and the problem is evenly distributed between users across age groups and geographic locations. If, when you look at the distribution of errors experienced by device, you see that 90% of users experiencing the issue use a Google Pixel, you might consider device to be a key variable.

You’d then gather more data to figure out what it is about Google Pixel usage that correlates with the issue. It could be a certain version of the app installed on Pixels, something about the device itself, or even a factor that users who buy Pixels have in common. To isolate key variables, ask yourself these questions:

  • Which variables could reasonably be correlated with the variation I’m seeing?
  • How might I test whether these variables are having an impact without confounding variables affecting the outcome?
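
To make the idea of a key variable concrete, the sketch below compares each segment's share of affected users with its share of all users. It is only an illustration in Python: the function name segment_lift, the field names, and every number are assumptions invented for this example, loosely mirroring the Google Pixel scenario above.

```python
from collections import Counter


def segment_lift(all_users, affected_users, key):
    """For each value of `key`, divide its share among affected users by its
    share among all users. A lift well above 1 marks a variable worth
    investigating; it does not prove causation."""
    if not all_users or not affected_users:
        raise ValueError("both user lists must be non-empty")
    base = Counter(user[key] for user in all_users)
    hit = Counter(user[key] for user in affected_users)
    lifts = {}
    for value, base_count in base.items():
        base_share = base_count / len(all_users)
        hit_share = hit.get(value, 0) / len(affected_users)
        lifts[value] = hit_share / base_share
    return lifts


# Hypothetical user base: 100 users, 20 of them on a Pixel.
all_users = (
    [{"device": "Pixel", "age_group": "18-34"}] * 10
    + [{"device": "Pixel", "age_group": "35+"}] * 10
    + [{"device": "iPhone", "age_group": "18-34"}] * 25
    + [{"device": "iPhone", "age_group": "35+"}] * 25
    + [{"device": "Galaxy", "age_group": "18-34"}] * 15
    + [{"device": "Galaxy", "age_group": "35+"}] * 15
)

# Hypothetical affected users: 10 users, 9 of them on a Pixel.
affected_users = (
    [{"device": "Pixel", "age_group": "18-34"}] * 5
    + [{"device": "Pixel", "age_group": "35+"}] * 4
    + [{"device": "iPhone", "age_group": "35+"}] * 1
)

device_lift = segment_lift(all_users, affected_users, "device")
age_lift = segment_lift(all_users, affected_users, "age_group")

print("Lift by device:", device_lift)
print("Lift by age group:", age_lift)
```

With this invented data, Pixel users are heavily over-represented among affected users while both age groups sit at a lift of 1, so device stands out as the variable to dig into. In an interview you would ask for this kind of breakdown verbally rather than compute it, but the reasoning is the same.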

When you’re ready, ask your interviewer whatever questions feel meaningful to help you gather the right data. Be sure to explain why you’re asking each question and how the answer changes your understanding of the problem.

Recall that you brainstormed the following list of high-level hypotheses that might explain the increase in cancellations:

  • A product bug or technical issue such as rides being automatically canceled or an issue with the app not showing users that a driver is on the way.
  • A product change in the Lyft app such as making the “cancel” button more prominent.
  • A change to our rider/driver operations such as a change in driver pay that leads drivers to cancel less profitable rides.
  • An external factor like a Taylor Swift concert has created massive traffic causing users to cancel when they see long wait times.

Continuing the example:

"If a product or technical issue caused this, we should be able to attribute it to a particular launch or update. Were there any releases that happened around the time of the spike? Was there a particular version of the app where this started? If so, app version would be a variable we’ve isolated as affecting this metric, which gives us a clear direction to explore.

If a change in our operations is causing the cancellations, it’s likely that other metrics have been affected. Have we seen a change in the number of active drivers? Have we seen a change in the number of ride requests?”

Assume the interviewer tells you:

  • There’s no clear release or app version that caused the issue.
  • We have seen a drop in the number of drivers who complete a ride, but no drop in ride requests.
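
One way to picture the interviewer's answers is as a before-and-after comparison across related metrics, flagging only the ones that moved meaningfully. The Python sketch below is a rough illustration: the metric names, the 5% threshold, and all of the numbers are assumptions chosen to match the scenario, not data from the interview.

```python
def flag_changes(before, after, threshold=0.05):
    """Return the metrics whose relative change is at least `threshold`
    in magnitude, mapped to that relative change."""
    flagged = {}
    for name, old_value in before.items():
        change = (after[name] - old_value) / old_value
        if abs(change) >= threshold:
            flagged[name] = change
    return flagged


# Hypothetical weekly metrics before and after the spike.
before = {
    "ride_requests": 100_000,
    "drivers_completing_a_ride": 8_000,
    "cancellations": 10_000,
}
after = {
    "ride_requests": 100_500,
    "drivers_completing_a_ride": 6_800,
    "cancellations": 12_000,
}

flagged = flag_changes(before, after)
for name, change in flagged.items():
    print(f"{name}: {change:+.1%}")
```

Here ride requests barely move, so they are not flagged, while drivers completing a ride and cancellations both shift noticeably. That pattern points away from demand-side explanations and toward something affecting drivers.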

Step 4: Refine hypotheses and repeat

The goal here is to gather evidence to either support or deprioritize each hypothesis as you drive toward the root cause of the problem.

With more data, refine your hypotheses, generate new questions to ask, and refine further. You may find yourself repeating this step a few times. This is normal. Every additional piece of data helps to make your hypotheses more specific.

Be sure to communicate every step of your thought process to your interviewer and check in frequently. Doing so can preempt tough follow-up questions after you give your final answer.

Continuing our Lyft example, recall that your interviewer confirmed that:

  • There has been no clear release or app version that caused the increase in ride cancellations.
  • There has been a drop in the number of drivers who complete a ride, but no drop in ride requests.

Refine your hypothesis as follows:

“Based on these insights, I’d like to deprioritize my hypotheses that the increase in cancellations is due to either a technical issue or a product change, as there’s no evidence that a release or app version is causing the increase in cancellations, and we haven’t seen a decrease in ride requests.

There is more evidence that the issue is related to our operations or an external event. Specifically, it seems like something that’s particularly affecting drivers. Perhaps riders are seeing longer wait times (because there are fewer drivers) so they’re canceling.”

Let’s assume that your interviewer confirms that your analysis is correct, and that drivers are being affected.

Step 5: Identify the root cause

Once you have enough data, explain what you think the root cause of the issue is. If you’ve communicated your thought process throughout, you’ll have made a logical case for your choice, so you’ll only need to give a quick summary here before moving on to evaluation.

Continuing our Lyft example, recall that you have confirmation from your interviewer that the problem is:

  • Likely related to Lyft’s operations, an external event, or both
  • Affecting drivers specifically

Given this insight, it would be helpful to consider what factors would affect drivers’ willingness and ability to drive. Here are a few specific hypotheses that fit the information you have:

  • An external event is causing traffic difficulties
  • Drivers are unhappy with Lyft and leaving
  • Drivers are being paid or treated better on another platform and so are being pulled away

Assume you’ve found no evidence that an external event is causing traffic difficulties over the time period in question, and there is no strong evidence to suggest that drivers are unhappier with Lyft than they were prior to the increase in cancellations. You’d be left with the hypothesis that drivers are being paid or treated better elsewhere, and are leaving Lyft for another platform.

Step 6: Evaluate

If you followed the framework and communicated throughout, your answer will be backed up by data. Quickly recap your major findings, and then spend a few minutes discussing what to do about the root cause you’ve identified. Consider:

  • Whether the root cause should be fixed. Sometimes, a change in a metric is caused by a product change that the company is otherwise happy with. The next steps may just be to keep an eye on things and to act only when it’s clear user experience is being degraded.
  • If a fix is needed, consider what mitigation makes sense. Many candidates reach for a product change immediately, but that’s not always the best option. Use your judgment as a PM.

Wrapping up our Lyft example, let’s say that your interviewer confirms that your hypothesis is correct and that Uber recently increased driver pay substantially, drawing drivers away from Lyft. Close with:

"In the short term, Lyft is losing out — but given margins, I can’t imagine Uber would be able to sustain the economics of such an increase for the long term. We could choose to do nothing for now and wait for Uber to eventually reduce driver pay.

If we did want to respond, I would consider small changes to the app such as making the “cancel” button slightly less prominent. We might also consider more drastic measures such as charging a fee for cancellation, but this could drastically harm the user experience.

Another option could be to raise driver pay to match Uber’s, but we would need to understand the business implications in much more depth before choosing to go that route.”

“Okay” vs. “good” vs. “great” answers

  • In an okay answer, the candidate asks questions that are relevant and offers a light interpretation of the situation but doesn’t explain their overall thinking on the problem. It feels like they have an intuitive sense of what could be going on, but it’s not clear if their approach is comprehensive or deeply thought out. Alternatively, they may quickly jump to plausible but specific explanations and focus on proving or disproving them, slowing down their ability to make sense of the situation.
  • A good answer is one where the candidate shares what possibilities they’re considering and it is clear why they’re asking the questions they do. The candidate interprets responses and explains how they change their understanding. The candidate confidently navigates the possibility space and it feels like they make steady progress toward an answer.
  • A great answer does the above but also displays deep insight and ability to contextualize the problem and uses that to guide their approach. For instance, the candidate might draw on product insights, explaining why and when users or drivers cancel, then apply that to quickly narrow in on the most important metrics to check. While regular answers to these questions usually feel like an exploration, great answers often feel like they quickly and intelligently cut through the possibilities and lock on to the heart of the problem.

Common pitfalls

  • Immediately guessing at the root cause. If you are trying to prove or disprove a narrow hypothesis, it will take a long time to find the answer (or the interviewer will give up and guide you toward the answer to keep moving).
  • Not explaining what you’re considering or why you’re asking particular questions. If you don’t explain your approach, you’re more likely to get lost. To the interviewer, it may feel like there’s no rhyme or reason to your process. It also makes it harder for the interviewer to guide or help you.