In this mock interview, George Perantatos (Product Director, Redfin) answers a product-execution interview question.
The question is, "Increase customer retention for Disney+ or Netflix."
He's being interviewed by Chris Wilson (Senior+ Product Manager).
What separated George's approach wasn't technical knowledge about p-values or sample sizes. It was his judgment about when experiments matter and when they don't.
Here's what he did right, and what you can learn from it.
The interviewer's verdict: "I would love to have George on my team."
Most candidates treat experimentation as the default answer to every product decision.
George established a more nuanced framework:
"I would not use experiments all the time. As a product manager, you need to have conviction in what you're doing and where you're heading.
Some things just don't lend themselves well to experimentation. It is not a hammer and nail where everything has to be hit by this hammer called experiments."
Use experiments when: a specific change to an existing experience can be measured against a metric you expect it to move.
Don't use experiments when: the work is early-stage and exploratory, and talking to users will teach you more than statistics.
"A new zero-to-one exploratory AI-based chat experience? That matters less for A/B testing and measurement.
What matters more is what users are doing. Let's learn. It's very early stage, it's speculative. The value is in talking to people, not in looking at statistics."
"PMs are often not grounded on the problem they're trying to solve. They jump to solutions and say, 'Oh, I can use experiments to test things and see what happens.'
Without a good grounding on what problem we're really trying to solve, it just becomes a bit of a dart game."
This manifests in two ways:
"Teams generate ideas in planning meetings and test whatever ideas come up. They move elements around, change colors, try different copy, etc.
But they're not solving a specific user problem.
When an experiment shows mixed results, they don't know what to do because they never articulated what success would mean."
"Sometimes we have divergent results. Some metric is going up and some metric is going down.
PMs don't feel like they can use intuition and judgment in combination with data to make a call, and they feel frozen."
Most candidates would immediately start defining control groups and metrics.
George did something different.
"Let's talk about user problems. I would imagine your scenario includes some problem where people are not finding what they're looking for in this service. They don't feel like it's for them, and they're not coming back."
George re-framed the exercise before accepting the hypothesis.
There's a business problem (retention) and a user problem (not finding content). The recommendation section might address that, but only if you understand what's actually broken.
"I would expect we'd want to validate first. Is something up with Recommended For You? Are a lot of people seeing it and not using it?
Or is it under-exposed? Meaning, not a lot of people are seeing this feature but it's very popular when they do."
Before designing any test, understand the current state:
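Two numbers usually answer that question: how many people see the feature, and how often they use it when they do. Here's a minimal sketch of computing both, assuming a hypothetical event log whose column and event names are invented for illustration (not an actual Netflix or Disney+ schema):

```python
# Hypothetical diagnostic for "Recommended For You": is the row under-exposed,
# or widely seen but rarely used? The event log below is invented for illustration.
import pandas as pd

events = pd.DataFrame({
    "session_id": [1, 1, 2, 3, 3, 4],
    "event": ["row_shown", "row_clicked", "home_load",
              "row_shown", "row_clicked", "row_shown"],
})

total_sessions = events["session_id"].nunique()
exposed = events.loc[events["event"] == "row_shown", "session_id"].nunique()
clicked = events.loc[events["event"] == "row_clicked", "session_id"].nunique()

exposure_rate = exposed / total_sessions   # how many sessions ever see the row?
ctr_when_seen = clicked / exposed          # when it's seen, is it used?

print(f"Exposure rate: {exposure_rate:.0%}, CTR when seen: {ctr_when_seen:.0%}")
```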
George made his assumptions explicit:
"Let's assume Recommended For You is pretty visible in the app. It's on every first-time load, but we believe the recommendations just don't feel very up-to-date. They don't feel very relevant."
"The next question is why? Why is it not being used as much? The recommendations could be bad quality. Or it's not very discoverable. Or it's a UI problem."
George chose one: the quality hypothesis.
"If we had one additional piece of information to feed into the recommendation model, it would spit out better recommendations, which means more clicks, which means more people finding things to view."
George's specific example: adding watch duration data to the model. Not just which shows users watch, but how long they watch them. One input change. One model improvement.
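As a hedged illustration of what "one input change" might look like in practice, the sketch below derives a single new feature (how much of each title a user actually finishes) from raw watch data. The table and column names are assumptions made up for the example, not details from the interview:

```python
# Hypothetical feature engineering: turn raw watch durations into one new
# recommendation-model input. All names below are illustrative assumptions.
import pandas as pd

watch_log = pd.DataFrame({
    "user_id":         [1, 1, 2],
    "title_id":        ["a", "b", "a"],
    "minutes_watched": [58, 5, 20],
    "title_length":    [60, 45, 60],
})

# The single new signal: did the user finish what they started?
watch_log["completion_ratio"] = (
    watch_log["minutes_watched"] / watch_log["title_length"]
).clip(upper=1.0)

# One added column per (user, title) pair is the entire model change.
features = watch_log.groupby(["user_id", "title_id"], as_index=False)[
    "completion_ratio"
].mean()
print(features)
```

Keeping the change to one column is the point: if recommendation clicks move in the experiment, you know which input moved them.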
"Don't test too many things at once because it makes the experiment unclear. If you combine too many changes in one variant, what actually changed? What drove the change? You can stack learnings more quickly versus doing one giant experiment."
What not to do:
"People want to test two or three different signals in the ML model. I'm like 'no, let's test one. What's the most promising one?' Or 'what if we do a UI test and an ML model change?' No, no, let's keep those separate."
What to do:
"What is the smallest and most valuable learning we can have? What's the biggest ROI thing we can do that we think will drive movement in the metric? Let's test one thing. If that's a dud, we can take a different route."
The typical approach: "We're testing retention, so we'll measure weekly active users and subscription churn."
George's approach:
"I would love to know if retention is affected. But I would first want to say—that's an output metric. What's the input? The input is: are people clicking on Recommended For You? Are people scrolling? Are people viewing?"
His reasoning:
"I would look at the actual engagement metrics within the UI that we're changing and see if we are actually improving what we believe are signals that people are more engaged. That should correlate to more daily active users. It should correlate to lower churn."
Why this matters:
Engagement with the surface you're actually changing is the leading indicator. It moves within the experiment window, while output metrics like retention and churn are downstream and slow to read. If the input metrics don't move, the outputs won't either.
George also acknowledged when to slow down:
"If a problem is very novel or we're very uncertain, I would want to at least see for my own self—do these feel better? It's not worth running an experiment if you're way off. It's cheaper to test with a prototype with users than to have engineers build it and ship it as an experiment only to realize we were so far off the mark."
The interviewer, Chris, explained what made George stand out:
After the interview, George reflected on what he could have done better:
"It's hard to do in the moment, but maybe being a little bit more structured in the scenario would be good. My tendency is just to dive in and figure it out. I could have paused and said 'I'm going to first talk about the user problem, then I'm going to talk about potential solutions and how we'll measure them.'"
The benefit of signposting:
"It would have given Chris a sense of what I'm about to say for the next few minutes. In case I'm off with what he's looking for, he can redirect. He's also thinking about how he's going to grade me, maybe he has a follow-up question."
Chris agreed:
"Pausing, even writing stuff down, shows you're thinking through it. It also gives the interviewer a breather to think about next questions. You've led them down a path with your answers, so it gives them a second to breathe."
George's final insight about what separates good from great in these interviews:
"Don't forget about those higher-level skills. Even if it is a deep 'what's the hypothesis, show me a dashboard' type of question—take a few moments to express the higher level. Should we even run this experiment? How do I decide what's in my roadmap to experiment on? It shows you have higher-level thinking. It really rounds out a candidate. It could be a tiebreaker between two candidates equally strong on the technical side."
Anyone can learn statistical significance and sample size calculations. Those are table stakes.
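For reference, the table-stakes part really is mechanical. Here is a standard two-proportion sample-size calculation; the baseline click rate and the lift worth detecting are illustrative assumptions, not numbers from the interview:

```python
# Sample size per variant for detecting a lift in a proportion metric
# (two-sided alpha = 0.05, 80% power). Baseline and lift are illustrative.
from scipy.stats import norm

p1, p2 = 0.090, 0.095          # baseline CTR and the smallest lift worth detecting
alpha, power = 0.05, 0.80

z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)

n_per_group = (
    (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
) / (p1 - p2) ** 2
print(f"~{n_per_group:,.0f} users per variant")
```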
The skills that distinguish senior leadership are knowing when an experiment is worth running at all, grounding every test in a specific user problem, isolating one change at a time and reading the right input metrics, and combining data with judgment to make a call when results are mixed.
Before the interview: practice stating your structure out loud, so signposting comes naturally under pressure.
During setup questions: clarify the business problem and the user problem before accepting any hypothesis.
During scenario questions: make your assumptions explicit and tell the interviewer what you'll cover next.
When discussing experiment design: pick one change, name the input metrics you'll read, and say when you wouldn't run the experiment at all.
The interviewer isn't evaluating whether you can execute experiments flawlessly. They're evaluating your judgment about when to use them and how they fit into building products.
Show that you can think strategically, not just tactically.
Create your free Exponent account and learn how to ace your interviews.