Data Scientist Interview Questions

Review this list of 96 data scientist interview questions and answers verified by hiring managers and candidates.

Asked at Meta (Facebook) • 3 months ago
You're designing an A/B test to evaluate the impact of showing content from non-friends in users' feeds. How would you test this with proper randomization?
Data Scientist
Statistics & Experimentation
1 answer
Dhruv S. - "Before proceeding, I want to clarify the goal: we want to measure the impact of showing content from non-friends in users' feeds. I would assume non-friends could be anyone, but mainly content creators, and I am not including ads here. I also want to ask whether there is any existing logic for choosing which posts to show based on users' affinity to those posts, perhaps based on their engagement with the Instagram feed. The objective would be to improve engagement on the platform, as if users ..."
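One common way to get stable, unbiased assignment in a test like this is to hash the user ID together with an experiment name. The sketch below is illustrative only: the experiment name, the 50/50 split, and the function name `assign_variant` are assumptions, not anything stated in the question or the answer.

```python
import hashlib

def assign_variant(user_id, experiment="non_friend_content", treatment_share=0.5):
    """Deterministically map a user to 'treatment' or 'control'.

    Hashing (experiment, user_id) gives the same answer on every call and
    keeps assignments independent across experiments with different names.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # first 32 bits scaled into [0, 1)
    return "treatment" if bucket < treatment_share else "control"

# Hypothetical usage: check that the split is roughly balanced.
variants = [assign_variant(uid) for uid in range(10_000)]
treatment_fraction = variants.count("treatment") / len(variants)
```

Randomizing per user assumes one user's treatment does not affect another's outcomes. In a social feed that may not hold, in which case randomizing whole clusters of connected users is a commonly discussed alternative.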
Asked at Meta (Facebook) • 9 months ago
Imagine you're a data scientist at Meta. There's been a sudden 10% drop in Facebook's daily post views. How would you investigate?
Data Scientist
Data Analysis
3 answers
Steve Y. - "Define: how are daily post views calculated? Isolate issues: data issue; time period; geo; iOS vs. Android vs. web. Correlated metrics in the funnel: DAU; time spent/scrolls; engagement (likes, comments). External factors: competitor actions; big events. Internal factors: product launch; feature change."
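The "isolate issues" step in the answer above amounts to splitting the overall drop by segment. A minimal sketch, assuming post views are available as per-segment totals for a baseline day and the affected day (the segment names and numbers here are made up for illustration):

```python
def contribution_by_segment(before, after):
    """Each segment's change in views, as a share of the total baseline views.

    The shares sum to the overall relative change, so a single segment that
    accounts for most of the drop stands out immediately.
    """
    total_before = sum(before.values())
    return {
        segment: (after.get(segment, 0) - views) / total_before
        for segment, views in before.items()
    }

# Hypothetical numbers: a 10% overall drop that comes entirely from Android.
before = {"ios": 500, "android": 400, "web": 100}
after = {"ios": 500, "android": 300, "web": 100}
shares = contribution_by_segment(before, after)
```

The same function works for any cut (geo, app version, time of day), as long as the segments partition the total.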
Asked at Google • 14 days ago
A PM at Google asked you to describe the distribution of daily search queries per user. How would you describe it?
Data Scientist
Statistics & Experimentation
1 answer
Saurabh K. - "Daily search queries per user is expected to be right-skewed: a large majority of users issue few queries, while a long tail of users issue far more than the average."
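To make the right-skew concrete, the sketch below simulates per-user daily query counts and compares the mean, median, and 99th percentile. The log-normal shape and its parameters are assumptions chosen only to produce a long right tail; they are not Google data.

```python
import random
import statistics

def simulate_daily_queries(n_users=10_000, seed=0):
    """Simulated queries per user per day (assumed log-normal, floored to integers)."""
    rng = random.Random(seed)
    return [int(rng.lognormvariate(1.0, 1.0)) for _ in range(n_users)]

queries = simulate_daily_queries()
mean_q = statistics.mean(queries)
median_q = statistics.median(queries)
p99_q = statistics.quantiles(queries, n=100)[98]  # 99th percentile
```

In a right-skewed distribution the mean sits above the median and the top percentiles sit far above both, which is why medians and percentiles are usually reported alongside (or instead of) the mean.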
SQL Stored Procedures
Data Scientist
Coding
+2 more
2 answers
Amney M. - "It is a really good explanation, thanks."
Asked at Meta (Facebook) • 9 months ago
Imagine you are a data scientist for Instagram. How would you balance ads and follower posts, and how would you monitor the effectiveness of that balance?
Data Scientist
Data Analysis
1 answer
Vishal S. - "Product understanding: ads are what you see from companies as stories, posts, and reels. Posts are from users (connections). We have to design an experience that produces maximum engagement while generating ad revenue. Clarifying questions: Is it specific to posts/stories/reels? Is there an existing posts-to-ads ratio, or do we have to start from scratch? Is it specific to a device/OS? Is it specific to a region/user demographic? Assumption: existing posts-to-ads ratio ..."

How would you model the expected ROI of a new product launch?
Data Scientist
Data Analysis
+3 more
2 answers
Himanshu G. - "To model ROI for a product launch, the first step is to define the timeline you're targeting, for example 6 months post-launch, 1 year, or even 5 years. Tip: start with a 1-year ROI projection to estimate near-term returns, and build a 3-year projection to evaluate growth and scalability. ROI is essentially the net return over that period: Profit = Revenue (within timeline) − Total Cost (from project start). Total Cost includes both fixed and variable costs incurred since t ..."
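The profit formula in the answer above can be turned into a small calculation. This is a sketch under one added assumption: ROI is expressed as profit divided by total cost, which is the conventional ratio but is not spelled out in the truncated answer. The figures in the usage lines are hypothetical.

```python
def profit(revenue, total_cost):
    """Profit = revenue within the chosen timeline minus total cost since project start."""
    return revenue - total_cost

def roi(revenue, total_cost):
    """Return on investment as a ratio (assumed definition: profit / total cost)."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return profit(revenue, total_cost) / total_cost

# Hypothetical projections for the same launch over two horizons.
roi_1y = roi(revenue=150_000, total_cost=100_000)
roi_3y = roi(revenue=520_000, total_cost=200_000)
```

Running the same function over the 1-year and 3-year projections gives the near-term and longer-term views the answer recommends comparing.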
Asked at DoorDash • 5 months ago
On DoorDash, there are missing item and wrong item issues for deliveries. How would you analyze each of them?
Data Scientist
Statistics & Experimentation
+1 more
1 answer
Saurabh K. - "Missing item: the user ordered multiple items and a few are missing. Wrong item: the entire order is wrong, or the order contains items that were never ordered. How is this measured? CSAT, missing items, wrong items. Step 1: Collect data on orders that reported missing/wrong items. Dive deep to understand whether the problem is isolated to a specific metro/zip code/restaurant type (say fast food vs. fine dining), time of day (lunch vs. dinner), tenure of the courier on th ..."
Asked at Meta (Facebook) • 10 months ago
How would you determine if Facebook Messenger should introduce group calling?
Data Scientist
Data Analysis
+3 more
2 answers
theproductguy - "Clarifying questions and possible responses: both audio and video; goals: increase engagement time among groups/communities and remove the need for another platform for group calls (be a one-stop shop for communication); region: TBD; iOS/Android; only available to users in a group, to call users within the group; who can initiate these calls, only admins or anyone? Metrics: NSM: feature engagement (C), number of calls made in a week per user (C). PM: % of people joining the call in a group"
Asked at Lyft • a year ago
A $5 discount coupon is given to N riders. The probability of using a coupon is P. What is the expected cost for the company?
Data Scientist
Statistics & Experimentation
3 answers
Aarav G. - "Is there a reason a confidence interval was used to solve this problem over just using the mean/expected value directly?"
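For reference, the expected value asked for in the question follows directly from linearity of expectation: each rider costs $5 with probability P, so the expected cost is 5 × N × P. The sketch below also reports the standard deviation, which is what an interval around the mean would be built from; that part assumes riders redeem independently (a binomial model), which the question does not state.

```python
import math

def coupon_cost(n_riders, p_use, discount=5.0):
    """Expected cost and standard deviation of cost for a coupon campaign.

    Expected cost needs no independence assumption. The standard deviation
    assumes each rider redeems independently with probability p_use.
    """
    expected = discount * n_riders * p_use
    std_dev = discount * math.sqrt(n_riders * p_use * (1 - p_use))
    return expected, std_dev

# Hypothetical campaign: 1,000 riders, 20% redemption probability.
expected_cost, cost_std = coupon_cost(1_000, 0.2)
```

The expected value answers the question as posed; the spread only matters if the interviewer also asks how far the actual cost might land from that number.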
Asked at Microsoft • 8 months ago
In the transformer architecture, what makes the decoder different from the encoder?
Data Scientist
Statistics & Experimentation
2 answers
Ranj A. - "In the Transformer architecture, the decoder differs from the encoder primarily in its additional mechanisms designed to handle autoregressive sequence generation. Here's a breakdown of the key differences: Self-Attention Mechanism: Encoder: The encoder has a standard self-attention mechanism that allows each token to attend to all other tokens in the input sequence. Decoder: The decoder has two types of self-attention. The first is the same as in the encoder, but the second is mas ..."
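In the original encoder-decoder Transformer, the decoder's self-attention is the masked one, and its second attention sublayer is cross-attention over the encoder output. The masking itself is small enough to sketch in plain Python: position i may only attend to positions j ≤ i. The function names below are illustrative, and the scores stand in for the scaled query-key dot products.

```python
import math

def causal_mask(n):
    """mask[i][j] is True when position i is allowed to attend to position j."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax that gives zero weight to masked-out positions."""
    weights = []
    for row, allowed in zip(scores, mask):
        # Subtract the largest allowed score for numerical stability.
        top = max(s for s, ok in zip(row, allowed) if ok)
        exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# With equal scores, each position spreads attention evenly over itself and the past.
attention = masked_softmax([[0.0] * 3 for _ in range(3)], causal_mask(3))
```

An encoder would use the same softmax with an all-True mask, which is the whole difference between the two self-attention variants.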
Asked at DoorDash • a month ago
You're a PM at a food delivery app where conversion rates have declined over the past week. How would you investigate the causes? (Conversion: From users browsing to placing orders.)
Data Scientist
Behavioral
+2 more
Two machines produce light bulbs: Machine A produces 60% of the bulbs while machine B produces the remaining. Machine A produces defective bulbs at a rate of 5%, while machine B produces defective ...
Data Scientist
Statistics & Experimentation
1 answer
Saurabh K. - "P(A) = 0.6, P(B) = 0.4, P(D|A) = 0.05, P(D|B) = 0.03. The question asks for P(A|D). P(A|D) = (P(D|A) × P(A)) / P(D) = (0.05 × 0.6) / (P(D|A) × P(A) + P(D|B) × P(B)) = (0.05 × 0.6) / (0.05 × 0.6 + 0.03 × 0.4) = 30/42 = 5/7 ≈ 0.714. Note that P(D) = P(D|A) × P(A) + P(D|B) × P(B)."
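The calculation in the answer above can be checked with exact fractions. This is a minimal sketch; the function name is illustrative, and the 3% defect rate for Machine B is taken from the answer because the question text is truncated.

```python
from fractions import Fraction

def posterior(prior_a, likelihood_a, prior_b, likelihood_b):
    """P(A | D) for two exhaustive, mutually exclusive sources A and B."""
    evidence = likelihood_a * prior_a + likelihood_b * prior_b  # P(D)
    return likelihood_a * prior_a / evidence

p_a_given_defect = posterior(
    prior_a=Fraction(60, 100), likelihood_a=Fraction(5, 100),
    prior_b=Fraction(40, 100), likelihood_b=Fraction(3, 100),
)
print(p_a_given_defect, round(float(p_a_given_defect), 3))  # 5/7 0.714
```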
Asked at Robinhood • 7 months ago
Robinhood is planning to introduce a new feature which allows users to trade fractional shares. How would you decide whether this is a good idea or not?
Data Scientist
Statistics & Experimentation
1 answer
Jiin S. - "I would use A/B testing to see whether the new feature is incrementally beneficial. To begin, we should define the goal of the test; let's say the new feature would increase the average number of trades by X. Then randomly assign clients to two groups, control and test. The control group doesn't see the new feature and the test group does. We could also use stratified sampling if we want to make sure we cover different customer segments. During this desig ..."
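The stratified assignment mentioned in the answer above could look something like the following sketch. It shuffles users within each segment and splits each segment in half, so both groups have the same segment mix. The segment labels and the function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_assign(users, seed=0):
    """Assign users to 'control' or 'test', balanced within each segment.

    users: iterable of (user_id, segment) pairs.
    Returns a dict mapping user_id to its group.
    """
    rng = random.Random(seed)
    by_segment = defaultdict(list)
    for user_id, segment in users:
        by_segment[segment].append(user_id)

    assignment = {}
    for segment in sorted(by_segment):  # fixed order keeps runs reproducible
        ids = by_segment[segment]
        rng.shuffle(ids)
        half = len(ids) // 2
        for user_id in ids[:half]:
            assignment[user_id] = "control"
        for user_id in ids[half:]:
            assignment[user_id] = "test"
    return assignment

# Hypothetical client list: 4 new accounts and 6 active traders.
clients = [(i, "new") for i in range(4)] + [(i, "active") for i in range(4, 10)]
groups = stratified_assign(clients)
```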
What are outliers and how do you detect and handle them?
Data Scientist
Statistics & Experimentation
2 answers
Cesar F. - "Outliers are data points that significantly deviate from the majority of the data distribution. They can arise due to various reasons, such as measurement errors, natural variability, or rare events. Outliers can distort statistical analyses and machine learning models, making it crucial to detect and handle them properly."
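The answer preview above defines outliers but stops before detection. One widely used rule flags values more than 1.5 interquartile ranges outside the quartiles; the sketch below applies it with the standard library. The 1.5 multiplier is a convention rather than a requirement, and the sample data is made up.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one obvious anomaly.
readings = [10, 12, 11, 13, 12, 11, 10, 95]
flagged = iqr_outliers(readings)
```

What to do with flagged points (drop, cap, transform, or keep) depends on whether they are errors or genuine rare events, which is a judgment the rule itself cannot make.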
Asked at Netflix • 9 months ago
Imagine you are a data scientist for Netflix. How would you use data to decide whether a TV series is worth renewing?
Data Scientist
Data Analysis
Asked at Meta (Facebook) • 14 days ago
A PM at Meta asked you to describe the distribution of daily minutes spent on Facebook per user. How would you describe it?
Data Scientist
Statistics & Experimentation
Asked at Meta (Facebook), Goldman Sachs, LinkedIn • 7 months ago
Explain Bayes' theorem.
Data Scientist
Concept
+2 more
3 answers
Will I. - "Is it bad to get the answer a different way? Will they mark that as not knowing Bayes' theorem, or just as correct since it is an easier way to get the answer? The way I went is to look at what happens when the factory makes 100 light bulbs. Machine A makes 60, of which 3 are faulty; Machine B makes 40, of which 1.2 are faulty. Therefore the pool of faulty light bulbs is 3/4.2 = 5/7 from Machine A and 1.2/4.2 = 2/7 from Machine B."
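The counting approach in the comment above is Bayes' theorem written with expected counts instead of probabilities, so it gives the same result. A small sketch, using the 60/40 split and the 5% and 3% defect rates from the light-bulb question (the function name is illustrative):

```python
def share_of_defects(production_share, defect_rate, batch=100):
    """Split the expected defective items in a batch by the machine that made them."""
    defect_counts = {
        machine: batch * production_share[machine] * defect_rate[machine]
        for machine in production_share
    }
    total_defects = sum(defect_counts.values())
    return {machine: count / total_defects for machine, count in defect_counts.items()}

shares = share_of_defects(
    production_share={"A": 0.6, "B": 0.4},
    defect_rate={"A": 0.05, "B": 0.03},
)
```

The batch size cancels out in the final ratio, which is why imagining 100 bulbs (or any other number) leads to the same 5/7 and 2/7 split.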
How would you design an A/B test for a new campaign?
Data Scientist
Statistics & Experimentation
Asked at Amazon • 2 years ago
Session Data Analysis.
Hard
Data Scientist
Coding
+3 more
5 answers
Erjan G. - "1) SELECT AVG(session) FROM table WHERE session > 180 2) SELECT ROUND(session_time / 300) * 300 AS session_bin, COUNT(*) AS session_count FROM table GROUP BY ROUND(session_time / 300) * 300 ORDER BY session_bin 3) SELECT t1.country AS country_a, t2.country AS country_b FROM (SELECT country, COUNT(*) AS session_count FROM yourtablename GROUP BY country) AS t1 JOIN (SELECT country, COUNT(*) AS session_count FROM yourtablename GROUP BY countr ..."
Asked at Amazon, Meta (Facebook), LinkedIn • 2 years ago
Implement k-means clustering.
Data Scientist
Coding
+4 more
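No answer is listed for this question yet. As a starting point, here is a minimal sketch of the standard Lloyd's algorithm in plain Python. The random initialisation from data points, the fixed iteration cap, and the choice to leave an empty cluster's centroid unchanged are all simplifying assumptions; an interview answer would normally also discuss choosing k and smarter initialisation.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length numeric tuples into k groups.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, cluster_ids = kmeans(data, k=2)
```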

Showing 1-20 of 96
