Data Scientist Interview Questions

Review this list of 160 data scientist interview questions and answers verified by hiring managers and candidates.

Find Customer Lifetime Value (LTV)
IDE
Medium
Data Scientist
Coding
6 answers
Aneesha K. - "-- Write your query here
SELECT u.user_id AS user_id, IFNULL(SUM(purchase_value), 0) AS LTV
FROM user_sessions u
JOIN attribution a ON u.session_id = a.session_id
GROUP BY user_id
ORDER BY LTV DESC;
Needs a full join. Wondering why we can't do a left outer join here. All the sessions should have complete data."
Asked at Meta (Facebook) • 5 months ago
How can you improve Facebook’s DAU?
Data Scientist
Analytical
Asked at Meta (Facebook) • 5 months ago
A user advocacy group raises concerns about accessibility for individuals with hearing disabilities. What are some product improvements for Facebook Live and Videos, and how would you define succes...
Data Scientist
Execution
Asked at Microsoft • a year ago
Given a list of sentences, find the top n most frequent words.
Data Scientist
Coding
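A minimal Python sketch of one way to approach this question. The function name top_n_words, the regex-based tokenization, and the alphabetical tie-break are assumptions for illustration; the question does not say how words are delimited or how ties should be ranked.

```python
import re
from collections import Counter
from typing import List, Tuple


def top_n_words(sentences: List[str], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent words across all sentences with their counts."""
    counts: Counter = Counter()
    for sentence in sentences:
        # Assumption: a word is a run of letters, digits, or apostrophes,
        # compared case-insensitively.
        counts.update(re.findall(r"[a-z0-9']+", sentence.lower()))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


print(top_n_words(["the cat sat", "the dog sat", "The end"], 2))
# [('the', 3), ('sat', 2)]
```

Counting is linear in the total number of words; the full sort could be replaced with a heap if n is much smaller than the vocabulary.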
Asked at Adobe, Apple, Booking.com + 10 more • 7 months ago
Find the maximum subarray sum.
IDE
Medium
Data Scientist
Data Structures & Algorithms
27 answers
Rick E. - "# O(n) time, O(1) space
from typing import List

def max_subarray_sum(nums: List[int]) -> int:
    if len(nums) == 0:
        return 0
    max_sum = curr_sum = nums[0]
    for i in range(1, len(nums)):
        # Either extend the current run or start a new one at nums[i].
        curr_sum = max(curr_sum + nums[i], nums[i])
        max_sum = max(curr_sum, max_sum)
    return max_sum

# debug your code below
print(max_subarray_sum([-1, 2, -3, 4]))"

Asked at Adobe, Meta (Facebook), Oracle + 1 more • a year ago
Determine if a given binary tree is a binary search tree (BST).
IDE
Medium
Data Scientist
Coding
9 answers
Alvaro R. - "#include <climits>  // LONG_MIN, LONG_MAX

bool isValidBST(TreeNode* root, long min = LONG_MIN, long max = LONG_MAX) {
    if (root == NULL) return true;
    // Every node must lie strictly between the bounds inherited from its ancestors.
    if (root->val <= min || root->val >= max) return false;
    return isValidBST(root->left, min, root->val) && isValidBST(root->right, root->val, max);
}"
Asked at Adobe, Bytedance, Meta (Facebook) + 3 more • 6 months ago
Merge two sorted lists
Data Scientist
Data Structures & Algorithms
6 answers
Samuel M. - "function main() {
  const v1 = [2, 3, 4, 10];
  const v2 = [3, 4, 5, 20, 23];
  return merge(v1, v2);
}

function merge(left, right) {
  let result = [];
  while (left.length > 0 && right.length > 0) {
    // Move the smaller head element into result (assumed comparison; the original text is incomplete here).
    if (left[0] < right[0]) {
      result.push(left.shift());
    } else {
      result.push(right.shift());
    }
  }
  // Append whatever remains of either list.
  if (left.length > 0) {
    result = result.concat(left);
  }
  if (right.length > 0) {
    result = result.concat(right);
  }
  return result;
}"
Asked at Amazon • 5 months ago
Hypothesis Testing: Suppose a PM claims that users, on average, spend about $50 per month on Amazon. However, you doubt this claim and believe the average should be higher. You sample 100 users and...
Data Scientist
Statistics & Experimentation
1 answer
Lucas G. - "I would conduct a one-sample z-test because we have enough samples and the population variance is known.
H0: average monthly spending per user is $50
H1: average monthly spending per user is greater than $50
One-sample z-test: x_bar = $85, mu = $50, s = $20, n = 100
z = (x_bar - mu) / (s / sqrt(n)) = (85 - 50) / (20 / sqrt(100)) = 17.5
17.5 is the z-score that we need to convert to its corresponding p-value. The z-score is very high, so the p-value will be very close to zero, which is much less than the standa..."
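The arithmetic in the answer above can be checked with a short Python sketch. The numbers (x_bar = 85, mu = 50, s = 20, n = 100) come from the answer; the function name one_sample_z_test and the 0.05 threshold in the last line are assumptions for illustration, since the answer's own significance level is cut off.

```python
import math
from typing import Tuple


def one_sample_z_test(x_bar: float, mu0: float, sigma: float, n: int) -> Tuple[float, float]:
    """Return (z, upper-tail p-value) for H0: mu = mu0 versus H1: mu > mu0."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    # Upper-tail probability of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


z, p = one_sample_z_test(x_bar=85, mu0=50, sigma=20, n=100)
print(z)         # 17.5
print(p < 0.05)  # True (0.05 is an assumed significance level)
```

The test is one-sided because the alternative is that the mean is higher than $50, not merely different from it.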
Asked at Google • 4 years ago
What is the best way to connect SQL databases and why?
Data Scientist
Concept
2 answers
Aaron W. - "Clarification questions
What is the purpose of connecting the DB?
Do we expect high volumes of traffic to hit the DB?
Do we have scalability or reliability concerns?
Format
Code -> DB
Code -> Cache -> DB
API -> Cache -> DB - APIs are built for a purpose and have a specified protocol (GET, POST, DELETE) to speak to the DB. APIs can also use a contract to retrieve information from a DB much faster than code.
Load balanced APIs -> Cache -> DB
Aut..."
Asked at Discord • a year ago
What other companies are you interviewing at and why?
Data Scientist
Behavioral
Asked at Amazon, Apple, Meta (Facebook) + 3 more • 8 months ago
What are you passionate about?
Data Scientist
Behavioral
2 answers
Moshe S. - "Law is my passion. Traveling all over the world in 5 years"
Asked at TikTok, Valve • 3 years ago
As the data scientist, interpreting a significant increase in revenue from a new feature in one of 20 countries, what would you recommend?
Data Scientist
Analytical
2 answers
Brook - "Too much discussion of p-values and theoretical things... countries are independent..."
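One consideration behind this question is multiple comparisons: when 20 countries are each tested separately, a single significant result can easily be a false positive. A small Python sketch of that arithmetic follows, assuming independent tests (as the answer above suggests) and a per-test significance level of 0.05, which the question does not state. The function names are hypothetical.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m


# Assumed inputs: alpha = 0.05 per test, 20 countries tested independently.
fwer = family_wise_error_rate(0.05, 20)
print(round(fwer, 3))  # 0.642
print(bonferroni_threshold(0.05, 20))
```

With roughly a 64% chance of at least one spurious "win" under these assumptions, a reasonable recommendation is to check the result against a corrected threshold or rerun the experiment in that country before rolling out.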
Find Campaign Purchases
IDE
Medium
Data Scientist
Coding
6 answers
Alina G. - "SELECT upsell_campaign_id, COUNT(DISTINCT trans.user_id) AS eligible_users
FROM campaign
JOIN "transaction" AS trans ON transaction_date BETWEEN date_start AND date_end
JOIN user ON trans.user_id = user.user_id
WHERE is_eligible_for_upsell_campaign = 1
GROUP BY upsell_campaign_id"
Asked at Anthropic • a month ago
Identify success metrics for a marketing campaign to get new users, then design an experiment to determine if the campaign should continue.
Data Scientist
Statistics & Experimentation
1 answer
Sangeeta P. - "Marketing campaigns are run through different channels such as social media, emails, SEO, web advertising, events, etc. Let’s look at some of the overall success metrics at a broader level:
1. Total views for your campaign
2. Unique views for your campaign
3. Returning visitors for your campaign
4. Engagement for your campaign (If it’s a social media campaign, the marketer might be interested in knowing the number of users engaging with the campaign and the type of campaign positive/negative)
5..."
Asked at Amazon, Discord, Slack • 8 months ago
How do you encourage collaboration among cross-functional teams?
Data Scientist
Behavioral
2 answers
Saurabh N. - "1) Have a common goal
2) Have clear and fair accountability between teams
3) Ensure conflicts on common issues are resolved in time
4) Promote joint brainstorming and problem-solving sessions
5) Most important, have clear and effective communication established and practised"
Asked at Microsoft • a year ago
Given a list of numbers, find the median without sorting the entire list. Hint: Use quick sort algorithm.
Data Scientist
Coding
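The hint refers to quicksort's partition step; using it to find a single order statistic is usually called quickselect. Below is a minimal Python sketch, assuming a random pivot and assuming that the median of an even-length list is the mean of its two middle values (the question does not specify either). The function names are hypothetical.

```python
import random
from typing import List


def quickselect(values: List[float], k: int) -> float:
    """Return the k-th smallest element (0-indexed) without fully sorting."""
    items = list(values)  # work on a copy so the caller's list is untouched
    lo, hi = 0, len(items) - 1
    while lo < hi:
        # Pick a random pivot and move it to the end of the active range.
        p = random.randint(lo, hi)
        items[p], items[hi] = items[hi], items[p]
        pivot = items[hi]
        store = lo
        # Lomuto partition: elements smaller than the pivot go to the left.
        for i in range(lo, hi):
            if items[i] < pivot:
                items[i], items[store] = items[store], items[i]
                store += 1
        items[store], items[hi] = items[hi], items[store]
        if k == store:
            return items[store]
        # Continue only in the side that contains index k.
        if k < store:
            hi = store - 1
        else:
            lo = store + 1
    return items[lo]


def median(values: List[float]) -> float:
    if not values:
        raise ValueError("median of an empty list is undefined")
    n = len(values)
    if n % 2 == 1:
        return quickselect(values, n // 2)
    return (quickselect(values, n // 2 - 1) + quickselect(values, n // 2)) / 2


print(median([7, 1, 5, 3, 9]))  # 5
print(median([4, 1, 3, 2]))     # 2.5
```

Each pass discards one side of the partition, so the expected running time is O(n), compared with O(n log n) for sorting the whole list.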
Asked at Walmart Labs • a year ago
Why do you want to work at Walmart Labs?
Data Scientist
Behavioral
Asked at Tinder • 2 years ago
Tinder subscriptions renew monthly. Explain why different months may have different numbers of renewals.
Data Scientist
Technical
1 answer
Srijita P. - "Clarification question: How many subscription plans does Tinder offer? If there is more than one plan, is the fluctuation happening across all plans or in a particular one?
Assumption: Let's say the lower-priced plan shows the most fluctuation and there are only two plans.
In this plan, which age group shows the most fluctuation (18-24, 25-30, 30+, etc.)?
Is there any seasonality trend observed (e.g. placemen..."
Fraudulent Transactions
IDE
Medium
Data Scientist
Coding
5 answers
Jayveer S. - "WITH suspicious_transactions AS (
    SELECT c.first_name, c.last_name, t.receipt_number,
           COUNT(t.receipt_number) OVER (PARTITION BY c.customer_id) AS no_of_offences
    FROM customers c
    JOIN transactions t ON c.customer_id = t.customer_id
    WHERE t.receipt_number LIKE '%999%'
       OR t.receipt_number LIKE '%1234%'
       OR t.receipt_number LIKE '%XYZ%'
)
SELECT first_name, last_name, receipt_number, no_of_offences
FROM suspicious_transactions
WHERE no_of_offences >= 2;"
Asked at SAP • 2 years ago
Design a system capable of identifying ships that deviate from their course using a dataset that tracks ship positions, recorded as tuples containing (ship_ID, x, y, z, timestamp), with irregular t...
Data Scientist
System Design
1 answer
Anonymous Hornet - "To handle the non-uniform sampling, I'd first clean the dataset and divide it into chunks of 'uniform' trajectory data at n-second intervals (e.g. 5s or 10s trajectories). This gives us cleaner trajectory chunks, T, in the format (ship_ID, x, y, z, timestamp). For the system itself, I'd use a generative model, e.g. a Variational AutoEncoder (VAE), and train the model's 'encoder' to produce a latent-space representation of input features (x, y, z, timestamp) from T, and its 'decoder' to pred..."
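The answer above does not say how the irregular fixes become 'uniform' trajectories. One possible approach, shown here purely as an assumption, is to linearly interpolate each ship's positions onto a fixed time grid. The names Fix and resample_uniform and the 5-second step are hypothetical, and the sketch handles one ship at a time.

```python
from typing import List, Tuple

# One position fix for a single ship: (x, y, z, timestamp).
Fix = Tuple[float, float, float, float]


def resample_uniform(track: List[Fix], step: float) -> List[Fix]:
    """Linearly interpolate irregular fixes onto a uniform time grid."""
    track = sorted(track, key=lambda f: f[3])
    if len(track) < 2:
        return list(track)
    t_start, t_end = track[0][3], track[-1][3]
    out: List[Fix] = []
    i = 0  # index of the segment [track[i], track[i + 1]] containing t
    k = 0  # grid index; t is computed as t_start + k * step to avoid drift
    while True:
        t = t_start + k * step
        if t > t_end:
            break
        # Advance to the segment whose end time is not before t.
        while track[i + 1][3] < t:
            i += 1
        x0, y0, z0, t0 = track[i]
        x1, y1, z1, t1 = track[i + 1]
        w = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
        out.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0), z0 + w * (z1 - z0), t))
        k += 1
    return out


track = [(0.0, 0.0, 0.0, 0.0), (2.0, 0.0, 0.0, 2.0), (10.0, 8.0, 0.0, 10.0)]
print(resample_uniform(track, 5.0))
# [(0.0, 0.0, 0.0, 0.0), (5.0, 3.0, 0.0, 5.0), (10.0, 8.0, 0.0, 10.0)]
```

Linear interpolation is only reasonable when gaps between fixes are short relative to how quickly a ship can turn; long gaps would need to be flagged rather than filled.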
