Data Scientist Interview Questions

Review this list of 278 data scientist interview questions and answers verified by hiring managers and candidates.

+ Add interview

Product

Engineering

Operations

Design

Marketing

Data

Sales

Finance

Consulting

Add interview

Product Manager Software Engineer Data Scientist Technical Program Manager Engineering Manager Data Engineer Machine Learning Engineer Data Analyst BizOps & Strategy Business Analyst

Asked at Adobe, Apple, Salesforce + 1 more • 7 months ago
Write a function to return all prime numbers up to a given number n.
IDE
Medium
Data Scientist
Data Structures & Algorithms
+4 more
11 answers I was asked this
+7
"function findPrimes(n) { if (n < 2) return []; const primes = []; for (let i=2; i <= n; i++) { const half = Math.floor(i/2); let isPrime = true; for (let prime of primes) { if (i % prime === 0) { isPrime = false; break; } } if (isPrime) { primes.push(i); } } return primes; } `"
Tiago R. - "function findPrimes(n) { if (n < 2) return []; const primes = []; for (let i=2; i <= n; i++) { const half = Math.floor(i/2); let isPrime = true; for (let prime of primes) { if (i % prime === 0) { isPrime = false; break; } } if (isPrime) { primes.push(i); } } return primes; } `"See full answer
Data Scientist
Data Structures & Algorithms
+4 more
Post Success By Age Group.
IDE
Medium
Data Scientist
Coding
+1 more
9 answers I was asked this
+6
" with youngsuccrate as( select strftime('%m', postdate) AS postmonth, round(sum(issuccessfulpost)*1.0/count(issuccessfulpost),2)as yascrate from post where userid in (select userid from post_user where age between 0 and 18) group by post_month ), nonyoungsucc_rate as( select strftime('%m', postdate) AS postmonth, round(sum(issuccessfulpost)*1.0/count(issuccessfulpost),2)as nonyasc_rate from post where user_id in (select"
Bhavna S. - " with youngsuccrate as( select strftime('%m', postdate) AS postmonth, round(sum(issuccessfulpost)*1.0/count(issuccessfulpost),2)as yascrate from post where userid in (select userid from post_user where age between 0 and 18) group by post_month ), nonyoungsucc_rate as( select strftime('%m', postdate) AS postmonth, round(sum(issuccessfulpost)*1.0/count(issuccessfulpost),2)as nonyasc_rate from post where user_id in (select"See full answer
Data Scientist
Coding
+1 more
Total Transaction Volume
IDE
Hard
Data Scientist
Coding
+1 more
14 answers I was asked this
+11
"I'm pretty sure Exponent's answer is wrong. In the snippet below, they use "pl.name = 'Telephones' to attempt to filter down to the Telephone transactions, but they do this within a LEFT JOIN which means all product_lines rows are returned. > LEFT JOIN product_lines pl > ON p.productlineid = pl.id > AND pl.name = 'Telephones' Below is my solution. Also, I didn't see anywhere that said the "amount" column was in cents instead of dollars, but I still divided by 100 to be consistent with Exp"
Bradley E. - "I'm pretty sure Exponent's answer is wrong. In the snippet below, they use "pl.name = 'Telephones' to attempt to filter down to the Telephone transactions, but they do this within a LEFT JOIN which means all product_lines rows are returned. > LEFT JOIN product_lines pl > ON p.productlineid = pl.id > AND pl.name = 'Telephones' Below is my solution. Also, I didn't see anywhere that said the "amount" column was in cents instead of dollars, but I still divided by 100 to be consistent with Exp"See full answer
Data Scientist
Coding
+1 more
Post Success After Failure
IDE
Hard
Data Scientist
Coding
+1 more
7 answers I was asked this
+4
"I might be missing something but the solution, seems to be incorrect. ... , post_pairings AS ( SELECT ps.user_id, ps.postseqid AS failpostid, ps.postseqid + 1 AS nextpostid FROM post_seq AS ps WHERE ps.issuccessfulpost IS TRUE ) -- here ps.issuccessfulpost IS TRUE the condition should be FALSE -- in that way ps.postseqid is the actual failed post(failpostid) -- Additionally, at the end the join is assumming that the sequence id is going to match the post_id, wh"
Jaime A. - "I might be missing something but the solution, seems to be incorrect. ... , post_pairings AS ( SELECT ps.user_id, ps.postseqid AS failpostid, ps.postseqid + 1 AS nextpostid FROM post_seq AS ps WHERE ps.issuccessfulpost IS TRUE ) -- here ps.issuccessfulpost IS TRUE the condition should be FALSE -- in that way ps.postseqid is the actual failed post(failpostid) -- Additionally, at the end the join is assumming that the sequence id is going to match the post_id, wh"See full answer
Data Scientist
Coding
+1 more
How do concepts like R-squared, multicollinearity, and statistical tests (T-test vs. F-test) help evaluate linear regression models and interpret their results effectively?
Data Scientist
Statistics & Experimentation
Add answer I was asked this
Data Scientist
Statistics & Experimentation

🧠 Want an expert answer to a question? Saving questions lets us know what content to make next.

Say you flip a coin 10 times and observe only one head. What would be your null hypothesis and p-value for testing whether the coin is fair or not?
Data Scientist
Statistics & Experimentation
Add answer I was asked this
Data Scientist
Statistics & Experimentation
Explain Type I and Type II errors and the trade-offs between them.
Data Scientist
Statistics & Experimentation
Add answer I was asked this
Data Scientist
Statistics & Experimentation
In an A/B test, how can you check if assignment to the various buckets was truly random?
Data Scientist
Statistics & Experimentation
Add answer I was asked this
Data Scientist
Statistics & Experimentation
Uber booking rates have dropped by 10%. How would you address this?
Data Scientist
Analytical
+2 more
1 answer I was asked this
"I'll approach this by asking questions and creating scenarios based on answers. If the answer to any questions points to a specific issue, I'll double down on that to understand the issue. This is more of a RCA question. First I'll quickly check Were there any recent updates released before we observed the drops? Assuming Yes, I'll check the updates released and check the test cases and testing results If the issue is identified, I will engage with the product leadership, update them and conne"
Aekagra S. - "I'll approach this by asking questions and creating scenarios based on answers. If the answer to any questions points to a specific issue, I'll double down on that to understand the issue. This is more of a RCA question. First I'll quickly check Were there any recent updates released before we observed the drops? Assuming Yes, I'll check the updates released and check the test cases and testing results If the issue is identified, I will engage with the product leadership, update them and conne"See full answer
Data Scientist
Analytical
+2 more
Post Success By Interface.
IDE
Medium
Data Scientist
Coding
+1 more
8 answers I was asked this
+5
"Select interface, Count(case when issuccessfulpost then 1 end) as post_success, Count() as postattempt, ROUND((COUNT(CASE WHEN issuccessfulpost THEN 1 END) * 100 / COUNT()), 2) AS postsuccess_rate from post where interface like 'Iphone%' group by 1 order by postsuccessrate desc `"
Richard B. - "Select interface, Count(case when issuccessfulpost then 1 end) as post_success, Count() as postattempt, ROUND((COUNT(CASE WHEN issuccessfulpost THEN 1 END) * 100 / COUNT()), 2) AS postsuccess_rate from post where interface like 'Iphone%' group by 1 order by postsuccessrate desc `"See full answer
Data Scientist
Coding
+1 more
Asked at Adobe, Apple, Capital One + 11 more • 7 months ago
Given stock prices for the next n days, how can you maximize your profit by buying or selling one share per day?
IDE
Medium
Data Scientist
Data Structures & Algorithms
+4 more
12 answers I was asked this
+9
"from typing import List def maxprofitgreedy(stock_prices: List[int]) -> int: l=0 # buying r=1 # selling max_profit=0 while rstock_prices[l]: profit=stockprices[r]-stockprices[l] maxprofit=max(maxprofit,profit) else: l=r r+=1 return max_profit debug your code below print(maxprofitgreedy([7, 1, 5, 3, 6, 4])) `"
Prajwal M. - "from typing import List def maxprofitgreedy(stock_prices: List[int]) -> int: l=0 # buying r=1 # selling max_profit=0 while rstock_prices[l]: profit=stockprices[r]-stockprices[l] maxprofit=max(maxprofit,profit) else: l=r r+=1 return max_profit debug your code below print(maxprofitgreedy([7, 1, 5, 3, 6, 4])) `"See full answer
Data Scientist
Data Structures & Algorithms
+4 more
Asked at Apple, Goldman Sachs, Oracle + 2 more • 7 months ago
Linked List Cycle
IDE
Easy
Data Scientist
Data Structures & Algorithms
+4 more
13 answers I was asked this
+8
"I was able to answer this question and the follow-up questions as well"
Anonymous Wasp - "I was able to answer this question and the follow-up questions as well"See full answer
Data Scientist
Data Structures & Algorithms
+4 more
Asked at Dropbox, Google Cloud (GCP), JP Morgan Chase • 3 years ago
Tell me about an impactful project that you led.
Data Scientist
Behavioral
+1 more
2 answers I was asked this
"I responded with a project that I was a part of during my capstone class. I described how I used HTML, Python, and PostGRESQL in conjunction to create a functioning website using SCRUM."
Kanishkan V. - "I responded with a project that I was a part of during my capstone class. I described how I used HTML, Python, and PostGRESQL in conjunction to create a functioning website using SCRUM."See full answer
Data Scientist
Behavioral
+1 more
What is a p-value?
Data Scientist
Statistics & Experimentation
2 answers I was asked this
"It is the smallest level of significance at which the null hypothesis gets rejected"
Farza S. - "It is the smallest level of significance at which the null hypothesis gets rejected"See full answer
Data Scientist
Statistics & Experimentation
Describe a situation where an A/B test would not be appropriate. What type of test would you run instead? Why?
Data Scientist
Statistics & Experimentation
1 answer I was asked this
"A/B testing is used when one wishes to only test minor front-end changes on the website. Consider a scenario where an organization wishes to make significant changes to its existing page, such as wants to create an entirely new version of an existing web page URL and wants to analyze which one performs better. Obviously, the organization will not be willing to touch the existing web page design for comparison purposes. In the above scenario, performing Split URL testing would be beneficial. T"
Sangeeta P. - "A/B testing is used when one wishes to only test minor front-end changes on the website. Consider a scenario where an organization wishes to make significant changes to its existing page, such as wants to create an entirely new version of an existing web page URL and wants to analyze which one performs better. Obviously, the organization will not be willing to touch the existing web page design for comparison purposes. In the above scenario, performing Split URL testing would be beneficial. T"See full answer
Data Scientist
Statistics & Experimentation
Asked at Adobe, Amazon, Apple + 12 more • 7 months ago
Given an array, find the two sum.
IDE
Easy
Data Scientist
Data Structures & Algorithms
+5 more
41 answers I was asked this
+37
"from typing import List def two_sum(nums: List[int], target: int) -> List[int]: prevMap = {} for i, n in enumerate(nums): diff = target - n if diff in prevMap: return [prevMap[diff], i] else: prevMap[n] = i return [] debug your code below print(two_sum([2, 7, 11, 15], 9)) `"
Anonymous Roadrunner - "from typing import List def two_sum(nums: List[int], target: int) -> List[int]: prevMap = {} for i, n in enumerate(nums): diff = target - n if diff in prevMap: return [prevMap[diff], i] else: prevMap[n] = i return [] debug your code below print(two_sum([2, 7, 11, 15], 9)) `"See full answer
Data Scientist
Data Structures & Algorithms
+5 more
Survey Sampling
IDE
Easy
Data Scientist
Coding
+1 more
11 answers I was asked this
+7
"with my_table as (select * , rownumber() over(order by customerid) as row_index from customers) select customer_id, customer_name from my_table where row_index % 3 = 0"
Marcos G. - "with my_table as (select * , rownumber() over(order by customerid) as row_index from customers) select customer_id, customer_name from my_table where row_index % 3 = 0"See full answer
Data Scientist
Coding
+1 more
Asked at Adobe, Apple, Capital One + 1 more • 7 months ago
Roman to Integer
Data Scientist
Data Structures & Algorithms
+4 more
Add answer I was asked this
Data Scientist
Data Structures & Algorithms
+4 more
Unsold Products
IDE
Easy
Data Scientist
Coding
+2 more
8 answers I was asked this
+4
"productssold = set(transactions['productid']) unsoldproducts = products.loc[~products['id'].isin(productssold)] return unsold_products[["id", "name", "stock"]] `"
Laura U. - "productssold = set(transactions['productid']) unsoldproducts = products.loc[~products['id'].isin(productssold)] return unsold_products[["id", "name", "stock"]] `"See full answer
Data Scientist
Coding
+2 more
Game Leaderboard
IDE
Hard
Data Scientist
Coding
+1 more
9 answers I was asked this
+6
"WITH high_score AS( SELECT player_id, MAX(gamescore) AS maxscore FROM scores GROUP BY player_id ), rankings AS( SELECT p.player_name, p.team_id, max_score, DENSERANK() OVER(PARTITION BY p.teamid ORDER BY h.maxscore DESC) AS scorerank FROM high_score AS h JOIN players AS p USING(player_id) ) SELECT team_id, player_name, max_score FROM rankings WHERE score_rank <= 2 GROUP BY teamid, playername O"
Alvin P. - "WITH high_score AS( SELECT player_id, MAX(gamescore) AS maxscore FROM scores GROUP BY player_id ), rankings AS( SELECT p.player_name, p.team_id, max_score, DENSERANK() OVER(PARTITION BY p.teamid ORDER BY h.maxscore DESC) AS scorerank FROM high_score AS h JOIN players AS p USING(player_id) ) SELECT team_id, player_name, max_score FROM rankings WHERE score_rank <= 2 GROUP BY teamid, playername O"See full answer
Data Scientist
Coding
+1 more