Data Scientist Interview Questions

Review this list of 278 data scientist interview questions and answers verified by hiring managers and candidates.

Asked at Amazon • 13 days ago
Hypothesis Testing: Suppose a PM claims that users, on average, spend about $50 per month on Amazon. However, you doubt this claim and believe the average should be higher. You sample 100 users and...
Data Scientist
Statistics & Experimentation
What are the Z and t-tests?
Data Scientist
Statistics & Experimentation
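For the Z and t-test question above, a minimal sketch of the two one-sample test statistics may help. It assumes a one-sided alternative (true mean greater than the hypothesized value). The function names and all numbers are hypothetical and are not taken from the question. The z-test treats the population standard deviation as known, while the t-test estimates it from the sample.

```python
import math
import statistics
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """Return (z, one-sided p-value) for H1: mean > mu0, with sigma known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


def one_sample_t_stat(sample, mu0):
    """Return (t, degrees of freedom); sigma is estimated from the sample."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (statistics.mean(sample) - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical numbers, chosen only for illustration.
z, p = one_sample_z(sample_mean=52.0, mu0=50.0, sigma=10.0, n=100)
t, df = one_sample_t_stat([48, 52, 55, 51, 54], mu0=50.0)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The t statistic would be compared against a t distribution with n - 1 degrees of freedom. The Python standard library has no t CDF, so a p-value for it would need a package such as SciPy (`scipy.stats.t`).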
Asked at Adobe, Apple, Booking.com + 10 more • 2 months ago
Find the maximum subarray sum.
IDE
Medium
Data Scientist
Data Structures & Algorithms
+4 more
22 answers
Rick E. -
    # O(n) time, O(1) space
    from typing import List

    def max_subarray_sum(nums: List[int]) -> int:
        if len(nums) == 0:
            return 0
        max_sum = curr_sum = nums[0]
        for i in range(1, len(nums)):
            curr_sum = max(curr_sum + nums[i], nums[i])
            max_sum = max(curr_sum, max_sum)
        return max_sum

    # debug your code below
    print(max_subarray_sum([-1, 2, -3, 4]))
Walmart Inventory Status
IDE
Medium
Data Scientist
Coding
+1 more
4 answers
Gowtami K. -
    SELECT DISTINCT
        p.product_id,
        p.product_name,
        CASE WHEN sale_date IS NULL THEN 'Not Sold' ELSE 'Sold' END AS sale_status
    FROM products p
    LEFT JOIN sales s ON p.product_id = s.product_id
Asked at SAP • 2 years ago
You have a dataset comprising 1,000 avatar images and 100,000 user descriptions with associated avatar images. Create a model that recommends an image from a new set of 100,000 images for a user de...
Data Scientist
Machine Learning
2 answers
Nick S. - "[I'm not sure whether the answer below is the best, as I have not received results or feedback from my interview.] Answer: I would first use a VAE-style model to create a latent-space embedding that translates a user description into a generated image. Training would use the 1,000 avatar images and 100,000 descriptions, following this scheme: VAE: description -> encoder -> latent space -> decoder -> image. Q: "OK, but that means you're limiting the generated images to be only the 1000 imag..."

Netflix Genre Ratings
IDE
Medium
Data Scientist
Coding
+1 more
3 answers
Harshi B. -
    SELECT DISTINCT
        title,
        ROUND(AVG(rating) OVER (PARTITION BY title), 1) AS avg_rating,
        ROUND(AVG(rating) OVER (PARTITION BY genre), 1) AS genre_rating
    FROM rating r
    JOIN movie m ON r.movie_id = m.movie_id
    ORDER BY 1
Asked at Meta (Facebook) • 4 years ago
Would you port Facebook rooms to Instagram?
Data Scientist
Product Strategy
Unique Chat Conversations
IDE
Medium
Data Scientist
Coding
+1 more
2 answers
Lucas G. -
    SELECT COUNT(*) AS unique_conversations
    FROM messenger_sends
    WHERE sender_id < receiver_id
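One way to sanity-check a count of unique conversations is to treat each conversation as an unordered pair of user IDs. The sketch below does that in Python. The rows are hypothetical, and it assumes a conversation means any two users who exchanged at least one message in either direction, which the question text on this page does not confirm.

```python
def count_unique_conversations(sends):
    """Count distinct unordered (sender, receiver) pairs."""
    pairs = set()
    for sender_id, receiver_id in sends:
        # Order the two IDs so (1, 2) and (2, 1) map to the same conversation.
        pairs.add((min(sender_id, receiver_id), max(sender_id, receiver_id)))
    return len(pairs)


# Hypothetical (sender_id, receiver_id) rows.
sends = [(1, 2), (2, 1), (1, 2), (3, 1), (2, 3)]
print(count_unique_conversations(sends))
```

Under that assumption, repeated messages and replies collapse into one pair, so the count reflects pairs rather than message rows. In SQL dialects that support `LEAST` and `GREATEST`, the same idea could be written as a distinct count over the ordered pair.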
Asked at Walmart Labs • 7 months ago
Tell me about your e-commerce experience.
Data Scientist
Behavioral
+2 more
Asked at Discord • 7 months ago
Why do you want to work at Discord?
Data Scientist
Behavioral
+2 more
Asked at Adobe, Apple, Google + 1 more • 7 months ago
Permutations
IDE
Medium
Data Scientist
Data Structures & Algorithms
+3 more
3 answers
Tiago R. -
    function permute(nums) {
      if (nums.length <= 1) {
        return [nums];
      }
      const prevPermutations = permute(nums.slice(0, nums.length - 1));
      const currentNum = nums[nums.length - 1];
      const permutations = new Set();
      for (let prev of prevPermutations) {
        for (let i = 0; i < prev.length; i++) {
          permutations.add([...prev.slice(0, i), currentNum, ...prev.slice(i)]);
        }
        permutations.add([...prev, currentNum]);
      }
      return [...permutations];
    }
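The insertion approach in the answer above can be sketched in Python as follows. This is an illustration rather than the platform's reference solution, and it assumes the input elements are distinct. It uses a plain list for the results, since each generated arrangement is already unique under that assumption.

```python
from typing import List


def permute(nums: List[int]) -> List[List[int]]:
    """Build permutations by inserting the last element into every slot
    of each permutation of the remaining elements."""
    if len(nums) <= 1:
        return [list(nums)]
    last = nums[-1]
    result = []
    for prev in permute(nums[:-1]):
        # There are len(prev) + 1 insertion points, including the end.
        for i in range(len(prev) + 1):
            result.append(prev[:i] + [last] + prev[i:])
    return result


print(permute([1, 2, 3]))
```

The function returns n! lists for n distinct inputs, so both time and memory grow factorially.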
Validate Bitcoin Transactions
IDE
Hard
Data Scientist
Coding
+1 more
4 answers
E L. -
    WITH CTE AS (
        SELECT *,
            ROW_NUMBER() OVER (PARTITION BY utxo_id ORDER BY transaction_id) AS trx_rk
        FROM transactions
        JOIN transaction_inputs USING (transaction_id)
        JOIN utxo USING (utxo_id)
    )
    SELECT transaction_id AS InvalidTransactionId
    FROM CTE
    WHERE sender != address OR trx_rk > 1
Improving Students
IDE
Hard
Data Scientist
Coding
+2 more
3 answers
Prachi G. - "The solution produces the same result as the 'prescribed solution', yet it does not get accepted in the test results section."
    import pandas as pd

    transcript['year'] = transcript['year'].astype(str)
    df = pd.pivot_table(data=transcript, index='student_id', columns='year', values='yearly_gpa', aggfunc='mean').reset_index()
    df = df[(df['2021'] < df['2022']) & (df['2022'] < df['2023'])]
    df['average_gpa'] = df[['2021', '2022', '2023']].mean(axis=1).round(2)
    return df
EPA Temperature Monitoring
IDE
Medium
Data Scientist
Coding
+1 more
4 answers
Jayveer S. -
    WITH jay AS (
        SELECT date, temperature, deff
        FROM (
            SELECT *, temperature - LAG(temperature) OVER (ORDER BY date) AS deff
            FROM city_temperatures
        ) AS t
        WHERE deff >= -3 AND deff >= 5
    )
    SELECT date, temperature
    FROM jay
Biased Coin Flip Histogram
Easy
Data Scientist
Coding
2 answers
Alireza K. -
    import random

    def coin_flip():
        x = 4 * [0] + [1]
        res = []
        for i in range(20):
            res.append(random.choice(x))
        return res

    res = [0, 0]  # [head, tail]
    for j in range(1000):
        temp = coin_flip()
        res[0] += sum(temp)  # head
        res[1] += (20 - sum(temp))  # tail
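The answer above totals heads and tails across all trials. If the task wants a histogram of heads per trial, which is an assumption since the full prompt is not shown here, a sketch could look like the following. The bias (0.2), the 20 flips, and the 1000 trials are taken from the answer above; the function names and the seed are hypothetical.

```python
import random
from collections import Counter


def heads_in_trial(n_flips, p_heads, rng):
    """Number of heads in n_flips flips of a coin with P(heads) = p_heads."""
    return sum(rng.random() < p_heads for _ in range(n_flips))


def simulate_histogram(n_trials=1000, n_flips=20, p_heads=0.2, seed=0):
    """Map heads-per-trial -> number of trials with that many heads."""
    rng = random.Random(seed)  # seeded so reruns give the same histogram
    return Counter(heads_in_trial(n_flips, p_heads, rng) for _ in range(n_trials))


hist = simulate_histogram()
for heads in sorted(hist):
    # One '#' per 5 trials keeps the text bars short.
    print(f"{heads:2d} | {'#' * (hist[heads] // 5)}")
```

With these parameters the bars should cluster around 4 heads per trial, the expected value of 20 flips at probability 0.2.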
Asked at TikTok • 3 years ago
How do you generate insights?
Data Scientist
Analytical
3 answers
Anonymous Eagle - "I generate insights through stakeholder requirements and the data I have in hand"
Asked at Adobe, Apple, Intuit + 3 more • 7 months ago
Sudoku Solver
IDE
Hard
Data Scientist
Data Structures & Algorithms
+4 more
2 answers
Divya R. -
    static boolean sudokuSolve(char[][] board) {
        return sudokuSolve(board, 0, 0);
    }

    static boolean sudokuSolve(char[][] board, int r, int c) {
        if (c >= board[0].length) {
            r = r + 1;
            c = 0;
        }
        if (r >= board.length) return true;
        if (board[r][c] == '.') {
            for (int num = 1; num <= 9; num++) {
                board[r][c] = (char) ('0' + num);
                if (isValidPosition(board, r, c)) {
                    if (sudokuSolve(board, r, c + 1)) return true;
                }
                board[r][c] = '.';
            }
        } else {
            return sudokuSolve(board, r, c + 1);
        }
        return false;
    }

    static boolean isValidPosition(char b...
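Because the answer preview above is cut off before its validity check, a complete backtracking sketch in Python may be useful for comparison. It is not the author's code: the `is_valid` helper, the cell-index traversal, and the sample puzzle are all assumptions made for illustration, and the sketch assumes the given digits do not already conflict.

```python
def is_valid(board, r, c, ch):
    """True if digit ch can go at (r, c) without a row, column, or box clash."""
    for i in range(9):
        if board[r][i] == ch or board[i][c] == ch:
            return False
    br, bc = 3 * (r // 3), 3 * (c // 3)  # top-left corner of the 3x3 box
    for i in range(br, br + 3):
        for j in range(bc, bc + 3):
            if board[i][j] == ch:
                return False
    return True


def solve(board, pos=0):
    """Fill board in place by backtracking; return True if a solution exists."""
    if pos == 81:
        return True
    r, c = divmod(pos, 9)
    if board[r][c] != ".":
        return solve(board, pos + 1)
    for ch in "123456789":
        if is_valid(board, r, c, ch):
            board[r][c] = ch
            if solve(board, pos + 1):
                return True
            board[r][c] = "."  # undo and try the next digit
    return False


# A commonly used example puzzle; '.' marks an empty cell.
puzzle = [
    "53..7....",
    "6..195...",
    ".98....6.",
    "8...6...3",
    "4..8.3..1",
    "7...2...6",
    ".6....28.",
    "...419..5",
    "....8..79",
]
board = [list(row) for row in puzzle]
solved = solve(board)
print(solved)
for row in board:
    print("".join(row))
```

Checking validity before placing a digit, rather than after as in the Java preview, is a design choice here; both orderings explore the same search tree.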
Nth Ranked Player
IDE
Medium
Data Scientist
Coding
+1 more
3 answers
Gowtami K. -
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (ORDER BY score DESC) AS rn
        FROM players
    )
    SELECT player_name, score, rn AS ranking
    FROM cte
    WHERE rn = 4 OR rn = 6 OR rn = 11
Number of Direct Reports
IDE
Medium
Data Scientist
Coding
+1 more
4 answers
Alvin P. -
    SELECT
        e1.emp_id AS manager_employee_id,
        e1.emp_name AS manager_name,
        COUNT(e2.emp_id) AS number_of_direct_reports
    FROM employees AS e1
    INNER JOIN employees AS e2 ON e2.manager_id = e1.emp_id
    GROUP BY e1.emp_id, e1.emp_name
    HAVING COUNT(e2.emp_id) >= 2
    ORDER BY number_of_direct_reports DESC, manager_name ASC
Find Customers by Department
IDE
Medium
Data Scientist
Coding
+1 more
2 answers
Derrick M. -
    SELECT
        COUNT(DISTINCT o.customer_id) AS customers,
        d.department_name
    FROM orders o
    INNER JOIN departments d ON d.department_id = o.department_id
    WHERE d.department_name IN ('Electronics', 'Fashion')
        AND o.order_date BETWEEN '2022-01-01' AND '2022-12-31'
    GROUP BY d.department_name;