Data Scientist Coding Interview Questions

Review this list of 111 coding data scientist interview questions and answers verified by hiring managers and candidates.

+ Add interview

Product

Engineering

Operations

Design

Marketing

Data

Sales

Finance

Consulting

Add interview

Software Engineer Data Scientist Data Engineer Machine Learning Engineer Data Analyst Engineering Manager Backend Engineer QA Engineer Frontend Engineer Mobile Engineer

Asked at Adobe, Meta (Facebook), Oracle + 1 more • 7 months ago
Determine if a given binary tree is a binary search tree (BST).
IDE
Medium
Data Scientist
Coding
+4 more
9 answers I was asked this
+5
"bool isValidBST(TreeNode* root, long min = LONGMIN, long max = LONGMAX){ if (root == NULL) return true; if (root->val val >= max) return false; return isValidBST(root->left, min, root->val) && isValidBST(root->right, root->val, max); } `"
Alvaro R. - "bool isValidBST(TreeNode* root, long min = LONGMIN, long max = LONGMAX){ if (root == NULL) return true; if (root->val val >= max) return false; return isValidBST(root->left, min, root->val) && isValidBST(root->right, root->val, max); } `"See full answer
Data Scientist
Coding
+4 more
Asked at Adobe, Bytedance, Meta (Facebook) + 3 more • 18 days ago
Merge two sorted lists
Data Scientist
Coding
+4 more
5 answers I was asked this
+2
"int main() { int a1[7]={1,2,3,4,5,6,7}; int a2[7]={1,9,10,11,12,13,14}; vectorv; v.insert(v.begin(),begin(a1),end(a1)); v.insert(v.begin(),begin(a2),end(a2)); int a3[v.size()]; sort(v.begin(),v.end()); for(int i=0;i<v.size();i++) { a3[i]=v[i]; } } `"
Aryan D. - "int main() { int a1[7]={1,2,3,4,5,6,7}; int a2[7]={1,9,10,11,12,13,14}; vectorv; v.insert(v.begin(),begin(a1),end(a1)); v.insert(v.begin(),begin(a2),end(a2)); int a3[v.size()]; sort(v.begin(),v.end()); for(int i=0;i<v.size();i++) { a3[i]=v[i]; } } `"See full answer
Data Scientist
Coding
+4 more
TV Show Watch Time
IDE
Medium
Data Scientist
Coding
+1 more
7 answers I was asked this
+4
"with base as ( select viewerid, year, month, watchhours, lag(watchhours, 2) over (partition by viewerid order by year, month) as p3, lag(watchhours, 1) over (partition by viewerid order by year, month) as p2 from watch_time ) select viewer_id from base where p3 < p2 and p2 < watch_hours group by 1 `"
- Z. - "with base as ( select viewerid, year, month, watchhours, lag(watchhours, 2) over (partition by viewerid order by year, month) as p3, lag(watchhours, 1) over (partition by viewerid order by year, month) as p2 from watch_time ) select viewer_id from base where p3 < p2 and p2 < watch_hours group by 1 `"See full answer
Data Scientist
Coding
+1 more
Initial Contact Attribution
IDE
Hard
Data Scientist
Coding
+1 more
3 answers I was asked this
"too many questions for clarification on this to start"
Steven S. - "too many questions for clarification on this to start"See full answer
Data Scientist
Coding
+1 more
Fraudulent Transactions
IDE
Medium
Data Scientist
Coding
+1 more
4 answers I was asked this
+1
"WITH suspicious_transactions AS ( SELECT c.first_name, c.last_name, t.receipt_number, COUNT(t.receiptnumber) OVER (PARTITION BY c.customerid) AS noofoffences FROM customers c JOIN transactions t ON c.customerid = t.customerid WHERE t.receipt_number LIKE '%999%' OR t.receipt_number LIKE '%1234%' OR t.receipt_number LIKE '%XYZ%' ) SELECT first_name, last_name, receipt_number, noofoffences FROM suspicious_transactions WHERE noofoffences >= 2;"
Jayveer S. - "WITH suspicious_transactions AS ( SELECT c.first_name, c.last_name, t.receipt_number, COUNT(t.receiptnumber) OVER (PARTITION BY c.customerid) AS noofoffences FROM customers c JOIN transactions t ON c.customerid = t.customerid WHERE t.receipt_number LIKE '%999%' OR t.receipt_number LIKE '%1234%' OR t.receipt_number LIKE '%XYZ%' ) SELECT first_name, last_name, receipt_number, noofoffences FROM suspicious_transactions WHERE noofoffences >= 2;"See full answer
Data Scientist
Coding
+1 more

🧠 Want an expert answer to a question? Saving questions lets us know what content to make next.

Find Revenue by Department
IDE
Medium
Data Scientist
Coding
+1 more
4 answers I was asked this
+1
"SELECT d.departmentname,SUM(o.orderamount) AS total_revenue FROM orders o JOIN departments d ON d.departmentid =o.departmentid WHERE o.orderdate >= CURRENTDATE - INTERVAL '12 months' GROUP BY d.department_name ORDER BY total_revenue DESC; `"
Derrick M. - "SELECT d.departmentname,SUM(o.orderamount) AS total_revenue FROM orders o JOIN departments d ON d.departmentid =o.departmentid WHERE o.orderdate >= CURRENTDATE - INTERVAL '12 months' GROUP BY d.department_name ORDER BY total_revenue DESC; `"See full answer
Data Scientist
Coding
+1 more
Find Campaign Purchases
IDE
Medium
Data Scientist
Coding
+1 more
3 answers I was asked this
"SELECT upsellcampaignid, COUNT(DISTINCT trans.userid) AS eligibleusers FROM campaign JOIN "transaction" AS trans ON transactiondate BETWEEN datestart AND date_end JOIN user ON trans.userid = user.userid WHERE iseligibleforupsellcampaign = 1 GROUP BY upsellcampaignid `"
Alina G. - "SELECT upsellcampaignid, COUNT(DISTINCT trans.userid) AS eligibleusers FROM campaign JOIN "transaction" AS trans ON transactiondate BETWEEN datestart AND date_end JOIN user ON trans.userid = user.userid WHERE iseligibleforupsellcampaign = 1 GROUP BY upsellcampaignid `"See full answer
Data Scientist
Coding
+1 more
Asked at Adobe, Apple, Booking.com + 10 more • 2 months ago
Find the maximum subarray sum.
IDE
Medium
Data Scientist
Coding
+4 more
22 answers I was asked this
+16
" O(n) time, O(1) space from typing import List def maxsubarraysum(nums: List[int]) -> int: if len(nums) == 0: return 0 maxsum = currsum = nums[0] for i in range(1, len(nums)): currsum = max(currsum + nums[i], nums[i]) maxsum = max(currsum, max_sum) return max_sum debug your code below print(maxsubarraysum([-1, 2, -3, 4])) `"
Rick E. - " O(n) time, O(1) space from typing import List def maxsubarraysum(nums: List[int]) -> int: if len(nums) == 0: return 0 maxsum = currsum = nums[0] for i in range(1, len(nums)): currsum = max(currsum + nums[i], nums[i]) maxsum = max(currsum, max_sum) return max_sum debug your code below print(maxsubarraysum([-1, 2, -3, 4])) `"See full answer
Data Scientist
Coding
+4 more
Walmart Inventory Status
IDE
Medium
Data Scientist
Coding
+1 more
4 answers I was asked this
+1
"select DISTINCT p.product_id, p.product_name , CASE when sale_date is null then 'Not Sold' else 'Sold' END as sale_status from products p left join sales s on p.productid= s.productid `"
Gowtami K. - "select DISTINCT p.product_id, p.product_name , CASE when sale_date is null then 'Not Sold' else 'Sold' END as sale_status from products p left join sales s on p.productid= s.productid `"See full answer
Data Scientist
Coding
+1 more
Netflix Genre Ratings
IDE
Medium
Data Scientist
Coding
+1 more
3 answers I was asked this
"SELECT DISTINCT title, ROUND(AVG(rating) over (partition by title),1) avg_rating, ROUND(AVG(rating) over (partition by genre),1) genre_rating FROM rating r JOIN movie m ON r.movieid=m.movieid ORDER by 1"
Harshi B. - "SELECT DISTINCT title, ROUND(AVG(rating) over (partition by title),1) avg_rating, ROUND(AVG(rating) over (partition by genre),1) genre_rating FROM rating r JOIN movie m ON r.movieid=m.movieid ORDER by 1"See full answer
Data Scientist
Coding
+1 more
Unique Chat Conversations
IDE
Medium
Data Scientist
Coding
+1 more
2 answers I was asked this
"SELECT COUNT(*) unique_conversations FROM messenger_sends WHERE senderid < receiverid"
Lucas G. - "SELECT COUNT(*) unique_conversations FROM messenger_sends WHERE senderid < receiverid"See full answer
Data Scientist
Coding
+1 more
Asked at Adobe, Apple, Google + 1 more • 7 months ago
Permutations
IDE
Medium
Data Scientist
Coding
+3 more
3 answers I was asked this
"function permute(nums) { if (nums.length <= 1) { return [nums]; } const prevPermutations = permute(nums.slice(0, nums.length-1)); const currentNum = nums[nums.length-1]; const permutations = new Set(); for (let prev of prevPermutations) { for (let i=0; i < prev.length; i++) { permutations.add([...prev.slice(0, i), currentNum, ...prev.slice(i)]); } permutations.add([...prev, currentNum]); } return [...permutations]"
Tiago R. - "function permute(nums) { if (nums.length <= 1) { return [nums]; } const prevPermutations = permute(nums.slice(0, nums.length-1)); const currentNum = nums[nums.length-1]; const permutations = new Set(); for (let prev of prevPermutations) { for (let i=0; i < prev.length; i++) { permutations.add([...prev.slice(0, i), currentNum, ...prev.slice(i)]); } permutations.add([...prev, currentNum]); } return [...permutations]"See full answer
Data Scientist
Coding
+3 more
Validate Bitcoin Transactions
IDE
Hard
Data Scientist
Coding
+1 more
4 answers I was asked this
+1
"WITH CTE AS ( SELECT *, ROWNUMBER()OVER(PARTITION BY utxoid ORDER BY transactionid) AS trxrk FROM transactions JOIN transaction_inputs USING (transaction_id) JOIN utxo USING (utxo_id) ) SELECT transaction_id AS InvalidTransactionId FROM CTE WHERE sender!=address OR trx_rk > 1 `"
E L. - "WITH CTE AS ( SELECT *, ROWNUMBER()OVER(PARTITION BY utxoid ORDER BY transactionid) AS trxrk FROM transactions JOIN transaction_inputs USING (transaction_id) JOIN utxo USING (utxo_id) ) SELECT transaction_id AS InvalidTransactionId FROM CTE WHERE sender!=address OR trx_rk > 1 `"See full answer
Data Scientist
Coding
+1 more
Improving Students
IDE
Hard
Data Scientist
Coding
+2 more
3 answers I was asked this
"The solution produces the same result as the 'prescribed solution' yet it does not get accepted In the test results section transcript['year'] = transcript['year'].astype(str) df = pd.pivottable(data = transcript, index = 'studentid', columns = 'year', values = 'yearlygpa', aggfunc = 'mean').resetindex() df = df[(df['2021'] < df['2022']) & (df['2022'] < df['2023'])] df['average_gpa'] = df[['2021', '2022', '2023']].mean(axis=1).round(2) return df "
Prachi G. - "The solution produces the same result as the 'prescribed solution' yet it does not get accepted In the test results section transcript['year'] = transcript['year'].astype(str) df = pd.pivottable(data = transcript, index = 'studentid', columns = 'year', values = 'yearlygpa', aggfunc = 'mean').resetindex() df = df[(df['2021'] < df['2022']) & (df['2022'] < df['2023'])] df['average_gpa'] = df[['2021', '2022', '2023']].mean(axis=1).round(2) return df "See full answer
Data Scientist
Coding
+2 more
EPA Temperature Monitoring
IDE
Medium
Data Scientist
Coding
+1 more
4 answers I was asked this
+1
"this task is misleading . i used lag(1) and lead(1) cuz it did not say "compare temperature from 2 days before and 1 day before" , it reads to me as if its asking "compare cur temperature to prev and future and see if it rose and fall""
Erjan G. - "this task is misleading . i used lag(1) and lead(1) cuz it did not say "compare temperature from 2 days before and 1 day before" , it reads to me as if its asking "compare cur temperature to prev and future and see if it rose and fall""See full answer
Data Scientist
Coding
+1 more
Biased Coin Flip Histogram
Easy
Data Scientist
Coding
2 answers I was asked this
"import random def coin_flip(): x=4*[0]+[1] res=[] for i in range(20): res.append(random.choice(x)) return res res=[0,0] # [head,tail] for j in range(1000): temp=coin_flip() res[0]+=sum(temp) #head res[1]+=(20-sum(temp)) #tail"
Alireza K. - "import random def coin_flip(): x=4*[0]+[1] res=[] for i in range(20): res.append(random.choice(x)) return res res=[0,0] # [head,tail] for j in range(1000): temp=coin_flip() res[0]+=sum(temp) #head res[1]+=(20-sum(temp)) #tail"See full answer
Data Scientist
Coding
Asked at Adobe, Apple, Intuit + 3 more • 7 months ago
Sudoku Solver
IDE
Hard
Data Scientist
Coding
+4 more
2 answers I was asked this
"static boolean sudokuSolve(char board) { return sudokuSolve(board, 0, 0); } static boolean sudokuSolve(char board, int r, int c) { if(c>=board[0].length) { r=r+1; c=0; } if(r>=board.length) return true; if(boardr=='.') { for(int num=1; num<=9; num++) { boardr=(char)('0' + num); if(isValidPosition(board, r, c)) { if(sudokuSolve(board, r, c+1)) return true; } boardr='.'; } } else { return sudokuSolve(board, r, c+1); } return false; } static boolean isValidPosition(char b"
Divya R. - "static boolean sudokuSolve(char board) { return sudokuSolve(board, 0, 0); } static boolean sudokuSolve(char board, int r, int c) { if(c>=board[0].length) { r=r+1; c=0; } if(r>=board.length) return true; if(boardr=='.') { for(int num=1; num<=9; num++) { boardr=(char)('0' + num); if(isValidPosition(board, r, c)) { if(sudokuSolve(board, r, c+1)) return true; } boardr='.'; } } else { return sudokuSolve(board, r, c+1); } return false; } static boolean isValidPosition(char b"See full answer
Data Scientist
Coding
+4 more
Nth Ranked Player
IDE
Medium
Data Scientist
Coding
+1 more
3 answers I was asked this
"with cte as (select *, row_number() over(order by score desc) as rn from players) select player_name, score, rn as ranking from cte where rn= 4 or rn =6 or rn =11 `"
Gowtami K. - "with cte as (select *, row_number() over(order by score desc) as rn from players) select player_name, score, rn as ranking from cte where rn= 4 or rn =6 or rn =11 `"See full answer
Data Scientist
Coding
+1 more
Number of Direct Reports
IDE
Medium
Data Scientist
Coding
+1 more
4 answers I was asked this
+1
"SELECT e1.empid AS manageremployee_id, e1.empname AS managername, COUNT(e2.empid) AS numberofdirectreports FROM employees AS e1 INNER JOIN employees AS e2 ON e2.managerid = e1.empid GROUP BY e1.emp_id HAVING COUNT(e2.emp_id) >= 2 ORDER BY numberofdirectreports DESC, managername ASC `"
Alvin P. - "SELECT e1.empid AS manageremployee_id, e1.empname AS managername, COUNT(e2.empid) AS numberofdirectreports FROM employees AS e1 INNER JOIN employees AS e2 ON e2.managerid = e1.empid GROUP BY e1.emp_id HAVING COUNT(e2.emp_id) >= 2 ORDER BY numberofdirectreports DESC, managername ASC `"See full answer
Data Scientist
Coding
+1 more
Find Customers by Department
IDE
Medium
Data Scientist
Coding
+1 more
2 answers I was asked this
"SELECT COUNT(DISTINCT o.customerid) AS customers, d.departmentname FROM orders o INNER JOIN departments d ON d.departmentid = o.departmentid WHERE d.departmentname IN ('Electronics','Fashion') AND o.orderdate BETWEEN '2022-01-01' AND '2022-12-31' GROUP BY d.department_name; `"
Derrick M. - "SELECT COUNT(DISTINCT o.customerid) AS customers, d.departmentname FROM orders o INNER JOIN departments d ON d.departmentid = o.departmentid WHERE d.departmentname IN ('Electronics','Fashion') AND o.orderdate BETWEEN '2022-01-01' AND '2022-12-31' GROUP BY d.department_name; `"See full answer
Data Scientist
Coding
+1 more