Data Scientist Coding Interview Questions

Review this list of 111 coding data scientist interview questions and answers verified by hiring managers and candidates.
  • +6

    "with top_players as( Select team_id, player_name, max(gamescore) as maxscore ,denserank() over(partition by teamid order by max(game_score) desc) rk from players p join scores s on p.playerid=s.playerid group by player_name, team_id ) Select team_id, player_name, max_score from top_players where rk<=2 order by teamid, maxscore desc `"

    Victor N. - "with top_players as( Select team_id, player_name, max(gamescore) as maxscore ,denserank() over(partition by teamid order by max(game_score) desc) rk from players p join scores s on p.playerid=s.playerid group by player_name, team_id ) Select team_id, player_name, max_score from top_players where rk<=2 order by teamid, maxscore desc `"See full answer

    Data Scientist
    Coding
    +1 more
  • Adobe logoAsked at Adobe 
    +8

    "Problem Statement: The Fibonacci sequence is defined as F(n) = F(n-1) + F(n-2) with F(0) = 1 and F(1) = 1. The solution is given in the problem statement itself. If the value of n = 0, return 1. If the value of n = 1, return 1. Otherwise, return the sum of data at (n - 1) and (n - 2). Explanation: The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones, typically starting with 0 and 1. Java Solution: public static int fib(int n"

    Rishi G. - "Problem Statement: The Fibonacci sequence is defined as F(n) = F(n-1) + F(n-2) with F(0) = 1 and F(1) = 1. The solution is given in the problem statement itself. If the value of n = 0, return 1. If the value of n = 1, return 1. Otherwise, return the sum of data at (n - 1) and (n - 2). Explanation: The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones, typically starting with 0 and 1. Java Solution: public static int fib(int n"See full answer

    Data Scientist
    Coding
    +2 more
  • +5

    "Here is my implementation: select marketing_channel, AVG(purchasevalue) as avgpurchase_value from attribution group by marketing_channel order by avgpurchasevalue DESC ; There is no need to copy and past the line of code for calculating the average into order by, just Alias is enough because going by the order of execution in sql, Always, order by is executed after executing select clause."

    Maliki U. - "Here is my implementation: select marketing_channel, AVG(purchasevalue) as avgpurchase_value from attribution group by marketing_channel order by avgpurchasevalue DESC ; There is no need to copy and past the line of code for calculating the average into order by, just Alias is enough because going by the order of execution in sql, Always, order by is executed after executing select clause."See full answer

    Data Scientist
    Coding
    +1 more
  • +3

    "-- The text of the task is a bit confusing. If the status is repeated several -- times, then in the end you should show as start_date the date of the first -- occurrence, and in end_date the date of the last occurrence of this status, -- and not the date of the beginning of the next status with t1 as (select order_id, status, orderdate as startdate, lead(orderdate) over (partition by orderid order by orderdate) as enddate, ifnull(lag(status) over (partition by order_id order by or"

    Alexey T. - "-- The text of the task is a bit confusing. If the status is repeated several -- times, then in the end you should show as start_date the date of the first -- occurrence, and in end_date the date of the last occurrence of this status, -- and not the date of the beginning of the next status with t1 as (select order_id, status, orderdate as startdate, lead(orderdate) over (partition by orderid order by orderdate) as enddate, ifnull(lag(status) over (partition by order_id order by or"See full answer

    Data Scientist
    Coding
    +1 more
  • 🧠 Want an expert answer to a question? Saving questions lets us know what content to make next.

  • +4

    "SELECT order_amount FROM ( SELECT *, rank() OVER(ORDER BY order_amount desc) as ranking FROM departments d LEFT JOIN orders o ON d.departmentid = o.departmentid LEFT JOIN customers c ON o.customerid = c.customerid WHERE department_name = 'Fashion' ) where ranking = 2"

    Jacky T. - "SELECT order_amount FROM ( SELECT *, rank() OVER(ORDER BY order_amount desc) as ranking FROM departments d LEFT JOIN orders o ON d.departmentid = o.departmentid LEFT JOIN customers c ON o.customerid = c.customerid WHERE department_name = 'Fashion' ) where ranking = 2"See full answer

    Data Scientist
    Coding
    +1 more
  • +8

    "select sub.name subreddit_name, count(distinct us.userid) totalusers from user_subreddit as us left join subreddit as sub on us.subredditid = sub.subredditid group by us.subreddit_id having count(distinct us.user_id) > 3"

    Lucas G. - "select sub.name subreddit_name, count(distinct us.userid) totalusers from user_subreddit as us left join subreddit as sub on us.subredditid = sub.subredditid group by us.subreddit_id having count(distinct us.user_id) > 3"See full answer

    Data Scientist
    Coding
    +1 more
  • +3

    "-- filter for december and november data -- the total order amount per depatment per month -- department, month, order_amount with monthly_orders AS ( SELECT department_id, strftime('%m', order_date) AS month, SUM(orderamount) AS orderamount FROM orders WHERE orderdate >= '2022-11-01' AND orderdate < '2023-01-01' group by department_id, month ), -- -- add difference from this month to last ( use lag ) monthly_comp"

    Aneesha K. - "-- filter for december and november data -- the total order amount per depatment per month -- department, month, order_amount with monthly_orders AS ( SELECT department_id, strftime('%m', order_date) AS month, SUM(orderamount) AS orderamount FROM orders WHERE orderdate >= '2022-11-01' AND orderdate < '2023-01-01' group by department_id, month ), -- -- add difference from this month to last ( use lag ) monthly_comp"See full answer

    Data Scientist
    Coding
    +1 more
  • +5

    "select customer_id, order_date, orderid as earliestorder_id from ( select customer_id, order_date, order_id, rownumber() over (partition by customerid, orderdate order by orderdate) as orderrankper_customer from orders ) sub_table where orderrankper_customer=1 order by orderdate, customerid; Standard solution assumed that the orderid indicates which order comes in first. However this is not always the case, and sometime orderid can be random number withou"

    Jessica C. - "select customer_id, order_date, orderid as earliestorder_id from ( select customer_id, order_date, order_id, rownumber() over (partition by customerid, orderdate order by orderdate) as orderrankper_customer from orders ) sub_table where orderrankper_customer=1 order by orderdate, customerid; Standard solution assumed that the orderid indicates which order comes in first. However this is not always the case, and sometime orderid can be random number withou"See full answer

    Data Scientist
    Coding
    +1 more
  • Adobe logoAsked at Adobe 
    Video answer for 'Merge k sorted linked lists.'
    +6

    "A much better solution than the one in the article, below: It looks like the ones writing articles here in Javascript do not understand the time/space complexity of javascript methods. shift, splice, sort, etc... In the solution article you have a shift and a sort being done inside a while, that is, the multiplication of Ns. My solution, below, iterates through the list once and then sorts it, separately. It´s O(N+Log(N)) class ListNode { constructor(val = 0, next = null) { th"

    Guilherme F. - "A much better solution than the one in the article, below: It looks like the ones writing articles here in Javascript do not understand the time/space complexity of javascript methods. shift, splice, sort, etc... In the solution article you have a shift and a sort being done inside a while, that is, the multiplication of Ns. My solution, below, iterates through the list once and then sorts it, separately. It´s O(N+Log(N)) class ListNode { constructor(val = 0, next = null) { th"See full answer

    Data Scientist
    Coding
    +4 more
  • +1

    "SELECT AVG(julianday(dateend) - julianday(datestart)) AS avgcampaignduration FROM campaign; `"

    Salome L. - "SELECT AVG(julianday(dateend) - julianday(datestart)) AS avgcampaignduration FROM campaign; `"See full answer

    Data Scientist
    Coding
    +1 more
  • Apple logoAsked at Apple 
    Data Scientist
    Coding
    +4 more
  • +3

    "SELECT u.id as user_id, u.name, COUNT(t.product_id) AS orders FROM users u JOIN transactions t ON t.user_id = u.id JOIN products p ON p.id = t.product_id GROUP BY u.id, u.name ORDER BY orders DESC LIMIT 1 `"

    Derrick M. - "SELECT u.id as user_id, u.name, COUNT(t.product_id) AS orders FROM users u JOIN transactions t ON t.user_id = u.id JOIN products p ON p.id = t.product_id GROUP BY u.id, u.name ORDER BY orders DESC LIMIT 1 `"See full answer

    Data Scientist
    Coding
    +1 more
  • +5

    "with t1 as (select employee_name, department_id, salary, avg(salary) over (partition by departmentid) as avgsalary, abs(salary - avg(salary) over (partition by department_id)) as diff from employees ) select employee_name, department_id, salary, avg_salary, denserank() over (partition by departmentid order by diff desc) as deviation_rank from t1 order by departmentid asc, deviationrank asc, employee_name `"

    Alexey T. - "with t1 as (select employee_name, department_id, salary, avg(salary) over (partition by departmentid) as avgsalary, abs(salary - avg(salary) over (partition by department_id)) as diff from employees ) select employee_name, department_id, salary, avg_salary, denserank() over (partition by departmentid order by diff desc) as deviation_rank from t1 order by departmentid asc, deviationrank asc, employee_name `"See full answer

    Data Scientist
    Coding
    +1 more
  • +3

    "Hi, my solution gives the exact numerical values as the proposed solution, but it doesn't pass the tests. Am I missing something, or is this a bug? def findrevenueby_city(transactions: pd.DataFrame, users: pd.DataFrame, exchange_rate: pd.DataFrame) -> pd.DataFrame: gets user city for each user id userids = users[['id', 'usercity']] and merge on transactions transactions = transactions.merge(user_ids, how='left"

    Gabriel P. - "Hi, my solution gives the exact numerical values as the proposed solution, but it doesn't pass the tests. Am I missing something, or is this a bug? def findrevenueby_city(transactions: pd.DataFrame, users: pd.DataFrame, exchange_rate: pd.DataFrame) -> pd.DataFrame: gets user city for each user id userids = users[['id', 'usercity']] and merge on transactions transactions = transactions.merge(user_ids, how='left"See full answer

    Data Scientist
    Coding
    +2 more
  • +1

    "Schema is wrong - id from product is mapped to id from transactions, id from product should point to product_id in transcations table"

    Arshad P. - "Schema is wrong - id from product is mapped to id from transactions, id from product should point to product_id in transcations table"See full answer

    Data Scientist
    Coding
    +2 more
  • "with cte as ( select user_id, timestamp as current_login, lag(timestamp,1) over(partition by userid order by timestamp) as previouslogin , round(abs(julianday(timestamp)-julianday(lag(timestamp,1) over(partition by userid order by timestamp)))2460)as minuteselapsed from useractivitylog where activity_type ='LOGIN' ) select userid, currentlogin, previouslogin, minuteselapsed from cte where currentlogin previouslogin `"

    Gowtami K. - "with cte as ( select user_id, timestamp as current_login, lag(timestamp,1) over(partition by userid order by timestamp) as previouslogin , round(abs(julianday(timestamp)-julianday(lag(timestamp,1) over(partition by userid order by timestamp)))2460)as minuteselapsed from useractivitylog where activity_type ='LOGIN' ) select userid, currentlogin, previouslogin, minuteselapsed from cte where currentlogin previouslogin `"See full answer

    Data Scientist
    Coding
    +1 more
  • +2

    "not sure what's wrong here> select a.marketing_channel, avg(purchasevalue) as avgpurchase_value, sum(case when a.purchasevalue > 0 then 1 else 0 end) * 1.0 /count(a.sessionid) as conversion_rate from attribution a left join usersessions u on a.sessionid = u.session_id group by a.marketing_channel order by conversion_rate desc `"

    Shriganesh K. - "not sure what's wrong here> select a.marketing_channel, avg(purchasevalue) as avgpurchase_value, sum(case when a.purchasevalue > 0 then 1 else 0 end) * 1.0 /count(a.sessionid) as conversion_rate from attribution a left join usersessions u on a.sessionid = u.session_id group by a.marketing_channel order by conversion_rate desc `"See full answer

    Data Scientist
    Coding
    +1 more
  • "Test case is wrong. It expects to sort in asc order of month_year. -- Write your query here SELECT strftime('%Y-%m', createdat) AS monthyear, COUNT(DISTINCT userid) AS numcustomers, COUNT(t.id) AS num_orders, SUM(price * quantity) AS order_amt FROM transactions t INNER JOIN products p ON t.product_id = p.id GROUP BY month_year ORDER BY month_year ; "

    Aneesha K. - "Test case is wrong. It expects to sort in asc order of month_year. -- Write your query here SELECT strftime('%Y-%m', createdat) AS monthyear, COUNT(DISTINCT userid) AS numcustomers, COUNT(t.id) AS num_orders, SUM(price * quantity) AS order_amt FROM transactions t INNER JOIN products p ON t.product_id = p.id GROUP BY month_year ORDER BY month_year ; "See full answer

    Data Scientist
    Coding
    +1 more
  • +1

    "SELECT i.item_category, o.order_date, SUM(o.orderquantity) AS totalunits_ordered FROM orders o JOIN items i ON o.itemid = i.itemid WHERE o.order_date >= DATE('now', '-6 days') GROUP BY i.item_category, o.order_date ORDER BY i.item_category ASC, o.order_date ASC;"

    Anonymous Tortoise - "SELECT i.item_category, o.order_date, SUM(o.orderquantity) AS totalunits_ordered FROM orders o JOIN items i ON o.itemid = i.itemid WHERE o.order_date >= DATE('now', '-6 days') GROUP BY i.item_category, o.order_date ORDER BY i.item_category ASC, o.order_date ASC;"See full answer

    Data Scientist
    Coding
    +1 more
Showing 61-80 of 111