Data Scientist SQL Interview Questions

Review this list of 26 sql data scientist interview questions and answers verified by hiring managers and candidates.
  • +8

    "Answer: select fromcaller, count(DISTINCT tocallee) as num_calls from calls group by fromcaller having count(DISTINCT tocallee) >= 3 Setup: CREATE TABLE calls ( from_caller VARCHAR(20), to_callee VARCHAR(20) ); INSERT INTO calls (fromcaller, tocallee) VALUES ('Alice', 'Bob'), ('Charlie', 'Dave'), ('Alice', 'Frank'), ('Charlie', 'Heidi'), ('Charlie', 'Judy'); "

    KAI - "Answer: select fromcaller, count(DISTINCT tocallee) as num_calls from calls group by fromcaller having count(DISTINCT tocallee) >= 3 Setup: CREATE TABLE calls ( from_caller VARCHAR(20), to_callee VARCHAR(20) ); INSERT INTO calls (fromcaller, tocallee) VALUES ('Alice', 'Bob'), ('Charlie', 'Dave'), ('Alice', 'Frank'), ('Charlie', 'Heidi'), ('Charlie', 'Judy'); "See full answer

    Data Scientist
    SQL
    +3 more
  • Video answer for 'Employee Earnings.'
    +37

    "SELECT employees.first_name, managers.salary AS manager_salary FROM employees LEFT JOIN employees AS managers ON employees.manager_id = managers.id WHERE employees.salary > managers.salary `"

    Tiffany A. - "SELECT employees.first_name, managers.salary AS manager_salary FROM employees LEFT JOIN employees AS managers ON employees.manager_id = managers.id WHERE employees.salary > managers.salary `"See full answer

    Data Scientist
    SQL
    +4 more
  • IBM logoAsked at IBM 
    +49

    "SELECT MIN(id) AS id, TRIM(LOWER(email)) AS cleaned_email FROM users GROUP BY cleaned_email ORDER BY id `"

    Salome L. - "SELECT MIN(id) AS id, TRIM(LOWER(email)) AS cleaned_email FROM users GROUP BY cleaned_email ORDER BY id `"See full answer

    Data Scientist
    SQL
    +3 more
  • "it is really good explanation thanks it is really good explanation thanks"

    Amney M. - "it is really good explanation thanks it is really good explanation thanks"See full answer

    Data Scientist
    SQL
    +4 more
  • "WITH ActiveUsersYesterday AS ( SELECT DISTINCT user_id FROM user_activity WHERE activity_date = CAST(GETDATE() - 1 AS DATE) ), VideoCallUsersYesterday AS ( SELECT DISTINCT user_id FROM video_calls WHERE call_date = CAST(GETDATE() - 1 AS DATE) ) SELECT (CAST(COUNT(DISTINCT v.userid) AS FLOAT) / NULLIF(COUNT(DISTINCT a.userid), 0)) * 100 AS percentagevideocall_users FROM ActiveUsersYesterday a LEFT JOIN VideoCallUsersYesterday v ON a.userid = v.userid;"

    Bala G. - "WITH ActiveUsersYesterday AS ( SELECT DISTINCT user_id FROM user_activity WHERE activity_date = CAST(GETDATE() - 1 AS DATE) ), VideoCallUsersYesterday AS ( SELECT DISTINCT user_id FROM video_calls WHERE call_date = CAST(GETDATE() - 1 AS DATE) ) SELECT (CAST(COUNT(DISTINCT v.userid) AS FLOAT) / NULLIF(COUNT(DISTINCT a.userid), 0)) * 100 AS percentagevideocall_users FROM ActiveUsersYesterday a LEFT JOIN VideoCallUsersYesterday v ON a.userid = v.userid;"See full answer

    Data Scientist
    SQL
    +2 more
  • 🧠 Want an expert answer to a question? Saving questions lets us know what content to make next.

  • +20

    "WITH filtered_posts AS ( SELECT p.user_id, p.issuccessfulpost FROM post p WHERE p.postdate >= '2023-11-01' AND p.postdate < '2023-12-01' ), post_summary AS ( SELECT pu.user_type, COUNT(*) AS post_attempt, SUM(CASE WHEN fp.issuccessfulpost = 1 THEN 1 ELSE 0 END) AS post_success FROM filtered_posts fp JOIN postuser pu ON fp.userid = pu.user_id GROUP BY pu.user_type ) SELECT user_type, post_success, post_attempt, CAST(postsuccess AS FLOAT) / postattempt AS postsuccessrate FROM po"

    David I. - "WITH filtered_posts AS ( SELECT p.user_id, p.issuccessfulpost FROM post p WHERE p.postdate >= '2023-11-01' AND p.postdate < '2023-12-01' ), post_summary AS ( SELECT pu.user_type, COUNT(*) AS post_attempt, SUM(CASE WHEN fp.issuccessfulpost = 1 THEN 1 ELSE 0 END) AS post_success FROM filtered_posts fp JOIN postuser pu ON fp.userid = pu.user_id GROUP BY pu.user_type ) SELECT user_type, post_success, post_attempt, CAST(postsuccess AS FLOAT) / postattempt AS postsuccessrate FROM po"See full answer

    Data Scientist
    SQL
    +3 more
  • +35

    "Here's a simpler solution: select u.username , count(p.postid) as countposts from posts as p join users as u on p.userid = u.userid where p.likes >= 100 group by 1 order by 2 desc, 1 asc limit 3 `"

    Bradley E. - "Here's a simpler solution: select u.username , count(p.postid) as countposts from posts as p join users as u on p.userid = u.userid where p.likes >= 100 group by 1 order by 2 desc, 1 asc limit 3 `"See full answer

    Data Scientist
    SQL
    +3 more
  • +48

    "Limit and rank() only works if there are no 2 employees with same salary ( which is okay for this use case) For the query to pass all the test results, we need to use dense_rank with ranked_employees as ( select id, firstname, lastname, salary, denserank() over(order by salary desc) as salaryrank from employees ) select id, firstname, lastname, salary from ranked_employees where salary_rank <= 3 `"

    Vysali K. - "Limit and rank() only works if there are no 2 employees with same salary ( which is okay for this use case) For the query to pass all the test results, we need to use dense_rank with ranked_employees as ( select id, firstname, lastname, salary, denserank() over(order by salary desc) as salaryrank from employees ) select id, firstname, lastname, salary from ranked_employees where salary_rank <= 3 `"See full answer

    Data Scientist
    SQL
    +3 more
  • +24

    "SELECT d.name as departmentname,e.id as employeeid,e.firstname,e.lastname,MAX(e.salary) as salary FROM employees e LEFT JOIN departments d ON e.department_id=d.id GROUP BY department_name ORDER BY department_name;"

    Anisha S. - "SELECT d.name as departmentname,e.id as employeeid,e.firstname,e.lastname,MAX(e.salary) as salary FROM employees e LEFT JOIN departments d ON e.department_id=d.id GROUP BY department_name ORDER BY department_name;"See full answer

    Data Scientist
    SQL
    +3 more
  • +17

    "SELECT u.user_id, u.user_name, u.email, ROUND(AVG(CASE WHEN b.status = 'Unmatched' THEN 1.0 ELSE 0 END), 2) AS avgunmatchedbookings FROM users u LEFT JOIN bookings b ON u.userid = b.userid GROUP BY u.user_id, u.user_name, u.email; `"

    Akshay D. - "SELECT u.user_id, u.user_name, u.email, ROUND(AVG(CASE WHEN b.status = 'Unmatched' THEN 1.0 ELSE 0 END), 2) AS avgunmatchedbookings FROM users u LEFT JOIN bookings b ON u.userid = b.userid GROUP BY u.user_id, u.user_name, u.email; `"See full answer

    Data Scientist
    SQL
    +3 more
  • +11

    "--country names are UPPERCASE but the table in the in the question showing lowercase. That's why it took me a while to figure it out until I ran the country column WITH RECURSIVE Hierarchy AS ( SELECT e.Emp_ID, CONCAT(e.FirstName, ' ', e.MiddleName, ' ', e.LastName) AS FullName, e.Manager_ID, 0 AS Level, CASE WHEN e.Country = 'IRELAND' THEN s.Salary * 1.09 WHEN e.Country = 'INDIA' THEN s.Salary * 0.012 ELSE s.Salary "

    Victor N. - "--country names are UPPERCASE but the table in the in the question showing lowercase. That's why it took me a while to figure it out until I ran the country column WITH RECURSIVE Hierarchy AS ( SELECT e.Emp_ID, CONCAT(e.FirstName, ' ', e.MiddleName, ' ', e.LastName) AS FullName, e.Manager_ID, 0 AS Level, CASE WHEN e.Country = 'IRELAND' THEN s.Salary * 1.09 WHEN e.Country = 'INDIA' THEN s.Salary * 0.012 ELSE s.Salary "See full answer

    Data Scientist
    SQL
    +3 more
  • +20

    "select name, stock from products p left join transactions t on p.id = t.product_id order by date desc limit 1"

    Daniel C. - "select name, stock from products p left join transactions t on p.id = t.product_id order by date desc limit 1"See full answer

    Data Scientist
    SQL
    +3 more
  • +13

    "The user table no longer exists as expected - I get an error that user does not contain user_id. Note that querying the table results in only user:swuoevkivrjfta select * FROM user `"

    Evan R. - "The user table no longer exists as expected - I get an error that user does not contain user_id. Note that querying the table results in only user:swuoevkivrjfta select * FROM user `"See full answer

    Data Scientist
    SQL
    +3 more
  • Google logoAsked at Google 
    +2

    "WITH RECURSIVE fibonacci_series AS ( SELECT 1 AS n, 0 AS fib1, 1 AS fib2 UNION ALL SELECT n + 1 AS n, fib2 AS fib1, fib1 + fib2 AS fib2 FROM fibonacci_series WHERE n < 20 -- Limit the series to 20 numbers ) SELECT n, fib1 AS fib FROM fibonacci_series ORDER BY n; `"

    Yashasvi V. - "WITH RECURSIVE fibonacci_series AS ( SELECT 1 AS n, 0 AS fib1, 1 AS fib2 UNION ALL SELECT n + 1 AS n, fib2 AS fib1, fib1 + fib2 AS fib2 FROM fibonacci_series WHERE n < 20 -- Limit the series to 20 numbers ) SELECT n, fib1 AS fib FROM fibonacci_series ORDER BY n; `"See full answer

    Data Scientist
    SQL
    +4 more
  • +8

    "In the question it says: "above the overall average total posts", which to me implying a >, yet in the solution it uses >= Caused me 1 hr to find out. plz fix"

    Peter W. - "In the question it says: "above the overall average total posts", which to me implying a >, yet in the solution it uses >= Caused me 1 hr to find out. plz fix"See full answer

    Data Scientist
    SQL
    +3 more
  • Amazon logoAsked at Amazon 

    "1) select avg(session) from table where session> 180 2) select round(sessiontime/300)*300 as sessionbin, count() as sessioncount from table group by round(sessiontime/300)300 order by session_bin 3) SELECT t1.country AS country_a, t2.country AS country_b FROM ( SELECT country, COUNT(*) AS session_count FROM yourtablename GROUP BY country ) AS t1 JOIN ( SELECT country, COUNT(*) AS session_count FROM yourtablename `GROUP BY countr"

    Erjan G. - "1) select avg(session) from table where session> 180 2) select round(sessiontime/300)*300 as sessionbin, count() as sessioncount from table group by round(sessiontime/300)300 order by session_bin 3) SELECT t1.country AS country_a, t2.country AS country_b FROM ( SELECT country, COUNT(*) AS session_count FROM yourtablename GROUP BY country ) AS t1 JOIN ( SELECT country, COUNT(*) AS session_count FROM yourtablename `GROUP BY countr"See full answer

    Data Scientist
    SQL
    +4 more
  • +16

    "with max_score as( select test_id, employee_id, max(score) as max_score from test_results as t1 join tests as t2 on t1.test_id=t2.id group by 2,1 ) select id as employee_id, name as employee_name, sum(maxscore) as totalscore from employees as e join max_score as m on e.id=m.employee_id group by 1,2 order by 3 desc, 1 `"

    Victor N. - "with max_score as( select test_id, employee_id, max(score) as max_score from test_results as t1 join tests as t2 on t1.test_id=t2.id group by 2,1 ) select id as employee_id, name as employee_name, sum(maxscore) as totalscore from employees as e join max_score as m on e.id=m.employee_id group by 1,2 order by 3 desc, 1 `"See full answer

    Data Scientist
    SQL
    +3 more
  • +4

    "-- Write your query here WITH high_value AS( SELECT user_id FROM user_sessions JOIN attribution ON usersessions.sessionid = attribution.session_id GROUP BY user_id HAVING SUM(purchase_value) > 100 ORDER BY SUM(purchase_value) DESC ) SELECT usersessions.userid, marketing_channel FROM user_sessions JOIN high_value ON usersessions.userid = highvalue.userid JOIN attribution ON usersessions.sessionid = attribution.session_id GROUP BY usersessions.userid H"

    Alina G. - "-- Write your query here WITH high_value AS( SELECT user_id FROM user_sessions JOIN attribution ON usersessions.sessionid = attribution.session_id GROUP BY user_id HAVING SUM(purchase_value) > 100 ORDER BY SUM(purchase_value) DESC ) SELECT usersessions.userid, marketing_channel FROM user_sessions JOIN high_value ON usersessions.userid = highvalue.userid JOIN attribution ON usersessions.sessionid = attribution.session_id GROUP BY usersessions.userid H"See full answer

    Data Scientist
    SQL
    +3 more
  • Data Scientist
    SQL
    +3 more
  • Deloitte logoAsked at Deloitte 

    "BETWEEN and HAVING clauses in SQL serve different purposes: 1. BETWEEN Clause Used to filter rows based on a range of values. Works with numeric, date, or text values. Can be used with WHERE or HAVING clauses. The range includes both lower and upper bounds. Example: Filtering employees with salaries between 30,000 and 50,000 `SELECT * FROM Employees WHERE salary BETWEEN 30000 AND 50000;` 2. HAVING Clause Used to filter **groups"

    Meenakshi D. - "BETWEEN and HAVING clauses in SQL serve different purposes: 1. BETWEEN Clause Used to filter rows based on a range of values. Works with numeric, date, or text values. Can be used with WHERE or HAVING clauses. The range includes both lower and upper bounds. Example: Filtering employees with salaries between 30,000 and 50,000 `SELECT * FROM Employees WHERE salary BETWEEN 30000 AND 50000;` 2. HAVING Clause Used to filter **groups"See full answer

    Data Scientist
    SQL
    +4 more
Showing 1-20 of 26