SQL is a fundamental topic in data science interviews at all levels. Querying, manipulating, and interpreting data are key skills for any data scientist.
For junior roles, you will likely be tested on basic querying and aggregation concepts. For more senior interviews, you should be familiar with advanced concepts like window functions, CTEs, and complex data transformations.
Below, we've compiled a list of example SQL interview questions indicative of the types of questions you can expect in real data science interviews. We created this list with input from data analysts, data scientists, and machine learning engineers at Meta, Google, Dropbox, Amazon, and top consulting firms.
Each question includes a sample answer and explanation of the underlying SQL concepts.
Sneak peek:
- Watch a senior SQL developer answer: "Explain SQL stored procedures."
- Watch a Meta data scientist answer: "Find employees who earn more than their managers."
- Watch a venture capitalist answer: "Calculate e-commerce orders from yesterday."
This guide was written and compiled by Derrick Mwiti, a senior data scientist and course instructor.
Key Takeaways
- Junior Data Scientists: Expect SQL questions on basic querying, syntax, and data retrieval using functions like
WHERE,INNER JOIN,ORDER BY,GROUP BY, and aggregations likeCOUNT(),AVG(), andMAX(). - Mid-level Candidates: Know different types of joins, nested and correlated subqueries,
CASE WHEN, date and time manipulation,WITHclauses,UNION,INTERSECT,EXCEPT, and string functions likeCONCAT()andREPLACE(). - Senior Candidates: Be comfortable with window functions, pivot tables, data transformation, normalization, and handling complex data use-cases.
Beginner SQL
Early in your data career, your SQL interviews will likely be focused on basic querying, syntax, and data retrieval, such as:
- Using
WHEREclauses to filter rows based on specific conditions. - Combining data from multiple tables using different types of joins such as
INNER JOINandLEFT JOIN. - Using functions like
COUNT(),SUM(),AVG(),MIN(), andMAX()to aggregate. - Using
GROUP BYto aggregate data into summary rows. - Using
ORDER BYto sort query results.
Here are some examples of common SQL questions you might hear:
1. Employees Earning More Than Managers
You are given an employee table with the following columns: employee_id, name, salary, department_id, and manager_id.
Identify the employees who earn more than their direct managers. The expected output is: employee_id, employee_name.
Here is a sample solution:
SELECT
e.employee_id,
e.name AS employee_name
FROM
employee e
JOIN
employee m ON e.manager_id = m.employee_id
WHERE
e.salary > m.salary;- First, the
employeetable is aliased asefor employees andmfor managers. - Managers and employees are joined by the formula
e.manager_id = m.employee_id. - Employees whose salary exceeds that of their manager are filtered out by the
WHEREclausee.salary > m.salary. - The employees who meet the condition have their
employee_nameandemployee_idretrieved usingSELECT.
2. Average Unmatched Bookings on Lyft
You work for Lyft, a ride-sharing company and are tasked with finding the average number of unmatched bookings per user. The expected output is: user_id, user_name, email, and avg_unmatched_bookings (rounded to the nearest 2 decimal places).
You have a users table with user_id, user_name, email. You also have a bookings table with booking_id, user_id, driver_id, booking_time, status.
Here's a sample solution:
SELECT
u.user_id,
u.user_name,
u.email,
COALESCE(ROUND(AVG(CASE WHEN b.status = 'Unmatched' THEN 1 ELSE 0 END), 2), 0) AS avg_unmatched_bookings
FROM users u
LEFT JOIN bookings b ON u.user_id = b.user_id
GROUP BY u.user_id, u.user_name, u.email;- First, use the
LEFT JOINto combine theuserstable with thebookingstable on theuser_idcolumn. This ensures that even users with no bookings are included in the result. - Then, use the
CASEstatement within theAVG()function to compute the average number of bookings for each user. Unmatched bookings are counted as 1, or 0 otherwise. - Use the
COALESCEfunction together with theROUNDfunction to ensure there are no null values. Then, the average is rounded to two decimal places.
3. Successful Posts Per User Type
Show the total amount of successful posts a particular user_type made in November 2023. Then, order your output by descending success rate.
Your output should include: user_type, post_success (number of successful posts), post_attempt (number of total posts), and post_success_rate (ranging from 0.00 to 1.00).
You've been given a post table with post_id, post_date, user_id, interface, and is_successful_post. You also have a post_user table with user_id, user_type, and age.
A sample solution looks like:
SELECT
pu.user_type,
COUNT(CASE WHEN p.is_successful_post = 1 THEN 1 END) AS post_success,
COUNT(p.post_id) AS post_attempt,
ROUND(COUNT(CASE WHEN p.is_successful_post = 1 THEN 1 END) * 1.0 / COUNT(p.post_id), 2) AS post_success_rate
FROM
post p
JOIN
post_user pu ON p.user_id = pu.user_id
WHERE
p.post_date >= '2023-11-01' AND p.post_date < '2023-12-01'
GROUP BY
pu.user_type
ORDER BY
post_success_rate DESC;- Use the
user_idcolumn to join theposttable with thepost_usertable to get the user type for each post. - Use the
WHEREfunction to filter posts from November 2023. - Count the number of successful posts using
COUNT(CASE WHEN p.is_successful_post = 1 THEN 1 END). - Round the answer to two decimal places using
ROUND. To guarantee that the division function is performed in floating-point, multiply the result by1.0. - Arrange the results using
GROUP BY pu.user_type. - Sort
post_success_ratein descending order withDESC.
4. Average Duration of a Marketing Campaign
Calculate the average duration of marketing campaigns, given three tables.
First, you have a campaign table with an upsell_campaign_id, date_start, and date_end. Then, you have a user table with user_id, name, and is_eligible_for_upsell_campaign. Finally, there's a transaction table withtransaction_id, user_id, product_id, transaction_date, and quantity.
Here is a sample solution:
SELECT AVG(DATEDIFF(date_end, date_start)) AS avg_campaign_length
FROM campaign;- Find the difference between the start date and end date of a campaign using
DATEDIFF. - Then, calculate the average using
AVG.
5. Total Viewership on Laptops and Mobile Devices
Imagine you work at Netflix. Calculate the total viewership from laptops compared to mobile devices.
There are three types of devices that viewers use:phone, tablet, and laptop. Mobile viewership is computed by summing the tablet and phone viewership. You also have a viewership table with user_id, device_type, and view_time.
Here's a sample solution:
SELECT
SUM(CASE WHEN device_type = 'laptop' THEN 1 ELSE 0 END) AS laptop_views,
SUM(CASE WHEN device_type IN ('phone', 'tablet') THEN 1 ELSE 0 END) AS mobile_views
FROM
viewership;- Use
SUMwithCASEstatements to conditionally count the number of viewership records for eachdevice_type.
6. Average Star Rating of Product by Month
Compute the average star rating for each product grouped by the month given a reviews table with these columns: review_id, user_id, submit_date, product_id, and stars.
Display the month as a numerical value, product ID, and average star rating rounded to two decimal places. Sort the results by the first month followed by the product ID.
For example:
SELECT
EXTRACT(MONTH FROM submit_date) AS mth,
product_id,
ROUND(AVG(stars), 2) AS avg_stars
FROM
reviews
GROUP BY
EXTRACT(MONTH FROM submit_date), product_id
ORDER BY
mth, product_id;- First,
EXTRACTthe month from thesubmit_datecolumn. - Then,
ROUNDcomputes the average to 2 decimal places. - Finally,
GROUP BYandORDER BYorder the data by month and product ID.
7. Discount Product Catalog
You work for Amazon. The company likes to provide seasonal discounts. This year, electronics are discounted at 10%, clothing at 20%, groceries at 5%, and books at 15%.
You are given a products table with name, type, and price. Generate a discounted product catalog that displays the name, type, and discounted_price for each item. Round the discounted prices to the nearest cent.
SELECT
name,
type,
ROUND(price *
CASE
WHEN type = 'Electronic' THEN 0.90
WHEN type = 'Clothing' THEN 0.80
WHEN type = 'Grocery' THEN 0.95
WHEN type = 'Book' THEN 0.85
ELSE 1.00
END, 2) AS discounted_price
FROM
products;CASEapplies the appropriate discount based on the product.ROUND (..., 2)rounds the price to the nearest cent.SELECTpicks the name, type, and discounted price.
8. Differences in Credit Cards Issued
You work for a bank like Capital One that wants to launch a new credit card. You are given a table with the number of credit cards issued each month that includes card_name, issued_amount, issue_month, and issue_year.
Output the name of each credit card and the difference in the total number of credit cards between the highest and lowest issuance month. Arrange the output based on the highest disparity.
SELECT
card_name,
MAX(issued_amount) - MIN(issued_amount) AS difference
FROM
monthly_cards_issued
GROUP BY
card_name
ORDER BY
difference DESC;MAX(issued_amount) - MIN(issued_amount)computes the difference between the highest and lowest issuance amounts for each card.GROUP BY card_namegroups the results by card name to calculate the difference for each card.ORDER BY difference DESCsorts the results by the difference in descending order.
9. Transactions By City
Fetch the number of transactions made by users in specific cities. Then, order by descending number of transactions.
You're given 2 tables. First, a users table with id, first_name, last_name, user_city, and email. You also have a transactions table with id, customer_id, product_id, currency_code, date, and amount.
Here's a sample solution:
SELECT
u.user_city,
COUNT(t.id) AS number_of_transactions
FROM
users u
JOIN
transactions t ON u.id = t.customer_id
GROUP BY
u.user_city
ORDER BY
number_of_transactions DESC;- The
userstable is joined with thetransactionstable on thecustomer_idto get the transactions for each user. COUNT(t.id)counts the number of transactions for each user city.- The results are grouped by
user_cityto aggregate the transaction counts per city. - The results are ordered by the
number_of_transactionsin descending order.
Intermediate SQL
Data scientists use SQL to clean, transform, and preprocess data. As your data science career advances, so too will your SQL interviews.
Your roles will begin to require you to be able to process, manipulate, and summarize larger datasets.
For this reason, mid-level data scientists should expect to get asked SQL questions about:
- Using different types of joins in more complex scenarios.
- Writing nested and correlated subqueries.
- Using functions like
ROW_NUMBER(),RANK(),DENSE_RANK(),LEAD(), andLAG(). - Using
CASE WHENfor conditional logic in queries. - Manipulating and calculating dates and times.
- Using
WITHclauses to define temporary result sets. - Using
UNION,INTERSECT, andEXCEPTto combine results from multiple queries. - Manipulating string data using functions like
CONCAT(),SUBSTRING(),REPLACE().
Here are some questions you can practice:
10. Revenue by Country
Given a table called purchase_orders, with quantity and price columns, compute the total_revenue. Group the total revenue by country, then order by the total revenue.
Here's a sample solution:
SELECT SUM(quantity * price) AS total_revenue
FROM purchase_orders
GROUP BY country
ORDER
BY total_revenue DESC;- The table doesn’t have a total revenue column. Compute it by multiplying the
quantityandpricecolumns. SUMthem to find the total revenue.
11. Categorize Orders by Size
Given a table of orders with quantity, categorize each order into "small" and "large" orders with a new size column using the CASE WHEN function.
Large orders have quantity >= 2.
SELECT
*,
CASE
WHEN quantity >= 2 THEN 'large'
ELSE 'small'
END AS size
FROM
orders;CASE WHENis similar to if-then-else statements. It's used to evaluate one or more conditions and returns a value based on its evaluation.
12. Orders Per Month
Given a table of orders with timestamps in the order_date column, aggregate and return the number of orders per month.
SELECT
DATE(order_date, 'start of month') AS order_month,
COUNT(*) AS no_orders
FROM orders
GROUP BY order_month;- For SQLite, use
DATE(column, 'start of month'). - For PostgreSQL, use
DATE_TRUNC('month', column). - For MySQL, use
EXTRACT(MONTH FROM column).
13. Retention Based on Signup and Churn Dates
Imagine you work at a company with monthly subscriptions. Given a table of users, compute the average retention based on the month of the signup_date and churn_date. Group the results by signup_month.
SELECT
date_trunc('month', signup_date) AS signup_month,
AVG(datediff('day', signup_date, churn_date)) AS avg_retention
FROM users
GROUP BY signup_month;- Find the difference in days between the
signup_dateand thechurn_dateusingdatediffwhile also finding the signup month.
14. Orders with Customer Data from March
You're given tables for customers and orders. There's a foreign key customer_id, which references customers.id. Return all orders with their associated customer data from March 2022.
Here's a sample solution:
SELECT orders.id as order_id, customers.*
FROM orders
INNER JOIN customers
ON orders.customer_id = customers.id
WHERE orders.order_date BETWEEN '2022-03-01' and '2022-04-01';orders.id as order_idrenames theidcolumn from theorderstable toorder_idfor clarity.customers.*selects all columns from thecustomerstable.INNER JOINjoins theordersandcustomerstables. The join condition is that thecustomer_idin theorderstable matches theidin thecustomerstable.WHEREandBETWEENfilter orders between March and April of 2022.
15. Customers with At Least One Order Using Subqueries
Given customers and orders tables with id and customer_id columns, use subqueries to return all customers who have at least one order.
Here's a sample solution:
SELECT *
FROM customers
WHERE (SELECT COUNT(*)
FROM orders
WHERE customers.id = orders.customer_id) > 0;- The subquery
SELECT COUNT()counts the number of orders for each customer. - The
WHEREclause ensures that only customers with at least one order are counted.
Alternatively:
SELECT *
FROM customers
WHERE id IN (SELECT customer_id FROM orders);- The subquery
SELECT customer_id FROM ordersretrieves allcustomer_idvalues from theorderstable. - The main query selects all columns from the customers table where the
idmatches any of thecustomer_idvalues returned by the subquery.
16. Customers with At Least One Order Using CTEs
Solve the same problem as above. This time, use Common Table Expressions (CTEs) instead of subqueries.
CTEs are used for breaking down complex queries into more manageable parts and are similar to subqueries, however they are defined before the main query. Subqueries, on the other hand, are defined within the main query.
CTEs can be referenced multiple times in a single query, whereas subqueries can't. However, subqueries are more efficient because the database engine easily optimizes them.
Here's a sample solution:
WITH customer_orders AS (
SELECT customer_id, COUNT(*) as num_orders
FROM orders
GROUP BY customer_id
)
SELECT customers.*
FROM customers
JOIN customer_orders
ON customers.id = customer_orders.customer_id
WHERE customer_orders.num_orders > 0;- The
WITHkeyword is used to define the CTE. TheASkeyword is used to define the query for populating it. - The name of the CTE is
customer_orders. The CTE is then used in the main query as any other table. - The
customer_ordersCTE in the main query finds customers with at least one order.
17. Orders Ranked by Total Price
Given an orders table with price and quantity columns, write a query to return a list of orders ranked by total price.
A sample answer might look like:
SELECT
orders.*,
SUM(quantity * price) AS total_price,
RANK() OVER (ORDER BY SUM(quantity * price) DESC) AS rank
FROM
orders
GROUP BY
id, customer_id;- Multiply the
priceandquantityto get the total price. - Use the
RANKwindow function. It's general syntax is:RANK() OVER ( [PARTITION BY partition_expression] [ORDER BY order_expression] ) - The
PARTITION BYandORDER BYclauses determine how the rows will be partitioned and ordered for processing by theRANKfunction. - The
RANKfunction assigns the same rank when there is a tie. DENSE RANKdoesn’t skip rank values for ties. If two rows have the same value, they will receive the same rank, but the next row will get the next rank instead of skipping the rank value like inRANK.
Advanced SQL
Senior data scientists are trusted to handle extremely large datasets.
Large companies like Meta, Microsoft, TikTok, and Google need data scientists who can clean, synthesize, and interpret large amounts of information.
In combination with machine learning, senior scientists help to make sense of trends and make predictions for the future.
Before stepping into a senior SQL interview in your data science loops, consider brushing up on concepts like:
- Window functions like
RANK,DENSE RANK,ROW_NUMBER, andLAGfor complex calculations and analytics. - Efficient queries and and execution plans.
- Using recursive CTEs for hierarchical or recursive data structures.
- Transforming rows into columns with pivot tables.
- Implementing intricate data transformations and aggregations.
- Understanding and using transactions, including
COMMIT,ROLLBACK, andSAVEPOINT. - Understanding normalization, indexing, and schema design principles.
Here are some questions you can practice:
18. Top Earning Employee by Department
Write a SQL query to fetch the top-earning employee by department, ordered by department name.
You've been given the following schema.
employees projects
+---------------+---------+ +---------------+---------+
| id | int |<----+ +->| id | int |
| first_name | varchar | | | | title | varchar |
| last_name | varchar | | | | start_date | date |
| salary | int | | | | end_date | date |
| department_id | int |--+ | | | budget | int |
+---------------+---------+ | | | +---------------+---------+
| | |
departments | | | employees_projects
+---------------+---------+ | | | +---------------+---------+
| id | int |<-+ | +--| project_id | int |
| name | varchar | +-----| employee_id | int |
+---------------+---------+ +---------------+---------+Your answer should be in the following format:
department_name | employee_id | first_name | last_name | salary
----------------+-------------+------------+-----------+--------
varchar | int | varchar | varchar | intHere's a sample solution:
WITH ranked_employees AS (
SELECT
e.id AS employee_id,
e.first_name,
e.last_name,
e.salary,
d.name AS department_name,
ROW_NUMBER() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS rank
FROM
employees e
JOIN
departments d ON e.department_id = d.id
)
SELECT
department_name,
employee_id,
first_name,
last_name,
salary
FROM
ranked_employees
WHERE
rank = 1
ORDER BY
department_name;- The CTE
ranked_employeesranks employees within each department based on their salary in descending order. - The
ROW_NUMBER()function is used to assign a rank to each employee within their department. - The main query selects the top-ranked employee (rank = 1) from each department. This results in only one top earner in each department.
- The
employeestable is joined with thedepartmentstable to getdepartment _name. The result is then ordered by department name.
19. Group Users by Number of Tweets in 2022
Given a tweets table with tweet_id, user_id, msg, and tweet_date, group the users by the number of tweets they posted in 2022. Count the number of users in each group.
WITH tweet_cte AS (
SELECT user_id, COUNT(*) as tweet_bucket
FROM tweets
WHERE EXTRACT(year from tweet_date) = 2022
GROUP BY user_id
)
SELECT tweet_bucket, COUNT(*) as users_num
FROM tweet_cte
GROUP BY tweet_bucket;- The
tweet_ctecounts tweets per user for 2022, resulting inuser_idandtweet_bucket. - The main query then groups and counts the users by
tweet_bucket.
20. Monthly Active Users in July 2022
You have a table of Facebook user_actions. Obtain the number of monthly active users in July 2022, including the month in numerical format "1, 2, 3."
The user_actions table has user_id, event_id, event_type, and event_date.
Here's a sample solution:
WITH june_actions AS (
SELECT DISTINCT user_id
FROM user_actions
WHERE event_date BETWEEN '2022-06-01' AND '2022-06-30'
AND event_type IN ('sign-in', 'like', 'comment')
),
july_actions AS (
SELECT DISTINCT user_id
FROM user_actions
WHERE event_date BETWEEN '2022-07-01' AND '2022-07-31'
AND event_type IN ('sign-in', 'like', 'comment')
)
SELECT 7 AS month, COUNT(*) AS monthly_active_users
FROM july_actions
WHERE user_id IN (SELECT user_id FROM june_actions);- The
june_actionsandjuly_actionsCTEs selectdistinct user_idswho performed relevant actions in June and July 2022, respectively. - The main query counts the number of users who appear in both
june_actionsandjuly_actions, thus identifying users active in both months.
21. Success Rate of Post When Previous Post Failed
Given post and post_user tables, find the success rate of posts (%) when the user's previous post had failed.
The user table contains post_id, post_date, user_id, interface, and is_successful_post. The post_user table contains user_id, user_type, and age.
Your output should have the following columns: user_id and next_post_sc_rate (success rate of the post when the user’s previous post had failed). Order results by increasing next_post_sc_rate.
Here's a sample solution:
WITH post_seq AS (
SELECT
p.user_id,
p.post_id,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY post_date) AS post_seq_id,
is_successful_post
FROM post as p
),
post_pairings AS (
SELECT
ps.user_id,
ps.post_seq_id AS fail_post_id,
ps.post_seq_id + 1 AS next_post_id
FROM post_seq AS ps
WHERE ps.is_successful_post = 0
)
SELECT
pp.user_id,
ROUND(SUM(p2.is_successful_post) * 1.0 / COUNT(p2.is_successful_post), 2) AS next_post_sc_rate
FROM post_pairings AS pp
JOIN post AS p2
ON pp.next_post_id = p2.post_id
GROUP BY 1
ORDER BY next_post_sc_rate ASC;- The
post_seqCTE assigns a sequential IDpost_seq_idto each post per user based on thepost_date. - The
post_pairingsCTE identifies pairs of posts where the previous post was unsuccessful. Thenext_post_idis the sequential ID of the post following the unsuccessful post. - The final
SELECTjoins thepost_pairingswith theposttable to get details about the next post. It computes the success rate of the next posts following an unsuccessful post. SUM()calculates the ratio of successful next posts to the total number of next posts.ROUND(..., 2)rounds the success rate to two decimal places.GROUP BYgroups byuser_idand orders the results bynext_post_sc_ratein ascending order.
22. Identify Repeat Payments
Identify any payments made at the same merchant using the same credit card with the same amount within 10 minutes of each other. Such transactions should be considered as repeat payments.
The first transaction of such payments shouldn't be considered as a repeat payment. This means that if there are three transactions by a merchant with the same credit card, for the same amount within 10 minutes, only 2 payments will be considered as repeated.
The transactions table contains the following columns: transaction_id, merchant_id, credit_card_id, amount, and transaction_timestamp.
Here's a sample solution:
WITH PreviousTransactions AS (
SELECT
transaction_id,
merchant_id,
credit_card_id,
amount,
transaction_timestamp,
LAG(transaction_timestamp) OVER (
PARTITION BY merchant_id, credit_card_id, amount
ORDER BY transaction_timestamp
) AS previous_transaction
FROM transactions
),
RepeatPayments AS (
SELECT
transaction_id,
merchant_id,
credit_card_id,
amount,
transaction_timestamp
FROM PreviousTransactions
WHERE previous_transaction IS NOT NULL
AND transaction_timestamp <= previous_transaction + INTERVAL '10 MINUTE'
)
SELECT
COUNT(*) AS payment_count
FROM RepeatPayments;- The
PreviousTransactionsCTE usesLAG()to get the timestamp of the previous transaction for each partition ofmerchant_id,credit_card_id, andamount. It orders them bytransaction_timestampto ensure thatLAG()correctly identifies the previous transaction. RepeatPaymentsfilters transactions where there is a validprevious_transactionand thetransaction_timestampof the current transaction is within 10 minutes of theprevious_transaction.
23. Top Customers by Order Count in Last 5 Years
Given orders, departments, and customers tables, identify the customers who placed the most orders for each of the last 5 years.
The orders table has order_id, customer_id, order_date, order_amount, and department_id. The departments table has department_id and department_name. And the customers table has customer_id, first_name, and last_name.
Output a list of the 5 top customers. Your output should have the following columns: customer_id, first_name, last_name, and years.
WITH orders_per_year AS (
SELECT
c.customer_id,
c.first_name,
c.last_name,
EXTRACT(YEAR FROM o.order_date) AS years, -- PostgreSQL function for extracting the year
COUNT(o.order_id) AS total_orders,
rank() OVER (PARTITION BY EXTRACT(YEAR FROM o.order_date) ORDER BY COUNT(o.order_id) DESC) AS order_rankings
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE EXTRACT(YEAR FROM o.order_date) IN (2019, 2020, 2021, 2022, 2023)
GROUP BY EXTRACT(YEAR FROM o.order_date), c.customer_id, c.first_name, c.last_name
)
SELECT
first_name,
last_name,
total_orders,
years
FROM orders_per_year
WHERE order_rankings = 1
ORDER BY years ASC;24. Calculate Total Server Uptime
A cloud service provider like AWS wants to optimize the usage of its fleet of servers. Calculate the total time that the fleet of servers was running in days. The total time a server fleet is running is the sum of the server's uptime.
A server can start and stop several times. The server_utilization table has the following columns: server_id, status_time, and session_status (start, stop).
The expected output is total_uptime_days.
WITH SessionDurations AS (
SELECT
server_id,
status_time AS start_time,
LEAD(status_time) OVER (PARTITION BY server_id ORDER BY status_time) AS stop_time,
session_status
FROM server_utilization
),
FilteredSessions AS (
SELECT
server_id,
start_time,
stop_time
FROM SessionDurations
WHERE session_status = 'start' AND stop_time IS NOT NULL
),
CalculatedDurations AS (
SELECT
server_id,
EXTRACT(EPOCH FROM (stop_time - start_time)) AS uptime_seconds
FROM FilteredSessions
)
SELECT
ROUND(SUM(uptime_seconds) / (24 * 60 * 60), 0) AS total_uptime_days
FROM
CalculatedDurations;- In the
SessionDurationsCTE,LEADpairs each start time with the next stop time for the sameserver_id. - In
FilteredSessionsCTE, we filter rows where session_status = 'start' and stop_time is NOT NULL. - The
CalculatedDurationsCTE gets the duration of each session in seconds. - The final
SELECTsums theuptime_secondsand converts them to days by dividing by24 * 60 * 60.
25. Top 2 Players by Highest Score
You work for a leading game development company where players can team up and compete. Each player's performance in different game sessions is recorded as distinct score entries in the database.
You're provided a players table with player_id, player_name, and team_id columns and a scores table with score_id, player_id, and game_score.
Return the top 2 players from each team based on their single highest score across all sessions. If multiple players share the same highest score, include all of them, which may result in more than two top players for some teams.
Here's an example solution:
WITH PlayerMaxScores AS (
SELECT
p.team_id,
p.player_name,
MAX(s.game_score) AS max_score
FROM
players p
JOIN
scores s ON p.player_id = s.player_id
GROUP BY
p.team_id, p.player_name
),
RankedPlayers AS (
SELECT
team_id,
player_name,
max_score,
DENSE_RANK() OVER (PARTITION BY team_id ORDER BY max_score DESC) AS rank
FROM
PlayerMaxScores
)
SELECT
team_id,
player_name,
max_score
FROM
RankedPlayers
WHERE
rank <= 2
ORDER BY
team_id, max_score DESC, player_name;- The
PlayerMaxScoresCTE aggregates the maximum score for each player. - The
DENSE_RANK()window function in theRankedPlayersCTE assigns a rank to each player within their team based on their maximum score. DENSE_RANK()ensures that players with the same score get the same rank.- The final
SELECTpicks the top two players from each team.
SQL Interview Tips
It is impossible to cover all the possible questions since SQL is a broad and varied technology!
Hopefully, these questions have given you a glimpse into what to expect in your data science interviews.
- Explore dozens of mock interviews and practice lessons in our data science interview course.
- Schedule a free mock interview session to practice answering questions with other peers.
- Get data science interviewing coaching from scientists at top companies.
Good luck with your upcoming interview!
Learn everything you need to ace your data science interviews.
Exponent is the fastest-growing tech interview prep platform. Get free interview guides, insider tips, and courses.
Create your free accountRelated Courses

Data Science Interview Prep

SQL Interviews
Related Blog Posts

Top 25 Python Data Science Interview Questions (2026 Guide)

Top 25 Statistics Data Science Interview Questions (2026 Guide)

Data Science Career Path: Your Complete Guide


