Window Functions Essentials
Window functions are a core interview skill because they let you compute rankings, running totals, lag/lead comparisons, and “top N per group” without losing row-level detail.
In this lesson, we will cover how window functions work, the difference between windowed aggregates vs. GROUP BY, and when to use ROW_NUMBER, RANK, DENSE_RANK, LAG, and LEAD. You’ll learn the mental model to avoid common pitfalls (like incorrect partitions or ordering) and to recognize when a window function is the cleanest solution.
Example prompts include:
“Find each user’s most recent purchase.”
and
“Compute a 7-day rolling average of daily revenue.”
All examples below use the sales and orders tables in "Example data" section (at the bottom). Refer to that section if you want to confirm columns or relationships.
Top-K per group
Interview prompt: Find the top 2 orders per user.
SQLSELECT *
FROM (
SELECT o.*,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY amount DESC, o_id DESC
) AS rn
FROM orders o
) t
WHERE rn <= 2
ORDER BY user_id, rn;
Output
What this does
- For each user, takes the highest 2 orders by amount.
Why this works
- ROW_NUMBER() resets per user and ensures exactly two rows at most.
Gotchas
- If you want all ties at rank 2, use RANK() instead.
Clarifying questions
- Top by which metric?
- If tied, how should we break ties?
- Exactly K or all ties within K?
Common pitfalls
- Using RANK() and accidentally returning more than 2 rows.
- Missing deterministic tiebreaker.
Edge cases
- Use DENSE_RANK() for no gaps in ranking.
Latest row per key
Interview prompt: Find each user’s latest order.
SQLSELECT o_id, user_id, created_at, amount
FROM (
SELECT o.*,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY created_at DESC, o_id DESC
) AS rn
FROM orders o
) t
WHERE rn = 1
ORDER BY user_id;
Output
What this does
- Returns the most recent order per user.
Why this works
- Descending order + ROW_NUMBER() ensures one unambiguous “latest.”
Gotchas
- If status matters (e.g., only paid orders), filter inside the subquery.
Clarifying questions
- Latest by which timestamp?
- What if multiple rows have identical timestamps?
Common pitfalls
- Using MAX() then joining back without a unique key.
Edge cases
- Want latest 3? Use
rn <= 3.
Time deltas
Interview prompt: For each user-day, show previous spend and day-over-day change (row-to-row).
SQLSELECT user_id, dt, spend,
LAG(spend) OVER (
PARTITION BY user_id
ORDER BY dt
) AS prev_spend,
spend - LAG(spend) OVER (
PARTITION BY user_id
ORDER BY dt
) AS delta_spend
FROM sales
ORDER BY user_id, dt;
Output
What this does
- Shows previous row’s spend and the change from it.
Why this works
- LAG() exposes prior row value without collapsing rows.
Gotchas
- This is “previous row,” not “previous calendar day.”
Clarifying questions
- Should missing dates be treated as zero? Or fill forward?
- Should first row default to NULL or 0?
Common pitfalls
- Forgetting PARTITION BY.
- Assuming contiguous dates.
Edge cases
- Use default: LAG(spend, 1, 0).
- For true calendar deltas, generate a dense date table.
Running totals
Interview prompt: Compute cumulative spend per user over time.
SQLSELECT user_id, dt, spend, SUM(spend) OVER ( PARTITION BY user_id ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cum_spend FROM sales ORDER BY user_id, dt;
Output
What this does
- Cumulative spend per user.
Why this works
- Explicit ROWS frame prevents peer-row inflation (a known issue with RANGE when ordering by timestamps).
Clarifying questions
- Reset by which dimension?
- What happens when timestamps tie?
Common pitfalls
- Relying on implicit RANGE (which groups peer rows).
- Missing deterministic ordering.
Edge cases
- Rolling windows require changing the frame clause.
Rolling windows
Last 2 rows per user
Interview prompt: Compute a rolling sum over the current + previous row (per user).
SQLSELECT user_id, dt, spend, SUM(spend) OVER ( PARTITION BY user_id ORDER BY dt ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) AS sum_last_2_rows FROM sales ORDER BY user_id, dt;
Output
What this does
- Rolling sum of current + previous row.
Why this works
- ROWS refers to physical rows, so date gaps are irrelevant.
Gotchas
- Not equivalent to “last 2 days” if dates skipped.
Last N days
Interview prompt: Compute a time-based rolling sum over the last 2 calendar days (per user).
SQLSELECT user_id, dt, spend, SUM(spend) OVER ( PARTITION BY user_id ORDER BY dt RANGE BETWEEN INTERVAL '2 days' PRECEDING AND CURRENT ROW ) AS sum_last_2_days FROM sales ORDER BY user_id, dt;
What this does
- Includes rows within the last 2 calendar days.
Why this works
- RANGE INTERVAL creates a time-based frame.
Clarifying questions
- What if multiple rows share a timestamp?
- Does SQL engine support INTERVAL windows? (not all do)
Common pitfalls
- Using RANGE INTERVAL in Postgres versions that don’t support it.
- Ties may include peer rows without secondary sort order.
Edge cases
- If multiple rows exist for same date, pre-aggregate or add tiebreaker.
Example data
We used these tables in the examples above to demonstrate window functions.
sales (daily spend per user)
orders (one row per order)
Questions
Warm up
Practice
- Netflix genre ratings (Medium)
- Amazon order status (Hard)
- Find monthly revenue growth (Hard)
- E-commerce: Second earliest order (Hard)
Go deeper
Want to dive deeper into window functions? Check out our more detailed lesson here: Window Functions