Joins & Duplicate Control
Joins show up in almost every SQL interview because most real datasets are split across multiple tables, and the hardest part is usually controlling row explosion and getting the right level of aggregation.
In this lesson, you’ll learn how to pick the right join type, diagnose many-to-many joins, and prevent duplicates with patterns like pre-aggregating, joining on unique keys, and using DISTINCT/deduping intentionally. You’ll also see common “existence” patterns (find entities that have / don’t have something) and how to avoid double-counting after joins.
Example prompts include:
“Count weekly active users by country from users + event.”
and
“Find customers who purchased in 2025 but not 2024.”
All examples below use the users, orders, payments, and order_items tables in "Example data" section (at the bottom). Refer to that section if you want to confirm keys or relationships.
Which join to use?
- INNER JOIN: keep rows that match on both sides.
- LEFT JOIN: keep all left rows; matched right rows if present; otherwise NULLs.
- RIGHT JOIN: mirror of LEFT; rarer (you should just swap table order instead).
- FULL JOIN: keep every row from both sides (not in MySQL).

Existence joins
A lot of “has / has-not” interview questions are really existence checks: return rows from the left table based on whether a matching row appears on the right.
“Has”
Interview prompt: Find users who placed at least one order.
SQLSELECT users.u_id, users.name
FROM users
INNER JOIN orders ON orders.u_id = users.u_id;
Output (each order makes a row):
If you want each user once:
SQLSELECT DISTINCT users.u_id, users.name
FROM users
INNER JOIN orders ON orders.u_id = users.u_id;
Output (deduped):
Clarifying questions
- Should users appear once or once per order?
- Do only status='paid' orders count?
Common pitfalls
- Expecting one row per user from an INNER JOIN when the right side is one-to-many.
Edge cases
- If filtering by order status/date, include those predicates (see warm-up question 1 at the bottom of this lesson).
“Has not”
Interview prompt: Find users with no orders.
Query (LEFT join + IS NULL on the join key)
SQLSELECT users.u_id, users.name
FROM users
LEFT JOIN orders ON orders.u_id = users.u_id
WHERE orders.u_id IS NULL;
Output
Clarifying questions
- No orders ever, or none in a date/status window?
Common pitfalls
- Using WHERE orders.status IS NULL (wrong): matched rows with NULL status would be miscounted.
- Putting right-side filters in WHERE turns your LEFT into an INNER. Put them on the JOIN line.
Edge cases
- Multiple orders per user: LEFT + IS NULL is still correct (any match cancels the NULL).
Duplicate control (avoid row inflation)
When you join across one-to-many tables, row counts can multiply and totals can get double-counted. A common fix is to pre-aggregate the “many” side to a unique key (here: one row per order), then join.
Pre-aggregate & join
Interview prompt: Find total paid per user when orders can have multiple payments.
Idea: Compute one payment total per order, then sum by user.
SQLWITH pay_per_order AS (
SELECT o_id, SUM(amount) AS paid_amount
FROM payments
GROUP BY o_id
)
SELECT users.u_id,
COALESCE(SUM(pay_per_order.paid_amount), 0) AS total_paid
FROM users
LEFT JOIN orders ON orders.u_id = users.u_id
LEFT JOIN pay_per_order ON pay_per_order.o_id = orders.o_id
GROUP BY users.u_id
ORDER BY users.u_id;
Output
Clarifying questions
- What grain is the result (per user / per order / per product)?
- Include users with zero spend?
Common pitfalls
- Summing a value from two “many” tables after joining both (double counting).
Edge cases
- Duplicate payment rows: de-duplicate first or use SUM(DISTINCT amount) carefully.
Quick rules of thumb
- Join types:
- INNER = matched only
- LEFT = keep all left
- FULL only when you need all rows from both sides
- For “has-not”, use LEFT … WHERE right.key IS NULL
- To prevent double counting: pre-aggregate the “many” side or pick latest per key
- Always set the grain (per user? per order?)
- When using LEFT JOIN, place right-side filters on the JOIN line
Example data
These tables are used in examples above to demonstrate to join functions.
users
orders
payments (revenue comes from here)
order_items (many items per order → duplication risk in joins)
Questions
Warm up
Practice
- Find campaign purchases (Medium)
- Project budgets (Medium)
- Sales by customer city (Medium)
- Total transaction volume (Hard)
Go deeper
Want to dive deeper into joins? Check out our more detailed lesson here: Joins