Skip to main content

Joins & Duplicate Control

Joins show up in almost every SQL interview because most real datasets are split across multiple tables, and the hardest part is usually controlling row explosion and getting the right level of aggregation.

In this lesson, you’ll learn how to pick the right join type, diagnose many-to-many joins, and prevent duplicates with patterns like pre-aggregating, joining on unique keys, and using DISTINCT/deduping intentionally. You’ll also see common “existence” patterns (find entities that have / don’t have something) and how to avoid double-counting after joins.

Example prompts include:

“Count weekly active users by country from users + event.”

and

“Find customers who purchased in 2025 but not 2024.”

All examples below use the users, orders, payments, and order_items tables in "Example data" section (at the bottom). Refer to that section if you want to confirm keys or relationships.

Which join to use?

  • INNER JOIN: keep rows that match on both sides.
  • LEFT JOIN: keep all left rows; matched right rows if present; otherwise NULLs.
  • RIGHT JOIN: mirror of LEFT; rarer (you should just swap table order instead).
  • FULL JOIN: keep every row from both sides (not in MySQL).

Join Diagram

Existence joins

A lot of “has / has-not” interview questions are really existence checks: return rows from the left table based on whether a matching row appears on the right.

“Has”

Interview prompt: Find users who placed at least one order.

SQL
SELECT users.u_id, users.name FROM users INNER JOIN orders ON orders.u_id = users.u_id;

Output (each order makes a row):

u_idname
1Ana
1Ana
3Cam

If you want each user once:

SQL
SELECT DISTINCT users.u_id, users.name FROM users INNER JOIN orders ON orders.u_id = users.u_id;

Output (deduped):

u_idname
1Ana
3Cam

Clarifying questions

  • Should users appear once or once per order?
  • Do only status='paid' orders count?

Common pitfalls

  • Expecting one row per user from an INNER JOIN when the right side is one-to-many.

Edge cases

  • If filtering by order status/date, include those predicates (see warm-up question 1 at the bottom of this lesson).

“Has not”

Interview prompt: Find users with no orders.

Query (LEFT join + IS NULL on the join key)

SQL
SELECT users.u_id, users.name FROM users LEFT JOIN orders ON orders.u_id = users.u_id WHERE orders.u_id IS NULL;

Output

u_idname
2Ben

Clarifying questions

  • No orders ever, or none in a date/status window?

Common pitfalls

  • Using WHERE orders.status IS NULL (wrong): matched rows with NULL status would be miscounted.
  • Putting right-side filters in WHERE turns your LEFT into an INNER. Put them on the JOIN line.

Edge cases

  • Multiple orders per user: LEFT + IS NULL is still correct (any match cancels the NULL).

Duplicate control (avoid row inflation)

When you join across one-to-many tables, row counts can multiply and totals can get double-counted. A common fix is to pre-aggregate the “many” side to a unique key (here: one row per order), then join.

Pre-aggregate & join

Interview prompt: Find total paid per user when orders can have multiple payments.

Idea: Compute one payment total per order, then sum by user.

SQL
WITH pay_per_order AS ( SELECT o_id, SUM(amount) AS paid_amount FROM payments GROUP BY o_id ) SELECT users.u_id, COALESCE(SUM(pay_per_order.paid_amount), 0) AS total_paid FROM users LEFT JOIN orders ON orders.u_id = users.u_id LEFT JOIN pay_per_order ON pay_per_order.o_id = orders.o_id GROUP BY users.u_id ORDER BY users.u_id;

Output

u_idtotal_paid
170
20
330

Clarifying questions

  • What grain is the result (per user / per order / per product)?
  • Include users with zero spend?

Common pitfalls

  • Summing a value from two “many” tables after joining both (double counting).

Edge cases

  • Duplicate payment rows: de-duplicate first or use SUM(DISTINCT amount) carefully.

Quick rules of thumb

  • Join types:
    • INNER = matched only
    • LEFT = keep all left
    • FULL only when you need all rows from both sides
  • For “has-not”, use LEFT … WHERE right.key IS NULL
  • To prevent double counting: pre-aggregate the “many” side or pick latest per key
  • Always set the grain (per user? per order?)
  • When using LEFT JOIN, place right-side filters on the JOIN line

Example data

These tables are used in examples above to demonstrate to join functions.

users

u_idname
1Ana
2Ben
3Cam

orders

o_idu_idstatuscreated_at
101paid2025-01-02
111draft2025-01-05
123paid2025-01-03

payments (revenue comes from here)

p_ido_idamountpaid_at
110502025-01-02
210202025-01-05
312302025-01-03

order_items (many items per order → duplication risk in joins)

o_idpr_idqtyprice
10101150
10102120
11103110
12101130

Questions

Warm up

Practice

Go deeper

Want to dive deeper into joins? Check out our more detailed lesson here: Joins