Skip to main content

Grouping, Having, Conditional Aggregation & NULL-Safe Metrics

A huge share of interview questions boil down to computing the right metric at the right grain and doing it safely when data is messy.

In this lesson, we will focus on GROUP BY fundamentals, when to filter with WHERE vs. HAVING, and how to build metrics using conditional aggregation and ratios like conversion rate. You’ll also learn practical NULL-handling patterns so your metrics don’t silently break.

Example prompts include:

“Compute conversion rate by device.”

and

“Find creators with at least 3 active weeks.”

All examples below use the users, events, and orders tables in "Example data" section (at the bottom). Refer back to that section if you want to confirm keys or relationships.

WHERE vs HAVING

When do filter rows vs filter groups

“US signups” (use WHERE)

Interview prompt: Count signups in the US.

Idea: row filter before grouping → use WHERE

SQL
SELECT country, COUNT(*) AS signups FROM users WHERE country = 'US' GROUP BY country;

Output

countrysignups
US2

Explanation

  • WHERE filters rows (users from US) before grouping.

Clarifying questions

  • Filter rows (pre-agg) or groups by an aggregate result?

Common pitfalls

  • Putting a non-aggregate filter in HAVING (works but is slower/unclear).

Edge cases

  • If you later need to show all countries but only keep those with, say, ≥2 signups, that becomes a HAVING filter (next).

“Countries with ≥ 2 signups” (group filter → use HAVING)

Interview prompt: Return countries with at least 2 signups.

SQL
SELECT country, COUNT(*) AS signups FROM users GROUP BY country HAVING COUNT(*) >= 2;

Output

countrysignups
US2

Explanation

  • HAVING filters groups after aggregation (keep only groups where COUNT >= 2).

Clarifying questions

  • Threshold applies per group, right?
  • Include countries with zero signups? (impossible without a countries dim table and LEFT JOIN)

Common pitfalls

  • Using WHERE COUNT(*) >= 2 (invalid).
  • Misusing HAVING for simple row filters.

Edge cases

  • With LEFT JOIN to a dim table, HAVING can keep groups with zero signups if you pre-fill with 0 using COALESCE.

Conditional aggregation

Interview prompt: Find the number of signups, verified, activated users for each country.

SQL
SELECT u.country, COUNT(*) AS signups, SUM(CASE WHEN e1.has_verified = 1 THEN 1 ELSE 0 END) AS verified, SUM(CASE WHEN e2.has_activated = 1 THEN 1 ELSE 0 END) AS activated FROM users u LEFT JOIN ( SELECT user_id, 1 AS has_verified FROM events WHERE event_name = 'email_verified' GROUP BY user_id ) e1 ON e1.user_id = u.user_id LEFT JOIN ( SELECT user_id, 1 AS has_activated FROM events WHERE event_name = 'activated' GROUP BY user_id ) e2 ON e2.user_id = u.user_id GROUP BY u.country ORDER BY u.country;

Output

countrysignupsverifiedactivated
SG210
US211

Explanation

  • We keep all users by country and add boolean flags for downstream steps via LEFT JOINs to pre-aggregated 1-row-per-user tables, then sum the flags.

Clarifying questions

  • Is the funnel ever repeated per user? Count unique users once?
  • Time window for events?” (apply date filters inside the subqueries)

Common pitfalls

  • Joining the raw events table directly (1:n) → duplicates inflate counts.
  • Filtering event types in the outer WHERE after a LEFT JOIN → turns into INNER JOIN (drops users with no events).

Edge cases

  • Multiple email_verified events per user—pre-agg removes duplication.
  • Country moves after signup—decide if country should come from the user at signup time (snapshot needed).

Conversion rates & NULL-safe division

Interview prompt: Find verified rate by country (verified / signups).

SQL
SELECT country, COUNT(*) AS signups, SUM(CASE WHEN ev.user_id IS NOT NULL THEN 1 ELSE 0 END) AS verified, ROUND( 100.0 * SUM(CASE WHEN ev.user_id IS NOT NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 1 ) AS verified_rate_pct FROM users u LEFT JOIN ( SELECT DISTINCT user_id FROM events WHERE event_name = 'email_verified' ) ev ON ev.user_id = u.user_id GROUP BY country ORDER BY country;

Output

countrysignupsverifiedverified_rate_pct
SG2150.0
US2150.0

Explanation

  • Denominator is COUNT(*) from users (signups).
  • Numerator is count of users who verified (via LEFT JOIN to DISTINCT user_id).
  • NULLIF(COUNT(*), 0) prevents divide-by-zero. 100.0 forces float math.

Clarifying questions

  • “Rate over signups or over all users? Which date range?”
  • “Round to how many decimals? Percentage points or fraction?”

Common pitfalls

  • Integer division (0/1=0). Always multiply by 100.0 or cast to decimal.
  • Using COUNT(ev.user_id) vs SUM(ev.user_id IS NOT NULL)—both are fine, but ensure NULL semantics don’t bite you.

Edge cases

  • Countries with zero signups → division by zero (handled by NULLIF).
  • Multiple verify events per user → DISTINCT user_id (or pre-agg 1 row/user).

COUNT semantics & LEFT JOINs

Interview prompt: Show how many activated events each user has (keep users with 0).

SQL
SELECT u.user_id, COUNT(*) AS row_count_after_join, COUNT(ev.user_id) AS matched_events FROM users u LEFT JOIN events ev ON ev.user_id = u.user_id AND ev.event_name = 'activated' GROUP BY u.user_id ORDER BY u.user_id;

Output

user_idrow_count_after_joinmatched_events
111
210
310
410

Explanation

  • After a LEFT JOIN, users with no match still produce one null-extended row.
  • COUNT(*) counts that row (1). COUNT(ev.user_id) ignores NULL, so it’s 0.

Clarifying questions

  • “Do you want to count rows or count matches?”
  • “Should a user with zero matches appear? (LEFT vs INNER)”

Common pitfalls

  • Expecting COUNT(*) to be 0 when there’s no match—it’s 1 (the null-extended row).
  • Forgetting the filter event_name = ... on the JOIN itself → WHERE would turn LEFT into INNER.

Edge cases

  • If multiple matching events exist, row_count_after_join grows (duplication); prefer pre-agg to 1 row/user when needed.

Percentages

Interview prompt: Find paid revenue share by country.

SQL
WITH paid_rev AS ( SELECT u.country, SUM(o.amount) AS rev FROM orders o JOIN users u ON u.user_id = o.user_id WHERE o.status = 'paid' GROUP BY u.country ), total AS ( SELECT SUM(rev) AS total_rev FROM paid_rev ) SELECT p.country, p.rev, ROUND(100.0 * p.rev / NULLIF(t.total_rev, 0), 1) AS share_pct FROM paid_rev p CROSS JOIN total t ORDER BY p.country;

Output

countryrevshare_pct
SG00.0
US70100.0

Explanation

  • Compute per-country revenue from paid orders, then divide by the grand total. NULLIF protects against total 0; using a CTE avoids re-aggregating.

Clarifying questions

  • “Include only paid? Refunds/chargebacks?”
  • “Share across what scope—country, region, or overall?”

Common pitfalls

  • Dividing by SUM(amount) within each row without the grand total → every group shows 100%.
  • Forgetting to filter to paid → inflates denominators.

Edge cases

  • No paid revenue at all → total_rev = 0 (share becomes NULL; you may want 0.0).

HAVING with conditional aggregates

Interview prompt: Which countries have a verified rate is at least 50%.

SQL
SELECT u.country, COUNT(*) AS signups, SUM(CASE WHEN ev.user_id IS NOT NULL THEN 1 ELSE 0 END) AS verified FROM users u LEFT JOIN (SELECT DISTINCT user_id FROM events WHERE event_name='email_verified') ev ON ev.user_id = u.user_id GROUP BY u.country HAVING 1.0 * SUM(CASE WHEN ev.user_id IS NOT NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0) >= 0.50;

Output

countrysignupsverified
SG21
US21

(Both at 50%; if you need strictly greater than 50%, use > 0.50.)

Explanation

  • We filter groups (countries) based on an aggregate ratio (verified/signups).

Clarifying questions

  • ≥ 50% or > 50%? Round before compare or compare raw ratio?
  • Minimum signup count required? (add AND COUNT(*) >= N in HAVING)

Common pitfalls

  • Doing the ratio in WHERE (invalid).
  • Integer math → 0; force decimals with 1.0 * or casts.

Edge cases

  • If COUNT(*) is 0 → ratio is NULL (kept/dropped depending on predicate).

NULL-safe metrics patterns

Pattern 1: COUNT users who did X

SQL
SUM(CASE WHEN x_condition THEN 1 ELSE 0 END) -- or COUNT(CASE WHEN x_condition THEN 1 END) in engines that support COUNT of CASE

If COUNT was used instead of SUM. 0 will be counted as well. In this scenario, instead of 0 we have to use NULL to make sure the row is not counted when conditions are not met

Pattern 2: Rate with safe division & float math

SQL
100.0 * numerator / NULLIF(denominator, 0)

Pattern 3: Share that sums to 100

  • Compute group totals in CTE; CROSS JOIN total; divide; NULLIF guard.

Pattern 4: Keep zeros after LEFT JOIN

SQL
COALESCE(agg_value, 0) AS metric

Pattern 5: Distinct user actions

SQL
COUNT(DISTINCT user_id) -- beware performance; consider approx distinct if allowed

Example data

We used these tables in the examples above to demonstrate aggregate functions.

users

user_idcountrysigned_up_at
1US2025-01-01
2US2025-01-02
3SG2025-01-02
4SG2025-01-03

events (one row per user action)

user_idevent_namets
1signup2025-01-01
1email_verified2025-01-01
1activated2025-01-02
2signup2025-01-02
3signup2025-01-02
3email_verified2025-01-03
4signup2025-01-03

orders

order_iduser_idstatusamountcreated_at
101paid502025-01-03
111paid202025-01-05
122draft152025-01-03

Questions

Warm up

Practice

Go deeper

Want to dive deeper into SQL aggregations? Check out our more detailed lesson here: Introduction to SQL Aggregations