Grouping, Having, Conditional Aggregation & NULL-Safe Metrics
A huge share of interview questions boil down to computing the right metric at the right grain and doing it safely when data is messy.
In this lesson, we will focus on GROUP BY fundamentals, when to filter with WHERE vs. HAVING, and how to build metrics using conditional aggregation and ratios like conversion rate. You’ll also learn practical NULL-handling patterns so your metrics don’t silently break.
Example prompts include:
“Compute conversion rate by device.”
and
“Find creators with at least 3 active weeks.”
All examples below use the users, events, and orders tables in "Example data" section (at the bottom). Refer back to that section if you want to confirm keys or relationships.
WHERE vs HAVING
When do filter rows vs filter groups
“US signups” (use WHERE)
Interview prompt: Count signups in the US.
Idea: row filter before grouping → use WHERE
SQLSELECT country, COUNT(*) AS signups
FROM users
WHERE country = 'US'
GROUP BY country;
Output
Explanation
WHEREfilters rows (users from US) before grouping.
Clarifying questions
- Filter rows (pre-agg) or groups by an aggregate result?
Common pitfalls
- Putting a non-aggregate filter in
HAVING(works but is slower/unclear).
Edge cases
- If you later need to show all countries but only keep those with, say, ≥2 signups, that becomes a
HAVINGfilter (next).
“Countries with ≥ 2 signups” (group filter → use HAVING)
Interview prompt: Return countries with at least 2 signups.
SQLSELECT country, COUNT(*) AS signups
FROM users
GROUP BY country
HAVING COUNT(*) >= 2;
Output
Explanation
HAVINGfilters groups after aggregation (keep only groups whereCOUNT >= 2).
Clarifying questions
- Threshold applies per group, right?
- Include countries with zero signups? (impossible without a countries dim table and LEFT JOIN)
Common pitfalls
- Using
WHERE COUNT(*) >= 2(invalid). - Misusing
HAVINGfor simple row filters.
Edge cases
- With
LEFT JOINto a dim table,HAVINGcan keep groups with zero signups if you pre-fill with 0 usingCOALESCE.
Conditional aggregation
Interview prompt: Find the number of signups, verified, activated users for each country.
SQLSELECT
u.country,
COUNT(*) AS signups,
SUM(CASE WHEN e1.has_verified = 1 THEN 1 ELSE 0 END) AS verified,
SUM(CASE WHEN e2.has_activated = 1 THEN 1 ELSE 0 END) AS activated
FROM users u
LEFT JOIN (
SELECT user_id, 1 AS has_verified
FROM events
WHERE event_name = 'email_verified'
GROUP BY user_id
) e1 ON e1.user_id = u.user_id
LEFT JOIN (
SELECT user_id, 1 AS has_activated
FROM events
WHERE event_name = 'activated'
GROUP BY user_id
) e2 ON e2.user_id = u.user_id
GROUP BY u.country
ORDER BY u.country;
Output
Explanation
- We keep all users by country and add boolean flags for downstream steps via LEFT JOINs to pre-aggregated 1-row-per-user tables, then sum the flags.
Clarifying questions
- Is the funnel ever repeated per user? Count unique users once?
- Time window for events?” (apply date filters inside the subqueries)
Common pitfalls
- Joining the raw
eventstable directly (1:n) → duplicates inflate counts. - Filtering event types in the outer
WHEREafter a LEFT JOIN → turns into INNER JOIN (drops users with no events).
Edge cases
- Multiple
email_verifiedevents per user—pre-agg removes duplication. - Country moves after signup—decide if country should come from the user at signup time (snapshot needed).
Conversion rates & NULL-safe division
Interview prompt: Find verified rate by country (verified / signups).
SQLSELECT
country,
COUNT(*) AS signups,
SUM(CASE WHEN ev.user_id IS NOT NULL THEN 1 ELSE 0 END) AS verified,
ROUND(
100.0 * SUM(CASE WHEN ev.user_id IS NOT NULL THEN 1 ELSE 0 END)
/ NULLIF(COUNT(*), 0), 1
) AS verified_rate_pct
FROM users u
LEFT JOIN (
SELECT DISTINCT user_id
FROM events
WHERE event_name = 'email_verified'
) ev ON ev.user_id = u.user_id
GROUP BY country
ORDER BY country;
Output
Explanation
- Denominator is
COUNT(*)fromusers(signups). - Numerator is count of users who verified (via LEFT JOIN to
DISTINCT user_id). NULLIF(COUNT(*), 0)prevents divide-by-zero.100.0forces float math.
Clarifying questions
- “Rate over signups or over all users? Which date range?”
- “Round to how many decimals? Percentage points or fraction?”
Common pitfalls
- Integer division (0/1=0). Always multiply by
100.0or cast to decimal. - Using
COUNT(ev.user_id)vsSUM(ev.user_id IS NOT NULL)—both are fine, but ensure NULL semantics don’t bite you.
Edge cases
- Countries with zero signups → division by zero (handled by
NULLIF). - Multiple verify events per user →
DISTINCT user_id(or pre-agg 1 row/user).
COUNT semantics & LEFT JOINs
Interview prompt: Show how many activated events each user has (keep users with 0).
SQLSELECT
u.user_id,
COUNT(*) AS row_count_after_join,
COUNT(ev.user_id) AS matched_events
FROM users u
LEFT JOIN events ev
ON ev.user_id = u.user_id
AND ev.event_name = 'activated'
GROUP BY u.user_id
ORDER BY u.user_id;
Output
Explanation
- After a LEFT JOIN, users with no match still produce one null-extended row.
COUNT(*)counts that row (1).COUNT(ev.user_id)ignores NULL, so it’s 0.
Clarifying questions
- “Do you want to count rows or count matches?”
- “Should a user with zero matches appear? (LEFT vs INNER)”
Common pitfalls
- Expecting
COUNT(*)to be 0 when there’s no match—it’s 1 (the null-extended row). - Forgetting the filter
event_name = ...on the JOIN itself →WHEREwould turn LEFT into INNER.
Edge cases
- If multiple matching events exist,
row_count_after_joingrows (duplication); prefer pre-agg to 1 row/user when needed.
Percentages
Interview prompt: Find paid revenue share by country.
SQLWITH paid_rev AS (
SELECT u.country, SUM(o.amount) AS rev
FROM orders o
JOIN users u ON u.user_id = o.user_id
WHERE o.status = 'paid'
GROUP BY u.country
),
total AS (
SELECT SUM(rev) AS total_rev FROM paid_rev
)
SELECT
p.country,
p.rev,
ROUND(100.0 * p.rev / NULLIF(t.total_rev, 0), 1) AS share_pct
FROM paid_rev p CROSS JOIN total t
ORDER BY p.country;
Output
Explanation
- Compute per-country revenue from paid orders, then divide by the grand total.
NULLIFprotects against total 0; using a CTE avoids re-aggregating.
Clarifying questions
- “Include only paid? Refunds/chargebacks?”
- “Share across what scope—country, region, or overall?”
Common pitfalls
- Dividing by
SUM(amount)within each row without the grand total → every group shows 100%. - Forgetting to filter to
paid→ inflates denominators.
Edge cases
- No paid revenue at all →
total_rev = 0(share becomes NULL; you may want 0.0).
HAVING with conditional aggregates
Interview prompt: Which countries have a verified rate is at least 50%.
SQLSELECT
u.country,
COUNT(*) AS signups,
SUM(CASE WHEN ev.user_id IS NOT NULL THEN 1 ELSE 0 END) AS verified
FROM users u
LEFT JOIN (SELECT DISTINCT user_id FROM events WHERE event_name='email_verified') ev
ON ev.user_id = u.user_id
GROUP BY u.country
HAVING 1.0 * SUM(CASE WHEN ev.user_id IS NOT NULL THEN 1 ELSE 0 END)
/ NULLIF(COUNT(*), 0) >= 0.50;
Output
(Both at 50%; if you need strictly greater than 50%, use > 0.50.)
Explanation
- We filter groups (countries) based on an aggregate ratio (verified/signups).
Clarifying questions
- ≥ 50% or > 50%? Round before compare or compare raw ratio?
- Minimum signup count required? (add
AND COUNT(*) >= NinHAVING)
Common pitfalls
- Doing the ratio in
WHERE(invalid). - Integer math → 0; force decimals with
1.0 *or casts.
Edge cases
- If
COUNT(*)is 0 → ratio isNULL(kept/dropped depending on predicate).
NULL-safe metrics patterns
Pattern 1: COUNT users who did X
SQLSUM(CASE WHEN x_condition THEN 1 ELSE 0 END) -- or COUNT(CASE WHEN x_condition THEN 1 END) in engines that support COUNT of CASE
If COUNT was used instead of SUM. 0 will be counted as well. In this scenario, instead of 0 we have to use NULL to make sure the row is not counted when conditions are not met
Pattern 2: Rate with safe division & float math
SQL100.0 * numerator / NULLIF(denominator, 0)
Pattern 3: Share that sums to 100
- Compute group totals in CTE; CROSS JOIN total; divide;
NULLIFguard.
Pattern 4: Keep zeros after LEFT JOIN
SQLCOALESCE(agg_value, 0) AS metric
Pattern 5: Distinct user actions
SQLCOUNT(DISTINCT user_id) -- beware performance; consider approx distinct if allowed
Example data
We used these tables in the examples above to demonstrate aggregate functions.
users
events (one row per user action)
orders
Questions
Warm up
Practice
- Monthly post success analysis (Easy)
- Find campaign purchases (Medium)
- Post success by interface (Medium)
Go deeper
Want to dive deeper into SQL aggregations? Check out our more detailed lesson here: Introduction to SQL Aggregations