Skip to main content

Dates & Bucketing

Date logic is everywhere in SQL interviews because many business questions are time-based: weekly trends, month-over-month growth, cohort retention, and “last N days” analyses.

This lesson covers how to bucket timestamps into days/weeks/months, how to define clean time windows, and how to avoid off-by-one mistakes in date filters. You’ll also learn practical patterns for comparing periods (this week vs last week) and building time-series outputs that don’t miss empty periods.

Example prompts include:

“Show weekly active users for the last 12 weeks.”

and

“Compute month-over-month revenue growth.”

All examples below use the users, events, and orders tables in "Example data" section (at the bottom). Refer back to that section if you want to confirm columns or relationships.

Bucket by day/week/month

Interview prompt: Bucket paid orders by day, week start, and month start.

SELECT DATE(created_at) AS day, DATE(created_at, 'weekday 1', '-7 days') AS week_monday, DATE(created_at, 'start of month') AS month, COUNT(*) AS n_orders FROM orders WHERE status = 'paid' GROUP BY DATE(created_at), DATE(created_at, 'weekday 1', '-7 days'), DATE(created_at, 'start of month') ORDER BY day;

Output

dayweek_mondaymonthn_orders
2025-01-01 00:00:002024-12-30 00:00:002025-01-01 00:00:001
2025-01-05 00:00:002024-12-30 00:00:002025-01-01 00:00:001

Explanation

  • PostgreSQL:DATE_TRUNC snaps timestamps to a bucket start (day, ISO week, month) so groupings are clean.
  • SQLite: DATE(created_at, 'weekday 1', '-7 days') →'weekday 1' jumps to the next Monday, '-7 days' brings you back to the Monday of the current week → Monday-based week start.
  • MySQL:
    • WEEKDAY(created_at) → 0 = Monday, 6 = Sunday, so subtracting that many days from the date gives you the Monday of that week.
    • DATE_FORMAT(created_at, '%Y-%m-01') → normalizes to the first of that month → monthly bucket.

Clarify

  • Week start Monday (ISO) or Sunday?
  • Do we need local timezone bucketing (see Section D)?

Pitfalls

  • Grouping directly on raw timestamps → each row is its own bucket.
  • Mixing UTC and local clock without conversion.

Inclusive vs exclusive ranges

A common pitfall when utilizing date ranges is using inclusive and exclusive date ranges interchangeably.

The safest way to filter time windows is a half-open interval: include the start, exclude the end.

  • Use created_at >= start and created_at < next_start
  • Don't use BETWEEN on timestamps as it is easy to miss boundary times.

Interview prompt: Find number of paid orders in January 2025.

SELECT COUNT(*) AS n_paid_jan FROM orders WHERE status = 'paid' AND created_at >= TIMESTAMP '2025-01-01' AND created_at < TIMESTAMP '2025-02-01';

Output

n_paid_jan
2

Explanation

  • Counts all paid orders whose created_at timestamp falls in January 2025

Clarify

  • Interpret dates in UTC or a local time zone?
  • Include end boundary?

Pitfalls

  • BETWEEN '2025-01-01' AND '2025-01-31' misses times on Jan 31 after 00:00, and fails on variable month lengths.

Edge

  • Parameterize start and compute start + INTERVAL '1 month' for next_start.

Cohorting

Interview prompt: Compute D1 and D7 retention by signup month.

WITH cohort AS ( SELECT user_id, DATE_TRUNC('month', signed_up_at) AS signup_month, DATE_TRUNC('day', signed_up_at) AS signup_day FROM users ), act AS ( SELECT e.user_id, DATE_TRUNC('day', e.ts) AS event_day FROM events e ) SELECT c.signup_month, COUNT(DISTINCT c.user_id) AS cohort_users, SUM( CASE WHEN a.event_day = c.signup_day + INTERVAL '1 day' THEN 1 ELSE 0 END ) AS d1_users, SUM( CASE WHEN a.event_day = c.signup_day + INTERVAL '7 day' THEN 1 ELSE 0 END ) AS d7_users FROM cohort c LEFT JOIN act a ON a.user_id = c.user_id GROUP BY c.signup_month ORDER BY c.signup_month;

Output

signup_monthcohort_usersd1_usersd7_users
2024-12-01 00:00:00100
2025-01-01 00:00:00210

Explanation

  • Form cohorts by signup month; compute D+1/D+7 activity by comparing calendar days.

Clarify

  • Calendar day deltas or 24h rolling windows?
  • Any event counts, or specific events only?

Pitfalls

  • Joining raw events without day truncation inflates counts.
  • Using DATEDIFF on timestamps with time zones inconsistently.

Edge

  • For D7 within same month only? If not, keep day arithmetic (it crosses months safely).

Example data

We used these tables in the examples above to demonstrate how to handle dates.

users

user_idcountrysigned_up_at
1US2024-12-30 20:10:00
2US2025-01-01 03:05:00
3SG2025-01-02 09:00:00

events

user_idevent_namets
1open_app2024-12-31 23:50:00
1open_app2025-01-01 00:05:00
2email_verified2025-01-01 08:00:00
3open_app2025-01-03 10:00:00

orders

order_iduser_idamountstatuscreated_at
10150paid2025-01-01 01:00:00
11170paid2025-01-05 11:30:00
12215draft2025-01-03 09:00:00

Questions

Practice

Go deeper

Want to dive deeper into date and time functions? Check out our more detailed lesson here: Date and Time