Working with Date and Time: DATE_TRUNC, DATEDIFF, and more
SQL provides several functions to work with dates and timestamps. Let's look at the most important ones:
Date and Time Functions
Extract parts of date with DATE_TRUNC or EXTRACT
The date_trunc() function allows you to round a date or timestamp to a specified unit of time, such as a day, week, month, or year. As we'll see below, this is particularly useful for analyzing and aggregating data over periods of time.
Here's the basic syntax:
SQLdate_trunc('unit', date_or_timestamp)
Where 'unit' is the unit of time to round to (e.g., 'day', 'week', 'month', 'year') and 'date_or_timestamp' is the date or timestamp you want to round.
Not all databases implement the same date methods. Check the table below for equivalents in different databases.
Datediff
The datediff() function calculates the difference between two dates or timestamps, typically returning the number of days between the two dates. Here's the basic syntax:
SQLdatediff(start_date, end_date)
In PostgreSQL, there isn’t a direct datediff() function. Instead, you can subtract dates directly or use the AGE() function to calculate differences in specific units like days, months, or years:
PostgreSQL-- Days difference
SELECT end_date - start_date AS days_diff;
-- Months or years difference
SELECT EXTRACT(MONTH FROM AGE(end_date, start_date)) AS months_diff,
EXTRACT(YEAR FROM AGE(end_date, start_date)) AS years_diff;
PostgreSQL offers flexibility in calculating date differences using these approaches.
Example 1: Calculating the number of students who signed up each month
Let's say we have a table named students with the following data:
We can calculate the number of students who signed up each month using the following query:
SQLSELECT
date_trunc('month', signup_date) AS signup_month,
COUNT(*) AS signups
FROM students
GROUP BY signup_month
The result of this query would be:
Example 2: Calculating the number of days between two dates
Let's say we have a table named orders with the following data:
id | order_date | delivery_date ---|-------------|--------------- 1 | 2022-01-01 | 2022-01-03 2 | 2022-02-01 | 2022-02-05 3 | 2022-03-01 | 2022-03-02
We can calculate the number of days between the order date and delivery date for each order using the following query:
SELECT
id,
order_date,
delivery_date,
datediff('day', order_date, delivery_date) AS days_between
FROM ordersThe result of this query would be:
id | order_date | delivery_date | days_between ---|-------------|---------------|-------------- 1 | 2022-01-01 | 2022-01-03 | 2 2 | 2022-02-01 | 2022-02-05 | 4 3 | 2022-03-01 | 2022-03-02 | 1
Example 3: Calculating customer retention
Given the date a customer first signed up for a service and their churn date, you can use datediff to determine how long they were a customer. This information can be used to calculate customer retention rates and understand which customers are most likely to churn.
Let's say we have a users table with the following data:
id | signup_date | churn_date ---|-------------|--------------- 1 | 2023-01-01 | 2023-01-11 2 | 2023-02-01 | 2023-02-21 3 | 2023-03-01 | 2023-03-31
To calculate average retention for users, we can use the datediff function to calculate the number of days each user has been a customer and then calculate the average. Here's the query to do that:
SELECT
AVG(datediff('day', signup_date, churn_date)) AS avg_retention
FROM usersThe result of this query would be:
avg_retention -------------- 20
This tells us that, on average, users churn after 20 days.
To get a more accurate picture of churn over time, we may want to analyze cohorts of users. For that, we could combine datediff and date_trunc to group users based on their signup date. For example, we could group users by the month they signed up:
SELECT
date_trunc('month', signup_date) AS signup_month,
AVG(datediff('day', signup_date, churn_date)) AS avg_retention
FROM users
GROUP BY signup_monthThe result of this query would be:
signup_month | avg_retention --------------|-------------- 2023-01-01 | 10 2023-02-01 | 20 2023-03-01 | 30
This shows us that average retention varies based on the month that users signed up and has improved over time.
How dates and timestamps are stored
In most databases, dates and times are stored internally as a number that represents the number of seconds or milliseconds elapsed since a specific point in time, known as an epoch. The exact epoch and the unit of time used to represent dates and times can vary between databases.
For example, in the widely used Unix time representation, dates and times are stored as the number of seconds elapsed since January 1st, 1970, 00:00:00 UTC—also known as the Unix epoch. This means that a date and time value of January 1st, 1970, 00:01:00 UTC would be stored as 60 (1 minute) in the database.
In SQL databases, the exact representation of dates and times can depend on the specific database management system (DBMS) being used. For example, in PostgreSQL, the timestamp type is stored as a double-precision floating-point number that represents the number of seconds elapsed since January 1st, 2000, 00:00:00 UTC.
Regardless of the specific representation used, the database management system provides functions to convert between the internal representation and human-readable date and time formats when querying or inserting data.
Units of time
The following table lists some of the different units of time that can be used in date and time functions in SQL:
These units can be used in functions such as date_trunc() and datediff() to specify the level of precision required when working with date and time data.
Try it yourself
Given a table of orders with timestamps in the column order_date, write a query that will aggregate and return the number of orders per month.
Since we use SQLite in this course, you'll need to use the DATE() format described in the table above instead of DATE_TRUNC.