Skip to main content

Introduction to Window Functions

Premium

Window functions are a powerful feature in SQL that allow you to perform complex calculations within a query, based on the data in a "window" or a set of rows. Window functions provide a way to perform calculations that are based on multiple rows of data, but are not possible with traditional aggregate functions.

With window functions, you can calculate running totals, rankings, percentiles, and more, all within a single query. This makes it possible to perform complex analysis on your data without having to write multiple queries or create intermediate results.

Basic syntax

The basic syntax of a window function is as follows:

SQL
SELECT column, window_function(column) OVER (PARTITION BY column1 ORDER BY column2) FROM table;

The syntax of a window function is comprised of two parts: the window function itself and the OVER clause that follows it. In the window function, column is the column that you want to perform the window function on, window_function is the specific window function that you want to use. In the OVER clause, PARTITION BY and ORDER BY control the window, or set of rows, that the calculation should be based on.

Example window function usage

Let's look at a simple example of how to use window functions. Suppose we have a table sales that contains information about sales transactions. The table has columns for the date of the sale, the product that was sold, and the amount of the sale.

+------------+---------+--------+ | sale_date | product | amount | +------------+---------+--------+ | 2021-01-01 | Apples | 10 | | 2021-01-02 | Oranges | 20 | | 2021-01-03 | Apples | 15 | | 2021-01-04 | Bananas | 5 | | 2021-01-05 | Apples | 12 | +------------+---------+--------+

We want to calculate the running total of sales for each product. To do this, we can use the SUM function as a window function. The query would look like this:

SQL
SELECT sale_date, product, amount, SUM(amount) OVER (PARTITION BY product ORDER BY sale_date) as running_total FROM sales;

The result of this query would be:

+------------+---------+--------+---------------+ | sale_date | product | amount | running_total | +------------+---------+--------+---------------+ | 2021-01-01 | Apples | 10 | 10 | | 2021-01-03 | Apples | 15 | 25 | | 2021-01-05 | Apples | 12 | 37 | | 2021-01-02 | Oranges | 20 | 20 | | 2021-01-04 | Bananas | 5 | 5 | +------------+---------+--------+---------------+

As you can see, the running_total column contains the running total of sales for each product, calculated based on the values in the amount column. The PARTITION BY clause in the OVER clause of the window function tells it to calculate the running total separately for each product, and the ORDER BY clause tells it to order the rows by sale_date so that the running total is calculated correctly.

This is a simple example of how window functions can be used to calculate complex values based on the data in a query, preserving the individual identities of the rows.

What is the difference between window functions and aggregate functions?

You might be wondering what the difference is between window functions and aggregate functions since some of the functions have the same name, like SUM. Window functions and aggregate functions both perform calculations on data in a query, but they differ in their output. Aggregate functions, such as SUM, AVG, MIN, and MAX, group rows of data together and return a single value for each group. This means that the individual rows of data are lost and the result only contains aggregated information.

On the other hand, window functions perform calculations on the data in a query, but preserve the separate identities of each row. This means that the result of the query will contain all of the original rows, but with additional columns that contain the results of the window function calculations.

SQL Aggregate vs. Window Functions

Adjusting the window size

The size of the window in a window function can be adjusted by using the ROWS BETWEEN clause in the OVER statement. This clause determines the range of rows to be used for calculation. The ROWS BETWEEN clause can be used to specify either a range of rows relative to the current row (using PRECEDING and FOLLOWING) or a range of absolute rows using a numeric offset.

For example, to calculate the running total of sales for each day, you can use the following query:

SQL
SELECT date, sales, SUM(sales) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as running_total FROM sales_table;

This query uses the ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW clause to specify that the calculation should include all sales from the start of the table up to and including the current row.

Another example is to calculate the average of the three preceding rows for each row in the result set:

SQL
SELECT date, sales, AVG(sales) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING) as avg_of_4_rows FROM sales_table;

In this example, the clause ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING specifies that for each row, the calculation will include the current row, the two rows immediately before it, and the one row immediately after it (if they exist). The AVG function then calculates the average of the sales values within this window.

Useful window functions

Here's a selection of some of the most useful window functions:

Window FunctionDescription
SUM()Calculates the sum of values in the window frame
MIN()Finds the minimum value in the window frame
MAX()Finds the maximum value in the window frame
ROW_NUMBER()Generates a unique number for each row within a result set, starting at 1 for the first row
RANK()Generates a unique rank for each row within a result set, with the same rank assigned to rows with the same values
DENSE_RANK()Generates a unique rank for each row within a result set, with the same rank assigned to rows with the same values, and the next rank assigned to the next distinct value
LAG()Returns the value of the expression for the previous row in the result set
NTILE(n)Divides the result set into n equal groups, and assigns a group number to each row

Up next, we'll look at several of these window functions in depth and demonstrate how to use them in a variety of real-world problems.