Window functions are a much-dreaded concept in the SQL community. But it doesn’t have to be this way. Let’s change that for you, together!
Getting started with SQL? Follow the blog series!
[Introduction to SQL for Data Analysis 📈: The what why and how of SQL when you are just getting started.](https://medium.com/@aakriti.sharma18/introduction-to-sql-for-data-analysis-1c4177b36eba)
Introduction
Window functions operate over many rows together. They are comparable to aggregate functions. However, aggregate functions group multiple rows together to generate a single output for the entire group, whereas window functions treat each row as a single entity and operate over the other rows related to it to generate an output for that particular row.
For example, think of summing up values. The aggregate function SUM() generates the sum of all the values within a group and outputs one total per group. The window function SUM() over a window ordered by, say, date instead outputs a running total for each row, that is, the sum of the current row’s value and the values of all the rows above it.
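To make the difference concrete, here is a minimal sketch that runs both kinds of SUM() on the same data. It assumes SQLite (through Python's built-in sqlite3 module) as the engine, and the daily_sales table with its rows is made up purely for illustration. The OVER syntax it uses is explained in the next section.

```python
import sqlite3

# In-memory SQLite database; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this illustration.
conn.execute("CREATE TABLE daily_sales (day TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [("2023-01-01", 10), ("2023-01-02", 20), ("2023-01-03", 5)],
)

# Aggregate function: all rows collapse into a single output row.
aggregate_rows = conn.execute("SELECT SUM(sales) FROM daily_sales").fetchall()

# Window function: every input row is kept and gets its own running total.
window_rows = conn.execute(
    """
    SELECT day, sales, SUM(sales) OVER (ORDER BY day) AS running_total
    FROM daily_sales
    ORDER BY day
    """
).fetchall()

print(aggregate_rows)  # one row holding the overall total
for row in window_rows:
    print(row)  # (day, sales, running_total) for each input row
```

The aggregate query returns one row, while the window query returns as many rows as the table has, each with the total accumulated so far.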
Defining a Window
As the name suggests, Window Functions are meant to be applied on... windows (pfft). A window is nothing but a set of rows that changes dynamically depending on which row is under consideration. A window is defined using the OVER clause, which follows the function being called upon the window, like this:
WINDOW_FUNC() OVER( -- WINDOW DEFINITION -- )
The window is defined using the ORDER BY clause, which specifies the order in which the rows are arranged for the function to be performed. If the calculation should be performed not on the entire data but on subsets of it, we can categorize the data using the PARTITION BY clause. It works much like the GROUP BY clause: the rows that get considered while performing the window function are only the ones within the same subgroup as the current row. The complete syntax is the following:
WINDOW_FUNC() OVER( PARTITION BY column1 ORDER BY column2 )
The above running total example can be implemented as -
SUM(sales) OVER( ORDER BY date ) AS running_total
This will generate a new column running_total, which holds the current day’s sales added to the sales of all the days before it.
If we want the running_total for each month, we partition the window so that it only considers the data for one month at a time.
SUM(sales) OVER( PARTITION BY MONTH(date) ORDER BY date ) AS running_total
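Here is a minimal sketch of the per-month running total that you can run locally. It assumes SQLite through Python's sqlite3 module. SQLite has no MONTH() function (MySQL and SQL Server do), so the sketch swaps in strftime('%m', ...) to extract the month. The sales_log table, its sale_date column and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this illustration.
conn.execute("CREATE TABLE sales_log (sale_date TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_log VALUES (?, ?)",
    [
        ("2023-01-01", 10),
        ("2023-01-02", 20),
        ("2023-02-01", 5),
        ("2023-02-02", 15),
    ],
)

# strftime('%m', sale_date) stands in for MONTH(date), which SQLite lacks.
# The running total restarts whenever the month (the partition) changes.
monthly_rows = conn.execute(
    """
    SELECT sale_date,
           sales,
           SUM(sales) OVER (
               PARTITION BY strftime('%m', sale_date)
               ORDER BY sale_date
           ) AS running_total
    FROM sales_log
    ORDER BY sale_date
    """
).fetchall()

for row in monthly_rows:
    print(row)  # (sale_date, sales, running_total within that month)
```

Partitioning by month number alone puts January of every year into the same window. If the data spans several years, partitioning by year and month together (for example strftime('%Y-%m', sale_date) in SQLite) keeps them apart.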
For example, a running total for transactions occurring one after another in chronological order would be implemented this way -
SUM(Amount) OVER( ORDER BY TransactionDate ) AS RunningTotal
This generates the following result :
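To reproduce a result of this shape yourself, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module and a hypothetical Transactions table. The rows are made up for illustration and are not the data behind the original result.

```python
import sqlite3

# In-memory SQLite database; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, inserted out of order on purpose:
# the ORDER BY inside OVER() decides the chronology, not insertion order.
conn.execute("CREATE TABLE Transactions (TransactionDate TEXT, Amount INTEGER)")
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?)",
    [("2023-03-02", 50), ("2023-03-01", 100), ("2023-03-03", 25)],
)

rows = conn.execute(
    """
    SELECT TransactionDate,
           Amount,
           SUM(Amount) OVER (ORDER BY TransactionDate) AS RunningTotal
    FROM Transactions
    ORDER BY TransactionDate
    """
).fetchall()

# Print a small text table of the result.
print(f"{'TransactionDate':<16}{'Amount':>8}{'RunningTotal':>14}")
for tx_date, amount, running_total in rows:
    print(f"{tx_date:<16}{amount:>8}{running_total:>14}")
```

Each output row carries its own Amount plus the sum of every earlier transaction, so the last row's RunningTotal equals the plain SUM(Amount) over the whole table.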