“At least 60%”
That would be my answer if you asked me how much of my technical tasks involve writing SQL—and I’m being conservative.
Whether I am:
Building new data models for the data warehouse
Gathering data for analysis
Creating ad-hoc reports for stakeholders
SQL plays a significant role in my day-to-day as a Data Scientist in tech
But the big reason I’m able to tackle complex problems and do so efficiently is thanks to analytic functions (or window functions as are also commonly known).
💡 If you’ve never use them, with analytic functions you can compute moving averages, rank a list of items, calculate cumulative sums, and perform other very useful and more complex analyses.
So today, I want to walk you through five real use cases in which I leverage window functions as a Data Scientist in tech.
Let’s get started!
1—Find the top 5 videos for every Youtube channel
Imagine you work with Youtube data and have access to a table containing one row per video and each column provides some additional information such as the total number of views and channel.
Analysis Scope:
Your stakeholder asks Can you find me the top 5 videos (based on their total views) for every channel?
Solution:
By using the analytic function ROW_NUMBER()
, we can give each row a rank based on the view_count of each channel_id.
ROW_NUMBER() OVER (PARTITION BY channel_id ORDER BY view_count DESC)
Here is the full SQL in practice:
2—Get the most up-to-date subscription data available for each customer
Imagine you have a table with subscription data, but every time the subscription gets updated in some way a new row is created.
Analysis Scope:
You are asked, Can you tell me what percentage of our active subscribers have a monthly membership?
✋ But of coarse, before calculating the 'billing frequency' distribution for your subscribers, you have to make sure you're using only the most up-to-date subscription data for each one.
Solution: By using the analytic function ROW_NUMBER()
, we can give each row a rank based on last_update_date for each subscription_id.
ROW_NUMBER() OVER (PARTITION BY subscription_id ORDER BY last_update_date)
Here is the full SQL in practice:
3—Calculate the cumulative revenue while retaining monthly revenue
Imagine you have monthly revenue data for your company. One row for the revenue generated each month.
Analysis Scope:
Your stakeholders ask you Can you show a graph comparing both monthly and cumulative revenue?
Solution:
Use SUM()
and ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
to aggregate the revenue from the first row of the partition up to the current row, which will calculate the total revenue up to every point available.
SUM(revenue) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
Here is the full SQL in practice:
4—Find the first and last time each user was active on each platform
Imagine you have activity data for each one of your users. Users could be active on either Desktop or Mobile.
Analysis Scope:
Your stakeholders approach you and ask you to put together a summary of each user activity along with the first and last time they were active on each platform.
Solution:
Use MIN()
and MAX()
to find the first and last date for each user_id within each platform available.
MIN(activity_date) OVER (PARTITION BY platform, user_id)
Here is the full SQL in practice:
5—Calculate Month-over-Month revenue growth
Imagine you have a table with monthly revenue data.
Analysis Scope:
Your stakeholders approach you and asks How well is the company growing its sales revenue over a given time period, for example year-over-year?
Solution:
Use LAG()
to help find the revenue from the preceding row, which in this case will be the revenue for the previous month.
LAG(revenue) OVER (ORDER BY date ASC) AS prev_month_revenue
Here is the full SQL in practice:
By the way, I have some exciting news to share! 🎉
Next week, I will start a new job as a Data Scientist at Nextory here in Stockholm, so I have an important question for you:
Thank you for reading! I hope these techniques will help you level up your SQL game
See you next week!
- Andres
Before you go, please hit the like ❤️ button at the bottom of this email to help support me. It truly makes a difference!