Dynamic PIVOTs in SQL with Snowflake
Easy to use dynamic pivots in Snowflake: How to use and create a JavaScript stored procedure within.
There’s something about SQL and PIVOTs: Everyone wants to PIVOT their results, but there never seems to be an easy way for it. Fortunately, Snowflake users have a native way to perform pivots in SQL, with the function PIVOT():
select *
from (
select * from table(result_scan(last_query_id(-1)))
)
pivot(max(pivot_value)
for pivot_column in ('Spider-Man', 'Wonder Woman', 'Iron Man'))
Nevertheless, PIVOT() in Snowflake has one limitation: Users need to explicitly call out the values for the columns to be pivoted into. So users then go to Stack Overflow to ask how to get Snowflake to deliver a PIVOT() with dynamic columns, and there doesn’t seem to be a straightforward answer.
That’s why I want to offer here a cool way to get dynamic pivots in Snowflake, by using a simple to call JavaScript stored procedure. First let me introduce how to use the solution, and then we’ll talk about the cool tricks in Snowflake I used to pull this off:
How to get dynamic pivots in Snowflake
- Write a query that aggregates the data you want to pivot. Make sure to define a column with the pivot_values, and a column with the pivot_columns:
select name
, date_trunc(quarter, month) pivot_column
, sum(month_views) pivot_value
from hero_views
group by 1,2
2. Call the stored procedure pivot_prev_results()
:
call pivot_prev_results();
3. Now find your pivoted results by scanning the output of that stored procedure:
select *
from table(result_scan(last_query_id(-2)));
That’s it!
Now, if you want to pivot by their names instead — just repeat the process, switching the names of the columns in the first query:
- Query
select name pivot_column
, date_trunc(quarter, month) month
, sum(month_views) pivot_value
from hero_views
group by 1,2
2. Call stored procedure
call pivot_prev_results();
3. Find the results:
select *
from table(result_scan(last_query_id(-2)));
Creating the stored procedure
If you want to call pivot_prev_results()
, first you need to create it. This is the source code:
Cool tricks that make pivot_prev_results()
work:
Let’s examine the stored procedure, to call out some cool ideas in Snowflake:
create or replace procedure pivot_prev_results()
Yes. You can define stored procedures in Snowflake.
returns string
We don’t need to return anything for this stored procedure to work, but we might as well use this later.
language javascript
This is cool: Stored procedures in Snowflake can be defined in JavaScript, opening up a world of possibilities.
execute as caller
A stored procedure can be executed with it’s own set of permissions and context, or within the context of the person calling it. In this case executing in the caller’s context will allow us to look at their previous query results, and leave the results of this procedure in their query history.
as
$$
We use ‘$$’ to delimit the JavaScript code. $$ allows us to avoid more complicated escaping within.
var cols_query = `
select ‘\\’’
|| listagg(distinct pivot_column, ‘\\’,\\’’) within group (order by pivot_column)
|| ‘\\’’
from table(result_scan(last_query_id(-1)))`;
This is the query that will allow us to determine the columns in the resulting pivot.
Note the use of result_scan(last_query_id(-1)
— with this we can look at the results of the previous query executed — which we expect to have the values to be pivoted (per the instructions before calling this stored procedure).
Key Snowflake concept: Snowflake queries are executed within a session — which allows you to refer to previous queries, results, and even set session variables.
To get all the different values in the pivot_column column we use listagg(distinct pivot_column)
and to make sure they look sorted in the final result we add the within group(order by pivot_column)
.
Then we execute the query within the procedure. What’s cool here is that Snowflake provides the JavaScript UDF environment an API so the JS code can ask Snowflake to execute queries:
var stmt1 = snowflake.createStatement({sqlText: cols_query});
var results1 = stmt1.execute();
results1.next();
The results of this query are the list of columns we are going to use to ask Snowflake for a pivot:
var col_list = results1.getColumnValue(1);
Then we just need to create a straightforward PIVOT() query in Snowflake, using the values of the columns we just figured out:
pivot_query = `
select *
from (select * from table(result_scan(last_query_id(-2))))
pivot(max(pivot_value) for pivot_column in (${col_list}))
`
var stmt2 = snowflake.createStatement({sqlText: pivot_query});
stmt2.execute();
Note that this query uses `max(pivot_value)`. The idea is that the previous query ran by the user will aggregate the results in any way they desire to, and in this step we should be picking at most one value to be aggregated.
That’s it. Once that query is executed, the pivoted results live in the caller’s query history. Since we don’t need to return anything, we can use that space to give the caller a hint of two ways of accessing the results of the query we just executed: Either by its id, or by looking back into their history:
return `select * from table(result_scan(‘${stmt2.getQueryId()}’));\n select * from table(result_scan(last_query_id(-2)));`;
$$;
Note that users will find the results with last_query_id(-2)
, because -1
are the results of calling the stored procedure.
Happy pivots!
Want more?
I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can follow me on Twitter, and check reddit.com/r/snowflake for the most interesting Snowflake news.