Adriana_Onti
8 - Cloud Apps

It’s no secret that nobody enjoys waiting for a dashboard to load. Dashboards that seemingly take hours to load (even if it is really only tens of seconds) don’t just waste our time; they hurt the adoption and usefulness of the dashboards themselves. When the people who need these dashboards get hung up on slow data loading, sluggish responsiveness, or widgets that fail to display, they are unlikely to keep coming back for the data they need to do their jobs. In the end, that defeats the purpose of having a dashboarding tool to begin with. So the question becomes: how can we optimize the dashboarding experience so that data consumers keep coming back for the valuable insights they need?

The core of the issue described above typically has little to do with the business intelligence (BI) tool itself. Sure, some BI tools perform better than others (hopefully that’s why you’re using Sisense), but the root of the issue lies in where Sisense is querying data from. Snowflake is one of the most common cloud data warehouses that Sisense queries for dashboards. For the purposes of this post, we will focus on Snowflake optimization techniques that produce high-performance, low-latency dashboards, solving the problem described at the outset.

 

Dedicated Virtual Warehouse

The first level of optimization to consider is a dedicated virtual warehouse for your BI tool. As a quick refresher, a virtual warehouse in Snowflake is the set of computational resources designated to perform operations. There are two major advantages to using a separate warehouse for your BI tooling. First, you can monitor your BI usage closely: because the only operations performed in this warehouse are for BI purposes, you gain a clear picture of the volume and patterns of queries your BI tool sends to Snowflake. Second, Snowflake automatically caches data on the local disk of each warehouse, so users or tools querying the same data have a better chance of hitting data that has already been retrieved. This significantly reduces query times, resulting in much higher dashboard performance. Keep in mind, however, that when a warehouse suspends, its local cache is erased.
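As a sketch, creating a dedicated BI warehouse might look like the following (the warehouse name, size, and role are placeholders to adapt to your environment):

```sql
-- Hypothetical warehouse name and sizing; adjust to your workload.
CREATE WAREHOUSE IF NOT EXISTS BI_WH
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND   = 600      -- seconds of inactivity before suspending;
                            -- remember: suspending erases the local disk cache
  AUTO_RESUME    = TRUE
  INITIALLY_SUSPENDED = TRUE
  COMMENT = 'Dedicated warehouse for Sisense / BI queries';

-- Grant usage to the role your BI tool connects with (placeholder name).
GRANT USAGE ON WAREHOUSE BI_WH TO ROLE BI_ROLE;
```

Note the trade-off in AUTO_SUSPEND: a shorter timeout saves credits, while a longer one keeps the warehouse cache warm between dashboard loads.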

 

Only Storing Necessary Columns

Snowflake is a columnar data store, which is much faster than a row-store system for analytical workloads. The major difference is that data is physically stored in columns rather than rows. Because of this, storing only the columns needed for analysis yields larger optimization gains in Snowflake. Especially when your BI tool sends a SELECT * FROM statement to Snowflake, wide tables with lots of unnecessary columns significantly hinder dashboard performance. By limiting your tables/views to only the necessary columns, Snowflake can return query results faster, resulting in lower latency and better performance across all dashboards.

Nevertheless, it is worth noting that Sisense also lets the end user decide what data to look at in every widget, meaning that only the table the selected fields belong to is involved in the query, not the entire schema. Another option is to create several simple data models in Sisense that each incorporate only the tables or data needed for a given purpose; that way you will not end up with an unnecessarily complex data model.
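A minimal sketch of the narrow-view approach, assuming a hypothetical wide orders table (all schema, table, and column names here are illustrative):

```sql
-- Expose only the columns the dashboards actually use,
-- instead of the full wide table.
CREATE OR REPLACE VIEW REPORTING.ORDERS_BI AS
SELECT
    order_id,
    order_date,
    customer_id,
    total_amount
FROM RAW.ORDERS;   -- the underlying wide table keeps its other columns
```

Pointing the BI tool at REPORTING.ORDERS_BI means even a SELECT * issued by a widget only ever scans the four columns above.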

 

Designated Data Models

Another level of optimization, which builds on the previous point about only storing necessary columns, is using designated tables/views for reporting. Perhaps you have identified that only certain columns need to be in a table for your BI tool, but other users in the business still need access to the remaining columns. In this case, you could use a tool like dbt to build a model for your BI tool that contains only the data necessary for reporting, and a separate model with all the other columns for your data analysts. As discussed above, this narrower reporting model improves query times. Additionally, many of the queries from your reporting tool are likely similar in nature, so creating these data structures as narrow, materialized views in Snowflake will return results even faster. Thank me later.

Also, depending on the use case, a pre-aggregated table or a table with pre-calculated measures can speed up a visualization; this way you avoid unnecessary processing while building your BI views.
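As an illustration of a narrow, pre-aggregated materialized view for reporting (names are hypothetical; note that Snowflake materialized views are an Enterprise edition feature and may only reference a single table, with no joins):

```sql
-- Pre-aggregate daily revenue per customer so dashboards
-- never have to scan the raw order rows.
CREATE OR REPLACE MATERIALIZED VIEW REPORTING.DAILY_SALES_MV AS
SELECT
    order_date,
    customer_id,
    SUM(total_amount) AS daily_revenue,
    COUNT(*)          AS order_count
FROM RAW.ORDERS
GROUP BY order_date, customer_id;
```

Snowflake keeps the materialized view up to date in the background, so repeated, similar dashboard queries read the small aggregated result rather than recomputing it each time.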

 

Only Storing What You Need

This is not Snowflake-specific, but queries sent from BI tools can begin to lag when your data structures contain hundreds of millions of records. You should evaluate whether all of the data stored in the tables/views you are referencing is actually necessary. For example, maybe your reporting uses only two years of data (current year and prior year), but your table contains five years’ worth. In that case, consider creating a separate model for your BI tool that references just the two years of data you need. Depending on the volume of data you consistently work with, this can help query performance considerably.
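Continuing the two-years example, a date-filtered model might be sketched like this (table and column names are again placeholders):

```sql
-- Limit the BI model to the current and prior year,
-- so dashboard queries never scan the older history.
CREATE OR REPLACE VIEW REPORTING.ORDERS_RECENT AS
SELECT *
FROM RAW.ORDERS
WHERE order_date >= DATE_TRUNC('year', DATEADD('year', -1, CURRENT_DATE()));
```

If the underlying table is clustered or naturally ordered by date, Snowflake can also prune whole micro-partitions for this predicate, compounding the gain.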

 

Multi-Cluster Warehousing

A special technique for optimizing Snowflake and improving the performance of your dashboards is multi-cluster warehousing. As a disclaimer, multi-cluster warehousing is only available in the Enterprise edition of Snowflake or above. It is incredibly powerful because it dynamically scales up the resources needed to support concurrent queries coming from your BI tool. You set a minimum and maximum number of clusters, and the warehouse auto-scales for you when traffic would otherwise begin to queue at the minimum cluster count.

 

For example, let’s say the warehouse minimum is set to 1 cluster and the maximum to 5. At 9 AM, Sisense isn’t sending more than nine or ten queries an hour to Snowflake, so the warehouse runs at its minimum of 1 cluster. At 11 AM, multiple analysts and users are in Sisense and 9-10 queries are now being run a minute. To handle the concurrent queries, Snowflake auto-scales to 3 clusters, runs the queries, then scales back down to 1 cluster when demand no longer exceeds minimum capacity. This allows users to experience the same query performance regardless of the number of queries being run at once. Powerful stuff.
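The configuration in the example above can be sketched as follows (assuming the warehouse from earlier, named BI_WH here for illustration):

```sql
-- Scale between 1 and 5 clusters as concurrency demands.
-- Multi-cluster warehouses require Enterprise edition or above.
ALTER WAREHOUSE BI_WH SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 5
  SCALING_POLICY    = 'STANDARD';  -- start clusters eagerly to avoid queuing
```

The STANDARD scaling policy favors spinning up extra clusters quickly to prevent queuing; the alternative ECONOMY policy conserves credits by waiting longer before scaling out.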

 

Conclusion

When it comes to Snowflake optimization and improving the performance of your dashboards, there is no guaranteed formula for success. However, the techniques outlined here can significantly improve your reporting experience. Stop driving your data consumers away; give them the performance they need to answer their questions as soon as they need the insights.

 

Sisense wants its customers to have the best possible experience optimizing their data and controlling warehouse costs, so it has partnered with Untitled Firm to create an offering for Sisense customers who are on Snowflake. If you have any questions, please contact Ellie Puckett, Strategic Partnerships Director ([email protected]), and we can help you determine the best path for your business.
