Snowflake Architecture Patterns for Secure Self-Serve BI Experiences

Cloud Apps

02-15-2022

Self-service business intelligence is a buzzword that gets thrown around a lot in analytics circles. While the idea of true self-service BI is glamorous, actually executing a data strategy to achieve self-service BI is a different story. Decisions such as which tools to use in the stack, how data should be modeled, and how data stays secure, make self-service BI more of a dream rather than a feasible reality. Thankfully, using Snowflake and Sisense, self-service BI is something all organizations can take advantage of by implementing specific Snowflake architecture patterns to provide secure, self-service BI experiences.

Designated Databases

The first step in achieving an architecture pattern that supports secure, self-service BI is by having multiple databases within your Snowflake account. To be more specific, at the very minimum, you should have two separate databases. The first database captures all raw data loaded into Snowflake. We’ll call this database raw. The second database stores data that has been cleaned, modeled, and is analytics-ready. Let’s call this database analytics.

While there is a multitude of reasons why this architecture is advantageous for self-service BI, we’re going to focus specifically on two: data integrity and data security.

Having the architecture outlined above, raw and analytics databases, enables you to have data immutability. By having a raw database, your ETL tools and processes are orchestrated to load data into this database. This is especially advantageous if you use an ELT, where you are extracting and loading the data, before transforming it. By doing this, you can have confidence that the data in your raw database is immutable. In other words, your data is unchanged and true to the source from which it came. This allows you to have a source of truth that has all of the data integrity intact.

Using a tool like dbt, specific and different types of patterns that can be expected to be found in an analytics database can then be built on top of this raw, immutable data, without actually modifying the underlying data, meaning that for example, data mart and denormalized tables can be created with the goal of supporting specific analysis in your BI tool. This means that there is always an unchanged record of raw data that can always be referenced as a source of truth. The transformations and models built on top of the raw data can be materialized in the analytics database, providing a layer of separation from the raw data that shouldn’t be modified, and the data that is used in the analysis.

By separating raw data from analytics (production) data, not only do we gain data integrity, but we can also provision each database so that only users with the correct roles and privileges can access raw data. Because analytics-ready data models are built on raw data and materialized in the analytics database, administrators in Snowflake can assign roles to BI tools and users, only allowing them to access necessary data. This provides an additional level of security as analytics data is not only provisioned in an entirely separate database from raw data, but it is controlled through security roles created and assigned by administrators.

This architecture allows for raw data to be stored in a secure, provisioned data store while putting analytics-ready data in a database where only analytic use cases can occur. With this separation, we have the opportunity to optimize queries, creating also a secure and reliable database for your analytic use cases, all with the purpose of providing better performance and experience to the end-users. However, there are additional measures that can be added to these databases to create a more robust Snowflake architecture pattern.

Multiple Warehouses

With our raw and analytics databases in place, there needs to be at least one warehouse in Snowflake that can interact with those databases. At Untitled, we believe that one isn’t enough. In fact, the pattern we believe in the most is having three warehouses. These warehouses are outlined below:

Loading Warehouse: This warehouse is designated for all processes that load data into Snowflake. It’s that simple. Do you use Fivetran? Connect it to this warehouse. Do you have a custom ETL pipeline? Connect it to this warehouse.
Transformation Warehouse: This warehouse should be used for all matters related to getting data from its raw form into a usable state. No more, but no less. You should never be running ad hoc queries in this warehouse.
Reporting Warehouse: This warehouse is solely put in place to serve all the querying needs of the organization. Sisense connects here. You analysts run queries here. You get the idea.

There are no rules for how many warehouses you should have, but we believe it’s best practice to have at minimum these three. The reason we break these warehouses into their respective processes is twofold. First, some processes don’t need the same level of computing as others. By breaking warehouses into processes, we can control costs at a much more granular level as opposed to associating all processes with the same warehouse. Secondly, by separating these warehouses, we get a much better understanding of the workloads supported by each, allowing for optimization to take place. Again, if all processes are lumped into the same warehouse, it’s going to be much harder to understand if traffic is coming from an ETL tool, a dbt model running, or an ad hoc query run by an analyst. We inherently know where traffic is coming from if we only assign each process to a specific warehouse that is responsible for that process.

Ideally, by implementing this architecture, users not only save costs, but have a positive BI experience. By having a deep understanding of patterns and trends in each warehouse, administrators can configure and optimize Snowflake to better serve each function. This means that BI users get tailored treatment as their work belongs to one of these processes.

Roles and Privileges

We have already briefly discussed this, but because it is so important, it is worth a second mention. All objects (databases, warehouses, schemas, tables, etc.) in Snowflake can be secured. While we could write an entire blog series on Snowflake access control, to quickly catch up, you can view their documentation here.

When used in conjunction with the architecture discussed above, creating a role that only has privileges to only query data in the analytics database, using the reporting warehouse, allows for a control framework that allows for your self-service BI experiences to be completely secure as well as providing your BI tools and teams with data that is analytics-ready without ever compromising the integrity of your data.

Summary

Using multiple databases in conjunction with designated warehouses allows for dynamic architecture patterns to be developed to serve your self-serve BI needs. While this blog post was a brief peek inside how you can develop a robust self-service BI architecture, often there are additional complexities that arise when implementing self-service BI. If you need a partner to help overcome those hurdles, reach out to Untitled Firm. Ellie Puckett, Strategic Partnership Director, ellie.puckett@untitledfirm.com

Sisense wants its customers to have the best experience possible to optimize their data, and control warehouse costs, so they have partnered with Untitled Firm to create an offering for Sisense customers who are on Snowflake. If you have any questions please contact Ellie Puckett, Strategic Partnerships Director - ellie.puckett@untitledfirm.com and we can help you determine which is the best path for your business.

Updated 02-16-2022

Version 4.0

partners & implementations

product features

Adriana_Onti

Cloud Apps

Joined January 24, 2022

View Profile