cancel
Showing results for 
Search instead for 
Did you mean: 
Community_Admin
Community Team Member
Community Team Member

Adding Data Security to an existing Elasticube Model can have little to great effect on the performance of the system. It is important to already have planned the Elasticube Model with the intention to apply Data Security in the future. This article will detail modeling strategies to overcome Data Security pitfalls.

The Problem:

Since Data Security forces "join" between tables that are not normally joined together it can produce inaccurate results or create instability in the system.

The above-mentioned problem is particularly for the dashboard filters if your business use case requires you to limit the members that are displayed on the filter, using Data Security. 

Scenario:

Elasticube Model for Analyzing Sales and Inventory

Community_Admin_0-1634680887043.png

In the above scenario, we want to limit our employees to only see the Sale and Purchases corresponding to themselves. We facilitate this by applying data security on the EmployeeId column and joining the DimEmployees to Fact_Sale_Orders and Fact_Purchase_Orders tables. Now when a user logs into a dashboard sourced from our sales Elasticube they will only see data relevant to themselves.

This model now presents us with a problem. Sisense has two options to produce the members' results when a dashboard filter is created using the following dimension tables: DimProducts / DimCountries / DimDate. The Data Security will try and join Dim_Employees (where the Data Security is defined) and one of the other three dimension tables. Sisense can either join through the Fact_Sale_Orders or Fact_Sale_Orders tables, which is a random path problem. Because of this, a user could see inaccurate results on the dashboard filter members - reduced members or members that they are not supposed to see.  

Eg: The following screenshot illustrates what happens when a dashboard filter is created on a field from the DimCountries table. There are two paths that are possible from the security table (DimEmployees) to the dashboard filter's table (DimCountries). Sisense chooses between paths (1) and (2) in random as both the query paths are tied for the shortest path. 

Community_Admin_1-1634680887154.png

Solution 1 - Key Table:

The Key table is a central table that stores all the relationships between the primary key values. This solution solves our problem because the key table has all the possible combinations, so when we join to produce results, we will have all the possible options available from the Fact tables. You can find more details on Key Tables here.

Downsides:

  • Key Table Size - Because the key table is essentially a cross join between the dimension tables it can grow exponentially in size.
  • Dashboard Performance - The Key Table increase the path length between our dimension and fact tables which will result in slower dashboard performance
  • Maintenance - Iterating on an Elasticube Model containing a Key Table could be challenging because planning is key when creating a key table.
  • Build Time - Additional build time will be needed to calculate Key Table

Best Practices:

  • Keep the Key table size below 50M records
  • Try creating partial key tables instead of including all your dimensions if not needed
  • Plan your Key Table!

Security Exception on certain Dimension Tables: 

Sometimes it is not necessary to apply security on all the dimension tables - hence, with the approach of Key Table, you need not include those dimension tables to the Key table and they can be directly connected to the Fact tables. Refer to the screenshot below - usually, it is the DimDate table that need not be protected.  

Community_Admin_2-1634680887021.png

After this, while setting up Data Security, you need to ensure you set the scope correctly. DimDate table should be excluded from the Data Security's scope. Refer to the screenshot below - this means that, when you create a dashboard filter using the DimDate table or a widget just using fields from the DimDate table, the Data Security will not take effect. 

Community_Admin_3-1634680886944.png

You can read more about "Scope Limitations" in Row-level Data Security from here to understand all the different options in detail. (Scope Limitations are available from version 8.2.1 and above)

Solution 2 - Consolidation:

If our fact tables are on the smaller side (<100M records) we could look to consolidate our fact tables into a single fact table.

Community_Admin_4-1634680887031.png

You can create a Custom SQL table to union the two fact tables together. A flag can be created to differentiate the fact tables. In this example, we have created the "Type" field that has two values: Sale or Purchase.

Now, the data security is defined on the DimEmployees table (Security) and if there is a dashboard filter created off of DimCountries / DimDate / DimProducts (Filter), there is only one path for Sisense to reach from the Security dimension table to the Filter dimension table which is via the only fact table in the model - Fact_Sales_Purchase. Thus, there is no random path issue here.  

Community_Admin_5-1634680887140.png

Downsides:

  • Additional Build time to process Custom SQL Table
  • All Dashboard formulas will need to contain a value filter on Type

Best Practices:

  • Consolidated Elasticube record size does not exceed 250M records

Solution 3 - Bridge Table: 

An intermediary table can be created to increase the path length between dimensions and facts to control query routing. Generally, to create a query path from table 1 to table 2, Sisense uses the shortest path algorithm. When we have two fact tables in the model, we choose one of them as the preferred fact table and we try to force the Sisense query path through that fact table. To force the query path, we manipulate the length of the paths by adding one extra table to the path involving the non-preferred fact table, thus indirectly routing Sisense to choose the path involving the preferred fact to reach from the Dimension dimension table to the Filter dimension table. 

Creating Bridge Table and setting up the path: 

For every Dimension table that needs to be filtered by Data Security (DimEmployees), we need to create one bridge table. In our example, we need DimCountries and DimProducts to be filtered by Security dimension. For each of these tables, follow the steps below: (Steps below for DimProducts)

  • Step1: Duplicate the key in the DimProducts table and name it "Product ID (1)"
  • Step2: Create a custom table with the following SQL and name it "Product_Bridge"
Community_Admin_6-1634680886681.png
  • Step3: Join DimProducts table and Fact_Sale_Orders table (preferred fact table) directly using DimProducts.Product ID and Fact_Sale_Orders.Product ID
  • Step4: Join DimProducts table and Product_Bridge table using DimProducts.Product ID (1) and Product_Bridge.Product ID (1). Then join Product_Bridge and Fact_Purchase_Orders (non-preferred fact table) on Product_Bridge.Product ID and Fact_Purchase_Orders.Product ID. 

Similarly, repeat the above steps for the DimCountries table as well by creating the Country_Bridge table. 

The other dimension table in the data model that doesn't have a bridge table is the DimDates table which will not be affected by Data Security as the data in the table is insensitive. So, use Scope Limitation (as explained in the "Security Exception on certain Dimension Tables" part of Solution 1) to exclude DimDates from the Data Security's effect. 

 

How does the Query Path work? 

Community_Admin_7-1634680887254.png

You can see from the above image that two tables have been created to increase the path length. The DimProduct table is directly joined to the Fact_Sales_Orders table but joined to the Product_Bridge table to reach SalesOrderDetail. Alternatively, the DimCountries table is directly joined to the Fact_Sales_Orders table but joined to the Country_Bridge table to reach SalesOrderDetail.

So, when DimProducts is used as the Filter Dimension for a dashboard filter and DimEmployees is the security dimension table, taking the query path through Fact_Sale_Orders (in green - 2 joins) will be shorter than the query path through the Fact_Purchase_Orders (in blue - 3 joins) table as the addition of Product_Bridge table increases the query path's length of the latter. Refer to the screenshot below. 

Community_Admin_8-1634680887258.png

 

Downsides:

  • Potential Many 2 Many joins could occur through query routing created by the Sisense Translator

Best Practices:

  • Duplicate the Primary Key in the Dimension table so Sisense does not create a join through the bridge table to both Fact tables.
  • Also, create two copies of keys in the bridge table where one is used to connecting to the dimension table and the other is used to connect to the fact table - this will not create a direct join between the dimension table and the fact table (non-preferred fact)

Solution 4 - Dual Bridge:

There would be two intermediary tables - one on the Fact tables' side (Fact_Bridge) and the other on the Dimension tables' side (Dim_Bridge). These intermediary tables are identical copies of the Security Dimension table and will act as a "bridge" between the Security Dimension table and all Fact tables & all Dim tables. The only constraint to use this approach is that the foreign key to the security dimension table should be present in all the sensitive tables - Fact and Dimension tables whose data is to be securely maintained by data security. 

Creating two Bridge Tables and setting up their path: 

Remove all the connections on the Security Dimension table and duplicate the Security Dimension table twice - Fact_Bridge and Dim_Bridge. Just keep the primary key field + another few copies of the primary key (ID -> ID_Sec1, ID_Sec2 in both Bridge tables - equivalent to the number of Fact Tables and Dim Tables to be secured) in the duplicated (bridge) tables. Also, make two copies of the primary key on the Security Dimension table and name them ID_FB and ID_DB.

All the following three tables will have the same column but duplicated 'n' times to create independent joins: 

DimCustomers - Duplicate the ID field -> ID_FB & ID_DB => Fact_Bridge & Dim_Bridge respectively 

Fact_Bridge - Total of 3 Columns -> ID, ID_Sec1, ID_Sec2 => to connect to two Dim Tables

Dim_Bridge - Total of 3 Columns -> ID, ID_Sec1, ID_Sec2 => to connect to two Fact Tables

*If you have more fact tables / dim tables to secure, then duplicate the ID field accordingly

In the example below, the above steps have been performed to the data model. The Security Dimension table is: DimCustomers (CustomerID) and the foreign key to Dim_Customers is available on sensitive dimension tables (DimProducts and DimEmployees) and all the fact tables.

Community_Admin_9-1634680887047.png

The Fact_Bridge will be between the Security Dimension table and all the Dim tables in the data model. The Dim_Bridge will be between the Security Dimension table and all the Fact tables. Perform the following steps: 

  • Connect DimCustomers and Fact_Bridge on [DimCustomers.ID_FB] & [Fact_Bridge.ID]
  • Connect DimCustomers and Dim_Bridge on [DimCustomers.ID_DB] & [Dim_Bridge.ID]
  • Connect Dim_Bridge and all the Fact tables on [Dim_Bridge.ID_Sec1] & [Dim_Bridge.ID_Sec2] to Fact Tables' Customer_ID (Fact_Sale_orders & Fact_Purchase_Orders)
  • Connect Fact_Bridge and all the sensitive Dimension tables on [Fact_Bridge.ID_Sec1] & [Fact_Bridge.ID_Sec2] to Sensitive Dimension Tables' Customer_ID. (Only on DimEmployees & DimProducts)

After making the above connections, the model will look like the one on the following screenshot: 

Community_Admin_10-1634680887135.png

The other dimension tables in the data model that don't have the CustomerID are DimDate and DimCountries tables which will not be affected by Data Security as the data in the table is insensitive - hence, they are not connected to the Fact_Bridge table. So, use Scope Limitation (as explained in the "Security Exception on certain Dimension Tables" part of Solution 1) to exclude DimDate & DimCountries tables from the Data Security's effect. 

Here is the Scope Limitation setting: 

Community_Admin_11-1634680886906.png

How does the Query Path work? 

When you create a dashboard filter using the table DimEmployees table, Data Security will come into effect and hence Sisense would create a query path between DimCustomers and DimEmployees table. In our data model (as in the screenshot below), there are three possible paths where #1 is through the Fact_Bridge and #2 and #3 are through the Dim_Bridge table. Going through the Dim_Bridge table is a longer route compared to taking the Fact_Bridge table in this case - So Sisense will prefer the shortest path here ending up taking query path #1, which is the desired path. 

Community_Admin_12-1634680887010.jpeg

Now, when you create a widget with Fact_Sale_Orders, Sisense would try to create a query path between DimCustomers and Fact_Sales_Orders - there will be three paths possible in our data model (as in the screenshot below) where #1 will go through the Dim_Bridge and the other two will go through Fact_Bridge and then, each of the sensitive dimension tables. Sisense will obviously choose the path through the Dim_Bridge which is the shortest path here in this scenario and is the desired one. 

Community_Admin_13-1634680887143.png

With the help of the Bridge, we could play with the length of the query paths and force Sisense to take the desired path in different scenarios. 

Downsides:

  • Potential Many 2 Many joins could occur through query routing created by the Sisense Translator when there are complicated front-end formula functions used - like ALL() in certain scenarios and PAST/NEXT/GROWTH functions when applied on the date field that's a part of the sensitive dimensions in the data model. (ex: If DimDate is made sensitive to data security in the example and then PAST/NEXT/GROWTH functions are used in the context of DimDate.Date field)
  • The Dependent filters (dashboard filters) on two different dimension tables doesn't work properly due to Many 2 Many joins

Best Practices:

  • Duplicate the Primary Key in the Security Dimension table twice - one to connect to Dim_Bridge and the other to connect to Fact_Bridge
  • Also, create the required number of copies of keys in each of the bridge tables where each copy is used to independently connecting to the dimension tables and fact tables that need to be secured. 

The Conclusion:

One could use one of the above approaches depending on factors like how the model is structured, how the fact tables in the model are related, what is the desired fact table to apply the security on the dimensions, whether the sensitive dimension tables are directly related to the security dimension table and what the specific use case is. 

If you need more assistance on setting up Data Security on your elasticube, please reach out to your Customer Success Manager (CSM). 

 

Version history
Last update:
‎10-19-2021 03:06 PM
Updated by:
Contributors
Community Toolbox

Recommended quick links to assist you in optimizing your community experience:

Need additional support?:

Community Support Request