Adding Data Security to an existing Elasticube Model can have little to great effect on the performance of the system. It is important to already have planned the Elasticube Model with the intention to apply Data Security in the future. This article will detail modeling strategies to overcome Data Security pitfalls.
Since Data Security forces "join" between tables that are not normally joined together it can produce inaccurate results or create instability in the system.
The above-mentioned problem is particularly for the dashboard filters if your business use case requires you to limit the members that are displayed on the filter, using Data Security.
Elasticube Model for Analyzing Sales and Inventory
In the above scenario, we want to limit our employees to only see the Sale and Purchases corresponding to themselves. We facilitate this by applying data security on the EmployeeId column and joining the DimEmployees to Fact_Sale_Orders and Fact_Purchase_Orders tables. Now when a user logs into a dashboard sourced from our sales Elasticube they will only see data relevant to themselves.
This model now presents us with a problem. Sisense has two options to produce the members' results when a dashboard filter is created using the following dimension tables: DimProducts / DimCountries / DimDate. The Data Security will try and join Dim_Employees (where the Data Security is defined) and one of the other three dimension tables. Sisense can either join through the Fact_Sale_Orders or Fact_Sale_Orders tables, which is a random path problem. Because of this, a user could see inaccurate results on the dashboard filter members - reduced members or members that they are not supposed to see.
Eg: The following screenshot illustrates what happens when a dashboard filter is created on a field from the DimCountries table. There are two paths that are possible from the security table (DimEmployees) to the dashboard filter's table (DimCountries). Sisense chooses between paths (1) and (2) in random as both the query paths are tied for the shortest path.
The Key table is a central table that stores all the relationships between the primary key values. This solution solves our problem because the key table has all the possible combinations, so when we join to produce results, we will have all the possible options available from the Fact tables. You can find more details on Key Tables here.
Security Exception on certain Dimension Tables:
Sometimes it is not necessary to apply security on all the dimension tables - hence, with the approach of Key Table, you need not include those dimension tables to the Key table and they can be directly connected to the Fact tables. Refer to the screenshot below - usually, it is the DimDate table that need not be protected.
After this, while setting up Data Security, you need to ensure you set the scope correctly. DimDate table should be excluded from the Data Security's scope. Refer to the screenshot below - this means that, when you create a dashboard filter using the DimDate table or a widget just using fields from the DimDate table, the Data Security will not take effect.
You can read more about "Scope Limitations" in Row-level Data Security from here to understand all the different options in detail. (Scope Limitations are available from version 8.2.1 and above)
If our fact tables are on the smaller side (<100M records) we could look to consolidate our fact tables into a single fact table.
You can create a Custom SQL table to union the two fact tables together. A flag can be created to differentiate the fact tables. In this example, we have created the "Type" field that has two values: Sale or Purchase.
Now, the data security is defined on the DimEmployees table (Security) and if there is a dashboard filter created off of DimCountries / DimDate / DimProducts (Filter), there is only one path for Sisense to reach from the Security dimension table to the Filter dimension table which is via the only fact table in the model - Fact_Sales_Purchase. Thus, there is no random path issue here.
An intermediary table can be created to increase the path length between dimensions and facts to control query routing. Generally, to create a query path from table 1 to table 2, Sisense uses the shortest path algorithm. When we have two fact tables in the model, we choose one of them as the preferred fact table and we try to force the Sisense query path through that fact table. To force the query path, we manipulate the length of the paths by adding one extra table to the path involving the non-preferred fact table, thus indirectly routing Sisense to choose the path involving the preferred fact to reach from the Dimension dimension table to the Filter dimension table.
Creating Bridge Table and setting up the path:
For every Dimension table that needs to be filtered by Data Security (DimEmployees), we need to create one bridge table. In our example, we need DimCountries and DimProducts to be filtered by Security dimension. For each of these tables, follow the steps below: (Steps below for DimProducts)
Similarly, repeat the above steps for the DimCountries table as well by creating the Country_Bridge table.
The other dimension table in the data model that doesn't have a bridge table is the DimDates table which will not be affected by Data Security as the data in the table is insensitive. So, use Scope Limitation (as explained in the "Security Exception on certain Dimension Tables" part of Solution 1) to exclude DimDates from the Data Security's effect.
How does the Query Path work?
You can see from the above image that two tables have been created to increase the path length. The DimProduct table is directly joined to the Fact_Sales_Orders table but joined to the Product_Bridge table to reach SalesOrderDetail. Alternatively, the DimCountries table is directly joined to the Fact_Sales_Orders table but joined to the Country_Bridge table to reach SalesOrderDetail.
So, when DimProducts is used as the Filter Dimension for a dashboard filter and DimEmployees is the security dimension table, taking the query path through Fact_Sale_Orders (in green - 2 joins) will be shorter than the query path through the Fact_Purchase_Orders (in blue - 3 joins) table as the addition of Product_Bridge table increases the query path's length of the latter. Refer to the screenshot below.
There would be two intermediary tables - one on the Fact tables' side (Fact_Bridge) and the other on the Dimension tables' side (Dim_Bridge). These intermediary tables are identical copies of the Security Dimension table and will act as a "bridge" between the Security Dimension table and all Fact tables & all Dim tables. The only constraint to use this approach is that the foreign key to the security dimension table should be present in all the sensitive tables - Fact and Dimension tables whose data is to be securely maintained by data security.
Creating two Bridge Tables and setting up their path:
Remove all the connections on the Security Dimension table and duplicate the Security Dimension table twice - Fact_Bridge and Dim_Bridge. Just keep the primary key field + another few copies of the primary key (ID -> ID_Sec1, ID_Sec2 in both Bridge tables - equivalent to the number of Fact Tables and Dim Tables to be secured) in the duplicated (bridge) tables. Also, make two copies of the primary key on the Security Dimension table and name them ID_FB and ID_DB.
All the following three tables will have the same column but duplicated 'n' times to create independent joins:
DimCustomers - Duplicate the ID field -> ID_FB & ID_DB => Fact_Bridge & Dim_Bridge respectively
Fact_Bridge - Total of 3 Columns -> ID, ID_Sec1, ID_Sec2 => to connect to two Dim Tables
Dim_Bridge - Total of 3 Columns -> ID, ID_Sec1, ID_Sec2 => to connect to two Fact Tables
*If you have more fact tables / dim tables to secure, then duplicate the ID field accordingly
In the example below, the above steps have been performed to the data model. The Security Dimension table is: DimCustomers (CustomerID) and the foreign key to Dim_Customers is available on sensitive dimension tables (DimProducts and DimEmployees) and all the fact tables.
The Fact_Bridge will be between the Security Dimension table and all the Dim tables in the data model. The Dim_Bridge will be between the Security Dimension table and all the Fact tables. Perform the following steps:
After making the above connections, the model will look like the one on the following screenshot:
The other dimension tables in the data model that don't have the CustomerID are DimDate and DimCountries tables which will not be affected by Data Security as the data in the table is insensitive - hence, they are not connected to the Fact_Bridge table. So, use Scope Limitation (as explained in the "Security Exception on certain Dimension Tables" part of Solution 1) to exclude DimDate & DimCountries tables from the Data Security's effect.
Here is the Scope Limitation setting:
How does the Query Path work?
When you create a dashboard filter using the table DimEmployees table, Data Security will come into effect and hence Sisense would create a query path between DimCustomers and DimEmployees table. In our data model (as in the screenshot below), there are three possible paths where #1 is through the Fact_Bridge and #2 and #3 are through the Dim_Bridge table. Going through the Dim_Bridge table is a longer route compared to taking the Fact_Bridge table in this case - So Sisense will prefer the shortest path here ending up taking query path #1, which is the desired path.
Now, when you create a widget with Fact_Sale_Orders, Sisense would try to create a query path between DimCustomers and Fact_Sales_Orders - there will be three paths possible in our data model (as in the screenshot below) where #1 will go through the Dim_Bridge and the other two will go through Fact_Bridge and then, each of the sensitive dimension tables. Sisense will obviously choose the path through the Dim_Bridge which is the shortest path here in this scenario and is the desired one.
With the help of the Bridge, we could play with the length of the query paths and force Sisense to take the desired path in different scenarios.
One could use one of the above approaches depending on factors like how the model is structured, how the fact tables in the model are related, what is the desired fact table to apply the security on the dimensions, whether the sensitive dimension tables are directly related to the security dimension table and what the specific use case is.
If you need more assistance on setting up Data Security on your elasticube, please reach out to your Customer Success Manager (CSM).