Dealing With Missing Values In Python
Python pandas offers a few different options for dealing with null values. Depending on your dataset, there will likely be a preferred method that 1. accurately represents your data and 2. preserves a large enough sample size for rigorous analysis.

Option 1: Remove the null columns
We can use the dropna method to remove columns that have either all null values or any null values. Be careful when dropping columns that have any null values - there may be cases where your remaining data set would have very few results left to analyze!
df = df.dropna(axis=1, how='all')
df = df.dropna(axis=1, how='any')

Option 2: Remove the null rows
Alternatively, we can use dropna to remove rows with all or any null values. This looks just like dropping columns, except the axis parameter is set to 0. Again, use discretion when dropping null rows to ensure your remaining results are representative of the larger set of data.

Option 3: Replace the null values
We can also pick a value that replaces the missing values. For this, we use the fillna function:
df[col] = df[col].fillna(value)
"value" can either be a static number (such as 0), or it can just as easily be a summary metric that best represents your data, such as a median or a mean.

Option 4: Interpolate results
There may be times when backfilling or using a static value isn't sufficient for handling null values. When the missing values are numeric, the interpolate function can be used. For example, suppose our data has a numeric column y with three blank values. We can fill in those three blanks with the following code:
df["y"] = df["y"].interpolate(method="quadratic")
This fills the gaps with interpolated estimates. Pretty good! If we want whole numbers, we can round the result by appending .round() to the end of the line:
df["y"] = df["y"].interpolate(method="quadratic").round()
Quadratic interpolation is just one of the many ways the values can be interpolated. See the pandas documentation for more options, including cubic and polynomial.
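Putting the four options together, here is a small, self-contained sketch. The DataFrame and values below are made up for illustration (they are not the example data from the original post), and the "quadratic" method requires SciPy to be installed:

import pandas as pd
import numpy as np

# Hypothetical data with gaps: y is roughly quadratic in the row position
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6, 7],
    "y": [1.0, np.nan, 9.0, np.nan, 25.0, np.nan, 49.0],
    "empty": [np.nan] * 7,   # an entirely null column
})

no_empty_cols = df.dropna(axis=1, how="all")             # Option 1: drop all-null columns
complete_rows = no_empty_cols.dropna(axis=0, how="any")  # Option 2: keep only rows with no nulls
filled = df["y"].fillna(df["y"].median())                # Option 3: replace nulls with a summary metric
interpolated = df["y"].interpolate(method="quadratic").round()  # Option 4: interpolate, then round (needs SciPy)

print(interpolated.tolist())  # roughly [1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0], since y follows a quadratic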
How does your data team handle null values? Share your use cases below!

Generating a HAR file for troubleshooting Sisense

When troubleshooting complex issues in Sisense, such as PDF export failures or widgets not loading, it is sometimes necessary for our support team to obtain additional information about the network requests generated in your browser while the issue occurs. A technical solutions engineer may ask you to record a .HAR file (a log of network requests) while the issue is occurring and then provide it to them for further analysis. Below are instructions for generating a HAR file in different browsers.

Note: Keep in mind that HAR files contain sensitive data, including the content of the pages you downloaded while recording and your cookies. This allows anyone with the HAR file to impersonate your account and see all the information that you submitted while recording (personal details, passwords, credit card numbers, etc.).

To generate the HAR file for Chrome
1. Open Google Chrome and go to the page where the issue is occurring.
2. Click the vertical ellipsis button and select More Tools > Developer Tools.
3. In the panel that opens, select the Network tab.
4. Look for the round Record button in the upper left corner of the tab and make sure it is red. If it is grey, click it once to start recording.
5. Check the Preserve log box.
6. Click the Clear button to clear out any existing logs from the Network tab.
7. Reproduce the issue you were experiencing while the network requests are being recorded.
8. Once you have reproduced the issue, click the download (export) button and save the file to your computer: Save as HAR with Content.
9. Upload your HAR file to your ticket or attach it to your email so that our Support team can analyze it.
The instructions are the same for a Mac or a Windows computer.

To generate the HAR file for Firefox
1. Open Firefox and go to the page where you are experiencing trouble.
2. Select the Firefox menu (three horizontal parallel lines) at the top-right of your browser window, then select Web Developer > Network. The Developer Network Tools open as a docked panel at the side or bottom of Firefox.
3. Click the Network tab. Recording starts automatically when you begin performing actions in the browser.
4. Once you have reproduced the issue and you see that all of the actions have been captured in the Developer Network panel (this should only take a few seconds), right-click anywhere under the File column and click Save All As HAR.
5. Save the HAR file somewhere convenient.
6. Upload your HAR file to your ticket or attach it to your email so that we may analyze it.
The instructions are the same for a Mac or a Windows computer.

To generate the HAR file for Edge
Edge natively produces HAR files. For more instructions, see the Microsoft website.
1. Open the Network tool in the F12 developer tools.
2. Reproduce the issue.
3. Export the captured traffic as a HAR (CTRL + S).

To generate the HAR file for Internet Explorer
1. Open Internet Explorer and go to the page where the issue is occurring.
2. Press F12 on your keyboard (or click the gear icon > F12 Developer Tools).
3. Click the Network tab.
4. Reproduce the issue you were experiencing while the network requests are being recorded.
5. Once done, click the Save button.
6. Give the trace a filename and click Save, which will save it as a .har or .xml file.
7. Upload your HAR file to your ticket or attach it to your email so that we may analyze it.
The instructions are the same for a Mac or a Windows computer.
Please note: Be aware that HAR files contain sensitive data, including the content of the pages you downloaded while recording and your cookies. Examples of information contained in the HAR file include personal details, user IDs, passwords, and credit card numbers. Once the file is downloaded, you can always replace the sensitive data before sharing it.
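If you want to scrub obvious secrets before uploading the file, a rough Python sketch along the following lines can help. It assumes the standard HAR layout (a JSON document with log.entries, each holding request/response headers and cookies); the header list and file names are illustrative, and you should still review the output manually before sharing, since request bodies and page content may also contain sensitive values.

import json

SENSITIVE_HEADERS = {"cookie", "set-cookie", "authorization"}  # illustrative list, extend as needed

def redact(har_path, out_path):
    with open(har_path, "r", encoding="utf-8") as f:
        har = json.load(f)
    for entry in har.get("log", {}).get("entries", []):
        for section in ("request", "response"):
            part = entry.get(section, {})
            # Mask sensitive header values
            for header in part.get("headers", []):
                if header.get("name", "").lower() in SENSITIVE_HEADERS:
                    header["value"] = "REDACTED"
            # Drop recorded cookies entirely
            part["cookies"] = []
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(har, f, indent=2)

redact("sisense_issue.har", "sisense_issue_redacted.har")  # hypothetical file names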
Build Better Data Models and Unlock Next-Level Intelligence

In this article, we'll dig into what data modeling is, provide some best practices for setting up your data model, and walk through a handy way of thinking about data modeling that you can use when building your own.
Elasticube Modeling With Data Security: Best Practices

Adding Data Security to an existing Elasticube model can have anywhere from little to great effect on the performance of the system, so it is important to plan the Elasticube model with the intention of applying Data Security in the future. This article details modeling strategies to overcome Data Security pitfalls.

The Problem:
Because Data Security forces a join between tables that are not normally joined together, it can produce inaccurate results or create instability in the system. This problem particularly affects dashboard filters when your business use case requires you to limit the members displayed on the filter using Data Security.

Scenario: Elasticube Model for Analyzing Sales and Inventory
In this scenario, we want to limit our employees to only see the sales and purchases corresponding to themselves. We do this by applying data security on the EmployeeId column and joining DimEmployees to the Fact_Sale_Orders and Fact_Purchase_Orders tables. Now when a user logs into a dashboard sourced from our sales Elasticube, they will only see data relevant to themselves.

This model now presents us with a problem. Sisense has two options to produce the members' results when a dashboard filter is created using one of the following dimension tables: DimProducts / DimCountries / DimDate. Data Security will try to join DimEmployees (where the Data Security is defined) to one of the other three dimension tables, and Sisense can join either through the Fact_Sale_Orders table or the Fact_Purchase_Orders table, which is a random path problem. Because of this, a user could see inaccurate results in the dashboard filter members: fewer members than expected, or members they are not supposed to see.

For example, when a dashboard filter is created on a field from the DimCountries table, there are two possible paths from the security table (DimEmployees) to the dashboard filter's table (DimCountries). Sisense chooses between paths (1) and (2) at random, as both query paths are tied for the shortest path.

Solution 1 - Key Table:
The Key table is a central table that stores all the relationships between the primary key values. This solution solves our problem because the key table has all the possible key combinations, so when we join to produce results, all the possible options from the Fact tables are available. You can find more details on Key Tables here.
Downsides:
Key Table Size - Because the key table is essentially a cross join between the dimension tables, it can grow exponentially in size.
Dashboard Performance - The Key Table increases the path length between our dimension and fact tables, which results in slower dashboard performance.
Maintenance - Iterating on an Elasticube model containing a Key Table can be challenging, because planning is key when creating a key table.
Build Time - Additional build time is needed to calculate the Key Table.
Best Practices:
Keep the Key table size below 50M records.
Try creating partial key tables instead of including all your dimensions if not needed.
Plan your Key Table!
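To get a feel for the size downside, here is a small pandas sketch (with hypothetical dimension sizes, not tied to the model above) showing how a key table built as a full cross join of dimension keys multiplies their cardinalities:

import pandas as pd

# Hypothetical dimension key lists
products = pd.DataFrame({"ProductID": range(1, 1001)})    # 1,000 products
countries = pd.DataFrame({"CountryID": range(1, 51)})     # 50 countries
employees = pd.DataFrame({"EmployeeID": range(1, 201)})   # 200 employees

# A full cross join of the dimension keys
key_table = products.merge(countries, how="cross").merge(employees, how="cross")
print(len(key_table))  # 1,000 * 50 * 200 = 10,000,000 rows from just three small dimensions

In practice the key table is usually built from the key combinations that actually occur in the fact tables (see the Key Table article later in this document), which keeps it far smaller than the full cross join, but this multiplicative growth is why the 50M-record guideline matters.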
Security Exception on certain Dimension Tables:
Sometimes it is not necessary to apply security on all the dimension tables. With the Key Table approach, you need not include those dimension tables in the Key table; they can be connected directly to the Fact tables. Usually it is the DimDate table that does not need to be protected.

After this, while setting up Data Security, you need to ensure you set the scope correctly: the DimDate table should be excluded from the Data Security's scope. This means that when you create a dashboard filter using the DimDate table, or a widget using only fields from the DimDate table, the Data Security will not take effect. You can read more about "Scope Limitations" in row-level Data Security to understand all the different options in detail. (Scope Limitations are available from version 8.2.1 and above.)

Solution 2 - Consolidation:
If our fact tables are on the smaller side (<100M records), we can consolidate them into a single fact table. Create a Custom SQL table to union the two fact tables together, with a flag to differentiate them. In this example, we have created the "Type" field, which has two values: Sale or Purchase. Now the data security is defined on the DimEmployees table (Security), and if a dashboard filter is created off DimCountries / DimDate / DimProducts (Filter), there is only one path for Sisense to reach from the Security dimension table to the Filter dimension table - via the only fact table in the model, Fact_Sales_Purchase. Thus, there is no random path issue here.
Downsides:
Additional build time is needed to process the Custom SQL table.
All dashboard formulas will need to contain a value filter on Type.
Best Practices:
The consolidated Elasticube record size should not exceed 250M records.

Solution 3 - Bridge Table:
An intermediary table can be created to increase the path length between dimensions and facts in order to control query routing. Generally, to create a query path from table 1 to table 2, Sisense uses the shortest-path algorithm. When we have two fact tables in the model, we choose one of them as the preferred fact table and try to force the Sisense query path through it. To force the query path, we manipulate the lengths of the paths by adding one extra table to the path involving the non-preferred fact table, thus indirectly routing Sisense to choose the path involving the preferred fact table to reach from the security dimension table to the Filter dimension table.

Creating the Bridge Table and setting up the path:
For every dimension table that needs to be filtered by Data Security (DimEmployees), we need to create one bridge table. In our example, we need DimCountries and DimProducts to be filtered by the security dimension. For each of these tables, follow the steps below (shown here for DimProducts):
Step 1: Duplicate the key in the DimProducts table and name it "Product ID (1)".
Step 2: Create a custom SQL table and name it "Product_Bridge" (per the best practices below, it holds two copies of the Product ID key - one to join to the dimension table and one to join to the fact table).
Step 3: Join the DimProducts table and the Fact_Sale_Orders table (preferred fact table) directly using DimProducts.Product ID and Fact_Sale_Orders.Product ID.
Step 4: Join the DimProducts table and the Product_Bridge table using DimProducts.Product ID (1) and Product_Bridge.Product ID (1). Then join Product_Bridge and Fact_Purchase_Orders (non-preferred fact table) on Product_Bridge.Product ID and Fact_Purchase_Orders.Product ID.
Similarly, repeat the above steps for the DimCountries table by creating the Country_Bridge table. The other dimension table in the data model that doesn't have a bridge table is the DimDates table, which will not be affected by Data Security because its data is not sensitive.
So, use Scope Limitation (as explained in the "Security Exception on certain Dimension Tables" part of Solution 1) to exclude DimDates from the Data Security's effect.

How does the Query Path work?
Two tables have been created to increase the path length. The DimProducts table is joined directly to Fact_Sale_Orders but reaches the non-preferred fact table through the Product_Bridge table. Likewise, the DimCountries table is joined directly to Fact_Sale_Orders but reaches the non-preferred fact table through the Country_Bridge table. So when DimProducts is used as the filter dimension for a dashboard filter and DimEmployees is the security dimension table, the query path through Fact_Sale_Orders (2 joins) is shorter than the query path through Fact_Purchase_Orders (3 joins), because the added Product_Bridge table lengthens the latter.
Downsides:
Potential many-to-many joins could occur through the query routing created by the Sisense translator.
Best Practices:
Duplicate the Primary Key in the dimension table so Sisense does not create a join through the bridge table to both fact tables. Also, create two copies of the keys in the bridge table, where one is used to connect to the dimension table and the other is used to connect to the fact table - this avoids creating a direct join between the dimension table and the non-preferred fact table.

Solution 4 - Dual Bridge:
Here there are two intermediary tables - one on the fact tables' side (Fact_Bridge) and the other on the dimension tables' side (Dim_Bridge). These intermediary tables are identical copies of the security dimension table and act as a "bridge" between the security dimension table and all fact tables and all dim tables. The only constraint for this approach is that the foreign key to the security dimension table must be present in all the sensitive tables - the fact and dimension tables whose data is to be protected by data security.

Creating the two Bridge Tables and setting up their paths:
Remove all the connections on the security dimension table and duplicate it twice - Fact_Bridge and Dim_Bridge. In the duplicated (bridge) tables, keep only the primary key field plus a few copies of the primary key (ID -> ID_Sec1, ID_Sec2 in both bridge tables - one copy for each fact table and dim table to be secured). Also, make two copies of the primary key on the security dimension table and name them ID_FB and ID_DB. The following three tables will each have the same column duplicated 'n' times to create independent joins:
DimCustomers - Duplicate the ID field -> ID_FB & ID_DB => join to Fact_Bridge & Dim_Bridge respectively
Fact_Bridge - Total of 3 columns -> ID, ID_Sec1, ID_Sec2 => to connect to the two dim tables
Dim_Bridge - Total of 3 columns -> ID, ID_Sec1, ID_Sec2 => to connect to the two fact tables
*If you have more fact tables / dim tables to secure, duplicate the ID field accordingly.
In the example below, the above steps have been applied to the data model. The security dimension table is DimCustomers (CustomerID), and the foreign key to DimCustomers is available on the sensitive dimension tables (DimProducts and DimEmployees) and on all the fact tables. The Fact_Bridge sits between the security dimension table and all the dim tables in the data model; the Dim_Bridge sits between the security dimension table and all the fact tables.
Perform the following steps:
1. Connect DimCustomers and Fact_Bridge on [DimCustomers.ID_FB] & [Fact_Bridge.ID].
2. Connect DimCustomers and Dim_Bridge on [DimCustomers.ID_DB] & [Dim_Bridge.ID].
3. Connect Dim_Bridge to all the fact tables: [Dim_Bridge.ID_Sec1] & [Dim_Bridge.ID_Sec2] to the fact tables' CustomerID (Fact_Sale_Orders & Fact_Purchase_Orders).
4. Connect Fact_Bridge to all the sensitive dimension tables: [Fact_Bridge.ID_Sec1] & [Fact_Bridge.ID_Sec2] to the sensitive dimension tables' CustomerID (only DimEmployees & DimProducts).

The other dimension tables in the data model that don't have a CustomerID are DimDate and DimCountries. They are not affected by Data Security because their data is not sensitive, and hence they are not connected to the Fact_Bridge table. So, use Scope Limitation (as explained in the "Security Exception on certain Dimension Tables" part of Solution 1) to exclude the DimDate & DimCountries tables from the Data Security's effect.

How does the Query Path work?
When you create a dashboard filter using the DimEmployees table, Data Security comes into effect, so Sisense creates a query path between DimCustomers and DimEmployees. In our data model there are three possible paths: #1 goes through the Fact_Bridge, while #2 and #3 go through the Dim_Bridge table. Going through the Dim_Bridge is a longer route than going through the Fact_Bridge in this case, so Sisense prefers the shortest path and takes query path #1, which is the desired path.
Now, when you create a widget with Fact_Sale_Orders, Sisense tries to create a query path between DimCustomers and Fact_Sale_Orders. There are again three possible paths: #1 goes through the Dim_Bridge, and the other two go through the Fact_Bridge and then each of the sensitive dimension tables. Sisense chooses the path through the Dim_Bridge, which is the shortest path in this scenario and the desired one. With the help of the bridges, we can play with the lengths of the query paths and force Sisense to take the desired path in different scenarios.
Downsides:
Potential many-to-many joins could occur through the query routing created by the Sisense translator when complicated front-end formula functions are used - for example ALL() in certain scenarios, and PAST/NEXT/GROWTH functions applied to a date field that is part of the sensitive dimensions in the data model (e.g., if DimDate is made sensitive to data security in this example and PAST/NEXT/GROWTH functions are then used in the context of the DimDate.Date field).
Dependent filters (dashboard filters) on two different dimension tables don't work properly due to many-to-many joins.
Best Practices:
Duplicate the Primary Key in the security dimension table twice - one copy to connect to Dim_Bridge and the other to connect to Fact_Bridge.
Also, create the required number of key copies in each of the bridge tables, where each copy is used to connect independently to the dimension tables and fact tables that need to be secured.
The Conclusion:
You can use any of the above approaches depending on factors such as how the model is structured, how the fact tables in the model are related, which fact table you want the security on the dimensions to be applied through, whether the sensitive dimension tables are directly related to the security dimension table, and what the specific use case is. If you need more assistance setting up Data Security on your Elasticube, please reach out to your Customer Success Manager (CSM).
Data Transpose Guide

Source data that is flattened may require some or all of the columns of a source table to be transposed in order to work seamlessly with Sisense's functionality. This post outlines the cause and two possible solutions to this issue. The first solution uses an Excel macro to transpose data. The second solution provides a method for creating a custom SQL expression that will transpose the data in the ElastiCube Manager.

Required: Excel, basic SQL knowledge.
Suggested: Notepad++, basic knowledge of regular expressions (Notepad++ Download Page).

Transposing data can make development easier on data sets that are not properly structured. This guide also provides an example of advanced custom SQL expressions and bulk creation of SQL statements.

For our example, we are using survey data where every question is a column and the answer is the value in that column. The question we want to answer with the data is "How often do married people answer 'Best' to a question?" With the original table format, we are not able to answer this question in a dashboard. To answer it, we have to transpose some of the data: we need to create a more general 'Question' column, where each value is a question name, and a general 'Answer' column, where each value is the answer to the corresponding question. The original source data and the desired reformatted data can be found in the attached Excel workbook Prism Reformatter.xls.

Method 1 - Excel Macro in Prism Reformatter.xls
The attached document Prism Reformatter.xls contains a macro that will transpose the data in a sheet based on parameters the user can set. Use this method if your data is already in Excel or would be easy to put into Excel, and has fewer than 65,000 rows. Prism Reformatter.xls contains 3 sheets:
Data - Contains the data to be transposed. It is currently populated with an example data set.
Formatted - Contains a formatted version of the example data set.
Configuration - The sheet where you define the settings for your data transpose; it also contains a button that executes the macro.
The following settings can be configured:
Row Fields: The columns in the data sheet to be selected, but not transposed. In our example, that would be Date, Name, Gender, Marital Status, and Age.
Col Fields: The columns to be transposed. In our example, that would be Question 1, Question 2, Question 3.
Source Sheet: The sheet in the workbook containing the original data.
Target Sheet: The sheet to put the reformatted data in. This sheet will be created when the macro is executed.
Rows/sheet: How many rows to put on a sheet before creating another sheet to put data into. This accounts for the number of rows a worksheet can contain in different versions of Excel.
Once these settings are configured, click the Reformat button, and you're done. Use the new sheet as the source for the Elasticube.

Method 2 - Custom SQL
Use this method if your data would be difficult to import into Excel, or if the amount of data is causing poor performance in Excel. It also provides a basic methodology for more complex transpose logic that may not be possible with the Excel document. At a high level, this solution requires a table to be created that defines the columns to be transposed, as well as a custom SQL table that builds the transposed data in the ElastiCube Manager.
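Before writing the SQL, it can help to prototype the target shape outside the ElastiCube. Here is a rough pandas sketch of the same transpose (an unpivot, or "melt"), assuming the survey data is loaded into a DataFrame with the column names used in this example; the sample values are made up, and this is only a sanity check of the expected output, not part of the ElastiCube workflow.

import pandas as pd

# Hypothetical slice of the survey data from Prism Reformatter.xls
df = pd.DataFrame({
    "Date": ["2014-01-01", "2014-01-02"],
    "Name": ["Alice", "Bob"],
    "Gender": ["F", "M"],
    "Marital Status": ["Married", "Single"],
    "Age": [34, 29],
    "Question 1": ["Best", "Worst"],
    "Question 2": ["Good", "Best"],
    "Question 3": ["Best", "Good"],
})

# One row per (respondent, question): the 'Question'/'Answer' columns described above
long_df = df.melt(
    id_vars=["Date", "Name", "Gender", "Marital Status", "Age"],  # Row Fields
    value_vars=["Question 1", "Question 2", "Question 3"],        # Col Fields
    var_name="Question",
    value_name="Answer",
)
print(long_df.head())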
Listed below is the final custom SQL expression used in this example to transpose the table Data. This is to give you a sense of what we're working toward.

SELECT D.Date, D.Name, D.Gender, D.[Marital Status], D.Age,
  CRS.col_nm AS Question,
  CASE
    WHEN CRS.VAL=1 THEN D.[Question 1]
    WHEN CRS.VAL=2 THEN D.[Question 2]
    WHEN CRS.VAL=3 THEN D.[Question 3]
  END AS ANSWER
FROM Data D
CROSS JOIN [CRS_JN] CRS

The CROSS JOIN joins every row from Data to every row in CRS_JN.

Create the Custom SQL Expression
The steps below explain the process for creating this SQL statement. It can be extended to transpose any number of columns.

1. Create a list of the columns you want to transpose. With our sample data set, we can copy all of the column names that need to be transposed (Question 1, 2, 3), transpose them into a new worksheet, and name the column 'col_nm'. If your source is not Excel, a different method will be needed to retrieve the column names, and it will depend on your source system. Once the column names are in an Excel sheet, generate a numbered list next to them and name that column 'val'. I've named the sheet CRS_JN in this scenario.

2. Import the new worksheet into ElastiCube Manager. In your Elasticube, click Add Data -> Excel worksheet, and add the sheet that was created in step 1 (CRS_JN). You'll end up with a new stand-alone table.

3. Create the CASE statement for your custom SQL. This code can be written by hand, but in scenarios where you are transposing a large number of columns, this step can be very helpful. Instead of writing each WHEN statement by hand, I used the text editor Notepad++, because it can find and replace lines of text based on patterns. Take the data from the CRS_JN worksheet, without column names, and copy it into Notepad++. It should just be a list of a number, a tab, and a value. Open the Find and Replace menu (Ctrl+H) and enter the following search pattern:
(.*)\t(.*)
And replace with this pattern:
WHEN CRS.VAL=\1 THEN D\.[\2]
Make sure the 'Regular Expression' search mode is enabled. Row by row, this turns each "number [tab] column name" pair into one WHEN clause.

4. Write the rest of the custom SQL. Write the part of the SELECT statement that doesn't require transposing, and add the cross join. In this case it is:
SELECT D.Date, D.Name, D.Gender, D.[Marital Status], D.Age
FROM Data D
CROSS JOIN [CRS_JN] CRS
Next, select col_nm from the cross-join table, alias it, and write the beginning and end of the CASE statement:
SELECT D.Date, D.Name, D.Gender, D.[Marital Status], D.Age,
  CRS.col_nm AS Question,
  CASE
  END AS ANSWER
FROM Data D
CROSS JOIN [CRS_JN] CRS
Insert the WHEN clauses you created in step 3 between the CASE and END identifiers:
SELECT D.Date, D.Name, D.Gender, D.[Marital Status], D.Age,
  CRS.col_nm AS Question,
  CASE
    WHEN CRS.VAL=1 THEN D.[Question 1]
    WHEN CRS.VAL=2 THEN D.[Question 2]
    WHEN CRS.VAL=3 THEN D.[Question 3]
  END AS ANSWER
FROM Data D
CROSS JOIN [CRS_JN] CRS

5. Build the new table. Open the ElastiCube Manager, click Add Data -> Custom SQL Expression, insert the query you generated, and parse the SQL to check that it's correct. If the query parses, build the table and validate the data.

How often do married people answer 'Best' to a question? We can now answer this question. Create a dashboard with the ElastiCube that contains the transposed table.
Create a pivot table that selects Marital Status and Answer as attributes, and add a value that calculates DupCount(Answer). We now know married people answered 'Best' once.
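As a quick cross-check outside Sisense, the same question can be answered on the transposed data with pandas. The data below is hypothetical (it continues the illustrative values from the earlier sketch, not the real survey results):

import pandas as pd

# Hypothetical transposed survey data in the Question/Answer format built above
long_df = pd.DataFrame({
    "Marital Status": ["Married", "Married", "Married", "Single", "Single", "Single"],
    "Question": ["Question 1", "Question 2", "Question 3"] * 2,
    "Answer": ["Best", "Good", "Best", "Worst", "Best", "Good"],
})

# Count how often each marital status gave each answer
counts = long_df.groupby(["Marital Status", "Answer"]).size().reset_index(name="Count")

# How often did married people answer 'Best'?
print(counts[(counts["Marital Status"] == "Married") & (counts["Answer"] == "Best")])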
Attached Files
Prism Reformatter.xls
Transpose_Test.ecdata

Related Links
Regular Expression Guide: http://letconex.blogspot.com/2013/06/how-to-use-regular-expressions-in.html
Cross Join Explanation: http://en.wikipedia.org/wiki/Join_(SQL)#Cross_join

Analyze Data Faster with Google Cloud's BigQuery Storage API

As an official Google Cloud partner, Sisense is focused on helping builders of all kinds do more with their data in BigQuery. Our native BigQuery connector and cloud-native, cloud-agnostic architecture make dealing with your BigQuery deployment easy and efficient.
Key Table: Store data relationships between primary key values

Analytical Need
A key table (click the link to download a demo Data Model) is a central table that stores all the relationships between the primary key values.

Modeling Challenge
The main advantages of using a Key Table:
It helps to tidy up a schema that has tables all over the place.
It resolves data modeling issues like many-to-many relationships, circular joins between tables, Slowly Changing Dimensions, etc.
It allows better security control.
A Key Table is useful when:
There are too many tables with multiple fields connecting to each other (each table has a different number of key fields).
Tables are connected to one another at different levels of granularity.

The goal is to create a star (and sometimes snowflake) schema with dimension and fact tables connected to a central key table. Unlike in classical data warehousing, the central table doesn't usually hold the measures - they generally stay in the fact tables. Each table in the schema, with some exceptions, is linked to the central key table and to the Key Table only. The exceptions are generally lookup tables that are not really relevant to associate with other tables. For example, in the schema below, we have dim_ProductSubcategory linked to dim_Product. It is not really appropriate or necessary to bring the ProductSubcategoryID into the Key Table (although it is possible), so we can leave it out and linked to dim_Product. When all the tables are linked via the Key Table alone, this is generally called a "Star Schema". When other tables remain linked off the main dimension tables (such as Division or Category), it is often called a "Snowflake Schema".

Note: This table causes an added join between the facts and the dimensions. Since this table can get quite large (depending on the different key combinations in the data set), please run performance tests on the dashboards that rely on this cube to make sure that performance is not impacted.

Image 1. Key Table schema (end result)

Solution
Step 1 - Make sure all the primary keys of the Dimension tables are unique
If a table doesn't have one unique key, derive one using a function like Rank. Here is a simple example of how to test primary key uniqueness using a custom SQL statement:
SELECT count(ProductID) AS [count],
  Distinct_Count(ProductID) AS [Distinct_Count]
FROM dim_Product
Uniqueness is validated when the number of distinct values equals the number of rows in the dimension table.
Image 2. Uniqueness test

Step 2 - Planning is caring!
Create a reference spreadsheet that maps all the Primary and Foreign keys that exist in the schema.

Step 3 - Create a custom "Key" field in every Fact table
Make sure to separate the foreign keys with a +'|'+ delimiter and replace the missing fields with '_'. For example, the "Key" field in fact_Orders:
IfString(isnull(tostring(VendorID)),'_',tostring(VendorID))
+'|'+ IfString(isnull(tostring((EmployeeID))),'_',tostring(EmployeeID))
+'|'+ '_' //CustomerID
+'|'+ '_' //SalesPersonID
+'|'+ IfString(isnull(tostring((ProductID))),'_',tostring(ProductID))
+'|'+ IfString(isnull(tostring((getyear(OrderDate)*10000+getmonth(OrderDate)*100+getday(Orderdate)))),'_',tostring((getyear(OrderDate)*10000+getmonth(OrderDate)*100+getday(Orderdate))))
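The same null-safe concatenation logic can be prototyped outside the ElastiCube, which is handy for previewing what the Key values will look like before a build. Here is a rough pandas sketch that mirrors the expression above for fact_Orders (column names follow the example; the data itself is made up):

import pandas as pd

def key_part(series):
    # Mirror IfString(isnull(...), '_', tostring(...)): nulls become '_'
    return series.astype("string").fillna("_")

# Hypothetical rows from fact_Orders
orders = pd.DataFrame({
    "VendorID": pd.array([12, None], dtype="Int64"),
    "EmployeeID": pd.array([7, 9], dtype="Int64"),
    "ProductID": pd.array([101, 205], dtype="Int64"),
    "OrderDate": pd.to_datetime(["2014-03-01", "2014-03-02"]),
})

# yyyymmdd, like getyear*10000 + getmonth*100 + getday in the expression above
date_key = key_part(orders["OrderDate"].dt.strftime("%Y%m%d"))

orders["Key"] = (
    key_part(orders["VendorID"])
    + "|" + key_part(orders["EmployeeID"])
    + "|" + "_"                      # CustomerID is not present in this fact table
    + "|" + "_"                      # SalesPersonID is not present in this fact table
    + "|" + key_part(orders["ProductID"])
    + "|" + date_key
)

print(orders["Key"].tolist())  # e.g. ['12|7|_|_|101|20140301', '_|9|_|_|205|20140302']
# The Key table is then built from the DISTINCT Key values of each fact table (Step 4),
# and Step 5 checks that the resulting Key column is unique.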
Step 4 - Create the KeyTable that is uniquely identified by the "Key" field
Use a mixture of Concatenate and Join to generate the Key table from the existing data. For example:

SELECT DISTINCT Key, VendorID, EmployeeID, toint(null) AS CustomerID, toint(null) AS SalesPersonID, ProductID, OrderDate AS Date
FROM [fact_Orders]
WHERE NOT isnull(Key)
UNION
SELECT DISTINCT Key, toint(null) AS VendorID, toint(null) AS EmployeeID, CustomerID, SalesPersonID, ProductID, OrderDate AS Date
FROM [fact_Sales]
WHERE NOT isnull(Key)
UNION
SELECT DISTINCT Key, toint(null) AS VendorID, toint(null) AS EmployeeID, toint(null) AS CustomerID, SalesPersonID, toint(null) AS ProductID, Month AS Date
FROM [MonthlyTarget]
WHERE NOT isnull(Key)

Step 5 - Make sure the Key field of the Key table is unique
Image 3. Key Table uniqueness test

Step 6 - Create the relationships
Link all the dimension tables' primary keys to the associated foreign keys within the Key table.
Link the "Key" field within the Key Table to the fact tables' "Key" field.

Attachments
Key.ecdata (9.8 MB)
Snowflake Optimization Techniques for High Performance Low Latency Dashboards

What makes the Sisense and Snowflake relationship powerful is the ability to optimize Snowflake to produce a high performance, low latency dashboarding experience that allows data consumers to continue getting insight into the valuable data they need.