Alert on an M2M Risk
Introduction

This article describes how to enable an alert* when there is a risk of a many-to-many (M2M) relationship between two or more tables.

* This implementation involves the use of Pulse and is supported in Sisense v6.5 and above.

Motivation

Almost every data model designer who has dealt with slightly more advanced modeling has, at some point, run into a data or performance issue and, after a long troubleshooting process, found that the root cause was an M2M relationship. M2M can occur for the following reasons**:

- Connecting two tables with a pure M2M relationship, such as two fact tables joined on a non-unique key.
- A unique key becoming non-unique due to changes or data issues in the source system.
- A business need for an M2M relationship between two entities.

From these scenarios you can see that M2M can be introduced not only during ElastiCube design but also during its ongoing use. A proactive alert for the ElastiCube designer can therefore spare users poor dashboard performance and incorrect result sets, and save many hours of troubleshooting, panicking users, and general frustration.

** There are other, more complex cases of M2M that we won't describe here, as the M2M alert mechanism is less relevant for them. Please refer to this article for a more detailed description of M2M relationships.

Implementation

As a data model designer, during design and after the final version of my schema is complete, I want to keep track of my connections and make sure that my assumption of uniqueness on the "one" side of each relationship (which yields a 1:M relationship) actually holds.

Step 1 - Build an M2M Detection Table

We'll create a table that reflects the uniqueness of each table in the final schema that must be unique according to its primary or surrogate key. Essentially, we count the total number of rows in each table, count the number of unique values of the table's key, and compare the two. If they are equal, the key is unique within the table; if not, there is a risk of M2M, so we raise a flag. In our case, the output of the table shows that only one table (Cost) is not unique.
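Before diving into the full query, here is the shape of the check for a single table, lifted from the query below (Sisense ElastiCube SQL, using its DISTINCT_count and ifint functions):

```sql
-- Uniqueness check for one table: compare the total row count to the
-- distinct key count. A positive difference means duplicated keys, i.e., an M2M risk.
SELECT 'Customer' AS Table, 'CustomerID' AS Key,
       count(t.CustomerID) AS [Total Count],
       DISTINCT_count(t.CustomerID) AS [Distinct Count],
       count(t.CustomerID) - DISTINCT_count(t.CustomerID) AS [Duplicated Records],
       ifint(count(t.CustomerID) - DISTINCT_count(t.CustomerID) > 0, 1, 0) AS M2M_Flag
FROM Customer t
```

The full detection table simply repeats this pattern for every table that should be unique and stacks the results with UNION ALL.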
Here's the query I used to create this table:

```sql
SELECT 'DimDates' AS Table, 'Date' AS Key,
       count(t.[Date]) AS [Total Count],
       DISTINCT_count(t.[Date]) AS [Distinct Count],
       count(t.[Date]) - DISTINCT_count(t.[Date]) AS [Duplicated Records],
       ifint(count(t.[Date]) - DISTINCT_count(t.[Date]) > 0, 1, 0) AS M2M_Flag
FROM DimDates t
UNION ALL
SELECT 'Customer' AS Table, 'CustomerID' AS Key,
       count(t.CustomerID) AS [Total Count],
       DISTINCT_count(t.CustomerID) AS [Distinct Count],
       count(t.CustomerID) - DISTINCT_count(t.CustomerID) AS [Duplicated Records],
       ifint(count(t.CustomerID) - DISTINCT_count(t.CustomerID) > 0, 1, 0) AS M2M_Flag
FROM Customer t
UNION ALL
SELECT 'SalesPerson' AS Table, 'BusinessEntityID' AS Key,
       count(t.BusinessEntityID) AS [Total Count],
       DISTINCT_count(t.BusinessEntityID) AS [Distinct Count],
       count(t.BusinessEntityID) - DISTINCT_count(t.BusinessEntityID) AS [Duplicated Records],
       ifint(count(t.BusinessEntityID) - DISTINCT_count(t.BusinessEntityID) > 0, 1, 0) AS M2M_Flag
FROM SalesPerson t
UNION ALL
SELECT 'Product' AS Table, 'ProductID' AS Key,
       count(t.ProductID) AS [Total Count],
       DISTINCT_count(t.ProductID) AS [Distinct Count],
       count(t.ProductID) - DISTINCT_count(t.ProductID) AS [Duplicated Records],
       ifint(count(t.ProductID) - DISTINCT_count(t.ProductID) > 0, 1, 0) AS M2M_Flag
FROM Product t
UNION ALL
SELECT 'SalesOrderHeader' AS Table, 'SalesOrderID' AS Key,
       count(t.SalesOrderID) AS [Total Count],
       DISTINCT_count(t.SalesOrderID) AS [Distinct Count],
       count(t.SalesOrderID) - DISTINCT_count(t.SalesOrderID) AS [Duplicated Records],
       ifint(count(t.SalesOrderID) - DISTINCT_count(t.SalesOrderID) > 0, 1, 0) AS M2M_Flag
FROM SalesOrderHeader t
--Example of uniqueness by surrogate key
UNION ALL
SELECT 'Cost' AS Table, 'Size + Weight' AS Key,
       count(t2.Key) AS [Total Count],
       DISTINCT_count(t2.Key) AS [Distinct Count],
       count(t2.Key) - DISTINCT_count(t2.Key) AS [Duplicated Records],
       ifint(count(t2.Key) - DISTINCT_count(t2.Key) > 0, 1, 0) AS M2M_Flag
FROM (SELECT t.Size + '|' + tostring(t.Weight) AS Key FROM Cost t) t2
--Example of checking uniqueness for a table which is not in the final schema
--but affects another table (as the source of a lookup custom field)
UNION ALL
SELECT 'Store' AS Table, 'BusinessEntityID' AS Key,
       count(t.BusinessEntityID) AS [Total Count],
       DISTINCT_count(t.BusinessEntityID) AS [Distinct Count],
       count(t.BusinessEntityID) - DISTINCT_count(t.BusinessEntityID) AS [Duplicated Records],
       ifint(count(t.BusinessEntityID) - DISTINCT_count(t.BusinessEntityID) > 0, 1, 0) AS M2M_Flag
FROM Store t
```

Step 2 - Create a Relevant KPI Within a Widget

We'll create an indicator with the following formula:

if([Total M2M_Flag]>0,1,0)

This widget can be temporary, as it will only serve to alert us when the total of the M2M flags is greater than 0.

Step 3 - Add the KPI to Pulse

After adding the widget to Pulse, we'll set the condition according to the threshold 'Greater than' 0. Feel free to adjust the notification mode and other preferences according to your requirements.
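A note on the Step 2 formula: [Total M2M_Flag] is simply the default sum aggregation of the M2M_Flag column, so the indicator can equivalently be written with the aggregation spelled out (a sketch, assuming the standard Sisense dashboard formula syntax):

```
if(SUM([M2M_Flag]) > 0, 1, 0)
```

Either way, the indicator shows 1 as soon as any table in the detection table has duplicated key values.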
QuickBooks DESKTOP - Generating Tables For Profit & Loss Report

In order to generate a profit and loss report from the QuickBooks Desktop Connector, you will need the following tables:

- Account
- Custom Profit & Loss Detail

The Account table is a standard table available with the CData connector, while the custom table requires running a custom stored procedure to create the RSD schema file that Sisense can use to generate a table. This documentation describes how to generate an RSD file for another connector.

To generate the RSD, use the following query, and make sure you enter your own date ranges. You can use the "as of" syntax described in the CData documentation.

```sql
EXEC CreateReportSchema
     @ReportName = 'ProfitAndLossDetail',
     @ReportType = 'PROFITANDLOSSDETAIL',
     @IncludeColumns = 'ACCOUNT,NAME,DATE,AMOUNT,DEBIT,CREDIT',
     @DetailLevel = 'ALL',
     @ReportPeriod = 'YYYY-MM-DD:YYYY-MM-DD';
```

Use the account numbers from the Account table to group the various line items found in the custom table.
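As an illustration of that last step, a join along these lines could group the detail rows by account. This is only a sketch: the exact column names in the generated table depend on your RSD and source data, so the names used here (AccountNumber, Name, ACCOUNT, AMOUNT) are assumptions to adapt.

```sql
-- Hypothetical sketch: group P&L detail line items by account.
-- Match the join key and column names to your generated schema.
SELECT a.AccountNumber,
       a.Name AS AccountName,
       sum(d.AMOUNT) AS TotalAmount
FROM [ProfitAndLossDetail] d
INNER JOIN [Account] a ON d.ACCOUNT = a.Name   -- or join on account number, per your data
GROUP BY a.AccountNumber, a.Name
```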
Key Table: Store data relationships between primary key values

Analytical Need

A key table (click the link to download a demo data model) is a central table that stores all the relationships between the primary key values.

Modeling Challenge

The main advantages of using a key table:

- It helps tidy up a schema that has tables all over the place.
- It resolves data modeling issues like many-to-many relationships, circular joins between tables, slowly changing dimensions, etc.
- It allows better security control.

It is especially useful when there are too many tables with multiple fields connecting to each other (each table having a different number of key fields), or when tables are connected to one another at different levels of granularity.

The goal is to create a star (and sometimes snowflake) schema with dimension and fact tables connected to a central key table. Unlike in classical data warehousing, the central table often doesn't hold the measures - they generally stay in the fact tables. Each table in the schema, with some exceptions, is linked to the central key table and to the key table only. The exceptions are generally lookup tables that are not really relevant to associate with other tables. For example, in the schema below, dim_ProductSubcategory is linked to dim_Product. It is not really appropriate or necessary to bring ProductSubcategoryID into the key table (although it is possible), so we can leave it out and linked to dim_Product. When all the tables are linked via the key table alone, this is generally called a "star schema". When other tables remain linked off the main dimension tables (such as Division or Category), it is often called a "snowflake schema".

Note: This table adds a join between the facts and the dimensions. Since it can get quite large (depending on the different key combinations in the data set), please run performance tests on the dashboards that rely on this cube to make sure performance is not impacted.

Image 1. Key Table schema (end result)

Solution

Step 1 - Make sure all the primary keys of the dimension tables are unique

If a table doesn't have one unique key, derive one using a function like Rank. Here is a simple example of how to test primary key uniqueness using a custom SQL statement:

```sql
SELECT count(ProductID) AS [count],
       Distinct_Count(ProductID) AS [Distinct_Count]
FROM dim_Product
```

Uniqueness is validated when the number of distinct values equals the number of rows in the dimension table:

Image 2. Uniqueness test

Step 2 - Planning is caring!

Create a reference spreadsheet that maps all the primary and foreign keys that exist in the schema.

Step 3 - Create a custom "Key" field in every fact table

Make sure to separate the foreign keys with a +'|'+ delimiter and replace the missing fields with '_'. For example, the "Key" field in fact_Orders:

```sql
IfString(isnull(tostring(VendorID)),'_',tostring(VendorID))
+'|'+
IfString(isnull(tostring(EmployeeID)),'_',tostring(EmployeeID))
+'|'+
'_' //CustomerID
+'|'+
'_' //SalesPersonID
+'|'+
IfString(isnull(tostring(ProductID)),'_',tostring(ProductID))
+'|'+
IfString(isnull(tostring(getyear(OrderDate)*10000+getmonth(OrderDate)*100+getday(Orderdate))),
         '_',
         tostring(getyear(OrderDate)*10000+getmonth(OrderDate)*100+getday(Orderdate)))
```

Step 4 - Create the KeyTable that is uniquely identified by the "Key" field

Use a mixture of Concatenate and Join to generate the key table using the existing data.
For example:

```sql
SELECT DISTINCT Key, VendorID, EmployeeID,
       toint(null) AS CustomerID, toint(null) AS SalesPersonID,
       ProductID, OrderDate AS Date
FROM [fact_Orders]
WHERE NOT isnull(Key)
UNION
SELECT DISTINCT Key, toint(null) AS VendorID, toint(null) AS EmployeeID,
       CustomerID, SalesPersonID, ProductID, OrderDate AS Date
FROM [fact_Sales]
WHERE NOT isnull(Key)
UNION
SELECT DISTINCT Key, toint(null) AS VendorID, toint(null) AS EmployeeID,
       toint(null) AS CustomerID, SalesPersonID, toint(null) AS ProductID,
       Month AS Date
FROM [MonthlyTarget]
WHERE NOT isnull(Key)
```

Step 5 - Make sure the Key field of the key table is unique (a query for this check is sketched at the end of this article)

Image 3. Key Table uniqueness test

Step 6 - Create the relationships

- Link all the dimension tables' primary keys to the associated foreign keys within the key table.
- Link the "Key" field within the key table to the fact tables' "Key" field.

Attachments: Key.ecdata (9.8 MB)
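As referenced in Step 5, the key table uniqueness test (Image 3) follows the same pattern as the Step 1 check on dim_Product; a minimal sketch, assuming the central table was named KeyTable:

```sql
-- The key table is valid only if [count] equals [Distinct_Count],
-- i.e., every Key value appears exactly once.
SELECT count(Key) AS [count],
       Distinct_Count(Key) AS [Distinct_Count]
FROM [KeyTable]
```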
Date table with Comprehensive To Date Fields

This article outlines how to create a date table that flags whether or not a particular date is within a particular range, based on the current date. For example, if today is the 100th day of the year, we may want to analyze information for the first 100 days of each of the past several years. The flags can identify whether a date is up to the current:

- Day of the week
- Week of the year
- Day of the month
- Day of the quarter
- Day of the year

Implementation

This will be implemented as a table created via a custom SQL expression in the ElastiCube. The expression uses a date table, which is described in, and can be downloaded from, the Date Dimension File article. Additionally, it joins against the data table being used for analysis, so that it only contains the necessary dates. The following code defines the date table:

```sql
SELECT a.*,
       CASE WHEN a.DayNumOfWeek    <= t.DayNumOfWeek    THEN 'Yes' ELSE 'No' END AS [Is Week to Date?],
       CASE WHEN a.WeekNum         <= t.WeekNum         THEN 'Yes' ELSE 'No' END AS [Is Week of Year to Date?],
       CASE WHEN a.DayNumOfMonth   <= t.DayNumOfMonth   THEN 'Yes' ELSE 'No' END AS [Is Month to Date?],
       CASE WHEN a.DayNumOfQuarter <= t.DayNumOfQuarter THEN 'Yes' ELSE 'No' END AS [Is Quarter to Date?],
       CASE WHEN a.DayNumOfYear    <= t.DayNumOfYear    THEN 'Yes' ELSE 'No' END AS [Is Year to Date?]
FROM [Date Dimension.csv] a
--Establish the necessary date range
INNER JOIN (SELECT MIN(s.OrderDate) AS Min_Date, MAX(s.OrderDate) AS Max_Date
            FROM [SalesOrderHeader] s) mm
        ON a.Date BETWEEN mm.Min_Date AND mm.Max_Date
--Find date information for the current day
CROSS JOIN (SELECT d.date, d.DayNumOfMonth, d.DayNumOfQuarter, d.DayNumOfYear,
                   d.WeekNum, d.DayNumOfWeek
            FROM [Date Dimension.csv] d
            WHERE d.Date = CreateDate(GetYear(Now()), GetMonth(Now()), GetDay(Now()))) t
```
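To show how the flags might be used, here is a minimal sketch of a year-over-year "first N days" comparison. It assumes the table above was saved as [ToDateFlags], and that [SalesOrderHeader] has a TotalDue measure column; both names are assumptions to adapt to your model.

```sql
-- Compare year-to-date sales across years: only dates whose day-of-year
-- is at or before today's day-of-year are included.
SELECT GetYear(s.OrderDate) AS [Year],
       sum(s.TotalDue) AS [YTD Sales]      -- TotalDue is an assumed measure column
FROM [SalesOrderHeader] s
INNER JOIN [ToDateFlags] f ON s.OrderDate = f.Date
WHERE f.[Is Year to Date?] = 'Yes'
GROUP BY GetYear(s.OrderDate)
```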
Data Transpose Guide

Source data that is flattened may require some or all of the columns of a source table to be transposed in order to work seamlessly with SiSense's functionality. This post outlines the cause of this issue and two possible solutions. The first solution uses an Excel macro to transpose the data. The second provides a method for creating a custom SQL expression that transposes the data in the ElastiCube Manager.

Required: Excel, basic SQL knowledge. Suggested: Notepad++, basic knowledge of regular expressions (Notepad++ Download Page).

Transposing data can make development easier on data sets that are not properly structured. This guide also provides an example of advanced custom SQL expressions and bulk creation of SQL statements.

For our example, we are using survey data, where every question is a column and the answer is the value in that column. The question we want to answer with the data is: "How often do married people answer 'Best' to a question?" With the original table format, we cannot answer this question in a dashboard. To answer it, we have to transpose some of the data: we need to create a more general 'Question' column, where each value is a question name, and a general 'Answer' column, where each value is the answer to the corresponding question. The original source data and the desired reformatted data can be found in the attached Excel workbook Prism Reformatter.xls. This picture shows the desired result of the process.

Method 1 - Excel Macro in Prism Reformatter.xls

The attached document Prism Reformatter.xls contains a macro that will transpose the data in a sheet based on parameters the user sets. Use this method if your data is already in Excel (or would be easy to put into Excel) and has fewer than 65,000 rows. Prism Reformatter.xls contains three sheets:

- Data - Contains the data to be transposed. It is currently populated with an example data set.
- Formatted - Contains a formatted version of the example data set.
- Configuration - The sheet where you define the settings for your data transpose; it also contains a button that executes the macro.

The following settings can be configured:

- Row Fields: The columns in the Data sheet to be selected but not transposed. In our example: Date, Name, Gender, Marital Status, and Age.
- Col Fields: The columns to be transposed. In our example: Question 1, Question 2, Question 3.
- Source Sheet: The sheet in the workbook containing the original data.
- Target Sheet: The sheet to put the reformatted data in. This sheet will be created when the macro is executed.
- Rows/sheet: How many rows to put on a sheet before creating another sheet to put data into. This accounts for the maximum number of rows a worksheet can contain in different versions of Excel.

Once these settings are configured, click the Reformat button, and you're done. Use the new sheet as the source for the ElastiCube.

Method 2 - Custom SQL

Use this method if your data would be difficult to import into Excel, or if the amount of data causes poor performance in Excel. It can also serve as a basic methodology for more complex transpose logic that may not be possible with the Excel document. At a high level, this solution requires creating a table that defines the columns to be transposed, as well as a custom SQL table that builds the transposed data in the ElastiCube Manager.
Listed below is the final custom SQL expression used in this example to transpose the table Data. This is to give you a sense of what we're working toward.

```sql
SELECT D.Date, D.Name, D.Gender, D.[Marital Status], D.Age,
       CRS.col_nm AS Question,
       CASE WHEN CRS.VAL=1 THEN D.[Question 1]
            WHEN CRS.VAL=2 THEN D.[Question 2]
            WHEN CRS.VAL=3 THEN D.[Question 3]
       END AS ANSWER
FROM Data D
CROSS JOIN [CRS_JN] CRS
```

The CROSS JOIN joins every row from Data to every row in CRS_JN. The following diagram gives a high-level overview of the process.

Create the Custom SQL Expression

The steps below explain the process for creating this SQL statement. It can be extended to transpose any number of columns.

1. Create a list of the columns you want to transpose. With our sample data set, we can copy all of the column names that need to be transposed (Question 1, 2, 3), transpose them into a new worksheet, and name the column 'col_nm'. If your source is not Excel, a different method will be needed to retrieve the column names, and it will depend on your source system. Once the column names are in an Excel sheet, generate a numbered list next to them and name that column 'val'. I've named the sheet CRS_JN in this scenario. The table should look like this in Excel.

2. Import the new worksheet into the ElastiCube Manager. In your ElastiCube, click Add Data -> Excel worksheet, and add the sheet created in step 1 (CRS_JN). You'll end up with a new stand-alone table that looks like this:

3. Create the CASE statement for your custom SQL. This code can be written by hand, but when transposing a large number of columns, this step can be very helpful. Instead of writing each WHEN clause by hand, I used the text editor Notepad++, because it can find and replace lines of text based on patterns. Take the data from the CRS_JN worksheet, without column names, and copy it into Notepad++. It should be just a list of a number, a tab, and a value. Open the Find and Replace menu (Ctrl+H) and enter the following search pattern:

(.*)\t(.*)

And replace with this pattern:

WHEN CRS.VAL=\1 THEN D\.[\2]

Make sure the 'Regular expression' search mode is enabled. The following diagram shows what happens on a row-by-row basis in this find and replace.

4. Write the rest of the custom SQL. Write the part of the SELECT statement that doesn't require transposing, and add the cross join. In this case it is:

```sql
SELECT D.Date, D.Name, D.Gender, D.[Marital Status], D.Age
FROM Data D
CROSS JOIN [CRS_JN] CRS
```

Next, select col_nm from the cross-join table, alias it, and write the beginning and end of the CASE statement:

```sql
SELECT D.Date, D.Name, D.Gender, D.[Marital Status], D.Age,
       CRS.col_nm AS Question,
       CASE
       END AS ANSWER
FROM Data D
CROSS JOIN [CRS_JN] CRS
```

Insert the WHEN clauses you created in step 3 between the CASE and END identifiers:

```sql
SELECT D.Date, D.Name, D.Gender, D.[Marital Status], D.Age,
       CRS.col_nm AS Question,
       CASE WHEN CRS.VAL=1 THEN D.[Question 1]
            WHEN CRS.VAL=2 THEN D.[Question 2]
            WHEN CRS.VAL=3 THEN D.[Question 3]
       END AS ANSWER
FROM Data D
CROSS JOIN [CRS_JN] CRS
```

5. Build the new table. Open the ElastiCube Manager, click Add Data -> Custom SQL Expression, insert the query you generated, and parse the SQL to check that it's correct. If the query parses, build the table and validate the data.

How often do married people answer 'Best' to a question? We can now answer this question. Create a dashboard with the ElastiCube that contains the transposed table.
Create a pivot table that selects Marital Status and Answer as attributes, and add a value that calculates DupCount(Answer). We now know married people answered 'Best' once (a SQL cross-check is sketched at the end of this post).

Attached Files

- Prism Reformatter.xls
- Transpose_Test.ecdata

Related Links

- Regular Expression Guide: http://letconex.blogspot.com/2013/06/how-to-use-regular-expressions-in.html
- Cross Join Explanation: http://en.wikipedia.org/wiki/Join_(SQL)#Cross_join
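For completeness, the pivot result above can also be cross-checked directly against the transposed table with a simple aggregation; a minimal sketch, assuming the custom SQL table was saved as [Transposed]:

```sql
-- Count how often each marital status gave the answer 'Best'.
SELECT t.[Marital Status],
       t.ANSWER,
       count(t.ANSWER) AS [Answer Count]
FROM [Transposed] t
WHERE t.ANSWER = 'Best'
GROUP BY t.[Marital Status], t.ANSWER
```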