Choosing the Right Data Model
This post has become outdated. You can find guidance on choosing a data model on our documentation site: https://docs.sisense.com/main/SisenseLinux/choosing-the-right-data-model.htm

Introduction

Customers often run into the question of which data model they should use: an ElastiCube, a Live model, or a Build-to-Destination. This article presents some of the aspects you should consider when choosing between them. Sisense recommends that you discuss your needs and requirements with Sisense's technical team during the Jumpstart process, so that the result best meets your business expectations.

Definitions

The ElastiCube Data Model

Importing data into an ElastiCube data model allows you to pull data from multiple data sources, on demand or at a scheduled time, and create a single source of truth inside Sisense. The imported data can then be transformed and aggregated to meet your business needs. Once imported, the data snapshot is used to generate analytical information. The process of importing the data, known as a "Build", includes the following steps:

- Extract the data: Query the data source(s) for data.
- Load the data: Write the extracted data to Sisense (the local MonetDB).
- Transform the data: Transform the data inside the local MonetDB (using SQL queries).

To read more about ElastiCubes, see Introducing ElastiCubes.

The Live Data Model

Using a Live data model does not require importing data; only the data's schema needs to be defined. Once configured, the analytical information required by the user is queried directly against the backend data source. To read more about Live models, see Introducing Live Models.

Determining Factors

Refresh Rate

One of the most fundamental aspects of determining your data model is your data's refresh rate, which refers to the age of the data in your dashboards:

- For Live models, the data displayed on your dashboards is near-real-time, as every query is passed directly to the backend database. A good example of using a Live model (due to refresh rate requirements) is a dashboard that shows stock prices.
- For ElastiCubes, the data displayed on your dashboard is current as of the last successful build, and every query is executed against the local database. A good example of using an ElastiCube (due to refresh rate requirements) is a dashboard that shows historical stock prices; in this case, a daily ETL process provides results that are good enough.

To make a choice based on this factor, answer the following questions:

- How frequently do I need to pull new data from the database?
- Do all my widgets require the same data refresh frequency?
- How long does an entire ETL process take?

Data Transformation Options

The ETL process includes a "Transformation" phase, which usually covers:

- Migrating the data tables into a dim-fact schema
- Enriching your data
- Pre-aggregating the data to meet your business needs

The amount of data transformation done in Sisense helps determine the suitable data model:

- For Live models, Sisense allows minimal to no data transformation. Data is not imported before a query is issued from the front end, so it cannot be pre-conditioned or pre-aggregated. Most data sources used by Live models are data warehouses that can perform all data preparation themselves.
- For ElastiCubes, data is imported before a query is issued from the front end, so it can be pre-conditioned and pre-aggregated. A user may customize the data model to optimally answer their business questions, for example with a custom table like the sketch below.
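As an illustration, a custom table in an ElastiCube can pre-aggregate a fact table at build time, so widgets query far fewer rows at runtime. This is a minimal sketch; the fact_sales table and its columns are hypothetical, not from the original article:

-- Hypothetical ElastiCube custom table (illustrative names).
-- Pre-aggregates raw sales to one row per customer per day,
-- computed once per build instead of on every widget query.
SELECT customer_id,
       order_date,
       SUM(amount) AS total_amount,
       COUNT(*)    AS order_count
FROM fact_sales
GROUP BY customer_id,
         order_date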
To make a choice based on this factor, answer the following questions:

- Is my data in a fact-dim schema?
- Does my data require enriching or pre-conditioning?
- Can my data be pre-aggregated?

Operational Database Load

Your operational databases do more than serve your analytical system, so any application that loads them should be closely examined:

- For Live models, Sisense constantly queries your operational databases and feeds the results into your dashboard widgets. This occurs every time a user loads a dashboard.
- For ElastiCubes, Sisense puts heavy stress on your operational databases during an ETL process, while it reads all tables.

To make a choice based on this factor, answer the following questions:

- Does the analytical system stress my operational database(s)?
- Can the query load be avoided by using a database replica?

Operational Database Availability

The availability of your operational database(s) is critical for collecting information for your analytical system:

- For Live models, all queries are redirected to your data sources. If a data source is not available, widgets will generate errors and not present any data.
- For ElastiCubes, data source availability is critical only during the ETL process. If the data source is not available, the data in your widgets will still be available, but not necessarily up to date.

To make a choice based on this factor, answer the following questions:

- How frequently are my analytical data sources offline?
- How critical is my analytical system? Is being offline (or showing out-of-date information) acceptable?

Additional Vendor Costs

Various database vendors use a chargeback model, meaning that you are charged by the amount of data you pull from the database or by the computational power required to process your data:

- For Live models, every time a user loads a dashboard, each widget triggers (at least) one database query. The combination of a chargeback model and a large user load may result in high costs.
- For ElastiCubes, every time an ETL process is triggered, a large amount of data is queried from the database and loaded into Sisense.

To make a choice based on this factor, answer the following questions:

- How many users use my dashboards, and what is my build frequency?
- Which data model will result in lower costs? What is the tipping point?
- Am I willing to pay more for real-time data?

Database Size

For ElastiCubes, please refer to these documents:

- Introducing ElastiCubes
- Minimum Requirements for Sisense in Linux Environments

For Live models, there is no size limitation, as only the data's schema is imported into Sisense, not the data itself.

To make a choice based on this factor, answer the following questions:

- How much data do I need in my data model?
- How much history do I need to store?
- Can I reduce the amount of data (e.g., by trimming historical data or reducing the number of columns)?

Query Performance

Query performance depends on the underlying work required to fetch and process data. Although every widget generates a query, the underlying data model determines the work necessary to execute it.

For ElastiCubes, every query is handled inside Sisense:

1. The client-side widget sends a JAQL query to the Sisense analytical system.
2. The query is translated into SQL syntax and run against the internal database.
3. The query result is transformed back into JAQL syntax and returned to the client side.
For Live models, every query is forwarded to an external database, and the result is then processed internally:

1. The client-side widget sends a JAQL query to the Sisense analytical system.
2. The query is translated into SQL syntax and run against the external database.
3. Sisense waits for the query to execute.
4. Once it returns, the query result is transformed back into JAQL syntax and returned to the client side.

To make a choice based on this factor, answer the following questions:

- How sensitive is the client to a delay in the query's result?
- When showing real-time data, is the extra latency acceptable?

Connector Availability

Sisense supports hundreds of data connectors (see Data Connectors). However, not all connectors are available for Live data models. The reasoning has to do with connector performance: a slow connector, or one that requires a significant amount of processing, may lead to a bad user experience with Live models (that is, widgets that take a long time to load):

- For ElastiCubes, Sisense lets the user utilize all the data connectors.
- For Live models, Sisense limits the set of data connectors to a few high-performing ones (including most data warehouses and high-performing databases).

To make a choice based on this factor, answer the following questions:

- Does my data source's connector support both data model types?
- Should I consider moving my data to a different data source to allow live connectivity?

Caching Optimization

Sisense optimizes performance by caching query results: results are stored in memory for quick retrieval in case the same query is re-executed. This provides a great benefit and improves the end-user experience:

- For ElastiCubes, Sisense recycles (caches) query results.
- For Live models, Sisense performs minimal caching to make sure data is near real-time. (Note that caching can be turned off upon request.)

To make a choice based on this factor, answer the following questions:

- Do I want to leverage Sisense's query caching?
- How long do I want to cache data?

Dashboard Design Limitations

Specific formulas (such as Mode and Standard Deviation) and widget types (such as Box and Whisker plots) may result in "heavy" database queries:

- For Live models, Sisense limits the use of these functions and visualizations, as computing them may take a long time and cause a bad user experience.
- For ElastiCubes, Sisense allows the user to use them, as they are processed internally by Sisense.

To make a choice based on this factor, answer the following questions:

- Do I need these functions and visualizations?
- Can I pre-aggregate the data and move these calculations to the data source instead of Sisense? (See the sketch below.)
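For instance, when a Live dashboard needs a heavy statistic such as a standard deviation, one option is to pre-compute it in the source data warehouse. A minimal sketch, assuming a hypothetical fact_sales table (STDEV is the SQL Server spelling; other warehouses typically use STDDEV):

-- Hypothetical view in the source data warehouse: the heavy
-- aggregate is computed there once, so a Live widget reads
-- plain columns instead of pushing STDEV through Sisense.
CREATE VIEW sales_stats_by_region AS
SELECT region,
       AVG(amount)   AS avg_amount,
       STDEV(amount) AS stdev_amount
FROM fact_sales
GROUP BY region;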
See also Choosing a Data Strategy for Embedded Self-Service.

Flattening JSON Objects In SQL Server

Question:

In instances where tables within SQL Server contain JSON objects within a column, how can you use the import SQL query editor feature in Sisense to flatten these objects into individual fields?

Solution:

SQL Server has native functions capable of parsing JSON objects and flattening them into individual fields. The code below uses JSON_VALUE, which extracts a scalar value from a JSON string. The first parameter takes an expression, typically the name of a variable or column that contains JSON text. The second parameter takes a JSON path that specifies the property to extract. The default path mode is lax, and the path takes the syntax '$.<keyname>'. Below is the SQL used to flatten the JSON values in the sample data:

select SalesOrderNumber
     , JSON_Value(JSONValue, '$.CustomerID')    as CustomerID
     , JSON_Value(JSONValue, '$.OrderDate')     as OrderDate
     , JSON_Value(JSONValue, '$.TotalDue')      as TotalDue
     , JSON_Value(JSONValue, '$.ShipMethodID')  as ShipMethodID
     , JSON_Value(JSONValue, '$.TerritoryID')   as TerritoryID
     , JSON_Value(JSONValue, '$.SalesPersonID') as SalesPersonID
from OrderHeaderJSON

Other useful functions for working with JSON in SQL Server:

- ISJSON (Transact-SQL) tests whether a string contains valid JSON.
- JSON_VALUE (Transact-SQL) extracts a scalar value from a JSON string.
- JSON_QUERY (Transact-SQL) extracts an object or an array from a JSON string.
- JSON_MODIFY (Transact-SQL) changes a value in a JSON string.
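As a quick, hedged illustration of ISJSON and JSON_QUERY against the same sample table (the ShipAddress key is an assumption, not part of the original sample data):

select SalesOrderNumber
     , JSON_QUERY(JSONValue, '$.ShipAddress') as ShipAddressJson -- returns an object or array as raw JSON text
from OrderHeaderJSON
where ISJSON(JSONValue) = 1 -- keep only rows whose column holds valid JSON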
Ensuring Elasticube is in Sync with Data Source (Deletion & Updates in Accumulate Build)

Capturing accurate business transactions is imperative to understanding business operations. If data is constantly changing, such as being deleted or updated, how do you ensure your Elasticube is in sync with your data source? Depending on the use case, "Replace All" could be an option, but it might not be the most efficient one if you have a huge amount of data. "Accumulated By" is another option, but it can result in duplicated records, and deleted records will still be shown in the Elasticube. Given these potential results, another path to consider is a custom code solution.

Custom Code Solution

Leverage the demo below to learn more. The demo uses an Oracle database with a transaction table. The transaction table has an updated_date column that captures the latest data changes, plus an audit table where deleted transaction IDs are stored with timestamps via a trigger. Please note, all tables in the demo contain dummy records. You can find the DDL for the table and trigger here.

Using custom code to create a new table in the Elasticube keeps the Elasticube in sync with your data source table: the new table shows only the latest records, with deleted records (captured in the audit table) removed. (Note: adding triggers can impact database performance, as every deletion requires an extra write operation.) Additionally, this custom code solution enables change data capture (CDC), answering business questions such as at what intervals a record was updated and/or deleted, via the audit table.

Let's begin by building an Elasticube. Use updated_date as "Accumulated By" for both tables and build the Elasticube by selecting BY TABLE. After this, we will add Custom Code containing all the logic required to obtain the latest records and remove any deleted ones; the core of that logic is sketched after the steps below.

Step 1: Create the tables, triggers, and records in the source database. Sample code is available here.

Step 2: Create the Elasticube by adding both tables, selecting the deleted_on column as "Accumulated By" for the audit table and updated_on for the fact_sales table. Then build using "By Table". (Note: the audit table will be empty since we haven't deleted any records yet.)

Step 3: Create the Custom Code table:

- If not already enabled, go to Admin > Feature Management > Custom Code and save.
- Select "Create your own Notebook".
- In the input parameters, add both base tables, making sure you select the correct parameters, and add columns for the output schema. Then click "Open code editor" to open a Jupyter notebook in a new tab, where you can add the logic.
- Once the Jupyter notebook is open, clear all the cells and copy-paste the code from the .ipynb file in the repo. Make changes to the notebook based on your use case; for example, if the Elasticube and table names differ, provide them in cell 2. In this example, date_change_threshold is the parameter we choose to restrict.
- Save the notebook, return to the previous tab, select the tables for the input parameters, and click Done to create a new Custom Code table in the Elasticube.
- First build "Changes Only", which creates the data for the Custom Code table.

Step 4: Insert a new record, update an old record, and/or delete a record from the fact_sales table in your data source. The update trigger changes the updated_on value for ID 9, while the delete trigger creates a new record in the audit table.

Step 5: Build the Elasticube by selecting "By Table".
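For reference, the latest-record logic the notebook implements with pandas is roughly equivalent to the following SQL. This is a hedged sketch, not the actual notebook code; the table and column names (fact_sales.id, fact_sales.updated_on, audit.id) are assumptions based on the demo:

-- Keep only the newest version of each record, and drop records
-- whose IDs appear in the audit table (deleted at the source).
SELECT f.*
FROM fact_sales f
JOIN (
    SELECT id, MAX(updated_on) AS max_updated_on
    FROM fact_sales
    GROUP BY id
) latest
  ON f.id = latest.id
 AND f.updated_on = latest.max_updated_on
WHERE f.id NOT IN (SELECT id FROM audit);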
Now, in the Elasticube, you have the fact_sales table, which contains the full history of record changes, and the Custom Code table, which holds the latest records in sync with the data source table.

These steps should result in accurate data in your Elasticube, leading to smarter business decisions. Please note, this is one possible workaround for users with similar use cases. Keep in mind that editing the Jupyter code is another possibility; for example, you can try using a Dask DataFrame instead of pandas and see which performs better with your data.

Disclaimer: Please note that this blog post contains one possible custom workaround solution for users with similar use cases. We cannot guarantee that the custom code solution described in this post will work in every scenario or with every Sisense software version. As such, we strongly advise users to test solutions in their own environment prior to deploying them, to ensure that the solutions proffered function as desired in their environment. For the avoidance of doubt, the content of this blog post is provided to you "as-is" and without warranty of any kind, express, implied or otherwise, including without limitation any warranty of security and/or fitness for a particular purpose. The workaround solution described in this post incorporates custom coding which is outside the Sisense product development environment and is therefore not covered by Sisense warranty and support services.
Reason For Adding A Custom Table

Question

What is the reason for adding custom tables in the ElastiCube? As I understand it, all customizations and calculations should be done in the dashboard itself, so why do we need to create custom tables?

Answer

The reason to create the table at the ElastiCube level is to reduce the load on the dashboard side. Any calculation you do on the dashboard impacts load time on the front end. If you need to execute a large calculation, it is better to implement and compute it during the ElastiCube build. The same goes for any SQL changes (data type changes or translations), as in the sketch below.
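A minimal sketch of such a custom table, with hypothetical table and column names. Both the type cast and the code-to-label translation run once per build instead of on every widget query:

-- Hypothetical ElastiCube custom table (illustrative names).
SELECT order_id,
       CAST(amount AS FLOAT) AS amount_num, -- data type change
       CASE status_code                     -- translation
           WHEN 'O' THEN 'Open'
           WHEN 'C' THEN 'Closed'
           ELSE 'Unknown'
       END AS status_label
FROM orders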
Star Schema Modeling Challenge Along With Security Implementation

Question

We have multiple clients, so we modeled a dim_client table and implemented the data security rules on it. We have two fact tables (fact_sales and fact_budget) for sales information and sales target information. We also have a dimension table (dim_channel) to maintain channel information, which we use in the filter area, plus a common date dimension table, dim_date.

Note: In our business, channels are dependent on the client, meaning every client has their own channels.

We created a star schema data model with the two fact tables and three dim tables, and in addition we drew one more relationship, between dim_client and dim_channel, to implement the data security. Without this relationship, the data security rules do not work properly, because the filter area should display only the channels related to the logged-in user (client). However, according to Sisense best practices, we should not create any relationship between two dim tables.

Answer

You can create a relationship between dim tables; you just have to be careful how and when you do it. Remember, they are all inner joins. As long as there is a 1:1 relationship between client and channel, you can create a model in which dim_channel filters dim_client, which in turn filters your fact tables. Your dim_channel should exist as an "outrigger" dimension of dim_client, as illustrated below.
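To see why this works, consider the effective join path the model produces. A hedged sketch with assumed key and column names:

-- Illustration only (key names are assumptions). A filter on
-- dim_channel narrows dim_client (the 1:1 outrigger), which in
-- turn narrows the fact tables via inner joins.
SELECT f.*
FROM fact_sales f
JOIN dim_client  c  ON f.client_id = c.client_id
JOIN dim_channel ch ON c.client_id = ch.client_id
WHERE ch.channel_name = 'Online';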
Lookup With Two Conditions

Question

How do you do a lookup using two conditions (i.e., pulling the project number for a unique piece of equipment on a specific day)?

Answer

You can only use one field as the key field in the Lookup function. It is therefore better to create a combined key (using concat or '+') with Equipment in both tables as custom columns, and then use the newly created field to perform the lookup, as sketched below.
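A minimal sketch of the combined-key approach, with hypothetical table and column names. Add the same custom column to both tables, then match on it (string concatenation and date-to-string conversion syntax vary by ElastiCube version, and the Lookup argument order shown is an assumption to verify against the Sisense documentation):

-- Custom column added to BOTH tables, combining the two
-- condition fields into a single key:
Equipment + '_' + CAST(UsageDate AS TEXT) AS LookupKey

-- Custom column in the table that needs the project number,
-- matching on the combined key in the remote projects table:
Lookup(projects, ProjectNumber, LookupKey, LookupKey)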