cancel
Showing results for 
Search instead for 
Did you mean: 
Ophir_Buchman
12 - Data Integration
12 - Data Integration

This post has become outdated. You can find guidance on choosing a data model on our documentation site here.

https://docs.sisense.com/main/SisenseLinux/choosing-the-right-data-model.htm

 

Introduction

Customers often run into the question of which data model they should use (an ElastiCube, a Live model, or a Build-to-Destination). The following article presents some of the aspects you should consider when choosing between them.

Sisense recommends that you discuss your needs and requirements with Sisense's technical team during the Jumpstart process, so the result will best meet your business expectations.

Table of Contents

Definitions

The ElastiCube Data Model

Importing data into an ElastiCube data model allows the customer to pull data from multiple data sources on-demand or at a scheduled time, and create a single source of truth inside Sisense. The imported data can then be transformed and aggregated to meet your business needs. Once imported, the data snapshot is used to generate analytical information.

The process of importing the data, known as a "Build", includes the following steps:

  1. Extract the data: Query the different data source(s) for data.
  2. Load the data: Write the data extracted to Sisense (the local MonetDB).
  3. Transform the data: Transform the local MonetDB (using SQL queries).

To read more about ElastiCubes, see Introducing ElastiCubes.

The Live Data Model

Using a Live data model does not require importing data. Only the data's schema needs to be defined. Once configured, analytical information required by the user is queried directly against the backend data source.

To read more about Live models, see Introducing Live Models.

Determining Factors

Refresh Rate

One of the most fundamental aspects of determining your data model is your data's refresh rate.
The data refresh rate refers to the age of the data in your dashboards:

  • For Live models, the data displayed on your dashboards is near-real-time, as every query is passed directly to the backend database. A good example of using a live model (due to refresh rate requirements) is a dashboard that shows stock prices.
  • For ElastiCubes, the data displayed on your dashboard is current to the last successful build event. Every query is passed to the local database for execution. A good example of using an ElastiCube (due to refresh rate requirements) is a dashboard that shows historical stock prices. In this case, a daily ETL process will provide results that are good enough.

To make a choice based on this factor, answer the following questions:

  • How frequently do I need to pull new data from the database?
  • Do all my widgets require the same data refresh frequency?
  • How long does an entire ETL process take?

Data Transformation Options

The ETL process includes a "Transformation" phase. This transformation phase usually includes:

  • Migrating the data tables into a dim-fact schema
  • Enriching your data
  • Pre-aggregating the data to meet your business needs

The amount of data transformation on Sisense helps determine the suitable data model:

  • For Live models, Sisense allows minimal to no data transformation. Data is not imported before a query is issued from the front end. Therefore, data cannot be pre-conditioned or pre-aggregated. Most data sources used by Live models are data warehouses that may perform all data preparations themselves.
  • For ElastiCubes, data is imported before a query is issued from the front end. Therefore, it may be pre-conditioned and pre-aggregated. A user may customize the data model to optimally answer their business questions.

To make a choice based on this factor, answer the following questions:

  • Is my data in a fact-dim schema?
  • Does my data require enriching or pre-conditioning?
  • Can my data be pre-aggregated?

Operational Database Load

Your operational databases do more than serve your analytical system. Any application loading the operational databases should be closely examined:

  • For Live models, Sisense will constantly query information from your operational databases, and feed it into your dashboard widgets. This occurs every time a user loads a dashboard.
  • For ElastiCubes, Sisense highly stresses your operational databases during an ETL process while reading all tables.

To make a choice based on this factor, answer the following questions:

  • Does the analytical system stress my operational database(s)?
  • Can the query load be avoided by using a "database replica"?

Operational Database Availability

Your operational database(s) availability is critical for collecting information for your analytical system.

  • For Live models, all queries are redirected to your data sources. If the data source is not available, widgets will generate errors and not present any data.
  • For ElastiCubes, data source availability is critical during the ETL process. If the data source is not available, the data in your widgets will always be available, but not necessarily be up to date.

To make a choice based on this factor, answer the following questions:

  • How frequently are analytical data sources offline?
  • How critical is my analytical system? Is being offline (showing out-of-date information) acceptable?

Additional Vendor Costs

Various database vendors use a chargeback charging model, meaning that you will be charged by the amount of data you pull from the database or the computational power required to process your data. 

  • For Live models, every time a user loads a dashboard, each widget will trigger (at least) one database query. A combination of a chargeback charging model and a large user load may result in high costs.
  • For ElastiCubes, every time the user triggers an ETL process, a large amount of data is queried from the database and loaded into Sisense.

To make a choice based on this factor, answer the following questions:

  • What is the number of users using my dashboards / What is my "build" frequency?
  • Which data model will result in lower costs? What is the tipping point?
  • Are you willing to pay more for real-time data?

Database Size

Sisense is not a data warehouse. It has a limitation on the amount of user data it may store:

  • For ElastiCubes, the rule-of-thumb limitation is 300M rows per ElastiCube (based on an average number of columns and data types).
  • For Live models, there is no limitation as data is not imported to Sisense, only the data's schema.

To make a choice based on this factor, answer the following questions:

  • What is the amount of data I need in my data model?
  • What is the amount of history I need to store?
  • Can I reduce the amount of data (e.g., trimming historical data? reducing the number of columns? etc.)

Query Performance

Query performance depends on the underlying work required to fetch data and process it. Although every widget generates a query, the underlying data model will determine the work necessary to execute it.

  • For ElastiCubes, every query is handled inside Sisense: The client-side widget sends a JAQL query to the Sisense analytical system. The query Is translated into SQL syntax, and run against an internal database. The query result is transformed back to JAQL syntax and returned to the client-side.
  • For Live models, every query is forwarded to an external database and then processed internally: The client-side widget sends a JAQL query to the Sisense analytical system. The query Is translated into SQL syntax, and run against an external database. Sisense waits for the query to execute. Once returned, the query result is transformed back into JAQL syntax and returned to the client-side.

To make a choice based on this factor, answer the following questions:

  • How sensitive is the client to a delay in the query's result?
  • When showing real-time data, is this extra latency acceptable?

Connector Availability

Sisense supports hundreds of data connectors (see Data Connectors). However, not all connectors are available for live data models. The reasoning behind this has to do with the connector's performance. 

A "slow connector" or one that requires a significant amount of processing may lead to a bad user experience when using Live models (that is, widgets take a long time to load):

  • For ElastiCubes, Sisense allows the user to utilize all the data connectors.
  • For Live models, Sisense limits the number of data connectors to a few high-performing ones (including most data warehouses and high-performing databases).

To make a choice based on this factor, answer the following questions:

  • Does my data source's connector support both data model types?
  • Should I consider moving my data to a different data source to allow live connectivity?

Caching Optimization

Sisense optimizes performance by caching query results. In other words, query results are stored in memory for easier retrieval, in case they are re-executed. This ability provides a great benefit and improves the end-user experience:

  • For ElastiCubes, Sisense recycles (caches) query results.
  • For Live models, Sisense performs minimal caching to make sure data is near real-time. (Note  that caching can be turned off upon request.)

To make a choice based on this factor, answer the following questions:

  • Do I want to leverage Sienese's query caching?
  • How long do I want to cache data?

Dashboard Design Limitations

Specific formulas (such as Mode and Standard Deviation) and widget types (such as Box plots or Whisker plots) may result in "heavy" database queries:

  • For Live models, Sisense limits the use of these functions and visualizations as the results of these formulas and visualizations may take a long time, causing a bad user experience.
  • For ElastiCubes, Sisense allows the user to use them, as processing them is internal to Sisense.

To make a choice based on this factor, answer the following questions:

  • Do I need these functions and visualizations?
  • Can I pre-aggregate the data and move these calculations to the data source instead of Sisense?

See also Choosing a Data Strategy for Embedded Self-Service.

Rate this article:
Comments
sbatlanki
7 - Data Storage
7 - Data Storage

Question on Caching Optimization for Live Models (Sisense hosted cloud version - L2023.1 & above versions) :

Where does Sisense store the cache ? Is it in an elastic cube or in-momory? If it's in-memory - what's the impact on performance and any other downsides that customers should be aware of?

Version history
Last update:
‎01-29-2024 12:53 PM
Updated by: