Liran_Elnekave
Sisense Team Member

Introduction

This article explains how to configure Data Groups parameters based on current performance and business requirements.

What are “Data Groups”?

The “Data Groups” administrative feature lets you configure Quality of Service (QoS) for Sisense instances and limit or reserve the resources of individual data models. Using data groups keeps the available resources controlled and managed to support your business needs.

Why Should One Configure “Data Groups”?

The main benefits of configuring “Data Groups” are:

  • Controlling which cluster nodes are used for building and querying
  • Limiting the amount of RAM and CPU cores a data model uses
  • Configuring indexing, parallel processing, and caching behavior of a data model
  • Mitigating “Out-Of-Memory” issues caused by lack of resources
  • Preventing “Safe Mode” exceptions caused by lack of resources

Read this article to learn about the different “Data Groups” parameters

“Data Group” Use Cases

The following two scenarios are good examples of using “Data Groups”:

Scenario

A customer embeds Sisense (OEM use case) and has many tenants that can design dashboards (a.k.a. Self-Service BI). Each tenant has a dedicated Elasticube data model.

Challenge

A single tenant can create a large number of dashboards, causing the underlying data model to use a significant portion of the available CPU/RAM resources. Excessive resource use by a single tenant may lead to resource depletion and degrade the user experience for that tenant and for other tenants.

Solution

Resource Governance – Set up resource utilization limits using “Data Groups”. Doing so limits each tenant’s usable resources and prevents tenants from affecting each other.

 

Scenario

A customer has many Elasticube data models created by various departments of the organization. Some of the data models are used to generate high-priority strategic dashboards (used by C-Level managers). Other dashboards are prioritized as well (e.g., operational dashboards vs. dashboards used for testing).

Challenge

C-Level dashboards must have the highest priority and should always be available. Other dashboards should still be operational and behave based on their priority. In case of conflict or a temporary lack of resources, a critical Elasticube may run out of memory or trigger a ‘safe mode’ event.

Solution

Resource Governance – Set up priority-based Data Groups so that business-critical data models are allocated more resources and less critical ones are limited.

 

Data Groups Resource Governance Strategies

Limiting a Data Model’s Resources - Governance rules can be used to limit the resources used by a single data model or a group of multiple data models. This can be done by configuring a “Maximal” amount of allocated CPU/RAM.

Note that the data model is held to the configured limit even when additional resources are available for use.

Pre-Allocating a Data Model’s Resources - Governance rules can be used to pre-allocate the resources used by a single data model or a group of multiple data models. This can be done by configuring a “Reserved” amount of allocated CPU/RAM.

Note that other data models are restricted to the remaining server resources even when a pre-allocated resource is idle.

Prioritizing Data Models - Governance rules can be used to prioritize certain data models by allocating a different amount of resources to different data models. High-priority data models are allocated more resources than lower-priority data models.

You may also choose not to limit the data model’s resources (no Data Group configuration). However, this will re-introduce the initial risk of resource depletion.
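
The "Maximal" and "Reserved" strategies can be sanity-checked on paper before anything is configured. The following Python sketch is only an illustration: `GroupPlan` and `check_plan` are hypothetical planning helpers, not part of any Sisense API, and the check that total reserved RAM should fit within server RAM is an assumption based on simple arithmetic rather than on documented Sisense behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupPlan:
    """Hypothetical planning record for one data group (not a Sisense object)."""
    name: str
    reserved_ram_mb: int = 0           # "Reserved" RAM pre-allocated to the group
    max_ram_mb: Optional[int] = None   # "Maximal" RAM cap; None means no limit


def check_plan(groups: List[GroupPlan], server_ram_mb: int) -> List[str]:
    """Return a list of problems found in a draft plan (empty if none)."""
    problems = []
    for g in groups:
        # A group cannot sensibly reserve more than it is allowed to use.
        if g.max_ram_mb is not None and g.reserved_ram_mb > g.max_ram_mb:
            problems.append(f"{g.name}: reserved RAM exceeds its maximum")
    # Assumption: reservations across all groups must fit in physical RAM.
    total_reserved = sum(g.reserved_ram_mb for g in groups)
    if total_reserved > server_ram_mb:
        problems.append(
            f"reserved RAM ({total_reserved} MB) exceeds server RAM ({server_ram_mb} MB)"
        )
    return problems
```

For example, `check_plan([GroupPlan("critical", 40000, 60000)], server_ram_mb=64000)` returns an empty list, while a plan whose reservations add up to more than the server's RAM is flagged.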

 

How To Configure Data Groups?

Configuring “Data Groups” requires high expertise and attention to detail.

A dedicated tool is introduced to assist you in this task.

Follow the directions below to acquire the tool and implement Data Groups:

Prerequisite

To calculate and decide how to configure data groups, you need data monitoring enabled. To learn more about Sisense monitoring and the data it reports, read the following article:

https://support.sisense.com/kb/en/article/enable-sending-sisense-monitoring-logs

Step #1 – Download the “Data Groups Configuration Tool”

The data groups tool (Excel Document) is attached to this article.
Download a local copy to your computer.

Step #2 – Access Logz.IO

To access the Logz.IO platform:

  1. Log in to the Logz.io website
  2. Navigate to the “SA - Linux - Data Groups” dashboard
  3. Set the timeframe filter to a 21-day timeframe

Step #3 – Fill out the “Data Groups Configuration Tool” Document

The “Elasticubes” sheet holds the data for the decision making regarding the different groups:

| Field | Description | Comment |
|---|---|---|
| CubeName | The ElastiCube name | |
| Size GB | The ElastiCube’s size on disk | The measure is taken from the “Data” tab ElastiCube list |
| Estimated size in memory | The estimated maximal ElastiCube size in memory | Auto-calculated ([Size GB] X 2.5) |
| Peak query memory consumption (GB) | The actual maximal ElastiCube size in memory (should be smaller than the estimated maximal size) | The measure is taken from the logz.io dashboard (“Max Query Memory” field) |
| Build frequency | The frequency of the ETL Sisense performs | The measure is taken from Sisense’s build scheduler |
| Average of concurrent Query | Average concurrent queries sent from dashboards to the ElastiCube | The measure is taken from the logz.io dashboard: search for the “Max concurrent queries per Cube” widget and download the CSV file with the data. Calculate the average value of the “Max concurrent queries per Cube” column |
| Business Criticality | This is a business measure that determines the QoS of the ElastiCube | True (High) / False (Low) |
| Data security | Is “Data Security” applied to this ElastiCube | This column will help determine if the “ElastiCube Query Recycler” parameter will improve the performance or not. Explanation here under “ElastiCube Query Recycler” |
| Data Group | The Data Group this ElastiCube should be a part of | Try to fill in the column by classifying your ElastiCubes according to common characteristics both in terms of business and in terms of resource consumption. |
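
Two of the fields above are simple calculations: the estimated size in memory ([Size GB] X 2.5) and the average of the “Max concurrent queries per Cube” column from the downloaded CSV. The Python sketch below reproduces both outside of Excel. It is a minimal illustration under assumptions: the function names are made up for this example, and the CSV is assumed to have a header row with a column named exactly “Max concurrent queries per Cube”; the actual Logz.io export may be laid out differently.

```python
import csv
import io
from statistics import mean

MEMORY_FACTOR = 2.5  # from the tool: estimated size in memory = [Size GB] x 2.5


def estimated_memory_gb(size_on_disk_gb: float) -> float:
    """Estimated maximal ElastiCube size in memory, in GB."""
    return size_on_disk_gb * MEMORY_FACTOR


def average_concurrent_queries(
    csv_text: str, column: str = "Max concurrent queries per Cube"
) -> float:
    """Average of one numeric column in CSV text (column name is assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the column is missing or empty.
    values = [
        float(row[column])
        for row in reader
        if (row.get(column) or "").strip()
    ]
    if not values:
        raise ValueError(f"no values found in column {column!r}")
    return mean(values)


# Made-up sample data, only to show the call shape.
sample = (
    "timestamp,Max concurrent queries per Cube\n"
    "2024-01-01,4\n"
    "2024-01-02,6\n"
    "2024-01-03,8\n"
)
average = average_concurrent_queries(sample)
estimate = estimated_memory_gb(4.0)
```

If the peak query memory observed in Logz.io turns out to be larger than the value returned by `estimated_memory_gb`, that cube deserves a closer look before it is assigned to a group.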

 

Step #4 – Define your Data Groups

Using the information you’ve collected and the explanations in this article – Define the data groups you wish to use.

Fill in the information in the “Required data groups” sheet.

The “Required data groups” sheet provides the name and configuration for each data group. Use this tab to describe the Data Groups that meet your business needs, and use the “Intent” column to describe each group's purpose. The configuration in this sheet will later be used to configure the Data Groups in the Sisense Admin console:

| Field | Description | Comment |
|---|---|---|
| Group name | The Data Group’s name | |
| Intent | The Data Group’s description (from a business point of view) | |
| Instances | The number of query instances (pods) created in the Sisense cluster | This parameter is very useful; however, increasing this value increases the Elasticube’s memory footprint. Only consider changing this value if your CPU usage reaches 100%. High CPU consumption is mostly caused by high user concurrency |
| Build on Local Storage | | Applies to multi-node deployments. When enabled, using local storage can decrease the time required to query the cube after the first build on local storage |
| Simultaneous Query Executions | The maximum number of queries processed simultaneously | If your widgets contain heavy formulas, it is worth reducing this number to lower the pressure. Adjust it when experiencing out-of-memory (OOM) issues. Parallelization is a trade-off between memory usage and query performance. 8 is the optimal number of queries |
| % Concurrent Cores per Query | | Related to Mitosis and Parallelism technology. To minimize the risk of OOM, set this value to the equivalent of 8 cores, e.g. for 32 cores set it to 25; for 64 cores set it to 14; the value should be an even number. It can be increased up to a maximum of half the total cores, i.e. treat the default as the recommended maximum that can be adjusted down only. Change it when experiencing out-of-memory (OOM) issues |
| Max Cores | Maximum cores allowed to be used by the qry pod | Limit for less critical groups |
| Query: Max RAM (MB) | The maximum RAM each ElastiCube in the group may consume. Take the value from the “MAX of Peak query memory consumption (GB)” column in the “Summary of data groups” sheet | Maximum RAM that the qry pod (dashboards) is allowed to use. If the number of Instances was increased, the limit applies to each query instance. Note that reaching 85% of the pod's RAM limit or of the overall server RAM causes a Safe Mode exception. In Safe Mode, dashboard users see a Safe Mode error and the qry pod is deleted and started again, so the next dashboard refresh brings the data |
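
The “% Concurrent Cores per Query” guidance (the equivalent of 8 cores, never more than half of the total cores) comes down to a percentage calculation. The Python sketch below shows the raw arithmetic only; `cores_percent` is a hypothetical helper, not a Sisense function, and the 50% cap is this example's reading of "a maximum of half the total cores". It does not round the result; the table's own examples (25 for 32 cores, 14 for 64 cores) should take precedence when choosing the final value.

```python
def cores_percent(total_cores: int, target_cores: int = 8) -> float:
    """Percentage of the server's cores that equals `target_cores`.

    The result is capped at 50, reflecting the guidance that the value
    should not exceed half of the total cores.
    """
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    raw = 100.0 * target_cores / total_cores
    return min(raw, 50.0)


pct_32 = cores_percent(32)  # 8 of 32 cores
pct_64 = cores_percent(64)  # 8 of 64 cores, before any rounding
pct_8 = cores_percent(8)    # would be 100%, so the 50% cap applies
```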

Step #5 – Verify

The “Summary of Data Groups” sheet includes a pivot chart that calculates the maximum memory consumption of each data group. This value corresponds to the “Query: Max RAM (MB)” configuration.

Take the maximal value of “Peak query memory consumption (GB)” from the “Summary of data groups” tab and multiply it by 1.20 to leave enough headroom to avoid Safe Mode. Because the configuration field is in MB, convert the result from GB to MB.
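
The verification step can also be reproduced outside of the spreadsheet. The Python sketch below groups cubes by data group, takes the maximum peak query memory per group, applies the 1.20 multiplier, and converts the result to MB. It is an illustration under assumptions: the function name and the input tuples are made up, and the conversion factor of 1024 MB per GB is an assumption (use 1000 if your monitoring data is reported in decimal units). With a 1.20 multiplier the observed peak lands at roughly 83% of the configured limit, which is just under the 85% Safe Mode threshold mentioned in the table above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

HEADROOM = 1.20    # multiplier from Step #5
MB_PER_GB = 1024   # assumption: binary units; adjust if your data uses 1000


def max_ram_mb_per_group(
    cubes: Iterable[Tuple[str, str, float]]
) -> Dict[str, int]:
    """Suggest a "Query: Max RAM (MB)" value per data group.

    `cubes` holds (cube name, data group, peak query memory in GB) tuples.
    """
    peak_gb: Dict[str, float] = defaultdict(float)
    for _cube_name, group, peak in cubes:
        # Same result as the pivot: the maximum peak within each group.
        peak_gb[group] = max(peak_gb[group], peak)
    return {
        group: round(gb * HEADROOM * MB_PER_GB)
        for group, gb in peak_gb.items()
    }


# Made-up example values.
cubes = [
    ("Sales", "critical", 10.0),
    ("Finance", "critical", 12.5),
    ("Sandbox", "testing", 2.0),
]
limits = max_ram_mb_per_group(cubes)
```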

Step #6 – Process the results and configure Sisense Data Groups

Prerequisite - Read this article on how to create data-groups (5 min read): https://documentation.sisense.com/docs/creating-data-groups

On the Sisense Web Application, for each new data group:

  • Navigate to “Admin” --> “Data Groups” and click the “+ Data Group” button.
  • Fill out the information from the “Required data groups” tab.

Step #7 – Monitor the Sisense Instance

The final step is to follow the information in the “Sisense Monitor” and make sure that performance is improving.

For more details regarding monitoring, review the documentation: https://docs.sisense.com/main/SisenseLinux/monitoring-sisense-on-linux.htm

Good luck!

Last update:
‎02-13-2024 12:32 PM