How to configure Data Groups

Liran_Elnekave · ‎02-10-2022

Introduction

This article explains how to configure Data Group parameters based on current performance and business requirements.

What are “Data Groups”?

The “Data Groups” administrative feature lets you configure Quality of Service (QoS) for Sisense instances and limit or reserve the resources of individual data models. Using data groups helps ensure that available resources are controlled and managed to support your business needs.

Why Should One Configure “Data Groups”?

The main benefits of configuring “Data Groups” are:

Controlling which cluster nodes are used for building and querying
Limiting the amount of RAM and CPU cores a data model uses
Configuring indexing, parallel processing, and caching behavior of a data model
Mitigating “Out-Of-Memory” issues caused by lack of resources
Preventing “Safe Mode” exceptions caused by lack of resources

Read this article to learn about the different “Data Groups” parameters

“Data Group” Use Cases

The following two scenarios are good examples of using “Data Groups”:

Scenario	A customer embeds Sisense (OEM use case) and has many tenants that can design dashboards (a.k.a. Self-Service BI). Each tenant has a dedicated Elasticube data model.
Challenge	A single tenant can create a large number of dashboards, causing the underlying data model to use a significant portion of the available CPU/RAM resources. Excessive resource use by a single tenant may lead to resource depletion and degrade the user experience for that tenant and for other tenants.
Solution	Resource Governance – Set up resource utilization limits using “Data Groups”. Doing so limits each tenant’s usable resources and prevents tenants from affecting each other.

Scenario	A customer has many Elasticube data models created by various departments of the organization. Some of the data models are used to generate high-priority strategic dashboards (used by C-Level managers). Other dashboards are prioritized as well (e.g., operational dashboards vs. dashboards used for testing).
Challenge	C-Level dashboards must have the highest priority and should always be available. Other dashboards should still be operational and behave based on their priority. In case of conflict or a temporary lack of resources, a critical Elasticube may run out of memory or trigger a “Safe Mode” event.
Solution	Resource Governance – Setting up priority-based Data Groups allocates more resources to business-critical data models and limits the less critical ones.

Data Groups Resource Governance Strategies

Limiting a Data Model’s Resources - Governance rules can be used to limit the resources used by a single data model or a group of multiple data models. This can be done by configuring a “Maximal” amount of allocated CPU/RAM.

Note that the data model is limited to the configured resource restrictions even when additional resources are available for use.

Pre-Allocating a Data Model’s Resources - Governance rules can be used to pre-allocate the resources used by a single data model or a group of multiple data models. This can be done by configuring a “Reserved” amount of allocated CPU/RAM.

Note that other data models are limited to the remaining server resources even when a pre-allocated resource is idle.

Prioritizing Data Models - Governance rules can be used to prioritize certain data models by allocating a different amount of resources to different data models. High-priority data models are allocated more resources than lower-priority data models.

You may also choose not to limit the data model’s resources (no Data Group configuration). However, this will re-introduce the initial risk of resource depletion.

How To Configure Data Groups?

Configuring “Data Groups” requires considerable expertise and attention to detail.

A dedicated tool is provided to assist you with this task.

Follow the directions below to acquire the tool and implement Data Groups:

Prerequisite

To base your calculations and decide how to configure data groups, you need data monitoring enabled. To learn more about Sisense monitoring and the data it reports, read the following article:

https://support.sisense.com/kb/en/article/enable-sending-sisense-monitoring-logs

Step #1 – Download the “Data Groups Configuration Tool”

The data groups tool (Excel Document) is attached to this article.
Download a local copy to your computer.

Step #2 – Access Logz.IO

To access the Logz.IO platform:

Log in to the Logz.io website
Navigate to the “SA - Linux - Data Groups“ dashboard
Set the timeframe filter to a 21-day timeframe

Step #3 – Fill out the “Data Groups Configuration Tool” Document

The “Elasticubes” sheet holds the data used to decide how to assign ElastiCubes to the different groups:

Field	Description	Comment
CubeName	The ElastiCube name
Size GB	The ElastiCube’s size on disk	The measure is taken from the “Data” tab ElastiCube list
Estimated size in memory	The estimated maximal ElastiCube size in memory	Auto-calculated ([Size GB] X 2.5)
Peak query memory consumption (GB)	The actual maximal ElastiCube size in memory (should be smaller than the estimated maximal size)	The measure is taken from the logz.io dashboard: (“Max Query Memory” field)
Build frequency	How often Sisense performs the ETL (build)	The measure is taken from Sisense’s build scheduler
Average of concurrent Query	Average concurrent queries sent from dashboards to the ElastiCube	The measure is taken from the logz.io dashboard: Search for the “Max concurrent queries per Cube” widget and download the CSV file with the data. Calculate the average value of the “Max concurrent queries per Cube” column
Business Criticality	This is a business measure that determines the QoS of the ElastiCube	True (High) / False (Low)
Data security	Is “Data Security” applied to this ElastiCube	This column will help determine if the “ElastiCube Query Recycler” parameter will improve the performance or not. Explanation here under “ElastiCube Query Recycler”
Data Group	The Data Group this ElastiCube should be a part of	Try to fill in the column by classifying your ElastiCubes according to common characteristics both in terms of business and in terms of resource consumption.
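The two calculated fields in the table above can be sketched in a few lines of Python. This is only an illustration, not part of the Data Groups Configuration Tool: the 2.5 factor and the “Max concurrent queries per Cube” column name come from the table, while the function names, the `CubeName` CSV header, and the sample rows are assumptions (a real Logz.io export may use different headers).

```python
import csv
import io

# Factor from the "Estimated size in memory" row: [Size GB] x 2.5
ESTIMATE_FACTOR = 2.5


def estimated_size_in_memory_gb(size_on_disk_gb: float) -> float:
    """Estimated maximal ElastiCube size in memory, in GB."""
    return size_on_disk_gb * ESTIMATE_FACTOR


def average_max_concurrent_queries(csv_text: str, cube_name: str) -> float:
    """Average the "Max concurrent queries per Cube" column for one cube."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [
        float(row["Max concurrent queries per Cube"])
        for row in reader
        if row["CubeName"] == cube_name  # assumed header name
    ]
    if not values:
        raise ValueError(f"no rows found for cube {cube_name!r}")
    return sum(values) / len(values)


# Hypothetical export contents, for illustration only.
SAMPLE_CSV = """CubeName,Max concurrent queries per Cube
Sales,4
Sales,8
Finance,2
"""

print(estimated_size_in_memory_gb(4.0))
print(average_max_concurrent_queries(SAMPLE_CSV, "Sales"))
```

The results correspond to the “Estimated size in memory” and “Average of concurrent Query” columns of the “Elasticubes” sheet.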

Step #4 – Define your Data Groups

Using the information you’ve collected and the explanations in this article, define the data groups you wish to use.

Fill in the information in the “Required data groups” sheet.

The “Required data groups” sheet provides the name and configuration for each data group. Use this tab to describe the Data Groups that meet your business needs, and use the “Intent” column to describe each group's purpose. The configuration in this sheet will later be used to configure the Data Groups in the Sisense Admin console:

Field	Description	Comment
Group name	The Data Group’s name
Intent	The Data Group’s description (from a business point of view)
Instances	The number of query instances (pods) created in the Sisense cluster	This parameter is very useful; however, increasing this value increases the Elasticube’s memory footprint. You should only consider changing this value if your CPU usage reaches 100%. High CPU consumption is mostly caused by high user concurrency
Build on Local Storage		Enabled for multi-node deployments. Using local storage can decrease the time required to query the cube after the first build on local storage
Simultaneous Query Executions	The maximum number of queries processed simultaneously	Parallelization is a trade-off between memory usage and query performance. If your widgets contain heavy formulas, it is worth reducing this number to lower the load. Reduce it when experiencing out-of-memory (OOM) issues. The optimal value is 8 queries
% Concurrent Cores per Query		Related to Mitosis and Parallelism technology. To minimize the risk of OOM, set this value to the equivalent of 8 cores, e.g., 25 for a 32-core server or 14 for a 64-core server; the value should be an even number. It can be increased up to a maximum of half the total cores, i.e., treat the default as the recommended maximum and only adjust it down. Change it when experiencing out-of-memory (OOM) issues
Max Cores	Maximum cores allowed to be used by the qry pod	Limit for less critical groups
Query: Max RAM (MB)	The maximum RAM consumed by each of the ElastiCubes in the group. Take the value from the “MAX of Peak query memory consumption (GB)” column in the “Summary of data groups” sheet	The maximum RAM the qry pod (dashboards) is allowed to use. If “Instances” was increased, the limit applies to each query instance. Note that reaching 85% of the pod’s RAM limit OR of the overall server RAM triggers a Safe-Mode exception. When that happens, dashboard users see a Safe Mode error and the qry pod is deleted and started again, so the next dashboard refresh will bring the data
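As a rough illustration of the “% Concurrent Cores per Query” guidance in the table above, the following Python sketch converts a target number of cores into a percentage of the server's cores. The function name and its parameters are hypothetical; only the 8-core target and the half-of-total-cores cap come from the table. The sketch returns the raw percentage, so for a 64-core server it yields 12.5, whereas the table suggests 14; any rounding is left to the reader.

```python
def percent_concurrent_cores(total_cores: int, target_cores: int = 8) -> float:
    """Percentage of the server's cores that is equivalent to `target_cores`."""
    if total_cores <= 0:
        raise ValueError("total_cores must be positive")
    percent = 100.0 * target_cores / total_cores
    # The table caps the setting at half of the total cores.
    return min(percent, 50.0)


for cores in (32, 64):
    print(cores, percent_concurrent_cores(cores))
```

For a 32-core server this gives 25.0, matching the example in the table.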

Step #5 – Verify

The “Summary of Data Groups” sheet includes a pivot chart that calculates the maximum memory consumption for each data group. This value corresponds to the “Query: Max RAM (MB)” configuration.

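Take the maximal value of “Peak query memory consumption (GB)” from the “Summary of Data Groups” tab and multiply it by 1.20 to leave headroom and avoid Safe Mode. Because the setting is expressed in MB, convert the result from GB to MB before entering it.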
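The calculation in this step can be sketched in Python as follows. It mirrors what the pivot in the “Summary of Data Groups” sheet does: take the maximum peak query memory per data group, apply the 1.20 multiplier, and express the result in MB. The function name, the tuple layout, the sample rows, and the 1 GB = 1024 MB conversion are assumptions made for illustration.

```python
from collections import defaultdict

HEADROOM = 1.20   # multiplier from Step #5
MB_PER_GB = 1024  # assumption: 1 GB = 1024 MB


def max_ram_mb_per_group(cubes):
    """Return {data_group: suggested Max RAM in MB}.

    `cubes` is an iterable of (cube_name, data_group, peak_query_memory_gb).
    """
    peak_by_group = defaultdict(float)
    for _name, group, peak_gb in cubes:
        # Keep the largest peak seen in each group, like the pivot's MAX.
        peak_by_group[group] = max(peak_by_group[group], peak_gb)
    return {
        group: round(peak_gb * HEADROOM * MB_PER_GB)
        for group, peak_gb in peak_by_group.items()
    }


# Hypothetical rows from the "Elasticubes" sheet.
cubes = [
    ("Sales", "Critical", 10.0),
    ("Finance", "Critical", 6.0),
    ("Sandbox", "Testing", 2.5),
]

limits = max_ram_mb_per_group(cubes)
print(limits)
```

With a 1.20 multiplier, the observed peak sits at roughly 83% of the configured limit (1 / 1.20), which would keep it just under the 85% Safe Mode threshold if that threshold is measured against the configured limit.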

Step #6 – Process the results and configure Sisense Data Groups

Prerequisite - Read this article on how to create data-groups (5 min read): https://documentation.sisense.com/docs/creating-data-groups

On the Sisense Web Application, for each new data group:

Navigate to “Admin” --> “Data Groups” and click the “+ Data Group” button.
Fill out the information from the “Required data groups” tab.

Step #7 – Monitor the Sisense Instance

The final step is to follow the information in the “Sisense Monitor” and confirm that performance is improving.

Review the following documentation for more details regarding monitoring: https://docs.sisense.com/main/SisenseLinux/monitoring-sisense-on-linux.htm

Good luck!
