Data Groups In-depth

intapiuser · 03-02-2023

Data groups are a way to limit and control the resources of your instance and to avoid Out-Of-Memory (OOM) and Safe Mode exceptions.

First, I would like to describe how Sisense works at a general level.

1. When a user opens a dashboard for the first time, Sisense starts (warms up) the query pod of the cube (for example, "ec-SampleCube-qry"), which calculates the results and returns them to the widget/dashboard.

2. The query pod returns the results to the dashboard AND saves them in RAM for reuse by other users.

3. When another user opens the same dashboard with the same filters, the same query is sent. Instead of recalculating, the query pod takes the ready results from memory, which speeds up the dashboard load.

4. If the user changes a filter, a new query is sent to the query pod, where it is calculated, returned to the dashboard, and saved for reuse by other users, and so on.

(NOTE: if Data Security is applied to the cubes, query results from other users are not reused. Sisense adds extra JOINs to apply Data Security, which makes every query unique, so it is calculated every time.)
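The four steps and the Data Security note above can be modeled as a result cache keyed by the query. The sketch below is a simplified illustration, not Sisense's implementation: `QueryPod`, `run_query`, and the `security` key component are hypothetical names. It also shows why RAM keeps growing: every distinct query (or, with Data Security, every distinct user) adds another cached entry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical, simplified model of a query pod's result cache (not Sisense code).
# The cache key is the query text plus an optional Data Security "fingerprint".
CacheKey = Tuple[str, Optional[str]]


@dataclass
class QueryPod:
    compute: Callable[[str], object]  # stands in for the real query engine
    cache: Dict[CacheKey, object] = field(default_factory=dict)
    calculations: int = 0  # how many queries were actually calculated

    def run_query(self, query: str, security: Optional[str] = None) -> object:
        # With Data Security, extra JOINs make each user's query unique,
        # modeled here by adding the user's security rule to the key.
        key = (query, security)
        if key not in self.cache:
            self.calculations += 1
            self.cache[key] = self.compute(query)  # calculate once, keep in RAM
        return self.cache[key]  # identical later requests are served from RAM


pod = QueryPod(compute=lambda q: f"rows for {q}")
pod.run_query("sales by region='EU'")  # user A: calculated
pod.run_query("sales by region='EU'")  # user B, same filters: reused from RAM
pod.run_query("sales by region='US'")  # filter changed: calculated
print(pod.calculations, len(pod.cache))  # 2 2
```

Nothing in this model ever evicts an entry, which is exactly the situation the limits described below are meant to contain.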

Eventually, a dashboard could occupy the RAM of the entire server (for example, because of many-to-many (M2M) relationships, heavy formulas, or complex aggregations). This could trigger Out-Of-Memory issues, prevent other ec-qry (dashboard) pods from starting, and degrade the overall performance of the server.

To prevent this, you can limit resources using Data Groups.

Let's look at the Data Group settings first, followed by tips on best practices.

Data Group settings are available from the Admin tab -> Data Groups, in the left-hand menu.

Main Section
- Group Name - The name of the group that will be reflected in the Data Group list.
- Build Node(s) - In a multi-node deployment, this is where you specify the node where the ElastiCube will be built. It is crucial to specify the nodes; otherwise, multi-node capabilities will not be used. In a single-node deployment, the same server is responsible for both builds and queries, so it is the one server to rule them all.
- Query Nodes - where you specify the nodes used for querying (dashboards). This mainly matters for multi-node deployments, where this is the node on which the ec-ElastiCube-qry pod will run; specifying several nodes provides redundancy and parallelism. In a single-node deployment, it should be the same server as for the build.
- ElastiCubes - the list of ElastiCubes to which the Data Group limits apply. Note that the same settings are applied to each cube in the group individually: limiting the build to 8 GB gives 8 GB to each build pod (ec-ElastiCube-bld) in the group, not 8 GB to the entire group.
- Instances - the number of query pods created per ElastiCube when its dashboards are in use. Increasing the number of instances can reduce query execution time, because query execution is shared among multiple instances. Note that calculated results are not shared between pods. More instances can also help with OOM issues: when one pod is restarted, the others keep serving queries. RAM and CPU limits apply to each instance, so if you limit query RAM to 5 GB, each instance is allowed to use 5 GB.
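Because limits apply per cube and per instance, the RAM a data group can claim multiplies quickly. Below is a back-of-the-envelope sketch that assumes every pod grows to its limit at the same time, which is a worst case and not typical usage. The function names are hypothetical.

```python
# Hypothetical helpers (not a Sisense API): worst-case RAM a data group can claim,
# assuming every limit applies per pod and every pod grows to its limit.


def worst_case_query_ram_mb(num_cubes: int, instances: int, max_query_ram_mb: int) -> int:
    """Each cube runs `instances` query pods, each allowed `max_query_ram_mb`."""
    return num_cubes * instances * max_query_ram_mb


def worst_case_build_ram_mb(concurrent_builds: int, max_build_ram_mb: int) -> int:
    """Each ec-<cube>-bld pod gets the full build limit; it is not shared by the group."""
    return concurrent_builds * max_build_ram_mb


# 3 cubes x 2 instances x 5 GB query limit, plus 2 overlapping builds x 8 GB
query_mb = worst_case_query_ram_mb(3, 2, 5 * 1024)
build_mb = worst_case_build_ram_mb(2, 8 * 1024)
print(query_mb, build_mb, query_mb + build_mb)  # 30720 16384 47104
```

A group that looks like "5 GB for queries, 8 GB for builds" can therefore claim about 46 GB in this worst case.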

Note: If the number of instances is set to zero, the IDLE timeout is enabled. It stops the ElastiCube (deletes the query pod) if the ElastiCube has not been used for 30 minutes.

Remember that it takes time to start the cube after it has been stopped. However, a stopped cube does not use any resources. This is useful when a dashboard is not used often and there is no need to keep the query results.

UPDATE: Starting from L2022.5, the IDLE option has moved from "Instances" to its own toggle at the bottom of the Data Group settings.
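The IDLE behavior amounts to a time comparison against the last query. Below is a simplified model with a hypothetical function name. The 30-minute default comes from the article, but whether Sisense compares with ">" or ">=" is an assumption.

```python
from datetime import datetime, timedelta

# Simplified model of the IDLE timeout (not Sisense code). The 30-minute default
# comes from the article; whether the comparison is ">" or ">=" is an assumption.


def should_stop(last_query_at: datetime, now: datetime, idle_minutes: int = 30) -> bool:
    """True when the cube's query pod has been unused for idle_minutes or longer."""
    return now - last_query_at >= timedelta(minutes=idle_minutes)


last_used = datetime(2023, 3, 2, 9, 0)
print(should_stop(last_used, last_used + timedelta(minutes=20)))  # False: pod keeps running
print(should_stop(last_used, last_used + timedelta(minutes=45)))  # True: pod is deleted, RAM freed
```

The trade-off is visible in the model. Once `should_stop` is true, the pod's RAM and its cached results are gone, so the next dashboard open pays the start-up time again.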

Change IDLE time

1. Open the Configuration Manager, available from the Admin tab -> System Settings, in the top-right corner.

2. Scroll down to the bottom of the page

3. Click "Show Advanced"

4. Expand "Advanced Management Params"

5. Set "Stop Unused Datasource (minutes)" to the desired value

6. Save changes.

Note: the new setting is applied when the ElastiCube is rebuilt.

Secondary Section
- ElastiCube Query Recycler - when disabled, the query pod does not store query results and recalculates them on every request. This can be useful with Data Security, where queries cannot be reused by other users anyway, or when you are testing a cube and do not want it to use much RAM.
- Connector Running Mode - runs the connector inside the build pod, which increases build time but makes the build process more stable. It requires more RAM and should be used for debugging purposes only.
- Index Size (Short/Long) - use Long if the cube has more than 50M rows or if text fields contain very long values. In this case, Sisense uses a different (x64) indexing.
- Simultaneous Query Executions - limits the number of connections the query pod opens to the ec-qry pod for executing queries. If your widgets contain heavy formulas, it is worth reducing this number to lower the pressure. Use it when experiencing out-of-memory (OOM) issues (Java, not Safe-Mode). Parallelization is a trade-off between memory usage and query performance.
  8 is the optimal number of simultaneous queries.

Changing this setting without advice from Sisense Support or a Sisense Architect is NOT RECOMMENDED.
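Conceptually, Simultaneous Query Executions caps how many queries run in parallel. Extra queries wait for a free slot instead of adding memory pressure. Below is a minimal sketch of that idea using a Python semaphore. `QueryGate` and `run_burst` are hypothetical names, and Sisense's actual mechanism (connections between pods) is not modeled.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class QueryGate:
    """Hypothetical limiter: at most `limit` queries execute at the same time."""

    def __init__(self, limit: int = 8):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0  # highest number of queries seen running together

    def execute(self, work_seconds: float) -> None:
        with self._slots:  # waits here while `limit` queries are already running
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            time.sleep(work_seconds)  # stands in for the real query work
            with self._lock:
                self.running -= 1


def run_burst(limit: int, num_queries: int = 40) -> int:
    """Fire num_queries at once (like many widgets loading) and report peak concurrency."""
    gate = QueryGate(limit)
    with ThreadPoolExecutor(max_workers=num_queries) as pool:
        list(pool.map(gate.execute, [0.01] * num_queries))
    return gate.peak


print(run_burst(limit=8))  # never more than 8
```

A lower limit reduces how much memory concurrent queries can claim at once, at the cost of widgets waiting longer. This is the trade-off the setting describes.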

- Query Timeout (Seconds) - how long the dashboard waits for a query result from the query pod.
- % Concurrent Cores per Query - related to the Mitosis and Parallelism technology. To minimize the risk of OOM, set this value to the equivalent of 8 cores, e.g. 25 for a 32-core server and 14 for a 64-core server; the value should be an even number. It can be increased up to a maximum of half the total cores, i.e. treat the default as the recommended maximum and adjust it down only. Change it when experiencing out-of-memory (OOM) issues.
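The arithmetic behind the % Concurrent Cores per Query recommendation is a conversion between a percentage and a core count. Below is a sketch with hypothetical helper names. It does not reproduce how Sisense itself rounds the value.

```python
def cores_for_percentage(total_cores: int, percent: float) -> float:
    """Cores one query may use for a given '% Concurrent Cores per Query' value."""
    # The article caps the setting at half the total cores, i.e. 50%.
    if not 0 < percent <= 50:
        raise ValueError("percent must be in (0, 50]")
    return total_cores * percent / 100


def percentage_for_cores(total_cores: int, target_cores: int = 8) -> float:
    """Exact percentage that corresponds to target_cores (8 per the article)."""
    return target_cores / total_cores * 100


print(cores_for_percentage(32, 25))  # 8.0
print(cores_for_percentage(64, 14))  # 8.96
print(percentage_for_cores(32), percentage_for_cores(64))  # 25.0 12.5
```

The 64-core example (14%) works out to roughly 9 cores, slightly above the exact 12.5% that 8 cores would need.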
Query Nodes Settings (for ec-qry Pods, Dashboards)
- Reserved Cores - the number of free cores required for the pod to start. For example, with 1 reserved core, when the pod needs to start (the dashboard is opened), Sisense checks whether 1 core is free. If it is, Sisense starts the qry pod; if not, it does not. The reserved core(s) are also dedicated to this cube, and no other pod can use them.
- Max Cores - Maximum cores allowed to be used by the qry pod.
- Reserved RAM (MB) - same as Reserved Cores, but for the RAM used by dashboards: the RAM needed to start the qry pod. The reserved RAM is used only by this cube's qry pod. If other cubes need RAM during builds or dashboard use, they cannot take it from the reserved amount.
- Max RAM (MB) - the maximum RAM the qry pod (dashboards) is allowed to use; if Instances was increased, this limit applies to each query instance. Note that reaching 85% of the pod limit OR of the overall server RAM causes a Safe-Mode exception. For a dashboard (qry) pod, this means the users of the dashboard see a Safe-Mode error, and the qry pod is deleted and started again, so the next dashboard refresh brings the data. I would recommend setting Max RAM to 40 GB, which is the maximum reasonable value. If you see Safe-Mode errors with a 40 GB limit, it is time to check the dashboard for optimizations and/or many-to-many relationships. The right limit depends heavily on the size of the ElastiCube, Data Security, and the formulas you use: a cube that is 1 GB on disk can use 40 GB of RAM, while a cube that is 100 GB on disk might use only 10 GB. It is fine if an ElastiCube uses 2-3 times its size on disk. However, if the ElastiCube uses 40 GB, I would recommend looking for optimizations. For example, if RAM is consumed by heavy widget calculations, move the calculation to the ElastiCube build level, i.e. create a custom table with the needed calculation, so it is performed while the cube builds and the dashboard uses ready results. This uses more RAM during the build, but it happens only once, at build time. See more best practices and optimizations at the end of the article.
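The Safe-Mode rule and the RAM-to-disk rule of thumb in the Query Nodes settings reduce to a couple of comparisons. Below is a minimal sketch that assumes usage figures in MB (for example, read from Grafana). The function names are hypothetical.

```python
# Simplified check (not Sisense code) of the two Safe-Mode conditions described above.
SAFE_MODE_RATIO = 0.85  # 85% of the pod limit OR of total server RAM


def safe_mode_triggered(pod_used_mb: float, pod_max_mb: float,
                        server_used_mb: float, server_total_mb: float) -> bool:
    pod_hit = pod_used_mb >= SAFE_MODE_RATIO * pod_max_mb
    server_hit = server_used_mb >= SAFE_MODE_RATIO * server_total_mb
    return pod_hit or server_hit


def ram_to_disk_ratio(ram_gb: float, disk_gb: float) -> float:
    """2-3x is described as normal; much higher suggests widget-level optimization."""
    return ram_gb / disk_gb


limit_mb = 40 * 1024  # the suggested 40 GB Max RAM
print(SAFE_MODE_RATIO * limit_mb)  # Safe-Mode point for the pod, about 34816 MB
print(safe_mode_triggered(30_000, limit_mb, 40_000, 128 * 1024))  # False
print(safe_mode_triggered(20_000, limit_mb, 120_000, 128 * 1024))  # True: server-wide
print(ram_to_disk_ratio(40, 1), ram_to_disk_ratio(10, 100))  # 40.0 0.1
```

The third call shows that a pod well under its own limit can still hit Safe Mode when the server as a whole is above 85%.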
Build Nodes Settings (for ec-bld Pods)
- Reserved Cores - the number of free cores required for the pod to start. For example, with 1 reserved core, when the pod needs to start (the build starts), Sisense checks whether 1 core is free. If it is, Sisense starts the bld pod; if not, it does not.
- Max Cores - Maximum cores allowed to be used by the BLD pod.
- Reserved RAM (MB) - the amount of free RAM required to start the build. This is useful if you know, for example, that the build needs 30 GB and will fail without it: there is no point in starting a build that will fail anyway and using RAM in the meantime. It also "reserves" the RAM so that nothing else uses it during the build, which helps the build finish successfully.
- Max RAM (MB) - the maximum RAM that is allowed to be used by a cube when it builds.
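For both query and build nodes, the Reserved settings act as an admission check. A successful start also removes the reserved amount from what other pods can use. Below is a simplified sketch of that behavior. `Node`, `Reservation`, and `try_start` are hypothetical names, and Max Cores/Max RAM (the upper caps) are not modeled.

```python
from dataclasses import dataclass

# Simplified admission model for the Reserved Cores / Reserved RAM settings
# (hypothetical names; not how Sisense schedules pods internally).


@dataclass
class Node:
    free_cores: float
    free_ram_mb: int


@dataclass
class Reservation:
    cores: float
    ram_mb: int


def try_start(node: Node, pod: Reservation) -> bool:
    """Start the pod only if its reservation fits; reserved resources leave the free pool."""
    if node.free_cores < pod.cores or node.free_ram_mb < pod.ram_mb:
        return False  # not enough free resources: the pod does not start
    node.free_cores -= pod.cores
    node.free_ram_mb -= pod.ram_mb
    return True


node = Node(free_cores=4, free_ram_mb=32 * 1024)
build = Reservation(cores=1, ram_mb=30 * 1024)  # build known to need ~30 GB
dashboard = Reservation(cores=1, ram_mb=4 * 1024)
print(try_start(node, build))      # True: 1 core and 30 GB reserved for the build
print(try_start(node, dashboard))  # False: only 2 GB of unreserved RAM is left
```

This is the trade-off of reserving RAM. The build is protected, but other cubes on the same node may be unable to start while the reservation holds.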

Additional Information
User label - assigns labels to nodes in the data group. This is useful for scaling your nodes: for example, if you have implemented auto-scaling, you can use these labels to scale up the nodes. You can find further information in this article.
Storage Settings
- Interim Build on Local Storage - enable to build ElastiCubes on local storage instead of shared storage. Use it to troubleshoot issues on multi-node deployments, where cubes are usually stored on shared storage so that all nodes can access them and for redundancy. On single-node deployments, cubes are always built on local storage, since a single node has no shared storage.
- Store ElastiCube Data on S3 - enable to save your ElastiCubes on S3 if you have configured S3 access in your system configuration. This setting is reflected in Configuration Manager -> Management, but it cannot be changed from there.
- Query cube from local storage - on multi-node deployments, enable to query the cube from local storage when Interim Build on Local Storage is enabled. Using local storage can reduce the time required to query the cube after the first build on local storage.
Set as default - sets the group as the default. All new cubes will then be created with the settings of the default Data Group.

Best Practices

1. Choose which matters more in your case: successful builds or dashboards loading without Safe-Mode exceptions.

a) Successful build - plan the schedule so that cube builds do not overlap, set Max RAM for Build Nodes to unlimited (-1), and limit the query pods to an amount that leaves enough RAM for a successful build.

b) Dashboard priority - in this case, you need to know how much the build uses. Run the build with unlimited settings, check in Grafana how much RAM it uses, add 15% (for Safe Mode), and set the build limits accordingly. The build will then not use more than the limit, and a Safe-Mode exception will tell you that the data is growing, which gives you more control over your environment.
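The sizing step in b) is simple arithmetic. Below is a sketch with hypothetical helper names. The article states the 85% Safe-Mode threshold only for query pods, so the following is an assumption: if the same threshold applied to build pods, a "peak plus 15%" limit would put the observed peak at about 87% of the limit, above the threshold. Dividing the peak by 0.85 instead leaves it at or just under the threshold.

```python
# Sizing a build limit from an observed peak (hypothetical helper names).
# Integer math avoids float rounding surprises with values in MB.


def build_limit_mb(observed_peak_mb: int, headroom_pct: int = 15) -> int:
    """The article's rule: peak RAM seen in Grafana plus 15%, rounded up."""
    return -(-observed_peak_mb * (100 + headroom_pct) // 100)


def limit_for_threshold_mb(observed_peak_mb: int, threshold_pct: int = 85) -> int:
    """Smallest limit that keeps the observed peak at or below threshold_pct of it."""
    return -(-observed_peak_mb * 100 // threshold_pct)


peak = 20_000  # MB, e.g. read from Grafana during an unlimited build
print(build_limit_mb(peak))                   # 23000
print(round(peak / build_limit_mb(peak), 3))  # 0.87: peak share of that limit
print(limit_for_threshold_mb(peak))           # 23530
```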

2. Set up reasonable limits for query pods.

Remember: the higher the limit, the more RAM a query pod can hold before that RAM is released. In most cases, the reasonable Max RAM limit is 40 GB; if a query pod needs more, it is worth reviewing the dashboards and optimizing the widgets. Ideally, mid-sized cubes should not use more than 10 GB of RAM. Unfortunately, you cannot predict RAM usage from a cube's size on storage, because it depends on the formulas used in the widgets: a pivot with many JOINs uses much more RAM than an aggregation over a single table, even on the same amount of data. Also check the Performance Guide.

3. Create different groups for different purposes. For example, create a "test" group with limited resources, so that even an accidental many-to-many relationship cannot overuse RAM. Create a small-scale group for demo cubes/dashboards so that under no circumstances do they use more RAM than the limit you set (1 GB, for example).

4. Increase the number of instances if RAM is not an issue but dashboards load slowly.

5. Consider autoscaling. For example, suppose RAM/CPU spikes and issues happen only on weekends, when you rebuild all your cubes, and the current amount of RAM/CPU is insufficient only during that day/week/hour. With autoscaling, Sisense creates an additional node to handle the extra load and scales it down when it is no longer needed: for example, if RAM load >= 80%, start a new node; if RAM load drops below 80%, scale the node down. This is cheaper than upgrading the entire server when the extra RAM is not needed the rest of the time.
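The example autoscaling rule can be written as a small decision function. This is a sketch only, and the function and its parameters are hypothetical. The optional `scale_down_below` parameter is an addition not in the article. With a single 80% threshold, a load hovering around 80% can repeatedly add and remove nodes. A lower scale-down threshold (hysteresis) is a common way to avoid that.

```python
from typing import Optional

# The autoscaling rule from the example above, as a pure function (hypothetical;
# real scaling is performed by your cloud/cluster autoscaler, not by this code).


def scaling_decision(ram_load_pct: float, extra_nodes: int,
                     scale_up_at: float = 80.0,
                     scale_down_below: Optional[float] = None) -> str:
    if scale_down_below is None:
        scale_down_below = scale_up_at  # the article uses one 80% threshold for both
    if ram_load_pct >= scale_up_at:
        return "scale_up"
    if ram_load_pct < scale_down_below and extra_nodes > 0:
        return "scale_down"  # only remove nodes that were added by scaling
    return "hold"


print(scaling_decision(92.0, extra_nodes=0))  # scale_up (weekend rebuilds)
print(scaling_decision(55.0, extra_nodes=1))  # scale_down
print(scaling_decision(75.0, extra_nodes=1, scale_down_below=60.0))  # hold
```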
