Build Stability Improvements on Heavily Loaded Systems
Symptoms
Relevant for Linux
Below are some of the symptoms your system may show under heavy load with simultaneous builds. The list is not exhaustive:
- Build failures on random cubes, or on a few cubes that then rebuild successfully when triggered manually
- CubeIsUnreachable error
- Failure due to Build Service restart
Diagnosis
The build flow involves three main services:
- Build: Triggers builds and moves logs between the pod and the UI
- Management: Creates the ec-bld pod and handles Kubernetes-level communication during the flow
- Ec: <name_of_the_cube>-bld, the actual cube import process, which creates the folder and performs the import
Memory consumption of the ec-bld pod is controlled by the DataGroup Max RAM for Build setting (more in-depth article here) and affects only that cube, whereas the Build and Management services handle all builds running on the system. When 4 or more cubes build in parallel, the Build and Management services can come under heavy load and require additional RAM. Both services are Java-based and have a default memory limit of 500 MB. If a service needs more than that, it may be throttled, which can cause defined timeouts to be exceeded or cause the service to restart.
To confirm that this is the case, check Grafana for the Build and Management services at the times when builds fail. Linked is additional information on how to use Grafana to troubleshoot performance issues. Keep in mind that although the services are limited to 500 MB by default, they can consume more, because parts of each service are not Java-based and Java memory use can briefly peak above the limit. If you see that a service is under load, allocating more RAM is a good way to improve stability.
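As a rough illustration of the check described above, the sketch below compares memory readings, copied by hand from a Grafana panel, against the 500 MB default. The function name, the 90% threshold, and the sample readings are assumptions made for illustration; they are not Sisense settings or real measurements.

```python
DEFAULT_JAVA_LIMIT_MB = 500  # default limit stated in this article


def is_under_memory_pressure(samples_mb, limit_mb=DEFAULT_JAVA_LIMIT_MB, threshold=0.9):
    """Return True if any reading reaches threshold * limit_mb.

    samples_mb: memory readings in MB, e.g. copied from a Grafana panel.
    threshold:  hypothetical cut-off (fraction of the limit), not a Sisense value.
    """
    if not samples_mb:
        return False
    return max(samples_mb) >= threshold * limit_mb


# Made-up readings for the Build service around the time of a failed build.
build_samples = [310, 420, 480, 505]
print(is_under_memory_pressure(build_samples))
```

If the check flags a service repeatedly at the times builds fail, that is a hint (not proof) that the service would benefit from a higher memory limit.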
Solution
- Kubernetes deployment limits
- Java service on the Sisense Configuration side
4. The Build/Management pod restarts automatically after the deployment is modified.
6. Save settings.
Tips:
Please ensure that the Memory Limit value is correctly entered. If there is a problem with the value, the service will not start.
Please keep in mind that the Deployment value should be 1000m bigger than the Configuration value.
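The relationship between the two values is simple arithmetic. A minimal sketch, assuming both values are expressed in megabytes (the tip writes the gap as "1000m"; reading it as 1000 MB is an assumption) and using a hypothetical helper name:

```python
DEPLOYMENT_HEADROOM = 1000  # gap from the tip above; unit assumed to be MB


def min_deployment_limit(config_limit_mb, headroom=DEPLOYMENT_HEADROOM):
    """Return the smallest Deployment value that satisfies the tip.

    config_limit_mb: memory limit entered on the Configuration side, in MB (assumed unit).
    """
    if config_limit_mb <= 0:
        raise ValueError("memory limit must be a positive number")
    return config_limit_mb + headroom


# Example: a Configuration value of 2000 calls for a Deployment value of 3000.
print(min_deployment_limit(2000))
```

Checking the pair of values this way before saving helps avoid the case where a mistyped limit prevents the service from starting.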
If you need any additional help, please contact Sisense Support.