Using Grafana to troubleshoot performance issues
Overview
Out of the box, Sisense running on Linux comes with the embedded monitoring tool Grafana. Sisense provides several dashboards that help troubleshoot performance issues. Grafana can be accessed from Admin > System Settings > Monitoring (top right corner) or directly at https://your_web_site/app/grafana/. To see the list of dashboards, click the four-squares icon in the left-hand menu and select Manage.

Navigation
Grafana has a few basic controls that make it straightforward to gather information about the system over a period of time. Within a dashboard you will see a header similar to the following:
1: Filter dropdowns - such as namespace (by default, Sisense components run in the sisense namespace), pods, and node(s).
2: Time period - can be relative or absolute (absolute times use the browser timezone).
3: Refresh - refresh manually or set an automatic refresh rate.
Within a visualization, you can hover over particular lines or click one or more values. Click and drag over a time period to zoom in on it for more accurate values. For more information, see the official Grafana documentation: Working with Grafana dashboard UI.

Available Dashboards
We will focus our troubleshooting on 4 dashboards:
- Nodes
- All Pods per Namespace
- Kubernetes / Compute Resources / Namespace (Workloads)
- Sisense Cluster Detail (dashboard 12928; import it if it is not present)

Nodes Dashboard
This dashboard is useful for troubleshooting overall node performance during a period of time. It shows the CPU, RAM, disk I/O, and network usage of the machine over that period. Use the 'instance' dropdown to select which node to examine.

All Pods per Namespace Dashboard
The dashboard header lets you filter the pods/nodes, set the timeframe, and refresh manually or automatically. The dashboard has 3 widgets (CPU, RAM, and Network) that show the load of each pod separately, with the pods' lines overlapping each other. This is useful when checking the RAM consumption of a single pod.
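If you want the same per-pod numbers outside Grafana, for example to script a check, you can ask Prometheus (the data source these dashboards use) directly through its HTTP API endpoint /api/v1/query and add the per-pod working-set memory into one total. This is a minimal sketch under assumptions: the Prometheus address is a hypothetical placeholder, the namespace filter assumes the default sisense namespace, and the label names pod_name/container_name follow the widget query used in this article (newer Kubernetes versions name these labels pod and container).

```python
# Minimal sketch, not an official Sisense tool. Assumptions (hypothetical):
# - PROMETHEUS_URL is a placeholder; find the real address for your cluster.
# - Labels pod_name/container_name match this article's widget query; newer
#   Kubernetes versions name them pod/container instead.
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://prometheus.example:9090"  # hypothetical address

# One series per pod: the same data the memory widget plots line by line.
PER_POD_QUERY = (
    "sum by (pod_name) (container_memory_working_set_bytes"
    '{job="kubelet", namespace="sisense", container_name!=""})'
)


def build_query_url(base_url: str, promql: str) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload: dict) -> dict:
    """Map pod name -> working-set bytes from an instant-vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload}")
    per_pod = {}
    for item in payload["data"]["result"]:
        labels = item["metric"]
        name = labels.get("pod_name") or labels.get("pod", "unknown")
        # Each sample is [unix_timestamp, "value as a string"].
        per_pod[name] = float(item["value"][1])
    return per_pod


def fetch_per_pod_memory(base_url: str = PROMETHEUS_URL) -> dict:
    """Run PER_POD_QUERY against a live Prometheus and return pod -> bytes."""
    with urllib.request.urlopen(build_query_url(base_url, PER_POD_QUERY), timeout=10) as resp:
        return parse_vector(json.load(resp))


def total_gib(per_pod: dict) -> float:
    """The total the per-pod widget does not show: sum of all pods, in GiB."""
    return sum(per_pod.values()) / 2**30


if __name__ == "__main__":
    # Hypothetical sample response; call fetch_per_pod_memory() on a real server.
    sample = {
        "status": "success",
        "data": {"resultType": "vector", "result": [
            {"metric": {"pod_name": "ec-sample-ecommerce-qry-a1"}, "value": [1700000000, "2147483648"]},
            {"metric": {"pod_name": "ec-sample-ecommerce-bld-b2"}, "value": [1700000000, "1073741824"]},
        ]},
    }
    per_pod = parse_vector(sample)
    for pod, used in sorted(per_pod.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{pod}: {used / 2**30:.2f} GiB")
    print(f"Total: {total_gib(per_pod):.2f} GiB")
```

Keeping the parsing separate from the network call makes it easy to test against a saved response before pointing the script at a live server.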
Please note that by default this dashboard does not show overall RAM consumption; it shows the RAM of each pod individually.
To add a total: in Grafana (hosted on /app/grafana), in the dashboard named All Pods per Namespace, the two widgets CPU Usage and Memory Usage need an extra series with the sum of all pods. This information is mandatory when troubleshooting resource-pressure situations. Add the following metric to the Memory Usage widget:
sum (container_memory_working_set_bytes{job="kubelet", pod_name=~"$pod", container_name!=""})
This shows the state of the machine (RAM in this case) much more clearly.
Using the Pod filter, you can narrow the view to query or build pods by typing "qry" or "bld" in the filter and selecting the pods you need. Hover over a widget item to see the pod name, and click and drag to select the timeframe you need.

Kubernetes / Compute Resources / Namespace (Workloads)
This dashboard has many more widgets and shows the total usage of RAM, CPU, etc. It is useful when you need to see total usage to identify what triggered Safe-Mode. For multi-node deployments, see the manual. Its additional widgets can also help you monitor server performance, network usage, etc.

Sisense Cluster Detail Dashboard (#12928)
This dashboard is included by default in many recent versions of Sisense on Linux. It has many preconfigured widgets that show pod, cluster, drive, RAM, etc. usage. Before you try to import it, check whether it is already in your Grafana dashboard menu. If your instance runs an older release, or you would like to import a dashboard shared by other users, follow the steps below:
1. Click the four-squares menu icon, then Manage.
2. Click Import in the top right corner.
3. Enter the dashboard number 12928 (or another dashboard number) and click Load.
4. Select "Prometheus" as the data source.
5. Click Import.

How to fix Safe-Mode on Build, Dashboard
Safe-Mode is triggered when the RAM consumption of a pod (query/build) OR of the overall server reaches 85%. If Safe-Mode is triggered during a build, the build is cancelled due to OOM. On a dashboard, the application restarts the cube's Query pod (deletes it and starts it again) in order to release memory. Safe-Mode also has a grace period during which all new queries are cancelled for 30 seconds. If the Data Groups are configured correctly, reloading the dashboard page after 30 seconds shows the results.
Related: Using Grafana to troubleshoot performance issues

Log location: /var/log/sisense/sisense/ on a single node. On multi-node, the logs are on the first node (the first in the list of nodes in the config.yaml that was used to install Sisense).
- build: ec-<name of cube>-bld-hash.log, for example ec-sample-ecommerce-bld-a63510c3-3672-0.log
- query: ec-<name of cube>-qry-hash.log, for example ec-sample-ecommerce-qry-a63510c3-3672-0.log

If Safe-Mode was triggered during a BUILD
Check the build error message and identify whether it is a pod issue or an overall server issue.
Build pod limits issue (BE#521691): to fix the pod limitation, consider increasing the Build Node RAM limits in the Data Groups.
Build server overall OOM issue (BE#636134): use Grafana to check what is consuming RAM. It can be:
- Other builds. If so, consider changing the build schedule accordingly.
- Query pods, which consume the RAM in most cases. If so, try stopping the cubes of those query pods to release the RAM.
Other options:
- Consider a RAM upgrade.
- Increase MAX RAM in the Data Group settings.
The Elasticube Build Safe-Mode settings are located in the Configuration Manager, under Elasticube Build Params, where you can enable/disable Safe-Mode and change the % of RAM that should be kept free. Disabling Safe-Mode is not recommended, as it is meant to keep 15% of RAM free so that technicians can log in and fix issues.
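The log paths and build error codes above lend themselves to a small triage helper. The Python sketch below lists a cube's build or query logs using the ec-<name of cube>-bld/qry-hash.log pattern and maps the two error codes from this article to their cause. It assumes the cube name appears in file names lowercase and hyphenated (as in ec-sample-ecommerce-...), and the helper names cube_log_files and classify_build_error are hypothetical; run it on the node that holds the logs.

```python
# Minimal sketch based on the paths and error codes in this article; the
# cube-name format (lowercase, hyphenated, e.g. "sample-ecommerce") is
# inferred from the example file names and is an assumption.
from pathlib import Path

# Single node, or the first node of a multi-node install (first in config.yaml).
LOG_DIR = Path("/var/log/sisense/sisense")

# Build error codes from this article and what each one indicates.
BUILD_ERRORS = {
    "BE#521691": "pod limit: increase the Build Node RAM limits in the Data Group",
    "BE#636134": "server OOM: check in Grafana what is consuming RAM (other builds, query pods)",
}


def cube_log_files(cube: str, kind: str, log_dir: Path = LOG_DIR) -> list:
    """Return ec-<cube>-<kind>-<hash>.log files; kind is 'bld' or 'qry'."""
    if kind not in ("bld", "qry"):
        raise ValueError("kind must be 'bld' (build) or 'qry' (query)")
    return sorted(log_dir.glob(f"ec-{cube}-{kind}-*.log"))


def classify_build_error(message: str) -> str:
    """Tell a pod-limit failure from an overall server OOM by its error code."""
    for code, meaning in BUILD_ERRORS.items():
        if code in message:
            return meaning
    return "unknown: read the full build log"


if __name__ == "__main__":
    for log in cube_log_files("sample-ecommerce", "bld"):
        text = log.read_text(errors="replace")
        print(log.name, "->", classify_build_error(text))
```

Matching on the "-bld-" or "-qry-" segment right after the cube name keeps a cube called "sample" from also picking up the logs of "sample-ecommerce".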
Note: with Safe-Mode disabled, the server may become unresponsive, and even a server restart might not fix the issue. Do not disable Safe-Mode without an urgent need!

If Safe-Mode was triggered while using a Dashboard
This can also be caused either by the pod limit or by server OOM. Dashboards use the RAM of the query pods, where Sisense keeps the results of dashboard query executions for reuse by other dashboard users. Sisense caches results up to 85% of the RAM allocated in the Data Group settings; after that, it tries to remove old results and replace them with new ones. If a very large query arrives from a dashboard at this point, Safe-Mode is triggered: it deletes the query pod (removing all saved results) and starts a new pod to calculate the results. At that moment the user sees an error on the dashboard. However, once the pod has restarted (usually about 14 seconds, although the more RAM is allocated to the query pod, the longer it takes to release the RAM and start a new pod), the user can reload the page to get the results.

UPD: starting from Sisense L2021.5, Soft-Restart of ElastiCubes has been implemented (beta; please test it before going live). Soft-Restart of the qry pod restarts only MonetDB rather than the entire pod, which speeds up recovery after hitting Safe-Mode. To enable Soft-Restart:
- go to the Control Panel
- click the Sisense logo in the top left corner 5 times
- navigate to the Management section in the left-hand menu
- scroll down to the end of the page
- enable Soft-Restart
To fix:
- Consider changing the Data Group settings.
- Consider a RAM upgrade.
- Increase the number of instances in the Data Group.
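To make the thresholds concrete: with 85% as the trigger, a query pod allocated 20 GB enters Safe-Mode at about 17 GB, and a 64 GB server at about 54.4 GB. The Python sketch below encodes only the rules stated in this article (85% of pod OR server RAM, then a 30-second grace period) so you can reason about readings taken from Grafana. It is a simplified model, not Sisense's implementation; it assumes the pod limit is the RAM allocated in its Data Group, and the 20 GB / 64 GB figures are hypothetical.

```python
# Simplified model of the Safe-Mode rules stated in this article, not Sisense
# code: trigger at 85% of pod OR server RAM, then a 30 s query grace period.
from dataclasses import dataclass
from typing import Optional

TRIGGER_RATIO = 0.85    # Safe-Mode fires at 85% usage (15% is kept free)
GRACE_PERIOD_S = 30.0   # new queries are cancelled for 30 s after a trigger


@dataclass
class MemoryState:
    pod_used_gb: float
    pod_limit_gb: float      # assumed: RAM allocated to the pod in its Data Group
    server_used_gb: float
    server_total_gb: float


def safe_mode_reason(state: MemoryState, ratio: float = TRIGGER_RATIO) -> Optional[str]:
    """Return 'pod' or 'server' if Safe-Mode would trigger, else None."""
    if state.pod_used_gb >= ratio * state.pod_limit_gb:
        return "pod"
    if state.server_used_gb >= ratio * state.server_total_gb:
        return "server"
    return None


def query_allowed(now_s: float, triggered_at_s: Optional[float]) -> bool:
    """False while still inside the grace period after a Safe-Mode trigger."""
    return triggered_at_s is None or now_s - triggered_at_s >= GRACE_PERIOD_S


if __name__ == "__main__":
    # Hypothetical numbers: a 20 GB query pod on a 64 GB server.
    print(safe_mode_reason(MemoryState(18, 20, 40, 64)))  # pod (18 >= 17)
    print(safe_mode_reason(MemoryState(10, 20, 56, 64)))  # server (56 >= 54.4)
    print(safe_mode_reason(MemoryState(10, 20, 40, 64)))  # None
    print(query_allowed(now_s=20, triggered_at_s=0))      # False: inside 30 s
    print(query_allowed(now_s=31, triggered_at_s=0))      # True
```

The pod check comes first because a pod can hit its own limit while the server as a whole still has headroom, which is the case the Data Group settings address.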
  That way, when one instance is restarted due to Safe-Mode, the others can handle the requests.
The Elasticube Query Safe-Mode settings are located in the Configuration Manager, under Elasticube Query Params:
- Cleanup Threshold Percentage - the Safe-Mode threshold percentage (default 15%).
- Safe-Mode Grace Period - after Safe-Mode is triggered, Sisense cancels all queries for 30 seconds.
- Disable/Enable Safe-Mode - NOT RECOMMENDED!

JumpToDashboard - Troubleshooting the most common configuration issues
This article provides possible solutions to the most common configuration issues with the JumpToDashboard plugin. First review the symptom (the error or behavior you see with the JumpToDashboard plugin), then follow the solution instructions. If this doesn't solve your issue, feel free to contact our Support Team and describe your issue in detail.

Symptom: a target dashboard (usually with the “_drill” prefix) disappeared from the left-hand panel for non-owner users.
Solution: this behavior may be intended; it is controlled by the JumpToDashboard parameter “hideDrilledDashboards”. To make the dashboard visible to non-owners, check the following:
1. Log in as the dashboard owner and find the dashboard in question in the left-hand panel. Click the 3-dots menu and make sure it is not hidden.
2. If the owner did not hide it intentionally, navigate to Admin tab > System Management (under Server & Hardware) > File Management > plugins > jumpToDashboard > js > config.js and check whether hideDrilledDashboards is set to true. If so, change it to false and save the config file.
3. Wait until the JumpToDashboard plugin is rebuilt (Admin tab > Server & Hardware > Add-ons page), then ask the user to refresh the browser page and check whether the drill dashboard appears in the left-hand panel.

Symptom: no "Jump to dashboard" menu appears when clicking the 3-dots menu in widget edit mode.
Solution: there can be different reasons for this behavior, so check the most common cases below:
- Double-check that the JumpToDashboard plugin is enabled under Admin tab > Server & Hardware > Add-ons.
- Make sure that both dashboards (parent and target) are based on the same ElastiCube.
  By default, the JumpToDashboard plugin has sameCubeRestriction: true in the config.js file, which prevents the ‘Jump to’ menu from appearing when a drill dashboard uses a different data source.
- Check that the prefix you used when creating the drill dashboard is correct. The prefix can be changed in the config.js file; by default it is “_drill”.

Symptom: clicking a widget that should open a drill dashboard does nothing.
Solution: open your browser console (for example, F12 in Chrome > Console tab) and look for errors that could indicate the issue. For example, a 403 error in the console indicates that the target dashboard is not shared with the user experiencing the issue. To fix it, log in as the owner of the drill dashboard and share it with the relevant user or group.

Symptom: clicking a widget to open the drill dashboard returns a 404 error.
Solution: this usually happens when the target/drill dashboard has been removed from the system. To fix it, follow the steps below:
1. Log in to the system as the owner.
2. Find the parent widget and open it in edit mode.
3. Click the 3-dots menu > choose ‘Jump to dashboard’ and select any other dashboard that exists in the system.
4. Press Apply and publish the changes to other users.
Note: if you only need to remove a non-existent drill dashboard from this widget, without replacing it, try this: choose a new drill dashboard, then unselect it and save the changes. If the Jump to dashboard menu does not appear for this widget, create a new temporary dashboard with the “_drill” prefix and do the same.

Symptom: the drill dashboard does not open for some viewers.
Solution: republish the drill dashboard so that the updated version is delivered to all end users.

Additional Resources: JumpToDashboard Plugin - How to Use and Customize
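The config.js checks and console errors covered in this article can be collected in one place. The Python sketch below reads hideDrilledDashboards and sameCubeRestriction from the text of config.js (copied out of File Management) and keeps the 403/404 meanings as a lookup table. It assumes each flag is written on its own line as name: true or name: false. The sample text and helper names are hypothetical, and the prefix setting is not read because its key name is not given here.

```python
# Minimal sketch: read the two config.js flags discussed in this article.
# Assumptions: each flag sits on its own line as `name: true|false`; the
# SAMPLE text is a hypothetical excerpt, not the real plugin file.
import re
from typing import Optional

SAMPLE = """
    hideDrilledDashboards: true,
    // sameCubeRestriction: false,
    sameCubeRestriction: true,
"""

# What each enabled flag explains, per the symptoms in this article.
FLAG_EFFECTS = {
    "hideDrilledDashboards": "drill dashboards are hidden from non-owners in the left-hand panel",
    "sameCubeRestriction": "the Jump to dashboard menu does not appear for a drill dashboard on another ElastiCube",
}

# Browser-console errors described in this article and their usual cause.
CONSOLE_ERRORS = {
    403: "target dashboard is not shared with this user: share it as the owner",
    404: "target dashboard was removed: pick another one in the widget's Jump to dashboard menu",
}


def read_flag(config_js: str, name: str) -> Optional[bool]:
    """Value of `name: true|false` at the start of a line; None if absent.

    Anchoring at line start skips commented-out entries such as `// name: ...`.
    """
    pattern = rf"^\s*{re.escape(name)}\s*:\s*(true|false)\b"
    match = re.search(pattern, config_js, re.MULTILINE)
    return None if match is None else match.group(1) == "true"


def explain(config_js: str) -> list:
    """List the enabled flags that explain a missing dashboard or menu."""
    return [effect for flag, effect in FLAG_EFFECTS.items() if read_flag(config_js, flag)]


if __name__ == "__main__":
    for line in explain(SAMPLE):
        print("-", line)
    print(CONSOLE_ERRORS[403])
```

A regular expression is used instead of a JavaScript parser to keep the sketch dependency-free; it will miss flags written in unusual layouts, so confirm any surprising result by opening config.js itself.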