Sisense Kubernetes Cluster Health Check

antonvolov · ‎09-07-2022

Check the pod status

1. Check whether any pods are not in a Running or Completed state.

kubectl get po -A -o wide | grep -Ev 'Running|Completed'

-A gets pods from all namespaces (Sisense is usually installed in the sisense namespace).

-o wide produces the extended output.

The response should contain only the header row, with no pods listed:

unnamed (7).png
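Because the filter above matches text anywhere on the line, it lets the header row through and could hide a pod whose name happens to contain "Running". A minimal sketch of a stricter filter that looks only at the STATUS column is shown here. The function name unhealthy_pods is an illustrative assumption, and the sketch relies on STATUS being the fourth column of the kubectl get po -A output.

```shell
# Print pods whose STATUS column is neither Running nor Completed.
# Reads `kubectl get po -A` output on stdin; NR > 1 skips the header row.
unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed"'
}

# Usage (requires cluster access):
#   kubectl get po -A -o wide | unhealthy_pods
```

With this filter, a healthy cluster produces no output at all.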

2. If no pods are listed (all pods are in a Running or Completed state), check that all containers of the Running pods are READY.

kubectl get po -A -o wide | grep 'Running'

Look at the READY column, which has the form x/y, where x is the number of ready containers and y is the total number of containers. If x is less than y, not all containers are ready; see "What if the pods are not running correctly?" below for troubleshooting steps.

unnamed (35).png
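Comparing x and y by eye is error-prone on a long list, so the check can be sketched as a small filter. The function name not_ready_pods is hypothetical, and the sketch assumes READY is the third column and STATUS the fourth, as in the kubectl get po -A output.

```shell
# Print Running pods where ready containers (x) are fewer than total (y).
# Reads `kubectl get po -A` output on stdin.
not_ready_pods() {
  awk 'NR > 1 && $4 == "Running" {
         split($3, r, "/")          # r[1] = ready, r[2] = total
         if (r[1] + 0 < r[2] + 0) print $1, $2, $3
       }'
}

# Usage (requires cluster access):
#   kubectl get po -A | not_ready_pods
```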

3. If all pods are in a Running or Completed state and all containers of the Running pods are READY, check the status of the nodes: kubectl get nodes

unnamed (9).png

All nodes should have the status ‘Ready’. In a single-node environment you will see just one node; in a multi-node environment you should see several.
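On clusters with many nodes, the same check can be sketched as a filter that prints only the nodes needing attention. The function name not_ready_nodes is an assumption; the sketch relies on STATUS being the second column of kubectl get nodes and treats a cordoned node reported as Ready,SchedulingDisabled as Ready.

```shell
# Print nodes whose STATUS does not start with "Ready".
# Reads `kubectl get nodes` output on stdin; STATUS may carry suffixes
# such as "Ready,SchedulingDisabled" for a cordoned node.
not_ready_nodes() {
  awk 'NR > 1 { split($2, s, ","); if (s[1] != "Ready") print $1, $2 }'
}

# Usage (requires cluster access):
#   kubectl get nodes | not_ready_nodes
```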

4. If a node is not in a ‘Ready’ state, get details by ‘describing’ the node:

kubectl describe node <node-name>

5. You may also check storage health by running: kubectl -n sisense get pvc

6. If all pods are in a Running or Completed state, all containers of the Running pods are READY, and all nodes are in a Ready state, basic Kubernetes troubleshooting is complete and the issue is most likely not in the Kubernetes infrastructure.

What if kubectl is not running?

1. If Linux doesn't recognize the kubectl command, either there is an issue with the Kubernetes installation or the user doesn’t have permission to run kubectl:

unnamed (11).png
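To tell a missing binary apart from other failures, you may first check whether the command is on the current user's PATH. This is a minimal sketch; the helper name have_cmd is hypothetical.

```shell
# Report whether a command can be found on the current user's PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Usage:
#   have_cmd kubectl || echo "kubectl not found on PATH: $PATH"
```

If the command is found but still fails, the problem is more likely related to permissions or configuration than to a missing installation.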

2. The main Kubernetes component on a node is the kubelet. Check the status of the kubelet service (does not apply to RKE deployments): systemctl status kubelet

It should be in an active (running) state.

3. If the kubelet is not in an active (running) state, try restarting it: sudo systemctl restart kubelet

4. You may check the kubelet logs by running: journalctl -u kubelet

Press Shift+G to jump to the end of the log.
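The kubelet log can be long, so it may help to keep only the error lines. The sketch below assumes the default journalctl line format (a kubelet[PID]: prefix) and the standard klog severity prefix, where a line starting with E or F followed by the month and day marks an error or fatal message. The function name kubelet_errors is hypothetical.

```shell
# Keep only kubelet log lines with klog severity E (error) or F (fatal).
# Reads `journalctl -u kubelet` output on stdin.
kubelet_errors() {
  grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4} '
}

# Usage (on the node itself):
#   journalctl -u kubelet --no-pager --since "1 hour ago" | kubelet_errors
```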

unnamed (31).png

5. If the kubelet service is missing, there is an issue with the Kubernetes installation.

6. If the kubelet is in an active (running) state, check if there is a .kube directory in the home directory of your current user:

cd && ls -la .kube

unnamed (14).png

If the .kube directory is missing or empty, the current user is not configured to run kubectl, and there is a problem with the Kubernetes configuration.
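The directory check can be sketched as a helper that tests for a non-empty config file, which is where kubectl looks by default. The name has_kubeconfig is an assumption; the sketch does not cover setups where the KUBECONFIG environment variable points to a different file.

```shell
# Succeed only if <dir>/config exists and is non-empty.
# The directory defaults to $HOME/.kube, the default kubectl location.
has_kubeconfig() {
  dir=${1:-"$HOME/.kube"}
  [ -s "$dir/config" ]
}

# Usage:
#   has_kubeconfig || echo "No kubectl configuration found for $(id -un)"
```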

What if the pods are not running correctly?

1. If kubectl returns meaningful output but some pods are in a state other than Running or Completed, or not all containers are READY (as in the screenshot below), describe the pod to understand why it is unhealthy.

unnamed (32).png

2. For example, you have a pod with 0/1 containers READY:

unnamed (33).png

Assuming the pod is in the sisense namespace, copy the name of the pod (in this case, ‘external-plugins-5dcf494b77-gtsfk’) and run:

kubectl -n sisense describe pod external-plugins-5dcf494b77-gtsfk

The output will look like this:

unnamed (17).png

The two main sections we are interested in evaluating are Conditions and Events.

Conditions gives True/False values showing whether the pod is:

- Initialized

- Ready

- ContainersReady

- PodScheduled (the pod has been placed on a node)

In the example above, the pod has been placed on a node (PodScheduled: True) and initialized, but it is not Ready because its container is not ready.

Events lists recent events that Kubernetes components, such as the scheduler and the kubelet, recorded for the current pod.

In the example above, the readiness probe for the pod failed, so the problem is most likely in the application itself and not in the Kubernetes infrastructure.
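To pick out the failing conditions without scrolling through the whole description, the Conditions section can be filtered as sketched here. The function name false_conditions is hypothetical, and the sketch assumes the usual layout of kubectl describe pod, where section headings start in the first column and their entries are indented with spaces.

```shell
# Print the names of pod conditions whose status is False.
# Reads `kubectl describe pod` output on stdin.
false_conditions() {
  awk '/^Conditions:/ { in_cond = 1; next }
       in_cond && /^[^ ]/ { in_cond = 0 }
       in_cond && $2 == "False" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl -n sisense describe pod external-plugins-5dcf494b77-gtsfk | false_conditions
```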

3. You can check the logs of the pod with the command:

kubectl -n sisense logs external-plugins-5dcf494b77-gtsfk

and look for errors that give a clue about the root cause of the problem.
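When a container keeps restarting, the current instance may not have logged anything yet. The sketch below falls back to the logs of the previous container instance using the --previous flag. The function name pod_logs and the 200-line tail are illustrative assumptions, and the sketch assumes a single-container pod.

```shell
# Print the last 200 log lines of a pod, falling back to the previous
# container instance when the current one has produced no output.
pod_logs() (
  ns=$1; pod=$2
  out=$(kubectl -n "$ns" logs "$pod" --tail=200 2>/dev/null) || out=
  if [ -z "$out" ]; then
    out=$(kubectl -n "$ns" logs "$pod" --previous --tail=200 2>/dev/null) || out=
  fi
  printf '%s\n' "$out"
)

# Usage (requires cluster access):
#   pod_logs sisense external-plugins-5dcf494b77-gtsfk | grep -i error
```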

4. The container may be in a state other than Running:

unnamed (18).png

Use describe pod to check its Conditions and Events as we did in the previous case:

kubectl -n sisense describe po external-plugins-5dcf494b77-gtsfk

5. If Conditions and Events don’t give you enough information about the root cause of the problem, look at the State/Last State section:

unnamed (20).png

In the example above, the Last State is ‘Terminated’ and the Reason is OOMKilled (Out of Memory, Killed). This means Kubernetes killed the container because it exceeded its memory limit.
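The reason for the last termination can also be extracted directly from the description, as sketched below. The function name last_termination_reason is hypothetical, and the sketch assumes the Reason line immediately belongs to the preceding Last State line, as in the usual kubectl describe pod layout.

```shell
# Print the Reason recorded under each "Last State" entry.
# Reads `kubectl describe pod` output on stdin.
last_termination_reason() {
  awk '/Last State:/ { seen = 1; next }
       seen && $1 == "Reason:" { print $2; seen = 0 }'
}

# Usage (requires cluster access):
#   kubectl -n sisense describe po <pod-name> | last_termination_reason
```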

To increase the memory limit, find the problematic pod: kubectl -n sisense get po

unnamed (21).png

Then find the Kubernetes object managing the pod. In our example:

kubectl -n sisense get all | grep connectors

Find a resource without additional random letters/digits in the name:

unnamed (22).png

In our case, it’s a deployment.
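For pods managed by a deployment, the name usually has the form deployment name, then a ReplicaSet hash, then a random suffix, so the deployment name can be guessed by stripping the last two segments. This is only a heuristic sketch with a hypothetical function name; it does not apply to StatefulSet pods such as sisense-dgraph-alpha-0, and the result should be confirmed against the kubectl get all output.

```shell
# Guess the deployment name by removing the trailing
# "-<replicaset-hash>-<random-suffix>" from a pod name.
deployment_from_pod() {
  printf '%s\n' "$1" | sed -E 's/-[a-z0-9]+-[a-z0-9]+$//'
}

# Usage:
#   deployment_from_pod external-plugins-5dcf494b77-gtsfk
```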

Then edit the resource: kubectl -n sisense edit deployment connectors

Then search for the resources section:

unnamed (23).png

In our example, increase the ‘memory’ value under ‘limits’.
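As a non-interactive alternative to editing the deployment, the limit can be changed with kubectl set resources. This is a sketch under assumptions: the wrapper name set_memory_limit is hypothetical, the 4Gi value in the usage line is purely illustrative, and without a container selector the command applies the limit to every container in the deployment.

```shell
# Set the memory limit on all containers of a deployment.
set_memory_limit() {
  ns=$1; deploy=$2; limit=$3
  kubectl -n "$ns" set resources deployment "$deploy" --limits="memory=$limit"
}

# Usage (requires cluster access; the value is illustrative only):
#   set_memory_limit sisense connectors 4Gi
```

Changes made directly to a deployment may be overwritten if the deployment is later re-applied from its original manifest or chart.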

6. Let’s consider another example:

unnamed (24).png

We have a pod in a CrashLoopBackOff state.

Let’s describe the pod: kubectl -n sisense describe po sisense-dgraph-alpha-0

unnamed (25).png

It doesn’t give us anything obvious. Let’s check the logs: kubectl -n sisense logs sisense-dgraph-alpha-0 (add --previous if you don’t have any output).

unnamed (26).png

The root cause of the issue is that there is “no space left on device.” This means we should allocate more storage to the pod.

In this case, you may check the status of persistent volumes and persistent volume claims with kubectl get pv and kubectl -n sisense get pvc.

You are looking for statuses other than Bound:

unnamed (27).png

unnamed (28).png

If you see a status other than Bound, you may “describe” the resource, as in the case above: kubectl -n sisense describe pvc data-dgraph-0
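The search for claims that are not Bound can be sketched as a filter. The function name unbound_pvcs is hypothetical, and the sketch assumes STATUS is the second column of the kubectl get pvc output.

```shell
# Print persistent volume claims whose STATUS is not Bound.
# Reads `kubectl -n <namespace> get pvc` output on stdin.
unbound_pvcs() {
  awk 'NR > 1 && $2 != "Bound" { print $1, $2 }'
}

# Usage (requires cluster access):
#   kubectl -n sisense get pvc | unbound_pvcs
```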

What else to check?

1. If the cluster looks healthy but performance suffers, check the resource consumption of the Sisense services. Start with the nodes:

kubectl top nodes

The output should look like:

unnamed (29).png

Note whether CPU% is close to 100% or MEMORY% is close to 85%.
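The threshold check can be sketched as a filter over the kubectl top nodes output. The function name busy_nodes and the default CPU threshold of 90% are assumptions chosen for illustration; the 85% memory default follows the guidance above. The sketch assumes CPU% is the third column and MEMORY% the fifth.

```shell
# Print nodes at or above the given CPU% or MEMORY% thresholds.
# Reads `kubectl top nodes` output on stdin.
# $1 = CPU threshold (default 90), $2 = memory threshold (default 85).
busy_nodes() {
  awk -v cpu_max="${1:-90}" -v mem_max="${2:-85}" '
    NR > 1 {
      cpu = $3; mem = $5
      sub(/%/, "", cpu); sub(/%/, "", mem)
      if (cpu + 0 >= cpu_max + 0 || mem + 0 >= mem_max + 0) print $1, $3, $5
    }'
}

# Usage (requires cluster access and a metrics server):
#   kubectl top nodes | busy_nodes 90 85
```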

2. To check the resource consumption of individual pods, run:

kubectl -n sisense top po

Note the pods with abnormally high memory consumption.
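To bring the heaviest pods to the top, the output can be sorted by the memory column, as sketched below. The function name top_memory_pods and the default of five rows are assumptions, and the sketch assumes all memory values use the same unit (typically Mi), since it compares only the leading number.

```shell
# Print the N pods with the highest memory usage (default 5).
# Reads `kubectl -n <namespace> top po` output on stdin; memory is column 3.
top_memory_pods() {
  n=${1:-5}
  awk 'NR > 1' | sort -k3,3nr | head -n "$n"
}

# Usage (requires cluster access and a metrics server):
#   kubectl -n sisense top po | top_memory_pods 10
```

kubectl top po also accepts --sort-by=memory, which may be simpler when no further filtering is needed.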

Conclusion

In conclusion, if you are a Kubernetes pro, this article will help you quickly grasp which infrastructure components are involved in a Sisense deployment and what to check next. If you are a Kubernetes newbie, these basic instructions will let you troubleshoot quickly and narrow down the issue before seeking further help.

If you need any additional help, please contact Sisense Support.
