vsolodkyi
Sisense Team Member

How to properly reboot K8s node(s)

As part of maintaining a healthy and robust Kubernetes (K8s) cluster, occasional reboots of nodes might be necessary. Whether for system updates, hardware maintenance, or other reasons, it's essential to follow a structured process to ensure minimal disruption to running workloads. Below is a step-by-step guide on safely rebooting nodes within a Kubernetes cluster, covering both Red Hat Enterprise Linux (RHEL) and Ubuntu systems.

Considerations:

  1. Sequential Draining: In a 3-node cluster where 2 nodes are for app/query and 1 is for the build, nodes must be drained individually. This ensures that there is always a node available for serving requests.
  2. Sequential Reboot: Another node cannot be drained until the previous one is fully booted and functioning. At any given time, there must be 2 nodes available to maintain cluster stability.
  3. Preparation for Builds: Before draining any node, ensure that no builds are running and that all scheduled build jobs are suspended (see the example sketch after this list). This prevents any disruption to build processes while nodes are being rebooted.
  4. Wait for Pod Migration: It is essential to wait until all pods are successfully moved to another node and are up and running before stopping K8s control plane resources using Docker commands. Failure to do so could result in some services not being fully migrated, leading to unexpected issues after rebooting a node.
  5. Verification Before Next Step: Once a node is rebooted and uncordoned, ensure that all pods are running and have moved to the uncordoned node before draining the next one. This keeps the cluster in its desired state and prevents disruptions to ongoing workloads.
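
As a quick sketch of considerations 3 and 4, you can list and suspend scheduled build jobs and then confirm which pods are still running on a node before proceeding. The CronJob name, namespace, and node name below are placeholders and will depend on your deployment:

kubectl get cronjobs -A

kubectl patch cronjob <cronjob_name> -n <namespace> -p '{"spec":{"suspend":true}}'

kubectl get pods -A -o wide --field-selector spec.nodeName=<node_name>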

Step 1: Identify Nodes

First, identify the nodes in your Kubernetes cluster by running the following command:

kubectl get nodes

This command will provide you with a list of all nodes along with their current status.
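
If you need to check which node serves which role (for example, app/query versus build), a wider listing with labels can help. The exact label keys vary between environments, so treat this as an illustrative check rather than a fixed convention:

kubectl get nodes -o wide --show-labels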

Step 2: Drain the Node(s)

Before rebooting a node, it's crucial to gracefully evict all the pods running on that node to ensure they are rescheduled elsewhere in the cluster. To do this, you can use the kubectl drain command. For instance:

kubectl drain <node_name> --ignore-daemonsets --delete-emptydir-data

Replace <node_name> with the name of the node you want to drain. The --ignore-daemonsets flag lets the drain proceed even though DaemonSet-managed pods (e.g., monitoring agents) cannot be evicted and will remain on the node. The --delete-emptydir-data flag allows pods that use emptyDir volumes to be evicted, accepting that the data in those volumes will be lost.
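
To confirm that the drain has completed, you can check whether anything other than DaemonSet pods is still scheduled on the node (again, <node_name> is a placeholder):

kubectl get pods -A -o wide --field-selector spec.nodeName=<node_name>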

Step 3: Stop Kubernetes Services (RHEL)

For Red Hat Enterprise Linux (RHEL) systems, it's necessary to stop certain Kubernetes-related Docker containers before rebooting the node. Execute the following commands with root access:

docker stop kubelet kube-proxy kube-scheduler kube-controller-manager etcd kube-apiserver

systemctl stop docker

Stopping these services ensures that Kubernetes components are gracefully shut down before the node is rebooted.
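
As an optional check between stopping the containers and stopping the Docker service, you can confirm that no Kubernetes containers are still running; the exact container names may differ slightly in your installation:

docker ps | grep -E 'kube|etcd'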

Step 3: Stop Kubernetes Services (Ubuntu)

On Ubuntu or other Debian-based systems, Kubernetes components are typically managed as systemd services, so they are stopped and started with systemctl rather than with Docker commands (for example, systemctl stop <service_name>).

For example, on an Ubuntu system, you might stop Kubernetes components like this:

sudo systemctl stop kubelet kube-proxy kube-scheduler kube-controller-manager etcd kube-apiserver
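
Before rebooting, you can optionally confirm that the services have actually stopped (systemctl is-active should report inactive); the set of services managed by systemd may vary between installations:

systemctl is-active kubelet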

Step 4: Reboot the Node

Now, you can proceed with rebooting the node using your system's standard reboot command.
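
For example, on both RHEL and Ubuntu this is typically done with:

sudo reboot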

Step 5: Verify Docker Service

After the node has rebooted, ensure that the Docker service has started successfully:

systemctl status docker
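
If the service is reported as inactive or failed, you can start it manually and re-check its status; whether Docker is enabled to start automatically on boot depends on your configuration:

sudo systemctl start docker

systemctl status docker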

Step 6: Start Kubernetes Services (RHEL)

Once Docker is up and running, you need to ensure that the essential Kubernetes services are also running. Execute the following command:

docker start kubelet kube-proxy kube-scheduler kube-controller-manager etcd kube-apiserver

Verify that all necessary containers are running using:

docker ps | grep kube

Step 6: Start Kubernetes Services (Ubuntu)

After the node has rebooted, you can start the Kubernetes services on Ubuntu using systemd commands:

sudo systemctl start kubelet kube-proxy kube-scheduler kube-controller-manager etcd kube-apiserver
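
To confirm the services came back up cleanly, check their state and, if anything fails to start, review the recent logs; as in Step 3, the set of systemd-managed services may differ in your environment:

systemctl status kubelet

journalctl -u kubelet --since "10 minutes ago"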

Step 7: Uncordon the Node

After confirming that the node is back online and all Kubernetes services are running, you can mark the node as schedulable again:

kubectl uncordon <node_name>

This command allows the Kubernetes scheduler to resume placing pods on the node.
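
To confirm the node is schedulable again (its status should no longer show SchedulingDisabled), you can check it directly; <node_name> is a placeholder:

kubectl get node <node_name>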

Step 8: Verify Node Status

Finally, confirm that the node is back in a ready state and available for scheduling pods:

kubectl get nodes

This command should display all nodes in the cluster, with the previously rebooted node now showing a Ready status.
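
Before draining the next node (per consideration 5 above), it's also worth confirming that no pods are stuck in a non-Running state. The check below is a quick illustrative filter rather than an exhaustive health test; anything it prints besides the header deserves a closer look:

kubectl get pods -A | grep -vE 'Running|Completed'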

By following these steps, you can safely reboot nodes within your Kubernetes cluster while minimizing disruption to your running workloads.
