1. List the worker nodes and identify the node to be rebooted
$ oc get nodes
NAME STATUS ROLES AGE VERSION
master-0 Ready master 5d17h v1.24.15+990d55b
master-1 Ready master 5d17h v1.24.15+990d55b
master-2 Ready master 5d17h v1.24.15+990d55b
worker-xxx-01 Ready worker 5d16h v1.24.15+990d55b
worker-xxx-02 Ready worker 5d16h v1.24.15+990d55b
worker-xxx-03 Ready worker 5d16h v1.24.15+990d55b
2. Cordon a worker node (maintenance mode)
$ oc adm cordon worker-xxx-01
node/worker-xxx-01 cordoned
3. It is recommended to review the list of pods running on the node before draining it
$ oc get pods -A -o wide --field-selector spec.nodeName=<worker_node_name>
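For example, to list the pods on the node cordoned above (output will vary by cluster):
$ oc get pods -A -o wide --field-selector spec.nodeName=worker-xxx-01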
4. Drain the node in preparation for maintenance. If the command fails, rerun it with additional options (an example combining them is shown after the basic command below):
- --force to delete pods that are not managed by a ReplicationController, ReplicaSet, Job, DaemonSet, or StatefulSet resource.
- --delete-emptydir-data to delete pods that use local emptyDir storage.
- --ignore-daemonsets to ignore DaemonSet-managed pods so that pod eviction can proceed.
- --disable-eviction to bypass PodDisruptionBudget (PDB) checks and drain the node.
$ oc adm drain worker-xxx-01
node/worker-xxx-01 already cordoned
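If the basic drain fails because of DaemonSet pods, local emptyDir storage, or unmanaged pods, the options described above can be combined. A typical invocation (adjust the options to your workloads) might look like:
$ oc adm drain worker-xxx-01 --ignore-daemonsets --delete-emptydir-data --force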
5. Ensure the drain has evicted all pods and that they have been rescheduled onto other nodes, and that the remaining nodes have adequate resources. Make sure all critical applications are still available. Some pods, such as those managed by DaemonSets, cannot be rescheduled and are only restarted as part of the reboot.
- Check for undrained pods on the node
$ oc get pod -o wide -A | grep "<node_name>"
- Check for recent FailedScheduling events. These may indicate the cluster is under-resourced and requires additional nodes.
$ oc get events -A | grep "FailedScheduling"
6. Check the status of the worker nodes; the cordoned node should show Ready,SchedulingDisabled
$ oc get nodes
NAME STATUS ROLES AGE VERSION
master-0 Ready master 5d17h v1.24.15+990d55b
master-1 Ready master 5d17h v1.24.15+990d55b
master-2 Ready master 5d17h v1.24.15+990d55b
worker-xxx-01 Ready,SchedulingDisabled worker 5d16h v1.24.15+990d55b
worker-xxx-02 Ready worker 5d16h v1.24.15+990d55b
worker-xxx-03 Ready worker 5d16h v1.24.15+990d55b
7. The oc debug node/<node_name> command provides a way to open a shell prompt on the worker node. It creates a separate debug container, mounts the node's root file system at the /host folder, and allows you to inspect any files from the node.
$ oc debug node/worker-xxx-01
Temporary namespace openshift-debug-kck98 is created for debugging node...
Starting pod/worker-xxx-01-debug ...
To use host binaries, run `chroot /host`
Pod IP: x.x.x.x
If you don't see a command prompt, try pressing enter.
sh-4.4#
8. Start a chroot shell in the /host folder
$ chroot /host
9. Reboot the worker node
$ reboot
Removing debug pod ...
Temporary namespace openshift-debug-xx was removed.
10. Watch the progress and confirm the worker node has rebooted
$ oc describe node worker-xxx-01 | grep LastTransitionTime -A2
Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                      Message
----             ------  -----------------                 ------------------                ------                      -------
MemoryPressure   False   Tue, 15 Aug 2023 12:40:54 +0800   Tue, 20 Aug 2023 12:18:38 +0800   KubeletHasSufficientMemory
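As an additional check, the node uptime should reflect the recent reboot. Assuming the debug pod can be started again, it can be queried non-interactively:
$ oc debug node/worker-xxx-01 -- chroot /host uptime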
11. Confirm the worker node is Ready after the reboot
$ oc wait --for=condition=Ready node/worker-xxx-01
node/worker-xxx-01 condition met
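If the node takes a while to come back, the wait can be given a longer timeout (the value below is only an example):
$ oc wait --for=condition=Ready node/worker-xxx-01 --timeout=10m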
12. Restore (uncordon) the worker node from maintenance mode
$ oc adm uncordon worker-xxx-01
node/worker-xxx-01 uncordoned
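As a final check, confirm the node shows Ready (without SchedulingDisabled) and that workloads are being scheduled on it again:
$ oc get nodes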