Pin a Kubernetes pod to the current node to avoid (hostPath) data loss
TL;DR
Here's a handy one-liner to pin a running pod to the node it's currently on:
kubectl patch deployment -n $NAMESPACE $DEPLOYMENT -p '{"spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/hostname": "'$(kubectl get pods -n $NAMESPACE -o jsonpath='{ ..nodeName }')'"}}}}}' || (echo Failed to identify current node of $DEPLOYMENT pod; exit 1)
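Note that the embedded kubectl get pods collects nodeName from every pod in the namespace, so this assumes the namespace contains only that deployment's pod. If it doesn't, you can narrow the lookup with a label selector — the label below is just an example and will depend on your chart:
# Hypothetical: adjust the label to whatever your deployment's pods actually carry
kubectl get pods -n $NAMESPACE -l app.kubernetes.io/name=$DEPLOYMENT -o jsonpath='{ ..nodeName }'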
The long version
I've been supporting the Portainer team with a Helm chart for their new v2, Kubernetes-supporting version. Recently the boss told me:
"Sometimes, when using one of these small/development, multi-node Kubernetes clusters like k3s or microk8s, Kubernetes will schedule the pod to a particular node, but when the pod moves to a different node, the data is lost. Find a way to ensure that the pod always remains on the same node"!
"Nonsense", I replied. "The Kubernetes storage provisioner will be smart enough to ensure that an allocated PV doesn't just move to a different node". And to prove how smart I was, I illustrated by creating a multi-node KinD cluster:
❯ cat kind.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
❯ kind create cluster --config kind.yaml
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.19.1)
✓ Preparing nodes
✓ Writing configuration
✓ Starting control-plane
✓ Installing CNI
✓ Installing StorageClass
✓ Joining worker nodes
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Thanks for using kind!
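With the cluster up, a quick kubectl get nodes confirms one control-plane node and two workers (output approximate; ages and roles will vary):
❯ kubectl get nodes
NAME                 STATUS   ROLES    AGE   VERSION
kind-control-plane   Ready    master   90s   v1.19.1
kind-worker          Ready    <none>   60s   v1.19.1
kind-worker2         Ready    <none>   60s   v1.19.1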
I created the namespace, added the helm repo, and deployed the chart:
❯ kubectl create namespace portainer
namespace/portainer created
❯ helm repo add portainer https://portainer.github.io/k8s/
❯ helm repo update
❯ helm upgrade --install -n portainer portainer portainer/portainer
Release "portainer" does not exist. Installing it now.
NAME: portainer
LAST DEPLOYED: Wed Dec 9 21:08:09 2020
NAMESPACE: portainer
STATUS: deployed
REVISION: 1
NOTES:
1. Get the application URL by running these commands:
export NODE_PORT=$(kubectl get --namespace portainer -o jsonpath="{.spec.ports[0].nodePort}" services portainer)
export NODE_IP=$(kubectl get nodes --namespace portainer -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
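Before digging into storage, it's worth noting which node the pod actually landed on (output abridged and illustrative):
❯ kubectl get pods -n portainer -o wide
NAME               READY   STATUS    ...   NODE
portainer-<hash>   1/1     Running   ...   kind-worker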
I examined the PV created by the deployment and saw, as expected, a node affinity constraint:
❯ kubectl get pv -o yaml
<snip>
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kind-worker
"Boom!", I said. "There's no problem, because Kubernetes won't let the pod run on a different node, due to the nodeSelector".
Not so fast!
"Try microk8s", the boss said, "it happens all the time..."
So I did. Grumbling about how much harder it is to set up a multi-node microk8s environment, I used Multipass to create two Ubuntu 20.04 VMs, and then followed the instructions for setting up a microk8s cluster.
Sure enough, when I examined the microk8s PV, there was no node affinity at all. Microk8s, it turns out, uses a simple hostPath-type provisioner!
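For comparison, here's roughly what that PV looks like — the names and path are illustrative, but the key point is the bare hostPath and the complete absence of nodeAffinity:
❯ kubectl get pv -o yaml
<snip>
  hostPath:
    path: /var/snap/microk8s/common/default-storage/<namespace>-<pvc-name>-pvc-<id>
  storageClassName: microk8s-hostpath
<snip>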
Where's my data?
This presents a problem for any application deployed on a multi-node microk8s cluster, as well as on any other cluster using a hostPath-based storage provisioner. We came up with what I think is an elegant solution, though.
This command will return the current node of a pod (provided that pod has been scheduled):
kubectl get pods <podname> -o jsonpath='{ ..nodeName }'
And this command will patch a deployment, adding a nodeSelector:
kubectl patch deployments <deploymentname> -p '{"spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/hostname": "<nodename>"}}}}}'
Combined, we get this neat little command, a variation of which is now featured in the Portainer install docs:
kubectl patch deployment -n $NAMESPACE $DEPLOYMENT -p '{"spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/hostname": "'$(kubectl get pods -n $NAMESPACE -o jsonpath='{ ..nodeName }')'"}}}}}' || (echo Failed to identify current node of $DEPLOYMENT pod; exit 1)
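Note that patching the pod template triggers a rollout, so the pod gets recreated — but thanks to the new nodeSelector it comes back on the same node. You can confirm with:
kubectl get deployment -n $NAMESPACE $DEPLOYMENT -o jsonpath='{.spec.template.spec.nodeSelector}'
kubectl get pods -n $NAMESPACE -o wide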
It should be noted that pinning a pod to a node obviously reduces resiliency in the event of a node failure, so something like this shouldn't be attempted in serious production. If you're using microk8s, though, you're probably not running serious production, so go wild!
BTW, this is what I do, all day, every day. I enjoy it, and I'm good at it. If this sort of thing is what you need, I'd be interested in working with you.