Configuring node labels
Before jumping straight to the node selector, let's first understand node labels. Like most other objects in Kubernetes, nodes can be assigned labels, which help in finding and selecting nodes based on particular criteria. There are well-known (or default) labels that are automatically assigned to Kubernetes nodes in a cluster, and you can also add labels to nodes manually. Labels are set as key–value pairs, as shown in the command below:
kubectl label nodes <node-name> size=<value>
I have two worker nodes in my Kubernetes cluster, and the kube-srv3 node has higher resources (4 CPUs and 4 GiB memory) compared to the kube-srv2 node. With the above command, I set a size=large label on the kube-srv3 node. Similarly, I will use the kubectl label nodes kube-srv2.testlab.local size=small command to label the other node as small. You will learn the importance of these labels in the next section. To view the labels, you could use the kubectl get nodes or the kubectl describe nodes command, as shown below:
kubectl get nodes <node-name> --show-labels
The screenshot shows how you can view well-known and manually assigned labels on a Kubernetes node. By the way, the kubectl describe nodes command also shows the node capacity, as shown in the screenshot below:
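If you only want to list the nodes that carry a particular label, you can also filter the output with a label selector instead of using --show-labels, for example:

kubectl get nodes -l size=large
kubectl get nodes -l size=small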
Configuring node selectors
A node selector controls Pod placement based on node labels. When you define a Pod, you can specify a node selector in its specification. This gives you much better control over running certain Pods on specific cluster nodes. For example, imagine that your Pod requires higher resources. It makes sense to schedule it on a node labeled large (kube-srv3, in this case). To achieve this, you create a Pod definition file, as shown below:
apiVersion: v1
kind: Pod
metadata:
  name: webapp-pod
spec:
  containers:
  - image: nginx
    name: webapp-pod
  nodeSelector:
    size: large
Here, we used nodeSelector, which allows you to specify the node labels as key–value pairs. The Pod will be scheduled on the node(s) having a size=large label. Remember, you can have multiple cluster nodes with the same label, so the node selector allows the Pod to be scheduled on one of those nodes. Alternatively, if you prefer to tie your Pod to a particular node, you can specify the nodeName: kube-srv3.testlab.local under the spec section of the Pod definition file.
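For comparison, here is a minimal sketch of the nodeName approach applied to the same Pod. Note that nodeName bypasses the scheduler entirely and binds the Pod directly to the named node:

apiVersion: v1
kind: Pod
metadata:
  name: webapp-pod
spec:
  nodeName: kube-srv3.testlab.local  # binds the Pod directly to this node, bypassing the scheduler
  containers:
  - image: nginx
    name: webapp-pod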
Let's create the Pod and find out which node it is scheduled on.
kubectl create -f webapp-pod.yaml
kubectl get pods -o wide
The screenshot shows that the Pod is running on the kube-srv3 node, which is exactly what we wanted.
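If you prefer a quick check instead of scanning the wide output, you can also print just the node that the Pod landed on:

kubectl get pod webapp-pod -o jsonpath='{.spec.nodeName}'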
Now, let's assume that a new worker node (kube-srv4), having enough capacity to run our Pod, is added to the cluster. I will assign the size=medium label to this new node. Since there are two nodes with enough capacity, you want to schedule the Pod on either node (kube-srv3 or kube-srv4). Unfortunately, you cannot simply achieve this with a node selector, since it doesn't support complex rules. This is where node affinity comes into the picture.
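For reference, labeling the new node works exactly like before; the full node name below assumes kube-srv4 follows the same naming pattern as my other lab nodes:

kubectl label nodes kube-srv4.testlab.local size=medium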
Configuring node affinity
Node affinity is a feature that gives you more control over scheduling Pods on the nodes of a Kubernetes cluster. Node affinity allows you to specify more advanced rules to ensure that Pods are scheduled on nodes that match certain criteria. The rules are evaluated against node labels (and certain node fields), so if you want to steer Pods toward nodes with particular hardware characteristics (CPU, memory, GPU, etc.), you label those nodes accordingly. A closely related feature, inter-Pod affinity, works with Pod labels instead and allows you to colocate related Pods on a particular node.
Two types of node affinity exist:
- requiredDuringSchedulingIgnoredDuringExecution: This node affinity type specifies hard constraints: the Kube scheduler will only place the Pod on a node that satisfies the defined rules. If no node meets the requirements, the Pod will not be scheduled and will remain in Pending status. That is why it is named requiredDuringScheduling.
- preferredDuringSchedulingIgnoredDuringExecution: This node affinity type specifies soft constraints: the Kube scheduler tries to find a node that meets the specified requirements, but if no matching node is found, the Pod is still scheduled anyway. That is why it is named preferredDuringScheduling (a sketch of this form follows below).
If you are wondering about the last part (i.e., IgnoredDuringExecution), it simply means that node affinity is only evaluated while the Pod is being scheduled. If someone changes the node labels after the Pod is scheduled, the Pod is not affected and continues to run.
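As a reference, here is a minimal sketch of what the preferred type looks like in a Pod spec. Each preference carries a weight between 1 and 100, which the Kube scheduler adds to a node's score when the expression matches; the label key and values below simply reuse the ones from my lab:

apiVersion: v1
kind: Pod
metadata:
  name: webapp-pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50                 # 1-100; a higher weight means a stronger preference
        preference:
          matchExpressions:
          - key: size
            operator: In
            values:
            - large
  containers:
  - image: nginx
    name: webapp-pod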
Now that you understand node affinity, let's see how to configure it on a Pod. We will use the same Pod specification file, but instead of a node selector, we will define a node affinity rule.
apiVersion: v1
kind: Pod
metadata:
  name: webapp-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: In
            values:
            - large
            - medium
  containers:
  - image: nginx
    name: webapp-pod
Here, I used the requiredDuringSchedulingIgnoredDuringExecution node affinity type along with nodeSelectorTerms. The nodeSelectorTerms field is an array of one or more NodeSelectorTerm objects, and each NodeSelectorTerm contains a list of node selector expressions to match against the node labels. A NodeSelectorTerm can have one or more matchExpressions or matchFields.
- matchExpressions is a label selector used to specify a set of key–value pairs with the help of operators such as In, NotIn, Exists, DoesNotExist, Gt, Lt, etc. These expressions allow you to match nodes based on the node labels.
- matchFields is a keyword used to match nodes based on particular node fields, such as the node name (a sketch follows below).
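For illustration, a matchFields term can pin a Pod to a node by its object name; to my knowledge, metadata.name is the field supported here. A minimal fragment of the affinity section might look like this:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name        # matches the node object's name
          operator: In
          values:
          - kube-srv3.testlab.local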
In a nutshell, the expression we used above evaluates the size node label with multiple values (large or medium). Remember, the operator plays an important role here. The In operator requires the label's value to match one of the specified values (large or medium). When you schedule a Pod, the Kube scheduler evaluates the nodeSelectorTerms field. If it finds a matching node, the Pod will be scheduled on that node; otherwise, it will remain unscheduled. You could also use other operators to define custom rules, depending on your requirements.
Note that the NotIn and DoesNotExist operators produce the opposite behavior of node affinity, known as anti-affinity, which prevents Pods from being scheduled on particular nodes. For example, if we used the NotIn operator in our case, the Pod would not be scheduled on a large or medium node. The effect is similar to what taints and tolerations achieve, although taints are applied on the node side (repelling Pods that do not tolerate them), whereas anti-affinity rules are declared on the Pod.
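To illustrate, turning the earlier rule into an anti-affinity rule only requires swapping the operator; a sketch of the affinity fragment:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: size
          operator: NotIn          # keep the Pod away from nodes labeled large or medium
          values:
          - large
          - medium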
Let's create a new Pod using the updated configuration file. Remember, the Pod from the node selector section is still running on the kube-srv3 node, so the new Pod needs a different name in its metadata.
kubectl apply -f webapp-pod.yaml
kubectl get pods -o wide
You can see that the new Pod is scheduled on the kube-srv4 worker node. Even if we delete the old Pod and recreate it, it will be scheduled on either the kube-srv3 node or the kube-srv4 node, but not on the kube-srv2 node due to the node affinity rule we defined.
The screenshot shows that when the Pod is recreated, it is scheduled on the desired nodes only.
Conclusion
I hope this post gives you an understanding of node selectors and node affinity. The node selector is the simplest method for controlling the Pod placement on nodes. Node affinity, on the other hand, is a more complex yet powerful feature that helps with workload distribution by scheduling the Pods on certain nodes or preventing them from being placed on nodes based on a defined set of rules. In the next post, we will discuss ConfigMaps in Kubernetes.