Deploying stateful applications with Kubernetes StatefulSets - Wed, Nov 1 2023
A StatefulSet is a Kubernetes workload API object that manages the deployment and scaling of a set of Pods while preserving stickiness to persistent storage and guaranteeing the ordering and uniqueness of those Pods. With a StatefulSet, each Pod receives a stable, unique identifier, which keeps deployment predictable and orderly. This is indispensable for the seamless operation of stateful applications.
Deployments vs. StatefulSets
The following table covers the differences between Deployments and StatefulSets:
|Deployments|StatefulSets|
|---|---|
|Deployments are used for deploying stateless applications.|StatefulSets are used for deploying stateful applications.|
|Pods created by Deployments are identical and interchangeable.|Pods created by StatefulSets are not interchangeable; each Pod is unique and has a stable identity that persists across rescheduling.|
|Pods are created in parallel with random hashes in their names.|Pods are created in a specific order with fixed names, hostnames, and ordinal indexes that can be used to access them.|
|Deployments require a service to communicate with Pods, and requests are load-balanced across multiple Pod replicas.|StatefulSets require a headless service to handle communication between Pods without load balancing.|
|In a Deployment, all Pods share the same persistent volume claim (PVC), which means the same persistent volume (PV) is used by all containers.|In a StatefulSet, each Pod has its own PVC, which means each Pod uses a different PV.|
|When a Deployment is deleted, Pods are deleted in no particular order.|When a StatefulSet is deleted, Pods are deleted in reverse ordinal order.|
Deploy an automatic storage provisioner
Before you can start working with StatefulSets, persistent volumes must be created in your Kubernetes cluster. They can be either statically or dynamically provisioned. Static persistent volumes are manually created by admins, whereas dynamic persistent volumes are automatically created based on a StorageClass. A StorageClass is a Kubernetes resource that allows admins to define different types of storage in a Kubernetes cluster. It helps in automatic storage provisioning by allowing users to request a specific type of storage without knowing the details of the underlying storage system.
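To make the StorageClass concept concrete, here is a minimal sketch of what a StorageClass definition might look like. The provisioner string, parameters, and name below are hypothetical placeholders; the actual values depend entirely on your storage backend.

```yaml
# Sketch of a StorageClass definition. The name and provisioner are
# placeholders; substitute the values required by your storage backend.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage                    # hypothetical name
provisioner: example.com/external-nfs   # depends on your provisioner
reclaimPolicy: Delete                   # delete the PV when the PVC is removed
volumeBindingMode: Immediate
```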
Depending on how your Kubernetes cluster is configured, you might have different StorageClasses available. StorageClasses depend on the underlying infrastructure of the Kubernetes cluster. They are often configured to use cloud-based storage solutions such as AWS EBS, Azure Disk, or GCP's Persistent Disk. When Kubernetes is deployed on a public cloud platform, default StorageClasses are often predefined to utilize the cloud platform's storage offerings.
However, since I am working with an on-prem cluster, no default StorageClass is configured. Fortunately, I have an NFS server running in my network, so I can deploy an NFS subdir external provisioner, which is an automatic provisioner that uses an existing NFS server for the dynamic provisioning of persistent volumes in the Kubernetes cluster. To install it with Helm, use the following commands:
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
  --set nfs.server=<your-nfs-server> \
  --set nfs.path=<your-nfs-share>
Don't forget to specify your own NFS server and NFS share path in the second command.
Once the NFS subdir external provisioner is installed in your cluster, you will get an nfs-client StorageClass for use with StatefulSets. Without an automatic provisioner in place, admins must create persistent volumes manually and ensure that enough persistent volumes are always available in the cluster. To check the persistent volumes and persistent volume claims in the Kubernetes cluster, run this command:
kubectl get pv,pvc
The screenshot shows that there are no persistent volumes or persistent volume claims yet. They will be dynamically provisioned when we create our first StatefulSet.
Deploy a StatefulSet
The YAML configuration below defines a StatefulSet:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: webapp
spec:
  replicas: 3
  serviceName: webapp-svc
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - name: app-port
          protocol: TCP
          containerPort: 80
        volumeMounts:
        - name: app-data
          mountPath: /usr/share/nginx/app_data
  volumeClaimTemplates:
  - metadata:
      name: app-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: nfs-client
      resources:
        requests:
          storage: 1Gi
You can see that the configuration is similar to a Deployment, except for the kind, which is set to StatefulSet. The serviceName field is used to specify the headless service. A headless service in Kubernetes is a Service without a cluster IP; instead of load-balancing traffic through a single Service IP, it gives each Pod a DNS name, allowing direct pod-to-pod communication and discovery.
Furthermore, a volumeClaimTemplates field is used instead of PersistentVolumeClaim. Each Pod will use its own persistent volume claim, which is dynamically provisioned in the NFS share.
In our example, a volume claim template named app-data is created. ReadWriteOnce means the volume can be mounted as read-write by a single node. The storageClassName is nfs-client, the StorageClass we installed earlier, and 1Gi requests one gibibyte of storage for each persistent volume. By the way, NFS volumes also support the ReadWriteMany access mode, which means they can be mounted as read-write by multiple nodes, and hence by multiple Pods simultaneously.
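If your workload needs simultaneous read-write access from several Pods, the volume claim template could request ReadWriteMany instead. The following is only a sketch; the template name shared-data is hypothetical, and it assumes the nfs-client StorageClass described in this article supports that access mode.

```yaml
# Sketch: a volumeClaimTemplates entry requesting ReadWriteMany so
# the volume can be mounted read-write by multiple nodes. Assumes
# the nfs-client StorageClass from this article.
volumeClaimTemplates:
- metadata:
    name: shared-data          # hypothetical template name
  spec:
    accessModes: ["ReadWriteMany"]
    storageClassName: nfs-client
    resources:
      requests:
        storage: 1Gi
```

Note that a StatefulSet still creates one PVC per Pod from this template; ReadWriteMany only relaxes the mount restriction on the underlying volume.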
Now, apply the YAML file and watch the Pods being created:
kubectl apply -f webapp.yaml
kubectl get pods -l app=webapp --watch
You can see in the screenshot that the Pods are created in sequential order. In addition, the Pod names are assigned based on the StatefulSet name and a zero-based ordinal index separated by a dash (-) character. In our example, the StatefulSet name is webapp, so Pods are named webapp-0, webapp-1, webapp-2, and so on.
If you repeat the kubectl get pv,pvc command once again, you will notice three persistent volume claims bound to separate persistent volumes that are dynamically provisioned by the external provisioner.
Notice that persistent volume claims are named based on the volumeClaimTemplates name and the Pod name (e.g., app-data-webapp-0, app-data-webapp-1, app-data-webapp-2, and so on). The volume column shows the persistent volume name, which is claimed (bound) by the respective persistent volume claim.
Verify sticky identity
As stated earlier, the identity of Pods managed by a StatefulSet persists across restarts and rescheduling. If a Pod is restarted or rescheduled (for any reason), the StatefulSet controller creates a new Pod with the same name. To demonstrate the sticky identity, let's delete a Pod.
kubectl delete pod webapp-0
kubectl get pods -l app=webapp
This behavior is particularly useful when you deploy a complex stateful application (e.g., a replicated instance of a MySQL database with a master–slave configuration). In this case, the master node of the MySQL cluster can be reliably predicted. Assuming the StatefulSet name is mysql, the first Pod will be named mysql-0, which can be configured as a master Pod. You can then point the slave Pods (replicas) to the stable name of the master Pod that doesn't change, even if it is rescheduled.
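The idea of the paragraph above can be sketched as a container environment variable that points the replicas at the master's stable DNS name. This assumes a StatefulSet named mysql with a headless service named mysql-svc in the default namespace; MASTER_HOST is a hypothetical variable name that your replica configuration would consume.

```yaml
# Sketch: pointing MySQL replicas at the master Pod's stable DNS name.
# mysql-svc and MASTER_HOST are assumptions for illustration only.
env:
- name: MASTER_HOST
  value: mysql-0.mysql-svc.default.svc.cluster.local
```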
View a StatefulSet
To view the StatefulSet, you can run the kubectl get statefulset and kubectl describe statefulset commands, as shown below:
kubectl get statefulsets -o wide
kubectl describe sts <sts-name>
The short name for StatefulSet is sts. The describe command shows the configuration of a StatefulSet in detail. You can also use the kubectl describe pod command to see the persistent volume claim and the mounted volume.
Remember, persistent volumes and persistent volume claims have a one-to-one relationship: a persistent volume claim can only bind to one persistent volume, and a persistent volume can only be bound by one persistent volume claim. In addition, persistent volumes are not associated with any namespace, whereas persistent volume claims are namespaced, so a Pod and the persistent volume claim it mounts must be in the same namespace to work together.
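To illustrate the one-to-one binding with static provisioning, here is a sketch of a manually created PV and the single PVC that would bind to it. The names are hypothetical, and the NFS server address and share path are placeholders you would need to fill in.

```yaml
# Sketch: a statically provisioned PV and the single PVC that binds
# to it. Names are hypothetical; server and path are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv             # PVs are cluster-scoped (no namespace)
spec:
  capacity:
    storage: 1Gi
  accessModes: ["ReadWriteOnce"]
  nfs:
    server: <your-nfs-server>
    path: <your-nfs-share>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: manual-pvc            # must be in the same namespace as the Pod
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ""        # empty string disables dynamic provisioning
  resources:
    requests:
      storage: 1Gi
```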
Create a headless service
You cannot use a standard configuration of a Kubernetes Service to access the Pods of a StatefulSet. This is because StatefulSet Pods are unique and non-interchangeable. With a regular service, the traffic is automatically load-balanced across different Pods, which works in the case of Deployments where Pods are interchangeable. Imagine that you are running a MySQL cluster with master–slave topology, where the first Pod is working as a master and others as slaves. In this case, you need something that communicates with individual Pods directly instead of randomly choosing a single Pod. A headless service is perfect for such a scenario. Let's create webapp-svc.yaml with this configuration:
apiVersion: v1
kind: Service
metadata:
  name: webapp-svc
spec:
  ports:
  - port: 80
    name: app-port
  clusterIP: None
  selector:
    app: webapp
Explicitly setting the value of the clusterIP field to None makes it a headless service. The spec.selector field matches the Pod labels to identify them as service endpoints. To learn more about Kubernetes services, read this post. Now, apply the webapp-svc.yaml file to create a headless service.
kubectl apply -f webapp-svc.yaml
kubectl get svc
To view the service endpoints, use the kubectl get endpoints webapp-svc command or kubectl describe svc webapp-svc command, as shown below:
kubectl get pods -l app=webapp -o wide
kubectl get endpoints webapp-svc
Here, you can see the IP address of all Pods under the endpoints column. To check whether your headless service is working as expected, connect to a Pod (e.g., webapp-0), and run the nslookup <headless-service-name> command. You should see the service identifiers and IP addresses of all endpoints (Pods), as shown in the screenshot below.
Remember, you cannot simply rely on the IP address of a Pod, since it changes after the restart. That is why a headless service uses the DNS name of Pods to communicate, which is a stable identifier in <pod-name>.<headless-service-name>.<namespace>.svc.cluster.local format. This also means that all Pods can ping each other using their DNS names.
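The DNS naming pattern described above is purely mechanical, so it can be sketched with a small shell loop that prints the FQDN of every Pod in a StatefulSet. The values below mirror the webapp example from this article and assume the default namespace; adjust them for your own cluster.

```shell
# Construct the stable DNS names for the Pods of a StatefulSet,
# following the <pod-name>.<headless-service>.<namespace>.svc.cluster.local
# pattern. Values mirror the webapp example in this article.
sts_name="webapp"
svc_name="webapp-svc"
namespace="default"
replicas=3

for i in $(seq 0 $((replicas - 1))); do
  echo "${sts_name}-${i}.${svc_name}.${namespace}.svc.cluster.local"
done
```

Running this prints webapp-0.webapp-svc.default.svc.cluster.local through webapp-2, which are exactly the names the Pods can use to reach each other.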
Scale a StatefulSet
StatefulSets can be scaled up and down easily, just like Kubernetes Deployments. To manually scale a StatefulSet, you can change the replicas count in the YAML manifest and then apply the updated manifest, or simply use the kubectl scale imperative command, as shown below:
kubectl scale statefulset <sts-name> --replicas=<count>
Previously, the webapp StatefulSet had three replicas, but after scaling it to five, you can see that two more Pods are created with the same naming scheme. If you scale it down, you will notice that the Pod with the highest ordinal index value is deleted first.
StatefulSets can also be scaled up and down automatically with the help of a HorizontalPodAutoscaler (HPA).
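As a sketch of what such an autoscaler could look like for the webapp StatefulSet from this article: the HPA name, CPU threshold, and replica bounds below are arbitrary example values, and the manifest assumes the metrics server is available in your cluster.

```yaml
# Sketch: an HPA targeting the webapp StatefulSet. The name,
# thresholds, and replica bounds are arbitrary example values.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa           # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: webapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```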
Delete a StatefulSet
To delete a StatefulSet, use the kubectl delete command with the --filename (or -f) option and supply the YAML manifest, or simply use the kubectl delete sts <sts-name> command.
kubectl delete -f webapp.yaml
Whereas Kubernetes Deployments are suitable for running stateless applications, Kubernetes StatefulSets are tailored for stateful applications. Understanding when and how to use StatefulSets is key to unlocking their full potential in Kubernetes.