In my previous post in this Kubernetes series, you learned how to install the Kubernetes Metrics Server. Today, I will introduce the HorizontalPodAutoscaler. Both features work together in the Kubernetes ecosystem to facilitate the automatic scaling of pods based on observed system or custom metrics. A HorizontalPodAutoscaler automatically scales the number of Pods in a workload resource (such as a Deployment or a StatefulSet) based on observed metrics, such as CPU or memory utilization. It helps to improve the performance and efficiency of the application by adjusting the number of Pods to match the current demand. The kube-controller-manager (a control plane component) is responsible for running the HorizontalPodAutoscaler in a Kubernetes cluster.

Ensure the Metrics Server is running

To use the HorizontalPodAutoscaler, you need to deploy the Metrics Server in your Kubernetes cluster. The Metrics Server collects resource metrics from the kubelets running on nodes and exposes them through the Kubernetes API. To determine whether the Metrics Server is working as expected, run this command:

kubectl top nodes
Making sure the Metrics Server is up and running in the Kubernetes cluster

If you see the resource metrics, as shown in the screenshot, the Metrics Server is up and running. If you see an error, make sure the Metrics Server is configured correctly.
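
If the command returns an error instead, a quick way to narrow down the problem is to check the metrics-server Deployment and its APIService registration. The commands below assume the Metrics Server was installed in the kube-system namespace, which is the default for the official manifest:

kubectl get deployment metrics-server -n kube-system
kubectl get apiservice v1beta1.metrics.k8s.io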

Create a Deployment

For the demo, let's create a deployment with this YAML configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:
    matchLabels:
      app: webapp
  replicas: 1
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: harbor.testlab.local/library/webapp:latest
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: webapp-svc
spec:
  ports:
  - port: 80
  selector:
    app: webapp
Create a YAML file for a Kubernetes Deployment and Service

This YAML file creates a Kubernetes Deployment and Service. You may also use your existing deployment. To learn more about resource requests and limits, check out this post.

Here, I used a custom container image that executes a PHP script that performs a CPU-intensive operation. You can use the registry.k8s.io/hpa-example image from the public Kubernetes registry, which does a similar thing. A custom container image is necessary because we need to simulate an application that consumes many CPU cycles when we put some load (traffic) on it. The idea of the HorizontalPodAutoscaler is to automatically scale a deployment that runs a containerized business application. So, if you have a containerized app handy, you can use that image here.
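
To give you an idea of what such an image contains, here is a minimal sketch along the lines of the hpa-example image. The file contents below are illustrative, not the actual source of my custom webapp image:

FROM php:apache
COPY index.php /var/www/html/index.php

The index.php it copies performs the expensive computation:

<?php
// Perform a CPU-intensive calculation on every request
$x = 0.0001;
for ($i = 0; $i <= 1000000; $i++) {
    $x += sqrt($x);
}
// Confirm completion to the caller
echo "DONE!";
?>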

Now, apply the YAML file to create the deployment and service.

kubectl apply -f webapp.yaml
Create a Kubernetes Deployment and Service

The deployment and service are now ready.

kubectl get deploy webapp
kubectl get svc webapp-svc
View the Kubernetes Deployment and Service

Create a HorizontalPodAutoscaler

The next step is to create a HorizontalPodAutoscaler resource in the cluster. To do so, create a YAML file with this configuration:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Create a YAML file for the HorizontalPodAutoscaler resource in Kubernetes

Let's briefly discuss each parameter:

  • apiVersion—Specifies the resource's group and the API version.
  • kind—Specifies the kind of resource (HorizontalPodAutoscaler, in this case).
  • metadata: name—Defines the resource name (webapp-hpa, in this case).
  • scaleTargetRef—Specifies the reference to a Deployment (webapp) that will be scaled using this HorizontalPodAutoscaler.
  • minReplicas—Specifies the minimum number of Pods that the HorizontalPodAutoscaler will maintain.
  • maxReplicas—Specifies the maximum number of Pods that the HorizontalPodAutoscaler can create when resource utilization is high.
  • metrics—An array that allows you to specify multiple metrics for scaling. Let's look at each field in more detail:
    • type—We set this field to Resource since we are using the Pod's resource requests and limits (CPU or memory) as metrics. Other possible values are Pods, Object, or External.
    • resource—Specifies a resource metric source. It has the following subfields:
      • name—Indicates the name of the resource to be measured, such as CPU or memory.
      • target—Indicates the desired value of the resource metric. It has the following subfields:
        • type—Specifies the type of the target value. Possible values are: Utilization, AverageValue, or Value.
        • averageUtilization—Specifies the target value as a percentage of the requested resource value. For example, if the target type is Utilization and the averageUtilization is 50, then the Kube controller will try to maintain the average resource utilization across all Pods at 50% of the requested value.
        • averageValue—Specifies the target value as a raw value per Pod. For example, if the target type is AverageValue and the averageValue is 500m, then the Kube controller will try to maintain the average resource value across all Pods at 500 millicores.
        • value—Specifies the target value as a raw value for the whole resource. For example, if the target type is Value and the value is 1 GiB, then the Kube controller will try to maintain the total resource value across all Pods at 1 GiB.

In a nutshell, the entire metrics section that you see in the screenshot above defines an average CPU utilization of 50% across all the Pods of a Kubernetes deployment. When the average CPU utilization goes above this threshold, the HorizontalPodAutoscaler will increase the number of Pods and continue to do so until the average CPU utilization falls back to 50% or below. The maximum number of Pods it can create is limited by the maxReplicas field, which is currently set to 10.
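
For comparison, if you wanted to scale on raw memory consumption instead of relative CPU utilization, the metrics section could hypothetically look like this (not used in this demo):

metrics:
- type: Resource
  resource:
    name: memory
    target:
      type: AverageValue
      averageValue: 500Mi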

This YAML file creates a HorizontalPodAutoscaler using the declarative approach. Alternatively, you can use the following imperative command to create a HorizontalPodAutoscaler:

kubectl autoscale deployment webapp --cpu-percent=50 --min=1 --max=10
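
Note that this imperative command names the HorizontalPodAutoscaler after the deployment (webapp). To match the webapp-hpa name used in the rest of this post, you should be able to add the --name flag:

kubectl autoscale deployment webapp --cpu-percent=50 --min=1 --max=10 --name=webapp-hpa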

Since we are using a declarative approach, let's apply the YAML file with this command:

kubectl apply -f webapp-hpa.yaml
Create the HorizontalPodAutoscaler in Kubernetes

To view the HorizontalPodAutoscaler, run this command:

kubectl get hpa webapp-hpa
View the HorizontalPodAutoscaler in Kubernetes

This command shows the Name, Reference, Targets, MinPods, MaxPods, Replicas, and Age of the HorizontalPodAutoscaler resource. The Targets column indicates the utilization in the <current>/<target> format, where the current value is the average resource utilization across the Pods and the target is the value specified in the HorizontalPodAutoscaler object. For example, if the current CPU utilization of the Pods is 60% and the target CPU utilization is 50%, the Pods are running at 60/50 = 120% of the target, which means they are using more CPU than the desired level. The HorizontalPodAutoscaler will therefore scale up the number of replicas (Pods) to stabilize the deployment and bring the CPU utilization below the desired level, which is currently set to 50%.
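
Behind the scenes, the autoscaler derives the desired replica count from this ratio. In simplified form (ignoring tolerances and Pod readiness), the calculation is:

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)

So if the deployment currently runs 2 replicas at 60% average CPU utilization against a 50% target, the autoscaler computes ceil(2 * 60 / 50) = ceil(2.4) = 3 and scales the deployment to 3 replicas.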

Generate some load

Now that the HorizontalPodAutoscaler resource is ready, it is time to put some load on the application, which will cause CPU utilization to go up. To do so, run a temporary Pod with this command:

kubectl run -it load-generator --rm --image=busybox -- /bin/sh -c "while sleep 0.01; do wget -qO - http://webapp-svc; done"

This command creates a temporary Pod and runs an infinite while loop to send a wget request to the containerized application exposed by the service. The application performs some arithmetic computation and prints DONE when finished. When called in a loop, it causes CPU utilization to drastically rise. Essentially, we are simulating a high CPU utilization scenario on a business application so that the HorizontalPodAutoscaler can increase the Pods to meet demand.
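
In addition to watching the HorizontalPodAutoscaler itself (as shown next), you can watch the new Pods appear as the deployment scales:

kubectl get pods -l app=webapp --watch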

I will now split my terminal window and open another SSH session to the control plane node. In the first window, I will run the kubectl get hpa webapp-hpa command with the --watch flag, and in the other window, I will run the load generator command.

kubectl get hpa webapp-hpa --watch
Watching the HorizontalPodAutoscaler autoscale a Kubernetes deployment up when the load is increased

The screenshot shows that the number of Pods increased from 1 (minReplicas) to 7 to bring the target CPU utilization under 50%. When the load-generator Pod is terminated (by pressing Ctrl + C), the number of replicas decreases because CPU utilization goes back to normal.
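
To see exactly when and why the autoscaler changed the replica count, you can also inspect its events:

kubectl describe hpa webapp-hpa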

Watching the HorizontalPodAutoscaler autoscale a Kubernetes deployment down when the load is decreased

When the load is decreased, the HorizontalPodAutoscaler will wait a certain amount of time (known as the cooldown delay) before scaling the Pods down, which prevents the app from scaling up and down too frequently. The default cooldown delay is 5 minutes, but it can be adjusted by passing the --horizontal-pod-autoscaler-downscale-stabilization flag to the kube-controller-manager.
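
If you don't have access to the kube-controller-manager flags, the autoscaling/v2 API also supports per-resource tuning through the optional behavior field. For example, adding the following to the spec of our HorizontalPodAutoscaler would stretch the scale-down stabilization window to 10 minutes:

behavior:
  scaleDown:
    stabilizationWindowSeconds: 600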

Conclusion

In this post, you learned that a HorizontalPodAutoscaler automatically adjusts the number of Pods in a workload resource, such as a Deployment or ReplicaSet, based on observed metrics, such as CPU or memory utilization. This is an important Kubernetes feature because it ensures optimal resource utilization and application availability by automatically scaling Pods to match the real-time workload.
