Ensure the Metrics Server is running
To use the HorizontalPodAutoscaler, you need to deploy the Metrics Server in your Kubernetes cluster. The Metrics Server collects resource metrics from the kubelets running on nodes and exposes them through the Kubernetes API. To determine whether the Metrics Server is working as expected, run this command:
kubectl top nodes
If you see the resource metrics, as shown in the screenshot, the Metrics Server is up and running. If you see an error, make sure the Metrics Server is configured correctly.
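If the command fails with an error such as "Metrics API not available," the Metrics Server is probably not installed. One common way to install it is to apply the official manifest, as sketched below; note that clusters whose kubelets use self-signed certificates may also need the --kubelet-insecure-tls argument added to the metrics-server container.
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml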
Create a Deployment
For the demo, let's create a deployment with this YAML configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:
    matchLabels:
      app: webapp
  replicas: 1
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: harbor.testlab.local/library/webapp:latest
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: webapp-svc
spec:
  ports:
  - port: 80
  selector:
    app: webapp
This YAML file creates a Kubernetes Deployment and Service. You may also use your existing deployment. To learn more about resource requests and limits, check out this post.
Here, I used a custom container image that executes a PHP script that performs a CPU-intensive operation. You can use the registry.k8s.io/hpa-example image from the public Kubernetes registry, which does a similar thing. A custom container image is necessary because we need to simulate an application that consumes many CPU cycles when we put some load (traffic) on it. The idea of the HorizontalPodAutoscaler is to automatically scale a deployment that runs a containerized business application. So, if you have a containerized app handy, you can use that image here.
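If you prefer to build such an image yourself, here is a minimal sketch modeled on the public hpa-example image: a PHP script that performs a CPU-intensive computation on every request and prints DONE. The script and the php:8-apache base image are illustrative assumptions, not the exact contents of my custom image.
# Write a CPU-intensive PHP script, modeled on registry.k8s.io/hpa-example
cat > index.php <<'EOF'
<?php
// Burn CPU on every request, then confirm completion
$x = 0.0001;
for ($i = 0; $i <= 1000000; $i++) {
    $x += sqrt($x);
}
echo "DONE";
?>
EOF

# Package the script with Apache and PHP
cat > Dockerfile <<'EOF'
FROM php:8-apache
COPY index.php /var/www/html/index.php
EOF

# Build and tag the image for your registry (adjust the tag as needed)
docker build -t harbor.testlab.local/library/webapp:latest .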
Now, apply the YAML file to create the deployment and service.
kubectl apply -f webapp.yaml
The deployment and service are now ready.
kubectl get deploy webapp kubectl get svc webapp-svc
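If everything worked, the output looks similar to this (illustrative values):
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
webapp   1/1     1            1           5m

NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
webapp-svc   ClusterIP   10.96.120.15   <none>        80/TCP    5m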
Create a HorizontalPodAutoscaler
The next step is to create a HorizontalPodAutoscaler resource in the cluster. To do so, create a YAML file with this configuration:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Let's briefly discuss each parameter:
- apiVersion—Specifies the resource's group and the API version.
- kind—Specifies the kind of resource (HorizontalPodAutoscaler, in this case).
- metadata: name—Defines the resource name (webapp-hpa, in this case).
- scaleTargetRef—Specifies the reference to a Deployment (webapp) that will be scaled using this HorizontalPodAutoscaler.
- minReplicas—Specifies the minimum number of Pods that the HorizontalPodAutoscaler will maintain.
- maxReplicas—Specifies the maximum number of Pods that the HorizontalPodAutoscaler can create when resource utilization is high.
- metrics—An array that allows you to specify multiple metrics for scaling. Let's look at each field in more detail:
- type—We set this field to Resource because we are scaling on a Pod resource metric (CPU or memory), which is measured against the Pods' resource requests. Other possible values are Pods, Object, or External.
- resource—Specifies a resource metric source. It has the following subfields:
- name—Indicates the name of the resource to be measured, such as CPU or memory.
- target—Indicates the desired value of the resource metric. It has the following subfields:
- type—Specifies the type of the target value. Possible values are: Utilization, AverageValue, or Value.
- averageUtilization—Specifies the target value as a percentage of the requested resource value. For example, if the target type is Utilization and the averageUtilization is 50, then the Kube controller will try to maintain the average resource utilization across all Pods at 50% of the requested value.
- averageValue—Specifies the target value as a raw value per Pod. For example, if the target type is AverageValue and the averageValue is 500m, then the Kube controller will try to maintain the average resource value across all Pods at 500 millicores.
- value—Specifies the target value as a raw value for the whole resource. For example, if the target type is Value and the value is 1 GiB, then the Kube controller will try to maintain the total resource value across all Pods at 1 GiB.
In a nutshell, the entire metrics section in the YAML above defines a target average CPU utilization of 50% across all the Pods of a Kubernetes deployment. When the average CPU utilization goes above this threshold, the HorizontalPodAutoscaler will increase the number of Pods and continue to do so to keep the average CPU utilization under 50%. The maximum number of Pods it can create is limited by the maxReplicas field, which is currently set to 10.
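For comparison, a metrics entry targeting a raw per-Pod value instead of a percentage would use the AverageValue target type. This is an illustrative sketch and is not used in this demo:
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: AverageValue
      averageValue: 500m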
So far, we have used the declarative approach: defining the HorizontalPodAutoscaler in a YAML file. Alternatively, you can use the following imperative command to create it:
kubectl autoscale deployment webapp --cpu-percent=50 --min=1 --max=10
Since we are using a declarative approach, let's apply the YAML file with this command:
kubectl apply -f webapp-hpa.yaml
To view the HorizontalPodAutoscaler, run this command:
kubectl get hpa webapp-hpa
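The output looks similar to this (illustrative values):
NAME         REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
webapp-hpa   Deployment/webapp   0%/50%    1         10        1          30s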
This command shows the Name, Reference, Targets, MinPods, MaxPods, Replicas, and Age of the HorizontalPodAutoscaler resource. The Targets column indicates utilization in the <current>/<target> format, where the current value is the average resource utilization across the Pods, measured against their resource requests. For example, if the current CPU utilization of the Pods is 60% and the target is 50%, the Pods are running at 60/50 = 120% of the desired level, which means they are using more CPU than intended. The HorizontalPodAutoscaler will therefore scale up the number of replicas (Pods) to stabilize the deployment and bring the average CPU utilization back below the target, which is currently set to 50%.
Generate some load
Now that the HorizontalPodAutoscaler resource is ready, it is time to put some load on the application, which will cause CPU utilization to go up. To do so, run a temporary Pod with this command:
kubectl run -it load-generator --rm --image=busybox -- /bin/sh -c "while sleep 0.01; do wget -qO - http://webapp-svc; done"
This command creates a temporary Pod and runs an infinite while loop to send a wget request to the containerized application exposed by the service. The application performs some arithmetic computation and prints DONE when finished. When called in a loop, it causes CPU utilization to drastically rise. Essentially, we are simulating a high CPU utilization scenario on a business application so that the HorizontalPodAutoscaler can increase the Pods to meet demand.
I will now split my terminal window and open another SSH session to the control plane node. In the first window, I will run the kubectl get hpa webapp-hpa command with the --watch flag, and in the other window, I will run the load generator command.
kubectl get hpa webapp-hpa --watch

Watching the HorizontalPodAutoscaler autoscale a Kubernetes deployment up when the load is increased
The screenshot shows that the number of Pods increased from 1 (minReplicas) to 7 to bring the target CPU utilization under 50%. When the load-generator Pod is terminated (by pressing Ctrl + C), the number of replicas decreases because CPU utilization goes back to normal.

Watching the HorizontalPodAutoscaler autoscale a Kubernetes deployment down when the load is decreased
When the load is decreased, the HorizontalPodAutoscaler will wait a certain amount of time (known as the cooldown delay) before scaling the Pods down, which prevents the app from scaling up and down too frequently. The default cooldown delay is 5 minutes, but it can be adjusted by passing the --horizontal-pod-autoscaler-downscale-stabilization flag to the kube-controller-manager.
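With the autoscaling/v2 API, you can also tune this behavior per HorizontalPodAutoscaler rather than cluster-wide by adding a behavior section to the spec. A minimal sketch (the 300-second window shown here matches the default):
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300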
Conclusion
In this post, you learned that a HorizontalPodAutoscaler automatically adjusts the number of Pods in a Deployment or ReplicaSet based on observed metrics, such as CPU or memory utilization. This is an important Kubernetes feature because it ensures optimal resource utilization and application availability by automatically scaling Pods in response to real-time workloads.