A HorizontalPodAutoscaler (HPA) automatically scales a workload resource (such as a Deployment) by adjusting its number of replicas to match demand.

The HPA is a core component of Kubernetes that allows applications to respond to traffic spikes without manual intervention and to save resources during quiet periods by scaling back down. In this walkthrough, we will deploy a test application and then apply a load to watch the HPA automatically increase and decrease the number of running pods.

Prerequisites

This lab requires a running K3s cluster managed by Rancher, as configured in a previous lab.

Based on https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/

For CPU-based autoscaling to work, our pods must declare how much CPU they request. This allows the HPA to calculate utilization as a percentage of the requested amount. We will deploy a sample php-apache application and configure its resources using the Rancher UI.
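To make the utilization calculation concrete, here is a minimal sketch in shell arithmetic. The per-pod usage figures are made up for illustration, and the 200m request matches the value used later in this lab. It assumes every pod has the same request, so the average is total usage divided by total requested CPU.

```shell
# Hypothetical example values, not measurements from a real cluster.
request_m=200        # CPU request per pod, in millicores
total_usage=0
pod_count=0

# Assumed CPU usage of three pods, in millicores.
for usage_m in 120 90 150; do
  total_usage=$(( total_usage + usage_m ))
  pod_count=$(( pod_count + 1 ))
done

# Utilization = total usage as a percentage of total requested CPU.
utilization=$(( total_usage * 100 / (request_m * pod_count) ))
echo "Average CPU utilization: ${utilization}%"
```

With these numbers the pods use 360m of the 600m they request in total, so the script prints 60%.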

Create a Namespace

Create the Deployment

Set Resource Requests and Limits

Click Create

You now have a deployment with one pod and an internal service to access it.
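For readers who prefer manifests over the UI, the following sketch writes a roughly equivalent Deployment and Service to a file. Several details are assumptions rather than values taken from this lab: the image `registry.k8s.io/hpa-example` comes from the upstream Kubernetes walkthrough, the `app: php-apache` label and the 500m CPU limit are illustrative, and Rancher may generate different labels and fields.

```shell
# Write a manifest approximating what the Rancher UI steps create.
# Assumptions: image, labels, and the CPU limit are illustrative choices.
cat > php-apache.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
  namespace: hpa-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m   # the HPA computes utilization against this value
          limits:
            cpu: 500m   # assumed limit
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  namespace: hpa-demo
spec:
  selector:
    app: php-apache
  ports:
  - port: 80
EOF

# To use it, create the namespace first and then apply the file:
#   kubectl create namespace hpa-demo
#   kubectl apply -f php-apache.yaml
```

Use either the UI steps or the manifest, not both, since the two would create resources with the same names.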

Now, let's create the HPA resource that will watch our php-apache deployment.

Navigate to HPAs

Create the HPA

Configure the Metric

This configures the HPA to scale up when the average CPU usage across all pods exceeds 50% of their requested 200m (i.e., 100m).
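The replica count the HPA aims for follows the documented Kubernetes formula: desired replicas = ceil(current replicas x current utilization / target utilization). The sketch below approximates it with integer percentages and clamps the result to the minimum and maximum replica counts. The function name `desired_replicas` and the bounds of 1 and 10 are assumptions for illustration. The real controller also works with fractional ratios and skips changes within a small tolerance of the target, which this sketch leaves out.

```shell
# desired_replicas CURRENT UTILIZATION_PCT TARGET_PCT MIN MAX
# Hypothetical helper; the body runs in a subshell so its variables stay local.
desired_replicas() (
  current=$1; utilization=$2; target=$3; min=$4; max=$5

  # Integer ceiling of (current * utilization / target).
  desired=$(( (current * utilization + target - 1) / target ))

  # Clamp to the configured replica bounds.
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi

  echo "$desired"
)

desired_replicas 1 250 50 1 10   # one pod at 250% of its request
desired_replicas 7 0 50 1 10     # seven idle pods after the load stops
```

The first call prints 5, because one pod at 250% utilization needs five pods to bring the average down to the 50% target. The second prints 1, because the computed value of 0 is raised to the minimum.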

Configure the behavior

This makes the HPA react faster, so we wait less during testing.

Click Create.
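As a reference, an HPA like the one configured above can be sketched as an `autoscaling/v2` manifest. The 50% CPU target and the maximum of 10 replicas follow this lab. The minimum of 1 replica and the 60-second scale-down stabilization window are assumptions, since the exact behavior values entered in the UI may differ (the Kubernetes default for scale-down is 300 seconds).

```shell
# Write an HPA manifest approximating the UI configuration.
# Assumptions: minReplicas and stabilizationWindowSeconds are illustrative.
cat > php-apache-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: hpa-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # percent of the requested CPU
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60   # assumed; shorter than the 300s default
EOF

# To use it instead of the UI steps:
#   kubectl apply -f php-apache-hpa.yaml
#   kubectl get hpa php-apache -n hpa-demo
```

A shorter stabilization window means the HPA acts sooner on a lower recommendation, which is convenient in a lab but can cause replica counts to flap under bursty production traffic.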

Check the HPA Status

To trigger the autoscaler, we need to send traffic to our php-apache service. We'll run a simple busybox pod that requests the service in a continuous loop to generate this load.

Open the Kubectl Shell

Run the Load Generator

$ kubectl run -n hpa-demo -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

The load-generator pod is removed automatically (--rm) when you stop the command.

Watch the Deployment Scale

Task Progress Check

Take a screenshot that shows two things side-by-side:

  1. The HPA detail page showing the high CPU usage and the replica count scaled up (e.g., 7 of 10).
  2. The php-apache Pods page showing the corresponding number of running pods.

Upload the screenshot to complete the lab.

Now we'll stop the load and watch the HPA scale the application back down to its minimum size.

Stop the Load Generator

Observe the Scale-Down

To remove all the resources created in this lab, delete its namespace:

  1. Navigate to Cluster -> Projects/Namespaces.
  2. Find the hpa-demo namespace, click the three-dot menu on the right, and select Delete.
  3. Confirm the deletion. This will remove the namespace and everything inside it (Deployment, Service, HPA).