Horizontal Pod Autoscaler Walkthrough

Install the Metrics Server by running the following command:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Installation output:

serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

Check the metrics-server Deployment in YAML output:

kubectl -n kube-system get deployment metrics-server -o yaml

We will patch the deployment to apply the following settings:

  • Add the --kubelet-insecure-tls argument to the container args, which skips verification of the kubelet's serving certificates.
  • Change the container port from 10250 to 4443.
  • Add hostNetwork: true.

kubectl patch deployment metrics-server -n kube-system --type='json' -p='[
  {
    "op": "add",
    "path": "/spec/template/spec/hostNetwork",
    "value": true
  },
  {
    "op": "replace",
    "path": "/spec/template/spec/containers/0/args",
    "value": [
      "--cert-dir=/tmp",
      "--secure-port=4443",
      "--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname",
      "--kubelet-use-node-status-port",
      "--metric-resolution=15s",
      "--kubelet-insecure-tls"
    ]
  },
  {
    "op": "replace",
    "path": "/spec/template/spec/containers/0/ports/0/containerPort",
    "value": 4443
  }
]'

Check the patch output:

deployment.apps/metrics-server patched
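
To confirm the patch took effect before moving on, you can print just the patched fields (a quick JSONPath check; the field paths match the patch above):

kubectl -n kube-system get deployment metrics-server \
  -o jsonpath='{.spec.template.spec.containers[0].args}{"\n"}{.spec.template.spec.containers[0].ports[0].containerPort}{"\n"}{.spec.template.spec.hostNetwork}{"\n"}'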

After a few seconds, the metrics-server Pod should be in the Running state:

kubectl -n kube-system get pods -l k8s-app=metrics-server
NAME                              READY   STATUS    RESTARTS   AGE
metrics-server-5478cc86f5-2h247   1/1     Running   0          2m25s

Check the metrics API status:

kubectl get apiservices -l k8s-app=metrics-server
NAME                     SERVICE                      AVAILABLE   AGE
v1beta1.metrics.k8s.io   kube-system/metrics-server   True        79m

Test that the Metrics Server is working by checking Kubernetes node utilization:

kubectl top nodes
NAME                    CPU(cores)   CPU(%)   MEMORY(bytes)   MEMORY(%)
desktop-control-plane   192m         0%       584Mi           3%
desktop-worker          26m          0%       130Mi           0%
desktop-worker2         34m          0%       159Mi           1%
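
Since kubectl top works, the Metrics API is being served. You can also query it directly if you like; the following returns the raw NodeMetrics list as JSON:

kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"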

Run and expose the php-apache server

kubectl apply -f https://k8s.io/examples/application/php-apache.yaml
deployment.apps/php-apache created
service/php-apache created
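
For reference, the applied manifest defines a Deployment running the registry.k8s.io/hpa-example image together with a matching Service; it looks roughly like this (the URL above is the authoritative version). Note the cpu request of 200m, which is what the 50% utilization target below is measured against.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache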

Create the HorizontalPodAutoscaler

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
Flag --cpu-percent has been deprecated, Use --cpu with percentage or resource quantity format (e.g., '70%' for utilization or '500m' for milliCPU).
horizontalpodautoscaler.autoscaling/php-apache autoscaled

You can check the current status of the newly created Horizontal Pod Autoscaler by running:

(You can use “hpa” or “horizontalpodautoscaler”; either name works.)

kubectl get hpa
NAME         REFERENCE               TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   cpu: 0%/50%   1         10        1          48s

If you see other HorizontalPodAutoscalers with different names, it means they already existed and this is usually not an issue.

Please note that the current CPU utilization is 0% because no clients are sending requests to the server. The TARGET column reflects the average CPU usage across all Pods managed by the corresponding deployment.
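
If you prefer a declarative setup over kubectl autoscale, an equivalent HorizontalPodAutoscaler manifest (a sketch using the autoscaling/v2 API) looks like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Saving it as, say, php-apache-hpa.yaml (the file name is just an example) and running kubectl apply -f php-apache-hpa.yaml creates the same HPA as the command above.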

Let’s increase the load

Next, observe how the autoscaler responds to increased load. To do this, you will start a separate Pod that acts as a client. The container inside this client Pod runs an infinite loop, continuously sending requests to the php-apache service.

Note: It’s best to run this in a separate terminal so that load generation continues while you carry on with the rest of the steps.

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
All commands and output from this session will be recorded in container logs, including credentials and sensitive information passed through the command prompt.
If you don't see a command prompt, try pressing enter.
OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK

Now run the following (press Ctrl+C to end the watch when you’re ready):

kubectl get hpa php-apache --watch
NAME         REFERENCE               TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   cpu: 0%/50%   1         10        1          113s

Within a minute or so, you should see the higher CPU load; for example:

NAME         REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   cpu: 129%/50%   1         10        5          18m

and then more replicas; for example:

NAME         REFERENCE               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   cpu: 129%/50%   1         10        5          18m
php-apache   Deployment/php-apache   cpu: 59%/50%    1         10        5          18m
php-apache   Deployment/php-apache   cpu: 66%/50%    1         10        5          18m
php-apache   Deployment/php-apache   cpu: 59%/50%    1         10        7          19m

Here, CPU consumption climbed well above the 50% target (reaching 129% of the request in this run), so the Deployment was eventually resized to 7 replicas.
You should also see the Deployment's replica count match the figure reported by the HorizontalPodAutoscaler:

kubectl get deployment php-apache
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   7/7     7            7           23m
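
The replica count follows the HPA's scaling formula, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). As a rough check against the watch output above (assuming the 66%/50% sample is the one the controller acted on):

desiredReplicas = ceil(5 * 66 / 50) = ceil(6.6) = 7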

Note:
It may take a few minutes for the number of replicas to stabilize. Because the load is not explicitly controlled, the final replica count may differ from the example shown.

Stop generating load
To complete the example, stop sending traffic to the service.

In the terminal where you started the Pod running the BusyBox image, stop the load generation by pressing Ctrl + C.

Then, after about a minute, verify the resulting state:

kubectl get hpa php-apache --watch
NAME         REFERENCE               TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   cpu: 0%/50%   1         10        1          45m

and the Deployment also shows that it has scaled down:

kubectl get deployment php-apache
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   1/1     1            1           50m

Once CPU utilization dropped to 0, the HPA automatically scaled the replica count back down to 1. Note that scaling down can take several minutes: the HPA waits out a downscale stabilization window (5 minutes by default) before removing replicas.
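
If you want scale-down to happen faster in a test cluster, the autoscaling/v2 API lets you shorten this window per HPA through the behavior field; for example, a sketch that could be merged into the spec of the HPA manifest shown earlier:

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60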

Clean Up Resources

Once you have finished the demo, delete the resources:

kubectl delete deployment php-apache
kubectl delete service php-apache
kubectl delete hpa php-apache
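
If you installed the Metrics Server only for this walkthrough, you can remove it with the same manifest you applied at the start:

kubectl delete -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml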
