I’ve noticed that questions about scaling workloads in Kubernetes come up constantly. Many users struggle to scale their workloads effectively and aren’t sure of the best approach. Most rely on the default Horizontal Pod Autoscaler (HPA), but its limitations often push teams toward custom scripts and tools.
I’d like to introduce you to KEDA (Kubernetes Event Driven Autoscaler), an excellent tool for scaling workloads in Kubernetes. KEDA is a lightweight component that works alongside HPA and significantly extends its capabilities.
How It Works
KEDA monitors your workloads and scales them based on external events. Under the hood, it creates and manages an HPA for each workload and decides when and how to scale it. For example, you can easily scale deployments based on the number of messages in a queue, or KEDA can fetch data from your application and use those metrics for scaling. And while HPA only targets resources that expose the scale subresource, such as Deployments and StatefulSets, KEDA can even scale Jobs.
How to Install
KEDA is available as a Helm chart and can be installed on any Kubernetes cluster:
- To deploy KEDA using Helm, first add the official KEDA Helm repository:
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
- Install KEDA by running:
helm install keda kedacore/keda --namespace keda --create-namespace
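To verify the installation, check that the KEDA pods are up (you should see the operator, the metrics API server, and the admission webhooks; exact names may vary slightly between chart versions):
kubectl get pods -n keda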
That’s it. Now we are ready to scale our first workload.
See additional options in the official documentation.
How to Use
Deployments
Let’s create a simple workload and scale it with KEDA:
apiVersion: v1
kind: Namespace
metadata:
  name: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginxinc/nginx-unprivileged:1.26.3-alpine3.20
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 50m
              memory: 15M
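Assuming the manifest above is saved as nginx.yaml (the filename is arbitrary), apply it:
kubectl apply -f nginx.yaml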
Now, let’s create a ScaledObject:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nginx-hpa
  namespace: nginx
  labels:
    scaler.keda: "true"
spec:
  scaleTargetRef:
    name: nginx
    envSourceContainerName: nginx
  pollingInterval: 15
  cooldownPeriod: 30
  minReplicaCount: 2
  maxReplicaCount: 10
  fallback:
    failureThreshold: 3
    replicas: 6
  advanced:
    restoreToOriginalReplicaCount: false
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 30
          policies:
            - type: Pods
              value: 5
              periodSeconds: 30
  triggers:
    - type: cpu
      metricType: Utilization # Allowed types are 'Utilization' or 'AverageValue'
      metadata:
        value: "90"
        containerName: "nginx" # Optional. You can use this to target a specific container in a pod
    - type: metrics-api
      useCachedMetrics: true
      metadata:
        targetValue: "1"
        url: http://my-app:8080/scaler/
        valueLocation: "nginx"
    - type: cron
      metadata:
        # Required
        timezone: Europe/Kyiv # Accepts a value from the IANA Time Zone Database
        start: 30 * * * * # Every hour on the 30th minute
        end: 45 * * * * # Every hour on the 45th minute
        desiredReplicas: "10"
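Once the ScaledObject is applied, KEDA creates and manages an HPA behind the scenes. By default it is named keda-hpa-<ScaledObject name>, so here you should see something like keda-hpa-nginx-hpa:
kubectl -n nginx get hpa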
In this example, we have three triggers. The first uses standard CPU utilization metrics from metrics-server. It’s configured to add replicas when CPU utilization exceeds 90%. We’ve also specified the container name to target only the nginx container’s CPU usage, which is useful when a pod runs multiple containers.
The second trigger uses custom metrics from our application: KEDA polls the URL and reads the value at the valueLocation path in the JSON response. Because targetValue is 1, the desired replica count equals the returned value, so if the endpoint returns 10, KEDA scales to 10 replicas, but never beyond the maxReplicaCount value.
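For this trigger to work, the endpoint must return JSON containing the metric at the path given in valueLocation. A hypothetical response that would scale the deployment to 10 replicas:
{"nginx": 10}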
The third trigger uses a cron schedule. It’s configured to set desired replicas to 10 between minutes 30 and 45 of every hour.
Jobs
Let’s create a simple job and scale it with KEDA:
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: helloworld
  namespace: job
  labels:
    scaler.keda: "true"
spec:
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: helloworld
            image: busybox:latest
            command:
              - sleep
              - "30"
    backoffLimit: 1
    ttlSecondsAfterFinished: 60
  pollingInterval: 15
  minReplicaCount: 0
  maxReplicaCount: 1
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  triggers:
    - type: metrics-api
      metadata:
        targetValue: "1"
        url: http://my-app:8080/scaler/
        valueLocation: "job"
In this example, KEDA creates a job once the value returned by our application reaches the targetValue of 1. No matter how large the value is, at most one job runs at a time here; to allow more parallel jobs, raise the maxReplicaCount parameter.
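As before, the endpoint must return the metric at the valueLocation path. A hypothetical response that activates the trigger (still producing a single job, since maxReplicaCount is 1):
{"job": 3}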
Pausing
KEDA allows you to pause scaling for a specific workload. You do this by adding the autoscaling.keda.sh/paused annotation to the metadata of the ScaledObject or ScaledJob.
For ScaledJob:
Pause: add the annotation (note that annotation values must be strings, hence the quotes):
metadata:
  annotations:
    autoscaling.keda.sh/paused: "true"
Or, if we have the label scaler.keda=true on our ScaledJob:
kubectl -n job annotate -l scaler.keda=true sj autoscaling.keda.sh/paused=true
Unpause: set paused to "false" or remove the annotation:
metadata:
  annotations:
    autoscaling.keda.sh/paused: "false"
Or via kubectl:
kubectl -n job annotate -l scaler.keda=true sj autoscaling.keda.sh/paused=false --overwrite
# OR
kubectl -n job annotate -l scaler.keda=true sj autoscaling.keda.sh/paused-
For ScaledObject:
Pause: add the annotations:
metadata:
  annotations:
    autoscaling.keda.sh/paused-replicas: "0"
    autoscaling.keda.sh/paused: "true"
Or, if we have the label scaler.keda=true on our ScaledObject:
kubectl -n nginx annotate -l scaler.keda=true so autoscaling.keda.sh/paused-replicas=0
kubectl -n nginx annotate -l scaler.keda=true so autoscaling.keda.sh/paused=true
Unpause: set paused to "false" or remove the annotations:
metadata:
  annotations:
    autoscaling.keda.sh/paused: "false"
Or via kubectl:
kubectl -n nginx annotate -l scaler.keda=true so autoscaling.keda.sh/paused-replicas=1 --overwrite
kubectl -n nginx annotate -l scaler.keda=true so autoscaling.keda.sh/paused=false --overwrite
# OR
kubectl -n nginx annotate -l scaler.keda=true so autoscaling.keda.sh/paused-replicas-
kubectl -n nginx annotate -l scaler.keda=true so autoscaling.keda.sh/paused-
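To confirm the current state, inspect the ScaledObject; recent KEDA versions show a PAUSED column in the output:
kubectl -n nginx get scaledobject nginx-hpa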
You can find more information about scaling deployments and jobs in the official documentation.
Scalers
There are more than 70 scalers available in KEDA; you can find the complete list in the official documentation. KEDA supports scaling based on databases, message queues, HTTP requests, cron schedules, and more, which makes it extremely powerful and flexible.
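For example, a Prometheus trigger in a ScaledObject looks like this (the server address and query below are placeholders for your own setup):
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090
      query: sum(rate(http_requests_total{app="nginx"}[2m]))
      threshold: "100"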
Additional Resources
- KEDA samples - repository with examples of using KEDA
- KEDA documentation - official documentation
- KEDA GitHub - source code of KEDA
- KEDA workshop - all examples from this post
- My KEDA talk - my talk about KEDA in Ukrainian :)
Conclusion
KEDA is an excellent tool that helps with scaling workloads in Kubernetes based on different events. It can also be useful from a cost-saving perspective, as you can scale your workloads based on actual demand or even time of day. For example, you can scale to zero during the night and scale up during the day.
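As a rough sketch of that idea (the schedule and replica counts are just examples), combine minReplicaCount: 0 with a cron trigger so the workload runs only during working hours:
spec:
  minReplicaCount: 0
  triggers:
    - type: cron
      metadata:
        timezone: Europe/Kyiv
        start: 0 8 * * *  # scale up at 08:00
        end: 0 20 * * *   # scale back to zero at 20:00
        desiredReplicas: "2"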
I’ve been using KEDA in production for more than two years, and it works remarkably well. I hope you will start using it too!