I’ve noticed that questions about scaling workloads in Kubernetes come up consistently and show no sign of going away. Many users struggle to scale their workloads effectively and aren’t sure what the optimal approach is. Most rely on the default Horizontal Pod Autoscaler (HPA), but its limitations often push teams toward custom scripts and tools.

I’d like to introduce you to KEDA (Kubernetes Event Driven Autoscaler), an excellent tool for scaling workloads in Kubernetes. KEDA is a lightweight component that works alongside HPA and significantly extends its capabilities.

How It Works

KEDA monitors your workloads and scales them based on external events. It essentially controls the HPA and determines when and how to scale your workloads. For example, you can easily scale deployments based on the number of messages in a queue, or KEDA can fetch data from your application and use those metrics for scaling. And while HPA only targets resources that expose the scale subresource, such as Deployments and StatefulSets, KEDA can even scale Jobs.
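
For example, a queue-based trigger might look like this (a sketch only; the orders queue and the RABBITMQ_URL environment variable are hypothetical):

triggers:
  - type: rabbitmq
    metadata:
      protocol: amqp
      queueName: orders # scale on the backlog of this queue
      mode: QueueLength # target a queue length rather than a message rate
      value: "20" # aim for roughly 20 messages per replica
      hostFromEnv: RABBITMQ_URL # connection string taken from the container's env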

How to Install

KEDA is available as a Helm chart and can be installed on any Kubernetes cluster:

  1. To deploy KEDA using Helm, first add the official KEDA Helm repository:
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
  2. Install KEDA by running:
helm install keda kedacore/keda --namespace keda --create-namespace
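
You can verify the installation by checking that the KEDA operator pods are running:

kubectl get pods -n keda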

That’s it. Now we are ready to scale our first workload.

See additional options in the official documentation.

How to Use

Deployments

Let’s create a simple workload and scale it with KEDA:

apiVersion: v1
kind: Namespace
metadata:
  name: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginxinc/nginx-unprivileged:1.26.3-alpine3.20
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 50m
              memory: 15M
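
Apply the manifest (the file name is just an example):

kubectl apply -f nginx.yaml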

Now, let’s create a ScaledObject:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nginx-hpa
  namespace: nginx
  labels:
    scaler.keda: "true"
spec:
  scaleTargetRef:
    name: nginx
    envSourceContainerName: nginx
  pollingInterval: 15
  cooldownPeriod: 30
  minReplicaCount: 2
  maxReplicaCount: 10
  fallback:
    failureThreshold: 3
    replicas: 6
  advanced:
    restoreToOriginalReplicaCount: false
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 30
          policies:
            - type: Pods
              value: 5
              periodSeconds: 30
  triggers:
    - type: cpu
      metricType: Utilization # Allowed types are 'Utilization' or 'AverageValue'
      metadata:
        value: "90"
        containerName: "nginx" # Optional. You can use this to target a specific container in a pod
    - type: metrics-api
      useCachedMetrics: true
      metadata:
        targetValue: "1"
        url: http://my-app:8080/scaler/
        valueLocation: "nginx"
    - type: cron
      metadata:
        # Required
        timezone: Europe/Kyiv # The acceptable values would be a value from the IANA Time Zone Database.
        start: 30 * * * * # Every hour on the 30th minute
        end: 45 * * * * # Every hour on the 45th minute
        desiredReplicas: "10"

In this example, we have three triggers. The first uses default CPU utilization metrics from metrics-server. It’s configured to add additional replicas when CPU utilization exceeds 90%. Moreover, we’ve specified the container name to target only the nginx container’s CPU usage, which is useful when you have multiple containers in your pod.

The second trigger uses custom metrics from our application. KEDA divides the value returned by the application by targetValue to compute the desired replica count. Since targetValue is 1, the replica count matches the returned value: if the application returns 10, KEDA scales the deployment to 10 replicas, capped at maxReplicaCount.
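
The metrics-api scaler expects a JSON response and reads the metric from it using the GJSON path in valueLocation. A hypothetical response from http://my-app:8080/scaler/ could look like this:

{
  "nginx": 10,
  "job": 2
}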

The third trigger uses a cron schedule. It’s configured to set desired replicas to 10 between minutes 30 and 45 of every hour.
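
Once the ScaledObject is applied, KEDA creates and manages an HPA for it, named keda-hpa-<scaledobject-name>, which you can inspect:

kubectl -n nginx get hpa keda-hpa-nginx-hpa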

Jobs

Let’s create a simple job and scale it with KEDA:

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: helloworld
  namespace: job
  labels:
    scaler.keda: "true"
spec:
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: helloworld
            image: busybox:latest
            command:
              - sleep
              - "30"
    backoffLimit: 1
    ttlSecondsAfterFinished: 60
  pollingInterval: 15
  minReplicaCount: 0
  maxReplicaCount: 1
  successfulJobsHistoryLimit: 0
  failedJobsHistoryLimit: 0
  triggers:
    - type: metrics-api
      metadata:
        targetValue: "1"
        url: http://my-app:8080/scaler/
        valueLocation: "job"

In this example, KEDA will create a job when the value returned by our application reaches the targetValue of 1. At most one job runs at a time regardless of how large the returned value is, because maxReplicaCount is 1. We can control the number of parallel jobs by changing the maxReplicaCount parameter.
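
You can watch KEDA create jobs as the metric changes:

kubectl -n job get jobs -w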

Pausing

KEDA allows you to pause scaling for a specific workload. You can pause scaling by adding the autoscaling.keda.sh/paused annotation to the ScaledObject or ScaledJob (and, for a ScaledObject, optionally autoscaling.keda.sh/paused-replicas to pin it to a fixed replica count).

For ScaledJob. Pause: add the annotation (annotation values must be strings, so quote them):

metadata:
  annotations:
    autoscaling.keda.sh/paused: "true"

Or (if we have label scaler.keda=true on our ScaledJob):

kubectl -n job annotate -l scaler.keda=true sj autoscaling.keda.sh/paused=true

Unpause: set paused to "false" or remove the annotation:

metadata:
  annotations:
    autoscaling.keda.sh/paused: "false"

Or via kubectl:

kubectl -n job annotate -l scaler.keda=true sj autoscaling.keda.sh/paused=false --overwrite
# OR
kubectl -n job annotate -l scaler.keda=true sj autoscaling.keda.sh/paused-

For ScaledObject. Pause: add the annotations:

metadata:
  annotations:
    autoscaling.keda.sh/paused-replicas: "0"
    autoscaling.keda.sh/paused: "true"

Or (if we have label scaler.keda=true on our ScaledObject):

kubectl -n nginx annotate -l scaler.keda=true so autoscaling.keda.sh/paused-replicas=0
kubectl -n nginx annotate -l scaler.keda=true so autoscaling.keda.sh/paused=true

Unpause: set paused to "false" and remove the paused-replicas annotation:

metadata:
  annotations:
    autoscaling.keda.sh/paused: "false"

Or via kubectl:

kubectl -n nginx annotate -l scaler.keda=true so autoscaling.keda.sh/paused=false --overwrite
kubectl -n nginx annotate -l scaler.keda=true so autoscaling.keda.sh/paused-replicas-
# OR remove both annotations
kubectl -n nginx annotate -l scaler.keda=true so autoscaling.keda.sh/paused- autoscaling.keda.sh/paused-replicas-
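
To confirm the current state, inspect the annotations on the ScaledObject:

kubectl -n nginx get so nginx-hpa -o jsonpath='{.metadata.annotations}'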

You can find more information about scaling deployments and jobs in the official documentation.

Scalers

There are more than 70 scalers available in KEDA. You can find the complete list in the official documentation. KEDA supports scaling based on databases, message queues, HTTP requests, cron schedules, and more, making it extremely powerful and flexible.
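
For example, a Prometheus-based trigger looks like this (a sketch; the server address and query are hypothetical):

triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090 # hypothetical Prometheus endpoint
      query: sum(rate(http_requests_total{app="nginx"}[2m])) # metric to scale on
      threshold: "100" # target value per replica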

Conclusion

KEDA is an excellent tool that helps with scaling workloads in Kubernetes based on different events. It can also be useful from a cost-saving perspective, as you can scale your workloads based on actual demand or even time of day. For example, you can scale to zero during the night and scale up during the day.

I’ve been using KEDA in production for more than two years, and it works remarkably well. I hope you will start using it too!