
Kubernetes Auto Scaling
Overview
This tutorial shows how to autoscale runtime nodes of the Curity Identity Server, based on its built-in custom metrics and the features of the Kubernetes platform. The system runs on a development computer, and a GitHub repository provides helper resources to automate the setup and teardown:
Example Autoscaling Requirements
The autoscaling logic for this tutorial is summarized below. Scaling will be based on OAuth request time, though the approach demonstrated could be adapted to use alternative or additional metrics:
Desired State
When the Curity Identity Server's OAuth request time has averaged greater than 100 milliseconds for a warmup period, the system will automatically scale up. When the system returns to a level below 100 milliseconds for a cooldown period, the number of runtime nodes will automatically scale down.
Kubernetes Base System
To follow this article, either run the Kubernetes Demo Installation using minikube, or alternatively supply your own cluster. If using minikube, first edit create-cluster.sh and increase the cpus to 4 and the memory to 16384, since a number of extra monitoring components will be deployed:
minikube start --cpus=4 --memory=16384 --disk-size=50g --driver=hyperkit --profile curity
The minikube system initially contains 2 runtime nodes for the Curity Identity Server, and each of these provides Prometheus-compliant metrics:
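If you want to confirm the initial state, the runtime pods can be listed with a command along the following lines, where the role label is an assumption that matches the one used by the ServiceMonitor later in this tutorial:
kubectl get pods -l role=curity-idsvr-runtime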
Once the system is up, run the 1-generate-metrics.sh script, which sends 100 OAuth requests to authenticate via the Client Credentials grant and displays the access tokens returned:
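Each of these requests is a standard client credentials request to the token endpoint. A minimal equivalent with curl is sketched below, where the base URL, client ID and client secret are placeholders that depend on your own configuration:
curl -s -X POST http://localhost:8443/oauth/v2/oauth-token \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'grant_type=client_credentials&client_id=demo-client&client_secret=Password1'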
The script ends by using the kubectl tool to call the Curity Identity Server's metrics endpoints directly, to get the fields used in this tutorial:
- idsvr_http_server_request_time_sum
- idsvr_http_server_request_time_count
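A direct call of this kind can be made with kubectl exec, along the following lines; the pod name is a placeholder, port 4466 is the metrics port used later in this tutorial's scrape annotations, and curl is assumed to be available in the container:
kubectl exec <runtime-pod-name> -- curl -s http://localhost:4466/metrics | grep idsvr_http_server_request_time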
Components
The monitoring system will frequently call the Identity Server metrics endpoints and store the results as a time series. The Kubernetes Horizontal Pod Autoscaler will then be configured to receive an overall result, so that it is informed when the number of nodes needs to change:
In this tutorial, Prometheus will be used as the monitoring system, and this requires us to install both the Prometheus system and a Custom Metrics API, whose role is to transform values for the autoscaler:
The autoscaler will receive a single aggregated rate value that represents the average OAuth request time for all runtime nodes in the cluster. The autoscaler will then compare this value to a threshold beyond which autoscaling is needed.
Installing Prometheus
A fast installation of Prometheus components can be done by running the 2-install-prometheus.sh script from the helper resources, which deploys the Kube Prometheus System with default settings:
This deploys multiple components within a Kubernetes monitoring namespace:
The Prometheus Admin UI and the Grafana UI can both be exposed to the local computer via port forwarding commands:
kubectl -n monitoring port-forward svc/prometheus-k8s 9090
kubectl -n monitoring port-forward svc/grafana 3000
Next, run the 3-install-custom-metrics-api.sh script, which provides the endpoints from which the Horizontal Pod Autoscaler gets metrics:
Once complete, the Custom Metrics API components are available under a Kubernetes custom-metrics namespace:
kubectl get all -n custom-metrics
The following command can be run to verify that the Custom Metrics API is correctly communicating with Prometheus, and it should result in a large JSON response:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
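To reduce the output to just the available metric names, the response can be filtered with jq:
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq -r '.resources[].name'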
Integrating Identity Server Metrics
Next, run the 4-install-service-monitor.sh script, which creates a Kubernetes ServiceMonitor resource that tells the Prometheus system to start collecting data from the Curity Identity Server's metrics endpoints:
kind: ServiceMonitor
apiVersion: monitoring.coreos.com/v1
metadata:
  name: curity-idsvr-runtime
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: idsvr
      role: curity-idsvr-runtime
  endpoints:
  - port: metrics
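Once applied, the new resource can be confirmed with the command below, and after a short delay the runtime nodes should also appear under Status / Targets in the Prometheus UI:
kubectl get servicemonitor curity-idsvr-runtime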
Run the 1-generate-metrics.sh script again and then browse to the Prometheus UI at http://localhost:9090. You will then be able to view metrics, and those beginning with idsvr will have been supplied by the Curity Identity Server:
The raw metric values can then be queried by calling the Custom Metrics API directly:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/idsvr_http_server_request_time_sum" | jq
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/idsvr_http_server_request_time_count" | jq
As an alternative to using service monitors, Identity Server metrics can be reported to Prometheus via scrape annotations, for customers who prefer that option. The Curity Identity Server's Helm chart writes the following annotations to support this:
kind: Deployment
metadata:
  name: curity-idsvr-runtime
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/path: /metrics
        prometheus.io/port: "4466"
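If you want to check the annotations on a running system, they can be read back from the deployment using standard kubectl jsonpath syntax:
kubectl get deployment curity-idsvr-runtime -o jsonpath='{.spec.template.metadata.annotations}'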
Autoscaling Calculation
It is important to base autoscaling on an accurate calculation, and this tutorial uses the following value, which divides the total request time across all nodes for the last 5 minutes by the total number of requests from all nodes, to produce an average request time:
sum(rate(idsvr_http_server_request_time_sum[5m])) /
sum(rate(idsvr_http_server_request_time_count[5m]))
The expected result can be verified by entering the formula into the Prometheus UI:
When there are no requests, the query will return a Not a Number (NaN) or negative result. This is standard behavior, but if preferred the calculation can be updated to return zero instead:
(sum(rate(idsvr_http_server_request_time_sum[5m])) /
sum(rate(idsvr_http_server_request_time_count[5m])))
> 0 or on() vector(0)
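With the earlier port forward to the Prometheus service in place, the same calculation can also be checked from the command line via the Prometheus HTTP API:
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(idsvr_http_server_request_time_sum[5m])) / sum(rate(idsvr_http_server_request_time_count[5m]))' | jq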
Creating the Autoscaler
Run the 5-install-autoscaler.sh script, which first adds an external metric for the aggregated value and gives it a name of idsvr_http_server_request_rate. This is expressed in the Custom Metrics API's configmap via a PromQL query:
- seriesQuery: '{namespace!="",__name__!~"^container_.*"}'
  resources:
    overrides:
      namespace:
        resource: namespace
  name:
    matches: "^(.*)$"
    as: "idsvr_http_server_request_rate"
  metricsQuery: 'sum(rate(idsvr_http_server_request_time_sum[5m])) / sum(rate(idsvr_http_server_request_time_count[5m]))'
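Note that the adapter only reads its configmap at startup, so if you change this rule yourself the adapter must be restarted, along the following lines; the deployment name here is an assumption that may differ in your installation:
kubectl -n custom-metrics rollout restart deployment/custom-metrics-apiserver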
The script then creates the actual autoscaler Kubernetes resource, using the following YAML. The value expressed as a target is the 100ms threshold beyond which the system needs to start scaling up (the Kubernetes quantity 100m means 0.1, which equates to 100 milliseconds when the metric is measured in seconds):
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta2
metadata:
  name: curity-idsvr-runtime-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: curity-idsvr-runtime
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: External
    external:
      metric:
        name: idsvr_http_server_request_rate
      target:
        type: Value
        value: 100m
Once deployed, run the 1-generate-metrics.sh script again, then run the following external metrics query a few seconds later to get the aggregated value that is used for autoscaling:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/idsvr_http_server_request_rate" | jq
The current state of OAuth requests and the number of nodes can then be queried. This will look healthy initially, since the metric is well within its limits:
kubectl describe hpa/curity-idsvr-runtime-autoscaler
Testing Autoscaling
Next, use the Admin UI for the Curity Identity Server, navigate to Profiles / Token Service / Endpoints / oauth-token / Client Credentials Grant and select the New Procedure option, then paste in the following code to simulate some slow performance:
// Busy-wait for 200 milliseconds to simulate slow token issuance
var sleepMilliseconds = 200;
var startTime = Date.now();
var currentTime = null;
do {
    currentTime = Date.now();
} while (currentTime - startTime < sleepMilliseconds);
After committing the changes, run the 1-generate-metrics.sh script again, and responses will now be slower. Describe the autoscaler again and you will see that the system has determined that it needs to scale up:
The algorithm for selecting the number of new nodes is explained in the Kubernetes Documentation:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
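As a worked example under this tutorial's settings, if 2 replicas are running and the measured request time is 200ms against the desired 100ms, the autoscaler calculates ceil[2 * (200 / 100)] = 4, so it scales to the maxReplicas limit of 4 nodes.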
Next, undo and commit the above JavaScript changes, and notice that after a few minutes of performance being comfortably within limits, the number of nodes is gradually scaled down again.
Tuning Autoscaling
The Kubernetes system deals with autoscaling in a mature manner, with these qualities:
| Aspect | Description |
| --- | --- |
| Phased Scaling | Kubernetes supports warm up and cool down delays to prevent 'thrashing', so that a metric value change does not instantly trigger spinning up of new nodes. |
| Extensibility | This tutorial's autoscaler could easily be extended to use multiple metrics. The desired replica count algorithm then selects the highest value from all included metrics. |
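As an illustration of phased scaling, the cool down behavior of this tutorial's autoscaler could be tuned with a patch along the following lines, assuming your Kubernetes version supports the HPA behavior field (1.18 or later); the five minute stabilization window shown here is only an example value:
kubectl patch hpa curity-idsvr-runtime-autoscaler --type merge \
  -p '{"spec": {"behavior": {"scaleDown": {"stabilizationWindowSeconds": 300}}}}'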
Metrics can be visualized over time using the Grafana system that is deployed with Prometheus, by browsing to http://localhost:3000 and signing in as admin / admin. This enables a dashboard to be created with Prometheus as the data source, to show the value over time of the metric used for autoscaling.
It is important to combine autoscaling with other checks on the system, and the below options are commonly used:
- The Curity Grafana Dashboard can be imported to visualize the system state
- The Curity Alarm Subsystem can be used to raise alerts to people when a particular node performs poorly
- If the autoscaling metric frequently flips above and below its threshold, then the default number of instances or the use of metrics may need to be further refined
Conclusion
The Curity Identity Server implements the industry-standard solution for providing metrics about its runtime performance and use of resources. Results can then be supplied to monitoring systems, which aggregate them. One of the main use cases for metrics is to autoscale the system if it is ever struggling under load, and any modern cloud native platform supports this.