
Health and Auto Healing


Status Endpoint

The Curity Identity Server provides a status endpoint, which is enabled by default and runs on port 4465. Monitoring systems use it to check the health of the Curity Identity Server. The simplest way to check the health of runtime nodes is to make a GET or HEAD request to the base URL and check the HTTP status returned:

bash
curl -X GET http://runtimenode-instance1:4465

An HTTP status of 200 means the instance is healthy and serving requests, whereas a 503 indicates that the instance is unhealthy or not yet available. You need to avoid false readings, however, such as classifying an instance as unhealthy while it is still booting, or during an intermittent infrastructure issue that lasts only a few seconds.
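
One way to reduce false readings is for a monitoring script to allow a few retries before it treats an instance as failed. The following sketch reuses the example host name from above; the retry count, timeout and sleep interval are illustrative values, not recommendations:

bash
#!/usr/bin/env bash
# Illustrative health probe with retries, to avoid false readings
# during startup or short-lived infrastructure issues
STATUS_URL='http://runtimenode-instance1:4465'
MAX_ATTEMPTS=3

for ATTEMPT in $(seq 1 "$MAX_ATTEMPTS"); do
  # Only the HTTP status code is needed, so discard the response body
  HTTP_STATUS=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$STATUS_URL")
  if [ "$HTTP_STATUS" = '200' ]; then
    echo 'healthy'
    exit 0
  fi
  sleep 5
done

echo "unhealthy after $MAX_ATTEMPTS attempts"
exit 1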

Kubernetes Health Checks

To understand end-to-end usage of the status endpoint, consider how it is handled in a Kubernetes deployment. The Curity Identity Server is usually deployed to Kubernetes using the Helm chart, and a values file can be used to customize behavior. The following default values are used, and port 4465 only needs to be reachable inside the cluster:

yaml
curity:
  healthCheckPort: 4465
  adminUiPort: 6749
  adminUiHttp: false
  runtime:
    role: default
    service:
      type: ClusterIP
      port: 8443
    livenessProbe:
      timeoutSeconds: 1
      failureThreshold: 3
      periodSeconds: 10
      initialDelaySeconds: 30
    readinessProbe:
      timeoutSeconds: 1
      failureThreshold: 3
      successThreshold: 3
      periodSeconds: 10
      initialDelaySeconds: 30

The Helm chart then adds a standard httpGet action to each probe. Every 10 seconds, the Kubernetes platform calls this endpoint on each instance of the Curity Identity Server in the cluster. The same HTTP request is used for both liveness probes, which check whether the component is available, and readiness probes, which check whether it should continue to receive requests. An initialDelaySeconds value is also provided, to give the containers for the Curity Identity Server time to start:

yaml
spec:
  containers:
  - name: idsvr-runtime
    image: "custom_idsvr:7.3.1"
    ports:
    - name: http-port
      containerPort: 8443
      protocol: TCP
    - name: health-check
      containerPort: 4465
      protocol: TCP
    - name: metrics
      containerPort: 4466
      protocol: TCP
    livenessProbe:
      httpGet:
        path: /
        port: health-check
      timeoutSeconds: 1
      failureThreshold: 3
      periodSeconds: 10
      initialDelaySeconds: 120
    readinessProbe:
      httpGet:
        path: /
        port: health-check
      timeoutSeconds: 1
      failureThreshold: 3
      successThreshold: 3
      periodSeconds: 10
      initialDelaySeconds: 120

If a liveness probe fails three or more times consecutively for a particular container, that instance is marked as down, and the platform terminates the container and starts a new one, to maintain the desired state of the cluster. A failing readiness probe instead removes the instance from service until it recovers. Various settings can be adjusted to refine this behavior, and these are explained in the Kubernetes documentation on Configuring Probes.
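
As an illustration, the probe settings in the values file shown earlier can be overridden at deployment time. The chart reference and release name below are placeholders for whatever names your own deployment uses, and the overridden values are examples rather than recommendations:

bash
# Hypothetical override of probe settings at deployment time; the chart
# reference (curity/idsvr), release name and values are placeholders
helm upgrade --install curity curity/idsvr \
  --values idsvr-values.yaml \
  --set curity.runtime.livenessProbe.failureThreshold=5 \
  --set curity.runtime.readinessProbe.periodSeconds=15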

Other Platforms

The same concepts exist in other cloud native platforms. As an example, the Curity Identity Server could be deployed to AWS using EC2 instances. The status endpoints would then be used by Amazon EC2 Auto Scaling to implement equivalent auto-healing behavior.
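
As a sketch of how this might look, a load balancer target group could point its health check at the status endpoint, and the Auto Scaling group could then use the load balancer health check result to replace failed instances. The resource names, IDs and thresholds below are placeholders, not values from the Curity documentation:

bash
# Illustrative AWS CLI commands; the target group name, VPC ID and
# Auto Scaling group name are placeholders
aws elbv2 create-target-group \
  --name idsvr-runtime \
  --protocol HTTPS --port 8443 \
  --vpc-id vpc-0123456789abcdef0 \
  --health-check-protocol HTTP \
  --health-check-port 4465 \
  --health-check-path / \
  --healthy-threshold-count 3 \
  --unhealthy-threshold-count 3

# Use the load balancer health check result to decide when to replace instances
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name idsvr-runtime-asg \
  --health-check-type ELB \
  --health-check-grace-period 120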

Status Responses

The full response from the status endpoint includes a JSON payload with more detailed information. For a runtime node, the following fields are returned:

json
{
  "isReady": true,
  "nodeState": "RUNNING",
  "clusterState": "CONNECTED",
  "configurationState": "CONFIGURED",
  "transactionId": "67D-56A4D-85FBA",
  "isServing": true
}

For the admin node, similar information is returned:

json
{
  "isReady": true,
  "nodeState": "RUNNING",
  "clusterState": "ADMIN",
  "configurationState": "CONFIGURED",
  "transactionId": "67D-56A4D-85FBA",
  "isServing": false
}

In most cases the default health checks will be sufficient, but the extra detail allows warning-level information to be conveyed. An example might be a runtime node with a nodeState of WAITING: it is in a working state, but is in the process of connecting to the admin node to get a configuration update. If that state persists, you might decide that health checks should replace the instance. Full details about the response fields and their possible states are provided in the System Admin Guide.
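
As a sketch of how the extra detail could be consumed, the following script inspects the JSON payload and distinguishes a WAITING node from an unhealthy one. It assumes the jq tool is available, reuses the example host name from earlier, and the policy applied is purely illustrative:

bash
# Illustrative script; the host name and the policy for each state are examples
RESPONSE=$(curl -s http://runtimenode-instance1:4465)

IS_READY=$(echo "$RESPONSE" | jq -r '.isReady')
NODE_STATE=$(echo "$RESPONSE" | jq -r '.nodeState')

if [ "$IS_READY" != "true" ]; then
  echo 'Instance is not ready and should be replaced'
  exit 1
elif [ "$NODE_STATE" = "WAITING" ]; then
  # Warning level only: the node is working but awaiting configuration
  echo 'Instance is serving requests but waiting for the admin node'
fi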

Alarms

Although health checks may be passing for the Curity Identity Server, this does not guarantee that OAuth requests are working correctly for applications. A different type of failure occurs if a dependency of the Curity Identity Server, such as a data source, becomes unavailable, or its connection details are configured incorrectly.

In this case the Curity Identity Server itself is not failing, so its health checks will continue to indicate success. There is no point in the platform replacing instances of the Curity Identity Server, since that will not solve the problem. Instead, this scenario is handled by a different monitoring use case: alarms.

When a dependency fails, the Curity Identity Server raises events that other systems can subscribe to. You can then use one of the built-in notifiers, or implement a custom notifier, to achieve your preferred behavior. The Integrate Alarms with Monitoring Systems tutorial provides further details on how alarms can be managed.

Conclusion

The Curity Identity Server implements health checks in a standard way and can be integrated with any monitoring system. This enables you to use modern cloud native approaches to maintain the desired state of the cluster, or to raise alerts to people. In most cases this requires very little work and simply uses the platform's built-in features.
