Monitoring¶
This section of the admin guide describes information related to monitoring the Curity Identity Server.
Tip
🔥🔥🔥 If you just want to know how to determine if your instance of Curity is unhealthy and on fire, refer to the information below. 🔥🔥🔥
JMX¶
Java Management Extensions (JMX) is a commonly used interface for monitoring the internals of a Java-based application like the Curity Identity Server. This ability to peer inside the application, however, can be dangerous. It is for this reason that JMX is disabled by default. To enable it, set the ENABLE_JMX environment variable before starting the Curity Identity Server; the value itself is ignored and can be any non-empty value (e.g., true, 1, etc.). This can be done on the command line like this, for instance:
$ ENABLE_JMX=1 idsvr
With JMX enabled, the following can be monitored and, in some cases, changed:
- LDAP connection pools
- JDBC (database) connection pools
- Web server information (thread pools, object pools, ciphers, etc.)
- JVM settings (e.g., memory, CPU usage, etc.)
- Logging settings (including log levels per logger)
Note that serialization must be enabled for the javax.management.* classes in order for JMX to function properly. This should be handled automatically in typical scenarios.
Tracing¶
To make tracing easier, Curity can add an HTTP response header containing the node’s service-id, which is a unique string based on the service-name (see Setup Nodes for more information), to every HTTP response.
To enable this header, set the system property se.curity:identity-server:http:service-id-header when starting Curity. The value of this property should be the name of the header that will carry the service-id.
$ JAVA_OPTS="-Dse.curity:identity-server:http:service-id-header=X-Service-Id" idsvr
When that’s done, every Curity response will contain a header like X-Service-Id: the-service-id, making it easier to trace requests to particular Curity nodes.
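As a minimal sketch of how a client or test script might read this header, the following Python snippet looks up the service-id in a set of response headers. The header name X-Service-Id is taken from the example above and must match whatever name was passed in the system property; the function names are hypothetical and chosen only for illustration.

```python
from urllib.request import urlopen

# Assumption: must match the header name set via the
# se.curity:identity-server:http:service-id-header system property.
SERVICE_ID_HEADER = "X-Service-Id"


def service_id_from_headers(headers, header_name=SERVICE_ID_HEADER):
    """Return the service-id header value, or None if the header is absent.

    HTTP header names are case-insensitive, so the lookup ignores case.
    """
    wanted = header_name.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    return None


def fetch_service_id(url, header_name=SERVICE_ID_HEADER):
    """Issue a GET request to url and report which node answered it."""
    with urlopen(url) as response:
        return service_id_from_headers(dict(response.headers.items()), header_name)
```

Logging the returned value next to each request in a load test, for instance, shows how requests are spread across nodes.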
Note
Refer to OpenTelemetry for more comprehensive support for tracing.
Java Flight Recorder¶
The Curity Identity Server ships with support for Java Flight Recorder and Java Mission Control. These are branded, fully-tested builds of the open-source JDK Flight Recorder and JDK Mission Control which Oracle released in 2018. Coupled with the instrumentation and data collection performed by Flight Recorder inside of Curity, Java Mission Control provides monitoring and management possibilities with very little impact on Curity’s performance.
Java Mission Control can be downloaded from Adoptium’s Web site.[1] Once it is installed and started, you can use it to attach to an instance of Curity if JMX is enabled (as described above). This will allow you to monitor, in real-time, such things as CPU, memory, threads, garbage collections, and much, much more.
You can also record the performance for later analysis. This can be very helpful in difficult support cases, for instance. This can be done in Java Mission Control or by using the jcmd command that is shipped with Curity. Using either is a two-step process:
- Start the recording
- Stop the recording and save the results to a file
Starting a Recording Manually¶
Starting a recording on, or connecting to, a remote instance of Curity using Java Mission Control requires additional parameters to be provided when starting that instance. Refer to the Monitoring and Management section of the Java SE documentation for details of what these parameters are. As an example, when running Curity in a local Docker container (which effectively makes it remote), it is possible to connect to it from Java Mission Control if it is started with additional parameters passed using the JAVA_OPTS environment variable like this:
$ docker run -it \
-e ENABLE_JMX=1 \
-e JAVA_OPTS="-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=7091
-Dcom.sun.management.jmxremote.rmi.port=7091
-Djava.rmi.server.hostname=localhost
-Dcom.sun.management.jmxremote.local.only=false" \
-e PASSWORD=$PASSWORD \
-p 6749:6749 \
-p 7091:7091 \
-p 8443:8443 \
curity/idsvr:latest
Warning
The example in Listing 82 is provided only for demonstration, and more secure options should be used in production.
With remote access enabled, a connection can be made using Java Mission Control by selecting the Connect… menu option in the File menu, by selecting New Connection from the context menu of the JVM Browser, or by clicking the New Connect button in the toolbar shown in Fig. 28:

Fig. 28 Creating a new connection to a remote instance of Curity
Whichever method is used, the following dialogue page will be shown:

Fig. 29 JMX connection modal in Java Mission Control
Clicking the Test connection button should briefly flicker a status dialogue box, after which the Status field should change to OK. If so, click Finish. Otherwise, tweak the settings used when starting the instance of Curity and refer to the Java Monitoring and Management documentation for additional guidance and troubleshooting tips. Barring any connectivity issues, a new JVM should be shown in the JVM Browser treeview:

Fig. 30 JVM Browser showing new connection to Curity instance
If the MBean Server child node of this new connection is clicked, a dashboard is shown, like that of Fig. 31:

Fig. 31 Dashboard showing memory, JVM CPU, remaining heap memory, and other aspects of the running instance of Curity
To start a recording in Java Mission Control, right-click the Flight Recorder item in the JVM Browser treeview under the established connection, as shown in Fig. 30, and then select Start Flight Recording…. The following dialogue will be shown:

Fig. 32 Starting a recording of the performance of an instance of Curity using Java Mission Control
Select Finish to accept the defaults or make adjustments on the current or subsequent pages as required.
Starting a Recording from the Command Line¶
When shell access to the machine running Curity is available, an alternative way to start a recording is to use jcmd. With this tool, you identify the instance of the Curity Identity Server to be profiled either by its process ID or by the package name of its bootstrapper. Both alternatives work, but only the latter is shown in the following listings:
$ jcmd se.curity.identityserver.app.Bootstrapper JFR.start
Note
The jcmd command is included in various versions of the JDK, but is not directly provided by Curity. The various options and subcommands supported by this tool can be found in the jcmd documentation.
After the JFR.start subcommand is run, it will print the command necessary to dump a snapshot of analysis data. It will be something like this:
$ jcmd 59896 JFR.dump name=1 filename=FILEPATH # where 59896 is the process ID of Curity
The state of the recording can be checked using the JFR.check subcommand with either the process ID or the bootstrapper package name like this:
$ jcmd se.curity.identityserver.app.Bootstrapper JFR.check
If recording is not currently underway, this command will provide the instructions on how to start one.
When the dump subcommand is run, the recording will be captured in the specified file. At that point, the file can be opened in Java Mission Control or other tools that support the JDK Flight Recorder format.
Dumping the recording to a file does not stop the recording. To do that, use the JFR.stop subcommand. Like JFR.dump, it accepts name and filename parameters, except that it stops the recording and the filename parameter is optional. If the filename is not provided, no dump is made when the recording stops. An example of stopping a recording named 1 is shown in the following listing:
$ jcmd se.curity.identityserver.app.Bootstrapper JFR.stop name=1 filename=/tmp/recording_1.jfr
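The start, check, dump, and stop steps above could be scripted, for example when collecting recordings from several nodes. The following Python sketch only assembles and runs the same jcmd invocations shown in the listings; the helper names jfr_command and run_jfr are hypothetical, and it assumes jcmd is on the PATH of the user running the script.

```python
import subprocess

# Bootstrapper name used in the listings above to identify the Curity process.
BOOTSTRAPPER = "se.curity.identityserver.app.Bootstrapper"

# JFR subcommands mentioned in this section.
JFR_SUBCOMMANDS = {"JFR.start", "JFR.check", "JFR.dump", "JFR.stop"}


def jfr_command(subcommand, target=BOOTSTRAPPER, **options):
    """Build a jcmd argument list, e.g. ['jcmd', target, 'JFR.stop', 'name=1'].

    target may be the bootstrapper name or a process ID.
    """
    if subcommand not in JFR_SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand}")
    args = ["jcmd", str(target), subcommand]
    # Options such as name=1 or filename=/tmp/x.jfr are passed as key=value.
    args.extend(f"{key}={value}" for key, value in options.items())
    return args


def run_jfr(subcommand, target=BOOTSTRAPPER, **options):
    """Run jcmd and return what it printed; raises if jcmd exits non-zero."""
    result = subprocess.run(
        jfr_command(subcommand, target, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling run_jfr("JFR.stop", name=1, filename="/tmp/recording_1.jfr") would then be equivalent to the last listing.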
Starting a Recording on Startup¶
It is also possible to start a recording by providing certain command line options to the Curity Identity Server. This can be done in various ways, for example by configuring the JVM options. Typically, though, a better way is to set the JAVA_OPTS environment variable to include the parameters necessary to start the recording when the Curity Identity Server starts. Either way, the parameters will be something like these:
-XX:StartFlightRecording=filename=my-good-file.jfr,duration=10m
For information about available flags that can be passed when starting a recording, refer to the flight recorder command reference.
Whether it is produced using Java Mission Control, command line options provided to the Curity Identity Server, or jcmd, the resulting file can be analyzed and potentially shared with support. This will give a lot of insight into the source of potential issues. For more information on Flight Recorder and Mission Control, refer to the following sources:
- Oracle JMC and Flight Recorder Demo Series
- Java Mission Control for Earthlings presentation
Status Endpoint¶
Curity Identity Server contains an HTTP endpoint providing node status information. Its operation is configured by the following environment variables.
Environment variable | Description | Default value |
---|---|---|
STATUS_CMD_ENABLED | Endpoint enable state | true |
STATUS_CMD_PORT | HTTP port to bind to | 4465 |
STATUS_CMD_HOST | Network host or address to listen on | 0.0.0.0 |
STATUS_CMD_MAX_THREADS | Maximum number of threads | 16 |
By default, this status endpoint is enabled; however, it can be disabled by setting the STATUS_CMD_ENABLED environment variable to false or by starting idsvr with the --no-status parameter.
The status endpoint supports HTTP GET and HEAD requests to the / path.
The response will have status code:
- 200 if the node is started and configured. Note that a 200 status code means the node is configured but doesn’t ensure the server is listening for functional requests. For instance, the node may have a service role that is disabled or the internal HTTP server may still be starting/reconfiguring. To know if a node is ready to serve functional requests, use the isServing JSON field or the /serving request path (see below).
- 503 if the node is not started or not yet configured.
In both cases, the response body will contain a JSON representation of the node status, containing the following fields:
isReady
- false: the node is not ready to process requests (e.g. it is still booting or is shutting down).
- true: the node is ready to receive and process requests (but may be disconnected from the admin).
nodeState
- BOOTING: the node is starting up and not ready to process requests.
- WAITING: the node is ready to process requests with the latest configuration that it has; however, it is still waiting to connect to the admin, so configuration may be stale.
- RUNNING: the node is ready to process requests and is connected to the admin node.
- ERROR: the node is in an unrecoverable error state.
- STOPPING: the node is shutting down and not able to process requests.
clusterState
- STANDALONE: the node has clustering disabled.
- CONNECTING_TO_CLUSTER: the node is not an admin node and is trying to connect (for the first time) or reconnect to the admin node.
- CONNECTED: the node is not admin and is connected to the admin node.
- ADMIN: the node is an admin node.
- ERROR: an unexpected error occurred when checking the cluster state.
configurationState
- UNINITIALIZED: the node is not configured and therefore unable to correctly process requests.
- CONFIGURED: the node is fully configured.
- RECONFIGURING: the node is currently consuming a new configuration. A previous configuration is still valid and will be used in the meantime for any request processing.
transactionId
- an opaque string identifier for the last committed transaction seen by the current node.
isServing
- false: the node’s HTTP server that serves the runtime endpoints is not running.
- true: the node’s HTTP server that serves the runtime endpoints is running.
isAdminServing (this field is returned only on admin nodes)
- false: the node’s HTTP server that serves the admin endpoints (i.e. an endpoint serving Admin UI requests) is not running.
- true: the node’s HTTP server that serves the admin endpoints is running.
pluginsInitialized
- true: all the plugins installed in the node are done initializing.
- false: some plugins installed in the node are still initializing.
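To illustrate how a monitoring script might combine these fields, the Python sketch below fetches the status document and reduces it to a coarse verdict. The verdict labels ("healthy", "ready-stale-config", and so on) and the order of the checks are assumptions made for this example, not part of the product; the URL assumes the default port and a probe running on the same host as the node.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen

# Assumption: default status port, probed from the node's own host.
STATUS_URL = "http://localhost:4465/"


def summarize_status(status):
    """Reduce the status JSON (as a dict) to a coarse, hypothetical verdict."""
    if status.get("nodeState") == "ERROR" or status.get("clusterState") == "ERROR":
        return "error"
    if not status.get("isReady") or not status.get("isServing"):
        return "not-ready"
    if status.get("nodeState") == "WAITING":
        # Ready, but not yet connected to the admin: configuration may be stale.
        return "ready-stale-config"
    # Assumption: a missing pluginsInitialized field is treated as initialized.
    if not status.get("pluginsInitialized", True):
        return "plugins-initializing"
    return "healthy"


def fetch_status(url=STATUS_URL):
    """Fetch the status document; the body is JSON for both 200 and 503."""
    try:
        with urlopen(url, timeout=5) as response:
            return json.loads(response.read())
    except HTTPError as error:
        if error.code == 503:
            return json.loads(error.read())
        raise
```

A caller would typically run summarize_status(fetch_status()) on a schedule and alert on anything other than "healthy".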
The status endpoint also exposes the /serving and /admin-serving paths, which report the isServing and isAdminServing information, respectively, via the status code. This is useful if the probing system is unable to process JSON representations.
These paths accept both GET and HEAD requests.
The response for the /serving path will have an empty body and status code:
- 200: the node’s HTTP server that serves the runtime endpoints is running.
- 503: the node’s HTTP server that serves the runtime endpoints is not running.
The response for the /admin-serving path will have an empty body and status code:
- 200: the node’s HTTP server that serves the admin endpoints is running.
- 503: the node’s HTTP server that serves the admin endpoints is not running.
- 404: the node is not admin.
Note that the /admin-serving path is only served on admin nodes. On runtime nodes, a request to this path will return 404.
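For probes that only look at status codes, the mapping described above can be sketched in Python as follows. The function names and the True/False/None return convention are hypothetical, and the host and port are the defaults from the table of environment variables; adjust them if the endpoint is configured differently.

```python
import http.client


def interpret_probe(path, status_code):
    """Map a probe status code to True (running), False (not running),
    or None (the node is not an admin node, /admin-serving only)."""
    if status_code == 200:
        return True
    if status_code == 503:
        return False
    if status_code == 404 and path == "/admin-serving":
        return None
    raise ValueError(f"unexpected status {status_code} for {path}")


def probe(path, host="localhost", port=4465):
    """Send a HEAD request to /serving or /admin-serving and interpret it."""
    connection = http.client.HTTPConnection(host, port, timeout=5)
    try:
        connection.request("HEAD", path)
        return interpret_probe(path, connection.getresponse().status)
    finally:
        connection.close()
```

HEAD is used because the responses carry no body, so the status code is the only information needed.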
Command line tool¶
The Curity Identity Server installation also contains the bin/status command line tool that can be used to probe the HTTP status endpoint.
It uses the same environment variables the server uses and accepts the following invocation parameters:
- -j or --json: if present, the response written to the standard output is in the JSON format; otherwise it is written in plain text.
- -h or --help: prints the synopsis of the tool.
- -v: not used but maintained for backward compatibility reasons.
The status tool performs a request to the local node status endpoint and writes the response body to the standard output.
The tool exit code is described in the following table.
Exit code | Description |
---|---|
0 | The probed node is ready. |
1 | The status endpoint is disabled and was not probed. |
4 | There was an IO error while communicating with the status endpoint. |
103 | A response with a 3xx status was received from the status endpoint. |
104 | A response with a 4xx status was received from the status endpoint. |
105 | A response with a 5xx status was received from the status endpoint. |
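A wrapper script, such as a container health check, might translate these exit codes into log messages. The Python sketch below is one way to do that; the helper names are hypothetical, and the relative path bin/status assumes the script is run from the installation directory.

```python
import subprocess

# Exit codes and descriptions copied from the table above.
EXIT_CODE_MEANINGS = {
    0: "The probed node is ready.",
    1: "The status endpoint is disabled and was not probed.",
    4: "There was an IO error while communicating with the status endpoint.",
    103: "A response with a 3xx status was received from the status endpoint.",
    104: "A response with a 4xx status was received from the status endpoint.",
    105: "A response with a 5xx status was received from the status endpoint.",
}


def describe_exit_code(code):
    """Return the documented meaning of a bin/status exit code."""
    return EXIT_CODE_MEANINGS.get(code, f"undocumented exit code {code}")


def run_status_tool(path="bin/status"):
    """Run the status tool with --json; return (exit code, meaning, output)."""
    completed = subprocess.run([path, "--json"], capture_output=True, text=True)
    return completed.returncode, describe_exit_code(completed.returncode), completed.stdout
```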
Prometheus-compliant Metrics¶
Each run-time and admin node exposes an endpoint where certain information is published in a Prometheus-compliant format (i.e., Prometheus’ OpenMetrics format). This allows the Prometheus monitoring tool (or others that can process data in this format) to monitor certain metrics about the behavior of the node. This endpoint is exposed over HTTP and listens on the same interface as the status endpoint described above. The port used is one greater than that of the status endpoint (4466 by default). The URI is /metrics, so, for example, the URL of the data would be https://localhost:4466/metrics.
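For ad hoc checks without a Prometheus server, the text returned by this endpoint can be parsed directly. The following Python sketch handles the common sample form, a metric name with optional labels followed by a value; it is a simplified reader written for illustration (it ignores timestamps and does not unescape label values), not a complete implementation of the exposition format.

```python
import re

# name{label="value",...} value   (labels are optional)
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse metric samples into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # skip anything that is not a plain sample line
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

The resulting tuples can be filtered by name, for example to pick out every sample of a particular metric from the table of exposed metrics.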
The metrics exposed and their meanings are described in the following table:
Metric Name | Type | Labels | Meaning |
---|---|---|---|
idsvr_authentication_login_total | Counter | acr, profile_id | The number of authentication events that have occurred |
idsvr_authentication_sso_total | Counter | acr, profile_id | The number of Single Sign-on events that have occurred |
idsvr_cpu_usage | Gauge | | The amount of CPU used (0 <= x <= 1) by the Java process that the node started |
idsvr_datasource_account_sum | Counter | ds_id, ds_type | The sum of total time (in seconds) that all account data sources are taking |
idsvr_datasource_account_count | Counter | ds_id, ds_type | The number of occurrences that all account data sources are taking |
idsvr_datasource_attribute_sum | Counter | ds_id, ds_type | The sum of total time (in seconds) that all attribute data sources are taking |
idsvr_datasource_attribute_count | Counter | ds_id, ds_type | The number of occurrences that all attribute data sources are taking |
idsvr_datasource_credential_sum | Counter | ds_id, ds_type | The sum of total time (in seconds) that all credential data sources are taking |
idsvr_datasource_credential_count | Counter | ds_id, ds_type | The number of occurrences that all credential data sources are taking |
idsvr_datasource_database_client_sum | Counter | ds_id, ds_type | The sum of total time (in seconds) that all database client data sources are taking |
idsvr_datasource_database_client_count | Counter | ds_id, ds_type | The number of occurrences that all database client data sources are taking |
idsvr_datasource_dcr_sum | Counter | ds_id, ds_type | The sum of total time (in seconds) that all dynamic client registration data sources are taking |
idsvr_datasource_dcr_count | Counter | ds_id, ds_type | The number of occurrences that all dynamic client registration data sources are taking |
idsvr_datasource_delegation_sum | Counter | ds_id, ds_type | The sum of total time (in seconds) that all delegation data sources are taking |
idsvr_datasource_delegation_count | Counter | ds_id, ds_type | The number of occurrences that all delegation data sources are taking |
idsvr_datasource_device_sum | Counter | ds_id, ds_type | The sum of total time (in seconds) that all device data sources are taking |
idsvr_datasource_device_count | Counter | ds_id, ds_type | The number of occurrences that all device data sources are taking |
idsvr_datasource_nonce_sum | Counter | ds_id, ds_type | The sum of total time (in seconds) that all nonce data sources are taking |
idsvr_datasource_nonce_count | Counter | ds_id, ds_type | The number of occurrences that all nonce data sources are taking |
idsvr_datasource_session_sum | Counter | ds_id, ds_type | The sum of total time (in seconds) that all session data sources are taking |
idsvr_datasource_session_count | Counter | ds_id, ds_type | The number of occurrences that all session data sources are taking |
idsvr_datasource_token_sum | Counter | ds_id, ds_type | The sum of total time (in seconds) that all token data sources are taking |
idsvr_datasource_token_count | Counter | ds_id, ds_type | The number of occurrences that all token data sources are taking |
idsvr_datasource_bucket_sum | Counter | ds_id, ds_type | The sum of total time (in seconds) that all bucket data sources are taking |
idsvr_datasource_bucket_count | Counter | ds_id, ds_type | The number of occurrences that all bucket data sources are taking |
idsvr_http_server_request_time_sum | Counter | | The total amount of time (in seconds) that all HTTP requests have taken |
idsvr_http_server_request_time_count | Counter | | The number of HTTP requests that have been made |
idsvr_jvm_memory_used | Gauge | memory_id, memory_area | The amount of memory used (in bytes) by the Java process that the node started |
log4j2_appender_total | Counter | level | The number and severity of log messages which have been written since start up |
idsvr_oauth_delegation_issued_total | Counter | client_id, profile_id | The number of delegations issued |
idsvr_oauth_delegation_revoked_total | Counter | client_id, profile_id | The number of delegations revoked |
idsvr_oauth_token_issued_total | Counter | client_id, token_type, profile_id | The number of OAuth tokens (access, ID, refresh) issued |
idsvr_oauth_token_revoked_total | Counter | client_id, token_type, profile_id | The number of OAuth tokens revoked |
idsvr_oauth_introspection_denied_total | Counter | client_id, profile_id | The number of Introspections denied because of tokens not being active |
idsvr_http_client_request_successful_sum | Counter | http_client_id, authority | The total duration of requests issued by an HTTP client resulting in a response with a successful status code. |
idsvr_http_client_request_successful_count | Counter | http_client_id, authority | The number of requests issued by an HTTP client resulting in a response with a successful status code. |
idsvr_http_client_request_client_error_sum | Counter | http_client_id, authority | The total duration of requests issued by an HTTP client resulting in a response with a client error status code (4xx). |
idsvr_http_client_request_client_error_count | Counter | http_client_id, authority | The number of requests issued by an HTTP client resulting in a response with a client error status code (4xx). |
idsvr_http_client_request_server_error_sum | Counter | http_client_id, authority | The total duration of requests issued by an HTTP client resulting in a response with a server error status code (5xx). |
idsvr_http_client_request_server_error_count | Counter | http_client_id, authority | The number of requests issued by an HTTP client resulting in a response with a server error status code (5xx). |
idsvr_http_client_pool_connections | Gauge | http_client_id | The total number of connections in the connection pool of an HTTP client. |
idsvr_http_client_pool_active_connections | Gauge | http_client_id | The number of active (in-use) connections in the connection pool of an HTTP client. |
idsvr_credential_verification_successful_total | Counter | credential_manager_id | The number of successful credential verifications done by a credential manager. |
idsvr_credential_verification_failed_total | Counter | credential_manager_id | The number of failed credential verifications done by a credential manager. |
idsvr_request_pool_threads | Gauge | | The total number of threads in the request thread pool. |
idsvr_request_pool_available_threads | Gauge | | The number of threads in the request thread pool that are available to handle requests. |
idsvr_request_pool_active_threads | Gauge | | The number of threads in the request thread pool that are in use. |
idsvr_request_pool_queue_size | Gauge | | The request thread pool’s queue size (i.e. the number of requests waiting for a thread) |
Note
The client_id of the idsvr_oauth_token_issued metric for ID tokens will be that of the requesting client (i.e., the authorized party). For all other tokens, the client_id is the ID of the client to whom the token was issued.
Note
Request thread pool metrics (whose names start with idsvr_request_pool_) are only available when JMX is enabled.
In addition to the global metrics described above, some plugins may contribute metrics related to their particular usage. For example, the JDBC data source plugin exposes metrics of the underlying connection pool.
The labels in the previous table have the meanings described in the following table:
Label Name | Meaning |
---|---|
acr | The authentication context class reference (ACR) of the authenticator used for login or SSO (as applicable) |
client_id | The identifier of the OAuth client to which the metric is related. This is disabled by default. |
ds_id | The identifier of the data source to which the metric is related |
ds_type | The type of data source to which the metric is related (e.g., ldap, jdbc, etc.) |
level | The level of the log message (e.g., error, warn, etc.) |
memory_id | The identifier representing the pool of memory being measured (e.g., G1 Old Gen, etc.) |
memory_area | The type of memory being measured (heap, non-heap, etc.) |
token_type | The type of token to which the measurement is related (e.g., access_token, etc.) |
profile_id | The identifier of the profile to which the metric is related. This is disabled by default. |
http_client_id | The identifier of the HTTP client to which the metric is related |
authority | The authority (hostname and port) used to contact the target system to which the metric is related. |
credential_manager_id | The identifier of the credential manager to which the metric is related. |
Note
By default, no unique values are reported for client_id, to prevent an explosion of label values, which Prometheus has a hard time handling. If the system only contains a small number of clients, this can be enabled by setting the system property se.curity:identity-server:reporting:include-client-id-label=true when starting the Curity Identity Server.
Gathering of data can be disabled. If this is set when the node starts, no data will be published. To disable gathering of data, in the admin UI, go to Enable Reporting. Once that change is committed, all nodes will stop gathering data.
Common Alerts¶
If you want to set up alerts for when things go wrong in the Curity Identity Server, you can start with the following:
- If idsvr_datasource_*_sum / idsvr_datasource_*_count >= 800 since the last poll to the metrics endpoint, your database is having issues. The result of this arithmetic is the average response time from the Curity Identity Server to the database (for the given period).
- If log4j2_appender_total with a label of error is > 0, call support!
- If log4j2_appender_total with a label of warn is greater than at the last poll, look into the issue immediately, and raise a support case if you can’t figure out the problem.
- If idsvr_cpu_usage is >= 95% at an unexpected time or for a prolonged period of time, you should take action.
- If idsvr_http_server_request_time_sum / idsvr_http_server_request_time_count >= 1000 since the last poll to the metrics endpoint, the Web server is responding slowly. The result of this arithmetic is the average HTTP response time of the Curity Identity Server Web server (for the given period).
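The "since the last poll" arithmetic in these rules works on the difference between two successive readings of a _sum and _count pair. A minimal Python sketch of that calculation follows; the function name is hypothetical, and the handling of a counter reset (treating the current values as the whole delta after a restart) is an assumption about how a poller might cope with a restarted node.

```python
def average_since_last_poll(prev_sum, prev_count, curr_sum, curr_count):
    """Average duration per operation between two polls.

    Returns None when no operations happened in the period.
    """
    delta_sum = curr_sum - prev_sum
    delta_count = curr_count - prev_count
    if delta_sum < 0 or delta_count < 0:
        # Counters went backwards, so the node was restarted between polls;
        # assume the current values cover only the time since the restart.
        delta_sum, delta_count = curr_sum, curr_count
    if delta_count == 0:
        return None
    return delta_sum / delta_count
```

The result is expressed in the unit of the _sum metric, so it may need converting before it is compared with an alert threshold.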
[1] | The Adoptium Working Group is the provider of the Eclipse Temurin JDK, the Java Platform which Curity delivers. |
Configuration¶
Prometheus metrics are configured in the /environments/environment/reporting section.
Parameter | Description |
---|---|
enable | Flag to enable/disable gathering of Prometheus metrics |
include-profile-id | Flag indicating whether the profile_id label should be enabled |
% show environments environment reporting
enable true;
include-profile-id true;