Prometheus-compliant Metrics#

This chapter of the Operation and Monitoring guide contains information about the Prometheus-compliant metrics available in the Curity Identity Server.

Each run-time and admin node exposes an endpoint where certain information is published in a Prometheus-compliant format (i.e., Prometheus’ OpenMetrics format). This allows the Prometheus monitoring tool (or others that can process data in this format) to monitor certain metrics about the behavior of the node. This endpoint is exposed over HTTP and listens on the same interface as the Status Endpoint. The port used is one greater than that of the status endpoint (4466 by default). The URI is /metrics, so, for example, the URL of the data would be http://localhost:4466/metrics.
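As a minimal sketch of how a tool other than Prometheus might read this endpoint, the following Python parses the published text into (name, labels, value) tuples. The function names parse_metrics and fetch_metrics are illustrative and not part of the product, and the parser is deliberately simplified: it assumes plain `name{label="value"} number` lines and does not handle escaped quotes inside label values.

```python
import re
from urllib.request import urlopen

# A sample line looks like: metric_name{label="value",...} 12.5
_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Parse Prometheus-style text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        match = _LINE.match(line)
        if match is None:  # ignore anything this simplified parser cannot read
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def fetch_metrics(url):
    """Download and parse the metrics of one node; `url` is its /metrics URL."""
    with urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling fetch_metrics with the /metrics URL of a node returns one tuple per published sample, which can then be filtered by metric name or label.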

The metrics exposed and their meanings are described in the following table:

| Metric Name | Type | Labels | Meaning |
| --- | --- | --- | --- |
| idsvr_authentication_login_total | Counter | acr, profile_id | The number of authentication events that have occurred |
| idsvr_authentication_sso_total | Counter | acr, profile_id | The number of Single Sign-on events that have occurred |
| idsvr_cpu_usage | Gauge | | The amount of CPU used (0 <= x <= 1) by the Java process that the node started |
| idsvr_datasource_account_sum | Counter | ds_id, ds_type | The total time (in seconds) taken by all account data sources |
| idsvr_datasource_account_count | Counter | ds_id, ds_type | The number of operations handled by all account data sources |
| idsvr_datasource_attribute_sum | Counter | ds_id, ds_type | The total time (in seconds) taken by all attribute data sources |
| idsvr_datasource_attribute_count | Counter | ds_id, ds_type | The number of operations handled by all attribute data sources |
| idsvr_datasource_credential_sum | Counter | ds_id, ds_type | The total time (in seconds) taken by all credential data sources |
| idsvr_datasource_credential_count | Counter | ds_id, ds_type | The number of operations handled by all credential data sources |
| idsvr_datasource_database_client_sum | Counter | ds_id, ds_type | The total time (in seconds) taken by all database client data sources |
| idsvr_datasource_database_client_count | Counter | ds_id, ds_type | The number of operations handled by all database client data sources |
| idsvr_datasource_dcr_sum | Counter | ds_id, ds_type | The total time (in seconds) taken by all dynamic client registration data sources |
| idsvr_datasource_dcr_count | Counter | ds_id, ds_type | The number of operations handled by all dynamic client registration data sources |
| idsvr_datasource_delegation_sum | Counter | ds_id, ds_type | The total time (in seconds) taken by all delegation data sources |
| idsvr_datasource_delegation_count | Counter | ds_id, ds_type | The number of operations handled by all delegation data sources |
| idsvr_datasource_device_sum | Counter | ds_id, ds_type | The total time (in seconds) taken by all device data sources |
| idsvr_datasource_device_count | Counter | ds_id, ds_type | The number of operations handled by all device data sources |
| idsvr_datasource_nonce_sum | Counter | ds_id, ds_type | The total time (in seconds) taken by all nonce data sources |
| idsvr_datasource_nonce_count | Counter | ds_id, ds_type | The number of operations handled by all nonce data sources |
| idsvr_datasource_session_sum | Counter | ds_id, ds_type | The total time (in seconds) taken by all session data sources |
| idsvr_datasource_session_count | Counter | ds_id, ds_type | The number of operations handled by all session data sources |
| idsvr_datasource_token_sum | Counter | ds_id, ds_type | The total time (in seconds) taken by all token data sources |
| idsvr_datasource_token_count | Counter | ds_id, ds_type | The number of operations handled by all token data sources |
| idsvr_datasource_bucket_sum | Counter | ds_id, ds_type | The total time (in seconds) taken by all bucket data sources |
| idsvr_datasource_bucket_count | Counter | ds_id, ds_type | The number of operations handled by all bucket data sources |
| idsvr_http_server_request_time_sum | Counter | | The total amount of time (in seconds) taken by all HTTP requests |
| idsvr_http_server_request_time_count | Counter | | The number of HTTP requests that have been made |
| idsvr_jvm_memory_used | Gauge | memory_id, memory_area | The amount of memory used (in bytes) by the Java process that the node started |
| log4j2_appender_total | Counter | level | The number and severity of log messages that have been written since start-up |
| idsvr_oauth_delegation_issued_total | Counter | client_id, profile_id | The number of delegations issued |
| idsvr_oauth_delegation_revoked_total | Counter | client_id, profile_id | The number of delegations revoked |
| idsvr_oauth_token_issued_total | Counter | client_id, token_type, profile_id | The number of OAuth tokens (access, ID, refresh) issued |
| idsvr_oauth_token_revoked_total | Counter | client_id, token_type, profile_id | The number of OAuth tokens revoked |
| idsvr_oauth_introspection_denied_total | Counter | client_id, profile_id | The number of introspections denied because the token was not active |
| idsvr_http_client_request_successful_sum | Counter | http_client_id, authority | The total duration of requests issued by an HTTP client resulting in a response with a successful status code |
| idsvr_http_client_request_successful_count | Counter | http_client_id, authority | The number of requests issued by an HTTP client resulting in a response with a successful status code |
| idsvr_http_client_request_client_error_sum | Counter | http_client_id, authority | The total duration of requests issued by an HTTP client resulting in a response with a client error status code (4xx) |
| idsvr_http_client_request_client_error_count | Counter | http_client_id, authority | The number of requests issued by an HTTP client resulting in a response with a client error status code (4xx) |
| idsvr_http_client_request_server_error_sum | Counter | http_client_id, authority | The total duration of requests issued by an HTTP client resulting in a response with a server error status code (5xx) |
| idsvr_http_client_request_server_error_count | Counter | http_client_id, authority | The number of requests issued by an HTTP client resulting in a response with a server error status code (5xx) |
| idsvr_http_client_request_network_error_sum | Counter | http_client_id, authority | The total duration of requests issued by an HTTP client resulting in a network error (DNS resolution failure, socket timeout, connection reset…) |
| idsvr_http_client_request_network_error_count | Counter | http_client_id, authority | The number of requests issued by an HTTP client resulting in a network error (DNS resolution failure, socket timeout, connection reset…) |
| idsvr_http_client_pool_connections | Gauge | http_client_id | The total number of connections in the connection pool of an HTTP client |
| idsvr_http_client_pool_active_connections | Gauge | http_client_id | The number of active (in-use) connections in the connection pool of an HTTP client |
| idsvr_credential_verification_successful_total | Counter | credential_manager_id | The number of successful credential verifications done by a credential manager |
| idsvr_credential_verification_failed_total | Counter | credential_manager_id | The number of failed credential verifications done by a credential manager |
| idsvr_request_pool_threads | Gauge | | The total number of threads in the request thread pool |
| idsvr_request_pool_available_threads | Gauge | | The number of threads in the request thread pool that are available to handle requests |
| idsvr_request_pool_active_threads | Gauge | | The number of threads in the request thread pool that are in use |
| idsvr_request_pool_queue_size | Gauge | | The request thread pool’s queue size (i.e., the number of requests waiting for a thread) |

For ID tokens, the client_id of the idsvr_oauth_token_issued_total metric is that of the requesting client (i.e., the authorized party). For all other tokens, the client_id is the ID of the client to which the token was issued.

Request thread pool metrics (those whose names start with idsvr_request_pool_) are only available when JMX is enabled.

In addition to the global metrics described above, some plugins may contribute metrics related to their particular usage. In particular, the JDBC data source plugin exposes metrics of the underlying connection pool.

The labels in the previous table have the meanings described in the following table:

| Label Name | Meaning |
| --- | --- |
| acr | The Authentication Context Class Reference (ACR) of the authenticator used for login or SSO (as applicable) |
| client_id | The identifier of the OAuth client to which the metric is related. This is disabled by default. |
| ds_id | The identifier of the data source to which the metric is related |
| ds_type | The type of data source to which the metric is related (e.g., ldap, jdbc, etc.) |
| level | The level of the log message (e.g., error, warn, etc.) |
| memory_id | The identifier representing the pool of memory being measured (e.g., G1 Old Gen, etc.) |
| memory_area | The type of memory being measured (heap, non-heap, etc.) |
| token_type | The type of token to which the measurement is related (e.g., access_token, etc.) |
| profile_id | The identifier of the profile to which the metric is related. This is disabled by default. |
| http_client_id | The identifier of the HTTP client to which the metric is related |
| authority | The authority (hostname and port) used to contact the target system to which the metric is related |
| credential_manager_id | The identifier of the credential manager to which the metric is related |

By default, no unique values are reported for client_id. This prevents an explosion in the number of distinct label values, and therefore time series, which Prometheus has a hard time handling. If the system only contains a small number of clients, the label can be enabled by setting the system property se.curity:identity-server:reporting:include-client-id-label=true when starting the Curity Identity Server.
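To illustrate why a per-client label can be costly, the sketch below estimates an upper bound on the number of time series a single metric can produce: the product of the number of distinct values of each of its labels. The helper name estimated_series and the client count of 5000 are hypothetical; the three token types follow the access, ID and refresh tokens listed for idsvr_oauth_token_issued_total.

```python
def estimated_series(label_value_counts):
    """Upper bound on time series for one metric.

    `label_value_counts` maps each label name to the number of distinct
    values that label can take; the bound is the product of those numbers.
    """
    total = 1
    for count in label_value_counts.values():
        total *= count
    return total


# Hypothetical deployment: 3 token types, 5000 registered clients.
without_client_id = estimated_series({"token_type": 3})
with_client_id = estimated_series({"token_type": 3, "client_id": 5000})
print(without_client_id, with_client_id)
```

Under these assumed numbers, enabling the client_id label multiplies the series count of that one metric by the number of clients, which is why it is only advisable for small client populations.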

Gathering of data can be disabled. If reporting is disabled when a node starts, that node will not publish any data. To disable gathering of data, go to System > General in the admin UI and toggle off Enable Reporting. Once that change is committed, all nodes will stop gathering data.

Common Alerts#

If you want to set up alerts for when things go wrong in the Curity Identity Server, the following are a good starting point:

  • If idsvr_datasource_*_sum / idsvr_datasource_*_count >= 800 since the last poll to the metrics endpoint, your database is having issues. The result of this division is the average response time from the Curity Identity Server to the database (for the given period).
  • If log4j2_appender_total with a level label of error is > 0, call support!
  • If log4j2_appender_total with a level label of warn has increased since the last poll, look into the issue immediately, and raise a support case if you can’t figure out the problem.
  • If idsvr_cpu_usage is >= 0.95 (i.e., 95%) at an unexpected time or for a prolonged period of time, you should take action.
  • If idsvr_http_server_request_time_sum / idsvr_http_server_request_time_count >= 1000 since the last poll to the metrics endpoint, HTTP requests are being answered too slowly. The result of this division is the average HTTP response time of the Curity Identity Server web server (for the given period).
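Because the _sum and _count metrics are counters that only grow, the "since the last poll" averages in the list above are computed from the difference between two consecutive polls rather than from the raw values. The following is a minimal sketch of that arithmetic, assuming each poll has already been reduced to a dictionary of metric name to value; the function names and sample numbers are hypothetical, and the result is in whatever unit the _sum metric uses.

```python
def average_since_last_poll(previous, current, sum_name, count_name):
    """Average duration per operation between two polls.

    Returns None when no new operations were counted, or when a counter
    went backwards (for example after a node restart).
    """
    delta_sum = current[sum_name] - previous[sum_name]
    delta_count = current[count_name] - previous[count_name]
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count


def counter_increased(previous, current, name):
    """True when a counter such as log4j2_appender_total grew between polls."""
    return current.get(name, 0.0) > previous.get(name, 0.0)


# Hypothetical values from two consecutive polls of one node.
previous_poll = {
    "idsvr_http_server_request_time_sum": 10.0,
    "idsvr_http_server_request_time_count": 100.0,
}
current_poll = {
    "idsvr_http_server_request_time_sum": 16.0,
    "idsvr_http_server_request_time_count": 120.0,
}
average = average_since_last_poll(
    previous_poll,
    current_poll,
    "idsvr_http_server_request_time_sum",
    "idsvr_http_server_request_time_count",
)
```

An alerting script would compare the returned average against the chosen threshold and skip the check when the function returns None. In Prometheus itself, the same idea is normally expressed with rate-based queries rather than hand-written code.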

Configuration#

Prometheus metrics are configured in the configuration-reference/environments/environment/reporting section.

| Parameter | Description |
| --- | --- |
| enable | Flag to enable/disable gathering of Prometheus metrics |
| include-profile-id | Flag indicating whether the profile_id label should be enabled |

A reporting configuration as shown in the CLI:

% show environments environment reporting
enable             true;
include-profile-id true;
