Prometheus-compliant Metrics
This chapter of the Operation and Monitoring guide contains information about the Prometheus-compliant metrics available in the Curity Identity Server.
Each run-time and admin node exposes an endpoint where certain information is published in a Prometheus-compliant format (i.e., Prometheus’ OpenMetrics format). This allows the Prometheus monitoring tool (or others that can process data in this format) to monitor certain metrics about the behavior of the node. This endpoint is exposed over HTTP and listens on the same interface as the Status Endpoint. The port used is one greater than that of the status endpoint (4466 by default). The URI is /metrics, so, for example, the URL of the data would be https://localhost:4466/metrics.
The metrics exposed and their meanings are described in the following table:
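The port rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the product: the function names are hypothetical, the status port of 4465 is inferred from the stated default metrics port of 4466, and the scheme is left as a parameter that defaults to the one used in the example URL.

```python
from urllib.parse import urlunsplit
from urllib.request import urlopen


def metrics_url(host: str = "localhost", status_port: int = 4465,
                scheme: str = "https") -> str:
    """Build the metrics URL: same interface as the status endpoint, port + 1."""
    return urlunsplit((scheme, f"{host}:{status_port + 1}", "/metrics", "", ""))


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Return the raw text published by the node (requires a reachable node)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_metrics() against a real node.
    print(metrics_url())
```

Keeping URL construction separate from the network call makes the port arithmetic easy to check without a running node.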
| Metric Name | Type | Labels | Meaning |
|---|---|---|---|
| idsvr_authentication_login_total | Counter | acr, profile_id | The number of authentication events that have occurred |
| idsvr_authentication_sso_total | Counter | acr, profile_id | The number of Single Sign-on events that have occurred |
| idsvr_cpu_usage | Gauge | | The amount of CPU used (0 <= x <= 1) by the Java process that the node started |
| idsvr_datasource_account_sum | Counter | ds_id, ds_type | The total time (in seconds) spent on account data source operations |
| idsvr_datasource_account_count | Counter | ds_id, ds_type | The number of account data source operations |
| idsvr_datasource_attribute_sum | Counter | ds_id, ds_type | The total time (in seconds) spent on attribute data source operations |
| idsvr_datasource_attribute_count | Counter | ds_id, ds_type | The number of attribute data source operations |
| idsvr_datasource_credential_sum | Counter | ds_id, ds_type | The total time (in seconds) spent on credential data source operations |
| idsvr_datasource_credential_count | Counter | ds_id, ds_type | The number of credential data source operations |
| idsvr_datasource_database_client_sum | Counter | ds_id, ds_type | The total time (in seconds) spent on database client data source operations |
| idsvr_datasource_database_client_count | Counter | ds_id, ds_type | The number of database client data source operations |
| idsvr_datasource_dcr_sum | Counter | ds_id, ds_type | The total time (in seconds) spent on dynamic client registration data source operations |
| idsvr_datasource_dcr_count | Counter | ds_id, ds_type | The number of dynamic client registration data source operations |
| idsvr_datasource_delegation_sum | Counter | ds_id, ds_type | The total time (in seconds) spent on delegation data source operations |
| idsvr_datasource_delegation_count | Counter | ds_id, ds_type | The number of delegation data source operations |
| idsvr_datasource_device_sum | Counter | ds_id, ds_type | The total time (in seconds) spent on device data source operations |
| idsvr_datasource_device_count | Counter | ds_id, ds_type | The number of device data source operations |
| idsvr_datasource_nonce_sum | Counter | ds_id, ds_type | The total time (in seconds) spent on nonce data source operations |
| idsvr_datasource_nonce_count | Counter | ds_id, ds_type | The number of nonce data source operations |
| idsvr_datasource_session_sum | Counter | ds_id, ds_type | The total time (in seconds) spent on session data source operations |
| idsvr_datasource_session_count | Counter | ds_id, ds_type | The number of session data source operations |
| idsvr_datasource_token_sum | Counter | ds_id, ds_type | The total time (in seconds) spent on token data source operations |
| idsvr_datasource_token_count | Counter | ds_id, ds_type | The number of token data source operations |
| idsvr_datasource_bucket_sum | Counter | ds_id, ds_type | The total time (in seconds) spent on bucket data source operations |
| idsvr_datasource_bucket_count | Counter | ds_id, ds_type | The number of bucket data source operations |
| idsvr_http_server_request_time_sum | Counter | | The total amount of time (in seconds) that all HTTP requests have taken |
| idsvr_http_server_request_time_count | Counter | | The number of HTTP requests that have been made |
| idsvr_jvm_memory_used | Gauge | memory_id, memory_area | The amount of memory used (in bytes) by the Java process that the node started |
| log4j2_appender_total | Counter | level | The number and severity of log messages which have been written since start up |
| idsvr_oauth_delegation_issued_total | Counter | client_id, profile_id | The number of delegations issued |
| idsvr_oauth_delegation_revoked_total | Counter | client_id, profile_id | The number of delegations revoked |
| idsvr_oauth_token_issued_total | Counter | client_id, token_type, profile_id | The number of OAuth tokens (access, ID, refresh) issued |
| idsvr_oauth_token_revoked_total | Counter | client_id, token_type, profile_id | The number of OAuth tokens revoked |
| idsvr_oauth_introspection_denied_total | Counter | client_id, profile_id | The number of introspections denied because the token was not active |
| idsvr_http_client_request_successful_sum | Counter | http_client_id, authority | The total duration of requests issued by an HTTP client resulting in a response with a successful status code. |
| idsvr_http_client_request_successful_count | Counter | http_client_id, authority | The number of requests issued by an HTTP client resulting in a response with a successful status code. |
| idsvr_http_client_request_client_error_sum | Counter | http_client_id, authority | The total duration of requests issued by an HTTP client resulting in a response with a client error status code (4xx). |
| idsvr_http_client_request_client_error_count | Counter | http_client_id, authority | The number of requests issued by an HTTP client resulting in a response with a client error status code (4xx). |
| idsvr_http_client_request_server_error_sum | Counter | http_client_id, authority | The total duration of requests issued by an HTTP client resulting in a response with a server error status code (5xx). |
| idsvr_http_client_request_server_error_count | Counter | http_client_id, authority | The number of requests issued by an HTTP client resulting in a response with a server error status code (5xx). |
| idsvr_http_client_request_network_error_sum | Counter | http_client_id, authority | The total duration of requests issued by an HTTP client resulting in a network error (DNS resolution failure, socket timeout, connection reset…). |
| idsvr_http_client_request_network_error_count | Counter | http_client_id, authority | The number of requests issued by an HTTP client resulting in a network error (DNS resolution failure, socket timeout, connection reset…). |
| idsvr_http_client_pool_connections | Gauge | http_client_id | The total number of connections in the connection pool of an HTTP client. |
| idsvr_http_client_pool_active_connections | Gauge | http_client_id | The number of active (in-use) connections in the connection pool of an HTTP client. |
| idsvr_credential_verification_successful_total | Counter | credential_manager_id | The number of successful credential verifications done by a credential manager. |
| idsvr_credential_verification_failed_total | Counter | credential_manager_id | The number of failed credential verifications done by a credential manager. |
| idsvr_request_pool_threads | Gauge | | The total number of threads in the request thread pool. |
| idsvr_request_pool_available_threads | Gauge | | The number of threads in the request thread pool that are available to handle requests. |
| idsvr_request_pool_active_threads | Gauge | | The number of threads in the request thread pool that are in use. |
| idsvr_request_pool_queue_size | Gauge | | The request thread pool’s queue size (i.e., the number of requests waiting for a thread). |
The client_id of the idsvr_oauth_token_issued_total metric for ID tokens will be that of the requesting client (i.e.,
the authorized party). For all other tokens, the client_id is the ID of the client to whom the token was issued.
Request thread pool metrics (whose names start with idsvr_request_pool_) are only available when JMX is enabled.
In addition to the global metrics described above, some plugins may contribute metrics related to their particular usage. For example, the JDBC data source plugin exposes metrics of the underlying connection pool.
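To work with these metrics outside of Prometheus, the text published by the endpoint can be split into (name, labels, value) samples. The sketch below assumes the common Prometheus text exposition layout (`name{label="value"} number`, with `#` comment lines); it is a simplified parser, not a full implementation of the format, and the sample payload and its values are invented for illustration.

```python
import re

# One sample per line: metric name, optional {labels}, then the value.
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# label="value" pairs; escaped characters inside the quotes are tolerated.
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical payload; real output contains many more series.
SAMPLE = """\
# TYPE idsvr_authentication_login_total counter
idsvr_authentication_login_total{acr="urn:example:acr",profile_id="authn"} 12.0
idsvr_cpu_usage 0.25
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

For production use, an established Prometheus client library is a safer choice than a hand-written parser.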
The labels in the previous table have the meanings described in the following table:
| Label Name | Meaning |
|---|---|
| acr | The authentication context class reference (ACR) of the authenticator used for login or SSO (as applicable) |
| client_id | The identifier of the OAuth client to which the metric is related. This is disabled by default. |
| ds_id | The identifier of the data source to which the metric is related |
| ds_type | The type of data source to which the metric is related (e.g., ldap, jdbc, etc.) |
| level | The level of the log message (e.g., error, warn, etc.) |
| memory_id | The identifier representing the pool of memory being measured (e.g., G1 Old Gen, etc.) |
| memory_area | The type of memory being measured (heap, non-heap, etc.) |
| token_type | The type of token to which the measurement is related (e.g., access_token, etc.) |
| profile_id | The identifier of the profile to which the metric is related. This is disabled by default. |
| http_client_id | The identifier of the HTTP client to which the metric is related |
| authority | The authority (hostname and port) used to contact the target system to which the metric is related. |
| credential_manager_id | The identifier of the credential manager to which the metric is related. |
By default, no unique values are reported for client_id. This prevents an explosion in the number of label values, which
Prometheus has a hard time handling. If the system only contains a small number of clients, unique values can be
enabled by setting the system property se.curity:identity-server:reporting:include-client-id-label=true when
starting the Curity Identity Server.
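The effect of a high-cardinality label can be estimated with simple arithmetic: the number of time series for one metric is at most the product of the number of distinct values of each of its labels. The following sketch uses a hypothetical helper and made-up counts (5000 clients, 3 token types, 2 profiles) to show the difference for idsvr_oauth_token_issued_total with and without unique client_id values.

```python
from math import prod


def series_upper_bound(distinct_values):
    """Upper bound on time series for one metric, given distinct values per label."""
    return prod(distinct_values.values())


if __name__ == "__main__":
    # Hypothetical deployment sizes, for illustration only.
    with_client_id = {"client_id": 5000, "token_type": 3, "profile_id": 2}
    without_client_id = {"token_type": 3, "profile_id": 2}
    print(series_upper_bound(with_client_id))
    print(series_upper_bound(without_client_id))
```

The bound is only reached if every combination of label values actually occurs, so real series counts are usually lower.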
Gathering of data can be disabled. If gathering is disabled when the node starts, no data will be published. To disable gathering of data, in the admin UI, go to System > General. There, toggle off Enable Reporting. Once that change is committed, all nodes will stop gathering data.
Common Alerts
If you want to set up alerts for when things go wrong in the Curity Identity Server, the following are a good starting point:
- If idsvr_datasource_*_sum / idsvr_datasource_*_count >= 800 since the last poll to the metrics endpoint, your database is having issues. The result of this arithmetic is the average response time from the Curity Identity Server to the database (for the given period).
- If log4j2_appender_total with a label of error is > 0, call support!
- If log4j2_appender_total with a label of warn is greater than at the last poll, look into the issue immediately, and raise a support case if you can’t figure out the problem.
- If idsvr_cpu_usage is >= 95% at an unexpected time or for a prolonged period of time, you should take action.
- If idsvr_http_server_request_time_sum / idsvr_http_server_request_time_count >= 1000 since the last poll to the metrics endpoint, HTTP requests are being served slowly. The result of this arithmetic is the average HTTP response time of the Curity Identity Server Web server (for the given period).
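The "since the last poll" arithmetic used in these alerts can be sketched as follows. Because the _sum and _count metrics are counters, the average for a period is the difference in the sum divided by the difference in the count between two polls. The function names are hypothetical, and the threshold is passed in as a parameter because the figures above (800 and 1000) are given without a unit, so they should be checked against the unit of the metric being compared.

```python
from typing import Optional


def average_since_last_poll(prev_sum: float, prev_count: float,
                            curr_sum: float, curr_count: float) -> Optional[float]:
    """Average of the observations recorded between two polls of a _sum/_count pair."""
    delta_count = curr_count - prev_count
    if delta_count <= 0:
        # Nothing new was recorded, or the counters were reset (e.g. node restart).
        return None
    return (curr_sum - prev_sum) / delta_count


def counter_increased(prev: float, curr: float) -> bool:
    """True when a counter such as log4j2_appender_total for level warn has grown."""
    return curr > prev


def breaches(average: Optional[float], threshold: float) -> bool:
    """True when an average is available and at or above the threshold."""
    return average is not None and average >= threshold


if __name__ == "__main__":
    # Hypothetical readings from two consecutive polls.
    avg = average_since_last_poll(prev_sum=10.0, prev_count=100,
                                  curr_sum=16.0, curr_count=120)
    print(avg, breaches(avg, threshold=1.0))
```

Returning None when no new observations exist avoids a division by zero and keeps a quiet period from being reported as a breach.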
Configuration
Prometheus metrics are configured under the configuration-reference/environments/environment/reporting section.
| Parameter | Description |
|---|---|
| enable | Flag to enable/disable gathering of Prometheus metrics |
| include-profile-id | Flag indicating whether the profile_id label should be enabled |
An example reporting configuration, as shown in the CLI:
% show environments environment reporting
enable true;
include-profile-id true;