Distributed Service#

The Curity Identity Server has an internal Distributed Service that implements certain features requiring communication between different Curity nodes in a Cluster.

This service is mostly independent of the clustering configuration used by the Configuration Service (see notes below about the cluster key), but can only be configured if the Configuration Service cluster is also configured.

If the Distributed Service is not configured explicitly, it still runs in all Curity deployments where the Configuration cluster is configured. In that case, the Distributed Service uses the Configuration cluster key for communication.

Cluster formation#

For the Distributed Service to work, it is important that all nodes are able to connect to each other using the TCP protocol. By default, port 6790 is used, but that can be configured.
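As a simple pre-flight check, a sketch like the following can verify that the Distributed Service port is reachable from one node to the others. The hostnames are hypothetical placeholders, and this only tests plain TCP reachability; the actual service communication uses mutual TLS:

```java
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

public class ReachabilityCheck {
    // Hypothetical node hostnames; replace with the actual addresses of your Curity nodes.
    private static final List<String> PEERS = List.of("admin.example.com", "runtime1.example.com");
    private static final int DISTRIBUTED_SERVICE_PORT = 6790; // default port, configurable

    public static void main(String[] args) {
        for (String host : PEERS) {
            try (Socket socket = new Socket()) {
                // Only checks that a TCP connection can be opened within 3 seconds.
                socket.connect(new InetSocketAddress(host, DISTRIBUTED_SERVICE_PORT), 3000);
                System.out.println(host + ": reachable");
            } catch (Exception e) {
                System.out.println(host + ": NOT reachable (" + e.getMessage() + ")");
            }
        }
    }
}
```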

The binary message protocol used by the Cluster is considered internal and hence is not fully documented here. This section mentions a few of the message types that nodes send to each other, only to clarify important aspects of how the cluster works.

When a node starts up, it immediately connects to the Admin node using the address configured in the Configuration cluster configuration. Each Runtime node (i.e. each non-Admin node) sends its own hostname to the Admin node via an IDENTITY message, and then requests to join the cluster via a JOIN message.

The Admin node decides when to accept nodes into the cluster and when to remove them. When the cluster topology needs to change, the Admin node informs all nodes about it, as shown in the figure below (the TOPOLOGY UPDATE message is sent to all non-Admin nodes; the diagram shows it only once for brevity):

Diagram showing how nodes join the distributed service cluster

Notice how the RT2 node learns that it needs to connect not only to the Admin node but also to RT1, once it has been accepted into the cluster by the Admin node and has received the full cluster topology.

Once a cluster has been formed, an open connection is maintained at all times between each pair of nodes in the cluster. Nodes that appear later in the cluster topology start a connection with all the nodes that appear earlier, as shown in the image below:

Diagram showing the connections maintained between nodes in the cluster

The Admin node does not initiate any connections; all other nodes must initiate a connection to it. Each connection allows sending messages both ways, so all nodes can send data to all other nodes. The connections use mutual TLS based on the configured cryptographic key.
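As an illustration of that ordering rule (not the actual implementation, and with made-up node names), the following sketch prints which connections each node would initiate, given the ordered topology distributed by the Admin node:

```java
import java.util.List;

public class ConnectionRule {
    public static void main(String[] args) {
        // Ordered topology as distributed by the Admin node (names are illustrative).
        List<String> topology = List.of("admin", "rt1", "rt2", "rt3");

        // Each node initiates connections to all nodes that appear earlier in the topology.
        for (int i = 0; i < topology.size(); i++) {
            for (int j = 0; j < i; j++) {
                System.out.println(topology.get(i) + " connects to " + topology.get(j));
            }
        }
    }
}
```

With this scheme, each pair of nodes ends up with exactly one connection between them, and the Admin node, being first in the topology, never initiates any of them.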

There is no limit to how many nodes may be part of a cluster, but the more nodes there are, the more overhead is spent simply keeping the cluster alive. We recommend caution if more than a few dozen nodes are needed in a single cluster.
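To put that in perspective, a fully formed cluster maintains one open connection per pair of nodes, i.e. n(n-1)/2 connections for n nodes: 10 nodes need 45 connections, while 50 nodes already need 1,225.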

Status Endpoint#

The Status Endpoint on non-Admin nodes will not report that the node isServing until it has successfully joined the cluster.

If, for some reason, the node fails to join the cluster, it will not report isServing: true and hence may be considered unhealthy by monitoring systems. That should have the desired effect of causing the node to be restarted.
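A monitoring script can therefore poll the Status Endpoint and look for isServing. The sketch below is a minimal illustration: the endpoint URL is passed as an argument because its host, port, and path depend on your deployment, and it assumes the endpoint's TLS certificate is trusted by the default Java trust store:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ServingCheck {
    public static void main(String[] args) throws Exception {
        // Pass the node's Status Endpoint URL as the first argument.
        URI statusUri = URI.create(args[0]);

        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                HttpRequest.newBuilder(statusUri).GET().build(),
                HttpResponse.BodyHandlers.ofString());

        // Naive check: a node that has not joined the cluster will not report isServing as true.
        String body = response.body();
        boolean serving = body.contains("\"isServing\":true") || body.contains("\"isServing\": true");
        System.out.println("isServing: " + serving);
    }
}
```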

Resilience#

To keep connections alive for long periods, even when idle, the client sends a Ping message every few seconds. The receiver must acknowledge that message, ensuring that data can flow both ways.

When a connection goes down, the connection initiator (i.e. the client) tries to reconnect using exponential backoff. Nodes also tell the Admin node when they lose the connection to another node, so that if communication with that node is not restored within a reasonable time frame, as reported by different nodes, the node reported as down may be expelled from the cluster.
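The reconnection strategy can be pictured with a minimal exponential-backoff loop. This is only a conceptual sketch, not Curity's implementation; the initial delay, multiplier, and cap below are made-up values:

```java
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;

public class ReconnectLoop {
    // Illustrative backoff parameters; the real values used by the Distributed Service are internal.
    private static final Duration INITIAL_DELAY = Duration.ofSeconds(1);
    private static final Duration MAX_DELAY = Duration.ofSeconds(30);

    public static void main(String[] args) throws InterruptedException {
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        Duration delay = INITIAL_DELAY;

        while (true) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 3000);
                System.out.println("Connected to " + host + ":" + port);
                return; // in the real service the connection would now be kept alive with Ping messages
            } catch (Exception e) {
                System.out.println("Connection failed, retrying in " + delay.toSeconds() + "s");
                Thread.sleep(delay.toMillis());
                // Double the delay after every failure, up to a maximum (exponential backoff).
                delay = delay.multipliedBy(2);
                if (delay.compareTo(MAX_DELAY) > 0) {
                    delay = MAX_DELAY;
                }
            }
        }
    }
}
```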

The expelled node may try to re-join the cluster within a short time window if it can still communicate with the Admin node. If the node actually crashed, the failure-detection algorithm will eventually detect that, and the Admin node will remove the node from the cluster. If the node is restarted by an external orchestrator, it can join the cluster again, as a completely new node as far as the other nodes are concerned.

Changes in the cluster topology can happen at most every few seconds or so. Especially when using the Distributed Cache Plugin, topology changes should be kept to a minimum: each change may require nodes to rebalance data, causing a spike in traffic between nodes.

Deciding the hostname to use#

A node finds its own hostname or IP address by inspecting the available network interfaces and choosing the first non-loopback network interface available.

If it does not find a suitable one, the node reads the HOST environment variable, which is set automatically in many deployments. If that is not set either, 0.0.0.0 is used, which cannot work in real deployments but may be sufficient for testing purposes.
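The selection order can be approximated with the following sketch (an approximation of the documented behavior, not the server's actual code): first a suitable non-loopback interface, then the HOST environment variable, and finally 0.0.0.0 as a last resort:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.Collections;

public class OwnAddress {
    public static void main(String[] args) throws Exception {
        System.out.println(resolveOwnAddress());
    }

    static String resolveOwnAddress() throws Exception {
        // 1. Look for the first non-loopback network interface that has an address.
        for (NetworkInterface nic : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            if (nic.isLoopback() || !nic.isUp()) {
                continue;
            }
            for (InetAddress address : Collections.list(nic.getInetAddresses())) {
                if (!address.isLoopbackAddress()) {
                    return address.getHostAddress();
                }
            }
        }
        // 2. Fall back to the HOST environment variable, set automatically in many deployments.
        String host = System.getenv("HOST");
        if (host != null && !host.isBlank()) {
            return host;
        }
        // 3. Last resort: 0.0.0.0, which cannot work in real deployments.
        return "0.0.0.0";
    }
}
```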

Before going to production, ensure that the nodes are able to find their own addresses and report them correctly. Failing to do so may cause communication errors when nodes attempt to join the cluster.

While it is not advisable, it is possible to disable the Distributed Service completely by setting the environment variable se.curity.distributed-service.enable to false. Doing this may break important functionality of the Curity Identity Server, such as the Distributed Cache Plugin and the Admin UI’s ability to display metrics about individual nodes.

Security#

The Distributed Service requires a keystore with a key-pair in it to establish mutual TLS communication between nodes. Hence, communication is both encrypted and authenticated. Regardless, it is not advisable to expose the Distributed Service to the outside world unnecessarily (after all, its only purpose is to allow Curity nodes to communicate with each other).

By default, the keystore used is the one from the Configuration Service cluster configuration.

Hence, it is not necessary to configure a cryptographic key for the Distributed Service specifically. However, if you prefer to avoid key reuse, it is possible to configure a separate key. If doing so, remember to rotate the key periodically, as explained below.

Rotating the Distributed Service Key#

The simplest way to rotate the Distributed Service’s key is to run the generate-distributed-service configuration action.

Generating a new Distributed Service configuration, including a new key, using the CLI:

% request environments environment generate-distributed-service

This action generates a new primary key for the Distributed Service and sets the secondary-key to the previous value of the primary key (or to the Configuration cluster key if no primary key had been configured).

Consequently, the first time the action runs, the secondary-key is set to the same value as the Configuration cluster key, which allows Curity nodes where the Distributed Service was not configured explicitly to keep communicating with the nodes that use the newly generated key.

Running the action again rotates the key.
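As an illustration (the key names below are made up): before the first run, all nodes communicate using the Configuration cluster key K0. After the first run, the primary key is a newly generated K1 and the secondary-key is K0, so nodes that have not yet picked up the new configuration can still communicate. After a second run, the primary key becomes a new K2 and the secondary-key becomes K1.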

Even though it is possible to generate the keys externally and simply configure the Distributed Service to use them, it is highly recommended to use the action as explained above, because that guarantees that the keys can be used securely.
