Experimental

Distributed Cache#

The Distributed Cache data source can be used to store sessions and nonces (single-use tokens). For this reason, it may be configured as the Caching Service.

It is built on top of the Distributed Service, and hence requires that each Curity node in a cluster can communicate with every other node.

Uniquely among data sources, only one Distributed Cache Plugin can be configured. This is because the cache stores data in memory, which is a global resource, so it would not make sense to have more than one instance.

Overview#

The Distributed Cache is a distributed storage solution for short-lived, high-volume data held in memory. This provides fast access and operational simplicity: no database is required, and the cache itself evicts expired entries as necessary.

The amount of data a cluster can hold in memory scales with the size of the cluster: the more nodes, the more data can be stored simultaneously.

The nodes in the cluster attempt to divide the stored entries evenly among themselves. When the cluster topology changes (e.g. because a node crashes or new nodes are added), the data is rebalanced efficiently, so that only a small number of entries need to be redistributed between the nodes to restore an even distribution.

Notice that unless a persistent data-source is configured to back the Distributed Cache, all data is lost if all nodes in the cluster are restarted! This means that when upgrading Curity, a data-source must be configured so that when the new cluster is started up, it gradually re-populates the cache by reading from the data-source.

Configuration#

The Distributed Cache does not have any mandatory configuration values. Sensible defaults are used for everything, but for most deployments, it is recommended that each configuration setting be explicitly configured to match the available resources, as explained below.

backups#

Default: 2

The number of nodes that should store each entry, besides the owner node.

This number determines how resilient the cluster is. If backups is set to n, each entry is held by the owner plus n backup nodes, i.e. n + 1 nodes in total.

If n nodes go down at the same time, no data is lost, since each entry is held by n + 1 nodes. However, if n + 1 or more nodes go down simultaneously, data may be lost, since there is no guarantee that every entry was stored on one of the surviving nodes.
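
The relationship between the backups setting and resilience can be sketched as follows. This is a hypothetical illustration of the arithmetic above, not part of the product; both function names are invented:

```python
# Hypothetical helpers illustrating the "backups" resilience guarantee.

def copies_per_entry(backups: int) -> int:
    """Each entry is held by its owner plus `backups` backup nodes."""
    return backups + 1

def max_safe_failures(backups: int) -> int:
    """Largest number of simultaneous node failures that cannot lose data."""
    # Losing backups + 1 nodes may lose entries held only by those nodes.
    return backups

# With the default backups = 2, each entry exists on 3 nodes,
# so any 2 nodes can fail simultaneously without data loss.
assert copies_per_entry(2) == 3
assert max_safe_failures(2) == 2
```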

When using a backing Bucket to store data, no data will be lost, even if all nodes go down simultaneously.

cleanup-period#

Default: 300 seconds

The period between cleanup calls in seconds. Cleanup calls evict expired entries from the cache.

Cleaning up the cache only removes in-memory data. Persisted data in the Bucket data-source, if one is configured, is not deleted; however, entries that had expired before the cleanup operation ran are ignored on subsequent reads. Bucket data-sources may already support automatic cleanup, but in the future, expired Bucket entries may also be optionally removed.

max-entries-per-message#

Default: 1024

The maximum number of entries to send to another node in a single message.

A node may need to send many entries at once to another node when, for example, rebalancing the data due to a topology change. In such cases, allowing too many entries per message may cause problems due to the memory overhead of each message and the large size of each resulting request. On the other hand, sending only a few entries per message may make the transfer take too long to complete.

The ideal number for this setting depends on the total number of entries each node is expected to hold, as well as the total number of nodes in the cluster.
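
As a back-of-the-envelope aid for tuning this setting, the following sketch (an illustrative helper, not part of the product) estimates how many messages a node sends when transferring a given number of entries in batches:

```python
import math

# Hypothetical estimate: number of messages needed to transfer
# `entries_to_send` entries in batches of `max_entries_per_message`.

def messages_needed(entries_to_send: int, max_entries_per_message: int) -> int:
    return math.ceil(entries_to_send / max_entries_per_message)

# A node transferring 100,000 entries with the default batch size of 1024
# sends 98 messages; halving the batch size doubles the message count.
assert messages_needed(100_000, 1024) == 98
assert messages_needed(100_000, 512) == 196
```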

enable-metrics#

Default: true

Whether metrics for the cache should be published.

The following metrics are published by each node, if enabled:

| Metric Name | Type | Labels | Meaning |
| --- | --- | --- | --- |
| idsvr_cache_get_successful | Timer | cache_type, cache_id | Successful cache reads. |
| idsvr_cache_get_error | Timer | cache_type, cache_id | Failed cache reads. |
| idsvr_cache_update_successful | Timer | cache_type, cache_id | Successful cache updates. |
| idsvr_cache_update_error | Timer | cache_type, cache_id | Failed cache updates. |
| idsvr_cache_remove_successful | Timer | cache_type, cache_id | Successful cache removals. |
| idsvr_cache_remove_error | Timer | cache_type, cache_id | Failed cache removals. |
| idsvr_cache_entries_count | Gauge | cache_type, cache_id | Total entries currently in the cache. |
| idsvr_cache_expired_total | Counter | cache_type, cache_id | Total entries that have been automatically expired. |

store-data-in-admin-node#

Default: true

Whether the admin node should store cache data. When disabled, data is only distributed among runtime nodes.

In deployments where the admin node is not used to handle user requests, we recommend turning this off.

persistence#

Enable persistence to back the in-memory cache.

Only the owner node for each entry writes to and reads from the persistent store. Backup nodes hold in-memory copies only. This means that if the owner node goes down, backups still serve the entry from memory, but persistence is temporarily unguarded until a rebalance assigns a new owner.

Requires the following configuration settings:

persistence-bucket/data-source/id#

The ID of a Bucket data-source used to persist the data that is stored in the cache.

persistence-mode#

  • read-mode (default: verify-against-persistent-data) - one of:
    • from-memory - Do not check persistent data if in-memory data exists. This mode offers higher performance, but may give inconsistent data if the in-memory data does not reflect the persisted data for a period of time.
    • verify-against-persistent-data - Always read from persistent data and check that it matches in-memory data. In case it does not match, a warning is logged. This mode is recommended for debugging purposes only.
  • write-mode (default: write-blocking) - one of:
    • write-async - Write data to the persistence layer asynchronously. Failures will be retried a few times within a short period. This mode may cause inconsistencies in the data because the write may happen much later, after the write call has already returned, but has higher performance.
    • write-blocking - Block all writes until the data has been persisted. This mode is slower, but ensures data consistency.
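
The trade-off between the two write modes can be sketched conceptually as follows. This is not Curity's implementation; the class and the in-memory stand-in for the Bucket data-source are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Conceptual sketch of the two persistence write modes:
# write-blocking persists before the call returns (consistent, slower);
# write-async returns immediately and persists in the background (faster,
# but the persisted state may lag behind memory for a while).

class CacheWithPersistence:
    def __init__(self, mode: str):
        self.mode = mode          # "write-blocking" or "write-async"
        self.memory = {}          # in-memory entries
        self.persisted = {}       # stands in for the Bucket data-source
        self.executor = ThreadPoolExecutor(max_workers=1)

    def _persist(self, key, value, attempts=3):
        for _ in range(attempts):  # retry a few times on failure
            try:
                self.persisted[key] = value
                return
            except OSError:
                time.sleep(0.01)

    def put(self, key, value):
        self.memory[key] = value
        if self.mode == "write-blocking":
            self._persist(key, value)  # persisted before put() returns
        else:
            self.executor.submit(self._persist, key, value)  # deferred

cache = CacheWithPersistence("write-blocking")
cache.put("session-1", {"user": "alice"})
assert cache.persisted["session-1"] == {"user": "alice"}
```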

The full configuration reference can be found here.

How the Distributed Cache works#

The Admin Node is responsible for managing the cluster topology. When a node joins the cluster, it may take a few seconds for the Admin Node to accept the "join request" and update the cluster topology with one or more new nodes. The same applies when removing nodes, whether because the failure-detection algorithm determines that a node is down, or because a node requests to leave the cluster as it is being shut down gracefully.

For more information about cluster formation and resilience, see the Distributed Service documentation.

Entries are assigned to nodes using Rendezvous Hashing. For each entry, every node in the cluster is scored by hashing the combination of the entry’s ID and the node’s identifier. Nodes are then ranked by their score, and the highest-scoring node becomes the entry’s owner. The next backups nodes in the ranking become backup holders. Because the algorithm is deterministic and stateless, all nodes independently agree on which nodes should hold a given entry without any coordination.
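
A minimal rendezvous (highest-random-weight) hashing sketch illustrates how every node can independently compute an entry's holders. The node names, entry IDs, and the choice of SHA-256 below are illustrative assumptions, not Curity's internals:

```python
import hashlib

def score(entry_id: str, node: str) -> int:
    # Deterministic per-(entry, node) score; SHA-256 here is illustrative.
    digest = hashlib.sha256(f"{entry_id}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def holders(entry_id: str, nodes: list, backups: int) -> list:
    # Rank all nodes by score: the first is the owner, the rest are backups.
    ranked = sorted(nodes, key=lambda n: score(entry_id, n), reverse=True)
    return ranked[: backups + 1]

nodes = ["RT1", "RT2", "RT3"]
owner, *backup_nodes = holders("nonce-abc123", nodes, backups=1)
assert len(backup_nodes) == 1 and owner not in backup_nodes

# Adding a node changes an entry's holders only if the new node ranks
# among them; the relative order of the existing nodes is unchanged,
# which is why only a small number of entries move on a topology change.
before = {e: holders(e, nodes, backups=1) for e in ("a", "b", "c", "d")}
after = {e: holders(e, nodes + ["RT4"], backups=1) for e in before}
for e in before:
    assert set(after[e]) - {"RT4"} <= set(before[e])
```

Because the function is deterministic and takes only the entry ID and the node list as input, no coordination or shared lookup table is needed for nodes to agree on placement.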

When an entry is inserted, the inserting node computes the target nodes (owner + backups) and sends the entry to all of them directly. If the inserting node is itself among the targets, the entry is written to its local cache without a network round-trip.

When a node is temporarily down (i.e. the connection is currently failing, but the node is not yet removed from the cluster), other nodes keep track of which nodes have missed write operations. Deletions are also tracked via tombstones so they can be replayed. As soon as the connection to the affected node recovers, a targeted resync is triggered: each node sends the recovering node all entries it should hold (according to the current topology), as well as any tracked deletions.
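
The missed-write and tombstone tracking described above can be sketched as follows. This is an assumed, heavily simplified design for illustration; the class and method names are invented:

```python
# Conceptual sketch: while a peer is unreachable, a node records the
# writes and deletions (tombstones) the peer missed, and replays them
# against the peer's cache once the connection recovers.

class PeerReplicaTracker:
    def __init__(self):
        self.pending_writes = {}  # key -> latest value the peer missed
        self.tombstones = set()   # keys deleted while the peer was down

    def record_write(self, key, value):
        self.tombstones.discard(key)      # a new write supersedes a deletion
        self.pending_writes[key] = value  # only the latest value matters

    def record_delete(self, key):
        self.pending_writes.pop(key, None)
        self.tombstones.add(key)

    def resync(self, peer_cache: dict):
        """Replay missed operations against the recovered peer's cache."""
        peer_cache.update(self.pending_writes)
        for key in self.tombstones:
            peer_cache.pop(key, None)
        self.pending_writes.clear()
        self.tombstones.clear()

tracker = PeerReplicaTracker()
tracker.record_write("s1", "alice")
tracker.record_write("s2", "bob")
tracker.record_delete("s1")       # deletion tracked as a tombstone
peer = {"s1": "stale"}
tracker.resync(peer)
assert peer == {"s2": "bob"}      # stale entry removed, missed write applied
```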

If the node takes too long to recover, it is removed from the cluster by the Admin Node, in which case the remaining nodes perform a full rebalance operation to redistribute the data. The algorithm guarantees that only a minimum number of entries need to change location when this happens. However, for large caches, this can still involve significant amounts of data being sent between nodes, hence topology changes should be kept to a minimum.

Example cluster rebalancing data#

The following figures show how nodes may store data and how a data rebalancing operation may be performed when the cluster topology changes:

Example initial cluster state

In this example, as the number of backups is configured to 1, each entry is stored in 2 nodes. Entries with the same ID are represented with the same color, and a gray line is drawn between them.

If we add a new node RT4 to the cluster, without changing the actual data, the cluster may look like this after rebalancing:

Cluster state after a node has been added

The green lines show the entries that had to be moved to another node. In this case, only two entries were moved:

  • RT2 sent a copy of the dark-green entry to RT4. RT3 no longer has to keep it locally since only 2 copies are needed in the whole cluster, so it can delete it from memory.
  • RT2 sent a copy of the purple entry to RT4. RT1 no longer has to keep it locally, so it deletes it from memory.

Notice that before rebalancing, each node had to hold between 3 and 4 entries. After the new node was added, most nodes hold only 3 entries, with the new node happening to only hold 2.

If the topology changes while a data rebalancing operation is in progress, data may be lost, since the in-progress rebalance must be cancelled (it may have become impossible to communicate with nodes that were supposed to receive data). However, this should be highly unlikely, because the rebalance operation minimizes the amount of data that has to be transferred and hence should complete quickly. Also, the Admin Node does not change the cluster topology too frequently, which should give the nodes enough time to finalize rebalancing operations.

Now, if another node gets removed from the cluster, say RT1, the cluster may end up looking like this:

Cluster state after a node is removed

With the cluster back to having only 4 nodes, each node again has to hold a little bit more data than before.

Notice that as soon as the cluster rebalances the data, the number of backups for each entry is upheld again.
