The current distributed cache implementation is built on top of https://infinispan.org[Infinispan], a high-performance, distributable in-memory data grid.
When you start {project_name} in production mode, by using the `start` command, caching is enabled and all {project_name} nodes in your network are discovered.
{project_name} allows you to either choose from a set of pre-defined default transport stacks, or to define your own custom stack, as you will see later in this {section}.
When you start {project_name} in development mode, by using the `start-dev` command, {project_name} uses only local caches and distributed caches are completely disabled by implicitly setting the `--cache=local` option.
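The two start modes can be compared side by side. This is a minimal sketch that assumes the standard `bin/kc.sh` launcher. Other options a production start requires are left out.

[source,bash]
----
# Development mode: only local caches are used, as if --cache=local had been passed.
bin/kc.sh start-dev

# Production mode: distributed caches are enabled and nodes are discovered.
# Hostname, TLS and database options are omitted here for brevity.
bin/kc.sh start
----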
To achieve optimal runtime performance and avoid additional round-trips to the database, review
the configuration of each cache to make sure the maximum number of entries is aligned with the size of your database. The more entries
you can cache, the less often the server needs to fetch data from the database. Evaluate the trade-offs between memory utilization and performance.
When one {project_name} node updates data in the shared database, all other nodes need to be aware of it, so they invalidate that data from their caches.
Authentication sessions are created whenever a user tries to authenticate. They are automatically destroyed once the authentication process
completes or when they reach their expiration time.
The `authenticationSessions` distributed cache is used to store authentication sessions and any other data associated with them
during the authentication process.
By relying on a distributable cache, authentication sessions are available to any node in the cluster so that users can be redirected
to any node without losing their authentication state. However, production-ready deployments should always consider session affinity and favor redirecting users
to the node where their sessions were initially created. By doing that, you are going to avoid unnecessary state transfer between nodes and improve
authenticate to any application without being asked for their credentials again. For each application, the user authenticates with a client session, so that the server can track the applications the user is authenticated with and their state on a per-application basis.
User and client sessions are automatically destroyed whenever the user performs a logout, the client performs a token revocation, or when they reach their expiration time.
By relying on a distributable cache, cached user and client sessions are available to any node in the cluster so that users can be redirected
to any node without the need to load session data from the database. However, production-ready deployments should always consider session affinity and favor redirecting users
These in-memory caches for user sessions and client sessions are limited to 10000 entries per node by default, which reduces the overall memory usage of {project_name} for larger installations.
The internal caches will run with only a single owner for each cache entry.
As an OpenID Connect Provider, the server is capable of authenticating users and issuing offline tokens. When issuing an offline token after successful authentication, the server creates an offline user session and offline client session.
The following caches are used to store offline sessions:
* offlineSessions
* offlineClientSessions
Like the user and client session caches, the offline user and client session caches are limited to 10000 entries per node by default. Items that are evicted from memory are loaded on demand from the database when needed.
.Password brute force detection
The `loginFailures` distributed cache is used to track data about failed login attempts.
This cache is needed for the Brute Force Protection feature to work in a multi-node {project_name} setup.
.Action tokens
Action tokens are used for scenarios when a user needs to confirm an action asynchronously, for example in the emails sent by the forgot password flow.
The `actionTokens` distributed cache is used to track metadata about action tokens.
TIP: You can see the applied Infinispan configuration in the logs by configuring `--log-level=info,org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory:debug`.
{project_name} automatically creates all required caches with the expected configurations. You can add additional caches or override the default cache configurations in `conf/cache-ispn.xml` or in your own file provided via `--cache-config-file`.
To see the applied Infinispan configuration in the logs, configure `--log-level=info,org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory:debug`.
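As an illustration, a custom cache file and the debug logger can be combined in one start command. This is a sketch only: `my-cache-file.xml` is a hypothetical file name, the `bin/kc.sh` launcher is assumed, and other production options are omitted.

[source,bash]
----
# Start with a custom cache configuration file (hypothetical name) and
# log the Infinispan configuration that is actually applied.
bin/kc.sh start \
  --cache-config-file=my-cache-file.xml \
  --log-level=info,org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory:debug
----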
To configure {project_name} for high availability and multi-node clustered setups, the CLI options `cache-remote-host`, `cache-remote-port`, `cache-remote-username` and `cache-remote-password` were introduced to simplify the configuration within the XML file.
The CLI options `cache-remote-username` and `cache-remote-password` are optional and, if not set, {project_name} will connect to the {jdgserver_name} server without presenting any credentials.
In that case, if the {jdgserver_name} server has authentication enabled, {project_name} will fail to start.
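The remote options might be combined as in the following sketch. The host name and the environment variables holding the credentials are hypothetical. The port `11222` is assumed to be the Hot Rod port of the {jdgserver_name} server. Depending on your version, additional options or features not shown here may be required.

[source,bash]
----
# REMOTE_USER and REMOTE_PASSWORD are hypothetical environment variables
# that hold the credentials for the remote server.
bin/kc.sh start \
  --cache-remote-host=infinispan.example.com \
  --cache-remote-port=11222 \
  --cache-remote-username="$REMOTE_USER" \
  --cache-remote-password="$REMOTE_PASSWORD"
----

Leaving out the last two options makes {project_name} connect without credentials, which only works if the remote server does not require authentication.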
Configuring {project_name} to be aware of your network topology increases data availability in the presence of hardware failures, as Infinispan is able to ensure that data is distributed correctly.
For example, if `num_owners=2` is configured for a cache, Infinispan ensures that the two owners are not placed on the same node when possible.
[NOTE]
====
By default, user and client sessions are safely stored in the database, and they are not affected by these settings.
The remaining distributed caches are affected by this configuration.
====
The following topology information is available to configure:
Site name::
If your {project_name} cluster is deployed between different datacenters, use this option to ensure the data replicas are stored in a different datacenter.
It prevents data loss if a datacenter goes offline or fails.
+
Use the SPI option `spi-cache-embedded--default--site-name` (or environment variable `KC_SPI_CACHE_EMBEDDED\__DEFAULT__SITE_NAME`).
The value itself is not important, but each datacenter must have a unique value.
+
For example: `--spi-cache-embedded--default--site-name=site-1`
Rack name::
If your {project_name} cluster is running in different racks on your datacenter, set this option to ensure the data replicas are stored in a different physical rack.
It prevents data loss if a rack is suddenly disconnected or fails.
+
Use the SPI option `spi-cache-embedded--default--rack-name` (or environment variable `KC_SPI_CACHE_EMBEDDED\__DEFAULT__RACK_NAME`).
The value itself is not important, but each rack must have a unique value.
+
For example: `--spi-cache-embedded--default--rack-name=rack-1`
Machine name::
If you have multiple {project_name} instances running on the same physical machine (using virtual machines or containers for example), use this option to ensure the data replicas are stored in different physical machines.
It prevents data loss if a physical machine fails.
+
Use the SPI option `spi-cache-embedded--default--machine-name` (or environment variable `KC_SPI_CACHE_EMBEDDED\__DEFAULT__MACHINE_NAME`).
The value itself is not important, but each machine must have a unique value.
+
For example: `--spi-cache-embedded--default--machine-name=machine-1`
+
[NOTE]
====
The {project_name} Operator automatically configures the machine name based on the Kubernetes node.
It ensures that if multiple Pods are scheduled on the same node, data replicas are still stored on distinct nodes when possible.
We recommend setting up anti-affinity rules and/or topology spread constraints to prevent multiple Pods from being scheduled on the same node, further reducing the risk of a single node failure causing data loss.
====
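The three topology options can also be set together for one node. The following sketch reuses the example values from above and assumes the `bin/kc.sh` launcher. Each node would use the values that describe its own location.

[source,bash]
----
# Describe where this node runs: datacenter, rack and physical machine.
bin/kc.sh start \
  --spi-cache-embedded--default--site-name=site-1 \
  --spi-cache-embedded--default--rack-name=rack-1 \
  --spi-cache-embedded--default--machine-name=machine-1

# The same settings expressed as environment variables.
export KC_SPI_CACHE_EMBEDDED__DEFAULT__SITE_NAME=site-1
export KC_SPI_CACHE_EMBEDDED__DEFAULT__RACK_NAME=rack-1
export KC_SPI_CACHE_EMBEDDED__DEFAULT__MACHINE_NAME=machine-1
----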
The default stack is set to `jdbc-ping` when distributed caches are enabled, which is backwards compatible with the defaults in the 26.x release stream of {project_name}.
|===
|Stack name|Transport protocol|Discovery

|`kubernetes` (deprecated)|TCP|DNS resolution using the JGroups `DNS_PING` protocol. It requires `jgroups.dns.query` to be set to the headless service FQDN.
|`tcp` (deprecated)|TCP|IP multicast using the JGroups `MPING` protocol. See below on how to configure a unique `jgroups.mcast_addr` or `jgroups.mcast_port` for each cluster.
|`udp` (deprecated)|UDP|IP multicast using the JGroups `PING` protocol. See below on how to configure a unique `jgroups.mcast_addr` or `jgroups.mcast_port` for each cluster.
|===
When using the `tcp`, `udp` or `jdbc-ping-udp` stack, each cluster must use a different multicast address and/or port so that their nodes form distinct clusters.
By default, {project_name} uses `239.6.7.8` as the multicast address for `jgroups.mcast_addr` and `46655` as the multicast port for `jgroups.mcast_port`.
NOTE: Use `-D<property>=<value>` to pass the properties via the `JAVA_OPTS_APPEND` environment variable or in the CLI command.
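For instance, a second cluster on the same network could be moved away from the defaults as sketched below. The address `239.6.7.9` and port `46656` are arbitrary example values. The `--cache-stack` option is assumed here as the way to select the stack, and the `bin/kc.sh` launcher is assumed as well.

[source,bash]
----
# The first cluster keeps the defaults (239.6.7.8 and 46655).
# The second cluster uses a different multicast address and port.
export JAVA_OPTS_APPEND="-Djgroups.mcast_addr=239.6.7.9 -Djgroups.mcast_port=46656"
bin/kc.sh start --cache-stack=udp
----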
Please refer to {infinispan_embedding_docs}#cluster-transport[Setting up Infinispan cluster transport] and {infinispan_embedding_docs}#customizing-jgroups-stacks_cluster-transport[Customizing JGroups stacks] for further documentation.
. Set the option `cache-embedded-mtls-enabled` to `false`.
. Follow the documentation in http://jgroups.org/manual5/index.html#ENCRYPT[JGroups Encryption documentation] and {infinispan_embedding_docs}#secure-cluster-transport[Encrypting cluster transport].
With TLS enabled, {project_name} auto-generates a self-signed 2048-bit RSA certificate and uses TLS 1.3 to secure the communication.
The keys and the certificate are stored in the database so they are available to all nodes.
By default, the certificate is valid for 60 days and is rotated at runtime every 30 days.
Use the option `cache-embedded-mtls-rotation-interval-days` to change this.
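As a sketch, the rotation interval could be shortened as follows. The value `15` is only an example, and the `bin/kc.sh` launcher is assumed.

[source,bash]
----
# Keep mTLS enabled for the embedded caches and rotate the
# auto-generated certificate every 15 days instead of every 30.
bin/kc.sh start \
  --cache-embedded-mtls-enabled=true \
  --cache-embedded-mtls-rotation-interval-days=15
----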
When using a service mesh like Istio, you might need to allow direct mTLS communication between the {project_name} Pods for mutual authentication to work.
Otherwise, you might see error messages like `JGRP000006: failed accepting connection from peer SSLSocket` that indicate that a wrong certificate was presented, and the cluster will not form correctly.
You then have the option to allow direct mTLS communication between the {project_name} Pods, or rely on the service mesh transport security to encrypt the communication and to authenticate peers.
To allow direct mTLS communication for {project_name} when using Istio:
* Apply the following configuration to allow direct communication.
Although not recommended for standard setups, if it is essential in a specific setup, you can configure the keystore with the certificate for the transport stack manually. `cache-embedded-mtls-key-store-file` sets the path to the keystore, and `cache-embedded-mtls-key-store-password` sets the password to decrypt it.
The truststore contains the valid certificates to accept connections from, and it can be configured with `cache-embedded-mtls-trust-store-file` (path to the truststore), and `cache-embedded-mtls-trust-store-password` (password to decrypt it).
To restrict unauthorized access, always use a self-signed certificate for each {project_name} deployment.
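A manual keystore and truststore setup might look like the following sketch. The file paths and the environment variables holding the passwords are hypothetical, and the `bin/kc.sh` launcher is assumed.

[source,bash]
----
# KEYSTORE_PASSWORD and TRUSTSTORE_PASSWORD are hypothetical environment
# variables. The file paths are placeholders for your own stores.
bin/kc.sh start \
  --cache-embedded-mtls-enabled=true \
  --cache-embedded-mtls-key-store-file=/path/to/keystore.p12 \
  --cache-embedded-mtls-key-store-password="$KEYSTORE_PASSWORD" \
  --cache-embedded-mtls-trust-store-file=/path/to/truststore.p12 \
  --cache-embedded-mtls-trust-store-password="$TRUSTSTORE_PASSWORD"
----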
For more details about JGroups transport, check the http://jgroups.org/manual5/index.html#Transport[JGroups documentation page] or the {infinispan_embedding_docs}#cluster-transport[Infinispan documentation page].
If you run {project_name} instances on different networks, for example behind firewalls or in containers, the different instances will not be able to reach each other by their local IP address.
In such a case, set up a port forwarding rule (sometimes called "`virtual server`") to their local IP address.
This section provides methods to verify that your {project_name} cluster has formed correctly and that network communication between instances is functioning as expected.
It is crucial to perform these checks after deployment to ensure high availability and data consistency.
To verify if the cluster is formed properly, check one of these locations:
* Admin UI
+
Access the {project_name} Web UI, typically available at `++https://<your-host>/admin/master/console/#/master/providers++`.
Under the *Provider Info* section, locate the *connectionsInfinispan* entry.
Click on *Show more* to expand its details.
You should find information about the cluster status and the health of individual caches.
+
image:server/infinispan_info.png[Infinispan Cluster Information in Web UI]
* Logs
+
Infinispan logs a cluster view every time a new instance joins or leaves the cluster.
Search for log entries with the ID `ISPN000094`.
+
A healthy cluster view will show all expected nodes.
For example:
+
[source,text]
----
ISPN000094: Received new cluster view for channel ISPN: [node1-26186|1] (2) [node1-26186, node2-37007]
----
+
This log entry indicates that the cluster named "ISPN" currently has 2 nodes: `node1-26186` and `node2-37007`.
The `(2)` confirms the total number of nodes in the cluster.
* Metrics
+
{project_name} exposes Infinispan metrics via a Prometheus endpoint, which can be visualized in tools like Grafana.
The metric `vendor_cluster_size` shows the current number of instances in the cluster.
You should verify that this metric matches the expected number of running instances configured in your cluster.
+
Refer to <@links.observability id="metrics-for-troubleshooting-clustering-and-network" anchor="_cluster_size"/> for more information.
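The log check described above can be scripted. The following sketch defines a hypothetical helper, `latest_cluster_size`, that prints the member count from the most recent `ISPN000094` entry in a log file. It assumes the log line format shown in the example above. The log file path in the usage comment is a placeholder.

[source,bash]
----
#!/usr/bin/env bash

# Print the member count reported by the most recent ISPN000094
# cluster view in the given log file. Prints nothing if no cluster
# view has been logged.
latest_cluster_size() {
  local log_file="$1"
  grep 'ISPN000094' "$log_file" \
    | tail -n 1 \
    | sed -E 's/.*\] \(([0-9]+)\) \[.*/\1/'
}

# Usage (placeholder path):
#   latest_cluster_size /path/to/keycloak.log
----

Comparing the printed number with the number of instances you expect gives a quick check after a deployment or a rolling restart.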
To enable histograms for the cache metrics, set `cache-metrics-histograms-enabled` to `true`.
While these metrics provide more insight into the latency distribution, collecting them might have a performance impact, so be cautious about enabling them in an already saturated system.