Customizing Keycloak Using SPI

1. Introduction and Background

The most important factor in designing the project's authentication and authorization system was the reliability of its security.

In particular, hospital systems require much stricter security standards than general services.

Since we handle patient personal information and medical data, the stability and scalability of the authentication system were very important.

In the initial stage, we also considered implementing our own authentication system.

However, an authentication system covers far more than simple login functionality: it includes session management, token issuance, access control, and responses to security vulnerabilities.

In particular, because the system had to reliably support modern authentication standards and features such as OAuth2, OpenID Connect, and SSO, we decided to use a proven authentication solution.

The solution we chose in that process was Keycloak.

Keycloak is an open source IAM (Identity and Access Management) solution originally developed by Red Hat that natively supports OAuth2, OpenID Connect, SAML, and more.

Above all, its track record as a stable, widely adopted platform was a significant advantage.

However, as I started working on the actual project, issues began to arise that could not be resolved with the default settings alone.

In particular, the project needed more than simple user authentication; it also had complex requirements such as the following.

- Multi-tenancy-based permission structure
- User separation by hospital
- Granular RBAC (Role Based Access Control)
- External hospital data-based user verification
- Real-time identity verification process
- Integration of internal service business logic

The basic Keycloak structure offers strong features, but it is geared toward managing users stored inside Keycloak itself.

On the other hand, the project needed to control user data organically within the service business flow, rather than simply storing users within Keycloak.

Especially due to the nature of the hospital project, not every user could register as a member.

This was because only users verified as patients or staff of the hospital should be allowed access.

In other words, validation logic that cross-references user information with external hospital data in real time was essential during the registration process.

To address these requirements, it was necessary to extend the internal authentication pipeline of Keycloak itself, and eventually, a customization structure utilizing SPI (Service Provider Interface) was adopted.

2. What is SPI (Service Provider Interface)?

SPI (Service Provider Interface) refers to an interface structure designed to extend the functionality of a framework from the outside.

The API (Application Programming Interface) that we use every day is an interface through which a developer's code calls the functions of an external system.

SPI works in the opposite direction.

The framework allows developers to write their own implementations based on pre-defined rules and interfaces, and the engine automatically finds and executes them.

In other words, SPI can be seen as a 'plugin extension structure'.
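The inversion described above can be illustrated with a minimal sketch in plain Java. It is a simplified model of the idea, not Keycloak code: the names VerifierProvider, HospitalVerifier, and PluginEngine are hypothetical, and providers are registered by hand here, whereas real SPI engines typically discover them automatically (for example through java.util.ServiceLoader).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Contract published by the framework (hypothetical name).
interface VerifierProvider {
    String id();
    boolean verify(String userInput);
}

// Plugin written by an application developer against that contract.
class HospitalVerifier implements VerifierProvider {
    @Override
    public String id() {
        return "hospital-verifier";
    }

    @Override
    public boolean verify(String userInput) {
        // Placeholder rule for the sketch: accept IDs that start with "H-".
        return userInput != null && userInput.startsWith("H-");
    }
}

// The engine knows only the interface, never the concrete plugin class.
class PluginEngine {
    private final Map<String, VerifierProvider> providers = new LinkedHashMap<>();

    // Manual registration keeps the sketch self-contained (assumption:
    // a real engine would find providers on its own at startup).
    void register(VerifierProvider provider) {
        providers.put(provider.id(), provider);
    }

    boolean run(String providerId, String userInput) {
        VerifierProvider provider = providers.get(providerId);
        if (provider == null) {
            throw new IllegalArgumentException("No provider registered: " + providerId);
        }
        return provider.verify(userInput); // the framework calls the plugin
    }
}

class SpiDemo {
    public static void main(String[] args) {
        PluginEngine engine = new PluginEngine();
        engine.register(new HospitalVerifier());
        System.out.println(engine.run("hospital-verifier", "H-1001"));
        System.out.println(engine.run("hospital-verifier", "X-1001"));
    }
}
```

The point of the design is that PluginEngine never changes when a new verifier is added; only a new implementation of the interface is supplied.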

Keycloak is designed around this SPI-based architecture.

Thanks to this, developers can freely extend the authentication flow without directly modifying the core source code of Keycloak.

For example, the following functions can be extended through SPI.

- User Storage
- Authentication
- Event Listener
- Protocol Mapper
- Password Policy
- User registration flow

The project specifically utilized the Authentication SPI. The implementation method is relatively straightforward.

After implementing a custom class according to the interface specifications provided by Keycloak, build it as a JAR file and deploy it to the Keycloak server.

Once the Keycloak server is started, it automatically scans for the SPI implementation internally and recognizes it as a new authentication module.

In other words, the logic written by the developer behaves like a built-in feature of Keycloak.

The biggest advantage of this structure is that it does not require modifications to the internal engine.

In other words, you can keep the customization code and engine code separate even when upgrading the Keycloak version.

Thanks to this structure, the project was able to seamlessly integrate the hospital system's specific authentication requirements into the standard authentication platform.

3. The mechanism of the Authentication Flow

The authentication system of Keycloak operates on a pipeline structure called the Authentication Flow.

When a user performs login or registration, Keycloak executes the predefined authentication steps in order.

For example, a typical login flow may consist of the following steps.

- Enter user information
- Password verification
- OTP verification
- Additional authentication procedure
- Token issuance

Each stage is managed as a unit called Execution.
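As a rough mental model of this pipeline, the sketch below runs a list of steps in order and stops at the first one that fails. It is plain Java, not the Keycloak engine: FlowStep and SimpleFlow are hypothetical names, the checks are placeholders, and every step is treated as mandatory, whereas real flows also let each execution be marked with a requirement such as required or alternative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for one step ("execution") of a flow.
final class FlowStep {
    final String name;
    final Predicate<Map<String, String>> check;

    FlowStep(String name, Predicate<Map<String, String>> check) {
        this.name = name;
        this.check = check;
    }
}

// Runs its steps in the configured order and stops at the first failure.
class SimpleFlow {
    private final List<FlowStep> steps = new ArrayList<>();
    private final List<String> executed = new ArrayList<>();

    SimpleFlow add(String name, Predicate<Map<String, String>> check) {
        steps.add(new FlowStep(name, check));
        return this;
    }

    boolean run(Map<String, String> form) {
        executed.clear();
        for (FlowStep step : steps) {
            executed.add(step.name);
            if (!step.check.test(form)) {
                return false; // later steps are never reached
            }
        }
        return true;
    }

    List<String> executedSteps() {
        return new ArrayList<>(executed);
    }
}

class FlowDemo {
    public static void main(String[] args) {
        SimpleFlow flow = new SimpleFlow()
            .add("password", f -> "secret".equals(f.get("password")))
            .add("hospital-verification", f -> f.containsKey("hospitalId"))
            .add("otp", f -> "123456".equals(f.get("otp")));

        Map<String, String> form = new HashMap<>();
        form.put("password", "secret");
        form.put("otp", "123456");

        // No hospitalId was submitted, so the flow halts at the custom step.
        System.out.println(flow.run(form) + " " + flow.executedSteps());
    }
}
```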

The project was implemented by inserting a custom validation step in the middle of this Authentication Flow.

First, we implemented an SPI-based authentication class and built it in JAR format to deploy it to the Keycloak server.

After deployment, the admin console recognized the custom logic as a new authentication execution.

Later, it could be directly inserted into the existing authentication flow on the admin screen.

In other words, it was possible to seamlessly integrate the hospital verification logic into the standard login process.

When a real user attempts to log in or sign up, Keycloak executes each authentication step in the configured order. When a custom step is invoked, Keycloak passes it an AuthenticationFlowContext object. This object is a very important tool because it can control the entire state of the authentication flow.

In the project, the AuthenticationFlowContext was used to perform the following tasks.

- Retrieve user input data
- Hospital DB integration
- Patient information validation
- Employee information validation
- Tenant permission validation
- Successful/Failed Authentication Handling

For example, the hospital identification number and personal information entered by the user were compared in real time with an external hospital system to verify whether the user was an actual patient or staff member.

On successful validation, context.success() was called to proceed to the next authentication step, and on failure, an error message was returned to halt the authentication.
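The decision logic of such a step can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: StepContext is a hypothetical stand-in for the small part of AuthenticationFlowContext the sketch needs, HospitalDirectory is a hypothetical client for the external hospital system, and the form field names hospitalId and birthDate are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the parts of AuthenticationFlowContext used here.
class StepContext {
    private final Map<String, String> form;
    private String outcome = "PENDING";
    private String error;

    StepContext(Map<String, String> form) {
        this.form = form;
    }

    String formValue(String key) {
        return form.get(key);
    }

    void success() {
        outcome = "SUCCESS";
    }

    void failure(String message) {
        outcome = "FAILURE";
        error = message;
    }

    String outcome() {
        return outcome;
    }

    String error() {
        return error;
    }
}

// Hypothetical client for the external hospital system.
interface HospitalDirectory {
    boolean isPatientOrStaff(String hospitalId, String birthDate);
}

class HospitalVerificationStep {
    private final HospitalDirectory directory;

    HospitalVerificationStep(HospitalDirectory directory) {
        this.directory = directory;
    }

    void authenticate(StepContext context) {
        String hospitalId = context.formValue("hospitalId");
        String birthDate = context.formValue("birthDate");

        // Reject incomplete input before calling the external system.
        if (isBlank(hospitalId) || isBlank(birthDate)) {
            context.failure("Missing hospital ID or birth date");
            return;
        }
        if (directory.isPatientOrStaff(hospitalId, birthDate)) {
            context.success(); // continue with the next step of the flow
        } else {
            context.failure("No matching patient or staff record");
        }
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }
}

class VerificationDemo {
    public static void main(String[] args) {
        // In-memory directory used only for this demo.
        HospitalDirectory directory =
            (id, birth) -> "H-1001".equals(id) && "1990-01-01".equals(birth);
        HospitalVerificationStep step = new HospitalVerificationStep(directory);

        Map<String, String> form = new HashMap<>();
        form.put("hospitalId", "H-1001");
        form.put("birthDate", "1990-01-01");
        StepContext context = new StepContext(form);
        step.authenticate(context);
        System.out.println(context.outcome());
    }
}
```

Keeping the external lookup behind an interface makes the step easy to exercise without a live hospital system, which matters when the real check is a remote call.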

Through this structure, it was possible to maintain the basic authentication flow of Keycloak while flexibly inserting project-specific business logic.

4. Technical Challenges: Data Consistency Issues in Distributed Environments

One of the most difficult problems during the actual implementation was data consistency in distributed environments.

In particular, the DI (Duplication Information) values used in the hospital user identification process had to be kept safe between various authentication stages.

Initially, we intended to use the AuthNote feature provided internally by Keycloak.

AuthNote is a temporary storage space for sharing data between stages within an authentication session.

In a single server environment, it operated relatively stably. However, the actual operating environment was a multi-server based cluster structure.

The issue occurred during the external security module authentication process.

The user is redirected to the external authentication server during the authentication process and then returns to Keycloak, at which point the returning request may be handled by a different node.

As a result, the following issues occurred repeatedly.

- AuthNote data loss
- Session consistency mismatch
- Authentication status reset
- Failed to retrieve DI value
- Authentication interrupted
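The failure mode in the list above can be reproduced with a toy model. The sketch assumes that a note written on one node is visible only on that node; it says nothing about how Keycloak stores authentication sessions internally, and LocalNoteNode is a hypothetical class invented for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy node that keeps authentication notes in its own local memory only.
class LocalNoteNode {
    private final Map<String, Map<String, String>> notesBySession = new HashMap<>();

    void setNote(String sessionId, String key, String value) {
        notesBySession.computeIfAbsent(sessionId, id -> new HashMap<>()).put(key, value);
    }

    String getNote(String sessionId, String key) {
        Map<String, String> notes = notesBySession.get(sessionId);
        return notes == null ? null : notes.get(key);
    }
}

class NodeSwitchDemo {
    public static void main(String[] args) {
        LocalNoteNode nodeA = new LocalNoteNode();
        LocalNoteNode nodeB = new LocalNoteNode();

        // The first step runs on node A and stores the DI value before the redirect.
        nodeA.setNote("session-1", "di", "DI-VALUE");

        // The callback from the external server is routed to node B.
        System.out.println("node A sees: " + nodeA.getNote("session-1", "di"));
        System.out.println("node B sees: " + nodeB.getNote("session-1", "di"));
    }
}
```

Node B has no record of the value, which corresponds to the "Failed to retrieve DI value" symptom: the flow cannot continue because the state it depends on stayed behind on the first node.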

Initially, we reviewed the browser cookie-based storage method.

We also tested changes to Keycloak session settings and the Sticky Session structure.

However, none of these methods could structurally serve as a complete solution.

Because the hospital authentication system required a high level of stability and reliability, authentication could not be allowed to fail even in rare edge cases.

Ultimately, the core problem was to create a structure that can reliably share state data even in a distributed server environment.

To solve this issue, we decided to approach it by directly utilizing Keycloak's internal caching structure.

5. Problem Solving: Design of Distributed Cache Based on Infinispan

The technology chosen to solve the problem of authentication session loss occurring in a distributed environment was Infinispan.

Infinispan is a Java-based open-source distributed in-memory data grid solution.

Interestingly, Keycloak itself already uses Infinispan internally.

In other words, it was possible to use Keycloak's internal caching structure without adding separate Redis or external storage.

This provided the advantage of significantly reducing system complexity.

In the project, a custom cache access structure was implemented utilizing KeycloakSession and InfinispanConnectionProvider.

The implementation process consisted of three main stages.

The first was securing a cache instance.

We accessed the Cache Manager managed by Keycloak to create a dedicated cache space for storing custom data.

The second was a data storage structure based on state values (State Key).

A unique state value (State Key) was generated at the start of the authentication process and used as the key.

The DI values needed in the hospital authentication process were stored as the values.

In other words, instead of passing the actual DI values directly through the authentication flow, we configured it to pass only the state value (State Key).

The third was the TTL (Time To Live) setting.

Since authentication process data is only needed temporarily, memory efficiency had to be considered.

Therefore, the data stored in the cache is configured with a TTL to be automatically deleted after a certain period of time.
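The state-key and TTL pattern from the three stages above can be sketched in plain Java. This is a single-process illustration, not the project's Infinispan code: StateStore is a hypothetical class, a ConcurrentHashMap stands in for the clustered cache, and expiry is checked on read, whereas a real Infinispan cache can expire entries itself when a lifespan is passed to put.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical store: maps an opaque state key to a DI value with a TTL.
class StateStore {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    // Stand-in for the shared cache; in a cluster this would be distributed.
    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested

    StateStore(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Stores the DI value and returns the state key that travels with the redirect.
    String put(String diValue) {
        String stateKey = UUID.randomUUID().toString();
        entries.put(stateKey, new Entry(diValue, clock.getAsLong() + ttlMillis));
        return stateKey;
    }

    // Returns the DI value, or null if the key is unknown or has expired.
    String get(String stateKey) {
        Entry entry = entries.get(stateKey);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(stateKey);
            return null;
        }
        return entry.value;
    }
}

class StateStoreDemo {
    public static void main(String[] args) {
        long[] now = {0L}; // fake clock so the demo does not need to sleep
        StateStore store = new StateStore(5 * 60 * 1000L, () -> now[0]);

        String stateKey = store.put("DI-VALUE");
        System.out.println("before expiry: " + store.get(stateKey));

        now[0] = 5 * 60 * 1000L; // advance the clock past the TTL
        System.out.println("after expiry: " + store.get(stateKey));
    }
}
```

Only the random state key leaves the server, so the DI value itself is never exposed in the redirect, and the five-minute TTL used in the demo is an arbitrary figure chosen for the example.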

Thanks to this structure, the DI value could be safely restored with just the state value (State Key) once the user returned to Keycloak after external authentication.

The most important point was that the same data could be reliably shared even in a distributed server environment.

In other words, even if the server node changed during the authentication process, we were able to continue the authentication flow without session loss.

As a result, we were able to achieve a high level of authentication stability and data integrity required by the hospital project through an Infinispan-based architecture.

6. Conclusion

Customizing Keycloak through SPI was more than just simple plugin development.

At first, I thought it would be a matter of adding a few simple authentication features, but in reality, I had to deeply understand the internal structure of the Keycloak engine and the authentication flow mechanism itself.

In particular, areas like the Authentication Flow, AuthenticationFlowContext, the SPI structure, and the Infinispan cache architecture were difficult to understand through configuration work alone.

However, by directly analyzing these structures and solving problems, my understanding of the authentication platform itself has significantly increased.

The most significant aspect was that we successfully integrated the 'complex requirements unique to our service' into the standard authentication platform.

In particular, the hospital project had much stricter authentication and security requirements than general services.

It wasn't just a simple login function; it had to verify that each user was an actual hospital user, manage fine-grained permissions within a multi-tenancy structure, and provide a stable authentication experience even in a distributed environment.

Throughout this process, I was able to consider the overall system design, data integrity, and authentication stability beyond just implementing simple functions.

Ultimately, this project was a very challenging experience technically, and it was a valuable project that allowed me to grow as a developer.

I believe that this experience will serve as a very important benchmark when designing authentication and authorization systems in the future.

joshua
