Optimization Experience of an MSA Backend for Large-scale Clinical Data Processing and Legacy Integration (AMC PROMs Platform)

1. Introduction and technical challenges

The core data supporting 'Patient-Centered Care', a key paradigm in modern medicine, is PROMs (Patient-Reported Outcome Measures). As the backend developer for the AMC PROMs platform at Asan Medical Center, my mission was to reliably collect and analyze vast amounts of clinical data and to integrate the platform with the hospital's enormous core system (AMIS). This article shares how four technical challenges were overcome along the way: authentication limitations, real-time data visualization bottlenecks, out-of-memory (OOM) errors during large-scale retroactive calculations, and the architectural risk of failure propagation.

2. Non-standard payload-based authentication adapter overcoming HTTP header constraints

[Problem Situation: Limitations on HTTP Header Manipulation in Legacy Environments] A modern MSA environment with Spring Security follows the standard protocol of exchanging a JWT via the HTTP Authorization header. However, the hospital's core system (AMIS) could not send the token in the standard header due to structural constraints; instead, it had to include the encrypted credentials in a specific field (encToken) inside the request body (AmcData).

[Solution: Implementation of ContentCachingRequestWrapper and a Custom Filter] To accept this non-standard integration while preserving internal security standards, a custom security filter was placed at the front of the filter chain. Because a servlet request's InputStream can be consumed only once, ContentCachingRequestWrapper was applied to cache the request body. The filter extracts encToken from the cached body, validates it against the Keycloak server, and injects the verified identity into the ThreadLocal-based SecurityContextHolder. The result is an authentication adapter that accepts the external non-standard integration while strictly following the internal standard authorization flow.
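A minimal sketch of this pattern follows. The project used ContentCachingRequestWrapper; to keep the example self-contained, the sketch below uses a small replayable wrapper that serves the same purpose (buffering the body so it can be read both in the filter and downstream). AmcTokenAuthFilter, KeycloakTokenValidator, and ReplayableRequest are illustrative names, not the actual project code; only the encToken field comes from the description above.

```java
// Sketch of a payload-based authentication filter. Assumes a JSON body;
// all class names here are illustrative assumptions.
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ReadListener;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletInputStream;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletRequestWrapper;
import jakarta.servlet.http.HttpServletResponse;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.util.StreamUtils;
import org.springframework.web.filter.OncePerRequestFilter;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Optional;

public class AmcTokenAuthFilter extends OncePerRequestFilter {

    /** Assumed abstraction over the Keycloak validation call. */
    public interface KeycloakTokenValidator {
        Optional<Authentication> validate(String encToken);
    }

    private final ObjectMapper mapper = new ObjectMapper();
    private final KeycloakTokenValidator validator;

    public AmcTokenAuthFilter(KeycloakTokenValidator validator) {
        this.validator = validator;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain) throws ServletException, IOException {
        // Buffer the body once so both this filter and the controller can read it.
        byte[] body = StreamUtils.copyToByteArray(request.getInputStream());

        if (body.length > 0) {
            JsonNode node = mapper.readTree(body).get("encToken");
            if (node != null) {
                // Validate the non-standard credential against Keycloak, then expose
                // the result through the standard ThreadLocal security context.
                validator.validate(node.asText())
                         .ifPresent(auth ->
                             SecurityContextHolder.getContext().setAuthentication(auth));
            }
        }
        chain.doFilter(new ReplayableRequest(request, body), response);
    }

    /** Wrapper whose input stream is replayed from the cached bytes. */
    static class ReplayableRequest extends HttpServletRequestWrapper {
        private final byte[] body;

        ReplayableRequest(HttpServletRequest request, byte[] body) {
            super(request);
            this.body = body;
        }

        @Override
        public ServletInputStream getInputStream() {
            ByteArrayInputStream in = new ByteArrayInputStream(body);
            return new ServletInputStream() {
                @Override public int read() { return in.read(); }
                @Override public boolean isFinished() { return in.available() == 0; }
                @Override public boolean isReady() { return true; }
                @Override public void setReadListener(ReadListener listener) { }
            };
        }
    }
}
```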

3. Introduction of Kafka & CQRS Pattern for Real-time Clinical Data Visualization

[Problem Situation: Real-time DB Bottleneck from Complex Medical Score Calculations] By the time a patient submits the questionnaire and enters the consultation room, the physician's monitor should already visualize pain indices and quality-of-life (QoL) trends. If every dashboard view had to join vast historical survey-response tables and evaluate medical scoring rules on the fly, query latency would become severe and the experience unusable for medical staff.

[Solution: Building an Event-Driven Pipeline and a Separate Query Model (QM)]

To ensure real-time capabilities, we combined the CQRS (Command Query Responsibility Segregation) architecture with Kafka event streaming.

Lightweight command path: when a patient submits a survey, only the source data is inserted and the response is returned quickly, while an event is immediately published to a Kafka topic for asynchronous processing.

Asynchronous pre-calculation: a consumer subscribes to these events and performs the heavy computation (conditional-question handling, score aggregation) in the background, driven by a dynamic rules engine (SpEL).

Query model optimization: the calculated results are loaded into a heavily flattened 'statistics table' optimized for reads. Medical staff then read summary data directly without any on-the-fly computation (effectively O(1) reads), so queries stay fast even as data accumulates. A sketch of both sides of this pipeline follows.
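Below is a minimal sketch of the command side and the projection consumer. The topic name, the event record, and the SurveyResponse / ScoreSummary entities and repositories are illustrative assumptions, not the actual AMC PROMs schema; only the Kafka + SpEL structure follows the description above.

```java
// Minimal CQRS + Kafka sketch; domain types are illustrative assumptions.
import org.springframework.expression.spel.standard.SpelExpressionParser;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

record SurveySubmittedEvent(Long responseId) {}

@Service
class SurveyCommandService {
    private final SurveyResponseRepository responses;                // source-of-truth writes
    private final KafkaTemplate<String, SurveySubmittedEvent> kafka;

    SurveyCommandService(SurveyResponseRepository responses,
                         KafkaTemplate<String, SurveySubmittedEvent> kafka) {
        this.responses = responses;
        this.kafka = kafka;
    }

    @Transactional
    public void submit(SurveyResponse submission) {
        // Command path stays light: persist the raw answers, return immediately,
        // and hand the heavy scoring off to the consumer via an event.
        SurveyResponse saved = responses.save(submission);
        kafka.send("proms.survey.submitted", new SurveySubmittedEvent(saved.getId()));
    }
}

@Component
class ScoreProjectionConsumer {
    private final SpelExpressionParser parser = new SpelExpressionParser();
    private final SurveyResponseRepository responses;
    private final ScoreSummaryRepository summaries;                  // flattened query model

    ScoreProjectionConsumer(SurveyResponseRepository responses,
                            ScoreSummaryRepository summaries) {
        this.responses = responses;
        this.summaries = summaries;
    }

    @KafkaListener(topics = "proms.survey.submitted", groupId = "proms-projection")
    public void on(SurveySubmittedEvent event) {
        SurveyResponse response = responses.findById(event.responseId()).orElseThrow();
        // Dynamic rule per questionnaire, e.g. "answers['q1'] + 2 * answers['q2']",
        // evaluated with the response entity as the SpEL root object.
        Double score = parser.parseExpression(response.getScoringRule())
                             .getValue(response, Double.class);
        // Upsert into the flattened statistics table; dashboard reads become O(1).
        summaries.save(ScoreSummary.of(response.getPatientId(),
                                       response.getSurveyType(), score));
    }
}
```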

4. Preventing JPA OOM during large-scale retroactive recalculation

[Problem Situation: Saturation of the Persistence Context (L1 Cache) When Querying Hundreds of Thousands of Records] When statistical criteria changed and several years of past survey data had to be recalculated (retroactively applied) in one batch, more than 100,000 entities were loaded into the persistence context (first-level cache). The garbage collector could not reclaim them, and the process failed with an Out of Memory (OOM) error.

[Solution: Chunk Processing and Explicit Memory Release] To avoid compromising system availability, a 'chunk-based memory control' technique was applied. The data is read in slices of 1,000 records using offset and limit, and the recalculated results are bulk-inserted. The most crucial tuning was explicitly calling entityManager.clear() after each 1,000-record chunk to empty the persistence context. With this, the server's heap usage stays flat at roughly one chunk's worth of entities, even as the target grows to millions of records.
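A minimal sketch of the chunked recalculation follows. SurveyResponse and ScoreSummary reuse the illustrative types from the earlier sketch, and recompute() stands in for the rules-engine computation; none of this is the actual project code.

```java
// Chunked retroactive recalculation: OFFSET/LIMIT paging plus explicit
// flush()/clear() so the first-level cache never exceeds one chunk.
import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import java.util.List;

@Service
public class RetroactiveScoreRecalculator {

    private static final int CHUNK_SIZE = 1_000;

    @PersistenceContext
    private EntityManager em;

    @Transactional
    public void recalculateAll() {
        int offset = 0;
        while (true) {
            // Page through the source rows with OFFSET/LIMIT semantics.
            List<SurveyResponse> chunk = em.createQuery(
                        "select r from SurveyResponse r order by r.id",
                        SurveyResponse.class)
                    .setFirstResult(offset)
                    .setMaxResults(CHUNK_SIZE)
                    .getResultList();
            if (chunk.isEmpty()) break;

            // Recompute under the new criteria and queue the derived rows.
            chunk.forEach(r -> em.persist(
                    ScoreSummary.of(r.getPatientId(), r.getSurveyType(), recompute(r))));

            // Push pending INSERTs to the DB, then detach everything so the
            // persistence context is emptied after every chunk.
            em.flush();
            em.clear();
            offset += CHUNK_SIZE;
        }
    }

    private double recompute(SurveyResponse response) {
        // Placeholder for the rules-engine computation described above.
        return 0.0;
    }
}
```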

5. Design for Blocking Cascading Failure in Core System Interconnection

[Problem Situation: Internal Thread-Pool Exhaustion from External API Delays] When temporary load or latency hit the AMIS system, the PROMs server's Tomcat threads waiting on its responses could block indefinitely, posing a risk of cascading failure that could paralyze the entire platform.

[Solution: Intelligent RestClient and Fault Tolerance Logic]

To contain this risk, Spring 6's RestClient was adopted and the communication client was redesigned defensively.

Strict timeout isolation: enforcing a 5-second connect timeout and a 30-second read timeout guarantees that, if the remote server fails, the connection is cut within 30 seconds, preserving the platform's independence and reclaiming threads.

Error bypass technique: on a failure response, rather than always raising an exception, specific error codes (ignorableMessageIds), accepted as variable arguments, are tolerated and bypassed. This keeps instability in the core system from degrading the patient interview experience. A sketch of the client configuration follows.
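Below is a minimal sketch of such a client, assuming Spring Framework 6.1+ (RestClient with Duration-based timeouts). AmisClientFactory, AmisClientException, and the X-Message-Id header are illustrative assumptions; the actual mechanism for reading the error code may differ.

```java
// Defensive RestClient sketch: strict timeouts plus an error-bypass handler
// keyed on a variable-argument list of ignorable message IDs.
import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestClient;

import java.time.Duration;
import java.util.Set;

public final class AmisClientFactory {

    private AmisClientFactory() { }

    public static RestClient create(String baseUrl, String... ignorableMessageIds) {
        Set<String> ignorable = Set.of(ignorableMessageIds);

        SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
        factory.setConnectTimeout(Duration.ofSeconds(5));   // fail fast on unreachable hosts
        factory.setReadTimeout(Duration.ofSeconds(30));     // reclaim blocked Tomcat threads

        return RestClient.builder()
                .baseUrl(baseUrl)
                .requestFactory(factory)
                // Error bypass: known, tolerable AMIS error codes are swallowed
                // instead of becoming exceptions that abort the patient flow.
                .defaultStatusHandler(status -> status.isError(), (request, response) -> {
                    String messageId = response.getHeaders().getFirst("X-Message-Id");
                    if (messageId == null || !ignorable.contains(messageId)) {
                        throw new AmisClientException(response.getStatusCode().value());
                    }
                })
                .build();
    }

    /** Assumed exception type for non-ignorable AMIS failures. */
    public static class AmisClientException extends RuntimeException {
        AmisClientException(int status) {
            super("AMIS call failed with status " + status);
        }
    }
}
```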

6. Conclusion

This project was a valuable exercise in thinking rigorously about how a backend system should withstand sharp traffic growth and ever-accumulating data in an enterprise environment, and in carrying that thinking through to a successful delivery. Flexibly accepting a non-standard authentication method, securing real-time reads through CQRS, controlling memory at scale through first-level cache management, and programming defensively against failure propagation have become solid technical assets: they go beyond implementing features to building 'scalability and robustness' into the system. Over the past six months, the whole team worked tirelessly to complete the project through numerous challenges. The emotion I felt on receiving a heartfelt letter of gratitude from the client at the end of that intense stretch is unforgettable. As an engineer, I am deeply happy and fulfilled to have etched the name 'NEXTREE' in the client's mind as a technology partner they can trust. I will use this experience as nourishment to design robust architectures that maximize our clients' business value.