1. Overview
The document management system can extract data from various documents and run additional AI processing on that data.
When a user selects multiple documents, the system is designed to convert the content of each document into Markdown format and then send it to the AI server for subsequent analysis or processing tasks.
During the initial implementation stage, the focus was primarily on completing the functionality itself, so the document processing logic was developed with a relatively simple structure.
In other words, when a request came in, documents were processed one by one, and the request was sent to the AI server only after all documents had been processed.
The function itself operated normally, but performance issues began to emerge gradually in the actual operational environment.
The problem became worse in the following situations.
- When users requested multiple documents at once.
- When document size increased.
- When Markdown conversion time increased.
- When the wait before the AI request was long.
The crux of the problem was that as the number of documents increased, the total response time increased linearly.
For example, if processing a single document takes about 500 ms, processing 10 documents takes about 5 seconds through simple accumulation.
Markdown extraction in particular was not a quick task, because it involved both I/O and data processing.
Ultimately, users experienced longer wait times during the 'document preparation process' rather than the AI features themselves, leading to a decline in overall user experience.
Additionally, on the server side, resource utilization became less efficient because a single request occupied server resources for a long time.
Accordingly, the project concluded that structural performance improvements are necessary beyond simple functionality implementation.
In particular, it was decided to introduce an asynchronous-based parallel processing structure that can handle multiple independent tasks simultaneously to improve the overall processing flow.
2. Existing Structure and Problem Situations
The existing structure was a very typical sequential processing-based approach.
The request flow operated as follows.
1. Receive the client request.
2. Extract the Markdown from the first document.
3. Extract the Markdown from the second document.
4. Extract the Markdown from the third document.
5. Finish processing all documents.
6. Send the request to the AI server.
7. Return the response.
In other words, each document was processed only after the previous one had finished.
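The sequential flow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `SequentialExtraction`, the method `extractAll`, and the use of plain strings for documents are assumptions made for the example, and the converter passed in is a placeholder for the real Markdown extraction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class SequentialExtraction {

    // Converts documents one at a time; document N+1 starts only after document N finishes.
    static List<String> extractAll(List<String> documents,
                                   Function<String, String> extractMarkdown) {
        List<String> markdown = new ArrayList<>();
        for (String document : documents) {
            markdown.add(extractMarkdown.apply(document)); // blocks the calling thread
        }
        return markdown; // only at this point can the AI server request be sent
    }

    public static void main(String[] args) {
        // Placeholder converter standing in for the real Markdown extraction.
        List<String> result = extractAll(List.of("report", "invoice"), doc -> "# " + doc);
        System.out.println(result); // [# report, # invoice]
    }
}
```

Because the loop body blocks, the total time of `extractAll` is the sum of the individual conversion times.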
This method had the advantage of being simple to implement. Additionally, the exception handling flow could be relatively intuitively structured as well.
However, the following performance issues occurred in the actual operating environment.
The first issue was an increase in response time.
As the number of documents increased, the processing time increased almost proportionally.
For example, at 1 second per document, the results were as follows.
- 1 document: about 1 second.
- 5 documents: about 5 seconds.
- 10 documents: about 10 seconds.
In other words, the total processing time was simply the sum of the individual processing times.
The second issue was the inefficient use of server resources.
The actual document processing tasks were independent of each other.
Thus, there was no reason for Document B to wait while Document A was being processed.
However, in the existing structure, all tasks were executed sequentially within a single thread flow.
As a result, the structure did not fully utilize CPU and thread resources.
The third issue was the degradation of user experience.
Users experienced situations where the waiting time for Markdown extraction felt longer than the AI functionality itself.
As the number of documents increased, UI response delays became more severe, and some users thought the system had frozen.
The fourth issue was a lack of scalability.
As the project grew, the number of documents to be processed was likely to continue increasing.
However, the existing structure was highly vulnerable to an increase in request volume.
Ultimately, it became clear that there was a need to transition the structure itself to a parallel processing-based one, beyond just the level of simple function implementation.
3. Improvement with an Asynchronous Processing-Based Structure
To solve the problem, the project moved per-document Markdown extraction to an asynchronous parallel processing structure.
The core idea was very simple: each document is processed independently, so the documents can be processed at the same time.
In other words, instead of processing sequentially in a single request flow as before, we separated each document processing task into individual asynchronous tasks.
The improved structure operated in the following flow.
1. Receive the client request.
2. Create a Markdown extraction task for each document.
3. Execute the tasks in parallel.
4. Wait until all tasks are complete.
5. Collect the result data.
6. Send the request to the AI server.
7. Return the response.
In the project, we utilized the @Async feature of the Spring Framework to structure asynchronous execution.
Each document processing method is designed to run in a separate asynchronous thread.
In other words, when 10 documents are requested, up to 10 tasks can be processed simultaneously, subject to the thread pool limit.
Of course, simple parallel execution alone was not enough.
The most important thing was to safely proceed to the next stage after all tasks were completed.
To achieve this, we utilized CompletableFuture.
Each asynchronous task returned a CompletableFuture, and at the final stage CompletableFuture.allOf() was used to wait until every task had completed.
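The fan-out and join described here can be sketched with the JDK alone. In this sketch `CompletableFuture.supplyAsync` stands in for a Spring `@Async` method so that the example runs without a Spring context. The class name `ParallelExtraction`, the string-based documents, and the pool size of 4 are assumptions for illustration only.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ParallelExtraction {

    // Starts one task per document, waits for all of them, and returns results in input order.
    static List<String> extractAll(List<String> documents,
                                   Function<String, String> extractMarkdown,
                                   ExecutorService executor) {
        // In Spring, an @Async method returning CompletableFuture<String>
        // would play the role of supplyAsync here.
        List<CompletableFuture<String>> futures = documents.stream()
                .map(doc -> CompletableFuture.supplyAsync(() -> extractMarkdown.apply(doc), executor))
                .collect(Collectors.toList());

        // allOf completes only when every task has completed.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

        // Every future is already complete, so join() returns without waiting.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(4); // arbitrary size for the demo
        try {
            List<String> result = extractAll(List.of("a", "b", "c"), doc -> "# " + doc, executor);
            System.out.println(result); // [# a, # b, # c]
        } finally {
            executor.shutdown();
        }
    }
}
```

Results are read back in the order of the future list, so the output order matches the input order even though the tasks may finish in any order.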
Thanks to this structure, we achieved the following effects.
- Independent tasks are handled simultaneously.
- Overall processing time is reduced.
- CPU resource utilization increases.
- Response delays are reduced.
In particular, whereas the total time was previously the sum of all document processing times, after the improvement the overall response time was determined mainly by the longest-running task.
For example, if 10 documents each take 1 second to process, the existing structure takes about 10 seconds, while the parallel structure can reduce this to about 1 to 2 seconds.
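The arithmetic behind this comparison can be made concrete with a small model. It is an idealized estimate, not a measurement: it assumes each document goes to whichever worker thread frees up first and ignores scheduling, I/O contention, and result collection overhead. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

final class ResponseTimeModel {

    // Sequential processing: the total is the sum of all per-document times.
    static long sequentialMillis(long[] perDocMillis) {
        long total = 0;
        for (long t : perDocMillis) {
            total += t;
        }
        return total;
    }

    // Parallel processing on a fixed number of worker threads:
    // each document is taken by the worker that becomes free earliest.
    static long parallelMillis(long[] perDocMillis, int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be at least 1");
        }
        PriorityQueue<Long> workerFreeAt = new PriorityQueue<>();
        for (int i = 0; i < threads; i++) {
            workerFreeAt.add(0L);
        }
        long finish = 0;
        for (long t : perDocMillis) {
            long done = workerFreeAt.poll() + t;
            workerFreeAt.add(done);
            finish = Math.max(finish, done);
        }
        return finish;
    }

    public static void main(String[] args) {
        long[] tenDocs = new long[10];
        Arrays.fill(tenDocs, 1000L); // 10 documents at 1 second each
        System.out.println(sequentialMillis(tenDocs));   // 10000
        System.out.println(parallelMillis(tenDocs, 10)); // 1000
        System.out.println(parallelMillis(tenDocs, 4));  // 3000
    }
}
```

With as many threads as documents, the estimate equals the longest single task. With fewer threads the documents are processed in waves, which is one reason a real response can land above the 1 second lower bound.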
4. Applied Technologies and Implementation Methods
The key technologies used in this performance improvement were Spring's @Async and Java CompletableFuture.
Firstly, @Async is an asynchronous processing feature provided by the Spring Framework.
Declaring @Async on a specific method allows that method to be executed asynchronously in a separate thread.
In the project, @Async was applied to the Markdown extraction method for each document.
This allows each document processing task to be configured to run independently and in parallel.
For example, it could be implemented in the following way.
- Call extractMarkdownAsync(document) for each document.
- Each call returns a CompletableFuture.
- The calls are processed in a parallel execution structure.
However, @Async alone was not enough to control the overall flow, because the completion of each task had to be tracked accurately.
To solve this, we made active use of CompletableFuture.
CompletableFuture is an object that represents the result of an asynchronous computation and provides very powerful features for combining or synchronizing the completion of multiple tasks.
In the project, it was used in the following way.
- Create an asynchronous task per document.
- Collect the CompletableFutures in a list.
- Wait for all of them to complete using CompletableFuture.allOf().
- Aggregate the result data.
In particular, CompletableFuture allows for much more flexible flow control than a simple Future.
For example, it was easy to implement features such as the following.
- Linking follow-up actions upon completion.
- Configuring an exception handling chain.
- Combining parallel tasks.
- Managing asynchronous flows.
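A short sketch of the first three items, using `thenApply` for a follow-up action, `exceptionally` for an exception handling chain, and `thenCombine` to combine two parallel tasks. The class `FutureComposition`, the method `mergeTwo`, and the fallback text are hypothetical and chosen only to keep the example small.

```java
import java.util.concurrent.CompletableFuture;

final class FutureComposition {

    // Merges the Markdown of two documents once both conversions have finished.
    static CompletableFuture<String> mergeTwo(CompletableFuture<String> first,
                                              CompletableFuture<String> second) {
        // Follow-up action: runs when the first conversion completes.
        CompletableFuture<String> cleanedFirst = first.thenApply(String::trim);

        // Exception handling chain: if the second conversion fails,
        // substitute a placeholder instead of failing the merged result.
        CompletableFuture<String> safeSecond = second
                .thenApply(String::trim)
                .exceptionally(error -> "(conversion failed)");

        // Combination: runs once both inputs are complete.
        return cleanedFirst.thenCombine(safeSecond, (a, b) -> a + "\n\n" + b);
    }

    public static void main(String[] args) {
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("boom"));

        String merged = mergeTwo(CompletableFuture.completedFuture("  # A "), failed).join();
        System.out.println(merged);
    }
}
```

None of these calls block the calling thread. Only the final `join()` in `main` waits for a result.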
The management of the thread pool was also important when introducing parallel processing.
Unlimited parallel execution can actually increase the system load.
Therefore, the project configured an appropriate ThreadPoolTaskExecutor to limit the maximum number of concurrent executions.
This allowed us to prevent system instability caused by excessive parallel processing.
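The effect of a bounded pool can be shown with the JDK's `ThreadPoolExecutor`, which Spring's ThreadPoolTaskExecutor wraps. The three parameters below correspond to its corePoolSize, maxPoolSize, and queueCapacity settings. The concrete values, the `CallerRunsPolicy` rejection handler, and the helper `peakConcurrency` are assumptions for this sketch, not the project's configuration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedExecutorFactory {

    // At most maxThreads tasks run on pool threads at once; further tasks wait in a bounded queue.
    // Threads beyond coreThreads are created only once the queue is full.
    static ThreadPoolExecutor create(int coreThreads, int maxThreads, int queueCapacity) {
        return new ThreadPoolExecutor(
                coreThreads, maxThreads,
                60L, TimeUnit.SECONDS,                      // idle time before extra threads are removed
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy()); // when saturated, the submitting thread runs the task
    }

    // Runs short jobs on the pool and reports the highest number observed running at once.
    static int peakConcurrency(ThreadPoolExecutor pool, int tasks) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CompletableFuture<?>[] futures = new CompletableFuture<?>[tasks];
        for (int i = 0; i < tasks; i++) {
            futures[i] = CompletableFuture.runAsync(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(20); // simulated work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            }, pool);
        }
        CompletableFuture.allOf(futures).join();
        return peak.get();
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(2, 2, 100);
        try {
            System.out.println("peak concurrency: " + peakConcurrency(pool, 8));
        } finally {
            pool.shutdown();
        }
    }
}
```

With the queue large enough to hold every waiting task, the observed peak never exceeds the configured maximum, no matter how many documents a request contains.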
5. Considerations and Operational Stability
Asynchronous and parallel processing structures provide very strong advantages in terms of performance, but they also increase operational complexity.
Therefore, the project took into account not only simple speed improvements but also a stable operational structure.
The first aspect we considered was thread pool management.
Simply increasing the number of parallel processes can actually lead to performance degradation.
For example, the following problems may occur.
- Threads may be created excessively.
- CPU context switching may increase.
- Memory usage may increase.
- Server load may increase.
Therefore, the project set an appropriate thread pool size considering the server specifications and average request volume.
The second consideration was exception handling.
In a parallel processing environment, some tasks may fail.
For example, out of 10 documents, only 1 may fail to process.
In this case, there was a need for a policy on whether to treat the entire request as a failure or to use only some successful results.
The project was designed to minimize the impact on the overall flow in the event of an exception.
We also utilized the exception handling features of CompletableFuture to reliably handle exceptions in asynchronous tasks.
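One possible shape for a "use the successful results" policy is sketched below. It is an assumption about how such a policy could look, not the project's implementation: `handle` turns each task's value or exception into an `Outcome` object, so a single failed document never fails the combined future. The names `PartialFailurePolicy` and `Outcome` are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.stream.Collectors;

final class PartialFailurePolicy {

    // Result of one document: markdown is set on success, error is set on failure.
    static final class Outcome {
        final String document;
        final String markdown;
        final Throwable error; // may be a CompletionException wrapping the original cause

        Outcome(String document, String markdown, Throwable error) {
            this.document = document;
            this.markdown = markdown;
            this.error = error;
        }

        boolean succeeded() {
            return error == null;
        }
    }

    static List<Outcome> extractAll(List<String> documents,
                                    Function<String, String> extractMarkdown,
                                    Executor executor) {
        List<CompletableFuture<Outcome>> futures = documents.stream()
                .map(doc -> CompletableFuture
                        .supplyAsync(() -> extractMarkdown.apply(doc), executor)
                        // handle() receives either the value or the exception,
                        // so the resulting future always completes normally.
                        .handle((markdown, error) -> new Outcome(doc, markdown, error)))
                .collect(Collectors.toList());

        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }
}
```

The caller can then decide per request whether to forward only the successful outcomes to the AI server or to reject the request when any outcome has an error.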
The third was synchronization at the completion point.
In parallel processing, it is very important to proceed to the next step only after every task has completed.
If an AI server request occurs before some tasks are completed, data consistency issues can arise.
Thus, the project used a structure based on CompletableFuture.allOf() to accurately control the overall completion timing.
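As an illustration of this ordering guarantee, the sketch below chains the AI server request onto `allOf()` with `thenCompose`, so the request cannot start while any extraction is still pending. The function `sendToAiServer` is a hypothetical stand-in for the real client call, and the class name is an assumption.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

final class AfterAllComplete {

    // Sends the collected Markdown to the AI server only after every extraction has completed.
    static CompletableFuture<String> sendWhenReady(
            List<CompletableFuture<String>> extractions,
            Function<List<String>, CompletableFuture<String>> sendToAiServer) {
        return CompletableFuture
                .allOf(extractions.toArray(new CompletableFuture[0]))
                // Runs only after all futures are done, so join() does not wait here.
                .thenApply(ignored -> extractions.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()))
                // Starts the AI request with the complete, ordered result list.
                .thenCompose(sendToAiServer);
    }
}
```

If any extraction fails, the returned future completes exceptionally and `sendToAiServer` is never invoked, so a partial document set is not sent by accident.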
The fourth was monitoring and traceability.
In an asynchronous structure, the flow of logs is distributed, making it difficult to analyze the cause of failures.
To resolve this, we configured the system to record logs and processing times per task.
This allowed us to quickly identify specific document processing delays or exception situations.
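Per-task timing of this kind could look roughly like the sketch below. It is an assumption-laden illustration: it prints to standard output instead of using a logging framework, the task id format is made up, and the names `TimedTasks` and `timed` are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

final class TimedTasks {

    // Wraps a task so that its duration and any failure are recorded under its task id.
    // The map should be thread-safe (for example a ConcurrentHashMap), since tasks finish on different threads.
    static <T> CompletableFuture<T> timed(String taskId,
                                          Supplier<T> work,
                                          Executor executor,
                                          Map<String, Long> elapsedMillisByTask) {
        return CompletableFuture.supplyAsync(() -> {
            long start = System.nanoTime();
            try {
                return work.get();
            } catch (RuntimeException e) {
                System.out.printf("[task=%s] failed: %s%n", taskId, e);
                throw e; // keep the future failed so callers still see the error
            } finally {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                elapsedMillisByTask.put(taskId, elapsedMs);
                System.out.printf("[task=%s thread=%s] finished in %d ms%n",
                        taskId, Thread.currentThread().getName(), elapsedMs);
            }
        }, executor);
    }
}
```

Tagging every log line with the task id makes it possible to reassemble the history of one document even when lines from different threads are interleaved.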
6. Improvement Results and Conclusions
Since the introduction of the asynchronous processing-based architecture, the project has seen meaningful performance improvements in various aspects.
The first noticeable aspect was the reduction in response time.
In the existing structure, the processing time increased linearly with the number of documents, but after the improvement, the overall response time was significantly reduced thanks to the parallel processing structure.
In particular, the improvement effect was even more pronounced as the number of documents increased.
For example, a task that used to take several seconds to process 10 documents can now be completed in a much shorter time after parallel processing.
The second was the increase in the efficiency of server resource utilization.
In the existing structure, CPU and thread resources were not fully utilized due to sequential processing.
In contrast, after the improvements, it has become possible to handle multiple tasks simultaneously, allowing for much more efficient use of system resources.
The third was improved user experience.
Users experienced faster response times, and the frustration of waiting while using AI features was greatly reduced.
The fourth was structural scalability.
Through this experience, we learned that asynchronous and parallel processing structures matter not only for performance improvement but also for future scalability.
We confirmed that the approach is particularly effective in environments such as the following.
- Structures with many independent tasks.
- Structures with a high proportion of I/O processing.
- Environments that process multiple requests in parallel.
- Preprocessing structures for AI integration.
In conclusion, this improvement can be seen as a case of performance optimization that goes beyond simple feature implementation to improve the processing structure itself.
Asynchronous and parallel processing structures are core design techniques in modern backend systems, and this experience can be applied directly to similar structural designs in the future.