Development of Error Reporting Functionality Based on Apache POI

Development of Error Reporting Functionality Based on Apache POI

1. Overview and Background

During the testing phase of the large hospital next-generation construction project, thousands of logs were continuously generated daily. To quickly identify issues occurring during the testing process and verify quality, it was essential to check the logs daily and categorize and organize the types of errors.

At that time, it took an average of more than one hour per day to manually create error reports, and on days with a large volume of logs, it could take even longer. Additionally, due to the nature of manual work, there was also the possibility of omissions or misclassifications.

Accordingly, we planned and developed an automation system with the following goals.

  • Remove repetitive manual tasks

  • Build a reporting system capable of reliably processing large volumes of logs

  • API provided to allow anyone to instantly generate reports in the same format

  • Structural design that can flexibly respond to additional analysis requirements in the future

As a result, we were able to fully automate the existing manual log analysis process by building an Excel reporting automation API based on Apache POI.

2. Background of Apache POI introduction and technology selection process

2-1. Background for selecting the reporting format

Initially, we also considered providing data in a simple JSON response format.
However, in actual test practice, most personnel were reviewing data based on Excel, and for the following reasons, we determined that the Excel format was the most suitable.

  • The format that test personnel and business users are most familiar with.

  • Sorting, filtering, and conditional searches are very easy

  • Additional analysis and secondary processing are easy

  • Mail attachments and sharing are convenient, and they can be immediately used as operating report materials.

Especially not just at the level of simply 'viewing the logs,'
I judged that it is important to produce deliverables that can be used immediately in real operations and testing sites.

Accordingly, I decided to make the final deliverable format based on Excel (.xlsx).

2-2. Reasons for Selecting Apache POI

We reviewed several options during the process of selecting a library for creating Excel files.

The most important factors I considered in the process were as follows.

1) Stability in processing large volumes of data

Since we had to process tens of thousands of log data entries per day, memory stability was the most important factor. Most common Excel libraries tended to load all data into memory before generating files, which posed a risk of Out Of Memory (OOM) in large data environments.

Apache POI supports SXSSF (Streaming mode), allowing us to keep only some of the data in memory while generating files, making it very suitable for large-scale processing.

2) High Compatibility with the Java Ecosystem

Since the entire backend of the project was built on Java + Spring, we needed libraries that could seamlessly integrate with the Java environment.

Apache POI is the most widely used Excel processing library in the Java standard ecosystem,

  • it has a rich maintenance reference.

  • Well-documented

  • There are many community cases

  • The fact that its stability has been validated was an advantage.

Especially due to the nature of the next-generation project, it was important to use libraries that have high universality and standardization rather than technologies that can only be understood by specific individuals, as multiple developers had to maintain it together.

3-1. Created Excel from collected log data based on Apache POI

First, the log centralization system Loki API was used to convert the log data that occurred during a specific period into structured objects, and then an Excel report was generated using Apache POI.

The report is structured as follows.

  • Service error status sheet

  • Error Code Statistics Sheet

  • Detailed Error Log Sheet

  • Hourly occurrence status sheet

Each sheet has been implemented to be generated dynamically,
The test target service is designed to be adaptable without the need for separate modifications even if it changes.

3-2. Application of SXSSF and memory optimization

Initially, I attempted to implement it using the standard XSSFWorkbook method.

However, during the process of handling tens of thousands of log data entries, the memory usage sharply increased, and there was an actual risk of OOM.

To solve this, we applied the SXSSFWorkbook method of Apache POI.

SXSSF has the following characteristics.

  • Keep only the number of schedule rows in memory

  • Flush excess data to a temporary disk file

  • It is possible to create large files in a streaming manner

This allowed us to safely handle large amounts of logs while reliably maintaining memory usage.

However, there were also the following constraints in SXSSF.

  • Rows that have been flushed cannot be modified afterwards.

  • Data can only be written sequentially

  • There are restrictions on accessing the previous row

Therefore, we considered the following points during implementation.

  • Define the header/style first.

  • Design the data entry order in advance

  • Remove logic that requires post-processing

  • The sheet structure was determined in advance

Understanding these constraints and reflecting them in the design phase greatly contributed to a stable implementation.

4. Conclusion

Through the development of this functionality, we were able to reduce manual tasks that previously took about 1 hour a day to approximately 3 seconds, and the reporting, which had only reached a simple count verification level, was able to advance to a level capable of detailed failure analysis.

Through this experience, I realized that it is extremely important for a backend engineer to consider not only creating code that functions but also the scale of data, maintainability, and scalability that can occur in an operational environment.

In the future, I want to grow as a developer who thinks about structures that can solve real-world problems and enhance operational efficiency, beyond simple implementations.

Novin

References

https://poi.apache.org/

Site footer