1. Background of Technology Selection
As the requirements for protecting personal data in the system have strengthened, the application of data encryption has become a mandatory task. In particular, personal information such as names, phone numbers, and emails needs to be encrypted when stored, and must be managed securely in the operating environment. Initially, the option to apply only the database's built-in encryption feature was considered, but for the reasons outlined below, we decided to investigate a structure where the application performs encryption and decryption directly.
a. If encryption is performed only at the database level, anyone who gains access to the data from outside the application may still be able to query it in plaintext.
b. The customer's security guide recommends that encryption and decryption be handled at the application level rather than within the database.
The target system used MongoDB Atlas, so it could take advantage of the CSFLE (Client-Side Field Level Encryption) feature officially provided by MongoDB.
For reference, CSFLE is a feature in which a client-side library encrypts data before it is stored in the database and automatically decrypts it on retrieval.
Encryption can be applied selectively to specific fields. Just as only the essential columns are encrypted in an operational environment built on an RDB, the same approach applies to MongoDB. Encrypting all data can significantly degrade search and operational efficiency, but CSFLE allows only the necessary fields to be encrypted.
How the encryption keys are managed was also an important security consideration.
Storing the encryption key in the application's internal configuration file poses a security risk, so we evaluated using a separate KMS (Key Management Service).
Since the system ran in an AWS environment, we decided to use AWS KMS. AWS KMS provides key creation, access control, and automatic key rotation, which makes key management more dependable.
Additionally, we reviewed the Queryable Encryption feature offered by MongoDB. Queryable Encryption has the advantage of supporting some search functionality on encrypted data, but it creates additional collections for encrypted search and makes the management structure more complex.
In particular, because the number of DB collections and the management complexity were expected to grow in the operational environment, we gave priority to operational convenience and maintenance efficiency in the actual project.
As a result, we chose the combination of CSFLE and AWS KMS for this project.
2. Local Environment Application Process
CSFLE was first applied and verified in the local development environment.
MongoDB provides a CSFLE sample project for Java, and we used it to check the basic functionality first.
The sample code also supports a local master key (CMK), which made it suitable for initial testing. In the initial phase, the following items were prioritized for verification.
- Whether specific fields are encrypted
- Automatic encryption operation when saving
- Automatic decryption operation when retrieving
- How encryption target fields are specified
- How Data Encryption Keys (DEKs) are generated
- How the master key (CMK) is integrated
CSFLE automatically encrypts data when stored through a client with encryption settings applied. Developers do not need to write separate encryption and decryption logic, which enhances development convenience.
For example, after applying encryption settings to the name field and saving the data, the actual encrypted data is stored in binary format inside MongoDB. Conversely, when data is queried through the same client, decryption is performed automatically, allowing the application to use the data as plain text.
During the initial testing phase, we also reviewed the Entity structure and the method of defining encryption schemas. In MongoDB CSFLE, encryption fields can be specified based on JSON Schema, allowing the selection of either deterministic or random encryption methods for specific fields.
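As a rough illustration of what such a JSON Schema looks like, the sketch below builds one as a string. The two algorithm identifiers and the `encryptMetadata` / `encrypt` keywords are the ones CSFLE uses for automatic encryption rules; the class name, the field names (`name`, `phone`, `email`), and the choice of which field gets which algorithm are assumptions made for the example, not the project's actual schema.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper: builds a CSFLE-style JSON Schema for an assumed "users" collection.
final class CsfleSchemaSketch {
    // Algorithm identifiers used in CSFLE encryption rules.
    static final String DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic";
    static final String RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random";

    /** Encodes a DEK id (a UUID) as base64 for an extended-JSON $binary value. */
    static String base64KeyId(UUID dekId) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(dekId.getMostSignificantBits());
        buf.putLong(dekId.getLeastSignificantBits());
        return Base64.getEncoder().encodeToString(buf.array());
    }

    /** Returns the schema; the field-to-algorithm mapping is an assumption for illustration. */
    static String userSchema(UUID dekId) {
        return """
            {
              "bsonType": "object",
              "encryptMetadata": {
                "keyId": [ { "$binary": { "base64": "%s", "subType": "04" } } ]
              },
              "properties": {
                "name":  { "encrypt": { "bsonType": "string", "algorithm": "%s" } },
                "phone": { "encrypt": { "bsonType": "string", "algorithm": "%s" } },
                "email": { "encrypt": { "bsonType": "string", "algorithm": "%s" } }
              }
            }
            """.formatted(base64KeyId(dekId), RANDOM, RANDOM, DETERMINISTIC);
    }

    public static void main(String[] args) {
        // A random UUID stands in for the id of a real DEK from the key vault.
        System.out.println(userSchema(UUID.randomUUID()));
    }
}
```

In a real setup, a schema of this shape would be registered with the encryption-enabled client under its `database.collection` namespace; that wiring depends on the driver version and is left out here.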
Deterministic encryption has the advantage of enabling equality searches since the same input value is always stored as the same ciphertext, while random encryption offers higher security but makes it difficult to compare identical values.
In the project, we weighed the level of privacy protection against the search requirements: the fields that need to be searchable use deterministic encryption, and the remaining fields use random encryption.
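The difference between the two modes can be sketched with the JDK's own crypto classes. This is only an illustration of the idea, not MongoDB's implementation: the deterministic branch derives the IV from an HMAC of the plaintext so that equal inputs produce equal ciphertexts, while the random branch draws a fresh IV each time. The class and method names are hypothetical, and plain AES-CBC without authentication is used only to keep the example short.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Illustration only: shows why deterministic ciphertexts are comparable and random ones are not.
final class DeterministicVsRandomSketch {
    private static final SecureRandom RNG = new SecureRandom();

    /** Returns IV || ciphertext. The IV is derived from the plaintext when deterministic is true. */
    static byte[] encrypt(byte[] encKey, byte[] ivKey, String plaintext, boolean deterministic) {
        try {
            byte[] data = plaintext.getBytes(StandardCharsets.UTF_8);
            byte[] iv = new byte[16];
            if (deterministic) {
                // Same plaintext -> same HMAC -> same IV -> same ciphertext.
                Mac mac = Mac.getInstance("HmacSHA256");
                mac.init(new SecretKeySpec(ivKey, "HmacSHA256"));
                iv = Arrays.copyOf(mac.doFinal(data), 16);
            } else {
                // Fresh IV per call, so equal plaintexts yield different ciphertexts.
                RNG.nextBytes(iv);
            }
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(encKey, "AES"), new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(data);
            byte[] out = Arrays.copyOf(iv, iv.length + body.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the IV from the first 16 bytes and decrypts the rest. */
    static String decrypt(byte[] encKey, byte[] ciphertext) {
        try {
            byte[] iv = Arrays.copyOfRange(ciphertext, 0, 16);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(encKey, "AES"), new IvParameterSpec(iv));
            byte[] data = cipher.doFinal(ciphertext, 16, ciphertext.length - 16);
            return new String(data, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] encKey = new byte[16];
        byte[] ivKey = new byte[32];
        RNG.nextBytes(encKey);
        RNG.nextBytes(ivKey);
        byte[] a = encrypt(encKey, ivKey, "alice@example.com", true);
        byte[] b = encrypt(encKey, ivKey, "alice@example.com", true);
        byte[] c = encrypt(encKey, ivKey, "alice@example.com", false);
        System.out.println("deterministic ciphertexts equal: " + Arrays.equals(a, b));
        System.out.println("random ciphertext equals deterministic: " + Arrays.equals(a, c));
    }
}
```

Because deterministic ciphertexts can be compared byte for byte, an equality query can be answered by encrypting the search value and matching it against the stored value. The same property reveals which records share a value, which is why it is usually reserved for fields that must be searchable.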
Through testing in the local environment, we were able to validate the basic encryption and decryption operations and the data storage flow.
3. Cloud Environment and AWS KMS Application
After validation in the local environment was complete, we applied CSFLE in the MongoDB Atlas-based cloud environment.
In the cloud environment, the development environment was also configured to use AWS KMS, the same as production. Applying AWS KMS requires setting IAM permissions, creating KMS keys, and configuring access policies.
It is particularly important to configure the appropriate IAM Role to allow the application server to access KMS. If the permission settings are incorrect, the application itself may not function normally due to failure to retrieve the encryption key.
Since we were using a local master key (CMK) in the local environment, key conversion work was necessary during the transition to the production environment.
In this process, we utilized the Rewrap feature provided by the MongoDB CSFLE library. Rewrap is a function that re-encrypts the existing data key (DEK) with a new master key (CMK). In other words, it changes only the higher-level master key that protects the data encryption key (DEK) and does not re-encrypt the actual data itself.
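The idea behind rewrapping can be sketched with the JDK's AES key wrap cipher. This is a conceptual model under simplifying assumptions, not the library's implementation: both master keys are local AES keys here, whereas in the project the new master key lives in AWS KMS and wrapping would be a call to KMS. The class and method names are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Conceptual model of envelope encryption: a master key (CMK) wraps the data key (DEK).
final class RewrapSketch {
    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts (wraps) the DEK under the given master key. */
    static byte[] wrap(SecretKey masterKey, SecretKey dek) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, masterKey);
            return cipher.wrap(dek);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Recovers the DEK from its wrapped form. */
    static SecretKey unwrap(SecretKey masterKey, byte[] wrappedDek) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, masterKey);
            return (SecretKey) cipher.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Rewrap: only the wrapping of the DEK changes; the DEK itself stays the same. */
    static byte[] rewrap(SecretKey oldMaster, SecretKey newMaster, byte[] wrappedDek) {
        return wrap(newMaster, unwrap(oldMaster, wrappedDek));
    }

    public static void main(String[] args) {
        SecretKey localMaster = newAesKey(); // stands in for the local CMK
        SecretKey cloudMaster = newAesKey(); // stands in for the AWS KMS CMK
        SecretKey dek = newAesKey();

        byte[] wrapped = wrap(localMaster, dek);
        byte[] rewrapped = rewrap(localMaster, cloudMaster, wrapped);

        boolean sameDek = Arrays.equals(dek.getEncoded(), unwrap(cloudMaster, rewrapped).getEncoded());
        System.out.println("DEK unchanged after rewrap: " + sameDek);
    }
}
```

Since the DEK is unchanged, every document already encrypted with it remains readable after the rewrap, and no field data has to be rewritten.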
In the project, we rewrapped the DEK that had been protected by the local master key with an AWS KMS-based master key. During the rewrap, we checked the following points for the transition to the operational environment.
- Whether existing data can still be decrypted
- Whether newly stored data is encrypted
- Whether AWS KMS can be accessed
- Whether the existing DEK can be reused
- Whether rollback is possible in case of failure
Fortunately, the existing data was still accessible after the rewrap, and it was confirmed that the new data is also securely encrypted based on AWS KMS.
In the operational environment, an automatic KMS key rotation policy was also applied in line with the security policy. AWS KMS supports automatic rotation at regular intervals, which is a significant advantage for key management efficiency. The rotation period was set to one year.
4. Key Issues Identified During Operation
We ran into a number of trial-and-error and operational issues while applying CSFLE in a real operational environment. The first was how to distribute the CSFLE library.
The CSFLE library for MongoDB Enterprise is provided as operating system-specific binary files separate from the DB access client, and the file sizes are quite large. In a Linux environment, it was several tens of MB, while in a Windows environment, it was several hundreds of MB.
Initially, we considered including it in the Git repository, but we decided this was not appropriate because binary files are inefficient to manage there and would increase the repository size. The recommended alternatives were to distribute it through a Nexus Repository or to include it in the Docker image, and for this project we included it in the Docker image.
The second was Spring Boot version compatibility.
The CSFLE library behaves slightly differently depending on the version of MongoDB Enterprise, and even within the same major version, compatibility issues may arise with Spring Boot.
In particular, there were cases of MongoDB Driver conflicts or Bean initialization errors with lower versions of Spring Boot. Since there is no separate guide document, we validated CSFLE library versions released around the same time as the Spring Boot version in use.
As a result, it was confirmed that it is important to avoid using a very low version of Spring Boot and to use versions that have been sufficiently verified for compatibility with the MongoDB Driver and CSFLE library.
The third was KMS master key (CMK) rotation.
AWS KMS provides an automatic key rotation feature.
However, because AWS KMS retains the previous key material after rotation so that data encrypted under it can still be decrypted, existing DEKs are not immediately affected. Nevertheless, from a security perspective, it is recommended to rewrap the DEKs with the newly rotated CMK.
In the project, we tested rewrapping from the local master key to the AWS KMS master key and confirmed that it worked normally.
This gave us confidence that we can reliably respond to future key changes and security policy modifications.
5. Data Migration and Handling LIKE Search
Unexpected issues occurred during the data migration process.
In the local environment, viewing plaintext data through the encryption and decryption client worked normally, but in the MongoDB Atlas environment, errors occurred when retrieving some plaintext data.
Ultimately, in the deployment environment (the cloud environment), we configured a multi data source structure. One data source was a general MongoDB client for querying plaintext data, and the other was an encryption/decryption client with CSFLE applied.
During the migration process, the existing plaintext data was first retrieved using a general client and then saved again using an encryption/decryption client.
Since CSFLE automatically performed the encryption during this process, there was no need to write separate encryption logic manually.
Another major issue was LIKE search.
The data to be encrypted included personal information such as names, and there was a requirement for partial search functionality in the user search feature.
In financial sector systems, a keyword-based hash search method is often used for searching Korean names. For example, the name is split into initial consonants or other substring units, each unit is stored as a hash value, and the search term is hashed in the same way at query time.
The problem was English names. Because English names produce many more partial-match cases, at least 16 keywords had to be generated for search to work smoothly.
Fortunately, MongoDB supports array-based search, so we were able to solve this by adding a keyword hash array field. In other words, the name field itself is encrypted, while the search keyword hashes are stored separately as an array to support partial search.
If we had been using a relational database (RDB), a separate hash table for keyword lookup would have been needed.
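A minimal sketch of the keyword hash idea follows. The keyword generation rule is an assumption made for illustration (every substring between a minimum and maximum length, after lowercasing), as are the class name, the HMAC-SHA256 choice, and the key handling. The project's actual rule, including how Korean initial consonants are extracted and how the 16 keywords are chosen, is not reproduced here.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: derives searchable keyword hashes for a name that is stored encrypted.
final class KeywordHashSketch {
    static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT);
    }

    /** Assumed rule: every distinct substring whose length is in [minLen, maxLen]. */
    static Set<String> keywords(String name, int minLen, int maxLen) {
        String n = normalize(name);
        Set<String> result = new LinkedHashSet<>();
        for (int len = minLen; len <= Math.min(maxLen, n.length()); len++) {
            for (int start = 0; start + len <= n.length(); start++) {
                result.add(n.substring(start, start + len));
            }
        }
        return result;
    }

    /** Keyed hash, so keywords cannot be guessed by hashing candidates without the key. */
    static String hash(byte[] hmacKey, String keyword) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(hmacKey, "HmacSHA256"));
            StringBuilder hex = new StringBuilder();
            for (byte b : mac.doFinal(keyword.getBytes(StandardCharsets.UTF_8))) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The list that would be stored next to the encrypted name as an array field. */
    static List<String> keywordHashes(byte[] hmacKey, String name, int minLen, int maxLen) {
        List<String> hashes = new ArrayList<>();
        for (String keyword : keywords(name, minLen, maxLen)) {
            hashes.add(hash(hmacKey, keyword));
        }
        return hashes;
    }

    /** Mirrors an array-contains query: hash the search term the same way and look it up. */
    static boolean matches(List<String> storedHashes, byte[] hmacKey, String term) {
        return storedHashes.contains(hash(hmacKey, normalize(term)));
    }

    public static void main(String[] args) {
        byte[] key = "demo-only-hmac-key".getBytes(StandardCharsets.UTF_8);
        List<String> stored = keywordHashes(key, "Daniel", 2, 6);
        System.out.println("matches 'ani': " + matches(stored, key, "ani"));
        System.out.println("matches 'xyz': " + matches(stored, key, "xyz"));
    }
}
```

On the MongoDB side, an equality condition on an array field matches documents whose array contains that value, so the lookup in `matches` corresponds to a single equality query on the hash array field. Under the assumed rule, search terms shorter than `minLen` or longer than `maxLen` will not match, which is a trade-off between index size and search flexibility.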
Through this project, we were able to reaffirm that data encryption requires a comprehensive design that includes not just encryption at rest, but also considerations for searching, operations, migration, and key management.
Daniel(K)