Practical Application of Debezium CDC

Practical Application of Debezium CDC

1. Introduction

The problem I faced while participating in the Medical System MSA Transition Project was data duplication. The new MSA system (PostgreSQL) and the existing legacy system (Oracle) had to be operated side by side, and there was a requirement to maintain data consistency in real-time while both systems were simultaneously serving.

Batch synchronization caused delays of several minutes, making it unsuitable for the healthcare domain, the DB trigger method added operational burdens to the legacy Oracle, and application dual writes made it difficult to ensure transaction atomicity. After reviewing various methods, we decided to adopt the CDC (Change Data Capture) method, which directly reads the transaction logs from the source DB and propagates them asynchronously.

2. Architecture: Debezium and custom synchronization application

Debezium is an open-source CDC platform that directly subscribes to the WAL (Write-Ahead Log) of PostgreSQL, publishing INSERT/UPDATE/DELETE events in real-time to Kafka. Since it reads the WAL logs without triggering the source DB or changing the code, it did not require touching legacy systems, and the ability to continue from the last processed log position (LSN) even after the connector restarts aligned well with the project's requirements.

image1.png

[Figure 1] Debezium Official Architecture — Source: debezium.io/documentation/reference/3.5 (Apache License 2.0)

Category

Technology

Role

Source DB

PostgreSQL

Data Change Source (WAL-based CDC)

CDC engine

Debezium 3.x

WAL detection → Kafka event publication

Event bus

Apache Kafka

at-least-once delivery guarantee (possible duplicate publishing → Consumer idempotency handling required)

Synchronization App

Data-Sync (Spring Boot)

Event consumption, Oracle reflection, consistency hash storage

Target DB

Oracle

Final synchronization target for legacy systems

There were three main reasons for directly developing a custom Spring Boot application (Data-Sync) instead of using the standard Sink Connector.

Schema transformation - The complex PK, non-standard column names, and type conversions (timestamptz → TIMESTAMP WITH LOCAL TIME ZONE, etc.) of legacy Oracle were difficult to handle with standard Connector settings alone.

Error handling - We needed to have different retry intervals and notification methods based on the type of error, and for this, the flexibility of custom code was necessary.

Consistency validation - There was also a requirement to asynchronously store the data hash after synchronization is complete in order to periodically verify consistency between the two systems.

3. Durability Verification: Fault Response Testing

After the design was completed, I wanted to check whether the system would not lose data even in the event of a failure before it was actually put into operation. I independently executed scenarios in a development environment Kubernetes cluster, verifying each time whether 20,000 records of data were input into PostgreSQL and successfully loaded into Oracle without any loss.

Scenario

Disability Condition

Recovery Mechanism

Result

Kafka Partition Leader failure

Forced shutdown of the leader broker during transmission

Kafka Controller elects another broker as the leader within the ISR, Debezium automatic reconnection

✓ No loss

Complete failure of Kafka Cluster

Delete all broker pods and redeploy

Kafka Connect offset topic (PVC retain) works with PostgreSQL replication slot to continue from the last LSN to WAL

✓ No loss

Debezium Connector failure

Forced termination of the connector during CDC processing

Automatically resume reading WAL from the last LSN recorded in the replication slot

✓ No data loss

The reason there was no data loss even after the Kafka cluster was fully redeployed is due to the operation of the Kafka Connect offset topic and
the PostgreSQL Replication Slot working together.

The offset topic was made possible thanks to preserving the last processed Kafka position, and the replication slot preserves the WAL's LSN position.
However, due to the at-least-once characteristic, duplicate events may be published during reprocessing, so implementing UPSERT-based idempotency
in Data-Sync ultimately ensured data consistency.

4. Conclusion: Limitations and Lessons Learned, and Future

Limitations of Initial Design

The method of directly storing column information required for synchronization in the Data-Sync internal metadata table had clear limitations. Each time a column was added to a specific service, changes were needed in Data-Sync as well, compromising the service independence that MSA aims for in the data synchronization layer. To address this issue, we are considering the introduction of Confluent Schema Registry and Avro format. This is a feature already officially supported in Debezium 3.x, and by applying it, Debezium will automatically register the schema with the Schema Registry along with the events, allowing Data-Sync to interpret it dynamically, enabling each service to change the schema independently of Data-Sync.

Key Changes in Debezium 3.x (2024-2025)

Version

Core changes

Meaning from an operational perspective

3.1 (2025.04)

Official Release of Management Platform (Kubernetes UI)

Operation method of directly accessing the Pod via curl → can be replaced with GUI

3.4 (2025.12)

PostgreSQL 17 Failover Slot · PG 18 Support · Kafka 4.1 · OpenLineage Integration

Automatic Slot Migration during DB Failover · Monitoring pg_replication_slots is Essential

AI integration

Add LLM·vector DB linked AI module (Debezium Server Sink)

CDC is evolving from a data replication tool to an AI pipeline layer

The greatest advantage of this approach was that it allowed for controlling data flow without touching legacy systems.

I think this is an approach that can be applied in various contexts, not only in situations where two systems must coexist like in MSA transitions,
but also in cache refresh, search indexing, event-driven architecture, etc.

I hope this project will be of some help to those who are encountering CDC for the first time or those who are contemplating similar data
synchronization issues.

References

[1] Debezium official documentation, debezium.io/documentation/reference/3.4

[2] Debezium 3.1~3.4 Release Notes, debezium.io/blog

[3] Confluent Schema Registry, docs.confluent.io/platform/current/schema-registry

Juno

Site footer