This post is a one-stop guide to Microsoft SQL Server CDC (Change Data Capture), covering its evolution, how it works, the technology behind it, and its two types. Before turning to those details, however, it helps to understand what Change Data Capture is in its own right.
What is Change Data Capture (CDC)?
In today’s business ecosystem, most companies rely heavily on generated data to run their operations efficiently. Hence, the most critical aspect is to have measures in place that ensure robust data security and safety norms as well as a high level of data durability.
In this scenario, Change Data Capture not only supports strict data security norms but also records every change made to the data while preserving both the values and their history.
Earlier attempts to solve this problem relied on timestamps, triggers, data auditing, and intricate queries, but none proved fully satisfactory. A workable built-in process arrived only when Microsoft introduced its SQL Server CDC feature.
The Development of Microsoft SQL Server CDC
In SQL Server 2005, capturing changes relied on “after update”, “after insert”, and “after delete” triggers. This approach did not find favor with database administrators (DBAs), who found it very complex to work with. Microsoft was receptive to this feedback and, in SQL Server 2008, introduced the Change Data Capture feature that remains in use to this day.
This new CDC is far more user-friendly. Once it is enabled on a database and its tables, DBAs can capture changes to the data and keep their history without altering the source tables or the applications that write to them.
The Technology Driving SQL Server CDC
The goal of SQL Server CDC is to record insert, update, and delete activity in the database and make the details available to users in an easy-to-understand relational format. The column information and metadata needed to apply the changes to a target environment are captured along with the modified rows. The captured changes are stored in change tables that mirror the column structure of the tracked source tables. Access to the change data is provided through table-valued functions.
How is SQL Server CDC a cut above other features in this niche?
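As a rough illustration of how this looks in practice, the sketch below builds the T-SQL that enables CDC for one table and then reads its changes through the generated table-valued function. The stored procedures and functions named in the strings (sys.sp_cdc_enable_db, sys.sp_cdc_enable_table, sys.fn_cdc_get_min_lsn, sys.fn_cdc_get_max_lsn, and cdc.fn_cdc_get_all_changes_<capture_instance>) are part of SQL Server. The Python helper names, and the dbo.Orders table, are hypothetical and exist only for this example. The code only produces the statements; running them against a server is left out.

```python
import re

# Hypothetical helpers for illustration: they only build T-SQL text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str) -> str:
    """Allow only simple identifiers, since names are spliced into SQL text."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def enable_cdc_statements(schema: str, table: str) -> list:
    """T-SQL batches that enable CDC on the current database and on one table."""
    schema, table = _check(schema), _check(table)
    return [
        "EXEC sys.sp_cdc_enable_db;",
        "EXEC sys.sp_cdc_enable_table "
        f"@source_schema = N'{schema}', "
        f"@source_name = N'{table}', "
        "@role_name = NULL;",
    ]


def read_changes_statement(schema: str, table: str) -> str:
    """T-SQL that reads all captured changes for the default capture instance."""
    # By default the capture instance is named <schema>_<table>.
    instance = f"{_check(schema)}_{_check(table)}"
    return (
        "DECLARE @from_lsn binary(10), @to_lsn binary(10);\n"
        f"SET @from_lsn = sys.fn_cdc_get_min_lsn(N'{instance}');\n"
        "SET @to_lsn = sys.fn_cdc_get_max_lsn();\n"
        f"SELECT * FROM cdc.fn_cdc_get_all_changes_{instance}"
        "(@from_lsn, @to_lsn, N'all');"
    )


for statement in enable_cdc_statements("dbo", "Orders"):
    print(statement)
print(read_changes_statement("dbo", "Orders"))
```

Passing NULL for @role_name means no gating role restricts access to the change data; a real deployment would normally name a role there.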
With many other approaches, users must periodically re-read or refresh the source tables to find what changed and bring those changes into the target repository. This makes working with such systems a long, drawn-out, and complex affair.
On the other hand, once it is enabled, SQL Server CDC provides change data continuously, and users can consume it in specific applications or tables as required. A typical consumer of change data is an Extract, Transform, and Load (ETL) application that moves incremental changes from source tables to a data warehouse.
The Operational Aspects of SQL Server CDC
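To make the ETL use case concrete, here is a minimal in-memory sketch of an incremental load. It assumes each change carries an increasing sequence number (a plain integer standing in for the binary log sequence number that SQL Server uses) and that the loader remembers the last number it applied. The Change class, the load_increment function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    lsn: int                 # stand-in for a log sequence number
    operation: str           # "insert", "update", or "delete"
    key: int                 # primary key of the affected row
    values: Optional[dict]   # new column values; None for a delete


def load_increment(changes, warehouse: dict, last_lsn: int) -> int:
    """Apply only the changes newer than last_lsn; return the new watermark."""
    for change in sorted(changes, key=lambda c: c.lsn):
        if change.lsn <= last_lsn:
            continue  # already applied during an earlier run
        if change.operation == "delete":
            warehouse.pop(change.key, None)
        else:
            warehouse[change.key] = dict(change.values)
        last_lsn = change.lsn
    return last_lsn


warehouse = {}
feed = [
    Change(1, "insert", 10, {"name": "Ada"}),
    Change(2, "update", 10, {"name": "Ada L."}),
    Change(3, "insert", 11, {"name": "Bob"}),
]
watermark = load_increment(feed, warehouse, last_lsn=0)

# A later run sees the same feed plus one new change and skips the old ones.
feed.append(Change(4, "delete", 11, None))
watermark = load_increment(feed, warehouse, watermark)
print(watermark, warehouse)
```

Storing the watermark after each run is what lets the loader move only the incremental data instead of re-reading the whole source table.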
As detailed above, SQL Server CDC monitors and records the changes made to tracked tables. These changes are stored in relational tables, where they can be queried as required through T-SQL. When Change Data Capture is enabled for a table, a change table that mirrors the column structure of the tracked table is created automatically.
In addition, the change table has extra metadata columns that describe each change made to a row. Apart from these columns, the structure of the source table and its change table is the same. This makes it easy for DBAs to use SQL Server CDC to track tables and query the resulting audit data.
The source of the change data is the SQL Server transaction log. As inserts, updates, and deletes are applied to the tracked source tables, entries describing those changes are written to the log. A capture process then reads the log and adds the change details to the change table associated with the original table.
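The metadata columns can be illustrated with a small decoder. In SQL Server change tables, the __$operation column uses 1 for a delete, 2 for an insert, 3 for the row image before an update, and 4 for the row image after an update. Other metadata columns such as __$start_lsn also carry the __$ prefix. The sample rows and the describe_change helper below are made up for this sketch.

```python
# Meaning of the __$operation codes in a CDC change table.
OPERATIONS = {
    1: "delete",
    2: "insert",
    3: "update (before image)",
    4: "update (after image)",
}


def describe_change(row: dict) -> str:
    """Split a change row into its operation and its source-table columns."""
    operation = OPERATIONS.get(row["__$operation"], "unknown")
    # Columns without the __$ prefix mirror the tracked source table.
    data = {name: value for name, value in row.items() if not name.startswith("__$")}
    return f"{operation}: {data}"


# Hypothetical rows shaped like the output of a change table query.
rows = [
    {"__$start_lsn": b"\x00" * 9 + b"\x01", "__$operation": 2, "id": 1, "name": "Ada"},
    {"__$start_lsn": b"\x00" * 9 + b"\x02", "__$operation": 3, "id": 1, "name": "Ada"},
    {"__$start_lsn": b"\x00" * 9 + b"\x02", "__$operation": 4, "id": 1, "name": "Ada L."},
    {"__$start_lsn": b"\x00" * 9 + b"\x03", "__$operation": 1, "id": 1, "name": "Ada L."},
]
for row in rows:
    print(describe_change(row))
```

An update appears as two rows that share the same __$start_lsn, one with the old values and one with the new ones, which is what makes the change table usable as an audit trail.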
Types of SQL Server CDC
There are two types of SQL Server CDC, and it is advisable to understand the first before moving on to the second.
Log-based CDC: In this type, the database transaction log holds every change made to the database, and a capture process reads the log and replicates those changes to a target database. Because every committed change is written to the log, none can be left out, which makes this form of CDC very reliable. Users do not need to add triggers or alter the schemas of the production tables.
The drawback of this form of CDC is that it is available only on databases that expose their transaction log for change capture.
Trigger-based CDC
Here, CDC relies on triggers placed on the source tables, which fire automatically whenever a row is inserted, updated, or deleted. Hence, human intervention is not required at any stage, which reduces operational costs. However, these savings are offset by extra load on the source system, because the trigger does additional work every time a change occurs.
Trigger-based CDC has some advantages over log-based CDC. It is easy to implement, provides detailed logs of all transactions in shadow tables, and is directly supported by the SQL API of specific databases. Further, changes are tracked and recorded immediately, as part of the statement that makes them.
This form of CDC has downsides too. First, triggers are sometimes disabled when the load is heavy, and changes made in the meantime go unrecorded. Next, database performance often suffers, because each change to a row requires additional writes to the database.
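The trigger mechanism can be seen in a small, self-contained sketch. It uses SQLite through the Python standard library rather than SQL Server, purely so that it runs without a server, and the customers and customers_shadow tables are invented for the example. SQL Server trigger syntax differs, but the idea is the same: each change to the source table causes an extra write to a shadow table.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a source table, a shadow table, and one trigger per operation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

        CREATE TABLE customers_shadow (
            change_id INTEGER PRIMARY KEY AUTOINCREMENT,
            operation TEXT NOT NULL,
            id INTEGER,
            name TEXT
        );

        CREATE TRIGGER customers_after_insert AFTER INSERT ON customers
        BEGIN
            INSERT INTO customers_shadow (operation, id, name)
            VALUES ('insert', NEW.id, NEW.name);
        END;

        CREATE TRIGGER customers_after_update AFTER UPDATE ON customers
        BEGIN
            INSERT INTO customers_shadow (operation, id, name)
            VALUES ('update', NEW.id, NEW.name);
        END;

        CREATE TRIGGER customers_after_delete AFTER DELETE ON customers
        BEGIN
            INSERT INTO customers_shadow (operation, id, name)
            VALUES ('delete', OLD.id, OLD.name);
        END;
        """
    )
    return conn


conn = build_demo()
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

changes = conn.execute(
    "SELECT operation, id, name FROM customers_shadow ORDER BY change_id"
).fetchall()
for change in changes:
    print(change)
```

Every statement against customers results in a second write to customers_shadow, which is the performance cost described above.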