Oracle Change Data Capture (CDC) allows tracking of changes made to database tables without changing the database design. It uses triggers to capture changes and stores the details of changed rows in change tables. The publisher exposes these change tables, while the subscriber defines views on them to get a fixed set of changed records. After processing, the subscriber drops the views to start capturing the next set of changes on the next cycle. CDC provides a way to synchronize data between databases or integrate changed data into downstream systems like data warehouses.
1 of 32
Downloaded 12 times
More Related Content
Cdc
1. Oracle¡¯s Change Data Capture
(CDC)
Boris Knizhnik
BIK Information Services, Inc.
borisk@bikinfo.com
NYOUG 12/11/2003
1
2. What is Change Data
Capture?
Tool to help manage data changes
NOT a data warehousing solution
Can be used as a part of data warehousing
solution
Doesn¡¯t require any changes to existing
database design
2
4. Basic concept (cont.)
Source Instance Target Instance
Table1 Table1_changes
Table2 Table2_changes
...
Hey, I want to
I am publishing subscribe!
data !
4
5. Preparations
Make sure you know what tables you will
use in CDC process.
If tables are still under development ¨C use
the utility package to build the list of
columns on the fly.
Prepare two accounts ¨C publisher and
subscriber.
5
6. Setting up Publisher
connect system/manager@whatever
GRANT EXECUTE_CATALOG_ROLE to
boris_publisher;
GRANT SELECT_CATALOG_ROLE to
boris_publisher;
connect scott/tiger@whatever
GRANT SELECT on emp to boris_publisher;
GRANT SELECT on DEPT to
boris_publisher;
6
8. Inside Change Table
Name Type
----------------- ------------
OPERATION$ CHAR(2)
CSCN$ NUMBER
COMMIT_TIMESTAMP$ DATE
RSID$ NUMBER
USERNAME$ VARCHAR2(30) Columns
TIMESTAMP$ DATE from the
SOURCE_COLMAP$ RAW(128) original
TARGET_COLMAP$ RAW(128) table
COMM NUMBER(7,2)
DEPTNO NUMBER(2)
¡
8
9. Another publishing scenario
Same table may be published more
than once
Each change table for the same source
table may contain a different number of
columns
9
10. Setting up Subscriber
connect scott/tiger@whatever
GRANT SELECT ON emp TO boris_subscriber;
GRANT SELECT ON dept TO boris_subscriber;
connect boris_publisher/boris_publisher
GRANT SELECT ON cdc_emp TO boris_subscriber;
GRANT SELECT ON cdc_dept TO boris_subscriber;
10
13. Activate a Subscription
DECLARE v_subscription_description VARCHAR2(30) :=
'scott -> Datawarehouse';
-- Get the handle
SELECT handle INTO vHandle FROM all_subscriptions
WHERE description = v_subscription_description;
-- Activate the subscription
DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(vHandle);
-- Extend the subscription window
DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(
SUBSCRIPTION_HANDLE=>vHandle);
13
14. Logistical problem
Processing data in a change table takes some time.
In the mean time new records could have been stored in
this change table.
After you have processed the records, the next time
your processing program kicks in, you may have a few
more records in those tables.
How are you going to tell the old processed records
from the new ones?
Solution: Extend_window
14
15. Extending Window
-- get the handle
SELECT handle INTO vHandle FROM all_subscriptions
WHERE description = v_subscription_description;
DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(
SUBSCRIPTION_HANDLE=>vHandle);
15
16. Cyclical Part
Publisher created change tables and is constantly collecting
change records.
Subscriber specified which of these tables she is interested in.
We are ready for a cyclical part of processing collected records
Reading change tables directly is not recommended by Oracle,
because the tables are not stable.
The number of records keeps growing while your data
warehouse process reads these records.
The solution is to create views that give you a fixed set of
records for each underlying change table.
After your data warehouse script finishes processing records,
you may drop this view.
16
17. Extending Window and
Creating CDC Views
connect boris_subscriber/boris_subscriber@whatever
...
-- Get the handle
SELECT handle INTO vHandle FROM all_subscriptions
WHERE description = v_subscription_description;
-- Extend the window for subscription
DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(
SUBSCRIPTION_HANDLE=>vHandle);
-- Create CDC View (for each table)
v_cdc_table := 'CDC_'||v_source_table;
DBMS_CDC_SUBSCRIBE.PREPARE_SUBSCRIBER_VIEW(
SUBSCRIPTION_HANDLE=> vHandle
,SOURCE_SCHEMA => v_source_schema Result variable
,SOURCE_TABLE => v_source_table
,VIEW_NAME => our_view_name);
17
18. Extending Windows and
Creating CDC Views (cont.)
-- Drop the previous synonym
vSQL := 'DROP SYNONYM ' || v_cdc_view_name;
EXECUTE IMMEDIATE vSQL;
-- Create a private synonym to point to the view for each table:
v_cdc_view_name:=v_cdc_table|| '_vw';
vSQL := 'CREATE SYNONYM ' || v_cdc_view_name ||
' FOR '|| our_view_name;
EXECUTE IMMEDIATE vSQL;
18
19. CDC Views and Synonyms
Subscriber view 'CDC#CV$8757846' was
successfully created for table SCOTT.DEPT
Private synonym 'CDC_DEPT_vw' for view
'CDC#CV$8757846' was successfully created.
Subscriber view 'CDC#CV$8757848' was
successfully created for table SCOTT.EMP
Private synonym 'CDC_EMP_vw' for view
'CDC#CV$8757848' was successfully created.
19
20. CDC Views and Synonyms
(cont.)
CREATE OR REPLACE VIEW CDC#CV$8757846 ( OPERATION$,
CSCN$, COMMIT_TIMESTAMP$, TIMESTAMP$, USERNAME$,
TARGET_COLMAP$, SOURCE_COLMAP$, RSID$, DEPTNO,
DNAME, LOC
) AS SELECT
OPERATION$, CSCN$, COMMIT_TIMESTAMP$, TIMESTAMP$,
USERNAME$, TARGET_COLMAP$, SOURCE_COLMAP$, RSID$,
"DEPTNO", "DNAME", "LOC"
FROM "BORIS_PUBLISHER"."CDC_DEPT"
WHERE CSCN$ >= 40802127 AND CSCN$ <= 41013754
WITH READ ONLY
20
21. Processing Change Records
SELECT * FROM CDC_DEPT_vw
ORDER BY
CSCN$
, COMMIT_TIMESTAMP$, TIMESTAMP$
, OPERATION$ Desc
Note 1: Don¡¯t forget to specify ¡®order¡¯ clause!
Note 2: Watch for batches that update millions of
records!
21
23. What columns were changed
Source_colmap$
Apparently Oracle¡¯s inner presentation of the values is as a set of
binary words (two bytes). For historical reasons, these are usually
reversed in memory presentation. The least significant byte comes
first and the most significant byte follows.
23
24. What columns were changed
(cont.)
Learning what columns have been changed
may be important.
Using SOURCE_COLMAP$ may not give
you the correct results since Oracle does
not check whether or not the values really
changed.
It grabs columns that were mentioned in the
UPDATE statement even if this statement
is assigning the same values back.
24
25. Dropping CDC Views
connect boris_subscriber/boris_subscriber@whatever
-- Get the handle
SELECT handle INTO vHandle FROM all_subscriptions
WHERE description = v_subscription_description;
-- Drop the synonym
vSQL := 'DROP SYNONYM ' || v_cdc_view_name;
EXECUTE IMMEDIATE vSQL;
-- Drop the subscriber view(s) ¨C for all tables
v_source_table := ' emp ¡®;
DBMS_CDC_SUBSCRIBE.DROP_SUBSCRIBER_VIEW(
SUBSCRIPTION_HANDLE=> vHandle
,SOURCE_SCHEMA => v_source_schema
,SOURCE_TABLE => v_source_table);
Subscriber View for table 'CDC_DEPT' was dropped. Handle # 86
Subscriber View for table 'CDC_EMP' was dropped. Handle # 86
25
26. Purge the subscription window
-- Get the handle
SELECT handle INTO vHandle FROM all_subscriptions
WHERE description = v_subscription_description;
-- Purge window
DBMS_CDC_SUBSCRIBE.PURGE_WINDOW(
SUBSCRIPTION_HANDLE=> vHandle);
Subscriber Window for subscription 'scott -> Datawarehouse'
was successfully purged
26
27. Practical Advice
A slightly different sequence of steps is recommended
for a production environment:
Step 1 ¨C drop the CDC views (this will fail the first time,
since there are none)
Step 2 ¨C purge the CDC window (this will also fail the
first time)
Step 3 ¨C extend the windows, create CDC views, create
synonyms
Step 4 ¨C process updates
This sequence leaves your CDC views intact between
runs and you can do the research what went wrong
between runs.
27
28. Advice (Cont.)
If your source database is really on another instance,
your update process will be the one with a lot of
@db_link tables.
It is a good idea to design the update process in such
a way that it could be applied again without causing
problems.
You may want to treat Inserts as Updates if the key
already exists in a target database or Deletes will
not really delete anything (this will happen if you
are running your update script the second time).
This allows for better debugging of the scripts.
28
29. Advices (Cont. 1)
Your update script may run quickly or take a long time,
depending upon the intensity of updates in the system.
You should design your scripts in such way that they
will not run into each other.
It is a given that you are going to make a lot of
mistakes before setting everything up ¡°just so¡±, so the
following script can be used to undo the changes and
start over (See etl_undo_cdc.sql).
29
30. Overview of CDC process
Create Change Tables Drop CDC Views
Purge CDC Window
Create Subscription
Extend Subscription
Activate Subscription Window
Extend Subscription Create CDC Views
Window
Process
CDC Views
Drop Subscription
Drop Change Tables
30