ORADICAS: OGG for DAA

Monday, April 14, 2025

[Oracle] GoldenGate for Distributed Applications and Analytics (GG for DAA) & Iceberg replication

Hello everyone.

How are you doing?

I was talking to Alex Lima, Oracle GoldenGate Product Manager, today, and he suggested I take a look at a new OGG for DAA feature.

You already know that GG isn't just about replicating data from Oracle databases to Oracle databases, right? But did you know that it isn't limited to transactional databases either?

Now, did you know that, starting with version 23.7, you can use GG for DAA to replicate data into tables in Iceberg format?

That's right, my little master, you can. But first, what is Iceberg?

In a nutshell, Apache Iceberg is an open source table format designed for large-scale analytics on data lakes. It is a high-performance format for huge analytic tables, built to provide scalable and efficient data management.

Iceberg brings the reliability and simplicity of SQL tables to big data, while enabling engines such as Spark, Trino, Flink, Presto, Hive and Impala to work safely with the same tables at the same time.

And how can I do that? By using GG for DAA Handlers.

GG for DAA Handlers are native source and target connectors for messaging and streaming platforms, data lakes and lakehouses, cloud data warehouses and NoSQL databases. They provide low-impact capture and real-time data ingestion with high accuracy and throughput.

OGG for DAA can be configured to write any of the data file formats supported by Iceberg:

Parquet
Avro
ORC

The default file format for Iceberg data files is Parquet.
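On the Iceberg side, the chosen format is recorded in the table property write.format.default, whose default value is parquet. The Python sketch below only illustrates that Iceberg rule; the helper name resolve_file_format is hypothetical, and this is not how GG for DAA reads its own configuration.

```python
# Illustrative only: resolves the data file format of an Iceberg table from
# its table properties. "write.format.default" is a real Iceberg property;
# the helper name resolve_file_format is hypothetical.
SUPPORTED_FORMATS = {"parquet", "avro", "orc"}


def resolve_file_format(table_properties: dict) -> str:
    """Return the configured format, or Iceberg's default (Parquet)."""
    fmt = table_properties.get("write.format.default", "parquet").lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported Iceberg file format: {fmt!r}")
    return fmt


print(resolve_file_format({}))                                # parquet
print(resolve_file_format({"write.format.default": "ORC"}))  # orc
```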

The following Iceberg catalogs are also supported:

Hadoop Catalog
Nessie Catalog
AWS Glue Catalog
Polaris Catalog
REST Catalog
JDBC Catalog

And the following operations are supported as well:

INSERT: Generates data files for insert operations.
UPDATE: Generates data files and delete files for update operations.
DELETE: Generates delete files for delete operations.
TRUNCATE: Generates a delete file with an always-true condition, which removes all rows from the target table.
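To make that mapping concrete, here is a toy Python model, not GoldenGate or Iceberg code. Assumptions: each operation is its own commit, rows are plain dictionaries, and the key of an UPDATE is passed explicitly; the class name ToyIcebergTable is hypothetical. The one Iceberg rule it borrows is that an equality delete only applies to data files with a lower sequence number, which is why an UPDATE can delete the old row and add the new one in the same commit.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIcebergTable:
    """In-memory toy (not GoldenGate or Iceberg code). Every operation is one
    commit with a sequence number; an equality delete only hides rows from
    data files committed earlier, mirroring Iceberg's merge-on-read rule."""
    seq: int = 0
    data_files: list = field(default_factory=list)    # (seq, [row, ...])
    delete_files: list = field(default_factory=list)  # (seq, [match, ...])

    def _commit(self, rows=None, deletes=None):
        self.seq += 1
        if rows:
            self.data_files.append((self.seq, rows))
        if deletes:
            self.delete_files.append((self.seq, deletes))

    def insert(self, *rows):         # INSERT -> data file
        self._commit(rows=list(rows))

    def update(self, key, new_row):  # UPDATE -> delete file + data file
        self._commit(rows=[new_row], deletes=[key])

    def delete(self, *keys):         # DELETE -> delete file
        self._commit(deletes=list(keys))

    def truncate(self):              # TRUNCATE -> "always true" delete
        self._commit(deletes=[{}])   # an empty match dict matches every row

    def scan(self):
        """Read path: keep rows not matched by a later equality delete."""
        live = []
        for data_seq, rows in self.data_files:
            for row in rows:
                deleted = any(
                    del_seq > data_seq
                    and all(row.get(col) == val for col, val in match.items())
                    for del_seq, matches in self.delete_files
                    for match in matches
                )
                if not deleted:
                    live.append(row)
        return live


table = ToyIcebergTable()
table.insert({"id": 1, "name": "a"}, {"id": 2, "name": "b"})
table.update({"id": 1}, {"id": 1, "name": "a2"})
table.delete({"id": 2})
print(table.scan())  # [{'id': 1, 'name': 'a2'}]
table.truncate()
print(table.scan())  # []
```

Note that nothing is rewritten in place: every change only adds files, and the final table contents are resolved when the table is read.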

We can also work with compressed update handling. Oracle GoldenGate trails can contain compressed or uncompressed update records. By default, update records are compressed: they contain values only for the key columns and the modified columns. Uncompressed update records contain values for all columns.
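A quick way to picture the difference is a Python sketch in which hypothetical dictionaries stand in for trail records (the real trail format is binary and carries more metadata): a compressed after-image lacks the unchanged non-key columns, while an uncompressed one is a complete row.

```python
# Hypothetical table and record shapes; real trail records are binary and
# carry more metadata than these dictionaries.
TABLE_COLUMNS = ("id", "name", "city", "balance")

# The same UPDATE (only "balance" changed for id=7) in both formats.
compressed_after = {"id": 7, "balance": 250}
uncompressed_after = {"id": 7, "name": "Ana", "city": "Lisbon", "balance": 250}


def missing_columns(after_image: dict) -> list:
    """Columns a writer would still have to obtain elsewhere to emit a full row."""
    return [col for col in TABLE_COLUMNS if col not in after_image]


print(missing_columns(compressed_after))    # ['name', 'city']
print(missing_columns(uncompressed_after))  # []
```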

The Oracle GoldenGate Iceberg Replicat replicates GoldenGate trail records to Iceberg tables. The files can be written to local storage, AWS S3, Google Cloud Storage (GCS) or Azure Data Lake Storage (ADLS).

Another very interesting point is delete files and Merge-On-Read (MoR). Oracle GoldenGate generates Iceberg delete files for UPDATE and DELETE operations, and for this the write.update.mode property of the Iceberg table is set to merge-on-read.

Iceberg supports two types of delete files:

Equality deletes: deleted rows are identified by the values of one or more columns stored in the delete file (for example, the primary key).
Position deletes: deleted rows are identified by the data file path and the position of the row within that data file.

Currently, Oracle GoldenGate uses Iceberg Equality Deletes to delete records from the Iceberg table.
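The two kinds of delete files can be compared side by side in a small Python sketch. The file path, rows and helper names are hypothetical, and real delete files are Parquet/Avro/ORC files rather than lists.

```python
# Toy comparison of the two Iceberg delete-file kinds. The file path, rows
# and helper names are hypothetical; positions are 0-based row indexes.
DATA_FILE = "data/00000-0.parquet"
rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]

equality_deletes = [{"id": 2}]       # "delete rows where id = 2"
position_deletes = [(DATA_FILE, 0)]  # "delete row 0 of this file"


def apply_equality_deletes(rows, deletes):
    """Drop every row whose values match all columns of some delete entry."""
    return [r for r in rows
            if not any(all(r.get(c) == v for c, v in d.items()) for d in deletes)]


def apply_position_deletes(path, rows, deletes):
    """Drop rows by (file path, position), without looking at column values."""
    dropped = {pos for p, pos in deletes if p == path}
    return [r for i, r in enumerate(rows) if i not in dropped]


print([r["id"] for r in apply_equality_deletes(rows, equality_deletes)])             # [1, 3]
print([r["id"] for r in apply_position_deletes(DATA_FILE, rows, position_deletes)])  # [2, 3]
```

The trade-off shows up in the inputs: a position delete needs to know which file and row hold the record, while an equality delete needs only the key values, which a change stream already carries. That is a plausible reason for a streaming writer to prefer equality deletes; the cost moves to readers, who must compare rows against the delete entries.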

One point to watch out for is primary key updates in compressed update records, where some column values are missing. These cause pending files to be flushed to the Iceberg table before the flush interval expires, which can produce small data files and delete files for each primary key update. For workloads or tables with frequent primary key updates, it is better to generate trail files with uncompressed update records. In addition, set gg.validate.keyupdate=true for trails generated from an Oracle source.
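One way to read this, sketched in Python under our own assumptions: a primary key update has to become an equality delete on the old key plus a complete row under the new key. With a compressed record, the unchanged columns are absent, so they have to come from the rows already in the Iceberg table; that dependency on the current table state is our interpretation of why pending files are flushed early, not documented GoldenGate behavior. The table and the function plan_pk_update are hypothetical.

```python
# Sketch with a hypothetical three-column table. It only shows which data is
# needed; it is not GoldenGate's internal algorithm.
TABLE_COLUMNS = ("id", "name", "balance")


def plan_pk_update(old_key: dict, after_image: dict) -> dict:
    """Split a primary key update into an equality delete on the old key and
    an insert of the new row, listing the columns the trail record lacks."""
    return {
        "equality_delete": old_key,
        "new_row": after_image,
        "needs_lookup": [c for c in TABLE_COLUMNS if c not in after_image],
    }


# id 7 -> 8 as a compressed record: only the key is present.
print(plan_pk_update({"id": 7}, {"id": 8})["needs_lookup"])  # ['name', 'balance']
# The same change as an uncompressed record: nothing to look up.
full = {"id": 8, "name": "Ana", "balance": 250}
print(plan_pk_update({"id": 7}, full)["needs_lookup"])  # []
```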

The Iceberg replication properties are configured in the Replicat properties file, and there are specific settings for each of the supported catalogs:

Nessie Catalog
AWS Glue Catalog
Polaris Catalog
REST Catalog
JDBC Catalog
Hadoop Catalog
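Because these settings live in a plain Java-style properties file, a small pre-flight check can catch a missing flag before the Replicat starts. The Python sketch below is hypothetical tooling, not part of GG for DAA: it parses a simplified subset of the properties syntax and verifies gg.validate.keyupdate=true, the setting recommended earlier for trails from an Oracle source. Catalog-specific keys are deliberately not shown; take them from the Oracle documentation for your catalog.

```python
# Minimal reader for a subset of Java .properties syntax (key=value lines,
# '#' or '!' comments; no line continuations, escapes or ':' separators).


def parse_properties(text: str) -> dict:
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def keyupdate_validation_enabled(props: dict) -> bool:
    """True when gg.validate.keyupdate is set to true (case-insensitive)."""
    return props.get("gg.validate.keyupdate", "").strip().lower() == "true"


sample = """
# Replicat properties fragment (illustrative)
gg.validate.keyupdate=true
"""
print(keyupdate_validation_enabled(parse_properties(sample)))  # True
```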

So that's it: if you didn't know about this Oracle GG for DAA capability, now you do, and you can start exploring it.

And if you want to know more details, you can check it out here and here.

I hope this has helped you.

See you.

Mario