Hello everyone.
How are you doing?
I was talking to Alex Lima, Oracle GoldenGate Product Manager, today and he suggested I take a look at a new feature in OGG for DAA (Oracle GoldenGate for Distributed Applications and Analytics).
You already know that GG isn't just about replicating data from Oracle databases to Oracle databases, right? And did you know that it's not just limited to transactional databases?
Now, did you know that you can replicate tables in Iceberg format using GG for DAA from version 23.7?
That's right, my little master, you can. But first, what is Iceberg?
In a nutshell, Apache Iceberg is an open-source, high-performance table format for extremely large analytical tables in data lakes, designed to provide scalable and efficient data management.
With Iceberg as a target, GG for DAA gets the reliability and simplicity of SQL tables, while engines such as Spark, Trino, Flink, Presto, Hive and Impala can work safely with the same tables at the same time.
And how can I do that? By using GG for DAA Handlers.
GG for DAA Handlers are native source and target connectors for message streaming, data lake and delta lake, cloud warehouse, and NoSQL database technologies. They provide low-impact capture and real-time data ingestion with high accuracy and throughput.
OGG for DAA can be configured to work with any of the file formats supported by Iceberg:
- Parquet
- Avro
- ORC
The following Iceberg catalogs are also supported:
- Hadoop Catalog
- Nessie Catalog
- AWS Glue Catalog
- Polaris Catalog
- REST Catalog
- JDBC Catalog
And the following operations are supported as well:
- INSERT: Generates data files for insert operations.
- UPDATE: Generates data files and delete files for update operations.
- DELETE: Generates delete files for delete operations.
- TRUNCATE: Generates a delete file with a condition of always true to truncate the target table.
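To make the list above easier to picture, here is a small Python sketch that maps each operation to the kinds of Iceberg files it produces. It is only an illustration of the list, not GoldenGate code: the names `Op`, `FILES_BY_OP` and `files_for_operations` are made up for this example.

```python
from enum import Enum


class Op(Enum):
    INSERT = "INSERT"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    TRUNCATE = "TRUNCATE"


# Kinds of Iceberg files each operation produces, following the list above.
FILES_BY_OP = {
    Op.INSERT: {"data"},
    Op.UPDATE: {"data", "delete"},
    Op.DELETE: {"delete"},
    Op.TRUNCATE: {"delete"},  # one delete file whose condition is always true
}


def files_for_operations(ops):
    """Return the set of file kinds that a batch of operations would generate."""
    kinds = set()
    for op in ops:
        kinds |= FILES_BY_OP[op]
    return kinds
```

For example, a batch containing only inserts yields just data files, while any batch containing an update yields both data files and delete files.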
Oracle GoldenGate Iceberg Replicat replicates GoldenGate trail records to Iceberg tables. The files can be written to the local file system, AWS S3, Google Cloud Storage (GCS) or Azure Data Lake Storage (ADLS).
Another very interesting point is delete files and Merge-On-Read (MoR). Oracle GoldenGate generates Iceberg delete files for UPDATE and DELETE operations. To do this, the write.update.mode property of the Iceberg table is set to merge-on-read.
Iceberg supports two types of delete files:
- Equality deletes: The deleted records are identified by equality on the values of the columns specified in the delete file.
- Position deletes: The deleted records are identified by their position in the Iceberg data file.
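To see how the two kinds of delete files behave at read time, here is a minimal Python sketch of the merge-on-read idea. It is a simplified model, not the Iceberg implementation: the function name `read_merge_on_read`, the file name and the sample rows are all invented for the example, and real Iceberg also uses sequence numbers to decide which delete files apply to which data files.

```python
def read_merge_on_read(data_files, equality_deletes, position_deletes):
    """Return the live rows after applying delete files at read time.

    data_files: dict mapping a file name to a list of row dicts
    equality_deletes: list of dicts, each giving column values to match
    position_deletes: set of (file name, row position) pairs
    """
    live = []
    for name, rows in data_files.items():
        for pos, row in enumerate(rows):
            if (name, pos) in position_deletes:
                continue  # removed by a position delete
            if any(all(row.get(c) == v for c, v in d.items())
                   for d in equality_deletes):
                continue  # removed by an equality delete
            live.append(row)
    return live


data_files = {
    "data-001.parquet": [
        {"id": 1, "name": "Ana"},
        {"id": 2, "name": "Bruno"},
        {"id": 3, "name": "Carla"},
    ]
}
equality_deletes = [{"id": 2}]               # delete every row where id = 2
position_deletes = {("data-001.parquet", 0)}  # delete the first row of the file

rows = read_merge_on_read(data_files, equality_deletes, position_deletes)
print(rows)  # [{'id': 3, 'name': 'Carla'}]
```

The data file itself is never rewritten: the deletes are merged in only when the table is read, which is why the mode is called merge-on-read.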
One point to watch out for is primary key updates with missing column values. These cause files to be flushed to the Iceberg table before the flush interval elapses, potentially resulting in small data files and delete files for the primary key update operation. For workloads or tables with frequent primary key updates, it is better to generate trail files with uncompressed update records. In addition, we should set gg.validate.keyupdate=true for the trail generated from the Oracle source.
The configuration of the Iceberg replication properties is stored in the Replicat properties file, and it can be set up for each of the catalogs below:
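As a rough illustration of what "a primary key update with missing column values" looks like, the sketch below checks an update record for that condition. The record shape (a before image and an after image as dicts) and the function name `is_key_update_with_missing_columns` are assumptions made for this example; this is not how GoldenGate represents trail records internally.

```python
def is_key_update_with_missing_columns(before, after, key_columns, all_columns):
    """Return True when the key changed and the after image lacks some columns."""
    key_changed = any(before.get(k) != after.get(k) for k in key_columns)
    missing = [c for c in all_columns if c not in after]
    return key_changed and bool(missing)


columns = ["id", "name", "city"]

# Compressed update: only the key and the changed columns are present.
compressed = {"before": {"id": 1}, "after": {"id": 10}}

# Uncompressed update: every column value is present in the after image.
uncompressed = {
    "before": {"id": 1, "name": "Ana", "city": "Lisboa"},
    "after": {"id": 10, "name": "Ana", "city": "Lisboa"},
}

compressed_flag = is_key_update_with_missing_columns(
    compressed["before"], compressed["after"], ["id"], columns)
uncompressed_flag = is_key_update_with_missing_columns(
    uncompressed["before"], uncompressed["after"], ["id"], columns)
```

In this model only the compressed record is flagged, which is the case the uncompressed trail records are meant to avoid.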
- Nessie Catalog
- AWS Glue Catalog
- Polaris Catalog
- REST Catalog
- JDBC Catalog
- Hadoop Catalog
And if you want to know more details, you can check it out here and here.
I hope this has helped you.
See you.
Mario