Prerequisites¶
Note
onETL's Iceberg connection is actually a SparkSession configured to work with Apache Iceberg tables. All data movement is performed by Spark. The Iceberg catalog (REST, Hadoop, etc.) is used only to store table metadata, while the data itself is stored in a warehouse location (HDFS, S3, or another supported filesystem).
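As a rough illustration of what such a configured SparkSession looks like, here is a minimal sketch for a Hadoop-type Iceberg catalog. The catalog name `my_catalog`, the warehouse path, and the runtime jar version are placeholder assumptions, not values from this document; adjust them to your Spark/Iceberg versions and storage:

```python
from pyspark.sql import SparkSession

# Sketch only: requires a working Spark installation and network access
# to download the Iceberg runtime jar. All names below are placeholders.
spark = (
    SparkSession.builder
    .appName("iceberg-example")
    # Iceberg Spark runtime jar; the version must match your Spark version
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1",
    )
    # Enable Iceberg SQL extensions
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    # Register a catalog named "my_catalog" with a Hadoop warehouse location
    .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.my_catalog.type", "hadoop")
    .config("spark.sql.catalog.my_catalog.warehouse", "hdfs://namenode:8020/warehouse")
    .getOrCreate()
)
```

A REST catalog would use `type=rest` plus a `uri` option instead of the Hadoop warehouse settings; see the catalog documentation below.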
Version Compatibility¶
- Iceberg catalog: depends on the chosen implementation (e.g. REST, Hadoop)
- Spark versions: 3.2.x -- 4.0.x
- Java versions: 8 -- 22
See the official Apache Iceberg documentation for details on catalog and warehouse configuration.
Installing PySpark¶
To use the Iceberg connector you should have PySpark installed (or injected into sys.path) BEFORE creating the connector instance.
See the installation instruction for more details.
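For example, both packages can be installed from PyPI (version pins here are illustrative assumptions; choose versions compatible with your Java and Spark setup):

```shell
# Install onETL together with PySpark (pin versions as needed)
pip install onetl pyspark
```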
Popular Metastore Implementations¶
Iceberg supports multiple catalog implementations. Here are some popular options: