Reading from Iceberg using `Iceberg.sql`

Iceberg.sql allows passing a custom SQL query, but it does not support incremental strategies.

Warning

Unlike DBReader, table names in SQL queries must include the catalog prefix.
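
As a minimal sketch of what the warning means in practice, the helper below prepends the catalog name to a `schema.table` name before it is embedded into a query string. `with_catalog` is a hypothetical function written for illustration and is not part of onETL. It assumes plain dot-separated identifiers that need no quoting.

```python
def with_catalog(catalog: str, table: str) -> str:
    """Return the table name prefixed with the catalog, unless it already is."""
    # Hypothetical helper for illustration; not part of onETL.
    if table.startswith(f"{catalog}."):
        return table
    return f"{catalog}.{table}"


table = with_catalog("my_catalog", "my_schema.my_table")
query = f"SELECT id FROM {table}"
print(query)
```

The resulting string uses the same `my_catalog.my_schema.my_table` form as the query in the Examples section.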

Syntax support

Only queries with the following syntax are supported:

✅︎ SELECT ... FROM ...
✅︎ WITH alias AS (...) SELECT ...
❌ SET ...; SELECT ...; - multiple statements are not supported
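
The rules above can be approximated with a rough pre-check before a query is submitted. The sketch below is an assumption-laden illustration: `looks_supported` is a hypothetical helper, not part of onETL, and it does not parse SQL. It conservatively treats any semicolon as a statement separator, even one inside a string literal or a comment.

```python
import re


def looks_supported(query: str) -> bool:
    """Rough pre-check: a single statement that starts with SELECT or WITH."""
    # Hypothetical helper for illustration; not part of onETL.
    # Naive on purpose: any semicolon is treated as a statement separator.
    if ";" in query:
        return False
    # The statement must begin with the SELECT or WITH keyword (case-insensitive).
    return re.match(r"\s*(SELECT|WITH)\b", query, flags=re.IGNORECASE) is not None


print(looks_supported("SELECT id FROM my_catalog.my_schema.my_table"))
print(looks_supported("SET a = 1; SELECT 1"))
```

A check like this only catches obvious mistakes early; the connection itself remains the authority on which queries are accepted.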

Examples

from onetl.connection import Iceberg

iceberg = Iceberg(catalog_name="my_catalog", ...)
df = iceberg.sql(
    """
    SELECT
        id,
        key,
        CAST(value AS string) value,
        updated_at
    FROM
        my_catalog.my_schema.my_table
    WHERE
        key = 'something'
    """,
)

Recommendations

Select only required columns

Avoid SELECT *. List only required columns to minimize I/O and improve query performance.

Use filters

Include a WHERE clause that filters on columns (for example, with the operators =, >, <, or BETWEEN) so that Spark can prune unnecessary data.
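
Both recommendations can be combined when a query string is assembled in code. The following is a minimal sketch: `build_query` is a hypothetical helper, not part of onETL, and the date range is an illustrative value. The filter text is inserted verbatim, so it should only come from trusted code.

```python
from typing import Optional, Sequence


def build_query(table: str, columns: Sequence[str], where: Optional[str] = None) -> str:
    """Compose a SELECT with an explicit column list and an optional filter."""
    # Hypothetical helper for illustration; not part of onETL.
    if not columns:
        raise ValueError("list the required columns explicitly instead of SELECT *")
    query = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        query += f" WHERE {where}"
    return query


query = build_query(
    "my_catalog.my_schema.my_table",
    columns=["id", "key", "updated_at"],
    where="updated_at BETWEEN '2024-01-01' AND '2024-01-31'",
)
print(query)
# The resulting string could then be passed to iceberg.sql(query).
```

Requiring a non-empty column list makes it harder to fall back to SELECT * by accident.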