Snapshot Strategy

Bases: BaseStrategy

Snapshot strategy for DB Reader/File Downloader.

Used for fetching all the rows/files from a source. Does not support HWM.

Note

This is the default strategy.

For DB Reader: Every snapshot run executes a simple query that fetches all the table data:

SELECT id, data FROM public.mydata;
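
The "fetch everything on every run" behaviour can be sketched without onETL. The example below is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module as a stand-in for a real connection, a table named `mydata` instead of `public.mydata`, and a hypothetical helper `snapshot_read` that is not part of the onETL API.

```python
import sqlite3


def snapshot_read(conn):
    """Hypothetical helper: return every row, keeping no state between runs."""
    return conn.execute("SELECT id, data FROM mydata ORDER BY id").fetchall()


# In-memory database standing in for the real source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO mydata VALUES (?, ?)", [(1, "a"), (2, "b")])

# First run reads the whole table.
first_run = snapshot_read(conn)

# A new row appears in the source between runs.
conn.execute("INSERT INTO mydata VALUES (3, 'c')")

# Second run reads the whole table again, including rows already seen.
second_run = snapshot_read(conn)
```

Because no HWM value is stored, the second run returns the rows from the first run again, plus the new one.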

For File Downloader: Every snapshot run downloads all the files, either from the source or from a user-defined list:

$ hdfs dfs -ls /path

/path/my/file1
/path/my/file2

DownloadResult(
    ...,
    successful={
        LocalFile("/downloaded/file1"),
        LocalFile("/downloaded/file2"),
    },
)
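
The same idea for files can be sketched with the standard library alone. This is an illustration, not how onETL implements it: local temporary directories stand in for the remote source, and `snapshot_download` is a hypothetical helper that copies every file it finds, regardless of what earlier runs copied.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_download(source_dir, local_dir):
    """Hypothetical helper: copy every file under source_dir on each call."""
    local_dir.mkdir(parents=True, exist_ok=True)
    downloaded = set()
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        dst = local_dir / src.name  # flatten into local_dir, as in the listing above
        shutil.copy2(src, dst)
        downloaded.add(dst)
    return downloaded


with tempfile.TemporaryDirectory() as tmp:
    # Temporary directory standing in for the remote /path/my directory.
    source = Path(tmp) / "path" / "my"
    source.mkdir(parents=True)
    (source / "file1").write_text("one")
    (source / "file2").write_text("two")
    local = Path(tmp) / "downloaded"

    # Two consecutive runs: both copy the full set of files.
    first_names = sorted(p.name for p in snapshot_download(source, local))
    second_names = sorted(p.name for p in snapshot_download(source, local))
```

Both runs return the same two files, since nothing records which files were already downloaded.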

Added in 0.1.0

Examples:

from onetl.db import DBReader, DBWriter
from onetl.strategy import SnapshotStrategy

# no hwm argument: SnapshotStrategy does not support HWM
reader = DBReader(
    connection=postgres,
    source="public.mydata",
    columns=["id", "data"],
)

writer = DBWriter(connection=hive, target="db.newtable")

with SnapshotStrategy():
    df = reader.run()
    writer.run(df)

# current run will execute the following query:
# SELECT id, data FROM public.mydata;

from onetl.file import FileDownloader
from onetl.strategy import SnapshotStrategy

downloader = FileDownloader(
    connection=sftp,
    source_path="/remote",
    local_path="/local",
)

with SnapshotStrategy():
    result = downloader.run()  # a DownloadResult, not a DataFrame

# current run will download all files from 'source_path'