Snapshot Strategy
Bases: BaseStrategy
Snapshot strategy for DB Reader/File Downloader.
Used for fetching all the rows/files from a source. Does not support HWM.
Note
This is the default strategy.
For DB Reader: every snapshot run executes a simple query that fetches all of the table data:
SELECT id, data FROM public.mydata;
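To make the snapshot behaviour concrete, here is a minimal sketch of how such a query could be assembled from a table name and a column list. The helper `build_snapshot_query` is hypothetical and is not part of onETL; it only illustrates that a snapshot read carries no filter on a previous run's value, so every run returns the full table.

```python
from typing import Sequence


def build_snapshot_query(table: str, columns: Sequence[str]) -> str:
    """Hypothetical helper (not an onETL API): build a full-table SELECT."""
    # Assumption of this sketch: an empty column list means "all columns".
    column_list = ", ".join(columns) if columns else "*"
    # No WHERE clause: a snapshot read does not filter rows by earlier runs.
    return f"SELECT {column_list} FROM {table};"


query = build_snapshot_query("public.mydata", ["id", "data"])
print(query)
```

Calling the helper twice with the same arguments yields the same statement, which mirrors the fact that a snapshot run keeps no state between executions.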
For File Downloader: every snapshot run downloads all the files found in the source path:
$ hdfs dfs -ls /path
/path/my/file1
/path/my/file2
DownloadResult(
    ...,
    successful={
        LocalFile("/downloaded/file1"),
        LocalFile("/downloaded/file2"),
    },
)
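The file-side behaviour can be pictured with a small local sketch that uses only the standard library. The function `snapshot_download` is a hypothetical stand-in, not onETL code: it copies every file it finds on each call and remembers nothing between calls. Flattening files into one target directory by name is an assumption made here to match the listing above, and files with identical names in different subdirectories would overwrite each other.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Set


def snapshot_download(source_dir: Path, local_dir: Path) -> Set[Path]:
    """Hypothetical stand-in for a snapshot download: copy every file, every run."""
    local_dir.mkdir(parents=True, exist_ok=True)
    downloaded = set()
    for source_file in sorted(source_dir.rglob("*")):
        if source_file.is_file():
            # Assumption: files are flattened into local_dir by file name.
            target = local_dir / source_file.name
            shutil.copy2(source_file, target)
            downloaded.add(target)
    return downloaded


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "path" / "my"
    source.mkdir(parents=True)
    (source / "file1").write_text("first")
    (source / "file2").write_text("second")

    local = Path(tmp) / "downloaded"
    first_run = snapshot_download(source, local)
    second_run = snapshot_download(source, local)

    names = sorted(path.name for path in second_run)
    # No state is kept between runs, so both runs fetch the same set of files.
    same_files = first_run == second_run
    print(names, same_files)
```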
Added in 0.1.0
Examples:
from onetl.db import DBReader, DBWriter
from onetl.strategy import SnapshotStrategy

# `postgres` and `hive` are connection objects created beforehand
reader = DBReader(
    connection=postgres,
    source="public.mydata",
    columns=["id", "data"],
    hwm=DBReader.AutoDetectHWM(name="some_hwm_name", expression="id"),
)

writer = DBWriter(connection=hive, target="db.newtable")

with SnapshotStrategy():
    df = reader.run()
    writer.run(df)

# the current run will execute the following query:
# SELECT id, data FROM public.mydata;

from onetl.file import FileDownloader
from onetl.strategy import SnapshotStrategy

# `sftp` is a connection object created beforehand
downloader = FileDownloader(
    connection=sftp,
    source_path="/remote",
    local_path="/local",
)

with SnapshotStrategy():
    # run() returns a DownloadResult, not a dataframe
    download_result = downloader.run()

# the current run will download all files from 'source_path'