S3 connection

Bases: FileConnection

S3 file connection. Supports hooks.

Based on minio-py client.

Warning

Since onETL v0.7.0, using the S3 connector requires installing the package with one of the following extras:

pip install "onetl[s3]"

# or
pip install "onetl[files]"
See the File connections installation instructions for more details.

Added in 0.5.1

Parameters:

  • host (str) –

    Host of S3 source. For example: s3.domain.com

  • port (int) –

    Port of S3 source

  • bucket (str) –

    Bucket name in the S3 file source

  • access_key (str) –

    Access key (aka user ID) of an account in the S3 service

  • secret_key (str) –

    Secret key (aka password) of an account in the S3 service

  • protocol (str, default: https ) –

    Connection protocol. Allowed values: https or http

    Changed in 0.6.0

    Renamed secure: bool to protocol: Literal["https", "http"]

  • region (str) –

    Region name of bucket in S3 service. Optional for some S3 implementations (MinIO, Ozone), but could be mandatory for others.

  • session_token (str) –

    Session token generated by S3 STS service, if used.

Examples:

Create and check S3 connection:

from onetl.connection import S3

s3 = S3(
    host="s3.domain.com",
    protocol="http",
    bucket="my-bucket",
    access_key="ACCESS_KEY",
    secret_key="SECRET_KEY",
    region="us-east-1",
).check()
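
The constructor parameters listed above can also be assembled from configuration instead of being hard-coded. The following is a minimal sketch, not part of onETL: the helper name `s3_kwargs_from_env` and the environment variable names (`S3_HOST`, `S3_BUCKET`, and so on) are assumptions made for illustration. Only the resulting keys match the documented parameters, and the block does not import onETL itself.

```python
import os
from typing import Any, Dict, Mapping, Optional


def s3_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for S3(...) from environment-style settings.

    The variable names (S3_HOST, S3_BUCKET, ...) are hypothetical, chosen for
    this sketch; only the returned keys follow the documented parameters.
    """
    env = os.environ if env is None else env

    protocol = env.get("S3_PROTOCOL", "https")  # documented default is https
    if protocol not in ("https", "http"):
        raise ValueError(f"protocol must be 'https' or 'http', got {protocol!r}")

    kwargs: Dict[str, Any] = {
        "host": env["S3_HOST"],
        "bucket": env["S3_BUCKET"],
        "access_key": env["S3_ACCESS_KEY"],
        "secret_key": env["S3_SECRET_KEY"],
        "protocol": protocol,
    }

    # Optional parameters are included only when they are actually set.
    if env.get("S3_PORT"):
        kwargs["port"] = int(env["S3_PORT"])
    for name in ("region", "session_token"):
        value = env.get(f"S3_{name.upper()}")
        if value:
            kwargs[name] = value
    return kwargs
```

With onETL installed, the result could then be passed on as `S3(**s3_kwargs_from_env()).check()`.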

path_exists(path)

Check if the specified path exists on the remote filesystem. Supports hooks.

Added in 0.8.0

Parameters:

  • path (str | PathLike) –

    Path to check

Returns:

  • bool

    True if path exists, False otherwise.

Examples:

>>> connection.path_exists("/path/to/file.csv")
True
>>> connection.path_exists("/path/to/dir")
True
>>> connection.path_exists("/path/to/missing")
False
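
Because `path_exists` returns a plain `bool`, it combines naturally with ordinary Python filtering. A small sketch under stated assumptions: `find_missing` is a hypothetical helper, not an onETL function, and `connection` is any object that exposes `path_exists(path)` as documented above.

```python
from typing import Iterable, List


def find_missing(connection, paths: Iterable[str]) -> List[str]:
    """Return the paths for which connection.path_exists() is False.

    `connection` is any object exposing path_exists(path) -> bool,
    for example an S3 connection. This helper is illustrative only.
    """
    return [path for path in paths if not connection.path_exists(path)]
```

This can serve as a pre-flight check before a job that expects several input files to be present.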

resolve_dir(path)

Returns the directory at the specified path, with stats. Supports hooks.

Added in 0.8.0

Parameters:

  • path (str | PathLike) –

    Path to resolve

Returns:

  • Directory path with stats

Raises:

  • DirectoryNotFoundError

    Path does not exist

  • NotADirectoryError

    Path is not a directory

Examples:

>>> import os
>>> dir_path = connection.resolve_dir("/path/to/dir")
>>> os.fspath(dir_path)
'/path/to/dir'
>>> dir_path.stat().st_uid  # owner id
12345

resolve_file(path)

Returns the file at the specified path, with stats. Supports hooks.

Added in 0.8.0

Parameters:

  • path (str | PathLike) –

    Path to resolve

Returns:

  • File path with stats

Raises:

  • FileNotFoundError

    Path does not exist

  • NotAFileError

    Path is not a file

Examples:

>>> import os
>>> file_path = connection.resolve_file("/path/to/dir/file.csv")
>>> os.fspath(file_path)
'/path/to/dir/file.csv'
>>> file_path.stat().st_uid  # owner id
12345
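
When a missing file is an expected case rather than an error, `path_exists` and `resolve_file` can be combined. The sketch below is an illustration only: `resolve_file_or_none` is a hypothetical helper, not part of onETL. The check and the resolve are two separate calls, so the file could still disappear between them.

```python
def resolve_file_or_none(connection, path):
    """Return connection.resolve_file(path), or None if the path is absent.

    `connection` is any object exposing path_exists() and resolve_file()
    as documented above. This helper is illustrative only.
    """
    if not connection.path_exists(path):
        return None
    # An existing path that is not a file still raises, as documented.
    return connection.resolve_file(path)
```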

create_dir(path)

Creates a directory tree on the remote filesystem. Supports hooks.

Added in 0.8.0

Parameters:

  • path (str | PathLike) –

    Directory path

Returns:

  • Created directory with stats

Raises:

  • NotAFileError

    Path is not a file

Examples:

>>> import os
>>> dir_path = connection.create_dir("/path/to/dir")
>>> os.fspath(dir_path)
'/path/to/dir'

remove_dir(path, *, recursive=False)

Remove a directory or directory tree. Supports hooks.

If directory does not exist, no exception is raised.

Added in 0.8.0

Parameters:

  • path (str | PathLike) –

    Directory path to remove

  • recursive (bool, default: False ) –

    If True, remove directory tree recursively (including files and subdirectories).

    If False, remove only the directory itself; the directory must be empty.

Returns:

  • bool

    True if the directory was removed, False if it did not exist in the first place.

Raises:

  • NotADirectoryError

    Path is not a directory

  • DirectoryNotEmptyError

    Directory is not empty, and recursive is False

Examples:

>>> connection.remove_dir("/path/to/dir")
Traceback (most recent call last):
    ...
onetl.exception.DirectoryNotEmptyError: Cannot delete non-empty directory '/path/to/dir'
>>> connection.remove_dir("/path/to/dir", recursive=True)
True
>>> connection.path_exists("/path/to/dir")
False
>>> connection.path_exists("/path/to/dir/file.csv")
False
>>> connection.remove_dir("/path/to/dir")  # already deleted, no error
False
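
Since `remove_dir` does not raise for a missing directory and reports the outcome as a `bool`, it can be paired with `create_dir` to start from an empty directory. A minimal sketch, assuming a hypothetical helper `reset_dir` that is not part of onETL. It uses only the two calls documented above and deletes everything under `path`, so it should be pointed only at scratch locations.

```python
def reset_dir(connection, path) -> bool:
    """Empty a directory by removing it recursively and creating it again.

    Returns True if the directory existed before the call, False otherwise.
    `connection` is any object exposing remove_dir() and create_dir()
    as documented above. This helper is illustrative only.
    """
    # recursive=True avoids DirectoryNotEmptyError for non-empty directories
    existed = connection.remove_dir(path, recursive=True)
    connection.create_dir(path)
    return existed
```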