Contributing Guide¶
Welcome! There are many ways to contribute, including submitting bug reports, improving documentation, submitting feature requests, reviewing new submissions, or contributing code that can be incorporated into the project.
Review process¶
For any significant change, please first create a new GitHub issue for the enhancement you wish to make. Describe the feature you would like to see, why you need it, and how it will work. Discuss your ideas transparently and get community feedback before proceeding.
Small changes can be crafted and submitted directly to the GitHub repository as a pull request. This requires creating a fork of the repo; see GitHub's instructions on forking.
Important notes¶
Please take into account that:
- Some companies still use old Spark versions, like 3.2.0, so compatibility should be kept where possible, e.g. by adding branches for different Spark versions.
- Different users use onETL in different ways: some use only DB connectors, some only file connectors. Connector-specific dependencies should be optional.
- Instead of creating classes with a lot of different options, prefer splitting them into smaller classes (an options class, a context manager, etc.) and using composition.
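The notes above can be illustrated with a minimal sketch. Every name in it (`parse_major_minor`, `import_optional`, `ReadOptions`, `ExampleConnection`, the reader names, and the `(3, 4)` version threshold) is hypothetical and chosen only for illustration; none of it is part of the onETL API.

```python
from __future__ import annotations

import importlib
from dataclasses import dataclass, field
from types import ModuleType


def parse_major_minor(version: str) -> tuple[int, int]:
    """Turn '3.2.0' into (3, 2) so versions compare as numbers, not strings."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def import_optional(module_name: str, extra: str) -> ModuleType:
    """Import a connector-specific dependency only when it is actually needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # "extra" is a hypothetical name of an optional dependency group
        raise ImportError(
            f"Cannot import {module_name!r}; install the {extra!r} extra to use this connector"
        ) from e


@dataclass(frozen=True)
class ReadOptions:
    """Small, immutable options class instead of many constructor arguments."""

    fetch_size: int = 1000
    timeout_seconds: float = 30.0


@dataclass
class ExampleConnection:
    """Hypothetical connection composed from an options object."""

    host: str
    spark_version: str
    options: ReadOptions = field(default_factory=ReadOptions)
    is_open: bool = False

    def __enter__(self) -> ExampleConnection:
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        self.is_open = False

    def reader_name(self) -> str:
        # Hypothetical branch: pick an implementation depending on Spark version
        if parse_major_minor(self.spark_version) >= (3, 4):
            return "modern_reader"
        return "legacy_reader"
```

With this layout, `ExampleConnection("localhost", "3.2.0", ReadOptions(fetch_size=50))` keeps tuning knobs in one small class, the `with` block handles opening and closing, and a heavy dependency is imported through `import_optional` only by the connector that needs it.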
Initial setup for local development¶
Install Git¶
Please follow the official Git installation instructions.
Clone the repo¶
Open a terminal and run these commands to clone your fork of the repo:
git clone git@github.com:myuser/onetl.git -b develop
cd onetl
Enable pre-commit hooks¶
Create virtualenv and install dependencies:
make venv-install
Install pre-commit hooks:
prek install --install-hooks
Check that the pre-commit hooks run:
prek run
How to¶
Run tests locally¶
Note
You can skip this if only documentation is changed.
Setup environment¶
Create virtualenv and install dependencies:
make venv-install
Using docker-compose¶
Build image for running tests:
docker-compose build
Start all containers with dependencies:
docker-compose --profile all up -d
You can start only a limited set of dependencies:
docker-compose --profile mongodb up -d
Run tests:
docker-compose run --rm onetl pytest
Any additional arguments are passed on to pytest:
docker-compose run --rm onetl pytest -m mongodb -lsx -vvvv --log-cli-level=INFO
You can also start an interactive bash session and run tests from there:
docker-compose run --rm onetl bash
pytest -m mongodb -lsx -vvvv --log-cli-level=INFO
See logs of test container:
docker-compose logs -f onetl
Stop all containers and remove created volumes:
docker-compose --profile all down -v
Without docker-compose¶
Warning
To run HDFS tests locally you should add the following line to your /etc/hosts (file path depends on OS):
# HDFS server returns container hostname as connection address, causing error in DNS resolution
127.0.0.1 hdfs
Note
To run Oracle tests you need to install Oracle Instant Client
and pass its path in the ONETL_ORA_CLIENT_PATH environment variable,
e.g. ONETL_ORA_CLIENT_PATH=/path/to/client64/lib.
You may also need to add the same path to the LD_LIBRARY_PATH environment variable.
Note
To run Greenplum tests, you should:
- Download VMware Greenplum connector for Spark
- Either move it to ~/.ivy2/jars/, or pass file path to CLASSPATH
- Set environment variable ONETL_GP_PACKAGE_VERSION=local.
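Before starting the containers, it can help to confirm that the environment variables from the notes above are in place. The following is only a sketch of such a preflight check; `check_test_env` is a hypothetical helper, not something shipped with onETL, and it treats a missing LD_LIBRARY_PATH entry as a hint rather than a hard error, since the guide says it may or may not be required.

```python
from __future__ import annotations

import os
from typing import Mapping


def check_test_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable hints about variables the notes above mention."""
    problems: list[str] = []

    ora_path = env.get("ONETL_ORA_CLIENT_PATH")
    if not ora_path:
        problems.append("ONETL_ORA_CLIENT_PATH is not set (needed for Oracle tests)")
    else:
        # LD_LIBRARY_PATH is a list of directories joined by os.pathsep
        library_dirs = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
        if ora_path not in library_dirs:
            problems.append("LD_LIBRARY_PATH does not include ONETL_ORA_CLIENT_PATH")

    if env.get("ONETL_GP_PACKAGE_VERSION") != "local":
        problems.append(
            "ONETL_GP_PACKAGE_VERSION is not 'local' (needed for a locally downloaded Greenplum connector)"
        )

    return problems
```

Calling `check_test_env(os.environ)` and printing the returned list gives a quick summary of what is still missing; an empty list means all three settings were found.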
Start all containers with dependencies:
docker-compose --profile all up -d
You can start only a limited set of dependencies:
docker-compose --profile mongodb up -d
Run core tests:
make test-core
Run specific connection tests:
make test-spark PYTEST_ARGS="-m mongodb"
make test-no-spark PYTEST_ARGS="-m ftp"
Any additional arguments in PYTEST_ARGS are passed on to pytest:
make test-spark PYTEST_ARGS="-m mongodb -lsx -vvvv --log-cli-level=INFO"
Stop all containers and remove created volumes:
docker-compose --profile all down -v
Build documentation¶
Note
You can skip this if your change does not touch the documentation.
Create virtualenv and install dependencies:
make venv-install
Build documentation using Sphinx:
cd docs
make html
Then open docs/_build/index.html in a browser.
Create pull request¶
Commit your changes:
git commit -m "Commit message"
git push
Then open the GitHub interface and create a pull request. Please follow the guide from the PR body template.
After the pull request is created, it gets a corresponding number, e.g. 123 (pr_number).
Write release notes¶
onETL uses towncrier for changelog management.
To submit a change note about your PR, add a text file into the docs/changelog/next_release folder. It should contain an explanation of what applying this PR will change in the way end-users interact with the project. One sentence is usually enough but feel free to add as many details as you feel necessary for the users to understand what it means.
Use the past tense for the text in your fragment because, combined with others, it will be a part of the "news digest" telling the readers what changed in a specific version of the library since the previous version.
You should also use reStructuredText syntax for highlighting code (inline or block), linking parts of the docs or external sites.
If you wish to sign your change, feel free to add -- by :user:`github-username` at the end (replace github-username with your own!).
Finally, name your file following the convention that Towncrier understands: it should start with the number of an issue or a PR followed by a dot, then add a patch type, like feature, doc, misc etc., and add .rst as a suffix. If you need to add more than one fragment, you may add an optional sequence number (delimited with another period) between the type and the suffix.
In general the name will follow <pr_number>.<category>.rst pattern, where the categories are:
- feature: Any new feature
- bugfix: A bug fix
- improvement: An improvement
- doc: A change to the documentation
- dependency: Dependency-related changes
- misc: Changes internal to the repo like CI, test and build changes
A pull request may have more than one of these components, for example a code change may introduce a new feature that deprecates an old feature, in which case two fragments should be added. It is not necessary to make a separate documentation fragment for documentation changes accompanying the relevant code changes.
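The naming convention described above can be checked mechanically. The sketch below is an illustration only: `parse_fragment_name` is a hypothetical helper, and the category list is copied from this guide, so the `tool.towncrier.type` entries in pyproject.toml remain the source of truth.

```python
from __future__ import annotations

import re

# Categories as listed in this guide; pyproject.toml is authoritative
CATEGORIES = ("feature", "bugfix", "improvement", "doc", "dependency", "misc")

# <number>.<category>[.<sequence>].rst
FRAGMENT_NAME = re.compile(
    r"(?P<number>\d+)\.(?P<category>"
    + "|".join(CATEGORIES)
    + r")(?:\.(?P<sequence>\d+))?\.rst"
)


def parse_fragment_name(filename: str) -> dict[str, str | None] | None:
    """Split a fragment file name into its parts, or return None if it is invalid."""
    match = FRAGMENT_NAME.fullmatch(filename)
    return match.groupdict() if match else None
```

For example, `parse_fragment_name("123.feature.rst")` yields the number `"123"` and the category `"feature"`, while a name such as `"123.unknown.rst"` is rejected because the category is not in the list.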
Examples for adding changelog entries to your Pull Requests¶
Added a `:github:user:` role to Sphinx config -- by :github:user:`someuser`
Fixed behavior of `WebDAV` connector -- by :github:user:`someuser`
Added support of `timeout` in `S3` connector
-- by :github:user:`someuser`, :github:user:`anotheruser` and :github:user:`otheruser`
How to skip change notes check?¶
Just add the ci:skip-changelog label to the pull request.
Tip
See pyproject.toml for all available categories (tool.towncrier.type).
Release Process¶
Note
This section is for repo maintainers only.
Before making a release from the develop branch, follow these steps:
- Check out the develop branch and update it to the latest state
git checkout develop
git pull -p
- Back up NEXT_RELEASE.rst
cp "docs/changelog/NEXT_RELEASE.rst" "docs/changelog/temp_NEXT_RELEASE.rst"
- Build the Release notes with Towncrier
VERSION=$(cat onetl/VERSION)
towncrier build "--version=${VERSION}" --yes
- Rename the changelog file to the release version number
mv docs/changelog/NEXT_RELEASE.rst "docs/changelog/${VERSION}.rst"
- Remove content above the version number heading in the ${VERSION}.rst file
awk '!/^.*towncrier release notes start/' "docs/changelog/${VERSION}.rst" > temp && mv temp "docs/changelog/${VERSION}.rst"
- Update Changelog Index
awk -v version=${VERSION} '/DRAFT/{print;print " " version;next}1' docs/changelog/index.rst > temp && mv temp docs/changelog/index.rst
- Restore the NEXT_RELEASE.rst file from backup
mv "docs/changelog/temp_NEXT_RELEASE.rst" "docs/changelog/NEXT_RELEASE.rst"
- Commit and push changes to the develop branch
git add .
git commit -m "Prepare for release ${VERSION}"
git push
- Merge the develop branch to master, WITHOUT squashing
git checkout master
git pull
git merge develop
git push
- Add a git tag to the latest commit in the master branch
git tag "$VERSION"
git push origin "$VERSION"
- Update the version in the develop branch after release:
git checkout develop
NEXT_VERSION=$(echo "$VERSION" | awk -F. '/[0-9]+\./{$NF++;print}' OFS=.)
echo "$NEXT_VERSION" > onetl/VERSION
git add .
git commit -m "Bump version"
git push
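The awk one-liner in the last step increments the final dot-separated component of the version. For readers less familiar with awk, a rough Python equivalent is sketched below; `bump_last_component` is a hypothetical name, and the sketch assumes a purely numeric dotted version such as 0.10.1 (it raises ValueError for anything else, where awk would behave differently).

```python
def bump_last_component(version: str) -> str:
    """Increment the last dot-separated number, e.g. '0.10.1' -> '0.10.2'."""
    parts = version.strip().split(".")
    if len(parts) < 2:
        # The awk pattern only matches versions that contain a dot
        raise ValueError(f"Expected a dotted version, got {version!r}")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)
```

Note that the last component is incremented as an integer, so 1.2.9 becomes 1.2.10 rather than rolling over to 1.3.0.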