Thanks for your interest in contributing to this package! We're hoping it can be made even better through community contributions, and no contribution is too small!
For any bugs, issues or feature requests please open an issue on the project.
We have some general requirements for all contributions, as well as specific requirements when adding completely new transformers to the package. This ensures consistency with the existing codebase.
External contributors should first create their own fork of this repo.

Then clone the fork (or this repository if internal);

```shell
git clone https://github.com/lvgig/tubular.git
cd tubular
```

Then install tubular and its development dependencies;

```shell
pip install . -r requirements-dev.txt
```
We use pre-commit for this project, which is configured to check that code is formatted with black and passes ruff checks. For the list of ruff rules followed by this project, check .ruff.toml.
To configure pre-commit for your local repository, run the following;

```shell
pre-commit install
```
If working in a codespace, the dev requirements and pre-commit hooks will be installed automatically in the dev container.
If you are building the documentation locally, you will need the dependencies in docs/requirements.txt.
A point of surprise for some might be that requirements.txt and requirements-dev.txt are not user-edited files in this repo - they are compiled using pip-tools from the dependencies listed in pyproject.toml. When adding a new direct dependency, simply add it to the appropriate field inside the package config - there is no need to pin it, but you can specify a minimum requirement. Then use pip-compile to create a pinned set of dependencies, ensuring reproducibility.
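For illustration, the relevant pyproject.toml fields look something like this (the package names and version bounds below are hypothetical examples, not the project's actual dependency list):

```toml
[project]
# Direct runtime dependencies: unpinned, optionally with a minimum version.
dependencies = [
    "pandas>=1.5.0",  # hypothetical example entry
]

[project.optional-dependencies]
# Development-only dependencies, compiled via pip-compile --extra dev.
dev = [
    "pytest",  # hypothetical example entry
]
```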
requirements.txt and requirements-dev.txt are still tracked under source control, despite being 'compiled'.
To compile using pip-tools:

```shell
pip install pip-tools  # optional
pip-compile -v --no-emit-index-url --no-emit-trusted-host --output-file requirements.txt pyproject.toml
pip-compile --extra dev -v --no-emit-index-url --no-emit-trusted-host --output-file requirements-dev.txt pyproject.toml
```
- Please try to keep each pull request to one change or feature only
- Make sure to update the changelog with details of your change
We use black to format our code and follow PEP 8 conventions.
As mentioned above, we use pre-commit, which streamlines checking that code has been formatted correctly.
Make sure that pull requests pass our CI. It includes checks that;
- code is formatted with black
- ruff checks pass
- the tests for the project pass, with a minimum of 80% branch coverage
- bandit passes
We use pytest as our testing framework.
All existing tests must pass and new functionality must be tested. We aim for 100% coverage on new features that are added to the package.
There are some similarities across the tests for the different transformers in the package. Please refer to existing tests as they give great examples to work from and show what is expected to be covered in the tests.
We also make use of the test-aide package to make mocking easier and to help with generating data when parametrizing tests for the correct output of transformers' transform methods.
We organise our tests with one script per transformer, grouping the tests for a particular method into a test class.
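A minimal sketch of this layout is shown below. `ExampleTransformer` and its tests are hypothetical stand-ins to illustrate the structure, not real tubular classes:

```python
# One script per transformer; one test class per method of that transformer.
import pandas as pd


class ExampleTransformer:
    """Toy transformer that adds one to a single column (illustration only)."""

    def __init__(self, column):
        self.column = column

    def transform(self, X):
        X = X.copy()
        X[self.column] = X[self.column] + 1
        return X


class TestInit:
    """Tests for ExampleTransformer.__init__."""

    def test_column_attribute_set(self):
        assert ExampleTransformer(column="a").column == "a"


class TestTransform:
    """Tests for ExampleTransformer.transform."""

    def test_expected_output(self):
        df = pd.DataFrame({"a": [1, 2]})
        output = ExampleTransformer(column="a").transform(df)
        assert output["a"].tolist() == [2, 3]
```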
We follow the numpy docstring style guide.
Docstrings need to be updated to reflect any relevant changes, and docstrings need to be added for new transformers.
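As a quick reference, a numpy-style docstring looks like the following. `scale_column` is a hypothetical helper, shown only to illustrate the docstring sections:

```python
import pandas as pd


def scale_column(X, column, factor):
    """Multiply a single DataFrame column by a constant factor.

    Parameters
    ----------
    X : pd.DataFrame
        Input data containing ``column``.

    column : str
        Name of the column to scale.

    factor : float
        Multiplicative factor applied to the column.

    Returns
    -------
    pd.DataFrame
        Copy of ``X`` with the scaled column.
    """
    X = X.copy()
    X[column] = X[column] * factor
    return X
```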
Transformers in the package are designed to work with pandas DataFrame objects.
To be consistent with scikit-learn, all transformers must implement at least a `transform(X)` method which applies the data transformation. If information must be learnt from the data before applying the transform, then a `fit(X, y=None)` method is required. `X` is the input DataFrame and `y` is the response, which may not be required. Optionally, a `reverse_transform(X)` method may be appropriate too if there is a way to apply the inverse of the `transform` method.
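This interface can be sketched as follows. `ShiftTransformer` is a hypothetical example; real tubular transformers inherit from the package's base classes and include full input validation:

```python
import pandas as pd


class ShiftTransformer:
    """Centre a column on the mean learnt in fit, illustrating the interface.

    Parameters
    ----------
    column : str
        Name of the column to transform.
    """

    def __init__(self, column: str):
        self.column = column

    def fit(self, X: pd.DataFrame, y=None):
        """Learn the column mean from the data; y is not required here."""
        self.mean_ = X[self.column].mean()
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        """Subtract the learnt mean from the column."""
        X = X.copy()
        X[self.column] = X[self.column] - self.mean_
        return X

    def reverse_transform(self, X: pd.DataFrame) -> pd.DataFrame:
        """Invert transform by adding the learnt mean back."""
        X = X.copy()
        X[self.column] = X[self.column] + self.mean_
        return X
```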
For the full list of contributors see the contributors page.
Prior to the open source release of the package, there were contributions from many individuals in the LV GI Data Science team;
- Richard Angell
- Ned Webster
- Dapeng Wang
- David Silverstone
- Shreena Patel
- Angelos Charitidis
- David Hopkinson
- Liam Holmes
- Sandeep Karkhanis
- KarHor Yap
- Alistair Rogers
- Maria Navarro
- Marek Allen
- James Payne