This changelog follows the great advice from https://keepachangelog.com/.
Each section will have a title of the format X.Y.Z (YYYY-MM-DD), giving the version of the package and the date of release of that version. Unreleased changes, i.e. those that have been merged into main (e.g. with a .dev suffix) but which are not yet in a new release (on pypi), are added to the changelog but with the title X.Y.Z (unreleased). Unreleased sections can be combined when they are released and the date of release added to the title.
Subsections for each version can be one of the following:
- Added for new features.
- Changed for changes in existing functionality.
- Deprecated for soon-to-be removed features.
- Removed for now removed features.
- Fixed for any bug fixes.
- Security in case of vulnerabilities.
Each individual change should have a link to the pull request after the description of the change.
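The title convention described above can be checked mechanically. The following is a minimal sketch, assuming titles are written exactly as "X.Y.Z (YYYY-MM-DD)" or "X.Y.Z (unreleased)"; the helper name `parse_section_title` and the pattern are hypothetical and not part of the package.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Hypothetical pattern (an assumption, not project tooling): a three-part
# version followed by either an ISO date or the literal word "unreleased".
TITLE_PATTERN = re.compile(
    r"^(?P<version>\d+\.\d+\.\d+) \((?P<status>\d{4}-\d{2}-\d{2}|unreleased)\)$"
)


def parse_section_title(title: str) -> Tuple[str, Optional[date]]:
    """Return (version, release_date); release_date is None if unreleased."""
    match = TITLE_PATTERN.match(title.strip())
    if match is None:
        raise ValueError(f"not a valid changelog section title: {title!r}")
    status = match["status"]
    # date.fromisoformat also rejects impossible dates such as month 13
    release_date = None if status == "unreleased" else date.fromisoformat(status)
    return match["version"], release_date
```

A title that does not follow the convention raises a `ValueError`, which makes the check easy to run in a pre-commit hook or a test.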
- converted OneHotEncodingTransformer to narwhals #355 <https://github.com/lvgig/tubular/issues/355>_
- updated WeightsColumnMixin to use new narwhals 'is_finite' method
- narwhalified ModeImputer #321 <https://github.com/lvgig/tubular/issues/321>_
- fixed issues with all null and nullable-bool column handling in dataframe_init_dispatch
- added NaN error handling to WeightColumnMixin
- narwhalified BaseNumericTransformer #358 <https://github.com/lvgig/tubular/issues/358>_
- narwhalified BaseCappingTransformer #357 <https://github.com/lvgig/tubular/issues/357>_
- narwhalified CappingTransformer #361 <https://github.com/lvgig/tubular/issues/361>_
- narwhalified OutOfRangeNullTransformer #362 <https://github.com/lvgig/tubular/issues/362>_
- narwhalified MeanImputer #344 <https://github.com/lvgig/tubular/issues/344>_
- narwhalified BaseGenericDateTransformer. As part of this updated test data handling of date columns across repo #365 <https://github.com/lvgig/tubular/issues/365>_
- narwhalified DropOriginalMixin #352 <https://github.com/lvgig/tubular/issues/352>_
- narwhalified BaseMappingTransformer #367 <https://github.com/lvgig/tubular/issues/367>_
- narwhalified BaseMappingTransformerMixin. As part of this, made mapping transformers more type-conscious; they now rely on an input 'return_dtypes' dict arg. #369 <https://github.com/lvgig/tubular/issues/369>_
- As part of #369, updated OrdinalEncoderTransformer to output Int8 type
- As part of #369, updated NominalToIntegerTransformer to output Int8 type. Removed inverse_mapping functionality, as this is more complicated when transform is opinionated on types.
- narwhalified GroupRareLevelsTransformer. As part of this, had to make transformer more opinionated and refuse columns with nulls (raises an error directing to imputers). #372 <https://github.com/lvgig/tubular/issues/372>_
- narwhalified BaseDatetimeTransformer #375
- Optional wanted_levels feature has been integrated into the OneHotEncodingTransformer, which allows users to specify which levels in a column they wish to encode. #384 <https://github.com/azukds/tubular/issues/384>_
- Created unit tests to check if the values provided for wanted_values are as expected and if the output is as expected.
- Refactored BaseImputer to utilise narwhals #314 <https://github.com/lvgig/tubular/issues/314>_
- Converted test dfs to flexible pandas/polars setup
- Converted BaseNominalTransformer to utilise narwhals #334 <https://github.com/lvgig/tubular/issues/334>_
- narwhalified CheckNumericMixin #336 <https://github.com/lvgig/tubular/issues/336>_
- Changed behaviour of NearestMeanResponseImputer so that if there are no nulls at fit, it warns and has no effect at transform, as opposed to erroring. The error was problematic for e.g. lightweight test runs where nulls are less likely to be present.
- Modified OneHotEncodingTransformer to create an instance of OneHotEncoder and assign it to the attribute _encoder #308 <#309>
- Refactored BaseDateTransformer, BaseDateTwoColumnTransformer and associated testing #273
- BaseTwoColumnTransformer removed in favour of mixin classes TwoColumnMixin and NewColumnNameMixin to handle validation of two columns and new_column_name arguments #273
- Refactored tests for InteractionTransformer #283
- Refactored tests for StringConcatenator and SeriesStrMethodTransformer, added separator mixin class. #286
- Refactored MeanResponseTransformer tests in new format #262
- refactored build tools and package config into pyproject.toml #271
- set up automatic versioning using setuptools-scm #271
- Refactored TwoColumnOperatorTransformer tests in new format #274
- Refactored PCATransformer tests in new format #277
- Refactored tests for NullIndicator #301
- Refactored BetweenDatesTransformer tests in new format #294
- As part of above, edited dates file transformers to use BaseDropOriginalMixin in transform
- Refactored DateDifferenceTransformer tests in new format. Had to turn off autodefine new_column_name functionality to match generic test expectations. Suggest we look to turn back on in the future. #296
- Refactored DateDiffLeapYearTransformer tests in new format. As part of this had to remove the autodefined new_column_name, as this conflicts with the generic testing. Suggest we look to turn back on in future. #295
- Edited base testing setup for dates file, created new BaseDatetimeTransformer class
- Refactored DatetimeInfoExtractor tests in new format #297
- Refactored DatetimeSinusoidCalculator tests in new format. #310
- fixed a bug in CappingTransformer which was preventing use of .get_params method #311
- Set up requirements for narwhals, removed python3.8 from our build pipelines as incompatible with polars
- Narwhal-ified BaseTransformer #313 <https://github.com/lvgig/tubular/issues/313>_
- Refactored ToDatetimeTransformer tests in new format #300
- Refactored tests for SeriesDtMethodTransformer in new format. Changed column arg to columns to fit generic format. #299 <https://github.com/lvgig/tubular/issues/299>_
- Refactored OrdinalEncoderTransformer tests in new format #330
- Narwhal-ified NullIndicator #319 <https://github.com/lvgig/tubular/issues/319>_
- Narwhal-ified NearestMeanResponseImputer #320 <https://github.com/lvgig/tubular/issues/320>_
- Narwhal-ified MedianImputer #317 <https://github.com/lvgig/tubular/issues/317>_
- Refactored NominalToIntegerTransformer tests in new format #261
- Refactored GroupRareLevelsTransformer tests in new format #259
- DatetimeInfoExtractor.mappings_provided changed from a dict.keys() object to list so transformer is serialisable. #258
- Created BaseNumericTransformer class to support test refactor of numeric file #266
- Updated testing approach for LogTransformer #268
- Refactored ScalingTransformer tests in new format #284
- Inheritable tests for generic base behaviours for base transformer in base_tests.py, with fixtures to allow for this in conftest.py
- Split existing input check into two better defined checks for TwoColumnOperatorTransformer #183
- Created unit tests for checking column type and size #183
- Automated weights column checks through a mixin class and captured common weight tests in generic test classes for weighted transformers
- Standardised naming of weight arg across transformers
- Updated DataFrameMethodTransformer tests to have inheritable init class that can be used by other test files.
- Moved BaseTransformer, DataFrameMethodTransformer, BaseMappingTransformer, BaseMappingTransformerMixin, CrossColumnMappingTransformer and MappingTransformer over to the new testing framework.
- Refactored MappingTransformer by removing redundant init method.
- Refactored tests for ColumnDtypeSetter, and renamed (from SetColumnDtype)
- Refactored tests for SetValueTransformer
- Refactored ArbitraryImputer by removing redundant fillna call in transform method. This should increase tubular's efficiency and maintainability.
- Fixed bugs in MedianImputer and ModeImputer where they would error for all null columns.
- Refactored ArbitraryImputer and BaseImputer tests in new format.
- Refactored MedianImputer tests in new format.
- Replaced occurrences of pd.DataFrame.drop() with del statement to speed up tubular. Note that no additional unit testing has been done for copy=False as this release is scheduled to remove copy.
- Created BaseCrossColumnNumericTransformer class. Refactored CrossColumnAddTransformer and CrossColumnMultiplyTransformer to use this class. Moved tests for these objects to new approach.
- Created BaseCrossColumnMappingTransformer class and integrated into CrossColumnMappingTransformer tests
- Refactored BaseNominalTransformer tests in new format & moved its logic to the transform method.
- Refactored ModeImputer tests in new format.
- Added generic init tests to base tests for transformers that take two columns as an input.
- Refactored EqualityChecker tests in new format.
- Bugfix to MeanResponseTransformer to ignore unobserved categorical levels
- Refactored dates.py to prepare for testing refactor. Edited BaseDateTransformer (and created BaseDateTwoColumnTransformer) to follow standard format, implementing validations at init/fit/transform. To reduce complexity of file, made transformers more opinionated to insist on specific and consistent column dtypes. #246
- Added test_BaseTwoColumnTransformer base class for columns that require a list of two columns for input
- Added BaseDropOriginalMixin to mixin transformers to handle validation and method of dropping original features, also added appropriate test classes.
- Refactored MeanImputer tests in new format #250
- Refactored DatetimeInfoExtractor to condense and improve readability
- added minimal_dataframe_lookup fixture to conftest, and edited generic tests to use this
- Alphabetised the minimal attribute dictionary for readability.
- Refactored OHE transformer tests to align with new testing framework.
- Moved fixtures relating only to a single test out of conftest and into testing script where utilised.
- !!!Introduced dependency on Sklearn's OneHotEncoder by adding test to check OHE transformer (which we are calling from within our OHE wrapper) is fit before transform
- Refactored NearestMeanResponseImputer in line with new testing framework.
- Functionality for BaseTransformer (and thus all transformers) to take None as an option for columns. This behaviour was inconsistently implemented across transformers. Rather than extending to all we decided to remove this functionality. This required updating a lot of test files.
- The columns_set_or_check() method from BaseTransformer. With the above change it was no longer necessary. Subsequent updates to nominal transformers and their tests were required.
- Set pd copy_on_write to True (will become default in pandas 3.0) which allowed the functionality of the copy method of the transformers to be dropped #197
- Created unit test for checking if log1p is working and well conditioned for small x #178
- Changed LogTransformer to use log1p(x) instead of log(x+1) #178
- Changed unit tests using log(x+1) to log1p(x) #178
- Updated GroupRareLevelsTransformer so that when working with category dtypes it forgets categories encoded as rare (this is wanted behaviour as these categories are no longer present in the data) #177
- Update OneHotEncodingTransformer to default to returning int8 columns #175
- Updated NullIndicator to return int8 columns #173
- Updated MeanResponseTransformer to coerce return to float (useful behaviour for category type features) #174
- added type hints #128
- added some error handling to transform method of nominal transformers #162
- added new release pipeline #161
- added flake8_bugbear (B) to ruff rules #131
- added flake8_datetimez (DTZ) to ruff rules #132
- added option to avoid passing unseen levels to rare in GroupRareLevelsTransformer #141
- minor changes to comply with flake8_bugbear (B) ruff rules #131
- minor changes to comply with flake8_datetimez (DTZ) ruff rules #132
- BaseMappingTransformerMixin changed to use DataFrame.replace rather than looping over columns #135
- MeanResponseTransformer.map_imputer_values() added to decouple from BaseMappingTransformerMixin #135
- BaseDateTransformer added to standardise datetime data handling #148
- now compatible with pandas>=2.0.0 #123
- DateDifferenceTransformer no longer supports 'Y' or 'M' units #123
- replaced flake8 with ruff linting. For a list of rules implemented, code changes made for compliance and further rule sets planned for future see PR #92
- minor change to GroupRareLevelsTransformer test_super_transform_called test to align with other cases #90
- removed pin of scikit-learn version to <1.20 #90
- update black version in pre-commit-config #90
- added support for vscode dev container with python 3.8, requirements-dev.txt, pylance/gitlens extensions and precommit all preinstalled #83
- added sklearn < 1.2 dependency #86
- added support for handling unseen levels in MeanResponseTransformer #80
- added pandas < 2.0.0 dependency #81
- DateDifferenceTransformer M and Y units are incompatible with pandas 2.0.0 and will be removed or changed in a future version #81
- added support for passing multiple columns and periods/units parameters to DatetimeSinusoidCalculator #74
- added support for handling a multi level response to MeanResponseTransformer #67
- changed ArbitraryImputer to preserve the dtype of columns (previously would upcast dtypes like int8 or float32) #76
- fixed issue with OneHotEncodingTransformer use of deprecated sklearn.OneHotEncoder.get_feature_names method #66
- added support for prior mean encoding (regularised encodings) #46
- added support for weights to mean, median and mode imputers #47
- added classname() method to BaseTransformer and prefixed all errors with classname call for easier debugging #48
- added DatetimeInfoExtractor transformer in tubular/dates.py, associated tests with tests/dates/test_DatetimeInfoExtractor.py and examples with examples/dates/DatetimeInfoExtractor.ipynb #49
- added DatetimeSinusoidCalculator in tubular/dates.py, associated tests with tests/dates/test_DatetimeSinusoidCalculator.py and examples with examples/dates/DatetimeSinusoidCalculator.ipynb #50
- added TwoColumnOperatorTransformer in tubular/numeric.py, associated tests with tests/numeric/test_TwoColumnOperatorTransformer.py and examples with examples/dates/TwoColumnOperatorTransformer.ipynb #51
- added StringConcatenator in tubular/strings.py, associated tests with tests/strings/test_StringConcatenator.py and examples with examples/strings/StringConcatenator.ipynb #52
- added SetColumnDtype in tubular/misc.py, associated tests with tests/misc/test_StringConcatenator.py and examples with examples/strings/StringConcatenator.ipynb #53
- added warning to MappingTransformer in tubular/mapping.py for unexpected changes in dtype #54
- added new module tubular/comparison.py containing EqualityChecker. Also added associated tests with tests/comparison/test_EqualityChecker.py and examples with examples/comparison/EqualityChecker.ipynb #55
- added PCATransformer in tubular/numeric.py, associated tests with tests/misc/test_PCATransformer.py and examples with examples/numeric/PCATransformer.ipynb #57
- updated black version to 22.3.0 and flake8 version to 5.0.4 to fix compatibility issues #45
- removed kwargs argument from BaseTransformer in tubular/base.py to avoid silent erroring if incorrect arguments passed to transformers. Fixed a few tests which were revealed to have incorrect arguments passed by change #56
- Added InteractionTransformer in tubular/numeric.py, associated tests with tests/numeric/test_InteractionTransformer.py file and examples with examples/numeric/InteractionTransformer.ipynb file. #38
- Added tests/test_transformers.py file with test to be applied to all transformers #30
- Set min pandas version to 1.0.0 in requirements.txt, requirements-dev.txt, and docs/requirements.txt #31
- Changed y argument in fit to only accept pd.Series objects #26
- Added new _combine_X_y method to BaseTransformer which cbinds X and y #26
- Updated MeanResponseTransformer to use y arg in fit and remove setting response_column in init #26
- Updated OrdinalEncoderTransformer to use y arg in fit and remove setting response_column in init #26
- Updated NearestMeanResponseImputer to use y arg in fit and remove setting response_column in init #26
- Updated version of black used in the pre-commit-config to 21.9b0 #25
- Modified DataFrameMethodTransformer to add the possibility of dropping original columns #24
- Added attributes to date and numeric transformers to allow transformer to be printed #30
- Removed copy of mappings in MappingTransformer to allow transformer to work with sklearn.base.clone #30
- Changed data values used in some tests for MeanResponseTransformer so the test no longer depends on pandas <1.3.0 or >=1.3.0, required due to change #25 in pandas behaviour with groupby mean
- BaseTransformer now correctly raises TypeError exceptions instead of ValueError when input values are the wrong type #26
- Updated version of black used in the pre-commit-config to 21.9b0 #25
- Removed pytest and pytest-mock from requirements.txt #31
- Added scaler_kwargs as an empty attribute to the ScalingTransformer class to avoid an AttributeError raised by sklearn #21
- Added test-aide package to requirements-dev.txt #21
- Added logo for the package #22
- Added pre-commit to the project to manage pre-commit hooks #22
- Added quick-start guide to docs #22
- Added code of conduct for the project #22
- Moved testing/test_data.py to tests folder #21
- Updated example notebooks to use California housing dataset from sklearn instead of Boston house prices dataset #21
- Changed changelog to be rst format and a changelog page added to docs #22
- Changed the default branch in the repository from master to main
- Removed testing module and updated tests to use helpers from test-aide package #21
- Add github action to run pytest, flake8, black and bandit #10
- Modified GroupRareLevelsTransformer to remove the constraint type of rare_level_name being string, instead it must be the same type as the columns selected #13
- Fix failing NullIndicator.transform tests #14
- Update NearestMeanResponseImputer to remove fallback to median imputation when no nulls present in a column #10
- Open source release of the package on Github
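
The #178 entries above replaced log(x+1) with log1p(x) in LogTransformer and added a test that the result is well conditioned for small x. The following is a minimal sketch of why that matters, using only Python's standard math module rather than the transformer's own code; the helper names are hypothetical.

```python
import math


def naive_log_plus_one(x: float) -> float:
    """log(x + 1) computed directly; x + 1 rounds to 1.0 for tiny x."""
    return math.log(x + 1.0)


def stable_log_plus_one(x: float) -> float:
    """log(1 + x) via log1p, which stays accurate near zero."""
    return math.log1p(x)


# Illustrative input, far below float64 machine epsilon (about 2.2e-16)
tiny = 1e-18
naive = naive_log_plus_one(tiny)    # the contribution of x is lost in x + 1
stable = stable_log_plus_one(tiny)  # stays close to x, since log(1 + x) ~ x
```

For inputs well away from zero the two functions agree closely, so the change mainly affects columns containing very small values.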