Changelog

This changelog follows the great advice from https://keepachangelog.com/.

Each section has a title of the format X.Y.Z (YYYY-MM-DD), giving the version of the package and the release date of that version. Unreleased changes, i.e. those that have been merged into main (e.g. with a .dev suffix) but are not yet in a new release on PyPI, are added to the changelog under the title X.Y.Z (unreleased). Unreleased sections can be combined when they are released, with the date of release added to the title.

Subsections for each version can be one of the following:

  • Added for new features.
  • Changed for changes in existing functionality.
  • Deprecated for soon-to-be removed features.
  • Removed for now removed features.
  • Fixed for any bug fixes.
  • Security in case of vulnerabilities.

Each individual change should have a link to the pull request after the description of the change.
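
The title convention above lends itself to a mechanical check. The following is a minimal sketch only: `TITLE_RE` and `is_valid_title` are hypothetical names introduced for illustration and are not part of tubular or its tooling.

```python
import re
from datetime import date

# Hypothetical pattern (not part of tubular): matches "X.Y.Z (YYYY-MM-DD)"
# or "X.Y.Z (unreleased)", as described in the conventions above.
TITLE_RE = re.compile(r"^\d+\.\d+\.\d+ \((unreleased|\d{4}-\d{2}-\d{2})\)$")


def is_valid_title(title: str) -> bool:
    """Return True if a section title follows the changelog convention."""
    match = TITLE_RE.match(title)
    if match is None:
        return False
    stamp = match.group(1)
    if stamp == "unreleased":
        return True
    try:
        # Reject strings that look like dates but are not real calendar dates.
        date.fromisoformat(stamp)
    except ValueError:
        return False
    return True


print(is_valid_title("1.0.0 (2023-07-24)"))
print(is_valid_title("9.9.9 (unreleased)"))
print(is_valid_title("1.0.0 (24/07/2023)"))
```

A check like this could run in a pre-commit hook or CI step, but whether that is worthwhile depends on how often titles drift from the format.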

1.4.2 (unreleased)

Changed

  • type-conscious, they now rely on an input 'return_dtypes' dict arg. #369
  • As part of #369, updated OrdinalEncoderTransformer to output Int8 type
  • As part of #369, updated NominalToIntegerTransformer to output Int8 type. Removed inverse_mapping functionality, as this is more complicated when transform is opinionated on types.
  • narwhalified GroupRareLevelsTransformer. As part of this, had to make transformer more opinionated and refuse columns with nulls (raises an error directing to imputers). #372
  • narwhalified BaseDatetimeTransformer #375
  • Optional wanted_levels feature has been integrated into the OneHotEncodingTransformer which allows users to specify which levels in a column they wish to encode. #384
  • Created unit tests to check if the values provided for wanted_values are as expected and if the output is as expected.

1.4.1 (2024-12-02)

Changed

1.4.0 (2024-10-15)

Changed

  • Modified OneHotEncodingTransformer: made an instance of OneHotEncoder and assigned it to the attribute _encoder #308
  • Refactored BaseDateTransformer, BaseDateTwoColumnTransformer and associated testing #273
  • BaseTwoColumnTransformer removed in favour of mixin classes TwoColumnMixin and NewColumnNameMixin to handle validation of two columns and new_column_name arguments #273
  • Refactored tests for InteractionTransformer #283
  • Refactored tests for StringConcatenator and SeriesStrMethodTransformer, added separator mixin class. #286
  • Refactored MeanResponseTransformer tests in new format #262
  • refactored build tools and package config into pyproject.toml #271
  • set up automatic versioning using setuptools-scm #271
  • Refactored TwoColumnOperatorTransformer tests in new format #274
  • Refactored PCATransformer tests in new format #277
  • Refactored tests for NullIndicator #301
  • Refactored BetweenDatesTransformer tests in new format #294
  • As part of above, edited dates file transformers to use BaseDropOriginalMixin in transform
  • Refactored DateDifferenceTransformer tests in new format. Had to turn off autodefine new_column_name functionality to match generic test expectations. Suggest we look to turn back on in the future. #296
  • Refactored DateDiffLeapYearTransformer tests in new format. As part of this had to remove the autodefined new_column_name, as this conflicts with the generic testing. Suggest we look to turn back on in future. #295
  • Edited base testing setup for dates file, created new BaseDatetimeTransformer class
  • Refactored DatetimeInfoExtractor tests in new format #297
  • Refactored DatetimeSinusoidCalculator tests in new format. #310
  • fixed a bug in CappingTransformer which was preventing use of .get_params method #311
  • Setup requirements for narwhals, remove python3.8 from our build pipelines as incompatible with polars
  • Narwhal-ified BaseTransformer #313
  • Refactored ToDatetimeTransformer tests in new format #300
  • Refactored tests for SeriesDtMethodTransformer in new format. Changed column arg to columns to fit generic format. #299
  • Refactored OrdinalEncoderTransformer tests in new format #330
  • Narwhal-ified NullIndicator #319
  • Narwhal-ified NearestMeanResponseImputer #320
  • Narwhal-ified MedianImputer #317

1.3.1 (2024-07-18)

Changed

  • Refactored NominalToIntegerTransformer tests in new format #261
  • Refactored GroupRareLevelsTransformer tests in new format #259
  • DatetimeInfoExtractor.mappings_provided changed from a dict.keys() object to list so transformer is serialisable. #258
  • Created BaseNumericTransformer class to support test refactor of numeric file #266
  • Updated testing approach for LogTransformer #268
  • Refactored ScalingTransformer tests in new format #284
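
The DatetimeInfoExtractor.mappings_provided change above reflects a general Python behaviour: a dict.keys() view cannot be pickled, while a plain list can. The sketch below illustrates this with the standard library only. It assumes pickle as the serialiser, which the entry does not specify, and the `mappings` dict is made-up illustration data rather than tubular's actual mappings.

```python
import pickle

# Made-up illustration data, not the mappings used by DatetimeInfoExtractor.
mappings = {"timeofday": {0: "night"}, "dayofweek": {0: "monday"}}

# A dict_keys view is tied to its parent dict and is rejected by pickle.
keys_view = mappings.keys()
try:
    pickle.dumps(keys_view)
    view_is_picklable = True
except TypeError:
    view_is_picklable = False

# Converting the view to a list gives an ordinary, picklable object.
keys_list = list(mappings.keys())
restored = pickle.loads(pickle.dumps(keys_list))

print(view_is_picklable)
print(restored)
```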

1.3.0 (2024-06-13)

Added

  • Inheritable tests for generic base behaviours for base transformer in base_tests.py, with fixtures to allow for this in conftest.py
  • Split existing input check into two better defined checks for TwoColumnOperatorTransformer #183
  • Created unit tests for checking column type and size #183
  • Automated weights column checks through a mixin class and captured common weight tests in generic test classes for weighted transformers

Changed

  • Standardised naming of weight arg across transformers
  • Updated DataFrameMethodTransformer tests to have an inheritable init class that can be used by other test files.
  • Moved BaseTransformer, DataFrameMethodTransformer, BaseMappingTransformer, BaseMappingTransformerMixin, CrossColumnMappingTransformer and MappingTransformer over to the new testing framework.
  • Refactored MappingTransformer by removing redundant init method.
  • Refactored tests for ColumnDtypeSetter, and renamed (from SetColumnDtype)
  • Refactored tests for SetValueTransformer
  • Refactored ArbitraryImputer by removing redundant fillna call in transform method. This should increase tubular's efficiency and maintainability.
  • Fixed bugs in MedianImputer and ModeImputer where they would error for all null columns.
  • Refactored ArbitraryImputer and BaseImputer tests in new format.
  • Refactored MedianImputer tests in new format.
  • Replaced occurrences of pd.DataFrame.drop() with del statement to speed up tubular. Note that no additional unit testing has been done for copy=False as this release is scheduled to remove copy.
  • Created BaseCrossColumnNumericTransformer class. Refactored CrossColumnAddTransformer and CrossColumnMultiplyTransformer to use this class. Moved tests for these objects to new approach.
  • Created BaseCrossColumnMappingTransformer class and integrated into CrossColumnMappingTransformer tests
  • Refactored BaseNominalTransformer tests in new format & moved its logic to the transform method.
  • Refactored ModeImputer tests in new format.
  • Added generic init tests to base tests for transformers that take two columns as an input.
  • Refactored EqualityChecker tests in new format.
  • Bugfix to MeanResponseTransformer to ignore unobserved categorical levels
  • Refactored dates.py to prepare for testing refactor. Edited BaseDateTransformer (and created BaseDateTwoColumnTransformer) to follow standard format, implementing validations at init/fit/transform. To reduce complexity of file, made transformers more opinionated to insist on specific and consistent column dtypes. #246
  • Added test_BaseTwoColumnTransformer base class for columns that require a list of two columns for input
  • Added BaseDropOriginalMixin to mixin transformers to handle validation and method of dropping original features, also added appropriate test classes.
  • Refactored MeanImputer tests in new format #250
  • Refactored DatetimeInfoExtractor to condense and improve readability
  • added minimal_dataframe_lookup fixture to conftest, and edited generic tests to use this
  • Alphabetised the minimal attribute dictionary for readability.
  • Refactored OHE transformer tests to align with new testing framework.
  • Moved fixtures relating only to a single test out of conftest and into testing script where utilised.
  • !!!Introduced dependency on Sklearn's OneHotEncoder by adding test to check OHE transformer (which we are calling from within our OHE wrapper) is fit before transform
  • Refactored NearestMeanResponseImputer in line with new testing framework.

Removed

  • Functionality for BaseTransformer (and thus all transformers) to take None as an option for columns. This behaviour was inconsistently implemented across transformers. Rather than extending to all we decided to remove this functionality. This required updating a lot of test files.
  • The columns_set_or_check() method from BaseTransformer. With the above change it was no longer necessary. Subsequent updates to nominal transformers and their tests were required.
  • Set pd copy_on_write to True (will become default in pandas 3.0) which allowed the functionality of the copy method of the transformers to be dropped #197

1.2.2 (2024-02-20)

Added

  • Created unit test for checking if log1p is working and well conditioned for small x #178

Changed

  • Changed LogTransformer to use log1p(x) instead of log(x+1) #178
  • Changed unit tests using log(x+1) to log1p(x) #178
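
The motivation for the log1p change can be seen with a small numerical check. This is a minimal sketch using the standard library's math.log1p as a stand-in; it is an assumption that LogTransformer's own implementation behaves the same way, and the value of `x` is chosen purely for illustration.

```python
import math

# Illustrative value far below double-precision machine epsilon (~2.2e-16).
x = 1e-18

# x + 1 rounds to exactly 1.0 in floating point, so the information in x is lost.
naive = math.log(x + 1)

# log1p evaluates log(1 + x) without forming 1 + x, so small x survives.
stable = math.log1p(x)

print(naive)
print(stable)
```

For inputs this small the naive form returns zero, whereas log1p stays close to x itself, which is the "well conditioned for small x" property the new unit test targets.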

1.2.1 (2024-02-08)

Added

  • Updated GroupRareLevelsTransformer so that when working with category dtypes it forgets categories encoded as rare (this is wanted behaviour as these categories are no longer present in the data) #177

1.2.0 (2024-02-06)

Added

  • Update OneHotEncodingTransformer to default to returning int8 columns #175
  • Updated NullIndicator to return int8 columns #173
  • Updated MeanResponseTransformer to coerce return to float (useful behaviour for category type features) #174

1.1.1 (2024-01-18)

Added

  • added type hints #128
  • added some error handling to transform method of nominal transformers #162
  • added new release pipeline #161

1.1.0 (2023-12-19)

Added

  • added flake8_bugbear (B) to ruff rules #131
  • added flake8_datetimez (DTZ) to ruff rules #132
  • added option to avoid passing unseen levels to rare in GroupRareLevelsTransformer #141

Changed

  • minor changes to comply with flake8_bugbear (B) ruff rules #131
  • minor changes to comply with flake8_datetimez (DTZ) ruff rules #132
  • BaseMappingTransformerMixin changed to use DataFrame.replace rather than looping over columns #135
  • MeanResponseTransformer.map_imputer_values() added to decouple from BaseMappingTransformerMixin #135
  • BaseDateTransformer added to standardise datetime data handling #148

Removed

  • removed some unnecessary implementation tests #130
  • ReturnKeyDict class removed #135

1.0.0 (2023-07-24)

Changed

  • now compatible with pandas>=2.0.0 #123
  • DateDifferenceTransformer no longer supports 'Y' or 'M' units #123

0.3.8 (2023-07-10)

Changed

  • replaced flake8 with ruff linting. For a list of rules implemented, code changes made for compliance and further rule sets planned for future see PR #92

0.3.7 (2023-07-05)

Changed

  • minor change to GroupRareLevelsTransformer test_super_transform_called test to align with other cases #90
  • removed pin of scikit-learn version to <1.20 #90
  • update black version in pre-commit-config #90

0.3.6 (2023-05-24)

Added

  • added support for vscode dev container with python 3.8, requirements-dev.txt, pylance/gitlens extensions and pre-commit all preinstalled #83

Changed

  • added sklearn < 1.2 dependency #86

0.3.5 (2023-04-26)

Added

  • added support for handling unseen levels in MeanResponseTransformer #80

Changed

  • added pandas < 2.0.0 dependency #81

Deprecated

  • DateDifferenceTransformer M and Y units are incompatible with pandas 2.0.0 and will be removed or changed in a future version #81

0.3.4 (2023-03-14)

Added

  • added support for passing multiple columns and periods/units parameters to DatetimeSinusoidCalculator #74
  • added support for handling a multi level response to MeanResponseTransformer #67

Changed

  • changed ArbitraryImputer to preserve the dtype of columns (previously would upcast dtypes like int8 or float32) #76

Fixed

  • fixed issue with OneHotEncodingTransformer use of deprecated sklearn.OneHotEncoder.get_feature_names method #66

0.3.3 (2023-01-19)

Added

  • added support for prior mean encoding (regularised encodings) #46
  • added support for weights to mean, median and mode imputers #47
  • added classname() method to BaseTransformer and prefixed all errors with classname call for easier debugging #48
  • added DatetimeInfoExtractor transformer in tubular/dates.py associated tests with tests/dates/test_DatetimeInfoExtractor.py and examples with examples/dates/DatetimeInfoExtractor.ipynb #49
  • added DatetimeSinusoidCalculator in tubular/dates.py associated tests with tests/dates/test_DatetimeSinusoidCalculator.py and examples with examples/dates/DatetimeSinusoidCalculator.ipynb #50
  • added TwoColumnOperatorTransformer in tubular/numeric.py associated tests with tests/numeric/test_TwoColumnOperatorTransformer.py and examples with examples/dates/TwoColumnOperatorTransformer.ipynb #51
  • added StringConcatenator in tubular/strings.py associated tests with tests/strings/test_StringConcatenator.py and examples with examples/strings/StringConcatenator.ipynb #52
  • added SetColumnDtype in tubular/misc.py associated tests with tests/misc/test_StringConcatenator.py and examples with examples/strings/StringConcatenator.ipynb #53
  • added warning to MappingTransformer in tubular/mapping.py for unexpected changes in dtype #54
  • added new module tubular/comparison.py containing EqualityChecker. Also added associated tests with tests/comparison/test_EqualityChecker.py and examples with examples/comparison/EqualityChecker.ipynb #55
  • added PCATransformer in tubular/numeric.py associated tests with tests/misc/test_PCATransformer.py and examples with examples/numeric/PCATransformer.ipynb #57

Fixed

  • updated black version to 22.3.0 and flake8 version to 5.0.4 to fix compatibility issues #45
  • removed kwargs argument from BaseTransformer in tubular/base.py to avoid silent erroring if incorrect arguments are passed to transformers. Fixed a few tests which this change revealed to have incorrect arguments passed #56

0.3.2 (2022-01-13)

Added

  • Added InteractionTransformer in tubular/numeric.py, associated tests with tests/numeric/test_InteractionTransformer.py file and examples with examples/numeric/InteractionTransformer.ipynb file. #38

0.3.1 (2021-11-09)

Added

  • Added tests/test_transformers.py file with test to be applied all transformers #30

Changed

  • Set min pandas version to 1.0.0 in requirements.txt, requirements-dev.txt, and docs/requirements.txt #31
  • Changed y argument in fit to only accept pd.Series objects #26
  • Added new _combine_X_y method to BaseTransformer which cbinds X and y #26
  • Updated MeanResponseTransformer to use y arg in fit and remove setting response_column in init #26
  • Updated OrdinalEncoderTransformer to use y arg in fit and remove setting response_column in init #26
  • Updated NearestMeanResponseImputer to use y arg in fit and remove setting response_column in init #26
  • Updated version of black used in the pre-commit-config to 21.9b0 #25
  • Modified DataFrameMethodTransformer to add the possibility of drop original columns #24

Fixed

  • Added attributes to date and numeric transformers to allow transformer to be printed #30
  • Removed copy of mappings in MappingTransformer to allow transformer to work with sklearn.base.clone #30
  • Changed data values used in some tests for MeanResponseTransformer so the test no longer depends on pandas <1.3.0 or >=1.3.0, required due to a change in pandas behaviour with groupby mean #25
  • BaseTransformer now correctly raises TypeError exceptions instead of ValueError when input values are the wrong type #26
  • Updated version of black used in the pre-commit-config to 21.9b0 #25

Removed

  • Removed pytest and pytest-mock from requirements.txt #31

0.3.0 (2021-11-03)

Added

  • Added scaler_kwargs as an empty attribute to the ScalingTransformer class to avoid an AttributeError raised by sklearn #21
  • Added test-aide package to requirements-dev.txt #21
  • Added logo for the package #22
  • Added pre-commit to the project to manage pre-commit hooks #22
  • Added quick-start guide to docs #22
  • Added code of conduct for the project #22

Changed

  • Moved testing/test_data.py to tests folder #21
  • Updated example notebooks to use California housing dataset from sklearn instead of Boston house prices dataset #21
  • Changed changelog to be rst format and a changelog page added to docs #22
  • Changed the default branch in the repository from master to main

Removed

  • Removed testing module and updated tests to use helpers from test-aide package #21

0.2.15 (2021-10-06)

Added

  • Add github action to run pytest, flake8, black and bandit #10

Changed

  • Modified GroupRareLevelsTransformer to remove the constraint type of rare_level_name being string, instead it must be the same type as the columns selected #13
  • Fix failing NullIndicator.transform tests #14

Removed

  • Update NearestMeanResponseImputer to remove fallback to median imputation when no nulls present in a column #10

0.2.14 (2021-04-23)

Added

  • Open source release of the package on Github