OrdinalEncoder unseen value spec #283

lqrz · 2020-11-15T13:17:16Z

Hi, this is not a bug report but rather a feature request (I'm not sure whether this is the right place for it).

It would be great to be able to specify the value an unseen category should take when using the OrdinalEncoder, rather than having it fixed to -1.

For instance, I would like to use this encoder as my preprocessing step, before calling a LightGBM classifier (which expects all categorical feature values to be non-negative integers), within a PMMLPipeline (which currently supports ce.OrdinalEncoder).

Currently, the workaround would be to construct my encoding mapping beforehand and specify it as the `mapping` param, which is a bit of overkill... Am I missing something? Is there another way?

Thanks!


lqrz · 2020-11-15T13:22:16Z

I understand this is going to be introduced in scikit-learn's OrdinalEncoder in v0.24.
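For reference, a minimal sketch of what this looks like with scikit-learn's own `OrdinalEncoder` (0.24 or later), using its `handle_unknown="use_encoded_value"` and `unknown_value` parameters. The toy colour data and the choice of `3` as the unseen code are assumptions made for illustration only; this does not show anything about category_encoders or PMMLPipeline support.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Toy training data with three known categories (illustrative only).
X_train = np.array([["red"], ["green"], ["blue"]], dtype=object)

# Known categories are encoded as 0..2 in sorted order (blue, green, red),
# so 3 is free to stand for "unseen" and stays non-negative.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=3)
encoder.fit(X_train)

# "purple" was never seen during fit.
X_new = np.array([["green"], ["purple"]], dtype=object)
encoded = encoder.transform(X_new)
print(encoded)
```

Note that scikit-learn rejects an `unknown_value` that collides with a code already assigned to a known category, so the value has to be picked outside that range.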

PaulWestenthanner · 2023-01-31T15:53:57Z

One of the big advantages of this library is a fairly uniform interface across all the different encoders (e.g. for handling missing values or unknowns). It makes a lot of sense to keep this. So if we want to make this flexible, we'd need to introduce it for all encoders, and then it would be consistent to make the missing value (currently -2) flexible as well.
The downside is that another 2 parameters are introduced in the `__init__` function, which makes it somewhat big. The workaround is also rather easy, isn't it? You just replace the -1 with some other value using `df.replace`.
But I'm open to discussing this further; if a lot of people find it super convenient, it can be worth it.

EDIT: I just realised the replace workaround won't work in pipelines (as you noted), and supplying a mapping indeed feels like overkill.
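For anyone working outside a pipeline, the `df.replace` workaround mentioned above could look roughly like the sketch below. The `encoded` frame is a hand-written stand-in for encoder output (it is not produced by the library here), and `UNSEEN_CODE` is a hypothetical name; the value 0 is assumed to be unused by the known categories in this toy frame.

```python
import pandas as pd

# Hypothetical encoder output: -1 marks an unseen category and -2 a missing
# value, matching the defaults discussed in this thread.
encoded = pd.DataFrame({"color": [1, 2, -1, 1], "size": [1, -1, 2, -2]})

# Hypothetical replacement code: any non-negative value that is not already
# used by a known category in the encoded columns.
UNSEEN_CODE = 0

# Swap only the unseen marker; the missing marker (-2) is left untouched.
remapped = encoded.replace(-1, UNSEEN_CODE)
print(remapped)
```

The missing-value marker would need the same treatment if the downstream model also rejects -2.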

PaulWestenthanner added enhancement discussion labels Jan 31, 2023
