Hi, this is not a bug report but rather a feature request (not sure if this is the place or how).
It would be great to be able to specify the value an unseen category should take when using the `OrdinalEncoder`, rather than having it fixed to `-1`.
For instance, I would like to use this encoder as my preprocessing step, before calling a LightGBM classifier (which expects all categorical feature values to be non-negative integers), within a `PMMLPipeline` (which currently supports `ce.OrdinalEncoder`).
Currently, the workaround would be to construct my encoding mapping beforehand and pass it as the `mapping` param, which is a bit of overkill... Am I missing something? Is there another way?
Thanks!
One of the big advantages of this library is a fairly common interface across all the different encoders (e.g. for handling missing values or unknowns). It makes a lot of sense to keep this. So if we want this to be flexible, we'd need to introduce it for all encoders, and then it would be consistent to make the missing value (currently `-2`) flexible as well.
The downside is that another 2 parameters are introduced in the `__init__` function, which makes it somewhat big. The workaround is also rather easy, isn't it? You just replace the `-1` with some other value using `df.replace`.
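For reference, a minimal sketch of that `df.replace` workaround, applied outside any pipeline. The frame below only simulates encoder output; the column names and the replacement value `0` are assumptions for illustration, not something the library prescribes.

```python
import pandas as pd

# Simulated ordinal-encoder output: -1 marks an unseen category and
# -2 marks a missing value (the sentinels mentioned in this thread).
encoded = pd.DataFrame({"color": [1, 2, -1, 3], "size": [1, -2, 2, -1]})

# Hypothetical choice: send unseen categories to 0, which no seen
# category uses in this simulated frame.
UNSEEN_SENTINEL = -1
NEW_UNSEEN_VALUE = 0

# Replace the sentinel everywhere in the frame; other values are untouched.
remapped = encoded.replace(UNSEEN_SENTINEL, NEW_UNSEEN_VALUE)
```

Note that the missing-value sentinel `-2` is left as-is here; it would need its own replacement if the downstream model requires non-negative integers throughout.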
But I'm open to discussing this further; if a lot of people find it super convenient, it can be worth it.
EDIT: I just realised the `replace` workaround won't work in pipelines (as you noted), and supplying a mapping indeed feels like overkill.
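One possible way to keep the replacement inside a transform step is to wrap the encoder. The sketch below is only an illustration: `SentinelRemapper` and `ToyOrdinalEncoder` are hypothetical names, and the toy encoder is a stand-in so the example runs without the library installed. Whether `PMMLPipeline` could export such a custom step is not established here.

```python
import pandas as pd


class ToyOrdinalEncoder:
    """Stand-in encoder (hypothetical): codes start at 1, unseen values become -1."""

    def fit(self, X, y=None):
        # Learn one value -> code dictionary per column, in order of appearance.
        self.maps_ = {
            col: {val: i + 1 for i, val in enumerate(pd.unique(X[col]))}
            for col in X.columns
        }
        return self

    def transform(self, X):
        # Values absent from the learned mapping become NaN, then -1.
        return pd.DataFrame(
            {col: X[col].map(self.maps_[col]).fillna(-1).astype(int) for col in X.columns},
            index=X.index,
        )


class SentinelRemapper:
    """Hypothetical wrapper: delegates to an encoder, then rewrites its unseen sentinel."""

    def __init__(self, encoder, unseen_from=-1, unseen_to=0):
        self.encoder = encoder
        self.unseen_from = unseen_from
        self.unseen_to = unseen_to

    def fit(self, X, y=None):
        self.encoder.fit(X, y)
        return self

    def transform(self, X):
        # Encode first, then swap the sentinel for the requested value.
        return self.encoder.transform(X).replace(self.unseen_from, self.unseen_to)


train = pd.DataFrame({"color": ["red", "green"]})
test = pd.DataFrame({"color": ["green", "blue"]})  # "blue" is unseen

step = SentinelRemapper(ToyOrdinalEncoder()).fit(train)
result = step.transform(test)
```

To be used as a regular scikit-learn pipeline step, a wrapper like this would likely also need to inherit from `BaseEstimator` and `TransformerMixin` so that it can be cloned and inspected.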