GH-16033: Add GBLinear #16034

@tomasfryda tomasfryda commented Jan 24, 2024

#16033

Initial step toward adding GBLinear to AutoML. For now it is available only as an optional step.

@tomasfryda tomasfryda self-assigned this Jan 24, 2024
Comment on lines +82 to +115
aml_with_gblinear = H2OAutoML(max_models=MAX_MODELS, seed=1, modeling_plan=[
    dict(name="XGBoost", steps=[
        dict(id="def_2", group=1, weight=10),
        dict(id="def_1", group=2, weight=10),
        dict(id="def_3", group=3, weight=10),
        dict(id="grid_1", group=4, weight=90),
        dict(id="grid_gblinear", group=4, weight=90),  # << XGBoost GBLinear booster grid
        dict(id="lr_search", group=7, weight=30),
    ]), dict(name="GLM", steps=[
        dict(id="def_1", group=1, weight=10),
    ]), dict(name="DRF", steps=[
        dict(id="def_1", group=2, weight=10),
        dict(id="XRT", group=3, weight=10),
    ]), dict(name="GBM", steps=[
        dict(id="def_5", group=1, weight=10),
        dict(id="def_2", group=2, weight=10),
        dict(id="def_3", group=2, weight=10),
        dict(id="def_4", group=2, weight=10),
        dict(id="def_1", group=3, weight=10),
        dict(id="grid_1", group=4, weight=60),
        dict(id="lr_annealing", group=7, weight=10),
    ]), dict(name="DeepLearning", steps=[
        dict(id="def_1", group=3, weight=10),
        dict(id="grid_1", group=4, weight=30),
        dict(id="grid_2", group=5, weight=30),
        dict(id="grid_3", group=5, weight=30),
    ]), dict(name="completion", steps=[
        dict(id="resume_best_grids", group=6, weight=60),
    ]), dict(name="StackedEnsemble", steps=[
        dict(id="monotonic", group=9, weight=10),
        dict(id="best_of_family_xglm", group=10, weight=10),
        dict(id="all_xglm", group=10, weight=10),
    ])])
aml_with_gblinear.train(y=ds.target, training_frame=ds.train)
Contributor Author

Example of how to run a regular AutoML with the GBLinear grid.
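
If writing the whole plan out by hand is too verbose, the extra step could also be spliced into an existing plan programmatically. This is only a sketch: `add_step` is a hypothetical helper, not part of the h2o package, and `base_plan` is a shortened stand-in for the full plan shown in the snippet above.

```python
from copy import deepcopy


def add_step(plan, algo, step, after_id=None):
    """Return a copy of `plan` with `step` added to the `algo` entry.

    Hypothetical helper (not an h2o API). The step is placed right after
    the step whose id is `after_id`, or appended at the end when
    `after_id` is None or not present.
    """
    new_plan = deepcopy(plan)  # leave the caller's plan untouched
    for entry in new_plan:
        if entry["name"] == algo:
            steps = entry["steps"]
            ids = [s["id"] for s in steps]
            pos = ids.index(after_id) + 1 if after_id in ids else len(steps)
            steps.insert(pos, dict(step))
            return new_plan
    raise ValueError(f"no entry named {algo!r} in the modeling plan")


# Shortened stand-in for the full plan (assumption, for illustration only).
base_plan = [
    dict(name="XGBoost", steps=[
        dict(id="def_2", group=1, weight=10),
        dict(id="grid_1", group=4, weight=90),
        dict(id="lr_search", group=7, weight=30),
    ]),
    dict(name="GLM", steps=[dict(id="def_1", group=1, weight=10)]),
]

plan_with_gblinear = add_step(
    base_plan,
    "XGBoost",
    dict(id="grid_gblinear", group=4, weight=90),
    after_id="grid_1",
)
```

The resulting list has the same shape as the `modeling_plan=` argument in the snippet above, so it could be passed to `H2OAutoML` in the same way.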

Contributor

Is this what we want to give the customer when they want to try gblinear?
If it's a single customer, I think it's fine, but it pins AutoML's behavior once and for all…

Contributor Author

AFAIK it's just one customer and from a limited benchmark gblinear doesn't seem to bring much (except for higher training time). But it might be worth experimenting with once we have more gblinear parameters exposed.

@tomasfryda tomasfryda added this to the 3.46.0.1 milestone Jan 25, 2024
int ncols = aml().getTrainingFrame().numCols() - (aml().getBuildSpec().getNonPredictors().length +
(aml().getBuildSpec().input_spec.ignored_columns != null ? aml().getBuildSpec().input_spec.ignored_columns.length : 0));

searchParams.put("_top_k", IntStream.range(0, ncols-1).boxed().toArray(Integer[]::new));
Contributor

Nice to add this in now. You can uncomment it once Adam has exposed those parameters.
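
For readers less familiar with the Java side, the arithmetic in the snippet above can be sketched in Python. This is an illustration only: `top_k_candidates` and its arguments are hypothetical names, and it simply mirrors the column count and the `IntStream.range(0, ncols-1)` call, whose upper bound is exclusive.

```python
def top_k_candidates(n_frame_cols, non_predictors, ignored_columns=None):
    """Mirror the Java snippet: candidate `_top_k` values for the grid.

    n_frame_cols    -- total number of columns in the training frame
    non_predictors  -- names of non-predictor columns (response, weights, ...)
    ignored_columns -- names of ignored columns, or None
    """
    ignored = ignored_columns or []  # same null check as in the Java code
    ncols = n_frame_cols - (len(non_predictors) + len(ignored))
    # IntStream.range(0, ncols - 1) excludes its upper bound, like range().
    return list(range(0, ncols - 1))


# 10 columns, 1 non-predictor, 2 ignored -> ncols = 7
candidates = top_k_candidates(10, ["target"], ["id", "timestamp"])
```

With these made-up inputs `candidates` is `[0, 1, 2, 3, 4, 5]`, so the largest value in the grid is two less than the number of predictor columns.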

Contributor

@sebhrusen sebhrusen left a comment

LG, thanks @tomasfryda

@tomasfryda tomasfryda merged commit e69e2ce into master Jan 30, 2024
66 of 68 checks passed
@tomasfryda tomasfryda deleted the tomf_GH-8381_AutoML_experiment_XGB_with_different_booster_gblinear_now_safe_to_use branch January 30, 2024 15:51
3 participants