GH-16033: Add GBLinear #16034
aml_with_gblinear = H2OAutoML(max_models=MAX_MODELS, seed=1, modeling_plan=[
    dict(name="XGBoost", steps=[
        dict(id="def_2", group=1, weight=10),
        dict(id="def_1", group=2, weight=10),
        dict(id="def_3", group=3, weight=10),
        dict(id="grid_1", group=4, weight=90),
        dict(id="grid_gblinear", group=4, weight=90),  # << XGBoost GBLinear booster grid
        dict(id="lr_search", group=7, weight=30),
    ]), dict(name="GLM", steps=[
        dict(id="def_1", group=1, weight=10),
    ]), dict(name="DRF", steps=[
        dict(id="def_1", group=2, weight=10),
        dict(id="XRT", group=3, weight=10),
    ]), dict(name="GBM", steps=[
        dict(id="def_5", group=1, weight=10),
        dict(id="def_2", group=2, weight=10),
        dict(id="def_3", group=2, weight=10),
        dict(id="def_4", group=2, weight=10),
        dict(id="def_1", group=3, weight=10),
        dict(id="grid_1", group=4, weight=60),
        dict(id="lr_annealing", group=7, weight=10),
    ]), dict(name="DeepLearning", steps=[
        dict(id="def_1", group=3, weight=10),
        dict(id="grid_1", group=4, weight=30),
        dict(id="grid_2", group=5, weight=30),
        dict(id="grid_3", group=5, weight=30),
    ]), dict(name="completion", steps=[
        dict(id="resume_best_grids", group=6, weight=60),
    ]), dict(name="StackedEnsemble", steps=[
        dict(id="monotonic", group=9, weight=10),
        dict(id="best_of_family_xglm", group=10, weight=10),
        dict(id="all_xglm", group=10, weight=10),
    ])])
aml_with_gblinear.train(y=ds.target, training_frame=ds.train)
Example of how to run a regular AutoML with the gblinear grid.
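Writing out the whole plan by hand is verbose, so here is a minimal sketch of adding the optional step programmatically. It works on plain lists of dicts only and makes no h2o calls; `add_step` is a hypothetical helper, not part of the h2o API, and the shortened `base_plan` is just for illustration. Whether the position of a step inside an entry matters is not stated in this thread, so the helper simply appends.

```python
from copy import deepcopy


def add_step(plan, algo, step):
    """Return a copy of `plan` with `step` appended to the entry named `algo`.

    Hypothetical helper for illustration; `plan` is a list of
    dict(name=..., steps=[...]) entries as in the example above.
    """
    new_plan = deepcopy(plan)  # leave the caller's plan untouched
    for entry in new_plan:
        if entry["name"] == algo:
            entry["steps"].append(dict(step))
            return new_plan
    # No entry for this algorithm yet: add one holding only the new step.
    new_plan.append(dict(name=algo, steps=[dict(step)]))
    return new_plan


# Shortened plan, for illustration only.
base_plan = [
    dict(name="XGBoost", steps=[
        dict(id="def_2", group=1, weight=10),
        dict(id="grid_1", group=4, weight=90),
    ]),
    dict(name="GLM", steps=[
        dict(id="def_1", group=1, weight=10),
    ]),
]

plan_with_gblinear = add_step(
    base_plan, "XGBoost", dict(id="grid_gblinear", group=4, weight=90)
)
```

The resulting list could then be passed as `modeling_plan=plan_with_gblinear`, in the same way as the hand-written plan above.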
is this what we want to give to the customer when they want to try gblinear?
I mean if it's one single customer, I think it's fine, but it fixes AutoML behavior once and for all…
AFAIK it's just one customer and from a limited benchmark gblinear doesn't seem to bring much (except for higher training time). But it might be worth experimenting with once we have more gblinear parameters exposed.
int ncols = aml().getTrainingFrame().numCols() - (aml().getBuildSpec().getNonPredictors().length +
        (aml().getBuildSpec().input_spec.ignored_columns != null ? aml().getBuildSpec().input_spec.ignored_columns.length : 0));

searchParams.put("_top_k", IntStream.range(0, ncols-1).boxed().toArray(Integer[]::new));
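For readers following the arithmetic, here is a rough Python sketch of the same computation: the predictor count is the frame's column count minus the non-predictor and ignored columns, and the candidate `_top_k` values run from 0 up to, but not including, `ncols - 1`. The function name and arguments are hypothetical, and what `getNonPredictors()` returns (presumably response, weights and fold columns) is an assumption.

```python
def top_k_candidates(num_cols, non_predictors, ignored_columns=None):
    """Mirror the Java snippet: count predictors, then list candidate top_k values.

    Hypothetical helper; `non_predictors` and `ignored_columns` are
    sequences of column names, and `ignored_columns` may be None.
    """
    n_ignored = len(ignored_columns) if ignored_columns is not None else 0
    ncols = num_cols - (len(non_predictors) + n_ignored)
    # Like IntStream.range(0, ncols - 1), the upper bound is exclusive.
    return list(range(0, ncols - 1))


# 10 columns, 1 non-predictor, 2 ignored -> 7 predictors -> candidates 0..5
candidates = top_k_candidates(10, ["y"], ["id", "ts"])
```

With seven predictors the candidates stop at 5, because the exclusive upper bound is `ncols - 1`.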
Nice to add this in now. You can uncomment it once Adam has exposed those parameters.
LG, thanks @tomasfryda
#16033
Initial step of adding gblinear to automl. Now only as an optional step.