GH-16033: Add GBLinear #16034

tomasfryda · 2024-01-24T16:40:51Z

Initial step of adding GBLinear to AutoML. For now it is available only as an optional step.

tomasfryda · 2024-01-25T11:34:31Z

h2o-py/tests/testdir_algos/automl/pyunit_automl_xgboost_gblinear_large.py

+    aml_with_gblinear = H2OAutoML(max_models=MAX_MODELS, seed=1, modeling_plan=[
+        dict(name="XGBoost", steps=[
+            dict(id="def_2", group=1, weight=10),
+            dict(id="def_1", group=2, weight=10),
+            dict(id="def_3", group=3, weight=10),
+            dict(id="grid_1", group=4, weight=90),
+            dict(id="grid_gblinear", group=4, weight=90),  # << XGBoost GBLinear booster grid
+            dict(id="lr_search", group=7, weight=30),
+        ]), dict(name="GLM", steps=[
+            dict(id="def_1", group=1, weight=10),
+        ]), dict(name="DRF", steps=[
+            dict(id="def_1", group=2, weight=10),
+            dict(id="XRT", group=3, weight=10),
+        ]), dict(name="GBM", steps=[
+            dict(id="def_5", group=1, weight=10),
+            dict(id="def_2", group=2, weight=10),
+            dict(id="def_3", group=2, weight=10),
+            dict(id="def_4", group=2, weight=10),
+            dict(id="def_1", group=3, weight=10),
+            dict(id="grid_1", group=4, weight=60),
+            dict(id="lr_annealing", group=7, weight=10),
+        ]), dict(name="DeepLearning", steps=[
+            dict(id="def_1", group=3, weight=10),
+            dict(id="grid_1", group=4, weight=30),
+            dict(id="grid_2", group=5, weight=30),
+            dict(id="grid_3", group=5, weight=30),
+        ]), dict(name="completion", steps=[
+            dict(id="resume_best_grids", group=6, weight=60),
+        ]), dict(name="StackedEnsemble", steps=[
+            dict(id="monotonic", group=9, weight=10),
+            dict(id="best_of_family_xglm", group=10, weight=10),
+            dict(id="all_xglm", group=10, weight=10),
+        ])])
+    aml_with_gblinear.train(y=ds.target, training_frame=ds.train)


Example of how to run a regular AutoML with the GBLinear grid.

sebhrusen · 2024-01-29T16:12:28Z

Is this what we want to give to the customer when they want to try GBLinear?
If it's a single customer, I think that's fine, but it freezes the AutoML behavior once and for all…

tomasfryda · 2024-01-30

AFAIK it's just one customer, and in a limited benchmark GBLinear doesn't seem to bring much (apart from longer training time). But it might be worth experimenting with once more GBLinear parameters are exposed.
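A minimal sketch of how the GBLinear step could be added to an existing plan programmatically instead of editing the step list by hand. `with_gblinear_step` is a hypothetical helper, not part of h2o; it only manipulates the list-of-dicts structure used in the test above and assumes the step id `grid_gblinear` with the same `group` and `weight` as in that test. The base plan still has to be written out, so this does not remove the concern about freezing the plan.

```python
import copy


def with_gblinear_step(plan, group=4, weight=90):
    """Return a copy of `plan` with a `grid_gblinear` step in the XGBoost entry.

    Hypothetical helper: `plan` is a list of dicts shaped like the
    `modeling_plan` argument in the test above. The input is not modified.
    """
    new_plan = copy.deepcopy(plan)
    for entry in new_plan:
        if entry.get("name") == "XGBoost":
            steps = entry.setdefault("steps", [])
            # Add the step only once, so the helper is safe to call repeatedly.
            if not any(s.get("id") == "grid_gblinear" for s in steps):
                steps.append(dict(id="grid_gblinear", group=group, weight=weight))
            break
    else:
        # The loop finished without finding an XGBoost entry.
        raise ValueError("plan has no XGBoost entry")
    return new_plan


# Shortened base plan, for illustration only.
base_plan = [
    dict(name="XGBoost", steps=[
        dict(id="def_2", group=1, weight=10),
        dict(id="grid_1", group=4, weight=90),
    ]),
    dict(name="GLM", steps=[
        dict(id="def_1", group=1, weight=10),
    ]),
]

plan = with_gblinear_step(base_plan)
```

The resulting `plan` would then be passed as `modeling_plan=plan` to `H2OAutoML`, as in the test.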

wendycwong · 2024-01-26T17:23:19Z

h2o-automl/src/main/java/ai/h2o/automl/modeling/XGBoostSteps.java

+            int ncols = aml().getTrainingFrame().numCols() - (aml().getBuildSpec().getNonPredictors().length +
+                    (aml().getBuildSpec().input_spec.ignored_columns != null ? aml().getBuildSpec().input_spec.ignored_columns.length : 0));
+
+            searchParams.put("_top_k", IntStream.range(0, ncols-1).boxed().toArray(Integer[]::new));


Nice to add this in now. You can uncomment it once Adam has exposed those parameters.
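For reference, the arithmetic in the hunk above can be mirrored in a few lines of Python. This is only a sketch: `top_k_candidates` and its argument names are made up for illustration, and what exactly counts as a non-predictor column in AutoML is an assumption here. It shows that the upper bound of the range is exclusive, so the largest candidate is `ncols - 2`. In XGBoost itself, `top_k = 0` means that all features are used.

```python
def top_k_candidates(total_cols, n_non_predictors, ignored_columns=None):
    """Mirror of the `_top_k` search space computed in the Java hunk.

    total_cols       -- number of columns in the training frame
    n_non_predictors -- number of non-predictor columns (assumed to cover
                        the response and similar special columns)
    ignored_columns  -- optional list of ignored column names
    """
    n_ignored = len(ignored_columns) if ignored_columns is not None else 0
    ncols = total_cols - (n_non_predictors + n_ignored)
    # Same bounds as IntStream.range(0, ncols - 1): the end is exclusive.
    return list(range(0, ncols - 1))


# 10 columns, 1 non-predictor, 2 ignored columns -> ncols = 7
candidates = top_k_candidates(10, 1, ["id", "timestamp"])
```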

sebhrusen

LG, thanks @tomasfryda

sebhrusen · 2024-01-29T16:12:28Z

h2o-py/tests/testdir_algos/automl/pyunit_automl_xgboost_gblinear_large.py


Is this what we want to give to the customer when they want to try GBLinear?
If it's a single customer, I think that's fine, but it freezes the AutoML behavior once and for all…

Add GBLinear

1265403

tomasfryda self-assigned this Jan 24, 2024

Add example with normal automl run + gblinear

6b3b15c

tomasfryda commented Jan 25, 2024

tomasfryda added the please review label Jan 25, 2024

tomasfryda added this to the 3.46.0.1 milestone Jan 25, 2024

tomasfryda requested review from wendycwong and sebhrusen January 25, 2024 11:35

Fix java tests

5211045

wendycwong reviewed Jan 26, 2024

wendycwong approved these changes Jan 26, 2024

sebhrusen approved these changes Jan 29, 2024

tomasfryda merged commit e69e2ce into master Jan 30, 2024
66 of 68 checks passed

tomasfryda deleted the tomf_GH-8381_AutoML_experiment_XGB_with_different_booster_gblinear_now_safe_to_use branch January 30, 2024 15:51
