get_params() not working with XGBoost and gridsearch #16042

Open
wendycwong opened this issue Jan 30, 2024 · 2 comments

@wendycwong
Contributor

Follow-up to support ticket: https://support.h2o.ai/a/tickets/107319

Here is what Gen has run into:

import h2o
from h2o.estimators import H2OXGBoostEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()

prostate = h2o.import_file("http://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv.zip")

# Convert the CAPSULE column to a factor
prostate["CAPSULE"] = prostate["CAPSULE"].asfactor()
response = "CAPSULE"
seed = 1234

import random

# Hyperparameters for the grid (the variable names still say GBM, but the grid trains XGBoost)
gbm_params2 = {'learn_rate': [0.01],
               'max_depth': [2],
               'sample_rate': [0.1],
               'col_sample_rate': [0.1],
               'seed': random.sample(range(1, 1000), 100)  # 100 distinct candidate seed values
               }

# Monotone constraints (this dict is not used below; the estimator is given {"AGE": 1} directly)
monotone_constraints = {"x3": 1, "x5": 1}

# Search criteria
search_criteria = {'strategy': 'RandomDiscrete', 'max_models': 2, 'seed': 1}  # max_models=2 limits the grid to at most 2 models

# Train a random grid of XGBoost models
gbm_grid = H2OGridSearch(model=H2OXGBoostEstimator(monotone_constraints={"AGE": 1}),
                         grid_id='xgboostt_cap',
                         hyper_params=gbm_params2,
                         search_criteria=search_criteria,
                         )
gbm_grid.train(y=response, ignored_columns=["ID"], training_frame=prostate)

# Get the grid results, sorted by AUC
gbm_gridperf2 = gbm_grid.get_grid(sort_by='auc', decreasing=True)
gbm_gridperf2

# Take the top model from the grid and ask it for its parameters
best_gbm2 = gbm_gridperf2.models[0]
best_gbm2.get_params()

[Image attached to the original issue]
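As a side note on the search criteria in the snippet above, the size of the search space can be checked without H2O. This is a minimal sketch that assumes RandomDiscrete draws from the Cartesian product of the listed values and stops once `max_models` models are built; the fixed `random.Random(0)` generator and the names `all_combos` and `n_trained` are illustrative only.

```python
import random
from itertools import product

# Same shape as gbm_params2 above; a seeded generator keeps the sketch reproducible.
hyper_params = {
    "learn_rate": [0.01],
    "max_depth": [2],
    "sample_rate": [0.1],
    "col_sample_rate": [0.1],
    "seed": random.Random(0).sample(range(1, 1000), 100),
}
max_models = 2

# Every combination the grid could try (assumed: plain Cartesian product).
names = list(hyper_params)
all_combos = [dict(zip(names, values)) for values in product(*hyper_params.values())]

# Assumed: the search stops after max_models models.
n_trained = min(max_models, len(all_combos))
print(len(all_combos), n_trained)  # 100 2
```

Under these assumptions there are 100 candidate combinations (one per seed value), of which at most 2 are trained.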

@wendycwong
Contributor Author

wendycwong commented Jan 30, 2024

I cobbled together the following, and I can see that the monotone constraint is set in the model:

import h2o
from h2o.estimators import H2OXGBoostEstimator
from h2o.grid.grid_search import H2OGridSearch
from tests import pyunit_utils  # helper module from the h2o-3 Python test suite

assert H2OXGBoostEstimator.available()

# CPU backend is forced so that the results are comparable
h2oParamsS = {"tree_method": "exact", "seed": 123, "backend": "cpu", "ntrees": 5}

# Generated training frame; every column except the response is a predictor
trainFile = pyunit_utils.genTrainFrame(100, 10, enumCols=0, randseed=17)
print(trainFile)
myX = trainFile.names
y = 'response'
myX.remove(y)

h2oParamsS["monotone_constraints"] = {
    "C1": -1,
    "C3": 1,
    "C7": 1
}

gbm_params2 = {'learn_rate': [0.01, 0.02]}

# Grid search: read the constraints from the best model's native XGBoost parameters
gridM = H2OGridSearch(H2OXGBoostEstimator(**h2oParamsS), hyper_params=gbm_params2)
gridM.train(x=myX, y=y, training_frame=trainFile)
gridS = gridM.get_grid(sort_by="auc", decreasing=True)
best_gbm2 = gridS.models[0]
native_params2 = best_gbm2._model_json["output"]["native_parameters"].as_data_frame()
constraints2 = (native_params2[native_params2['name'] == "monotone_constraints"])['value'].values[0]
params = best_gbm2.get_params(deep=True)

# Standalone model trained with the same parameters, for comparison
h2oModelS = H2OXGBoostEstimator(**h2oParamsS)
h2oModelS.train(x=myX, y=y, training_frame=trainFile)

native_params = h2oModelS._model_json["output"]["native_parameters"].as_data_frame()
print(native_params)

constraints = (native_params[native_params['name'] == "monotone_constraints"])['value'].values[0]

assert constraints == u'(-1,0,1,0,0,0,1,0,0,0)'

constraints2 (from the best grid model) is the same as constraints (from the standalone model).
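To make the native string easier to read, here is a small sketch of how the requested dict appears to map onto it. It assumes the string holds one entry per predictor in predictor order, with 0 for unconstrained columns, and that the ten predictors are named C1 to C10; the helper names `to_native_constraints` and `from_native_constraints` are hypothetical and not part of H2O.

```python
def to_native_constraints(predictors, constraints):
    """Encode {column: direction} as '(d1,d2,...)', using 0 for unconstrained columns."""
    return "(" + ",".join(str(constraints.get(col, 0)) for col in predictors) + ")"


def from_native_constraints(predictors, native):
    """Decode '(d1,d2,...)' back into {column: direction}, dropping the zeros."""
    values = [int(v) for v in native.strip("()").split(",")]
    if len(values) != len(predictors):
        raise ValueError("expected one constraint entry per predictor")
    return {col: v for col, v in zip(predictors, values) if v != 0}


# Assumed predictor names and order: C1 .. C10
predictors = [f"C{i}" for i in range(1, 11)]
requested = {"C1": -1, "C3": 1, "C7": 1}

native = to_native_constraints(predictors, requested)
print(native)  # (-1,0,1,0,0,0,1,0,0,0)
print(from_native_constraints(predictors, native) == requested)  # True
```

Decoding the string this way gives a dict that can be compared directly with whatever `get_params()` reports for `monotone_constraints`.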

@wendycwong
Contributor Author

Something is wrong with get_params()...
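One way to narrow this down might be to diff the parameters passed to the estimator against what `get_params()` returns for the grid model. The sketch below is plain Python and makes no claim about what H2O actually returns; `diff_params` is a hypothetical helper and the `reported` dict is made-up data for illustration.

```python
def diff_params(expected, reported):
    """Return {name: (kind, expected_value, reported_value)} for every discrepancy."""
    problems = {}
    for name, want in expected.items():
        if name not in reported:
            problems[name] = ("missing", want, None)
        elif reported[name] != want:
            problems[name] = ("mismatch", want, reported[name])
    return problems


# Parameters passed to the estimator in the comment above
expected = {
    "ntrees": 5,
    "seed": 123,
    "monotone_constraints": {"C1": -1, "C3": 1, "C7": 1},
}

# Illustrative stand-in; in practice this would be best_gbm2.get_params()
reported = {"ntrees": 5, "seed": 123, "monotone_constraints": None}

for name, (kind, want, got) in diff_params(expected, reported).items():
    print(f"{name}: {kind} (expected {want!r}, got {got!r})")
```

An empty result would mean `get_params()` agrees with the inputs; any entry points at the parameter to look at first.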
