Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-6722 constraint glm #16035

Merged
merged 24 commits into from
Apr 26, 2024
Merged

Conversation

wendycwong
Copy link
Contributor

@wendycwong wendycwong commented Jan 25, 2024

issue: #6722

I added linear constraint support to our GLM toolbox. However, I am working on adding the following:

  • tests in java for ExactLinearSearch
  • tests in Python and R.

Guys:

I know this is not complete but I would like you guys to start checking my implementation earlier than later. Sorry for getting it done so late.

@wendycwong
Copy link
Contributor Author

wendycwong commented Jan 25, 2024

Here is the doc that I have implemented the algo based on:

H2O Constrained GLM Implementation.pdf

@wendycwong wendycwong requested a review from tomasfryda February 1, 2024 23:51

// copy a square array
public static double[][] copy2DArray(double[][] src_array) {
double[][] dest_array = new double[src_array.length][src_array.length];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this use the MemoryManager? Also it seems to me that copy2DArray(double[][], double[][]) supports any 2d array so I'd prefer not limiting the functionality without changing the name. So either I would change the memory allocation like this:

double[][] dest_array = new double[src_array.length][];
for(int i = 0; i < src_array.length; ++i)
    dest_array[i] = MemoryManager.malloc8d(src_array[i].length);

or just rename it to copySquareArray or something like that. It still might be worth considering changing the memory allocation to use the MemoryManager. (I still don't know the code well enough to know if it copies just small matrices or even the big ones but if you think it's ok to have this memory allocation that's good enough for me).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the change but kept the name because I want to copy rectangular arrays too.

@wendycwong wendycwong added the do not merge For PRs that are not supposed to be merged label Feb 5, 2024
@wendycwong wendycwong force-pushed the wendy_gh_6722_constraint_GLM_master branch from 4ecc837 to 3550df2 Compare February 6, 2024 16:47
@wendycwong wendycwong force-pushed the wendy_gh_6722_constraint_GLM_master branch 2 times, most recently from b25636f to da3fa6c Compare February 26, 2024 16:39
@wendycwong wendycwong force-pushed the wendy_gh_6722_constraint_GLM_master branch from a5b56e9 to 728600e Compare February 29, 2024 18:11
error("_max_iterations", H2O.technote(2, "if specified, must be >= 1."));
warn("max_iterations", " must be >= 1 to obtain proper model. Setting it to be 0 will only" +
" return the correct coefficient names and an empty model.");
//error("_max_iterations", H2O.technote(2, "if specified, m_be4ust be >= 1."));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if you'd check that those linear_constrains are valid even in the expensive == false branch. I think it's an unwritten contract that we expect e.g. in h2o-flow. Just a simple check that would validate:

  • the strings are as expected ("lessthanequal" vs "less_than_equal")
  • basic validity/satisfiability of constraints (e.g., x > 10 and x < 5 are not satisfiable)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved some checks to be before expensive=true. However, the more complicated checks need information like _dinfo and there cannot be moved.

@wendycwong wendycwong force-pushed the wendy_gh_6722_constraint_GLM_master branch from 54d0cab to 5167ca7 Compare March 14, 2024 19:28
@wendycwong wendycwong removed the do not merge For PRs that are not supposed to be merged label Mar 22, 2024
wendycwong and others added 18 commits April 17, 2024 13:29
…waiting for complete model building process. User just need to set max_iterations=0 and then call the model.coef_names() in python or h2o.coef_names(model) in R.

GH-6722: extract constraints from betaConstraints and from linear constraints with and without standardization.
GH-6722: complete tests to make sure constraint extraction with or without standardization is correct.
GH-6722: Streamline GLMConstrainedTest, build matrix representing beta and linear constraints.
GH-6722: adding redundant constraint matrix check
GH-6722: add QR to check for rank of constraint matrix and added test
GH-6722: adding error message about redundant constraints so that user will know which one to remove.
GH-6722: added contributinos from constraints for objective function.
GH-6722: adding constraint contribution to gram and gradient
GH-6722: added test to make sure constraints contribution to gram and gradient is correct.
GH-6722: Added constraint parameters update and constraint stopping conditions.
GH-6722: finished taken care of gram zero cols and added tests.
GH-6722: Add exact line search.
GH-6722: Complete constraints/derivative/gram contribution update from beta change.  Add short circuit test.
GH-6722: Allow users to set parameters so that they can control how the constraint parameters change.
GH-6722 moved some linear constraint checks before expensive=true
GH-6722 make objective() the function call to get training objective results with linear constraints.

incorporate Tomas F comments
…nto one and then throw an error for all the errors.
@wendycwong wendycwong force-pushed the wendy_gh_6722_constraint_GLM_master branch from 31d3f45 to 51bcfbd Compare April 17, 2024 21:14
@wendycwong wendycwong requested a review from valenad1 April 26, 2024 16:04
Copy link
Collaborator

@valenad1 valenad1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thank you!

@valenad1 valenad1 dismissed maurever’s stale review April 26, 2024 16:09

Unblocking merge

@wendycwong wendycwong merged commit cef5321 into rel-3.46.0 Apr 26, 2024
68 checks passed
@wendycwong wendycwong deleted the wendy_gh_6722_constraint_GLM_master branch April 26, 2024 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants