
[Track v1.4] New PR branch to serve as submodule for scikit-tree #53

Open: wants to merge 333 commits into main

adam2392 (Collaborator)

Reference Issues/PRs

As of sktree v0.2, we have decided that we no longer need a custom scikit-learn fork that is built and released on PyPI. Instead, we only need to keep an up-to-date fork branch here that maintains our changes under tree/ and ensemble/.

This branch has a significantly smaller diff and less complexity than, e.g., #44.

What does this implement/fix? Explain your changes.

Any other comments?

github-actions bot commented Aug 11, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: dda0df6. Link to the linter CI: here

PSSF23 (Member) left a comment (marked as outdated):

I cannot reproduce the error on my local machine with the following code:

```python
from inspect import signature

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rnd = np.random.RandomState(0)
n_samples = 30
X = rnd.uniform(size=(n_samples, 3))
y = np.arange(n_samples)

clf_1 = RandomForestClassifier()
clf_1.set_params(random_state=0)

# Call each method the check exercises with (X, y) and record its parameter
# names. RandomForestClassifier has no partial_fit, so getattr returns None
# for it; skip it instead of calling None (which would raise TypeError).
for func_name in ["fit", "score", "partial_fit"]:
    func = getattr(clf_1, func_name, None)
    if func is None:
        continue
    func(X, y)
    args = [p.name for p in signature(func).parameters.values()]
    # For a bound method, the parameter after X should be named y.
    assert args[1] in ["y", "Y"], (func_name, args)
```

This code should replicate most of the check_fit_score_takes_y test, but it runs without errors every time. I also don't understand why only these two CI jobs failed, when all of them should be running the same tests:

  • Linux_Runs pylatest_conda_forge_mkl
  • macOS pylatest_conda_mkl_no_openmp
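To rule out differences between a hand-copied replica and the real test, one option is to call scikit-learn's own check directly. A minimal sketch, assuming check_fit_score_takes_y(name, estimator_orig) can be imported from sklearn.utils.estimator_checks in the installed version (it is an internal helper, so its location and signature may change between releases); run_fit_score_check is a hypothetical wrapper name:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.estimator_checks import check_fit_score_takes_y


def run_fit_score_check(estimator):
    """Run scikit-learn's check_fit_score_takes_y on one estimator instance.

    The check raises an exception (typically AssertionError) on failure,
    so reaching the return statement means it passed.
    """
    check_fit_score_takes_y(type(estimator).__name__, estimator)
    return True


if __name__ == "__main__":
    run_fit_score_check(RandomForestClassifier(random_state=0))
    print("check_fit_score_takes_y passed for RandomForestClassifier")
```

The same wrapper can be given any other estimator instance, for example the HonestForestClassifier discussed in this thread, to run the exact check rather than an approximation of it.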

adam2392 (Collaborator, Author) commented:

If you run HonestForestClassifier instead, the error does seem to reproduce on my computer.

PSSF23 (Member) commented Aug 25, 2023

@adam2392 Has sktree been updated to use the current submodule, i.e., the version after the resizing bug was fixed?

adam2392 (Collaborator, Author) commented:

Yes, it should be. The test is commented out right now, though.

PSSF23 (Member) left a comment (marked as outdated):

All CIs passed.

adam2392 force-pushed the submodulev3 branch 2 times, most recently from 62f0c60 to ea330a7 on September 8, 2023 18:36
adam2392 (Collaborator, Author) commented Oct 6, 2023

TODO: Rename the following types:

int -> intp_t
double -> float64_t
SIZE_t -> intp_t
DTYPE_t -> float32_t
INT32_t -> int32_t

See: https://github.com/scikit-learn/scikit-learn/pull/27352/files and related PRs.
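The renames in this TODO are mechanical, so they can be scripted. A minimal sketch, assuming plain word-boundary text substitution over .pyx/.pxd sources is acceptable; rename_types and TYPE_RENAMES are hypothetical names, and the script does not understand Cython syntax or add the cimports the new type names require:

```python
import re

# Old -> new type names, taken from the TODO list.
TYPE_RENAMES = {
    "SIZE_t": "intp_t",
    "DTYPE_t": "float32_t",
    "INT32_t": "int32_t",
    "double": "float64_t",
    "int": "intp_t",
}

# One alternation applied in a single pass: replacements are never re-scanned,
# and the word boundaries keep "int" from matching inside "int32_t" or "print".
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, TYPE_RENAMES)) + r")\b")


def rename_types(source: str) -> str:
    """Rewrite the old type names in a Cython source string."""
    return _PATTERN.sub(lambda m: TYPE_RENAMES[m.group(1)], source)


if __name__ == "__main__":
    print(rename_types("cdef SIZE_t n_samples\ncdef double weight"))
```

Replacing bare int and double this way also touches Python-level int(...) calls, docstrings, comments, and C-style return codes that should stay int, so the output still needs a manual review of the diff.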

adam2392 (Collaborator, Author) commented Oct 9, 2023

This was accomplished in 9a5d91b

adam2392 (Collaborator, Author) commented Mar 10, 2024

5ccd00f introduces a major change: n_constant_features is moved into the SplitRecord itself. This makes it possible to override what gets passed from a parent node to its child nodes.

For example, a MultiviewSplitRecord could in principle extend SplitRecord and store additional n_constant_features values, one per feature set. Of course, I don't think this is actually the best design, because the idea of "multi-view" breaks away from the underlying assumption that there is only one feature set.
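To make the design concrete, here is a Python analogue in which dataclasses stand in for the Cython structs. It is a sketch only: the abridged field list, the n_constant_features_per_view field, and the state_for_children helper are hypothetical illustrations, not the code in 5ccd00f, and how the "override" would map onto Cython structs is not addressed here.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SplitRecord:
    """Python stand-in for the split record (fields abridged, hypothetical)."""
    feature: int
    threshold: float
    improvement: float
    # Carried in the record: how many features were found constant at this
    # node, so the builder can hand it down to the child nodes.
    n_constant_features: int = 0


@dataclass
class MultiviewSplitRecord(SplitRecord):
    """Hypothetical subclass keeping one constant-feature count per feature set."""
    n_constant_features_per_view: List[int] = field(default_factory=list)


def state_for_children(record: SplitRecord) -> Union[int, List[int]]:
    """Return what a child node inherits from its parent's split record."""
    if isinstance(record, MultiviewSplitRecord):
        return list(record.n_constant_features_per_view)
    return record.n_constant_features
```

Because the inherited state lives in the record, a subclass can change that payload without the tree builder needing to know which record type it received.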

adam2392 (Collaborator, Author) commented Sep 6, 2024

We should remove binning for now, as it is an untested feature.

adrinjalali and others added 30 commits October 29, 2024 09:44
…ant and keep_empty_features is False (scikit-learn#29950)

…mes to contribution (scikit-learn#30177)

…t actually support NaN values (scikit-learn#25330)


#### Reference Issues/PRs
neurodata/treeple#339