Problems with random forest classifier when using more and deeper learners #80

Open
GradOpt opened this issue Jan 9, 2023 · 0 comments


GradOpt commented Jan 9, 2023

Hi, I'm new to ThunderGBM.

I just ran the random forest example:
```python
from thundergbm import TGBMClassifier
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

x, y = load_digits(return_X_y=True)
clf = TGBMClassifier(bagging=1, depth=12, n_trees=1, n_parallel_trees=100)
clf.fit(x, y)
y_pred = clf.predict(x)
accuracy = accuracy_score(y, y_pred)
print(accuracy)
```
and several problems came up:

First, watching the verbose output, I found that when `n_trees=1` is set, the classifier only uses 1 learner, no matter what value I give `n_parallel_trees`, contrary to the claim in issue #42.
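
Concretely, this is the kind of comparison I used to check (a sketch; my reading of issue #42 is that `n_parallel_trees` should control the number of bagged trees per round when `n_trees=1`):

```python
from thundergbm import TGBMClassifier
from sklearn.datasets import load_digits

x, y = load_digits(return_X_y=True)

# Same arguments as the example above; per issue #42 this should build
# 100 bagged trees, but the verbose log reports only 1 learner.
rf_style = TGBMClassifier(bagging=1, depth=12, n_trees=1, n_parallel_trees=100)
rf_style.fit(x, y)

# My assumption: moving the count to n_trees does build 100 learners,
# but as sequential boosting rounds rather than a bagged forest.
boost_style = TGBMClassifier(bagging=1, depth=12, n_trees=100, n_parallel_trees=1)
boost_style.fit(x, y)
```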

Furthermore, I tried more and deeper learners. When `depth` is more than 20, or `n_trees` is more than 70, the program often crashes: run as a Python script it dies with a segmentation fault (core dumped), and in a Jupyter notebook the kernel dies. On a large dataset with millions of samples, it crashed even while converting CSR to CSC. Since I'm using a workstation with a 32-core CPU, 128 GB of memory, and an RTX 3090 GPU, I don't believe this is a hardware issue. Is ThunderGBM only capable of training really small forests on small datasets? That's unacceptable. I'm confused and hope to see the power of ThunderGBM.
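
As a sanity check, a CPU forest of roughly the same size (scikit-learn here as a baseline, not ThunderGBM; the settings are my assumption of an equivalent configuration) handles this dataset without any problem:

```python
# CPU baseline with roughly equivalent settings (my assumption), to rule
# out the dataset itself: 100 trees of depth 20 on the same digits data.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

x, y = load_digits(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, max_depth=20, n_jobs=-1)
rf.fit(x, y)
print(accuracy_score(y, rf.predict(x)))
```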
