Problems with random forest classifier when using more and deeper learners #80

Open
GradOpt opened this issue Jan 9, 2023 · 0 comments


GradOpt commented Jan 9, 2023

Hi, I'm new to ThunderGBM.

I just ran the random forest example:
```python
from thundergbm import TGBMClassifier
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

x, y = load_digits(return_X_y=True)
clf = TGBMClassifier(bagging=1, depth=12, n_trees=1, n_parallel_trees=100)
clf.fit(x, y)
y_pred = clf.predict(x)
accuracy = accuracy_score(y, y_pred)
print(accuracy)
```
and several problems came up:

First, watching the verbose output, I found that when `n_trees=1` is set, the classifier only uses 1 learner, no matter what value I give `n_parallel_trees`, contrary to the claim in issue #42.
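
Concretely, this is the kind of comparison I used to check (a sketch; my reading of issue #42 is that `n_parallel_trees` should control the number of bagged trees per round when `n_trees=1`):

```python
from thundergbm import TGBMClassifier
from sklearn.datasets import load_digits

x, y = load_digits(return_X_y=True)

# Same arguments as the example above; per issue #42 this should build
# 100 bagged trees, but the verbose log reports only 1 learner.
rf_style = TGBMClassifier(bagging=1, depth=12, n_trees=1, n_parallel_trees=100)
rf_style.fit(x, y)

# My assumption: moving the count to n_trees does build 100 learners,
# but as sequential boosting rounds rather than a bagged forest.
boost_style = TGBMClassifier(bagging=1, depth=12, n_trees=100, n_parallel_trees=1)
boost_style.fit(x, y)
```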

Furthermore, I tried more and deeper learners. When `depth` is more than 20, or `n_trees` is more than 70, the program often crashes: run as a Python script it dies with a segmentation fault (core dumped), and in a Jupyter notebook the kernel dies. On a large dataset with millions of samples, it crashed even while converting CSR to CSC. Since I'm using a workstation with a 32-core CPU, 128 GB of memory, and an RTX 3090 GPU, I don't believe this is a hardware issue. Is ThunderGBM only capable of training really small forests on small datasets? That's unacceptable. I'm confused and hope to see the power of ThunderGBM.
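
As a sanity check, a CPU forest of roughly the same size (scikit-learn here as a baseline, not ThunderGBM; the settings are my assumption of an equivalent configuration) handles this dataset without any problem:

```python
# CPU baseline with roughly equivalent settings (my assumption), to rule
# out the dataset itself: 100 trees of depth 20 on the same digits data.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

x, y = load_digits(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, max_depth=20, n_jobs=-1)
rf.fit(x, y)
print(accuracy_score(y, rf.predict(x)))
```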
