legate-boost

legate-boost is a gradient boosting machine (GBM) implementation built on Legate. Its primary goal is to provide a state-of-the-art distributed GBM implementation capable of running on CPUs or GPUs at supercomputer scale.

API Documentation

For developers, see contributing.

Installation

Install using conda:

# stable release
conda install -c legate -c conda-forge -c nvidia legate-boost

# nightly release
conda install -c legate/label/experimental -c legate -c conda-forge -c nvidia legate-boost

On systems without a GPU, conda should automatically install the CPU-only package; on systems with a GPU and a compatible CUDA version, it should install the GPU package.

To force conda to prefer one, pass the build string *_cpu* or *_gpu*, for example:

# nightly release (CPU-only)
conda install --dry-run -c legate/label/experimental -c legate -c conda-forge -c nvidia \
    'legate-boost=*=*_cpu*'
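
The GPU counterpart is the same command with the other build string (a sketch mirroring the CPU command above):

# nightly release (GPU)
conda install --dry-run -c legate/label/experimental -c legate -c conda-forge -c nvidia \
    'legate-boost=*=*_gpu*'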

For more details on building from source and setting up a development environment, see contributing.md.

Simple example

Run scripts with the legate launcher:

legate example_script.py

>>> import cupynumeric as cn
>>> import legateboost as lb

>>> X = cn.random.random((1000, 10))
>>> y = cn.random.random(X.shape[0])
>>> model = lb.LBRegressor().fit(X, y)
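
legate-boost estimators follow the scikit-learn interface, so prediction is a minimal sketch along these lines (the exact output shown is an assumption):

>>> predictions = model.predict(X)
>>> predictions.shape
(1000,)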

Features

Model ensembling

legate-boost can create models from linear combinations of other models. Ensembling is as easy as:

>>> import cupynumeric as cn
>>> import legateboost as lb

>>> X = cn.random.random((1000, 10))
>>> X_train_a = X[:500]
>>> X_train_b = X[500:]
>>> y = cn.random.random(X.shape[0])
>>> y_train_a = y[:500]
>>> y_train_b = y[500:]

>>> model_a = lb.LBRegressor().fit(X_train_a, y_train_a)
>>> len(model_a)
100
>>> model_b = lb.LBRegressor().fit(X_train_b, y_train_b)
>>> len(model_b)
100
>>> model_c = (model_a + model_b) * 0.5
>>> len(model_c)
200
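
Since the combined model is a linear combination, its predictions should match the same combination of the individual models' predictions (a sketch; cn.allclose assumes cupynumeric's numpy-compatible API):

>>> cn.allclose(model_c.predict(X), (model_a.predict(X) + model_b.predict(X)) * 0.5)
True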

Probabilistic regression

legate-boost can learn distributions for continuous data. This is useful in cases where simply predicting the mean does not carry enough information about the training data:

[figure: learned predictive distributions vs. a simple mean prediction]

The above example can be found here: examples/probabilistic_regression.
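
A minimal sketch of what this might look like, assuming a normal-distribution objective selected via the objective parameter (the objective name here is an assumption based on the linked example, not a verified default):

>>> import cupynumeric as cn
>>> import legateboost as lb

>>> X = cn.random.random((1000, 10))
>>> y = cn.random.random(X.shape[0])
>>> # "normal" objective is an assumption: fit distribution parameters, not a point estimate
>>> model = lb.LBRegressor(objective="normal").fit(X, y)
>>> pred = model.predict(X)  # per-row distribution parameters under the assumed objective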

Batch training

legate-boost can train on datasets that do not fit into memory by splitting the dataset into batches and training the model with partial_fit.

>>> import cupynumeric as cn
>>> import legateboost as lb
>>> from sklearn.utils import gen_even_slices
>>> X = cn.random.random((1000, 10))
>>> y = cn.random.random(X.shape[0])

>>> total_estimators = 100
>>> estimators_per_batch = 10
>>> n_batches = total_estimators // estimators_per_batch

>>> train_batches = [(X[s], y[s]) for s in gen_even_slices(X.shape[0], n_batches)]
>>> model = lb.LBRegressor(n_estimators=estimators_per_batch)
>>> for i in range(n_batches):
...     X_batch, y_batch = train_batches[i]
...     model = model.partial_fit(X_batch, y_batch)

[figure: batch training example]

The above example can be found here: examples/batch_training.
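
Each partial_fit call appends another estimators_per_batch base learners, so after the loop the model should hold all total_estimators of them (a sketch, assuming the len semantics shown in the ensembling example):

>>> len(model)
100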

Different model types

legate-boost supports tree models, linear models, kernel ridge regression models, custom user models, and any combination of these.

The following example shows a model combining linear and decision tree base learners on a synthetic dataset.

model = lb.LBRegressor(
    base_models=(lb.models.Linear(), lb.models.Tree(max_depth=1)),
    **params,
).fit(X, y)

[figure: linear + decision tree ensemble fitted to a synthetic dataset]

The second example shows a model combining kernel ridge regression and decision tree base learners on the wine quality dataset.

model = lb.LBRegressor(
    base_models=(lb.models.KRR(sigma=0.5), lb.models.Tree(max_depth=5)),
    **params,
).fit(X, y)

[figure: kernel ridge regression + decision tree ensemble on the wine quality dataset]