legate-boost

legate-boost is a gradient boosting machine (GBM) implementation built on Legate. Its primary goal is to provide a state-of-the-art distributed GBM implementation capable of running on CPUs or GPUs at supercomputer scale.

API Documentation

For developers, see contributing.

Installation

Install using conda:

# stable release
conda install -c legate -c conda-forge -c nvidia legate-boost

# nightly release
conda install -c legate/label/experimental -c legate -c conda-forge -c nvidia legate-boost

On systems without a GPU, conda should automatically install the CPU-only package; on systems with a GPU and a compatible CUDA version, it should install the GPU package.

To force conda to prefer one, pass the build string *_cpu* or *_gpu*, for example:

# nightly release (CPU-only)
conda install --dry-run -c legate/label/experimental -c legate -c conda-forge -c nvidia \
    'legate-boost=*=*_cpu*'
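
The GPU counterpart is the same command with the other build string (a sketch mirroring the CPU command above):

# nightly release (GPU)
conda install --dry-run -c legate/label/experimental -c legate -c conda-forge -c nvidia \
    'legate-boost=*=*_gpu*'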

For more details on building from source and setting up a development environment, see contributing.md.

Simple example

Run scripts with the legate launcher:

legate example_script.py

>>> import cupynumeric as cn
>>> import legateboost as lb

>>> X = cn.random.random((1000, 10))
>>> y = cn.random.random(X.shape[0])
>>> model = lb.LBRegressor().fit(X, y)
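
legate-boost estimators follow the scikit-learn interface, so prediction is a minimal sketch along these lines (the exact output shown is an assumption):

>>> predictions = model.predict(X)
>>> predictions.shape
(1000,)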

Features

Model ensembling

legate-boost can create models from linear combinations of other models. Ensembling is as easy as:

>>> import cupynumeric as cn
>>> import legateboost as lb

>>> X = cn.random.random((1000, 10))
>>> X_train_a = X[:500]
>>> X_train_b = X[500:]
>>> y = cn.random.random(X.shape[0])
>>> y_train_a = y[:500]
>>> y_train_b = y[500:]

>>> model_a = lb.LBRegressor().fit(X_train_a, y_train_a)
>>> len(model_a)
100
>>> model_b = lb.LBRegressor().fit(X_train_b, y_train_b)
>>> len(model_b)
100
>>> model_c = (model_a + model_b) * 0.5
>>> len(model_c)
200
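
Since the combined model is a linear combination, its predictions should match the same combination of the individual models' predictions (a sketch; cn.allclose assumes cupynumeric's numpy-compatible API):

>>> cn.allclose(model_c.predict(X), (model_a.predict(X) + model_b.predict(X)) * 0.5)
True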

Probabilistic regression

legate-boost can learn distributions for continuous data. This is useful in cases where simply predicting the mean does not carry enough information about the training data:

[figure: learned predictive distributions vs. a simple mean prediction]

The above example can be found here: examples/probabilistic_regression.
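
A minimal sketch of what this might look like, assuming a normal-distribution objective selected via the objective parameter (the objective name here is an assumption based on the linked example, not a verified default):

>>> import cupynumeric as cn
>>> import legateboost as lb

>>> X = cn.random.random((1000, 10))
>>> y = cn.random.random(X.shape[0])
>>> # "normal" objective is an assumption: fit distribution parameters, not a point estimate
>>> model = lb.LBRegressor(objective="normal").fit(X, y)
>>> pred = model.predict(X)  # per-row distribution parameters under the assumed objective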

Batch training

legate-boost can train on datasets that do not fit into memory by splitting the dataset into batches and training the model with partial_fit.

>>> import cupynumeric as cn
>>> import legateboost as lb
>>> from sklearn.utils import gen_even_slices
>>> X = cn.random.random((1000, 10))
>>> y = cn.random.random(X.shape[0])

>>> total_estimators = 100
>>> estimators_per_batch = 10
>>> n_batches = total_estimators // estimators_per_batch

>>> train_batches = [(X[s], y[s]) for s in gen_even_slices(X.shape[0], n_batches)]
>>> model = lb.LBRegressor(n_estimators=estimators_per_batch)
>>> for i in range(n_batches):
...     X_batch, y_batch = train_batches[i]
...     model = model.partial_fit(X_batch, y_batch)

[figure: batch training example]

The above example can be found here: examples/batch_training.
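
Each partial_fit call appends another estimators_per_batch base learners, so after the loop the model should hold all total_estimators of them (a sketch, assuming the len semantics shown in the ensembling example):

>>> len(model)
100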

Different model types

legate-boost supports tree models, linear models, kernel ridge regression models, custom user models, and any combination of these.

The following example shows a model combining linear and decision tree base learners on a synthetic dataset.

model = lb.LBRegressor(
    base_models=(lb.models.Linear(), lb.models.Tree(max_depth=1)),
    **params,
).fit(X, y)

[figure: linear + decision tree ensemble fitted to a synthetic dataset]

The second example shows a model combining kernel ridge regression and decision tree base learners on the wine quality dataset.

model = lb.LBRegressor(
    base_models=(lb.models.KRR(sigma=0.5), lb.models.Tree(max_depth=5)),
    **params,
).fit(X, y)

[figure: kernel ridge regression + decision tree ensemble on the wine quality dataset]