feature(rf optimizations): enabling oneDPL and sort primitive refactoring #3046

Merged
Changes from 73 commits
77 commits
65d9322
init adding dpl
Alexandr-Solovev Jan 16, 2025
f8028b7
fixes for dpl
Alexandr-Solovev Jan 20, 2025
0b553e8
minor fix
Alexandr-Solovev Jan 21, 2025
2a91928
minor fix
Alexandr-Solovev Jan 21, 2025
ab367c0
minor fix for dpl from toolkit
Alexandr-Solovev Jan 22, 2025
e053cdf
minor fix for script
Alexandr-Solovev Jan 22, 2025
3f1a6fe
minor fixes
Alexandr-Solovev Jan 22, 2025
700cd10
minor fix
Alexandr-Solovev Jan 22, 2025
809760f
minor fix
Alexandr-Solovev Jan 22, 2025
d01ea31
minor fix for dpl
Alexandr-Solovev Jan 22, 2025
064bb12
fix correct link
Alexandr-Solovev Jan 22, 2025
6e3587d
minor fixes
Alexandr-Solovev Jan 22, 2025
0d9edd6
Merge branch 'uxlfoundation:main' into dev/asolovev_radix_sort_opt
Alexandr-Solovev Jan 23, 2025
a60eb07
minor fix
Alexandr-Solovev Jan 23, 2025
16c8f6c
minor fix
Alexandr-Solovev Jan 23, 2025
91410dc
Merge branch 'uxlfoundation:main' into dev/asolovev_radix_sort_opt
Alexandr-Solovev Feb 5, 2025
8fd11bb
fixes
Alexandr-Solovev Feb 5, 2025
bf9d31f
fixes for memory
Alexandr-Solovev Feb 7, 2025
4575642
reduce memory usage
Alexandr-Solovev Feb 7, 2025
16b1ca5
optimizations
Alexandr-Solovev Feb 10, 2025
09995cf
fixes for tree_order
Alexandr-Solovev Feb 10, 2025
0ecfd35
initial internal dispatcher
Alexandr-Solovev Feb 10, 2025
94b260b
fixes
Alexandr-Solovev Feb 11, 2025
5bdd54f
minor fixes
Alexandr-Solovev Feb 14, 2025
e7f4066
Merge branch 'main' into dev/asolovev_radix_sort_opt
Alexandr-Solovev Feb 20, 2025
aaa4546
fixes
Alexandr-Solovev Feb 20, 2025
a7cfed8
add fraction checker
Alexandr-Solovev Feb 21, 2025
b450d82
minor fixes
Alexandr-Solovev Feb 21, 2025
24cf250
fixes
Alexandr-Solovev Feb 21, 2025
2f397cf
fixes
Alexandr-Solovev Feb 21, 2025
caf932e
fixes for dpl install
Alexandr-Solovev Feb 25, 2025
cb66c3f
minor fixes
Alexandr-Solovev Feb 27, 2025
6f64de2
fixes for memory usage
Alexandr-Solovev Feb 28, 2025
530a00d
minor fix for docs and dpl for non data center devices
Alexandr-Solovev Mar 3, 2025
9238928
fixes
Alexandr-Solovev Mar 3, 2025
85bd20f
minor fixes
Alexandr-Solovev Mar 3, 2025
8e14b6b
fix for docs
Alexandr-Solovev Mar 3, 2025
5a14e25
fixes
Alexandr-Solovev Mar 3, 2025
8a593ca
minor fixes
Alexandr-Solovev Mar 3, 2025
fe2b79e
minor fixes
Alexandr-Solovev Mar 3, 2025
5108a3e
minor fix for naming
Alexandr-Solovev Mar 3, 2025
319ff4b
Merge branch 'uxlfoundation:main' into dev/asolovev_radix_sort_opt
Alexandr-Solovev Mar 5, 2025
e7fa4e8
fixes
Alexandr-Solovev Mar 5, 2025
fc9eba4
bazel test fix
Alexandr-Solovev Mar 6, 2025
9392c7c
fixes for rng test
Alexandr-Solovev Mar 6, 2025
11d043c
minor fix
Alexandr-Solovev Mar 6, 2025
cc967f9
minor fixes
Alexandr-Solovev Mar 6, 2025
636beb1
delete worksapce
Alexandr-Solovev Mar 10, 2025
c980790
Merge branch 'uxlfoundation:main' into dev/asolovev_radix_sort_opt
Alexandr-Solovev Mar 10, 2025
1397da6
minor fix
Alexandr-Solovev Mar 10, 2025
ec41603
Merge branch 'main' into dev/asolovev_radix_sort_opt
Alexandr-Solovev Mar 14, 2025
f80763f
fix for rng
Alexandr-Solovev Mar 14, 2025
3ac1595
upd tests
Alexandr-Solovev Mar 14, 2025
d8a2252
Merge branch 'main' into dev/asolovev_radix_sort_opt
Alexandr-Solovev Mar 19, 2025
b250fd4
minor fixes
Alexandr-Solovev Mar 19, 2025
517f62a
minor fixes
Alexandr-Solovev Mar 19, 2025
6d969bc
minor fixes
Alexandr-Solovev Mar 19, 2025
5279e89
minor restore
Alexandr-Solovev Mar 20, 2025
996833d
Merge branch 'main' into dev/asolovev_radix_sort_opt
Alexandr-Solovev Mar 20, 2025
3b1842d
minor fixes
Alexandr-Solovev Mar 20, 2025
cf7746d
minor fix
Alexandr-Solovev Mar 20, 2025
f13eede
minor fixes
Alexandr-Solovev Mar 20, 2025
dae0fcf
fixes
Alexandr-Solovev Mar 20, 2025
11c2b65
add namespaces
Alexandr-Solovev Mar 20, 2025
afbaf18
fixes
Alexandr-Solovev Mar 20, 2025
5bdc11f
fix
Alexandr-Solovev Mar 20, 2025
b027afc
minor fix for docs
Alexandr-Solovev Mar 20, 2025
45c9f34
minor fix
Alexandr-Solovev Mar 20, 2025
333549c
fix docs
Alexandr-Solovev Mar 21, 2025
0ceaa95
minor fix
Alexandr-Solovev Mar 21, 2025
f7c5649
minor fix
Alexandr-Solovev Mar 21, 2025
8a71af9
fix for docs
Alexandr-Solovev Mar 21, 2025
87fe44f
Merge branch 'uxlfoundation:main' into dev/asolovev_radix_sort_opt
Alexandr-Solovev Mar 21, 2025
34d960b
minor fix
Alexandr-Solovev Mar 21, 2025
baf774b
minor fix
Alexandr-Solovev Mar 21, 2025
ec0e362
remove version
Alexandr-Solovev Mar 21, 2025
e4df736
fixes
Alexandr-Solovev Mar 21, 2025
8 changes: 8 additions & 0 deletions .ci/env/apt.sh
@@ -38,9 +38,14 @@ function install_tbb {
sudo apt-get install -y intel-oneapi-tbb-devel-2022.0
}

function install_dpl {
sudo apt-get install -y intel-oneapi-libdpstd-devel
}

function install_mkl {
sudo apt-get install -y intel-oneapi-mkl-devel-2025.0
install_tbb
install_dpl
Contributor

Is dpl a dependency of MKL? I thought tbb was integrated here to install_mkl for that reason

Contributor Author

I guess mkl and tbb have no deps on each other, but my understanding is that this is a step to install all the necessary deps for oneDAL.

}

function install_clang-format {
@@ -129,6 +134,9 @@ elif [ "${component}" == "tbb" ]; then
elif [ "${component}" == "mkl" ]; then
add_repo
install_mkl
elif [ "${component}" == "dpl" ]; then
add_repo
install_dpl
Contributor

Add "dpl" to the help list at the end of this file.

elif [ "${component}" == "gnu-cross-compilers" ]; then
update
install_gnu-cross-compilers "$2"
2 changes: 1 addition & 1 deletion .ci/pipeline/ci.yml
@@ -29,7 +29,7 @@ variables:
VM_IMAGE : 'ubuntu-24.04'
SYSROOT_OS: 'noble'
WINDOWS_BASEKIT_URL: 'https://registrationcenter-download.intel.com/akdlm/IRC_NAS/b380d914-366b-4b77-a74a-05e3c38b3514/intel-oneapi-base-toolkit-2025.0.0.882_offline.exe'
WINDOWS_DPCPP_COMPONENTS: 'intel.oneapi.win.mkl.devel:intel.oneapi.win.tbb.devel'
WINDOWS_DPCPP_COMPONENTS: 'intel.oneapi.win.mkl.devel:intel.oneapi.win.tbb.devel:intel.oneapi.win.dpl'

resources:
repositories:
20 changes: 18 additions & 2 deletions INSTALL.md
@@ -23,6 +23,7 @@ Required Software:
* BLAS and LAPACK libraries - both provided by oneMKL
* Python version 3.9 or higher
* oneTBB library (repository contains script to download it)
* oneDPL library
* Microsoft Visual Studio\* (Windows\* only)
* [MSYS2](http://msys2.github.io) (Windows\* only)
* `make` and `dos2unix` tools; install these packages using MSYS2 on Windows\* as follows:
@@ -105,9 +106,24 @@ is available as an alternative to the manual setup.

./dev/download_tbb.sh

6. Download and install Python (version 3.9 or higher).
6. Set up oneDPL
_Note: if you used the general oneAPI setvars script from a Base Toolkit installation, this step will not be necessary as oneDPL will already have been set up._

7. Build oneDAL via command-line interface. Choose the appropriate commands based on the interface, platform, and the compiler you use. Interface and platform are required arguments of makefile while others are optional. Below you can find the set of examples for building oneDAL. You may use a combination of them to get the desired build configuration:
Download and install [Intel(R) oneDPL](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-library.html).
Set the environment variables for Intel(R) oneDPL. For example:

- oneDPL (Windows\*):

call "C:\Program Files (x86)\Intel\oneAPI\dpl\latest\env\vars.bat" intel64

- oneDPL (Linux\*):

source /opt/intel/oneapi/dpl/latest/env/vars.sh intel64


7. Download and install Python (version 3.9 or higher).

8. Build oneDAL via command-line interface. Choose the appropriate commands based on the interface, platform, and the compiler you use. Interface and platform are required arguments of makefile while others are optional. Below you can find the set of examples for building oneDAL. You may use a combination of them to get the desired build configuration:

- DAAL interfaces on **Linux\*** using **Intel(R) C++ Compiler**:

15 changes: 15 additions & 0 deletions MODULE.bazel
@@ -90,6 +90,21 @@ ccl_repo(
]
)

dpl_repo = use_repo_rule("@onedal//dev/bazel/deps:dpl.bzl", "dpl_repo")
dpl_repo(
name = "dpl",
root_env_var = "DPL_ROOT",
urls = [
"https://files.pythonhosted.org/packages/95/f6/18f78cb933e01ecd9e99d37a10da4971a795fcfdd1d24640799b4050fdbb/onedpl_devel-2022.7.1-py2.py3-none-manylinux_2_28_x86_64.whl",
Contributor

Dumb question, but how do we find these values/maintain them? It looks painful.

Contributor Author

We do the same thing for all other packages, like tbb and mkl: find the package on PyPI and copy the links.

],
sha256s = [
"3b270999d2464c5151aa0e7995dda9e896d072c75069ccee1efae9dc56bdc417",
],
strip_prefixes = [
"onedpl_devel-2022.7.1.data/data",
],
)

mkl_repo = use_repo_rule("@onedal//dev/bazel/deps:mkl.bzl", "mkl_repo")
mkl_repo(
name = "mkl",
1 change: 1 addition & 0 deletions cpp/oneapi/dal.hpp
@@ -24,6 +24,7 @@
#include "oneapi/dal/exceptions.hpp"
#include "oneapi/dal/infer.hpp"
#include "oneapi/dal/read.hpp"
#include "oneapi/dal/rng.hpp"
#include "oneapi/dal/train.hpp"
#include "oneapi/dal/partial_compute.hpp"
#include "oneapi/dal/finalize_compute.hpp"
1 change: 1 addition & 0 deletions cpp/oneapi/dal/BUILD
@@ -31,6 +31,7 @@ dal_module(
],
dpc_deps = [
"@mkl//:mkl_dpc",
"@dpl//:headers",
],
)

@@ -20,6 +20,7 @@
#include "oneapi/dal/table/row_accessor.hpp"
#include "oneapi/dal/backend/memory.hpp"
#include "oneapi/dal/detail/profiler.hpp"
#include <string>

#ifdef ONEDAL_DATA_PARALLEL

@@ -34,8 +35,15 @@ inline sycl::event sort_inplace(sycl::queue& queue_,
pr::ndarray<Float, 1>& src,
const bk::event_vector& deps = {}) {
ONEDAL_ASSERT(src.get_count() > 0);
auto device = queue_.get_device();
std::string device_name = device.get_info<sycl::info::device::name>();
auto src_ind = pr::ndarray<Index, 1>::empty(queue_, { src.get_count() });
return pr::radix_sort_indices_inplace<Float, Index>{ queue_ }(src, src_ind, deps);
if (device_name.find("Data Center GPU Max") != std::string::npos) {
Contributor

@icfaust Mar 21, 2025

This feels dangerous somehow. Definitely add some comments. Ideally device checking should exist as a primitive rather than in an algo because this is a bit of a nasty surprise to anyone not well-versed in this algo when trying to debug on various hardware.

Contributor Author

@Vika-F plans to add this feature in the future.

return pr::radix_sort_indices_inplace_dpl<Float, Index>(queue_, src, src_ind, deps);
}
else {
return pr::radix_sort_indices_inplace<Float, Index>{ queue_ }(src, src_ind, deps);
}
}

template <typename Float, typename Bin, typename Index>
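The review comment in this hunk suggests that device checking should live in a shared primitive rather than inside the algorithm. A minimal sketch of that idea, in plain C++ so it can run without a SYCL device, is shown here. The names `is_data_center_gpu_max`, `sort_backend`, and `select_sort_backend` are hypothetical and are not part of oneDAL; only the substring `"Data Center GPU Max"` comes from the diff above, and the device-name strings in the usage note are illustrative.

```cpp
#include <string_view>

// Hypothetical helper (not a oneDAL API): keeps the device-name test in one
// place so that algorithms do not each repeat the substring comparison.
inline bool is_data_center_gpu_max(std::string_view device_name) {
    // The substring is the one used in the diff above.
    return device_name.find("Data Center GPU Max") != std::string_view::npos;
}

// Hypothetical tag describing which sort implementation a caller should use.
enum class sort_backend { onedpl, native };

// Maps a device name to a backend: oneDPL-based sort on Data Center GPU Max,
// the existing in-house radix sort everywhere else.
inline sort_backend select_sort_backend(std::string_view device_name) {
    return is_data_center_gpu_max(device_name) ? sort_backend::onedpl
                                               : sort_backend::native;
}
```

In the real code the name would come from `device.get_info<sycl::info::device::name>()`, as in the hunk above; for example, `select_sort_backend("Intel(R) Data Center GPU Max 1550")` selects the oneDPL path, while a name without that substring selects the native path. Keeping the check in one function also gives a single place to document why the dispatch exists.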
@@ -429,15 +437,36 @@ sycl::event indexed_features<Float, Bin, Index>::operator()(const table& tbl,
pr::ndarray<Bin, 1>::empty(queue_, { row_count_ }, sycl::usm::alloc::device);
}

pr::radix_sort_indices_inplace<Float, Index> sort{ queue_ };

sycl::event last_event;

for (Index i = 0; i < column_count_; i++) {
last_event = extract_column(data_nd_, values_nd, indices_nd, i, { last_event });
last_event = sort(values_nd, indices_nd, { last_event });
last_event =
compute_bins(values_nd, indices_nd, column_bin_vec_[i], entries_[i], i, { last_event });
auto device = queue_.get_device();
std::string device_name = device.get_info<sycl::info::device::name>();
if (device_name.find("Data Center GPU Max") != std::string::npos) {
for (Index i = 0; i < column_count_; i++) {
last_event = extract_column(data_nd_, values_nd, indices_nd, i, { last_event });
last_event = pr::radix_sort_indices_inplace_dpl<Float, Index>(queue_,
values_nd,
indices_nd,
{ last_event });
last_event = compute_bins(values_nd,
indices_nd,
column_bin_vec_[i],
entries_[i],
i,
{ last_event });
}
}
else {
pr::radix_sort_indices_inplace<Float, Index> sort{ queue_ };
for (Index i = 0; i < column_count_; i++) {
last_event = extract_column(data_nd_, values_nd, indices_nd, i, { last_event });
last_event = sort(values_nd, indices_nd, { last_event });
last_event = compute_bins(values_nd,
indices_nd,
column_bin_vec_[i],
entries_[i],
i,
{ last_event });
}
}

last_event.wait_and_throw();
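Both branches of the loop above run the same per-column steps: extract a column, sort its values while carrying the row indices along, then compute bins. As the call sites suggest, the sort step reorders `values_nd` and applies the same permutation to `indices_nd`. A host-side sketch of that contract, using only the standard library, might look like the following. The function name `sort_values_with_indices` is hypothetical, and this is a comparison sort for illustration, not the radix sort or the oneDPL call used in the PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical host-side analog of "sort values, carry indices along".
// Sorts `values` ascending in place and applies the same permutation to `indices`.
template <typename Float, typename Index>
void sort_values_with_indices(std::vector<Float>& values, std::vector<Index>& indices) {
    const std::size_t n = values.size();
    if (indices.size() != n) {
        throw std::invalid_argument("values and indices must have the same length");
    }

    // Build the sorting permutation without moving the data yet.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{ 0 });
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return values[a] < values[b];
    });

    // Apply the permutation to both arrays.
    std::vector<Float> sorted_values(n);
    std::vector<Index> sorted_indices(n);
    for (std::size_t i = 0; i < n; ++i) {
        sorted_values[i] = values[order[i]];
        sorted_indices[i] = indices[order[i]];
    }
    values = std::move(sorted_values);
    indices = std::move(sorted_indices);
}
```

After the call, `indices[i]` holds the original row of the i-th smallest value, which is the information a binning step needs to map sorted values back to rows.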
@@ -21,7 +21,7 @@
#include "oneapi/dal/backend/primitives/utils.hpp"
#include "oneapi/dal/algo/decision_forest/train_types.hpp"

#include "oneapi/dal/backend/primitives/rng/host_engine_collection.hpp"
#include "oneapi/dal/backend/primitives/rng/device_engine.hpp"

#include "oneapi/dal/algo/decision_forest/backend/gpu/train_misc_structs.hpp"
#include "oneapi/dal/algo/decision_forest/backend/gpu/train_impurity_data.hpp"
@@ -50,8 +50,7 @@ class train_kernel_hist_impl {
using model_manager_t = train_model_manager<Float, Index, Task>;
using train_context_t = train_context<Float, Index, Task>;
using imp_data_t = impurity_data<Float, Index, Task>;
using rng_engine_t = pr::host_engine;
using rng_engine_list_t = std::vector<rng_engine_t>;
using rng_engine_t = pr::device_engine;
using msg = dal::detail::error_messages;
using comm_t = bk::communicator<spmd::device_memory_access::usm>;
using node_t = node<Index>;
@@ -79,7 +78,7 @@ class train_kernel_hist_impl {
Index class_count) const;

sycl::event gen_initial_tree_order(train_context_t& ctx,
rng_engine_list_t& rng_engine_list,
rng_engine_t& rng_engine,
pr::ndarray<Index, 1>& node_list,
pr::ndarray<Index, 1>& tree_order_level,
Index engine_offset,
@@ -103,6 +102,7 @@ class train_kernel_hist_impl {
const table& data,
const table& labels,
const table& weights);

/// Allocates all buffers that are used for training.
/// @param[in] ctx a training context structure for a GPU backend
void allocate_buffers(const train_context_t& ctx);
@@ -115,12 +115,12 @@ class train_kernel_hist_impl {
/// @param[in] ctx a training context structure for a GPU backend
/// @param[in] node_count number of nodes on the current level
/// @param[in] node_vs_tree_map an initial tree order
/// @param[in] rng_engine_list a list of random generator engines
/// @param[in] rng_engine a random generator engine
std::tuple<pr::ndarray<Index, 1>, sycl::event> gen_feature_list(
const train_context_t& ctx,
Index node_count,
const pr::ndarray<Index, 1>& node_vs_tree_map,
rng_engine_list_t& rng_engine_list);
rng_engine_t& rng_engine);

/// Generates random thresholds for each node and for each selected feature for node.
/// Thresholds are used for a random splitter kernel to split each node.
@@ -129,12 +129,12 @@ class train_kernel_hist_impl {
/// @param[in] ctx a training context structure for a GPU backend
/// @param[in] node_count number of nodes on the current level
/// @param[in] node_vs_tree_map an initial tree order
/// @param[in] rng_engine_list a list of random generator engines
/// @param[in] rng_engine a random generator engine
std::tuple<pr::ndarray<Float, 1>, sycl::event> gen_random_thresholds(
const train_context_t& ctx,
Index node_count,
const pr::ndarray<Index, 1>& node_vs_tree_map,
rng_engine_list_t& rng_engine_list);
rng_engine_t& rng_engine);

/// Computes initial impurity for each node.
///
@@ -561,7 +561,7 @@ class train_kernel_hist_impl {
/// @param[in] oob_per_obs_list an array of OOB values per observation
/// @param[in] var_imp variable importance values
/// @param[in] var_imp_variance variable importance variance values
/// @param[in] rng_engine_arr a list of random generator engines
/// @param[in] rng_engine a random generator engine
/// @param[in] tree_idx a tree index
/// @param[in] tree_in_block number of trees in the computational block
/// @param[in] built_tree_count number of built trees
@@ -575,7 +575,7 @@ class train_kernel_hist_impl {
pr::ndarray<hist_type_t, 1>& oob_per_obs_list,
pr::ndarray<Float, 1>& var_imp,
pr::ndarray<Float, 1>& var_imp_variance,
const rng_engine_list_t& rng_engine_arr,
rng_engine_t& rng_engine,
Index tree_idx,
Index tree_in_block,
Index built_tree_count,