A systems-oriented C++11 benchmark support library bringing the following tools under one roof:
- CUDA support (including nvToolsExt)
- NUMA support
- OpenMP support
- CPU Cache Control (amd64 and ppc64le)
- CPU turbo control (linux)
- CPU governor control (amd64/linux and ppc64le/linux)
- Google's Benchmark library
- Lyra CLI parsing library
- spdlog logging library
This work was started at the University of Illinois with Professor Wen-Mei Hwu's IMPACT research group in collaboration with IBM's T. J. Watson Research as the SCOPE project. This project reworks the SCOPE framework as a library.
The Comm|Scope multi-GPU communication benchmarking tool uses this library.
Get CMake 3.17+ (needed for FindCUDAToolkit)
Add to your CMakeLists.txt:
add_subdirectory(thirdparty/scope)
target_link_libraries(<target> scope::scope)
Include "scope/scope.hpp"
#include "scope/scope.hpp"
int main(int argc, char **argv) {
// initialize scope framework things
scope::init(&argc, argv);
// run all registered benchmarks
scope::run();
// clean up scope
scope::finalize();
}
Define benchmarks with google/benchmark, which Scope bundles. Scope applications accept all google/benchmark command line flags.
All Scope applications support the following command line options:
- --cuda <device ID>: add GPU visibility (default: all). May be repeated to add more GPUs.
- --numa <node ID>: add NUMA visibility (default: all). May be repeated to add more NUMA nodes.
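To make the "may be repeated" behavior concrete, here is a minimal sketch of how repeated flags can accumulate into visibility lists. It is not Scope's parser (Scope uses Lyra for that); `Visibility` and `parse_visibility` are hypothetical names, and treating an empty list as "all" is an assumption that mirrors the documented default.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the parsed options (not a Scope type).
struct Visibility {
  std::vector<int> cuda; // empty is taken to mean "all GPUs"
  std::vector<int> numa; // empty is taken to mean "all NUMA nodes"
};

// Collect every occurrence of "--cuda <id>" and "--numa <id>".
// Other arguments are skipped, as is a trailing flag with no value.
// std::stoi throws if the value is not a number.
inline Visibility parse_visibility(const std::vector<std::string> &args) {
  Visibility v;
  for (std::size_t i = 0; i + 1 < args.size(); ++i) {
    if (args[i] == "--cuda") {
      v.cuda.push_back(std::stoi(args[++i]));
    } else if (args[i] == "--numa") {
      v.numa.push_back(std::stoi(args[++i]));
    }
  }
  return v;
}
```

With this sketch, `--cuda 0 --cuda 2 --numa 1` yields GPUs 0 and 2 and NUMA node 1, while an empty command line leaves both lists empty.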
scope::init() will record the CPU's current turbo state and, if it is executed with sufficient permissions (sudo), attempt to disable turbo.
When scope exits from SIGINT or scope::finalize() is called, the original state will be restored.
Otherwise, use enable-turbo to enable CPU turbo again.
You may also programmatically control the CPU turbo state with the following library functions:
namespace turbo {
/* true if we are able to control the turbo state
*/
bool can_modify();
/* enable turbo
*/
Result enable();
/* disable turbo
*/
Result disable();
/* record current turbo state in `state`.
*/
Result get_state(State *state);
/* set turbo to `state`
*/
Result set_state(const State &state);
/* record the current turbo state into the global state
*/
Result get_state();
/* set turbo state from the global state
*/
Result set_state();
}
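A caller might combine these functions to save the state, disable turbo, run some work, and put the saved state back. The sketch below shows that sequence under loud assumptions: `turbo_stub` is a hypothetical stand-in so the example compiles without the library, and its `Result` enumerators, `State` layout, and function bodies are invented for illustration. Only the function signatures follow the declarations above.

```cpp
#include <cassert>

// Hypothetical stand-in for the documented turbo API. The enumerators,
// the State layout, and all bodies are assumptions, not Scope's code.
namespace turbo_stub {
enum class Result { SUCCESS, ERROR };
struct State { bool enabled; };

static State g_hw = {true}; // pretend hardware state: turbo starts enabled

inline bool can_modify() { return true; }
inline Result get_state(State *state) { *state = g_hw; return Result::SUCCESS; }
inline Result set_state(const State &state) { g_hw = state; return Result::SUCCESS; }
inline Result disable() { g_hw.enabled = false; return Result::SUCCESS; }
} // namespace turbo_stub

// Run `work` with turbo disabled, then restore the state found on entry.
// Returns false if turbo could not be controlled; `work` is not run then.
template <typename Work>
bool with_turbo_disabled(Work work) {
  namespace turbo = turbo_stub;
  if (!turbo::can_modify()) return false;
  turbo::State saved;
  if (turbo::get_state(&saved) != turbo::Result::SUCCESS) return false;
  if (turbo::disable() != turbo::Result::SUCCESS) return false;
  work();
  return turbo::set_state(saved) == turbo::Result::SUCCESS;
}
```

Checking can_modify() first keeps the helper harmless when the process lacks the permissions needed to change the turbo state.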
scope::init() will record the current CPU governor and, if it is executed with sufficient permissions (sudo), attempt to set it to maximum.
When scope exits from SIGINT or scope::finalize() is called, the original governor will be restored.
Otherwise, use set-minimum to restore the powersave governor.
You may also programmatically control the CPU governor with the following library functions:
namespace governor {
/* whether modifying the governor is supported
*/
bool can_modify();
/* "performance" on linux
*/
Result set_state_maximum();
/* "powersave" on linux
*/
Result set_state_minimum();
/* record the current CPU governors to `state`
*/
Result get_state(State *state);
/* set the CPU governor to `state`
*/
Result set_state(const State &state);
/* save the current governor, to be used with restore()
*/
Result record();
/* restore the governor last captured with record()
*/
Result restore();
} // namespace governor
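For context on where names like "performance" and "powersave" come from: on Linux, the cpufreq subsystem exposes each CPU's active governor as a text file in sysfs. The sketch below reads that file. It is independent of Scope, and whether Scope's governor functions use this same file is an assumption; `governor_path` and `read_governor` are hypothetical helper names.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Build the conventional Linux cpufreq sysfs path for one CPU.
inline std::string governor_path(int cpu) {
  return "/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
         "/cpufreq/scaling_governor";
}

// Read the first whitespace-delimited token from `path`.
// Returns an empty string if the file is missing or unreadable.
inline std::string read_governor(const std::string &path) {
  std::ifstream in(path);
  std::string name;
  if (in) in >> name;
  return name;
}
```

Calling `read_governor(governor_path(0))` on a Linux machine with cpufreq enabled returns the governor of CPU 0; on other systems it returns an empty string.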
By default, scope is compiled with NUMA support (SCOPE_USE_NUMA=1). It can be turned off with cmake -DUSE_NUMA=0.
Either way, the following API is exposed in the numa namespace.
If NUMA support is disabled, the API is consistent with a system that has a single NUMA domain with ID 0.
/* True if there is NUMA support and the system supports NUMA, false otherwise
*/
bool numa::available();
/* bind future processing and allocation by this thread to `node`.
If no NUMA support, does nothing
*/
void numa::bind_node(int node);
/* return the number of numa nodes
If no NUMA support, return 1
*/
int numa::node_count();
/* return the NUMA ids present in the system
*/
std::vector<int> numa::ids();
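The single-domain behavior described for builds without NUMA support can be pictured as follows. This is a sketch of the documented semantics only, placed in a hypothetical `numa_fallback` namespace; it is not the library's source.

```cpp
#include <cassert>
#include <vector>

// Hypothetical namespace mirroring the documented signatures for a build
// without NUMA support: one NUMA domain, with ID 0.
namespace numa_fallback {
inline bool available() { return false; }     // no NUMA support compiled in
inline void bind_node(int /*node*/) {}        // documented as a no-op
inline int node_count() { return 1; }         // a single domain
inline std::vector<int> ids() { return std::vector<int>(1, 0); } // just ID 0
} // namespace numa_fallback
```

Because the fallback still reports one node, a benchmark that loops over ids() and binds to each node can run unchanged on either kind of build.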
There is also a numa::ScopedBind class, an RAII wrapper around numa::bind_node():
// Code out here runs anywhere
{
numa::ScopedBind binder(13);
// this code now runs on node 13
}
// Code out here runs anywhere
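An RAII wrapper of this kind can be sketched in a few lines: bind in the constructor, lift the restriction in the destructor. Everything below is hypothetical. `numa_stub::bind_node` only logs its argument instead of calling into the OS, and the `ALL_NODES` sentinel used to express "run anywhere" is an assumption about how unbinding might be spelled, not Scope's actual mechanism.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in that logs calls instead of touching the OS.
namespace numa_stub {
static std::vector<int> g_calls;
const int ALL_NODES = -1; // assumed sentinel meaning "no restriction"
inline void bind_node(int node) { g_calls.push_back(node); }
} // namespace numa_stub

// Bind on construction; remove the binding when the scope ends.
class ScopedBindSketch {
public:
  explicit ScopedBindSketch(int node) { numa_stub::bind_node(node); }
  ~ScopedBindSketch() { numa_stub::bind_node(numa_stub::ALL_NODES); }

  // Non-copyable, so the binding is released exactly once.
  ScopedBindSketch(const ScopedBindSketch &) = delete;
  ScopedBindSketch &operator=(const ScopedBindSketch &) = delete;
};
```

Tying the unbind to the destructor means the binding is released even if the enclosed code returns early or throws.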
// flush the cache line containing p
void flush(void *p);
// mfence (amd64) or sync 0 (ppc64le)
void barrier_all();
// flush all cache lines for the n-byte region starting at p
void flush_all(void *p, const size_t n);
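Flushing a region amounts to flushing every cache line that overlaps it. The sketch below shows that address arithmetic in a portable form; it is not Scope's implementation. `for_each_cache_line` is a hypothetical helper, the per-line operation is passed in as a callback (where a real version would issue a flush instruction), and the line size is a parameter because it differs between architectures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Call `flush_line(a)` once for every cache line overlapping
// [addr, addr + n). `line` must be a power of two.
// Returns the number of lines visited.
template <typename FlushLine>
std::size_t for_each_cache_line(std::uintptr_t addr, std::size_t n,
                                std::size_t line, FlushLine flush_line) {
  if (n == 0) return 0;
  const std::uintptr_t mask = ~static_cast<std::uintptr_t>(line - 1);
  const std::uintptr_t first = addr & mask;          // line holding first byte
  const std::uintptr_t last = (addr + n - 1) & mask; // line holding last byte
  std::size_t count = 0;
  for (std::uintptr_t a = first;; a += line) {
    flush_line(a);
    ++count;
    if (a == last) break;
  }
  return count;
}
```

Rounding the start down to a line boundary matters: a 2-byte region that straddles a boundary still touches two lines, so both must be visited.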
- v1.1.2 (July 17, 2020)
  - fix a bug in getting available NUMA nodes
- v1.1.1 (July 17, 2020)
  - fix a bug in getting available CUDA devices
- v1.1.0 (July 17, 2020)
  - Re-raise INT, HUP, and KILL signals after cleanup
  - add --cuda and --numa flags
  - Cache NUMA configuration to improve benchmark registration performance
- v1.0.0
  - Initial port from c3sr/scope
  - CPU governor API
  - CPU turbo API
  - google/benchmark 1.5.1
  - bfgroup/lyra 1.4.1
  - gabime/spdlog 1.6.1
@inproceedings{10.1145/3297663.3310299,
author = {Pearson, Carl and Dakkak, Abdul and Hashash, Sarah and Li, Cheng and Chung, I-Hsin and Xiong, Jinjun and Hwu, Wen-Mei},
title = {Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects},
year = {2019},
isbn = {9781450362399},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3297663.3310299},
doi = {10.1145/3297663.3310299},
booktitle = {Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering},
pages = {209–218},
numpages = {10},
keywords = {nvlink, numa, power, x86, benchmarking, cuda, gpu},
location = {Mumbai, India},
series = {ICPE ’19}
}
@article{DBLP:journals/corr/abs-1809-08311,
author = {Carl Pearson and
Abdul Dakkak and
Cheng Li and
Sarah Hashash and
Jinjun Xiong and
Wen{-}Mei W. Hwu},
title = {{SCOPE:} {C3SR} Systems Characterization and Benchmarking Framework},
journal = {CoRR},
volume = {abs/1809.08311},
year = {2018},
url = {http://arxiv.org/abs/1809.08311},
archivePrefix = {arXiv},
eprint = {1809.08311},
timestamp = {Fri, 05 Oct 2018 11:34:52 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-1809-08311.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{pearson2018numa,
title={NUMA-aware data-transfer measurements for power/NVLink multi-GPU systems},
author={Pearson, Carl and Chung, I-Hsin and Sura, Zehra and Hwu, Wen-Mei and Xiong, Jinjun},
booktitle={International Conference on High Performance Computing},
pages={448--454},
year={2018},
organization={Springer}
}
Thanks to Sarah Hashash (MIT), I-Hsin Chung (IBM T. J. Watson), and Jinjun Xiong (IBM T. J. Watson) for their support, guidance, and contributions.
Built with ❤️ using