The latest release of the single header can be downloaded directly using this link.
A Quasi-Monte-Carlo (QMC) integrator library with NVIDIA CUDA support.
The library can be used to integrate multi-dimensional real or complex functions numerically. Multi-threading is supported via the C++17 threading library and multiple CUDA compatible accelerators are supported. A variance reduction procedure based on fitting a smooth function to the inverse cumulative distribution function of the integrand dimension-by-dimension is also implemented.
To read more about the library see our publication.
Prerequisites:
- A C++17 compatible C++ compiler.
- (Optional GPU support) A CUDA compatible compiler (typically
nvcc
). - (Optional GPU support) CUDA compatible hardware with Compute Capability 3.0 or greater.
The qmc library is header only. Simply put the single header file somewhere reachable from your project or directly into your project tree itself then #include "qmc.hpp"
in your project.
Example: Integrate x0x1x2 over the unit hypercube
#include <iostream>
#include "qmc.hpp"
struct my_functor_t {
const unsigned long long int number_of_integration_variables = 3;
#ifdef __CUDACC__
__host__ __device__
#endif
double operator()(double* x) const
{
return x[0]*x[1]*x[2];
}
} my_functor;
int main() {
const unsigned int MAXVAR = 3; // Maximum number of integration variables of integrand
integrators::Qmc<double,double,MAXVAR,integrators::transforms::Korobov<3>::type> integrator;
integrators::result<double> result = integrator.integrate(my_functor);
std::cout << "integral = " << result.integral << std::endl;
std::cout << "error = " << result.error << std::endl;
return 0;
}
Compile without GPU support:
$ c++ -std=c++17 -pthread -I../src 1_minimal_demo.cpp -o 1_minimal_demo.out -lgsl -lgslcblas
Compute with GPU support:
$ nvcc -arch=<arch> -std=c++17 -rdc=true -x cu -Xptxas -O0 -Xptxas --disable-optimizer-constants -I../src 1_minimal_demo.cpp -o 1_minimal_demo.out -lgsl -lgslcblas
where <arch>
is the architecture of the target GPU or compute_30
if you are happy to use Just-in-Time compilation (See the Nvidia nvcc
manual for more details).
Output:
integral = 0.125
error = 5.43058e-11
For further examples see the examples folder.
The Qmc class has 7 template parameters:
T
the return type of the function to be integrated (assumed to be a real or complex floating point type)D
the argument type of the function to be integrated (assumed to be a floating point type)M
the maximum number of integration variables of any integrand that will be passed to the integratorP
an integral transform to be applied to the integrand before integrationF
a function to be fitted to the inverse cumulative distribution function of the integrand in each dimension, used to reduce the variance of the integrand (default:fitfunctions::None::template type
)G
a C++17 style pseudo-random number engine (default:std::mt19937_64
)H
a C++17 style uniform real distribution (default:std::uniform_real_distribution<D>
)
Internally, unsigned integers are assumed to be of type U = unsigned long long int
.
Typically the return type T
and argument type D
are set to type double
(for real numbers), std::complex<double>
(for complex numbers on the CPU only) or thrust::complex<double>
(for complex numbers on the GPU and CPU). In principle, the qmc library supports integrating other floating point types (e.g. quadruple precision, arbitrary precision, etc), though they must be compatible with the relevant STL library functions or provide compatible overloads.
To integrate alternative floating point types, first include the header(s) defining the new type into your project and set the template arguments of the Qmc class T
and D
to your type. The following standard library functions must be compatible with your type or a compatible overload must be provided:
sqrt
,abs
,modf
,pow
std::max
,std::min
If your type is not intended to represent a real or complex type number then you may also need to overload functions required for calculating the error resulting from the numerical integration, see the files src/overloads/real.hpp
and src/overloads/complex.hpp
.
Example 9_boost_minimal_demo
demonstrates how to instantiate the qmc with a non-standard type (boost::multiprecision::cpp_bin_float_quad
), to compile this example you will need the boost
library available on your system.
Logger logger
A wrapped std::ostream
object to which log output from the library is written.
To write the text output of the library to a particular file, first #include <fstream>
, create a std::ofstream
instance pointing to your file then set the logger
of the integrator to the std::ofstream
. For example to output very detailed output to the file myoutput.log
:
std::ofstream out_file("myoutput.log");
integrators::Qmc<double,double,MAXVAR,integrators::transforms::Korobov<3>::type> integrator;
integrator.verbosity=3;
integrator.logger = out_file;
Default: std::cout
.
G randomgenerator
A C++17 style pseudo-random number engine.
The seed of the pseudo-random number engine can be changed via the seed
member function of the pseudo-random number engine.
For total reproducability you may also want to set cputhreads = 1
and devices = {-1}
which disables multi-threading, this helps to ensure that the floating point operations are done in the same order each time the code is run.
For example:
integrators::Qmc<double,double,MAXVAR,integrators::transforms::Korobov<3>::type> integrator;
integrator.randomgenerator.seed(1) // seed = 1
integrator.cputhreads = 1; // no multi-threading
integrator.devices = {-1}; // cpu only
Default: std::mt19937_64
seeded with a call to std::random_device
.
U minn
The minimum lattice size that should be used for integration. If a lattice of the requested size is not available then n
will be the size of the next available lattice with at least minn
points.
Default: 8191
.
U minm
The minimum number of random shifts of the lattice m
that should be used to estimate the error of the result. Typically 10 to 50.
Default: 32
.
D epsrel
The relative error that the qmc should attempt to achieve.
Default: 0.01
.
D epsabs
The absolute error that the qmc should attempt to achieve. For real types the integrator tries to find an estimate E
for the integral I
which fulfills |E-I| <= max(epsabs, epsrel*I)
. For complex types the goal is controlled by the errormode
setting.
Default: 1e-7
.
U maxeval
The (approximate) maximum number of function evaluations that should be performed while integrating. The actual number of function evaluations can be slightly larger if there is not a suitably sized lattice available.
Default: 1000000
.
U maxnperpackage
Maximum number of points to compute per thread per work package.
Default: 1
.
U maxmperpackage
Maximum number of shifts to compute per thread per work package.
Default: 1024
.
ErrorMode errormode
Controls the error goal that the library attempts to achieve when the integrand return type is a complex type. For real types the errormode
setting is ignored.
Possible values:
all
- try to find an estimateE
for the integralI
which fulfills|E-I| <= max(epsabs, epsrel*I)
for each component (real and imaginary) separately,largest
- try to find an estimateE
for the integralI
such thatmax( |Re[E]-Re[I]|, |Im[E]-Im[I]| ) <= max( epsabs, epsrel*max( |Re[I]|,|Im[I]| ) )
, i.e. to achieve either theepsabs
error goal or that the largest error is smaller thanepsrel
times the value of the largest component (either real or imaginary).
Default: all
.
U cputhreads
The number of CPU threads that should be used to evaluate the integrand function. If GPUs are used 1 additional CPU thread per device will be launched for communicating with the device.
Default: std::thread::hardware_concurrency()
.
U cudablocks
The number of blocks to be launched on each CUDA device.
Default: (determined at run time).
U cudathreadsperblock
The number of threads per block to be launched on each CUDA device. CUDA kernels launched by the qmc library have the execution configuration <<< cudablocks, cudathreadsperblock >>>
. For more information on how to optimally configure these parameters for your hardware and/or integral refer to the NVIDIA guidelines.
Default: (determined at run time).
std::set<int> devices
A set of devices on which the integrand function should be evaluated. The device id -1
represents all CPUs present on the system, the field cputhreads
can be used to control the number of CPU threads spawned. The indices 0,1,...
are device ids of CUDA devices present on the system.
Default: {-1,0,1,...,nd}
where nd
is the number of CUDA devices detected on the system.
std::map<U,std::vector<U>> generatingvectors
A map of available generating vectors which can be used to generate a lattice. The implemented QMC algorithm requires that the generating vectors be generated with a prime lattice size. By default the library uses generating vectors with 100 components, thus it supports integration of functions with up to 100 dimensions.
The default generating vectors have been generated with lattice size chosen as the next prime number above (110/100)^i*1020
for i
between 0
and 152
, additionally the lattice 2^31-1
(INT_MAX
for int32
) is included.
Default: cbcpt_dn1_100()
.
U latticecandidates
If latticecandidates>0
, the list of generating vectors is extended using the median quasi-Monte Carlo rules,
using the given number of candidate generating vectors.
Can be used together with generatingevectors=none()
to always use the median QMC rule.
Default: 11
.
bool keeplattices
If set to true
, generating vectors constructed using the median QMC rules are kept in generatingvectors
for subsequent integrations.
Default: false
.
U verbosity
Possible values: 0,1,2,3
. Controls the verbosity of the output to logger
of the qmc library.
0
- no output,1
- key status updates and statistics,2
- detailed output, useful for debugging,3
- very detailed output, useful for debugging.
Default: 0
.
bool batching
If set to true
, attempts to compute batches of points on the cpu. This allows the user to make better use of SIMD instructions on their hardware.
If the user provides it, on the cpu the integrator will use the call operator:
void operator()(double* x, double* res, const U batchsize) const
This call operator should be ready to accept up to maxnperpackage
points.
The parameters are:
x
- a one-dimensional array first containing coordinates of point number0
, then point number1
and so on,res
- the array of results,batchsize
- the number of points passed to the function.
Dafault: false
.
U evaluateminn
The minimum lattice size that should be used by the evaluate
function to evaluate the integrand, if variance reduction is enabled these points are used for fitting the inverse cumulative distribution function. If a lattice of the requested size is not available then n
will be the size of the next available lattice with at least evaluateminn
points.
Default: 100000
.
size_t fitstepsize
Controls the number of points included in the fit used for variance reduction. A step size of x
includes (after sorting by value) every x
th point in the fit.
Default: 10
.
size_t fitmaxiter
See maxiter
in the non-linear least-squares fitting GSL documentation.
Default: 40
.
double fitxtol
See xtol
in the non-linear least-squares fitting GSL documentation.
Default: 3e-3
.
double fitgtol
See gtol
in the non-linear least-squares fitting GSL documentation.
Default: 1e-8
.
double fitftol
See ftol
in the non-linear least-squares fitting GSL documentation.
Default: 1e-8
.
gsl_multifit_nlinear_parameters fitparametersgsl
See gsl_multifit_nlinear_parameters
in the non-linear least-squares fitting GSL documentation.
Default: {}
.
U get_next_n(U preferred_n)
Returns the lattice size n
of the lattice in generatingvectors
that is greater than or equal to preferred_n
. This represents the size of the lattice that would be used for integration if minn
was set to preferred_n
.
template <typename I> result<T,U> integrate(I& func)
Integrates the functor func
. The result is returned in a result
struct with the following members:
T integral
- the result of the integralT error
- the estimated absolute error of the resultU n
- the size of the largest lattice used during integrationU m
- the number of shifts of the largest lattice used during integrationU iterations
- the number of iterations used during integrationU evaluations
- the total number of function evaluations during integration
The functor func
must define its dimension as a public member variable number_of_integration_variables
.
Calls: get_next_n
.
template <typename I> samples<T,D> evaluate(I& func)
Evaluates the functor func
on a lattice of size greater than or equal to evaluateminn
. The samples are returned in a samples
struct with the following members:
std::vector<U> z
- the generating vector of the lattice used to produce the samplesstd::vector<D> d
- the random shift vector used to produce the samplesstd::vector<T> r
- the values of the integrand at each randomly shifted lattice pointU n
- the size of the lattice used to produce the samplesD get_x(const U sample_index, const U integration_variable_index)
- a function which returns the argument (specified byintegration_variable_index
) used to evaluate the integrand for a specific sample (specified bysample_index
).
The functor func
must define its dimension as a public member variable number_of_integration_variables
.
Calls: get_next_n
.
template <typename I> typename F<I,D,M>::transform_t fit(I& func)
Fits a function (specified by the type F
of the integrator) to the inverse cumulative distribution function of the integrand dimension-by-dimension and returns a functor representing the new integrand after this variance reduction procedure.
The functor func
must define its dimension as a public member variable number_of_integration_variables
.
Calls: get_next_n
, evaluate
.
The following generating vectors are distributed with the qmc:
Name | Max. Dimension | Description | Lattice Sizes |
---|---|---|---|
cbcpt_dn1_100 |
100 | Computed using Dirk Nuyens' fastrank1pt.m tool | 1021 - 2147483647 |
cbcpt_dn2_6 |
6 | Computed using Dirk Nuyens' fastrank1pt.m tool | 65521 - 2499623531 |
cbcpt_cfftw1_6 |
6 | Computed using a custom CBC tool based on FFTW | 2500000001 - 15173222401 |
cbcpt_cfftw2_10 |
10 | Computed using a custom CBC tool based on FFTW | 2147483659 - 68719476767 |
none |
inf | Empty list of generating vectors, to be filled using median Qmc rule | arbitrary |
The above generating vectors are produced for Korobov spaces with smoothness alpha=2
using:
- Kernel
omega(x)=2 pi^2 (x^2 - x + 1/6)
, - Weights
gamma_i = 1/s
fori = 1, ..., s
, - Parameters
beta_i = 1
fori = 1, ..., s
.
The generating vectors used by the qmc can be selected by setting the integrator's generatingvectors
member variable. Example (assuming an integrator instance named integrator
):
integrator.generatingvectors = integrators::generatingvectors::cbcpt_dn2_6();
If you prefer to use custom generating vectors and/or 100 dimensions and/or 15173222401 lattice points is not enough, you can supply your own generating vectors. Compute your generating vectors using another tool then put them into a map and set generatingvectors
.
If you prefer to use custom generating vectors and/or 100 dimensions and/or 15173222401 lattice points is not enough, you can supply your own generating vectors. Compute your generating vectors using another tool then put them into a map and set generatingvectors
. For example, to instruct the qmc to use only two generating vectors (z = (1,3)
for n=7
and z = (1,7)
for n=11
) the generatingvectors
map would be set as follows:
std::map<unsigned long long int,std::vector<unsigned long long int>> my_generating_vectors = { {7, {1,3}}, {11, {1,7}} };
integrators::Qmc<double,double,10> integrator;
integrator.generatingvectors = my_generating_vectors;
If you think your generating vectors will be widely useful for other people then please let us know! With your permission we may include them in the code by default.
The following integral transforms are distributed with the qmc:
Name | Description |
---|---|
Korobov<r_0,r_1> |
A polynomial integral transform with weight ∝ x^r_0 * (1-x)^r_1 |
Korobov<r> |
A polynomial integral transform with weight ∝ x^r * (1-x)^r |
Sidi<r> |
A trigonometric integral transform with weight ∝ sin^r(pi*x) |
Baker |
The baker's transformation, phi(x) = 1 - abs(2x-1) |
None |
The trivial transform, phi(x) = x |
The integral transform used by the qmc can be selected when constructing the qmc.
Example (assuming a real type integrator instance named integrator
):
integrators::Qmc<double,double,10,integrators::transforms::Korobov<5,3>::type> integrator;
instantiates an integrator which applies a weight (r_0=5,r_1=3)
Korobov transform to the integrand before integration.
Name | Description |
---|---|
PolySingular |
A 3rd order polynomial with two additional 1/(p-x) terms, f(x) = |p_2|*(x*(p_0-1))/(p_0-x) + |p_3|*(x*(p_1-1))/(p_1-x) + x*(p_4+x*(p_5+x*(1-|p_2|-|p_3|-p_4-p_5))) |
None |
No fit is performed |
The fit function used by the qmc can be selected when constructing the qmc. These functions are used to approximate the inverse cumulative distribution function of the integrand dimension-by-dimension.
Example (assuming a real type integrator instance named integrator
):
integrators::Qmc<double,double,10,integrators::transforms::Korobov<3>::type,integrators::fitfunctions::PolySingular::type> integrator;
instantiates an integrator which reduces the variance of the integrand by fitting a PolySingular
type function before integration.
- Sophia Borowka (@sborowka)
- Gudrun Heinrich (@gudrunhe)
- Stephan Jahn (@jPhy)
- Stephen Jones (@spj101)
- Matthias Kerner (@KernerM)
- Johannes Schlenk (@j-schlenk)