Release 1.7.0
The Ginkgo team is proud to announce the new Ginkgo minor release 1.7.0. This release brings new features such as:
- Complete GPU-resident sparse direct solvers feature set and interfaces,
- Improved Cholesky factorization performance,
- A new MC64 reordering,
- Batched iterative solver support with the BiCGSTAB solver with batched Dense and ELL matrix types,
- MPI support for the SYCL backend,
- Improved ParILU(T)/ParIC(T) preconditioner convergence,
and more!
If you face an issue, please first check our known issues page and the open issues list. If you do not find a solution there, feel free to open a new issue or ask a question using GitHub Discussions.
Supported systems and requirements:
- For all platforms, CMake 3.16+
- C++14 compliant compiler
- Linux and macOS
  - GCC: 5.5+
  - clang: 3.9+
  - Intel compiler: 2019+
  - Apple Clang: 14.0 is tested. Earlier versions might also work.
  - NVHPC: 22.7+
  - Cray Compiler: 14.0.1+
  - CUDA module: CMake 3.18+, and CUDA 10.1+ or NVHPC 22.7+
  - HIP module: ROCm 4.5+
  - DPC++ module: Intel oneAPI 2022.1+ with oneMKL and oneDPL. Set the CXX compiler to `dpcpp` or `icpx`.
  - MPI: standard version 3.1+, ideally GPU Aware, for best performance
- Windows
  - MinGW: GCC 5.5+
  - Microsoft Visual Studio: VS 2019+
  - CUDA module: CUDA 10.1+, Microsoft Visual Studio
  - OpenMP module: MinGW.
Version support changes
- CUDA 9.2 is no longer supported and 10.0 is untested #1382
- Ginkgo now requires CMake version 3.16 (and 3.18 for CUDA) #1368
Interface changes
- `const` Factory parameters can no longer be modified through `with_*` functions, as this breaks const-correctness #1336 #1439
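
As a rough illustration of the const-correctness rule above, the following self-contained sketch uses a hypothetical `solver_parameters` struct (not a Ginkgo type) whose `with_*` setters are non-const member functions. Under that assumption, a `const` parameter object has to be copied before it can be customized:

```cpp
#include <cassert>

// Hypothetical stand-in for a factory parameter type; NOT part of Ginkgo.
struct solver_parameters {
    unsigned max_iters = 100u;
    double reduction_factor = 1e-6;

    // Non-const setters: they cannot be called on a const object.
    solver_parameters& with_max_iters(unsigned value)
    {
        max_iters = value;
        return *this;
    }

    solver_parameters& with_reduction_factor(double value)
    {
        assert(value > 0.0);
        reduction_factor = value;
        return *this;
    }
};

// Customize a const set of defaults by copying it first.
solver_parameters customized(const solver_parameters& defaults,
                             unsigned max_iters)
{
    // defaults.with_max_iters(max_iters);  // would not compile: drops const
    solver_parameters copy = defaults;
    return copy.with_max_iters(max_iters);
}
```

Calling `customized(defaults, 5u)` leaves `defaults` untouched and returns a modified copy, which is the pattern a caller would need whenever only a `const` parameter object is available.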
New Deprecations
- The `device_reset` parameter of CUDA and HIP executors no longer has an effect, and their `allocation_mode` parameters have been deprecated in favor of the `Allocator` interface. #1315
- The CMake parameter `GINKGO_BUILD_DPCPP` has been deprecated in favor of `GINKGO_BUILD_SYCL`. #1350
- The `gko::reorder::Rcm` interface has been deprecated in favor of `gko::experimental::reorder::Rcm` based on `Permutation`. #1418
- The Permutation class' `permute_mask` functionality. #1415
- Multiple functions with typos (`set_complex_subpsace()`, range functions such as `conj_operaton` etc). #1348
Summary of previous deprecations
- `gko::lend()` is not necessary anymore.
- The classes `RelativeResidualNorm` and `AbsoluteResidualNorm` are deprecated in favor of `ResidualNorm`.
- The class `AmgxPgm` is deprecated in favor of `Pgm`.
- Default constructors for the CSR `load_balance` and `automatical` strategies
- The PolymorphicObject's move-semantic `copy_from` variant
- The templated `SolverBase` class.
- The class `MachineTopology` is deprecated in favor of `machine_topology`.
- Logger constructors and create functions with the `executor` parameter.
- The virtual, protected, Dense functions `compute_norm1_impl`, `add_scaled_impl`, etc.
- Logger events for solvers and criterion without the additional `implicit_tau_sq` parameter.
- The global `gko::solver::default_krylov_dim`, use instead `gko::solver::gmres_default_krylov_dim`.
Added features
- Adds a batch::BatchLinOp class that forms a base class for batched linear operators such as batched matrix formats, solvers, and preconditioners #1379
- Adds a batch::MultiVector class that enables operations such as dot, norm, scale on batched vectors #1371
- Adds a batch::Dense matrix format that stores batched dense matrices and provides gemv operations for these dense matrices. #1413
- Adds a batch::Ell matrix format that stores batched Ell matrices and provides spmv operations for these batched Ell matrices. #1416 #1437
- Add a batch::Bicgstab solver (class, core, and reference kernels) that enables iterative solution of batched linear systems #1438.
- Add device kernels (CUDA, HIP, and DPCPP) for batch::Bicgstab solver. #1443.
- New MC64 reordering algorithm which optimizes the diagonal product or sum of a matrix by permuting the rows, and computes additional scaling factors for equilibration #1120
- New interface for (non-symmetric) permutation and scaled permutation of Dense and Csr matrices #1415
- LU and Cholesky Factorizations can now be separated into their factors #1432
- New symbolic LU factorization algorithm that is optimized for matrices with an almost-symmetric sparsity pattern #1445
- Sorting kernels for SparsityCsr on all backends #1343
- Allow passing pre-generated local solver as factory parameter for the distributed Schwarz preconditioner #1426
- Add DPCPP kernels for Partition #1034, and CSR's `check_diagonal_entries` and `add_scaled_identity` functionality #1436
- Adds a helper function to create a partition based on either local sizes, or local ranges #1227
- Add function to compute arithmetic mean of dense and distributed vectors #1275
- Adds `icpx` compiler support #1350
- All backends can be built simultaneously #1333
- Emits a CMake warning in downstream projects that use different compilers than the installed Ginkgo #1372
- Reordering algorithms in sparse_blas benchmark #1354
- Benchmarks gained an `-allocator` parameter to specify device allocators #1385
- Benchmarks gained an `-input_matrix` parameter that initializes the input JSON based on the filename #1387
- Benchmark inputs can now be reordered as a preprocessing step #1408
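
The partition helper listed above (#1227) accepts either local sizes or local ranges. The two descriptions carry the same information for a contiguous partition, which the sketch below tries to show with two hypothetical free functions (`ranges_from_sizes` and `sizes_from_ranges` are illustrative names, not Ginkgo's API): per-rank sizes become half-open index ranges through an exclusive prefix sum, and back.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using index_range = std::pair<std::int64_t, std::int64_t>;  // [begin, end)

// Hypothetical helper: convert per-rank local sizes into contiguous
// half-open ranges using an exclusive prefix sum.
std::vector<index_range> ranges_from_sizes(
    const std::vector<std::int64_t>& local_sizes)
{
    std::vector<index_range> ranges;
    ranges.reserve(local_sizes.size());
    std::int64_t begin = 0;
    for (auto size : local_sizes) {
        assert(size >= 0);
        ranges.emplace_back(begin, begin + size);
        begin += size;
    }
    return ranges;
}

// Hypothetical helper: recover the local sizes from the ranges.
std::vector<std::int64_t> sizes_from_ranges(
    const std::vector<index_range>& ranges)
{
    std::vector<std::int64_t> sizes;
    sizes.reserve(ranges.size());
    for (const auto& range : ranges) {
        assert(range.second >= range.first);
        sizes.push_back(range.second - range.first);
    }
    return sizes;
}
```

For local sizes 3, 0 and 2 on three ranks, this yields the ranges [0, 3), [3, 3) and [3, 5). An empty range is how a rank that owns no rows shows up in the range-based description.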
Improvements
- Significantly improve Cholesky factorization performance #1366
- Improve parallel build performance #1378
- Allow constrained parallel test execution using CTest resources #1373
- Use arithmetic type more inside mixed precision ELL #1414
- Most factory parameters of factory type no longer need to be constructed explicitly via `.on(exec)` #1336 #1439
- Improve ParILU(T)/ParIC(T) convergence by using more appropriate atomic operations #1434
Fixes
- Fix an over-allocation for OpenMP reductions #1369
- Fix DPCPP's common-kernel reduction for empty input sizes #1362
- Fix several typos in the API and documentation #1348
- Fix inconsistent `Threads` between generations #1388
- Fix benchmark median condition #1398
- Fix HIP 5.6.0 compilation #1411
- Fix missing destruction of rand_generator from cuda/hip #1417
- Fix PAPI logger destruction order #1419
- Fix TAU logger compilation #1422
- Fix relative criterion to not iterate if the residual is already zero #1079
- Fix memory_order invocations with C++20 changes #1402
- Fix `check_diagonal_entries_exist` to report correctly when only the last rows are missing a diagonal value #1440
- Fix checking OpenMPI version in cross-compilation settings #1446
- Fix false-positive deprecation warnings in Ginkgo, especially for the old Rcm (it doesn't emit deprecation warnings anymore as a result but is still considered deprecated) #1444
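
The relative-criterion fix above (#1079) concerns a corner case that can be reproduced in isolation. The sketch below does not reproduce Ginkgo's implementation or the actual change. It assumes a convergence test of the form ||r|| <= tol * ||r0|| and uses hypothetical function names to show why a zero initial residual is never accepted by a strict comparison, while an inclusive comparison stops immediately.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Euclidean norm of a residual vector.
double norm2(const std::vector<double>& v)
{
    double sum = 0.0;
    for (double x : v) {
        sum += x * x;
    }
    return std::sqrt(sum);
}

// Strict test: with a zero initial residual this evaluates 0 < 0, which is
// false, so a solver relying on it would keep iterating.
bool converged_strict(double res_norm, double init_norm, double tol)
{
    assert(tol >= 0.0);
    return res_norm < tol * init_norm;
}

// Inclusive test: 0 <= 0 holds, so an already-solved system is accepted
// before any iteration is performed.
bool converged_inclusive(double res_norm, double init_norm, double tol)
{
    assert(tol >= 0.0);
    return res_norm <= tol * init_norm;
}
```

With a residual and initial residual that are both zero, `converged_strict` returns `false` and `converged_inclusive` returns `true`. For nonzero residuals the two tests differ only when the residual norm equals the threshold exactly.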