
Bug when using MKLSparse following use of MKL #36

Closed
simonp0420 opened this issue Feb 15, 2023 · 28 comments

@simonp0420

When I load MKLSparse after a previous using MKL, a bug manifests when multiplying a SparseMatrixCSC{ComplexF64, Int64} matrix whose entries are all purely real by a ComplexF64 or Float64 vector. The following output exhibits the bug:

julia> include("reproducer.jl")
typeof.((QAAQ, fsrc, b)) = (SparseMatrixCSC{ComplexF64, Int64}, Vector{ComplexF64}, Vector{ComplexF64})
size.((QAAQ, fsrc, b)) = ((429567, 429567), (429567,), (429567,))
nnz(QAAQ) = 11532
maximum(abs ∘ imag, QAAQ) = 0.0
maximum(abs, QAAQ - real(QAAQ)) = 0.0
maximum(abs, b - real(QAAQ) * fsrc) = 0.0
0.0

julia> using MKL  # This does not introduce the bug

julia> include("reproducer.jl")
typeof.((QAAQ, fsrc, b)) = (SparseMatrixCSC{ComplexF64, Int64}, Vector{ComplexF64}, Vector{ComplexF64})
size.((QAAQ, fsrc, b)) = ((429567, 429567), (429567,), (429567,))
nnz(QAAQ) = 11532
maximum(abs ∘ imag, QAAQ) = 0.0
maximum(abs, QAAQ - real(QAAQ)) = 0.0
maximum(abs, b - real(QAAQ) * fsrc) = 0.0
0.0

julia> using MKLSparse  # this will introduce the bug

julia> include("reproducer.jl")
typeof.((QAAQ, fsrc, b)) = (SparseMatrixCSC{ComplexF64, Int64}, Vector{ComplexF64}, Vector{ComplexF64})
size.((QAAQ, fsrc, b)) = ((429567, 429567), (429567,), (429567,))
nnz(QAAQ) = 11532
maximum(abs ∘ imag, QAAQ) = 0.0
maximum(abs, QAAQ - real(QAAQ)) = 0.0
maximum(abs, b - real(QAAQ) * fsrc) = 119.4196587781747
119.4196587781747

The final output should be zero, since b was computed as QAAQ * fsrc and every element of QAAQ has zero imaginary part. Note that the bug does not manifest if I load the two packages in the other order: MKLSparse first, followed by MKL.
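The full reproducer lives in the gist linked below, but a minimal self-contained check in the same spirit (the small diagonal matrix here is illustrative, not the actual QAAQ) would be:

```julia
using SparseArrays, LinearAlgebra

# A complex-typed sparse matrix whose entries are all purely real,
# mirroring the structure of QAAQ (sizes and values are illustrative).
A = sparse(ComplexF64[2 0 0; 0 3 0; 0 0 4])
x = rand(ComplexF64, 3)
b = A * x

# With a correct backend both products agree exactly, so this is 0.0.
maximum(abs, b - real(A) * x)
```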

The four files necessary for reproducing this are in this gist. Here is my version info:

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65e (2023-01-08 06:45 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 8

and my package status:

(error_reproducer) pkg> st
Status `D:\peter\Documents\julia\book_codes\RumpfFDFD\error_reproducer\Project.toml`
  [33e6dc65] MKL v0.6.0
  [0c723cd3] MKLSparse v1.1.0
@TobiasHolicki

TobiasHolicki commented Sep 10, 2023

I encountered essentially the same bug. Here is a smaller example that might be helpful:

using SparseArrays
A = sparse([1.0 1.0]); B = [1 0; 0 1.0]

using MKL
A * B # this returns [1.0 1.0] which is ok

using MKLSparse
A * B # this returns [1.0 0.0] which is bad

B = [1 0; 0 1];
A * B # this returns [1.0 1.0] which is ok again

@Leebre

Leebre commented Apr 27, 2024

I am seeing similar problems with MKLSparse. However, note that the package seems to be abandoned, as it hasn't been updated in over 2 years:

https://github.com/JuliaSparse/MKLSparse.jl

@amontoison
Member

I encountered essentially the same bug. Here is a smaller example that might be helpful:

A = sparse([1.0 1.0]); B = [1 0; 0 1.0]

using MKL
A * B # this returns [1.0 1.0] which is ok

using MKLSparse
A * B # this returns [1.0 0.0] which is bad

B = [1 0; 0 1];
A * B # this returns [1.0 1.0] which is ok again

I highly suspect that the issue is in MKL.jl.
When we do a using MKL, we change the threading mode here.
That probably has a side effect on how the sparse products are performed internally.
MKL.jl and MKLSparse.jl use the same library, MKL_jll.libmkl_rt.

@amontoison
Member

amontoison commented May 6, 2024

I found the culprit: MKL.jl loads the LP64 interface by default:
JuliaLinearAlgebra/MKL.jl@cbbaacd
This was changed in a recent release of MKL.jl (v0.6.0).

We load the ILP64 interface here in MKLSparse.jl:
https://github.com/JuliaSparse/MKLSparse.jl/blob/master/src/MKLSparse.jl#L37
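For reference, when the single runtime library libmkl_rt is used, the integer interface is selected at runtime via MKL's documented MKL_Set_Interface_Layer function. A minimal sketch of calling it from Julia (constant values taken from mkl_service.h):

```julia
using MKL_jll  # provides the path to libmkl_rt

# Values from mkl_service.h.
const MKL_INTERFACE_LP64  = Cint(0)
const MKL_INTERFACE_ILP64 = Cint(1)

# Returns the interface layer actually in effect. Per MKL's documentation
# this must be called before the first call to any other MKL function,
# which is why two packages setting it differently in __init__ conflict.
set_interface_layer(layer::Cint) =
    ccall((:MKL_Set_Interface_Layer, MKL_jll.libmkl_rt), Cint, (Cint,), layer)
```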

@amontoison
Member

@ViralBShah Is it not possible to still load the ILP64 interface in MKL.jl?

@ViralBShah
Member

ViralBShah commented May 6, 2024

I thought we use the ILP64 by default in MKL. MKL now has the _64 suffixes for ILP64 as well.

@amontoison
Member

I thought we use the ILP64 by default in MKL. MKL now has the _64 suffixes for ILP64 as well.

Do you know in which version of Intel MKL the _64 suffixes were added for ILP64?
I opened a PR in MKL.jl, but we may need to update the compat entry for MKL_jll.jl.

@ViralBShah
Member

IIRC, it was in 2023, but there were some issues where a few symbols got left out. So, I think 2024 is perhaps the safest. In fact, we may want to make 2024 a minimum requirement even in MKL.jl.

@amontoison
Member

We will drop support for Mac, but I think it's fine. It never worked well and required sequential threading.

@mhaense1

mhaense1 commented May 13, 2024

Hi,

I also ran into a similar issue today, even though the package was updated recently.
Sorry if I am stating known problems here, but it took me a long time to figure out the related problem in one of my codes.

MWE:

julia> using MKL, MKLSparse

julia> using SparseArrays

julia> idx1 = rand(1:10_000, 100_000);

julia> idx2 = rand(1:10_000, 100_000);

julia> vals = rand(100_000);

julia> spm = sparse(idx1, idx2, vals, 100_000, 100_000);

julia> spm * ones(100_000)
100000-element Vector{Float64}:
    3.2610336365211516
    5.981257329062043
    3.6314630135321475
    2.706724008231419
    4.515169252866177
    4.121828693918578
    2.410895086369234
    3.073361618309217
    1.6881840883567407
    0.0
    4.18252725784919
    0.8644031768747882
    5.747019246623832
    ⋮
    0.0
    0.0
    0.0
    0.0
    0.0
    0.0
    0.0
    0.0
    0.0
    0.0
    0.0
 9349.209078325439

Note that, by construction, the matrix spm has only zeros in all rows > 10_000. The result of spm * ones(100_000) should be the vector of row sums of spm, so the output above is obviously wrong.
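A cheap way to catch this class of bug is to compare the product against row sums computed by SparseArrays itself, which do not dispatch to the BLAS backend (rebuilding spm as above):

```julia
using SparseArrays

idx1 = rand(1:10_000, 100_000)
idx2 = rand(1:10_000, 100_000)
vals = rand(100_000)
spm  = sparse(idx1, idx2, vals, 100_000, 100_000)

# sum(spm, dims=2) iterates the stored entries inside SparseArrays,
# so it bypasses mul! and any external BLAS.
rowsums = vec(sum(spm, dims=2))

# On a correct backend this is ~0 (up to floating-point roundoff).
maximum(abs, spm * ones(100_000) - rowsums)
```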

Within a project I was working on, MKLSparse also seems to have caused Julia to crash in weird ways, as I eventually figured out here. But I couldn't produce an easy MWE for that.

My Pkg status:

Pkg.status()
Status `C:\Users\matth\.julia\environments\v1.10\Project.toml`
⌃ [31c24e10] Distributions v0.25.107
  [33e6dc65] MKL v0.6.3
  [0c723cd3] MKLSparse v1.2.0
⌃ [c3e4b0f8] Pluto v0.19.37
  [295af30f] Revise v3.5.14
Info Packages marked with ⌃ have new versions available and may be upgradable.

@amontoison
Member

MKL.jl loads the LP64 interface, whereas MKLSparse.jl loads the ILP64 interface.
We don't know how to fix the issue yet.

@ViralBShah
Member

Does it work if you switch MKL.jl to the ILP64 interface? The problem is that MKL.jl only uses the _64 suffix APIs in LP64. In ILP64, both 32-bit and 64-bit symbols have the same name, creating a lot of headaches.

Is it possible to set ILP64 just for the MKLSparse stuff without affecting the BLAS?

@amontoison
Member

We can't set the ILP64 interface for MKLSparse.jl in the __init__ function like here, because it's overridden by MKL.jl.

However, we could change the interface within mul!: set the ILP64 interface before cscmv! and reset it to LP64 afterwards.
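A sketch of that idea: set_interface_layer wraps MKL's documented MKL_Set_Interface_Layer entry point, and the kernel call is left abstract (with_ilp64 is a hypothetical helper, not MKLSparse.jl code). Note that MKL's documentation says the layer must be chosen before the first MKL call, so whether a mid-session switch like this is honored is exactly the open question:

```julia
using MKL_jll

const MKL_INTERFACE_LP64  = Cint(0)   # values from mkl_service.h
const MKL_INTERFACE_ILP64 = Cint(1)

set_interface_layer(layer) =
    ccall((:MKL_Set_Interface_Layer, MKL_jll.libmkl_rt), Cint, (Cint,), Cint(layer))

# Hypothetical helper: run `f` (e.g. a cscmv! call) under ILP64, then restore
# LP64 so MKL.jl's dense BLAS keeps receiving 32-bit integers.
function with_ilp64(f)
    set_interface_layer(MKL_INTERFACE_ILP64)
    try
        return f()
    finally
        set_interface_layer(MKL_INTERFACE_LP64)
    end
end
```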

@ViralBShah
Member

ViralBShah commented May 14, 2024

Do these MKL sparse routines give us a clear way of picking which API to call for 32- vs. 64-bit integers? Another possibility may be to default this package to SparseMatrixCSC with 32-bit ints only.

@ViralBShah
Member

ViralBShah commented May 14, 2024

If we were able to use the C API for MKLSparse, it might steer clear of the issues we are facing with the INTERFACE setting, which probably changes the size of the Fortran INTEGER based on LP64 vs. ILP64.

Another solution might be to avoid libmkl_rt and directly pick up the relevant library, such as libmkl_intel_ilp64.so.

@amontoison
Member

The performance of the C API was worse than the Fortran routines the last time I tested.
I am at SIAM LA24 this week, but I can review your PRs and check that everything works this weekend.

@ViralBShah
Member

My PR didn't work, so I closed it. Huh, it is weird that the C routines have worse performance. Can you check that once again?

@ViralBShah
Member

@mkrainiuk Any insight you have here might be valuable.

@mkrainiuk

@mkrainiuk Any insight you have here might be valuable.

The only potential performance change I'd expect is when comparing the 32-bit integer API vs. the 64-bit integer API, not Fortran vs. C. Could you please clarify which APIs you compared?

As a side note, I'd highly recommend using the IE spBLAS API instead of the deprecated NIST-style spBLAS API, since we plan to completely remove the latter from oneMKL in the future. We have a short KB article about migration here.

@ViralBShah
Member

@mkrainiuk Thank you. If we are using the LP64 interface of MKL, would the integer that is passed to the sparse API calls have to be 32-bit ints? (Currently this package passes 64-bit ints, and when MKL uses the LP64 interface, we observe the crash).

@mkrainiuk

@mkrainiuk Thank you. If we are using the LP64 interface of MKL, would the integer that is passed to the sparse API calls have to be 32-bit ints? (Currently this package passes 64-bit ints, and when MKL uses the LP64 interface, we observe the crash).

Hi @ViralBShah, yes, the LP64 interface expects 32-bit integers, and this is most likely why you observe the crash. You need to use the _64 API from the LP64 interface library, or the API from the ILP64 interface.
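On the Julia side, a data-layout workaround consistent with that explanation is to store the matrix with 32-bit indices, which SparseMatrixCSC supports as a type parameter. Whether the package's wrappers then dispatch to the 32-bit entry points is up to MKLSparse.jl, so this is only a sketch of the data side:

```julia
using SparseArrays

A64 = sprand(1_000, 1_000, 0.01)                    # default Int (64-bit) indices
A32 = convert(SparseMatrixCSC{Float64,Int32}, A64)  # same values, Int32 indices

eltype(rowvals(A32))  # Int32
```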

@ViralBShah
Member

ViralBShah commented May 15, 2024

@mkrainiuk The thing I couldn't figure out is if the sparse functions also have the _64 interface. For example, there is dgemm_64, but not mkl_dcsrmultcsr_64.

@mkrainiuk

mkrainiuk commented May 15, 2024

oneMKL has a _64 API for the inspector-executor SpBLAS routines, e.g. mkl_sparse_d_mm_64. The _64 API was introduced in oneMKL 2024.0.

mkl_dcsrmultcsr is a deprecated NIST-style API, so there are no plans to extend it, only to remove it in the future.

@ViralBShah
Member

ViralBShah commented May 20, 2024

@rayegun Do you understand how the Intel sparse BLAS compares to Tim's GraphBLAS? Should people take a look at your GraphBLAS Julia package as well?

@rayegun
Member

rayegun commented May 20, 2024

@rayegun Do you understand how the Intel sparse BLAS compares to Tim's GraphBLAS? Should people take a look at your GraphBLAS Julia package as well?

Around JuliaCon I would recommend it, yes. It's in fine shape now, maybe a little rough around the edges, but I'm about 3/4 of the way through a total rewrite. GraphBLAS is a strict superset of Sparse BLAS, with the exception of solvers, and SuiteSparse:GraphBLAS is almost always quite a bit faster than MKL. The real problem, which I will have solved by JuliaCon, is that the GraphBLAS interface is not exactly what Sparse BLAS people want.

@amontoison
Member

amontoison commented Jul 7, 2024

@simonp0420 @TobiasHolicki @Leebre
I released version 2.0.0 of MKLSparse.jl.
It uses the new API and fixes this issue.

@amontoison
Member

@mkrainiuk Any insight you have here might be valuable.

The only one potential performance change I'd expect is when you compare 32bit integer API vs 64bit integer API, but not Fortran vs C. Could you please clarify what type of APIs you compared?

As a side note, I'd highly recommend to use IE spBLAS API instead of deprecated NIST style spBLAS API, since we plan to completely remove it from oneMKL in the future. We have a short KB article about migration here

@mkrainiuk Sorry for not answering earlier. I tested the Fortran API (v1.2.0 of MKLSparse.jl) and the C API (v2.0.0 of MKLSparse.jl) for sparse matrix-vector products with CSC matrices.

I don't see any performance regression with version 2024.2.0 of MKL. 👍
When I tested in 2022, the C API was less efficient than the Fortran one.

My benchmark is the following code:

using BenchmarkTools, SuiteSparseMatrixCollection, MatrixMarket
using SparseArrays, LinearAlgebra, Printf, JLD, Base.Threads

ssmc = ssmc_db(verbose=false)
dataset = ssmc[(ssmc.binary .== false) .& (10000 .≤ ssmc.nrows .≤ 20000) .& (ssmc.numerical_symmetry .== 1.0), :]
# dataset = ssmc[(ssmc.binary .== false) .& (1000 .≤ ssmc.nrows .≤ 10000), :]
paths = fetch_ssmc(dataset, format="MM")
names = dataset[!,:name]

# nb_pbs = length(paths)  # 796 matrices...
nb_pbs = 15
benchmark_sparse = true
benchmark_mkl = true
time_sparse = zeros(nb_pbs)
time_mkl = zeros(nb_pbs)

if benchmark_sparse
  for i = 1 : nb_pbs
    name = dataset[!,:name][i]
    path = paths[i]
    @printf("SparseArrays -- %3d / %3d -- %s\n", i, nb_pbs, name)
    A = Float64.(MatrixMarket.mmread(path * "/$name.mtx"))
    n, m = size(A)
    x = rand(m)
    y = rand(n)
    time_sparse[i] = @belapsed for k in 1:1000 mul!($y, $A, $x) end
  end
  save("time_sparse.jld", "time_sparse", time_sparse)
else
  time_sparse = load("time_sparse.jld", "time_sparse")
end

using MKLSparse

if benchmark_mkl
  for i = 1 : nb_pbs
    name = names[i]
    path = paths[i]
    @printf("MKLSparse -- %3d / %3d -- %s\n", i, nb_pbs, name)
    A = MatrixMarket.mmread(path * "/$name.mtx")
    n, m = size(A)
    x = rand(m)
    y = rand(n)
    time_mkl[i] = @belapsed for k in 1:1000 mul!($y, $A, $x) end
  end
  save("time_mkl.jld", "time_mkl", time_mkl)
else
  time_mkl = load("time_mkl.jld", "time_mkl")
end

using PlotlyJS

p = plot([
  bar(name="mul! SparseArrays", x=names[1:nb_pbs], y=time_sparse),
  bar(name="mul! MKLSparse", x=names[1:nb_pbs], y=time_mkl),
  ],
)
relayout!(p, barmode="group")
savefig(p, "julia_vs_mkl_eps", format="eps")
savefig(p, "julia_vs_mkl_svg", format="svg")

Fortran API: (benchmark bar chart screenshot, 2024-07-07)

C API: (benchmark bar chart screenshot, 2024-07-07)
