checks fail in wisdom-only mode on single-precision aarch64+neon+openmp #378

Open
rdolbeau opened this issue Feb 8, 2025 · 3 comments

@rdolbeau
Contributor

rdolbeau commented Feb 8, 2025

Hello,

While testing extensively on aarch64, I found that the checks in tests/ fail under specific circumstances:

  • ARMv8, a.k.a. aarch64
  • single precision
  • SIMD enabled (NEON in 3.3.10, NEON and/or SVE in HEAD)
  • OpenMP

In that case, running make smallcheck check bigcheck in tests eventually fails when testing wisdom-only mode, with a "No can_do" error (that is, the code refuses to run; it doesn't produce invalid results).

This doesn't happen in any double-precision configuration, doesn't happen without OpenMP, and doesn't happen if neither NEON nor SVE is enabled.

My exact configure was:

./configure --enable-armv8-cntvct-el0 --enable-openmp --enable-single --enable-neon CFLAGS="-O3 -march=native -mtune=native" CXXFLAGS="-O3 -march=native -mtune=native" FFLAGS="-O3 -march=native -mtune=native"

on an NVIDIA Grace with the distribution's gcc: gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-2)
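
For anyone who wants to re-check which combination of options triggers the failure, here is a minimal sketch that enumerates the on/off combinations of the three relevant configure flags. The flag names are taken from the configure line above; the helper name `configure_matrix` is made up for illustration, and the CFLAGS/CXXFLAGS/FFLAGS settings are left out for brevity. It only prints command lines and does not run anything.

```python
from itertools import product

# Flags copied from the configure line in this report.
BASE = ["./configure", "--enable-armv8-cntvct-el0"]
TOGGLES = ["--enable-single", "--enable-openmp", "--enable-neon"]


def configure_matrix():
    """Return one configure command (as an argument list) per on/off combination."""
    commands = []
    for mask in product([False, True], repeat=len(TOGGLES)):
        flags = [flag for flag, enabled in zip(TOGGLES, mask) if enabled]
        commands.append(BASE + flags)
    return commands


if __name__ == "__main__":
    # Print the 8 candidate configurations; per the report, only the one
    # with all three flags enabled is expected to fail.
    for cmd in configure_matrix():
        print(" ".join(cmd))
```

Each printed line would still need the compiler flags appended and a separate build directory before running the checks.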

@matteo-frigo
Member

I don't have hardware to try. Can you post an example of the failure?

@rdolbeau
Contributor Author

rdolbeau commented Feb 8, 2025

The exact test varies (I assume they are somewhat randomized?), but the failure mode is always the same: a successful run, followed by a wisdom-only run that fails with "No can_do":

Executing "/home/romain.dolbeau/FFTW/fftw-3.3.10-sp-nofma-neon/tests/bench -o nthreads=3 -o wisdom --verbose=1   --verify '//ofc2x97' --verify '//ifc2x97' --verify 'obc2x97' --verify 'ibc2x97' --verify 'ofc2x97' --verify 'ifc2x97' --verify '//obr2x96' --verify '//ofr2x96' --verify 'obr2x96' --verify 'ibr2x96' --verify 'ofr2x96' --verify 'ifr2x96' --verify '//obc2x96' --verify '//ibc2x96' --verify '//ofc2x96' --verify '//ifc2x96' --verify 'obc2x96' --verify 'ibc2x96' --verify 'ofc2x96' --verify 'ifc2x96' --verify '//obr2x95' --verify '//ofr2x95' --verify 'obr2x95' --verify 'ibr2x95' --verify 'ofr2x95' --verify 'ifr2x95' --verify '//obc2x95' --verify '//ibc2x95' --verify '//ofc2x95' --verify '//ifc2x95' --verify 'obc2x95' --verify 'ibc2x95' --verify 'ofc2x95' --verify 'ifc2x95' --verify '//obr2x94' --verify '//ofr2x94' --verify 'obr2x94' --verify 'ibr2x94' --verify 'ofr2x94' --verify 'ifr2x94' --verify '//obc2x94' --verify '//ibc2x94' --verify '//ofc2x94' --verify '//ifc2x94'"
//ofc2x97 3.25845e-07 7.53168e-07 3.93581e-07
//ifc2x97 2.84473e-07 6.84698e-07 3.30029e-07
obc2x97 2.91938e-07 6.84698e-07 3.22633e-07
ibc2x97 3.57398e-07 8.21638e-07 2.84284e-07
ofc2x97 3.24472e-07 6.84698e-07 3.38843e-07
ifc2x97 2.90055e-07 8.90108e-07 3.784e-07
//obr2x96 1.5842e-07 3.44128e-07 1.41142e-07
//ofr2x96 2.29285e-07 2.75302e-07 1.53849e-07
obr2x96 2.25562e-07 3.44128e-07 1.96399e-07
ibr2x96 2.05299e-07 3.44128e-07 2.05306e-07
ofr2x96 1.87286e-07 2.06477e-07 1.99686e-07
ifr2x96 2.5531e-07 2.40889e-07 1.94088e-07
//obc2x96 1.6506e-07 2.75302e-07 2.51598e-07
//ibc2x96 1.8422e-07 3.44128e-07 2.08354e-07
//ofc2x96 2.1142e-07 3.44128e-07 1.80045e-07
//ifc2x96 1.76997e-07 2.75302e-07 1.68073e-07
obc2x96 1.88696e-07 2.75302e-07 1.71527e-07
ibc2x96 1.65552e-07 2.75302e-07 2.10989e-07
ofc2x96 1.90391e-07 3.44128e-07 2.38077e-07
ifc2x96 1.66194e-07 3.44128e-07 1.81961e-07
//obr2x95 2.8401e-07 4.84308e-07 2.70222e-07
//ofr2x95 2.16878e-07 2.76747e-07 2.01815e-07
obr2x95 2.52941e-07 3.45934e-07 2.10161e-07
ibr2x95 2.06866e-07 4.84308e-07 1.94469e-07
ofr2x95 2.08486e-07 3.45934e-07 1.79237e-07
ifr2x95 2.10915e-07 2.76747e-07 2.06073e-07
//obc2x95 1.99139e-07 3.45934e-07 2.06875e-07
//ibc2x95 2.46852e-07 3.45934e-07 2.2062e-07
//ofc2x95 2.26066e-07 4.15121e-07 1.6927e-07
//ifc2x95 1.94857e-07 3.45934e-07 2.42137e-07
obc2x95 1.97272e-07 2.76747e-07 2.2906e-07
ibc2x95 2.09288e-07 3.45934e-07 1.79343e-07
ofc2x95 2.4405e-07 3.45934e-07 1.86412e-07
ifc2x95 1.8956e-07 3.45934e-07 1.76292e-07
//obr2x94 2.83877e-07 6.95538e-07 3.06095e-07
//ofr2x94 2.35553e-07 4.17323e-07 2.08085e-07
obr2x94 2.68129e-07 5.56431e-07 2.26751e-07
ibr2x94 1.90529e-07 6.95538e-07 2.77685e-07
ofr2x94 2.58158e-07 4.17323e-07 1.99684e-07
ifr2x94 2.72319e-07 4.86877e-07 1.93507e-07
//obc2x94 2.87088e-07 6.95538e-07 2.58064e-07
//ibc2x94 2.57122e-07 4.86877e-07 2.3874e-07
//ofc2x94 2.3597e-07 4.17323e-07 1.91368e-07
//ifc2x94 2.29445e-07 5.56431e-07 2.15847e-07
Executing again in wisdom-only mode
Executing "/home/romain.dolbeau/FFTW/fftw-3.3.10-sp-nofma-neon/tests/bench -o nthreads=3 -o wisdom --verbose=1  -owisdom-only  --verify '//ofc2x97' --verify '//ifc2x97' --verify 'obc2x97' --verify 'ibc2x97' --verify 'ofc2x97' --verify 'ifc2x97' --verify '//obr2x96' --verify '//ofr2x96' --verify 'obr2x96' --verify 'ibr2x96' --verify 'ofr2x96' --verify 'ifr2x96' --verify '//obc2x96' --verify '//ibc2x96' --verify '//ofc2x96' --verify '//ifc2x96' --verify 'obc2x96' --verify 'ibc2x96' --verify 'ofc2x96' --verify 'ifc2x96' --verify '//obr2x95' --verify '//ofr2x95' --verify 'obr2x95' --verify 'ibr2x95' --verify 'ofr2x95' --verify 'ifr2x95' --verify '//obc2x95' --verify '//ibc2x95' --verify '//ofc2x95' --verify '//ifc2x95' --verify 'obc2x95' --verify 'ibc2x95' --verify 'ofc2x95' --verify 'ifc2x95' --verify '//obr2x94' --verify '//ofr2x94' --verify 'obr2x94' --verify 'ibr2x94' --verify 'ofr2x94' --verify 'ifr2x94' --verify '//obc2x94' --verify '//ibc2x94' --verify '//ofc2x94' --verify '//ifc2x94'"
No can_do for //ofc2x97
bench: verify.c:51: assertion failed: 0
FAILED /home/romain.dolbeau/FFTW/fftw-3.3.10-sp-nofma-neon/tests/bench:  --verify '//ofc2x97' --verify '//ifc2x97' --verify 'obc2x97' --verify 'ibc2x97' --verify 'ofc2x97' --verify 'ifc2x97' --verify '//obr2x96' --verify '//ofr2x96' --verify 'obr2x96' --verify 'ibr2x96' --verify 'ofr2x96' --verify 'ifr2x96' --verify '//obc2x96' --verify '//ibc2x96' --verify '//ofc2x96' --verify '//ifc2x96' --verify 'obc2x96' --verify 'ibc2x96' --verify 'ofc2x96' --verify 'ifc2x96' --verify '//obr2x95' --verify '//ofr2x95' --verify 'obr2x95' --verify 'ibr2x95' --verify 'ofr2x95' --verify 'ifr2x95' --verify '//obc2x95' --verify '//ibc2x95' --verify '//ofc2x95' --verify '//ifc2x95' --verify 'obc2x95' --verify 'ibc2x95' --verify 'ofc2x95' --verify 'ifc2x95' --verify '//obr2x94' --verify '//ofr2x94' --verify 'obr2x94' --verify 'ibr2x94' --verify 'ofr2x94' --verify 'ifr2x94' --verify '//obc2x94' --verify '//ibc2x94' --verify '//ofc2x94' --verify '//ifc2x94'
make: *** [Makefile:717 : bigcheck] Erreur 1

In this example, only some of the tests fail in wisdom-only mode: from "obr2x96" (without the // prefix) onward they all work, but none of the various 2x97 ones do.
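
Since bench aborts on the first "No can_do", the log alone doesn't show which of the remaining problems would also fail. A minimal sketch of one way to check them one by one: pull the problem list out of the FAILED line and build a separate wisdom-only command for each. The bench options are copied from the log above; the function names are made up for illustration, and the script only prints the commands.

```python
import re
import shlex


def problems_from_log(log_text):
    """Return the problems passed via --verify on the first 'FAILED' line."""
    for line in log_text.splitlines():
        if line.startswith("FAILED "):
            return re.findall(r"--verify '([^']+)'", line)
    return []


def no_can_do_problems(log_text):
    """Return the problems reported as 'No can_do for <problem>'."""
    return re.findall(r"^No can_do for (\S+)$", log_text, flags=re.MULTILINE)


def wisdom_only_commands(bench, problems, nthreads=3):
    """Build one wisdom-only bench invocation per problem (options as in the log)."""
    return [
        [bench, "-o", f"nthreads={nthreads}", "-o", "wisdom", "--verbose=1",
         "-owisdom-only", "--verify", problem]
        for problem in problems
    ]


if __name__ == "__main__":
    # Shortened stand-in for the real log; the bench path is a placeholder.
    sample = (
        "No can_do for //ofc2x97\n"
        "bench: verify.c:51: assertion failed: 0\n"
        "FAILED ./bench:  --verify '//ofc2x97' --verify 'obr2x96'\n"
    )
    for cmd in wisdom_only_commands("./bench", problems_from_log(sample)):
        print(shlex.join(cmd))
```

Running the printed commands against the same wis.dat should show exactly which problems have lost their wisdom.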

Also, you probably don't need any special hardware: any 64-bit Arm machine should do, including e.g. a Raspberry Pi 3 or later. I'll double-check on my RPi 4 to confirm.

@rdolbeau
Contributor Author

rdolbeau commented Feb 8, 2025

It seems that the wis.dat gets "corrupted" somehow, because:

(a) wis.dat generated during checking on Grace: fails on the Grace, fails on the RPi4 (for the same tests)
(b) wis.dat generated on the RPi 4 with only the test above: works on both
(c) removing the wis.dat on the Grace and trying again: works just fine!

So it seems that the much larger wis.dat generated during the entire "make smallcheck check bigcheck" run is at fault (shortsightedly, I discarded it during the tests above, so I am re-running the tests).
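
Once a failing wis.dat has been captured again, comparing it with a working one may help narrow things down. A minimal sketch, assuming wis.dat is the usual text wisdom file in which each entry is a parenthesized group nested inside one outer group; it compares entries as plain text and does not interpret them. The function names and the sample entries are made up for illustration.

```python
import re


def wisdom_entries(text):
    """Return the set of innermost parenthesized groups, whitespace-normalized."""
    # Innermost only: '(' followed by no further parentheses until ')'.
    return {" ".join(group.split()) for group in re.findall(r"\(([^()]*)\)", text)}


def compare_wisdom(failing_text, working_text):
    """Split entries into those unique to each file and those shared by both."""
    failing = wisdom_entries(failing_text)
    working = wisdom_entries(working_text)
    return {
        "only_failing": sorted(failing - working),
        "only_working": sorted(working - failing),
        "shared": sorted(failing & working),
    }


if __name__ == "__main__":
    # Placeholder contents, not real wisdom entries.
    failing = "(header\n  (entry_a 0 #x1)\n  (entry_b 1 #x2)\n)"
    working = "(header\n  (entry_a 0 #x1)\n)"
    for key, entries in compare_wisdom(failing, working).items():
        print(key, entries)
```

On real files, the two arguments would be the contents of the failing and the working wis.dat, each read with `open(path).read()`; keeping a copy of the failing file before re-running the checks avoids losing it again.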
