Introduce ARM Neon and SSE2 SIMD. #743

Open: wants to merge 32 commits into base: master
259090c
Introduce ARM Neon SIMD.
samyron Feb 1, 2025
9ad196e
Use the 'rules' implementation instead of the lookup table implementa…
samyron Feb 5, 2025
0c1958a
Merge branch 'master' into arm-neon-simd-v2
samyron Feb 5, 2025
d8a2e56
Refactoring and simplifications.
samyron Feb 5, 2025
89ba0be
Load the SIMD lookup table explicitly without loops.
samyron Feb 6, 2025
a23b84e
Use only 2 64-byte lookup tables for the neon escape_table_basic as w…
samyron Feb 6, 2025
5506091
Simplifications.
samyron Feb 10, 2025
3ae5677
A few more cleanups.
samyron Feb 10, 2025
332107d
Use SIMD for fewer than 16 characters (but at least 8) remaining.
samyron Mar 24, 2025
a47ffa0
Add x86-64 SSE2 support with runtime detection.
samyron Apr 5, 2025
b2cab33
Simplified the SSE2 implementation.
samyron Apr 5, 2025
5cd7b5e
A small simplification to the ARM Neon implementation.
samyron Apr 5, 2025
1d00db9
More cleanups.
samyron Apr 5, 2025
4759254
Neon: Use a mask to locate the characters that need to be escaped ins…
samyron Apr 6, 2025
045115a
Make the Neon implementation configurable based on a build parameter.
samyron Apr 7, 2025
13b2c4f
fix: ensure code builds correctly on x86 after changing the neon impl…
samyron Apr 7, 2025
d4f5bf7
Use a matches mask to determine the location of the matching characters…
samyron Apr 7, 2025
be7456c
Fix a build issue on ruby 2.7 for SSE2 support.
samyron Apr 7, 2025
4970255
PR Feedback.
samyron Apr 7, 2025
1c6ee3d
A few tweaks to the SSE algorithm.
samyron Apr 7, 2025
b7b120b
Changed the '<' comparison to '<=' in the SIMD loop iterating through…
samyron Apr 8, 2025
e5c5e7c
Make the search_escape_basic_impl function pointer static.
samyron Apr 9, 2025
062587e
Ensure all search_escape_basic* functions are inlined.
samyron Apr 9, 2025
f49af9b
Refactor the code that copies the last remaining characters in the SI…
samyron Apr 9, 2025
15f1887
Change 'len' to 'vec_len' to ensure bytes past 'len' do not need to b…
samyron Apr 9, 2025
a666f5a
Added the ability to use the matches_mask in the case there isn't a full
samyron Apr 10, 2025
1dc47f8
SSE implementation of using the escape mask when there isn't a full v…
samyron Apr 10, 2025
af822fc
Optimizations, comments and formatting. Still work in progress.
samyron Apr 17, 2025
ad995fc
Implemented optimizations in the SSE2 implementation. A few simplifica…
samyron Apr 18, 2025
9cf63a1
Updates to better handle escape-heavy workloads on ARM Neon.
samyron Apr 20, 2025
df76269
Apply the same optimizations to the SSE2 implementation.
samyron Apr 20, 2025
479af08
Merge branch 'master' into arm-neon-simd-v2
samyron Apr 20, 2025
35 changes: 35 additions & 0 deletions ext/json/ext/generator/extconf.rb
@@ -6,5 +6,40 @@
else
  append_cflags("-std=c99")
  $defs << "-DJSON_GENERATOR"

  if enable_config('generator-use-simd', default=true)
    if RbConfig::CONFIG['host_cpu'] =~ /^(arm.*|aarch64.*)/
      # Try to compile a small program using NEON instructions
      if have_header('arm_neon.h')
        have_type('uint8x16_t', headers=['arm_neon.h']) && try_compile(<<~'SRC')
          #include <arm_neon.h>
          int main() {
              uint8x16_t test = vdupq_n_u8(32);
              return 0;
          }
        SRC
        $defs.push("-DENABLE_SIMD")

        if enable_config('generator-use-neon-lut', default=false)
          $defs.push('-DUSE_NEON_LUT')
        end
Comment on lines +23 to +25
Member

In the end we shouldn't have such a config flag. Either the lookup table implementation is better or it isn't.

SIMD code is hard enough to maintain already, I don't want an extra codepath hidden behind a compile flag.

Author

I wrote a benchmark to compare implementations. I'm currently making a few additional optimizations and will compare them again when everything else has been finalized. However, at present, the rules-based/direct-comparison implementation is slightly faster on some benchmarks. Otherwise it's about equal.
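
The benchmark mentioned in this comment is not shown in the thread. As a rough sketch of how such a comparison might be driven from Ruby, the script below times `JSON.generate` on an escape-free input and an escape-heavy one. The workloads, the iteration count, and the `time_generate` helper are assumptions made for illustration; comparing the two NEON variants would mean running the same script against a build made with and without `generator-use-neon-lut`.

```ruby
require "json"

# Hypothetical workloads (not taken from the PR): one string with nothing
# to escape, and one where every fourth byte is a double quote.
PLAIN = ("a" * 1024).freeze
HEAVY = ("abc\"" * 256).freeze

# Time `iterations` calls of JSON.generate on a one-element array holding
# `input`. Returns elapsed wall-clock seconds as a Float.
def time_generate(input, iterations: 2_000)
  payload = [input]
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { JSON.generate(payload) }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

if __FILE__ == $PROGRAM_NAME
  { "plain" => PLAIN, "escape-heavy" => HEAVY }.each do |label, input|
    printf("%-13s %.4fs\n", label, time_generate(input))
  end
end
```

Absolute timings from a loop like this are noisy, so only the ratio between builds on the same machine is meaningful.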

      end
    end

    if have_header('x86intrin.h') && have_type('__m128i', headers=['x86intrin.h']) && try_compile(<<~'SRC', opt='-msse2')
      #include <x86intrin.h>
      int main() {
          __m128i test = _mm_set1_epi8(32);
          return 0;
      }
    SRC
      $defs.push("-DENABLE_SIMD")
    end

    have_header('cpuid.h')
  end

  create_header

  create_makefile 'json/ext/generator'
end
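
In the NEON branch of the diff above, the value of `have_type(...) && try_compile(...)` is not used, so `-DENABLE_SIMD` is pushed whenever `arm_neon.h` is found, whether or not the probe compiles. If gating the define on the probe is the intent (an assumption, the PR does not say), the decision can be modeled as a pure function and checked without a compiler. `simd_defines` and its keyword arguments are hypothetical names, not part of the PR.

```ruby
# Sketch only: `simd_defines` is a hypothetical helper. Each probe is a
# callable returning true/false, standing in for the mkmf checks
# (have_header / have_type / try_compile) used in extconf.rb.
ARM_HOST = /^(arm.*|aarch64.*)/ # same pattern as in the diff

def simd_defines(host_cpu:, neon_probe:, sse2_probe:, neon_lut: false)
  defines = []

  # Only define ENABLE_SIMD on ARM when the NEON probe actually succeeds.
  if host_cpu =~ ARM_HOST && neon_probe.call
    defines << "-DENABLE_SIMD"
    defines << "-DUSE_NEON_LUT" if neon_lut
  end

  # As in the diff, the x86 probe runs regardless of host_cpu.
  defines << "-DENABLE_SIMD" if sse2_probe.call

  defines.uniq
end

if __FILE__ == $PROGRAM_NAME
  always = -> { true }
  never  = -> { false }
  p simd_defines(host_cpu: "aarch64", neon_probe: always, sse2_probe: never)
  p simd_defines(host_cpu: "aarch64", neon_probe: never,  sse2_probe: never)
end
```

In a real `extconf.rb` the callables would wrap the existing mkmf checks and the result would be appended to `$defs`.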