Introduce ARM Neon and SSE2 SIMD. #743
Open
samyron wants to merge 32 commits into ruby:master from samyron:arm-neon-simd-v2
259090c
Introduce ARM Neon SIMD.
samyron 9ad196e
Use the 'rules' implementation instead of the lookup table implementa…
samyron 0c1958a
Merge branch 'master' into arm-neon-simd-v2
samyron d8a2e56
Refactoring and simplifications.
samyron 89ba0be
Load the SIMD lookup table explicitly without loops.
samyron a23b84e
Use only 2 64-byte lookup tables for the neon escape_table_basic as w…
samyron 5506091
Simplifications.
samyron 3ae5677
A few more cleanups.
samyron 332107d
Use SIMD for fewer than 16 characters (but at least 8) remaining.
samyron a47ffa0
Add x86-64 SSE2 support with runtime detection.
samyron b2cab33
Simplified the SSE2 implementation.
samyron 5cd7b5e
A small simplification to the ARM Neon implementation.
samyron 1d00db9
More cleanups.
samyron 4759254
Neon: Use a mask to locate the characters that need to be escaped ins…
samyron 045115a
Make the Neon implementation configurable based on a build parameter.
samyron 13b2c4f
fix: ensure code builds correctly on x86 after changing the neon impl…
samyron d4f5bf7
Use a maches mask to determine the location of the maching characters…
samyron be7456c
Fix a build issue on ruby 2.7 for SSE2 support.
samyron 4970255
PR Feedback.
samyron 1c6ee3d
A few tweaks to the SSE algorithm.
samyron b7b120b
Changed the '<' comparison to '<=' in the SIMD loop iterating through…
samyron e5c5e7c
Make the search_escape_basic_impl function pointer static.
samyron 062587e
Ensure all search_escape_basic* functions are inlined.
samyron f49af9b
Refactor the code that copies the last remaining characters in the SI…
samyron 15f1887
Change 'len' to 'vec_len' to ensure bytes past 'len' do not need to b…
samyron a666f5a
Added the ability to use the matches_mask in the case there isn't a full
samyron 1dc47f8
SSE implementation of using the escape mask when there isn't a full v…
samyron af822fc
Optimizations, comments and formatting. Still work in progress.
samyron ad995fc
Implemented optimizations in the SSE2 implemenation. A few simplifica…
samyron 9cf63a1
Updates to better handle escape-heavy workloads on ARM Neon.
samyron df76269
Apply the same optimizations to the SSE2 implementation.
samyron 479af08
Merge branch 'master' into arm-neon-simd-v2
samyron
In the end we shouldn't have such a config flag. Either the lookup table implementation is better or it isn't.
SIMD code is hard enough to maintain already, I don't want an extra codepath hidden behind a compile flag.
I wrote a benchmark to compare implementations. I'm currently making a few additional optimizations and will compare them again when everything else has been finalized. However, at present, the rules-based/direct-comparison implementation is slightly faster on some benchmarks. Otherwise it's about equal.
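For readers unfamiliar with the two approaches being compared, here is a minimal scalar sketch in C. It is not the PR's code: every name (`needs_escape_rules`, `needs_escape_lut`, `escape_lut`, `find_first_escape`) is hypothetical. It assumes the "basic" escape set is the one JSON requires: control bytes below 0x20, the double quote, and the backslash. The SIMD versions discussed in the PR apply the same kind of predicate to 16 bytes at a time; this sketch only illustrates the per-byte logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical and chosen for illustration only. */

/* Rules-based / direct comparison: three comparisons per byte. */
static inline int needs_escape_rules(uint8_t c)
{
    return c < 0x20 || c == '"' || c == '\\';
}

/* Lookup table: one 256-entry table, one load per byte. */
static uint8_t escape_lut[256];

static void init_escape_lut(void)
{
    for (int i = 0; i < 256; i++) escape_lut[i] = 0;
    for (int i = 0; i < 0x20; i++) escape_lut[i] = 1; /* control bytes */
    escape_lut['"'] = 1;
    escape_lut['\\'] = 1;
}

static inline int needs_escape_lut(uint8_t c)
{
    return escape_lut[c];
}

/* Returns the index of the first byte that needs escaping, or len if none. */
static size_t find_first_escape(const char *s, size_t len, int (*pred)(uint8_t))
{
    for (size_t i = 0; i < len; i++) {
        if (pred((uint8_t)s[i])) return i;
    }
    return len;
}

/* Sanity check: both predicates must classify every byte value identically. */
static void check_predicates_agree(void)
{
    init_escape_lut();
    for (int i = 0; i < 256; i++) {
        assert(!!needs_escape_rules((uint8_t)i) == !!needs_escape_lut((uint8_t)i));
    }
}
```

Calling `check_predicates_agree()` once and then timing `find_first_escape` with each predicate over the same inputs is one simple way to compare the two approaches; which one wins will likely depend on the target CPU and on how escape-heavy the input is.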