Avoid double buffering direct IO index input slices with BufferedIndexInput #14103
This commit avoids double buffering of direct IO index input slices with BufferedIndexInput.

Currently BufferedIndexInput is used for slicing, since it handles the initial offset and length, but this adds an extra layer of buffering: the buffer in the buffered index input on top of the buffer in the direct IO index input. This change reflows the direct IO index input so that it can handle an offset and length itself, allowing it to serve as its own implementation for slices.

Existing tests mostly covered this, but I found a case where a clone of a slice was not covered. I added a small change to the base directory test case to cover it.
My motivation for doing this is that I've been investigating the possibility of using direct IO for random access reads of float32 vectors when rescoring an initial set of candidates retrieved from scalar quantized approximations.