Avoid double buffering direct IO index input slices with BufferedIndexInput #14103

Open. Wants to merge 5 commits into base: main
Conversation

@ChrisHegarty (Contributor) commented Jan 6, 2025

This commit avoids double buffering direct IO index input slices with BufferedIndexInput.

Currently BufferedIndexInput is used for slicing, since it handles the initial offset and length, but this adds an extra layer of buffering: the buffer in the buffered index input on top of the buffer in the direct IO index input. This change reworks the direct IO index input so that it can handle an offset and length itself, allowing it to serve as its own implementation for slices.
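The idea can be pictured with a simplified, self-contained sketch (all names here are hypothetical, not Lucene's actual classes): a slice is just another direct input with a shifted base offset and its own length, so reads go through a single buffer layer rather than two.

```java
// Simplified sketch of slicing without a second buffering layer.
// DirectInput and its methods are hypothetical illustrations, not Lucene's API.
public class DirectInput {
    private final byte[] backing; // stands in for the file read via direct IO
    private final long offset;    // absolute start of this view into the file
    private final long length;    // length of this view
    private long pos;             // read position, relative to the view

    public DirectInput(byte[] backing, long offset, long length) {
        this.backing = backing;
        this.offset = offset;
        this.length = length;
    }

    public byte readByte() {
        if (pos >= length) throw new IllegalStateException("read past EOF");
        return backing[(int) (offset + pos++)];
    }

    // A slice is simply another DirectInput with a composed offset:
    // no extra buffered wrapper, so only one layer of buffering.
    public DirectInput slice(long sliceOffset, long sliceLength) {
        if (sliceOffset < 0 || sliceLength < 0 || sliceOffset + sliceLength > length) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        return new DirectInput(backing, offset + sliceOffset, sliceLength);
    }

    public static void main(String[] args) {
        byte[] data = new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        DirectInput in = new DirectInput(data, 0, data.length);
        DirectInput s = in.slice(2, 5);   // view over bytes 2..6
        DirectInput s2 = s.slice(1, 3);   // offsets compose: bytes 3..5
        System.out.println(s.readByte() + " " + s2.readByte());
    }
}
```

Note how slicing a slice composes the offsets, which is exactly the bookkeeping that BufferedIndexInput was previously providing at the cost of a second buffer.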

Existing tests covered this, but I found a case where a clone of a slice was not covered. I added a small change to the base directory test case which covers this.
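The clone-of-a-slice case can be pictured with java.nio.ByteBuffer, whose slice()/duplicate() semantics are loosely analogous (this is only an illustration, not the actual Lucene test): a clone of a slice must see the slice's bounds, not the whole file's, while keeping an independent read position.

```java
import java.nio.ByteBuffer;

public class CloneOfSliceSketch {
    public static void main(String[] args) {
        ByteBuffer file = ByteBuffer.wrap(new byte[] {0, 1, 2, 3, 4, 5, 6, 7});

        // Take a view over bytes 2..5, analogous to IndexInput.slice(2, 4).
        file.position(2);
        file.limit(6);
        ByteBuffer slice = file.slice();

        // duplicate() is the analogue of cloning the slice.
        ByteBuffer clone = slice.duplicate();

        // The clone is bounded by the slice, not the whole file...
        System.out.println(clone.capacity()); // 4

        // ...and advancing the clone does not move the slice.
        clone.get();
        System.out.println(slice.position() + " " + clone.position()); // 0 1
    }
}
```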

My motivation for doing this is that I've been investigating the possibility of using direct IO for random access reads of float32 vectors when rescoring an initial set of candidates retrieved from scalar quantized approximations.

@mikemccand (Member) left a comment

Thank you for tackling this @ChrisHegarty -- I don't feel qualified to review the code changes too closely (they are / this class is somewhat scary).

It's a neat idea to try this for KNN 2nd phase rescoring, I guess to prevent that (heavy) IO from polluting the OS's buffer cache for other hot index pages?

And with a fast enough device (SSD) maybe the uncached IO penalty of retrieving N full precision vectors for rescoring is acceptable latency in the query hot path ... if you can read these N full precision vectors using IO concurrency that might also be a big win, though I think @jpountz's recent cool IO hinting changes rely on OS buffer cache (?) ... but maybe many virtual threads each doing the blocking (O_DIRECT) IO could work somehow?

@ChrisHegarty (Contributor, Author) commented Jan 7, 2025

To help move this forward, I'm going to separate the changes out into several smaller, more targeted PRs, tracked by #14106. This PR now tracks only the changes required for slicing.

@ChrisHegarty changed the title from "Optimize DirectIOIndexInput" to "Avoids double buffering direct IO index input slices with BufferedIndexInput" on Jan 9, 2025
@ChrisHegarty mentioned this pull request on Jan 9, 2025
@ChrisHegarty changed the title to "Avoid double buffering direct IO index input slices with BufferedIndexInput" on Jan 9, 2025