Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use bit arrays for predicate matching in search. #8684

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

isoos
Copy link
Collaborator

@isoos isoos commented Apr 1, 2025

  • Package search index improvements (tracking issue) #8671
  • There are two further potential improvements, but I would implement them in a subsequent PRs: (a) reusing the bitarray instance, and (b) not creating/using a PackageScore instance when the BitArray would suffice.
  • Measured around 5-10% improvement for predicate-only search, and around 1-4% for more complex text matching + predicates.

@isoos isoos requested review from jonasfj and sigurdm April 1, 2025 10:49
ServiceSearchQuery query,
IndexedScore<String> packageScores,
) {
// TODO: implement pooling of this object similarly to [ScorePool].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to do this before landing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to do it in a subsequent PR, probably coupled with the other TODO.

(i) => now.difference(_documents[i].updated) > updatedDuration);
}

// TODO: find a better way to handle predicate-only filtering and scoring
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we open an issue for this?

@@ -48,6 +48,7 @@ dependencies:
# pana version to be pinned
pana: '0.22.20'
# 3rd-party packages with pinned versions
bit_array: 2.3.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonasfj are we ok with this dependency?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the 80% of package:bit_array is probably not needed for pub-dev, and I have no issue with vendoring it.

Also: the package is using 32-bit ints for cross-compatibility in browsers, and it is possible we should benchmark using 64-bit ints for the server-only use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants