ORC-1610: Reduce the number of hash computations in CuckooSetBytes
### What changes were proposed in this pull request?
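Add boundary checks on the lookup `length`, using the minimum and maximum lengths of the values stored in the hash set, so that a lookup whose length falls outside that range can return without computing any hash.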
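The idea can be sketched as follows. This is a minimal illustration, not the ORC code: `LengthBoundedSet`, its fields, and the `hashedLookups` counter are hypothetical names, and a plain `HashSet` stands in for the cuckoo hash tables. Only the `insert(byte[])` and `lookup(byte[], int, int)` call shapes are taken from the test in this description.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for CuckooSetBytes; a HashSet replaces the cuckoo tables. */
public class LengthBoundedSet {
    private final Set<ByteBuffer> values = new HashSet<>();
    // Shortest and longest value lengths seen so far (assumed bookkeeping).
    private int minLength = Integer.MAX_VALUE;
    private int maxLength = 0;
    // Counts lookups that actually reached the hashing step.
    int hashedLookups = 0;

    public void insert(byte[] bytes) {
        minLength = Math.min(minLength, bytes.length);
        maxLength = Math.max(maxLength, bytes.length);
        values.add(ByteBuffer.wrap(bytes.clone()));
    }

    public boolean lookup(byte[] bytes, int start, int length) {
        // A candidate shorter or longer than every stored value cannot be
        // present, so return before any hash is computed.
        if (length < minLength || length > maxLength) {
            return false;
        }
        hashedLookups++;
        // ByteBuffer hashes and compares only the wrapped [start, start + length) range.
        return values.contains(ByteBuffer.wrap(bytes, start, length));
    }
}
```

With this shape, the check costs two integer comparisons per lookup, and it only saves work for candidates whose length is outside the stored range; lookups of values that are present always proceed to hashing.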
### Why are the changes needed?
https://issues.apache.org/jira/browse/HIVE-24205
> This would significantly reduce the number of hash computation that needs to happen.
```
main insert:00:00:00.689
main lookup:00:00:01.124
PR insert:00:00:00.628
PR lookup:00:00:01.055
```
```java
@Test
public void testLen() {
  int maxSize = 200000;
  Random gen = new Random();
  String[] strings = new String[maxSize];
  // random strings with lengths in [0, 1000)
  for (int i = 0; i < maxSize; i++) {
    strings[i] = RandomStringUtils.random(Math.abs(gen.nextInt(1000)));
  }
  byte[][] values = getByteArrays(strings);

  // baseline: length check disabled
  StopWatch mainSW = new StopWatch();
  // load set
  mainSW.start();
  CuckooSetBytes main = new CuckooSetBytes(strings.length);
  main.fastLookup = false;
  for (byte[] v : values) {
    main.insert(v);
  }
  mainSW.split();
  System.out.println("main insert:" + mainSW);
  // test that the values we added are there
  for (byte[] v : values) {
    assertTrue(main.lookup(v, 0, v.length));
  }
  mainSW.stop();
  System.out.println("main lookup:" + mainSW);

  // this PR: length check enabled
  StopWatch prSW = new StopWatch();
  prSW.start();
  CuckooSetBytes pr = new CuckooSetBytes(strings.length);
  pr.fastLookup = true;
  for (byte[] v : values) {
    pr.insert(v);
  }
  prSW.split();
  System.out.println("PR insert:" + prSW);
  for (byte[] v : values) {
    assertTrue(pr.lookup(v, 0, v.length));
  }
  prSW.stop();
  System.out.println("PR lookup:" + prSW);
}
```
### How was this patch tested?
GitHub Actions (GA)
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #1785 from cxzl25/ORC-1610.
Authored-by: sychen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>