
feat: faster bitpacking filter for selectivities from 5% to 80% #2068

Draft: wants to merge 9 commits into develop

danking (Member) commented Jan 24, 2025

On an Apple M3 Max, the new benchmark in this PR shows that the switch point lies in [0.02, 0.04]: 8-bit, 16-bit, 32-bit, and 64-bit arrays switch at 0.02, 0.03, 0.04, and 0.04, respectively. I later ran the benchmarks on a Cascade Lake cloud machine and found that 8-bit, 16-bit, 32-bit, and 64-bit arrays switch at 0.03, 0.03, 0.075, and 0.09, respectively. This PR uses the Cascade Lake values, but I don't have a great answer for how to pick them. Regardless, 0.8 is clearly not the right choice.

Google Sheet with one run on my Apple M3 Max. A sheet with a run on a c2-standard-4 (Cascade Lake).
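A minimal Rust sketch of the kind of dispatch these switch points feed, assuming the choice is between unpacking each selected element individually (cheap when few rows survive the mask) and unpacking the whole array before applying the mask. `FilterStrategy`, `Width`, `Thresholds`, and `choose_strategy` are hypothetical names, not Vortex's API; the constants are the Apple M3 Max numbers quoted above.

```rust
/// Hypothetical: the two ways to filter a bit-packed array.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterStrategy {
    /// Unpack only the selected elements, one at a time.
    PerElement,
    /// Unpack every element, then apply the selection mask.
    FullUnpack,
}

/// Element widths of the bit-packed integers measured in this PR.
#[derive(Debug, Clone, Copy)]
pub enum Width {
    W8,
    W16,
    W32,
    W64,
}

/// Per-width switch points: at or above this selectivity, full unpacking wins.
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub w8: f64,
    pub w16: f64,
    pub w32: f64,
    pub w64: f64,
}

impl Thresholds {
    /// Switch points measured on an Apple M3 Max, as reported above.
    pub const APPLE_M3_MAX: Thresholds = Thresholds { w8: 0.02, w16: 0.03, w32: 0.04, w64: 0.04 };

    pub fn for_width(&self, width: Width) -> f64 {
        match width {
            Width::W8 => self.w8,
            Width::W16 => self.w16,
            Width::W32 => self.w32,
            Width::W64 => self.w64,
        }
    }
}

/// Picks a strategy from the fraction of elements the mask keeps.
pub fn choose_strategy(selected: usize, len: usize, width: Width, t: &Thresholds) -> FilterStrategy {
    if len == 0 {
        // Nothing to unpack either way.
        return FilterStrategy::PerElement;
    }
    let selectivity = selected as f64 / len as f64;
    if selectivity < t.for_width(width) {
        FilterStrategy::PerElement
    } else {
        FilterStrategy::FullUnpack
    }
}
```

With these numbers, a mask that keeps 100 of 10,000 32-bit values (selectivity 0.01) stays on the per-element path, while one that keeps 500 (0.05) switches to full unpacking.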

danking added the benchmark label (Run benchmarks on this branch) Jan 24, 2025
github-actions bot removed the benchmark label (Run benchmarks on this branch) Jan 24, 2025
Contributor

Benchmarks: random_access

Table of Results
| name | PR 54939b7 | base 1d36132 | ratio (PR/base) | unit |
| --- | ---: | ---: | ---: | --- |
| random-access/vortex-tokio-local-disk | 2.65975e+06 | 2.62011e+06 | 1.01513 | ns |
| random-access/vortex-local-fs | 3.48053e+06 | 3.30901e+06 | 1.05184 | ns |
| random-access/parquet-tokio-local-disk | 2.22848e+08 | 2.21792e+08 | 1.00476 | ns |

Contributor

Benchmarks: datafusion

Table of Results
| name | PR 54939b7 | base 1d36132 | ratio (PR/base) | unit |
| --- | ---: | ---: | ---: | --- |
| arrow/planning | 944448 | 955104 | 0.988842 | ns |
| arrow/exec | 1.97268e+06 | 2.01344e+06 | 0.979753 | ns |
| vortex-compressed/planning | 584257 | 586488 | 0.996197 | ns |
| vortex-compressed/exec | 2.66778e+06 | 2.71034e+06 | 0.984296 | ns |
| vortex-uncompressed/planning | 582316 | 585229 | 0.995023 | ns |
| vortex-uncompressed/exec | 1.54471e+06 | 1.55866e+06 | 0.991045 | ns |

Contributor

Benchmarks: TPC-H

Table of Results
| name | PR 54939b7 | base 1d36132 | ratio (PR/base) | unit |
| --- | ---: | ---: | ---: | --- |
| tpch_q01/arrow | 547800499 | 5.49082e+08 | 0.997667 | ns |
| tpch_q01/parquet | 761213273 | 7.72487e+08 | 0.985406 | ns |
| tpch_q01/vortex-file-compressed | 529961750 | 5.3619e+08 | 0.988384 | ns |
| tpch_q02/arrow | 143821621 | 1.42769e+08 | 1.00738 | ns |
| tpch_q02/parquet | 172094379 | 1.77336e+08 | 0.970442 | ns |
| tpch_q02/vortex-file-compressed | 149011415 | 1.50431e+08 | 0.990566 | ns |
| tpch_q03/arrow | 171379880 | 1.75393e+08 | 0.977117 | ns |
| tpch_q03/parquet | 376653704 | 3.76328e+08 | 1.00087 | ns |
| tpch_q03/vortex-file-compressed | 238082621 | 2.36388e+08 | 1.00717 | ns |
| tpch_q04/arrow | 183226396 | 1.76527e+08 | 1.03795 | ns |
| tpch_q04/parquet | 215805183 | 2.24674e+08 | 0.960527 | ns |
| tpch_q04/vortex-file-compressed | 173762703 | 1.75805e+08 | 0.988384 | ns |
| tpch_q05/arrow | 330982097 | 3.2567e+08 | 1.01631 | ns |
| tpch_q05/parquet | 512604462 | 5.05141e+08 | 1.01478 | ns |
| tpch_q05/vortex-file-compressed | 384863327 | 3.78238e+08 | 1.01752 | ns |
| tpch_q06/arrow | 26567583 | 2.59607e+07 | 1.02338 | ns |
| tpch_q06/parquet | 151462030 | 1.50801e+08 | 1.00438 | ns |
| tpch_q06/vortex-file-compressed | 67367186 | 6.52746e+07 | 1.03206 | ns |
| tpch_q07/arrow | 631669487 | 6.31055e+08 | 1.00097 | ns |
| tpch_q07/parquet | 774660723 | 7.70841e+08 | 1.00496 | ns |
| tpch_q07/vortex-file-compressed | 632176675 | 6.50893e+08 | 0.971244 | ns |
| tpch_q08/arrow | 267848545 | 2.73412e+08 | 0.979652 | ns |
| tpch_q08/parquet | 546685616 | 5.57444e+08 | 0.9807 | ns |
| tpch_q08/vortex-file-compressed | 363525381 | 3.61058e+08 | 1.00684 | ns |
| tpch_q09/arrow | 479558608 | 4.82854e+08 | 0.993176 | ns |
| tpch_q09/parquet | 790404477 | 7.98464e+08 | 0.989906 | ns |
| tpch_q09/vortex-file-compressed | 585974209 | 6.0579e+08 | 0.967289 | ns |
| tpch_q10/arrow | 264222219 | 2.60901e+08 | 1.01273 | ns |
| tpch_q10/parquet | 505876154 | 5.09425e+08 | 0.993033 | ns |
| tpch_q10/vortex-file-compressed | 297643548 | 2.83051e+08 | 1.05155 | ns |
| tpch_q11/arrow | 138966488 | 1.38797e+08 | 1.00122 | ns |
| tpch_q11/parquet | 146607649 | 1.47443e+08 | 0.994333 | ns |
| tpch_q11/vortex-file-compressed | 136526851 | 1.34036e+08 | 1.01858 | ns |
| tpch_q12/arrow | 182628380 | 1.82964e+08 | 0.998168 | ns |
| tpch_q12/parquet | 328121589 | 3.25584e+08 | 1.00779 | ns |
| tpch_q12/vortex-file-compressed | 258061494 | 2.60808e+08 | 0.989467 | ns |
| tpch_q13/arrow | 164013711 | 1.68035e+08 | 0.976067 | ns |
| tpch_q13/parquet | 323595452 | 3.1406e+08 | 1.03036 | ns |
| tpch_q13/vortex-file-compressed | 181248608 | 1.79007e+08 | 1.01252 | ns |
| tpch_q14/arrow | 36563942 | 3.75003e+07 | 0.975032 | ns |
| tpch_q14/parquet | 233058875 | 2.36181e+08 | 0.986779 | ns |
| tpch_q14/vortex-file-compressed | 77304550 | 7.9851e+07 | 0.96811 | ns |
| tpch_q15/arrow | 68945453 | 6.82557e+07 | 1.01011 | ns |
| tpch_q15/parquet | 328139005 | 3.27624e+08 | 1.00157 | ns |
| tpch_q15/vortex-file-compressed | 133367156 | 1.34481e+08 | 0.991718 | ns |
| tpch_q16/arrow | 99755875 | 9.88606e+07 | 1.00906 | ns |
| tpch_q16/parquet | 115005139 | 1.11792e+08 | 1.02874 | ns |
| tpch_q16/vortex-file-compressed | 105526763 | 1.03176e+08 | 1.02279 | ns |
| tpch_q17/arrow | 613821975 | 6.20673e+08 | 0.988962 | ns |
| tpch_q17/parquet | 692465231 | 7.0725e+08 | 0.979096 | ns |
| tpch_q17/vortex-file-compressed | 612399764 | 6.19487e+08 | 0.98856 | ns |
| tpch_q18/arrow | 1117044491 | 1.13064e+09 | 0.987979 | ns |
| tpch_q18/parquet | 1348111685 | 1.33451e+09 | 1.0102 | ns |
| tpch_q18/vortex-file-compressed | 1163857866 | 1.17145e+09 | 0.993521 | ns |
| tpch_q19/arrow | 151891883 | 1.50214e+08 | 1.01117 | ns |
| tpch_q19/parquet | 417241174 | 4.18875e+08 | 0.9961 | ns |
| tpch_q19/vortex-file-compressed | 152355487 | 1.50768e+08 | 1.01053 | ns |
| tpch_q20/arrow | 179165681 | 1.80828e+08 | 0.990805 | ns |
| tpch_q20/parquet | 325351565 | 3.19009e+08 | 1.01988 | ns |
| tpch_q20/vortex-file-compressed | 217544175 | 2.16685e+08 | 1.00397 | ns |
| tpch_q21/arrow | 994769082 | 9.96145e+08 | 0.998619 | ns |
| tpch_q21/parquet | 1128795204 | 1.12573e+09 | 1.00272 | ns |
| tpch_q21/vortex-file-compressed | 991105311 | 9.90732e+08 | 1.00038 | ns |
| tpch_q22/arrow | 78976075 | 7.7492e+07 | 1.01915 | ns |
| tpch_q22/parquet | 109807235 | 1.08957e+08 | 1.0078 | ns |
| tpch_q22/vortex-file-compressed | 86777284 | 8.50379e+07 | 1.02045 | ns |

Contributor

Benchmarks: Clickbench

Table of Results
| name | PR 54939b7 | base 1d36132 | ratio (PR/base) | unit |
| --- | ---: | ---: | ---: | --- |
| clickbench_q00/parquet | 1812361 | 1.84216e+06 | 0.983823 | ns |
| clickbench_q01/parquet | 59447320 | 6.10587e+07 | 0.97361 | ns |
| clickbench_q02/parquet | 118267975 | 1.17718e+08 | 1.00467 | ns |
| clickbench_q03/parquet | 82645269 | 8.19706e+07 | 1.00823 | ns |
| clickbench_q04/parquet | 648780349 | 6.64052e+08 | 0.977002 | ns |
| clickbench_q05/parquet | 820618160 | 8.30202e+08 | 0.988456 | ns |
| clickbench_q06/parquet | 1904141 | 1.94516e+06 | 0.978915 | ns |
| clickbench_q07/parquet | 62664363 | 6.26823e+07 | 0.999714 | ns |
| clickbench_q08/parquet | 746431796 | 7.45145e+08 | 1.00173 | ns |
| clickbench_q09/parquet | 1028415769 | 1.04093e+09 | 0.98798 | ns |
| clickbench_q10/parquet | 246965887 | 2.53184e+08 | 0.975441 | ns |
| clickbench_q11/parquet | 295846666 | 3.05789e+08 | 0.967487 | ns |
| clickbench_q12/parquet | 826393876 | 8.15683e+08 | 1.01313 | ns |
| clickbench_q13/parquet | 1061178366 | 1.06036e+09 | 1.00077 | ns |
| clickbench_q14/parquet | 829058598 | 8.40992e+08 | 0.98581 | ns |
| clickbench_q15/parquet | 725710604 | 7.73472e+08 | 0.93825 | ns |
| clickbench_q16/parquet | 1612486562 | 1.65755e+09 | 0.972813 | ns |
| clickbench_q17/parquet | 1428987576 | 1.43566e+09 | 0.995353 | ns |
| clickbench_q18/parquet | 2970447433 | 3.00494e+09 | 0.988521 | ns |
| clickbench_q19/parquet | 64936909 | 6.41428e+07 | 1.01238 | ns |
| clickbench_q20/parquet | 1184904908 | 1.19311e+09 | 0.993121 | ns |
| clickbench_q21/parquet | 1453477813 | 1.42357e+09 | 1.02101 | ns |
| clickbench_q22/parquet | 2426486410 | 2.44051e+09 | 0.994253 | ns |
| clickbench_q23/parquet | 8272938593 | 8.32308e+09 | 0.993976 | ns |
| clickbench_q24/parquet | 526426539 | 5.30821e+08 | 0.991721 | ns |
| clickbench_q25/parquet | 510890179 | 5.12806e+08 | 0.996264 | ns |
| clickbench_q26/parquet | 580709104 | 5.90394e+08 | 0.983596 | ns |
| clickbench_q27/parquet | 1626232842 | 1.61425e+09 | 1.00742 | ns |
| clickbench_q28/parquet | 11319739660 | 1.15588e+10 | 0.979314 | ns |
| clickbench_q29/parquet | 421821374 | 4.37618e+08 | 0.963903 | ns |
| clickbench_q30/parquet | 767968749 | 7.82011e+08 | 0.982043 | ns |
| clickbench_q31/parquet | 794996569 | 8.33563e+08 | 0.953733 | ns |
| clickbench_q32/parquet | 2659866590 | 2.8165e+09 | 0.944387 | ns |
| clickbench_q33/parquet | 2813819666 | 2.88288e+09 | 0.976046 | ns |
| clickbench_q34/parquet | 2818529647 | 2.81636e+09 | 1.00077 | ns |
| clickbench_q35/parquet | 833671424 | 8.61993e+08 | 0.967144 | ns |
| clickbench_q36/parquet | 170182032 | 1.75881e+08 | 0.967599 | ns |
| clickbench_q37/parquet | 85880458 | 8.66685e+07 | 0.990908 | ns |
| clickbench_q38/parquet | 113395095 | 1.14515e+08 | 0.99022 | ns |
| clickbench_q39/parquet | 320625841 | 3.23325e+08 | 0.991652 | ns |
| clickbench_q40/parquet | 48962806 | 5.106e+07 | 0.958927 | ns |
| clickbench_q41/parquet | 47311039 | 4.98389e+07 | 0.949279 | ns |
| clickbench_q42/parquet | 65708886 | 6.78308e+07 | 0.968717 | ns |
| clickbench_q00/vortex-file-compressed | 2043072 | 2.04523e+06 | 0.998946 | ns |
| clickbench_q01/vortex-file-compressed | 27501491 | 2.77991e+07 | 0.989295 | ns |
| clickbench_q02/vortex-file-compressed | 89859808 | 8.96519e+07 | 1.00232 | ns |
| clickbench_q03/vortex-file-compressed | 79485079 | 8.04188e+07 | 0.988389 | ns |
| clickbench_q04/vortex-file-compressed | 605319976 | 6.3389e+08 | 0.954929 | ns |
| clickbench_q05/vortex-file-compressed | 630397605 | 6.45317e+08 | 0.976881 | ns |
| clickbench_q06/vortex-file-compressed | 2110641 | 2.11042e+06 | 1.00011 | ns |
| clickbench_q07/vortex-file-compressed | 56224969 | 5.80498e+07 | 0.968565 | ns |
| clickbench_q08/vortex-file-compressed | 745525601 | 7.59196e+08 | 0.981994 | ns |
| clickbench_q09/vortex-file-compressed | 916160796 | 9.59783e+08 | 0.95455 | ns |
| clickbench_q10/vortex-file-compressed | 230194629 | 2.54635e+08 | 0.904018 | ns |
| clickbench_q11/vortex-file-compressed | 269788970 | 3.09823e+08 | 0.870785 | ns |
| clickbench_q12/vortex-file-compressed | 562511111 | 5.90131e+08 | 0.953196 | ns |
| clickbench_q13/vortex-file-compressed | 882211816 | 9.07128e+08 | 0.972533 | ns |
| clickbench_q14/vortex-file-compressed | 564138975 | 6.00085e+08 | 0.940099 | ns |
| clickbench_q15/vortex-file-compressed | 738437051 | 7.40349e+08 | 0.997418 | ns |
| clickbench_q16/vortex-file-compressed | 1431269965 | 1.40387e+09 | 1.01952 | ns |
| clickbench_q17/vortex-file-compressed | 1314483468 | 1.30415e+09 | 1.00792 | ns |
| clickbench_q18/vortex-file-compressed | 2789600994 | 2.93385e+09 | 0.950831 | ns |
| clickbench_q19/vortex-file-compressed | 44117154 | 4.3393e+07 | 1.01669 | ns |
| clickbench_q20/vortex-file-compressed | 478834809 | 5.0538e+08 | 0.947475 | ns |
| clickbench_q21/vortex-file-compressed | 733207473 | 7.71493e+08 | 0.950374 | ns |
| clickbench_q22/vortex-file-compressed | 1832383355 | 1.9305e+09 | 0.949177 | ns |
| clickbench_q23/vortex-file-compressed | 3892108829 | 4.00298e+09 | 0.972304 | ns |
| clickbench_q24/vortex-file-compressed | 335095688 | 3.59923e+08 | 0.931021 | ns |
| clickbench_q25/vortex-file-compressed | 299216116 | 3.22661e+08 | 0.927338 | ns |
| clickbench_q26/vortex-file-compressed | 401315980 | 4.18232e+08 | 0.959554 | ns |
| clickbench_q27/vortex-file-compressed | 1364156536 | 1.40692e+09 | 0.969604 | ns |
| clickbench_q28/vortex-file-compressed | 10633951019 | 1.07256e+10 | 0.991453 | ns |
| clickbench_q29/vortex-file-compressed | 717764676 | 6.78528e+08 | 1.05783 | ns |
| clickbench_q30/vortex-file-compressed | 562869444 | 5.9261e+08 | 0.949814 | ns |
| clickbench_q31/vortex-file-compressed | 605614495 | 6.20059e+08 | 0.976704 | ns |
| clickbench_q32/vortex-file-compressed | 2680164324 | 2.79847e+09 | 0.957724 | ns |
| clickbench_q33/vortex-file-compressed | 2163751107 | 2.22569e+09 | 0.972171 | ns |
| clickbench_q34/vortex-file-compressed | 2165919759 | 2.21627e+09 | 0.97728 | ns |
| clickbench_q35/vortex-file-compressed | 927803725 | 9.46139e+08 | 0.980621 | ns |
| clickbench_q36/vortex-file-compressed | 47981663 | 4.57112e+07 | 1.04967 | ns |
| clickbench_q37/vortex-file-compressed | 48843804 | 4.25824e+07 | 1.14704 | ns |
| clickbench_q38/vortex-file-compressed | 39393530 | 3.84227e+07 | 1.02527 | ns |
| clickbench_q39/vortex-file-compressed | 75886930 | 7.27625e+07 | 1.04294 | ns |
| clickbench_q40/vortex-file-compressed | 28818927 | 2.88393e+07 | 0.999293 | ns |
| clickbench_q41/vortex-file-compressed | 29685072 | 3.0271e+07 | 0.980645 | ns |
| clickbench_q42/vortex-file-compressed | 39372881 | 3.35341e+07 | 1.17411 | ns |

gatesn (Contributor) commented Jan 24, 2025

It's worth running this on our AVX-512 machine too, to see whether the switch point depends on SIMD width. M3s only have 128-bit vectors IIRC.

danking (Member, Author) commented Jan 30, 2025

On a c2-standard-4 (Cascade Lake), the switch points are slightly different: i8: 0.02, i16: 0.03, i32: 0.075, i64: 0.09. This PR currently uses i8: 0.02, i16: 0.03, i32: 0.04, i64: 0.04.

AVX-512 registers are 512 bits wide versus 128 bits on the M3, so 512 / 128 = 4. These tests use 10,000-element arrays, so a threshold of 0.04 is about 400 selected elements, whereas 0.075 and 0.09 are about 750 and 900.
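The arithmetic above, written out as a small Rust check; `selected_elements` and `lanes` are illustrative helpers, not part of this PR.

```rust
/// Number of elements a selectivity threshold corresponds to in an array of
/// `len` elements, rounded to the nearest element.
pub fn selected_elements(threshold: f64, len: usize) -> usize {
    (threshold * len as f64).round() as usize
}

/// How many `elem_bits`-wide lanes fit in one SIMD register of `register_bits`.
pub fn lanes(register_bits: u32, elem_bits: u32) -> u32 {
    register_bits / elem_bits
}
```

For reference, 0.075 / 0.04 = 1.875 and 0.09 / 0.04 = 2.25, so the Cascade Lake switch points moved by less than the 4x difference in register width.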

I'm not sure there's a robust way to pick this threshold without benchmarking on the target machine. I'd be happy to raise the thresholds for the 4- and 8-byte types to 0.075 and 0.09. On an Apple M3 that is 20-35% slower, but we're talking about 2.0 µs vs 1.5 µs.

CPU

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Xeon(R) CPU @ 3.10GHz
stepping	: 7
microcode	: 0xffffffff
cpu MHz		: 3100.326
cache size	: 25344 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs taa mmio_stale_data retbleed eibrs_pbrsb bhi
bogomips	: 6200.65
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
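One hypothetical follow-up to gatesn's question is to pick a threshold table at runtime from the detected SIMD support instead of hard-coding one. This sketch assumes that AVX-512F availability is a reasonable proxy for "behaves like the Cascade Lake machine", which has not been benchmarked; `SwitchPoints`, `has_avx512`, and `detect_switch_points` are illustrative names, and the values are the ones reported in this comment. `is_x86_feature_detected!` is the Rust standard library's runtime CPU feature check.

```rust
/// Per-width switch points (selectivity at which full unpacking starts to win).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SwitchPoints {
    pub w8: f64,
    pub w16: f64,
    pub w32: f64,
    pub w64: f64,
}

/// Apple M3 Max (128-bit NEON) switch points reported in this PR.
pub const APPLE_M3_MAX: SwitchPoints = SwitchPoints { w8: 0.02, w16: 0.03, w32: 0.04, w64: 0.04 };

/// c2-standard-4 (Cascade Lake, AVX-512) switch points reported in this comment.
pub const CASCADE_LAKE: SwitchPoints = SwitchPoints { w8: 0.02, w16: 0.03, w32: 0.075, w64: 0.09 };

/// Runtime check for AVX-512 Foundation on x86/x86_64.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn has_avx512() -> bool {
    std::is_x86_feature_detected!("avx512f")
}

/// Other architectures (e.g. aarch64 on Apple silicon) have no AVX-512.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn has_avx512() -> bool {
    false
}

/// ASSUMPTION: AVX-512F presence is used as a proxy for Cascade Lake-like behavior.
pub fn detect_switch_points() -> SwitchPoints {
    if has_avx512() {
        CASCADE_LAKE
    } else {
        APPLE_M3_MAX
    }
}
```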

…5% and 80%

This new benchmark demonstrates that the switchpoint is in [0.02, 0.04]. 8-bit elements
switch around 0.02, but 32- and 64-bit elements switch around 0.04.

[Google Sheet with one run on my Apple M3 Max](https://docs.google.com/spreadsheets/d/1T4JeSLnpFegA_pRS70iNu4ve9YMjEu-j1vRL7spazoA/edit?gid=624487667#gid=624487667).
danking force-pushed the dk/bitpacking-filter-selection-threshold branch from 02cab19 to 369f8cf January 30, 2025 23:24
danking (Member, Author) commented Jan 30, 2025

Okay, I went with the Cascade Lake thresholds, as those are best for our benchmarks. I wish I had a more principled way to choose them, or some way to tune them to the current CPU.
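One hypothetical way to tune the thresholds to the current CPU is to time both strategies over a few candidate selectivities (at startup or in a separate calibration step) and take the first selectivity at which full unpacking is no slower. This is a sketch under stated assumptions: the cost closures stand in for real filter kernels, candidates are assumed sorted, and neither function exists in Vortex.

```rust
use std::time::Instant;

/// Median wall-clock time of `runs` calls to `f`, in nanoseconds.
/// A real calibration would also warm up caches and keep the optimizer from
/// removing the work (e.g. with `std::hint::black_box`).
pub fn median_nanos(runs: usize, mut f: impl FnMut()) -> f64 {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<u128> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_nanos()
        })
        .collect();
    samples.sort_unstable();
    samples[samples.len() / 2] as f64
}

/// Returns the smallest candidate selectivity at which full unpacking costs no
/// more than per-element unpacking, or `None` if it never does.
/// `candidates` is assumed to be sorted in increasing order.
pub fn find_switch_point<P, F>(candidates: &[f64], mut per_element: P, mut full_unpack: F) -> Option<f64>
where
    P: FnMut(f64) -> f64,
    F: FnMut(f64) -> f64,
{
    candidates
        .iter()
        .copied()
        .find(|&s| full_unpack(s) <= per_element(s))
}
```

In practice, each cost closure would wrap `median_nanos` around a real filter of a bit-packed array at selectivity `s`, once per element width.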

danking requested a review from robert3005 January 30, 2025 23:40