feat: faster bitpacking filter for selectivities from 5% to 80% #2068

danking · 2025-01-24T16:11:18Z

On an Apple M3 Max, the new benchmark in this PR demonstrates that the switchpoint is in [0.02, 0.04]. 8-bit, 16-bit, 32-bit, and 64-bit arrays switch at 0.02, 0.03, 0.04, and 0.04, respectively. I later ran the benchmarks on a Cascade Lake cloud machine and found that the 8-bit, 16-bit, 32-bit, and 64-bit arrays switch at 0.03, 0.03, 0.075 and 0.09, respectively. In this PR, I use the Cascade Lake values, but I don't have a great answer for how to pick these. Regardless, it is clear that 0.8 is not the correct choice.
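The switch points above can be written as a small lookup keyed by element width. This is a minimal sketch, not the code in this PR: the names `FilterStrategy`, `switch_point`, and `choose_strategy` are hypothetical, and the direction of the switch (a per-element path below the threshold, a bulk path at or above it) is an assumption, since this thread does not name the two code paths. The threshold values are the Cascade Lake numbers quoted above.

```rust
/// Hypothetical names for the two code paths; the thread does not name them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FilterStrategy {
    /// Assumed: visit only the selected elements (cheaper when few are kept).
    PerElement,
    /// Assumed: decode everything, then filter (cheaper when many are kept).
    Bulk,
}

/// Cascade Lake switch points from the PR description, keyed by element
/// width in bytes. Returns `None` for widths that were not benchmarked.
fn switch_point(byte_width: usize) -> Option<f64> {
    match byte_width {
        1 | 2 => Some(0.03),
        4 => Some(0.075),
        8 => Some(0.09),
        _ => None,
    }
}

/// Pick a strategy from the fraction of elements the filter keeps.
fn choose_strategy(byte_width: usize, selected: usize, len: usize) -> Option<FilterStrategy> {
    let threshold = switch_point(byte_width)?;
    if len == 0 {
        // Nothing to filter; either path is trivially fine.
        return Some(FilterStrategy::PerElement);
    }
    let selectivity = selected as f64 / len as f64;
    Some(if selectivity < threshold {
        FilterStrategy::PerElement
    } else {
        FilterStrategy::Bulk
    })
}
```

Under these assumptions, keeping 400 of 10,000 elements (selectivity 0.04) selects `PerElement` for 4-byte elements but `Bulk` for 1-byte elements.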

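[Google Sheet with one run on my Apple M3 Max](https://docs.google.com/spreadsheets/d/1T4JeSLnpFegA_pRS70iNu4ve9YMjEu-j1vRL7spazoA/edit?gid=624487667#gid=624487667). A sheet with a run on a c2-standard-4 "Cascade Lake".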

github-actions · 2025-01-24T16:13:35Z

Benchmarks: random_access

Table of Results

name	PR `54939b7`	base `1d36132`	ratio (PR/base)	unit
random-access/vortex-tokio-local-disk	2.65975e+06	2.62011e+06	1.01513	ns
random-access/vortex-local-fs	3.48053e+06	3.30901e+06	1.05184	ns
random-access/parquet-tokio-local-disk	2.22848e+08	2.21792e+08	1.00476	ns

github-actions · 2025-01-24T16:14:24Z

Benchmarks: datafusion

Table of Results

name	PR `54939b7`	base `1d36132`	ratio (PR/base)	unit
arrow/planning	944448	955104	0.988842	ns
arrow/exec	1.97268e+06	2.01344e+06	0.979753	ns
vortex-compressed/planning	584257	586488	0.996197	ns
vortex-compressed/exec	2.66778e+06	2.71034e+06	0.984296	ns
vortex-uncompressed/planning	582316	585229	0.995023	ns
vortex-uncompressed/exec	1.54471e+06	1.55866e+06	0.991045	ns

github-actions · 2025-01-24T16:23:30Z

Benchmarks: TPC-H

Table of Results

name	PR `54939b7`	base `1d36132`	ratio (PR/base)	unit
tpch_q01/arrow	547800499	5.49082e+08	0.997667	ns
tpch_q01/parquet	761213273	7.72487e+08	0.985406	ns
tpch_q01/vortex-file-compressed	529961750	5.3619e+08	0.988384	ns
tpch_q02/arrow	143821621	1.42769e+08	1.00738	ns
tpch_q02/parquet	172094379	1.77336e+08	0.970442	ns
tpch_q02/vortex-file-compressed	149011415	1.50431e+08	0.990566	ns
tpch_q03/arrow	171379880	1.75393e+08	0.977117	ns
tpch_q03/parquet	376653704	3.76328e+08	1.00087	ns
tpch_q03/vortex-file-compressed	238082621	2.36388e+08	1.00717	ns
tpch_q04/arrow	183226396	1.76527e+08	1.03795	ns
tpch_q04/parquet	215805183	2.24674e+08	0.960527	ns
tpch_q04/vortex-file-compressed	173762703	1.75805e+08	0.988384	ns
tpch_q05/arrow	330982097	3.2567e+08	1.01631	ns
tpch_q05/parquet	512604462	5.05141e+08	1.01478	ns
tpch_q05/vortex-file-compressed	384863327	3.78238e+08	1.01752	ns
tpch_q06/arrow	26567583	2.59607e+07	1.02338	ns
tpch_q06/parquet	151462030	1.50801e+08	1.00438	ns
tpch_q06/vortex-file-compressed	67367186	6.52746e+07	1.03206	ns
tpch_q07/arrow	631669487	6.31055e+08	1.00097	ns
tpch_q07/parquet	774660723	7.70841e+08	1.00496	ns
tpch_q07/vortex-file-compressed	632176675	6.50893e+08	0.971244	ns
tpch_q08/arrow	267848545	2.73412e+08	0.979652	ns
tpch_q08/parquet	546685616	5.57444e+08	0.9807	ns
tpch_q08/vortex-file-compressed	363525381	3.61058e+08	1.00684	ns
tpch_q09/arrow	479558608	4.82854e+08	0.993176	ns
tpch_q09/parquet	790404477	7.98464e+08	0.989906	ns
tpch_q09/vortex-file-compressed	585974209	6.0579e+08	0.967289	ns
tpch_q10/arrow	264222219	2.60901e+08	1.01273	ns
tpch_q10/parquet	505876154	5.09425e+08	0.993033	ns
tpch_q10/vortex-file-compressed	297643548	2.83051e+08	1.05155	ns
tpch_q11/arrow	138966488	1.38797e+08	1.00122	ns
tpch_q11/parquet	146607649	1.47443e+08	0.994333	ns
tpch_q11/vortex-file-compressed	136526851	1.34036e+08	1.01858	ns
tpch_q12/arrow	182628380	1.82964e+08	0.998168	ns
tpch_q12/parquet	328121589	3.25584e+08	1.00779	ns
tpch_q12/vortex-file-compressed	258061494	2.60808e+08	0.989467	ns
tpch_q13/arrow	164013711	1.68035e+08	0.976067	ns
tpch_q13/parquet	323595452	3.1406e+08	1.03036	ns
tpch_q13/vortex-file-compressed	181248608	1.79007e+08	1.01252	ns
tpch_q14/arrow	36563942	3.75003e+07	0.975032	ns
tpch_q14/parquet	233058875	2.36181e+08	0.986779	ns
tpch_q14/vortex-file-compressed	77304550	7.9851e+07	0.96811	ns
tpch_q15/arrow	68945453	6.82557e+07	1.01011	ns
tpch_q15/parquet	328139005	3.27624e+08	1.00157	ns
tpch_q15/vortex-file-compressed	133367156	1.34481e+08	0.991718	ns
tpch_q16/arrow	99755875	9.88606e+07	1.00906	ns
tpch_q16/parquet	115005139	1.11792e+08	1.02874	ns
tpch_q16/vortex-file-compressed	105526763	1.03176e+08	1.02279	ns
tpch_q17/arrow	613821975	6.20673e+08	0.988962	ns
tpch_q17/parquet	692465231	7.0725e+08	0.979096	ns
tpch_q17/vortex-file-compressed	612399764	6.19487e+08	0.98856	ns
tpch_q18/arrow	1117044491	1.13064e+09	0.987979	ns
tpch_q18/parquet	1348111685	1.33451e+09	1.0102	ns
tpch_q18/vortex-file-compressed	1163857866	1.17145e+09	0.993521	ns
tpch_q19/arrow	151891883	1.50214e+08	1.01117	ns
tpch_q19/parquet	417241174	4.18875e+08	0.9961	ns
tpch_q19/vortex-file-compressed	152355487	1.50768e+08	1.01053	ns
tpch_q20/arrow	179165681	1.80828e+08	0.990805	ns
tpch_q20/parquet	325351565	3.19009e+08	1.01988	ns
tpch_q20/vortex-file-compressed	217544175	2.16685e+08	1.00397	ns
tpch_q21/arrow	994769082	9.96145e+08	0.998619	ns
tpch_q21/parquet	1128795204	1.12573e+09	1.00272	ns
tpch_q21/vortex-file-compressed	991105311	9.90732e+08	1.00038	ns
tpch_q22/arrow	78976075	7.7492e+07	1.01915	ns
tpch_q22/parquet	109807235	1.08957e+08	1.0078	ns
tpch_q22/vortex-file-compressed	86777284	8.50379e+07	1.02045	ns

github-actions · 2025-01-24T16:31:52Z

Benchmarks: Clickbench

Table of Results

name	PR `54939b7`	base `1d36132`	ratio (PR/base)	unit
clickbench_q00/parquet	1812361	1.84216e+06	0.983823	ns
clickbench_q01/parquet	59447320	6.10587e+07	0.97361	ns
clickbench_q02/parquet	118267975	1.17718e+08	1.00467	ns
clickbench_q03/parquet	82645269	8.19706e+07	1.00823	ns
clickbench_q04/parquet	648780349	6.64052e+08	0.977002	ns
clickbench_q05/parquet	820618160	8.30202e+08	0.988456	ns
clickbench_q06/parquet	1904141	1.94516e+06	0.978915	ns
clickbench_q07/parquet	62664363	6.26823e+07	0.999714	ns
clickbench_q08/parquet	746431796	7.45145e+08	1.00173	ns
clickbench_q09/parquet	1028415769	1.04093e+09	0.98798	ns
clickbench_q10/parquet	246965887	2.53184e+08	0.975441	ns
clickbench_q11/parquet	295846666	3.05789e+08	0.967487	ns
clickbench_q12/parquet	826393876	8.15683e+08	1.01313	ns
clickbench_q13/parquet	1061178366	1.06036e+09	1.00077	ns
clickbench_q14/parquet	829058598	8.40992e+08	0.98581	ns
clickbench_q15/parquet	725710604	7.73472e+08	0.93825	ns
clickbench_q16/parquet	1612486562	1.65755e+09	0.972813	ns
clickbench_q17/parquet	1428987576	1.43566e+09	0.995353	ns
clickbench_q18/parquet	2970447433	3.00494e+09	0.988521	ns
clickbench_q19/parquet	64936909	6.41428e+07	1.01238	ns
clickbench_q20/parquet	1184904908	1.19311e+09	0.993121	ns
clickbench_q21/parquet	1453477813	1.42357e+09	1.02101	ns
clickbench_q22/parquet	2426486410	2.44051e+09	0.994253	ns
clickbench_q23/parquet	8272938593	8.32308e+09	0.993976	ns
clickbench_q24/parquet	526426539	5.30821e+08	0.991721	ns
clickbench_q25/parquet	510890179	5.12806e+08	0.996264	ns
clickbench_q26/parquet	580709104	5.90394e+08	0.983596	ns
clickbench_q27/parquet	1626232842	1.61425e+09	1.00742	ns
clickbench_q28/parquet	11319739660	1.15588e+10	0.979314	ns
clickbench_q29/parquet	421821374	4.37618e+08	0.963903	ns
clickbench_q30/parquet	767968749	7.82011e+08	0.982043	ns
clickbench_q31/parquet	794996569	8.33563e+08	0.953733	ns
clickbench_q32/parquet	2659866590	2.8165e+09	0.944387	ns
clickbench_q33/parquet	2813819666	2.88288e+09	0.976046	ns
clickbench_q34/parquet	2818529647	2.81636e+09	1.00077	ns
clickbench_q35/parquet	833671424	8.61993e+08	0.967144	ns
clickbench_q36/parquet	170182032	1.75881e+08	0.967599	ns
clickbench_q37/parquet	85880458	8.66685e+07	0.990908	ns
clickbench_q38/parquet	113395095	1.14515e+08	0.99022	ns
clickbench_q39/parquet	320625841	3.23325e+08	0.991652	ns
clickbench_q40/parquet	48962806	5.106e+07	0.958927	ns
clickbench_q41/parquet	47311039	4.98389e+07	0.949279	ns
clickbench_q42/parquet	65708886	6.78308e+07	0.968717	ns
clickbench_q00/vortex-file-compressed	2043072	2.04523e+06	0.998946	ns
clickbench_q01/vortex-file-compressed	27501491	2.77991e+07	0.989295	ns
clickbench_q02/vortex-file-compressed	89859808	8.96519e+07	1.00232	ns
clickbench_q03/vortex-file-compressed	79485079	8.04188e+07	0.988389	ns
clickbench_q04/vortex-file-compressed	605319976	6.3389e+08	0.954929	ns
clickbench_q05/vortex-file-compressed	630397605	6.45317e+08	0.976881	ns
clickbench_q06/vortex-file-compressed	2110641	2.11042e+06	1.00011	ns
clickbench_q07/vortex-file-compressed	56224969	5.80498e+07	0.968565	ns
clickbench_q08/vortex-file-compressed	745525601	7.59196e+08	0.981994	ns
clickbench_q09/vortex-file-compressed	916160796	9.59783e+08	0.95455	ns
clickbench_q10/vortex-file-compressed	230194629	2.54635e+08	0.904018	ns
clickbench_q11/vortex-file-compressed	269788970	3.09823e+08	0.870785	ns
clickbench_q12/vortex-file-compressed	562511111	5.90131e+08	0.953196	ns
clickbench_q13/vortex-file-compressed	882211816	9.07128e+08	0.972533	ns
clickbench_q14/vortex-file-compressed	564138975	6.00085e+08	0.940099	ns
clickbench_q15/vortex-file-compressed	738437051	7.40349e+08	0.997418	ns
clickbench_q16/vortex-file-compressed	1431269965	1.40387e+09	1.01952	ns
clickbench_q17/vortex-file-compressed	1314483468	1.30415e+09	1.00792	ns
clickbench_q18/vortex-file-compressed	2789600994	2.93385e+09	0.950831	ns
clickbench_q19/vortex-file-compressed	44117154	4.3393e+07	1.01669	ns
clickbench_q20/vortex-file-compressed	478834809	5.0538e+08	0.947475	ns
clickbench_q21/vortex-file-compressed	733207473	7.71493e+08	0.950374	ns
clickbench_q22/vortex-file-compressed	1832383355	1.9305e+09	0.949177	ns
clickbench_q23/vortex-file-compressed	3892108829	4.00298e+09	0.972304	ns
clickbench_q24/vortex-file-compressed	335095688	3.59923e+08	0.931021	ns
clickbench_q25/vortex-file-compressed	299216116	3.22661e+08	0.927338	ns
clickbench_q26/vortex-file-compressed	401315980	4.18232e+08	0.959554	ns
clickbench_q27/vortex-file-compressed	1364156536	1.40692e+09	0.969604	ns
clickbench_q28/vortex-file-compressed	10633951019	1.07256e+10	0.991453	ns
clickbench_q29/vortex-file-compressed	717764676	6.78528e+08	1.05783	ns
clickbench_q30/vortex-file-compressed	562869444	5.9261e+08	0.949814	ns
clickbench_q31/vortex-file-compressed	605614495	6.20059e+08	0.976704	ns
clickbench_q32/vortex-file-compressed	2680164324	2.79847e+09	0.957724	ns
clickbench_q33/vortex-file-compressed	2163751107	2.22569e+09	0.972171	ns
clickbench_q34/vortex-file-compressed	2165919759	2.21627e+09	0.97728	ns
clickbench_q35/vortex-file-compressed	927803725	9.46139e+08	0.980621	ns
clickbench_q36/vortex-file-compressed	47981663	4.57112e+07	1.04967	ns
clickbench_q37/vortex-file-compressed	48843804	4.25824e+07	1.14704	ns
clickbench_q38/vortex-file-compressed	39393530	3.84227e+07	1.02527	ns
clickbench_q39/vortex-file-compressed	75886930	7.27625e+07	1.04294	ns
clickbench_q40/vortex-file-compressed	28818927	2.88393e+07	0.999293	ns
clickbench_q41/vortex-file-compressed	29685072	3.0271e+07	0.980645	ns
clickbench_q42/vortex-file-compressed	39372881	3.35341e+07	1.17411	ns

gatesn · 2025-01-24T16:36:21Z

It's worth running this on our AVX-512 machine too, to see if the switch point depends on SIMD width. M3s only have 128-bit SIMD, IIRC.

danking · 2025-01-30T18:57:46Z

On a c2-standard-4 (Cascade Lake), the switch points are slightly different. They appear to be i8: 0.02, i16: 0.03, i32: 0.075, i64: 0.09. This PR uses i8: 0.02, i16: 0.03, i32: 0.04, i64: 0.04.

AVX-512 registers are 512 bits wide versus 128 bits on the M3, a factor of 512 / 128 = 4. These tests use 10,000-element arrays, so 0.04 is around 400 selected elements, whereas 0.075 and 0.09 are around 750 and 900.
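The arithmetic in this comment can be checked with a few lines of Rust. This is only an illustrative sketch: `selected_elements` and `width_ratio` are hypothetical helper names, and the two register widths are the nominal AVX-512 and NEON sizes rather than anything measured here.

```rust
/// Nominal SIMD register widths in bits (AVX-512 on Cascade Lake, NEON on the M3).
const AVX512_BITS: u32 = 512;
const NEON_BITS: u32 = 128;

/// Ratio of the two register widths.
fn width_ratio() -> u32 {
    AVX512_BITS / NEON_BITS
}

/// Number of elements kept when filtering `len` elements at `selectivity`.
/// Rounds to the nearest whole element to absorb floating-point error.
fn selected_elements(len: usize, selectivity: f64) -> usize {
    (len as f64 * selectivity).round() as usize
}
```

For the 10,000-element benchmark arrays, `selected_elements(10_000, 0.04)` gives 400, and the Cascade Lake switch points 0.075 and 0.09 give 750 and 900.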

I'm not sure there's a robust way to pick this threshold without benchmarking on the target machine. I'd be happy to push the 4- and 8-byte types up to 0.075 and 0.09. On an Apple M3 this is 20-35% slower, but we're talking about 2.0 µs vs 1.5 µs.

CPU

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Xeon(R) CPU @ 3.10GHz
stepping	: 7
microcode	: 0xffffffff
cpu MHz		: 3100.326
cache size	: 25344 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
bugs		: spectre_v1 spectre_v2 spec_store_bypass swapgs taa mmio_stale_data retbleed eibrs_pbrsb bhi
bogomips	: 6200.65
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

…5% and 80%

This new benchmark demonstrates that the switchpoint is in [0.02, 0.04]. 8-bit elements switch around 0.02, but 32- and 64-bit elements switch around 0.04. [Google Sheet with one run on my Apple M3 Max](https://docs.google.com/spreadsheets/d/1T4JeSLnpFegA_pRS70iNu4ve9YMjEu-j1vRL7spazoA/edit?gid=624487667#gid=624487667).

danking · 2025-01-30T23:25:45Z

Okay, I went with the Cascade Lake thresholds, as those are best for our benchmarks. I wish I had a more principled way to write them down or some way to tune to the current CPU.
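One possible shape for tuning to the current CPU is runtime feature detection. The sketch below is not what this PR does (the PR hard-codes the Cascade Lake values). It assumes, without evidence beyond the two machines measured in this thread, that the switch point tracks SIMD width. `widest_simd_bits` and `thresholds_for` are hypothetical names; `std::arch::is_x86_feature_detected!` is the standard-library macro.

```rust
/// Switch points for 1-, 2-, 4-, and 8-byte elements, as reported in this thread.
const CASCADE_LAKE: [f64; 4] = [0.03, 0.03, 0.075, 0.09];
const APPLE_M3: [f64; 4] = [0.02, 0.03, 0.04, 0.04];

/// Best-effort guess at the widest SIMD register available, in bits.
/// The 128-bit fallback is an assumption for non-x86 targets (e.g. NEON).
fn widest_simd_bits() -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return 512;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return 256;
        }
    }
    128
}

/// Assumed policy: use the Cascade Lake numbers on 512-bit machines and the
/// M3 numbers everywhere else. 256-bit machines were never benchmarked in
/// this thread, so their mapping here is a guess.
fn thresholds_for(simd_bits: u32) -> [f64; 4] {
    if simd_bits >= 512 {
        CASCADE_LAKE
    } else {
        APPLE_M3
    }
}
```

A caller would compute `thresholds_for(widest_simd_bits())` once at startup. Benchmarking on the target machine remains the only approach this thread actually validates.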

danking added the `benchmark` label (Run benchmarks on this branch) on Jan 24, 2025

github-actions bot removed the `benchmark` label (Run benchmarks on this branch) on Jan 24, 2025

danking added 2 commits January 30, 2025 18:19

use Cascade Lake thresholds and fix compilation errors

369f8cf

danking force-pushed the dk/bitpacking-filter-selection-threshold branch from 02cab19 to 369f8cf January 30, 2025 23:24

danking added 6 commits January 30, 2025 18:30

fix import after merge

017319c

revert unnecessary changes

eff7d11

revert unnecessary changes

bec1e61

revert unnecessary changes

c426d85

fix names

08c047f

fix

e09c8b8

danking requested a review from robert3005 January 30, 2025 23:40

1-byte is also 0.03

b6418d1
