This project is built on Naru's code: https://github.com/naru-project/naru. Huge thanks to its authors.
- We recommend Python 3.10.9 on Windows 10 or Ubuntu. Install the dependencies and set up the sampler:
```bash
pip3 install -r requirements.txt
python3 ./MySampler/setup.py
```
- Download the DMV dataset used by Naru: https://github.com/naru-project/naru
- Download the Kddcup98 and Census datasets used by UAE: https://github.com/pagegitss/UAE
- Put `Vehicle__Snowmobile__and_Boat_Registrations.csv`, `cup98.csv`, and `census.csv` into `./datasets`, so the layout matches the sketch below.
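A quick way to verify the layout (assuming the repository root is the working directory):
```bash
# Expected contents of ./datasets after this step:
ls ./datasets
# Vehicle__Snowmobile__and_Boat_Registrations.csv
# census.csv
# cup98.csv
```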
- Run the following to generate all workloads:
```bash
python3 generate_all_workload_gpu.py
```
- Run the following to generate the queries used to evaluate scalability:
```bash
python3 generate_train_workload_gpu_npred.py
```
- Note that the 100000 queries with seed 42 are for training, the 2000 queries with seed 42 are the In-Workload Queries for testing, and the 2000 queries with seed 1234 are the Random Queries for testing (see the naming sketch below).
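The generated query files appear to be named after the dataset, query count, and seed; the pattern below is inferred from the `--load_queries` value in the evaluation command further down and is an assumption rather than documented behavior:
```bash
# Hypothetical naming pattern (inferred, not documented):
#   <dataset>-<N>queries-oracle-cards-seed<SEED>.pkl
# For DMV this gives, e.g.:
#   dmv-2000queries-oracle-cards-seed42.pkl    # In-Workload Queries (testing)
#   dmv-2000queries-oracle-cards-seed1234.pkl  # Random Queries (testing)
```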
- For DMV, run:
```bash
python3 train_model.py --num-queries=100000 --dataset=dmv --epochs=50 --warmups=12000 --bs=2048 --expand-factor=4 --layers=0 --direct-io --input-encoding=binary --output-encoding=one_hot --multi_pred_embedding=mlp --use-workloads --tag=dmv_mlp_binary_Workloads --gpu-id=0
```
- For Kddcup98, run:
```bash
python3 train_model.py --num-queries=100000 --dataset=cup98 --epochs=50 --warmups=12000 --bs=100 --expand-factor=4 --layers=2 --fc-hiddens=128 --residual --direct-io --input-encoding=binary --output-encoding=one_hot --multi_pred_embedding=mlp --use-workloads --tag=cup98_mlp_binary_Workloads --gpu-id=0
```
- For Census, run:
```bash
python3 train_model.py --num-queries=100000 --dataset=census --epochs=50 --warmups=12000 --bs=100 --expand-factor=4 --layers=2 --fc-hiddens=128 --residual --direct-io --input-encoding=binary --output-encoding=one_hot --multi_pred_embedding=mlp --use-workloads --tag=census_mlp_binary_Workloads --gpu-id=0
```
- For Duet's data-driven version, remove the `--use-workloads` option (a sketch for DMV follows).
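For example, the DMV training command for the data-driven version would look like this, keeping all other options unchanged (the `--tag` value here is our guess, chosen to line up with the tag used in the evaluation example below):
```bash
python3 train_model.py --num-queries=100000 --dataset=dmv --epochs=50 --warmups=12000 --bs=2048 --expand-factor=4 --layers=0 --direct-io --input-encoding=binary --output-encoding=one_hot --multi_pred_embedding=mlp --tag=dmv_mlp_binary_noWorkloads --gpu-id=0
```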
- Run:
```bash
python3 run_eval_npred.py
```
- Run the following to draw the scalability plot in the same format as in our paper:
```bash
python3 draw_nfilter_curve.py
```
- We provide code to evaluate the error of every epoch, as well as the result of the epoch at which the model achieves the minimum loss.
- Taking DMV as an example, run:
```bash
python3 eval_model.py --dataset=dmv --load_queries=dmv-2000queries-oracle-cards-seed1234.pkl --glob=dmv-16.3MB-data19.550-made-hidden512_256_512_128_1024-emb32-directIo-binaryInone_hotOut-inputNoEmbIfLeq-mlp-seed0 --layers=0 --direct_io --input_encoding=binary --output_encoding=one_hot --multi_pred_embedding=mlp --tag=dmv_mlp_binary_noWorkloads --gpu_id=0 --end_epoch=50
```
- For the `load_queries` option, change the seed from 1234 to 42 to switch the test workload from Random Queries to In-Workload Queries (see the example below).
- For the `glob` option, use the model's name in the format shown above.
- Set the remaining options to match the training options above.
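For example, evaluating the same DMV model on the In-Workload Queries would only change `--load_queries`; everything else is kept from the command above:
```bash
python3 eval_model.py --dataset=dmv --load_queries=dmv-2000queries-oracle-cards-seed42.pkl --glob=dmv-16.3MB-data19.550-made-hidden512_256_512_128_1024-emb32-directIo-binaryInone_hotOut-inputNoEmbIfLeq-mlp-seed0 --layers=0 --direct_io --input_encoding=binary --output_encoding=one_hot --multi_pred_embedding=mlp --tag=dmv_mlp_binary_noWorkloads --gpu_id=0 --end_epoch=50
```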