biodatageeks/sequila-native

sequila-native

A set of native implementations of common bioinformatics algorithms, intended for use as Arrow-DataFusion or SeQuiLa (Apache Spark) extensions.

RUSTFLAGS="-C target-cpu=native" RUST_LOG=info cargo run --release

Run a SQL file

RUST_LOG=info cargo run -p sequila-cli -- --file queries/q1-coitrees.sql

Perf

https://docs.rs/crate/flamegraph/0.6.5

On Arch Linux

sudo pacman -S perf gcc-libs glibc
cargo install flamegraph
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

cargo build --release
flamegraph -- target/release/sequila-cli -f queries/q1-coitrees.sql

Recommended parameters

SET sequila.prefer_interval_join TO true;
SET sequila.interval_join_algorithm TO coitrees;
SET datafusion.optimizer.repartition_joins TO false;
SET datafusion.execution.coalesce_batches TO false;

-- for controlling the parallelism level (for benchmarking purposes only; otherwise use the defaults)
SET datafusion.execution.target_partitions=1;

How to run the benchmark locally:

  1. Download and unpack the test dataset.
  2. Export an environment variable with the path to the root folder containing the benchmark data, e.g.:
export BENCH_DATA_ROOT=/Users/mwiewior/research/databio/
  3. Run the benchmark:
RUSTFLAGS="-Ctarget-cpu=native" cargo bench --bench databio_benchmark -- --quick
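Because the benchmark reads its input from BENCH_DATA_ROOT, it can help to check that the variable is set and points at an existing directory before starting a long run. The helper below is a minimal sketch of such a check. The function name check_bench_data_root is hypothetical and not part of sequila-native, and it only checks that the directory exists, since the expected layout of the dataset is not described here.

```shell
# Hypothetical helper (not part of sequila-native): verify that
# BENCH_DATA_ROOT is set and names an existing directory.
# Returns 0 on success, 1 otherwise, with a message on stderr.
check_bench_data_root() {
    if [ -z "${BENCH_DATA_ROOT:-}" ]; then
        echo "BENCH_DATA_ROOT is not set" >&2
        return 1
    fi
    if [ ! -d "$BENCH_DATA_ROOT" ]; then
        echo "BENCH_DATA_ROOT is not a directory: $BENCH_DATA_ROOT" >&2
        return 1
    fi
    return 0
}

# Possible usage, run the benchmark only if the check passes:
#   check_bench_data_root && \
#     RUSTFLAGS="-Ctarget-cpu=native" cargo bench --bench databio_benchmark -- --quick
```

The check is kept in a function so it can be sourced from a shell profile or a wrapper script without triggering the benchmark itself.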
