Improve rank fusion implementation in Anserini #2728

Open
lintool opened this issue Feb 17, 2025 · 1 comment

lintool commented Feb 17, 2025

Rank fusion is the process of combining two or more ranked lists into a single, better ranking. It is sometimes also called "hybrid search".
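To make the idea concrete, here is a minimal sketch of one common fusion method, reciprocal rank fusion (RRF). This is an illustration only: the issue does not say which fusion methods Anserini implements or should implement, and the function name, the constant `k=60`, and the document ids below are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc ids (best first) using reciprocal rank fusion.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, docid in enumerate(ranking, start=1):
            scores[docid] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical doc ids standing in for a BM25 run and a dense run.
bm25_run = ["d1", "d2", "d3"]
dense_run = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_run, dense_run])
print([docid for docid, _ in fused])
```

Documents that rank well in both lists (here `d1` and `d3`) rise above documents that appear in only one, which is the behavior one usually wants when combining a sparse and a dense retriever.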

I need help improving rank fusion features in Anserini. This would be a good URA project.

Start with the following:

Building the indexes from scratch requires downloading the huge parquet tarball, so let's use the prebuilt indexes instead.
Running these two search commands downloads them:

bin/run.sh io.anserini.search.SearchCollection -index beir-v1.0.0-robust04.flat -topics beir-robust04 -output runs/run.beir.flat.robust04.txt -bm25 -removeQuery
bin/run.sh io.anserini.search.SearchFlatDenseVectors -index beir-v1.0.0-robust04.bge-base-en-v1.5.flat -topics beir-robust04.bge-base-en-v1.5 -output runs/run.beir.bge-base-en-v1.5.flat.cached_q.robust04.txt -threads 16 -removeQuery

Copy them from the download cache into Anserini's indexes/ folder:

cp -r ~/.cache/pyserini/indexes/lucene-inverted.beir-v1.0.0-robust04.flat.20221116.505594.d508fc770002a99a5dc3da3d0fa001b7/ ./indexes/lucene-inverted.beir-v1.0.0-robust04.flat
cp -r ~/.cache/pyserini/indexes/lucene-flat.beir-v1.0.0-robust04.bge-base-en-v1.5.20240618.6cf601.7750b4abbc60fe821c5948a81296f1d0/ ./indexes/lucene-flat.beir-v1.0.0-robust04.bge-base-en-v1.5

With the indexes in place, these regression scripts should now run:

python src/main/python/run_regression.py --verify --search --regression beir-v1.0.0-robust04.flat
python src/main/python/run_regression.py --verify --search --regression beir-v1.0.0-robust04.bge-base-en-v1.5.parquet.flat.cached

python src/main/python/run_fusion_regression.py --regression beir-v1.0.0-robust04
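
For experimenting with fusion outside the regression scripts, the run files written under runs/ by the search commands can be loaded into per-query ranked lists. The sketch below assumes those files use the standard six-column TREC run format (`qid Q0 docid rank score tag`); the function name and the sample lines are hypothetical and only illustrate the parsing.

```python
from collections import defaultdict


def parse_trec_run(lines):
    """Parse TREC-format run lines into {qid: [(docid, score), ...]}.

    Assumes six whitespace-separated columns: qid Q0 docid rank score tag.
    Lines with any other number of columns are skipped.
    """
    runs = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue
        qid, _q0, docid, _rank, score, _tag = parts
        runs[qid].append((docid, float(score)))
    # Order each query's hits by score, best first.
    for qid in runs:
        runs[qid].sort(key=lambda pair: -pair[1])
    return dict(runs)


# Hypothetical sample lines; real input would come from a file in runs/.
sample = """301 Q0 FBIS3-1 1 12.5 Anserini
301 Q0 FT921-2 2 11.0 Anserini
302 Q0 LA0101-3 1 9.75 Anserini"""
run = parse_trec_run(sample.splitlines())
print(run["301"])
```

For a real run, pass an open file handle instead of `sample.splitlines()`, since iterating over a file yields one line at a time.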

lilyjge commented Feb 23, 2025

I'm working on it!
