- Ustalov, D. Reliable, Reproducible, and Really Fast Leaderboards with Evalica. 2024. arXiv: 2412.11314 [cs.CL].
requirements.txt
- Chatbot Arena's Dump (August 2024): https://storage.googleapis.com/arena_external_data/public/clean_battle_20240814_public.json
- LLMFAO Dataset: https://raw.githubusercontent.com/dustalov/llmfao/refs/heads/master/crowd-comparisons.csv → llmfao.csv
- Table 1: chatbot_arena.csv (`python3 -m chatbot_arena`)
- Table 2: rust_python.csv (`python3 -m rust_python`)
- Figure 3: scale.csv (`python3 -m scale_data`, `python3 -m scale_compute`)