- `problems`: benchmark problems (as Julia files with a particular structure; see `problems/0_trivial.jl`)
- `jlbench`: Python API for the benchmark
- `scripts`: project build scripts
```bash
# In the root dir of the project.

# Create a Python virtual environment:
python -m venv .venv

# Activate the venv (adjust for your shell):
source .venv/bin/activate

# Install the dependencies:
pip install -r requirements.txt
```
```bash
# In the root dir of the project.
julia --project=executor -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'
```
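To sanity-check that the environment resolved, you can list its packages (optional; `Pkg.status` is standard Julia, nothing project-specific):

```bash
# Print the packages now available in the executor project:
julia --project=executor -e 'using Pkg; Pkg.status()'
```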
First initialize the `jlbench` environment as above. All the scripts are thin wrappers around Python; pass `--help` to see their options. Before running the Python modules directly, remember to activate the venv.
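For example, to see what the build script accepts:

```bash
# Activate the venv, then ask the script for its options:
source .venv/bin/activate
scripts/bench-build.sh --help
```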
### `scripts/bench-build.sh`

To see how your problem parses (prints human-readable YAML to stdout):

```bash
scripts/bench-build.sh --format yaml $YOUR_PROBLEM_FILE
```
### `scripts/bench-local-exec.sh`

To directly execute a single problem file (no logs or reporting):

```bash
scripts/bench-local-exec.sh $YOUR_PROBLEM_FILE
```
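For instance, with the trivial problem that ships in the repo:

```bash
scripts/bench-local-exec.sh problems/0_trivial.jl
```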
The `executor` dir describes the environment in which the problems are executed. There's a Julia project file and a Dockerfile; the idea is that if we need anything else (e.g., a Python dependency, like in one task we discussed), we can add it there instead of the main project.
The commands below use `docker`; depending on your system, you may need to prefix them with `sudo`. Alternatively, consider adding a `docker` wrapper script like this one to a dir on your PATH:

```bash
#!/usr/bin/env bash
sudo -p '[sudo docker] Password:' docker "$@"
```
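One way to install the wrapper (the location is just an example; any dir on your PATH works):

```bash
# Write the wrapper script and make it executable.
mkdir -p ~/.local/bin
cat > ~/.local/bin/docker <<'EOF'
#!/usr/bin/env bash
sudo -p '[sudo docker] Password:' docker "$@"
EOF
chmod +x ~/.local/bin/docker
```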
This takes a moment (having initialized the local environment first doesn't speed it up):

```bash
scripts/executor-docker-build.sh
```
Use Julia:

```bash
julia --project=executor -e 'using Pkg; Pkg.add("MyPackage")'
```

Or interactively (press `]` at the `julia>` prompt to enter Pkg mode):

```
julia --project=executor
] add MyPackage
```
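Either way the change lands in the `executor` project file; the Docker image most likely bakes in its own copy of that environment, so rebuild it afterwards:

```bash
scripts/executor-docker-build.sh
```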
Drop into a `julia` shell:

```bash
docker run --rm -it jlbench-executor
```

Run a local problem:

```bash
docker run --rm -i jlbench-executor - < $YOUR_PROBLEM_FILE
```

Drop into a bash shell:

```bash
docker run --rm -it --entrypoint /bin/bash jlbench-executor
```
Best do this with the `jlbench` env activated. You'll need to build a JSONL file with the problems first.
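A plausible way to produce it is via the build script (hypothetical: only `--format yaml` is confirmed above, so check `--help` for the real flags):

```bash
# Hypothetical invocation: assumes a jsonl output format and multi-file input.
mkdir -p out
scripts/bench-build.sh --format jsonl problems/*.jl > out/problems.jsonl
```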
Check out `prl_ml` locally and install it:

```bash
pip install -e ../prl_ml
```

`prl_ml` doesn't list its own deps; I added the necessary ones to `requirements.txt`.
Use the generated JSONL problems to sample solutions from `gpt-4o-mini`:

```bash
# Set OPENAI_API_KEY first.
python3 -m prl_ml.batched_lm_generation.gpt4o_chatcoder \
    --output-dir out/raw-responses-gpt-4o-mini \
    --model-name gpt-4o-mini \
    --completion-limit 1 \
    --temperature 0.2 \
    --extra-columns tests \
    --dataset 'jsonl:./out/problems.jsonl'
```
See https://nuprl.github.io/prl_ml/batched_lm_generation/ for more details on the Python command, including how to use a different model.
Then extract the completions into an experiment directory:

```bash
python3 -m prl_ml.batched_lm_generation.completion_extraction \
    out/raw-responses-gpt-4o-mini \
    out/experiment-gpt-4o-mini
```
This script wraps a command taken from `prl_ml`. It evaluates answers in a single "experiment" directory (generated as above) and writes `.result.json.gz` files with the evaluation results to the same directory:

```bash
sudo scripts/experiment-evaluate.sh \
    --tests-fields tests \
    out/experiment-gpt-4o-mini
```
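To peek at a result afterwards (assuming the files are plain gzipped JSON, as the extension suggests; the filename below is a placeholder):

```bash
# Pretty-print one evaluation result; substitute a real filename.
gunzip -c out/experiment-gpt-4o-mini/example.result.json.gz | python3 -m json.tool
```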