nuprl/JuliaLM

Repo structure

  • problems: benchmark problems (as Julia files with a particular structure, see problems/0_trivial.jl)
  • jlbench: Python API for the benchmark
  • scripts: project build scripts

Initializing the jlbench environment

# In the root dir of the project.
# Create a Python virtual environment:
python -m venv .venv
# Activate the venv (adjust for your shell):
source .venv/bin/activate
# Install the dependencies:
pip install -r requirements.txt

Initializing the local executor environment

# In the root dir of the project.
julia --project=executor -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'

Bench scripts

First initialize the jlbench environment as above. All the scripts are thin wrappers around Python; pass --help to any of them to see its options. If you run Python directly instead, remember to activate the venv first.

Building the bench problems

scripts/bench-build.sh

To see how your problem parses (prints human-readable YAML to stdout):

scripts/bench-build.sh --format yaml $YOUR_PROBLEM_FILE

Locally executing the problems

scripts/bench-local-exec.sh

To directly execute a single problem file (no logs or reporting):

scripts/bench-local-exec.sh $YOUR_PROBLEM_FILE

Executor container scripts

The executor dir describes the environment in which the problems are executed. It contains a Julia project file and a Dockerfile. The idea is that if we need anything else (e.g., a Python dependency, as in one task we discussed), we can add it there instead of to the main project.

The commands below use docker; depending on your system, you may need to prefix them with sudo. (Consider adding a docker script like this one to a dir on your path:

#!/usr/bin/env bash
sudo -p '[sudo docker] Password:' docker "$@"

)

Building the Docker executor image

This takes a moment (initializing the local executor environment beforehand does not speed it up).

scripts/executor-docker-build.sh

Adding Julia dependencies

Use Julia:

julia --project=executor -e 'using Pkg; Pkg.add("MyPackage")'

Or interactively:

julia --project=executor
]
add MyPackage

Running the executor container

Drop into a julia shell:

docker run --rm -it jlbench-executor

Run a local problem:

docker run --rm -i jlbench-executor - < $YOUR_PROBLEM_FILE
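If you want to drive the same command from Python (for example, to loop over several problem files), the following is a minimal sketch. It only mirrors the `docker run --rm -i jlbench-executor -` invocation above; the function names `docker_command` and `run_problem` and the timeout value are hypothetical and not part of jlbench.

```python
import subprocess
from pathlib import Path

# Image name as used in the commands above.
IMAGE = "jlbench-executor"


def docker_command(image=IMAGE):
    """Return the argv equivalent of `docker run --rm -i IMAGE -`."""
    return ["docker", "run", "--rm", "-i", image, "-"]


def run_problem(problem_file, timeout=300.0):
    """Feed one problem file to the container on stdin and capture its output.

    The timeout is an arbitrary illustrative value, not a project setting.
    """
    with Path(problem_file).open("rb") as stdin:
        return subprocess.run(
            docker_command(),
            stdin=stdin,
            capture_output=True,
            timeout=timeout,
        )
```

Calling `run_problem("problems/0_trivial.jl")` returns a `subprocess.CompletedProcess` whose `returncode`, `stdout`, and `stderr` you can inspect. If your system needs sudo for docker, the wrapper script described earlier applies here as well.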

Drop into a bash shell:

docker run --rm -it --entrypoint /bin/bash jlbench-executor

Sampling from gpt-4o-mini via prl_ml

Setup

This is best done with the jlbench env activated. You'll need to build a JSONL file with the problems first.
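Before sampling, it can be worth checking that the JSONL file is well formed. The sketch below assumes one JSON object per line and that each record carries a `tests` field (inferred from the `--extra-columns tests` option used when sampling); the helper name `check_problems` is hypothetical and the actual record schema may differ.

```python
import json
from pathlib import Path


def check_problems(path, required=("tests",)):
    """Return (record_count, errors) for a JSONL problems file.

    `required` lists field names assumed to be present in every record.
    """
    count = 0
    errors = []
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            count += 1
            if not isinstance(record, dict):
                errors.append(f"line {lineno}: expected a JSON object")
                continue
            for key in required:
                if key not in record:
                    errors.append(f"line {lineno}: missing field {key!r}")
    return count, errors
```

For example, `check_problems("out/problems.jsonl")` returns the number of parsed records and a list of human-readable problems, which should be empty for a usable file.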

Check out prl_ml locally and install it:

pip install -e ../prl_ml

prl_ml doesn't list its own deps. I added the necessary deps to requirements.txt.

Sampling the responses

Use the generated JSONL problems and sample solutions from gpt-4o-mini:

# set OPENAI_API_KEY
python3 -m prl_ml.batched_lm_generation.gpt4o_chatcoder \
     --output-dir out/raw-responses-gpt-4o-mini \
     --model-name gpt-4o-mini \
     --completion-limit 1 \
     --temperature 0.2 \
     --extra-columns tests \
     --dataset 'jsonl:./out/problems.jsonl'

See https://nuprl.github.io/prl_ml/batched_lm_generation/ for more details on the Python command, including how to use a different model.

Extracting the answers

python3 -m prl_ml.batched_lm_generation.completion_extraction \
    out/raw-responses-gpt-4o-mini \
    out/experiment-gpt-4o-mini

Evaluating the answers

This script wraps a command taken from prl_ml. It evaluates answers in a single "experiment" directory (generated as above) and it writes .result.json.gz files with the evaluation results to the same directory.

sudo scripts/experiment-evaluate.sh \
    --tests-fields tests \
    out/experiment-gpt-4o-mini
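To look at the evaluation output from Python, a minimal sketch follows. It assumes each `.result.json.gz` file is a single gzip-compressed JSON document (inferred from the suffix only); the schema of the results is not described here, so the code just loads them. The helper name `load_results` is hypothetical.

```python
import gzip
import json
from pathlib import Path


def load_results(experiment_dir):
    """Map each *.result.json.gz file name under experiment_dir to its parsed JSON."""
    results = {}
    for path in sorted(Path(experiment_dir).rglob("*.result.json.gz")):
        # "rt" decompresses and decodes to text so json.load can read it.
        with gzip.open(path, "rt", encoding="utf-8") as f:
            results[path.name] = json.load(f)
    return results
```

For example, `load_results("out/experiment-gpt-4o-mini")` returns a dict keyed by file name; printing the keys of one entry is a quick way to discover what the evaluator actually records.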