- `problems`: benchmark problems (as Julia files with a particular structure; see `problems/0_trivial.jl`)
- `jlbench`: Python API for the benchmark
- `scripts`: project build scripts
```bash
# In the root dir of the project.

# Create a Python virtual environment:
python -m venv .venv

# Activate the venv (adjust for your shell):
source .venv/bin/activate

# Install the dependencies:
pip install -r requirements.txt
```
```bash
# In the root dir of the project.
julia --project=executor -e 'using Pkg; Pkg.instantiate(); Pkg.precompile()'
```
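To sanity-check that the environment resolved, you can list its packages (optional; `Pkg.status` is standard Julia, nothing project-specific):

```bash
# Print the packages now available in the executor project:
julia --project=executor -e 'using Pkg; Pkg.status()'
```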
First initialize the `jlbench` environment as above. All the scripts are thin wrappers around Python; pass `--help` to see their options. Before running the Python modules directly, remember to activate the venv.
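For example, to see what the build script accepts:

```bash
# Activate the venv, then ask the script for its options:
source .venv/bin/activate
scripts/bench-build.sh --help
```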
### `scripts/bench-build.sh`

To see how your problem parses (prints human-readable YAML to stdout):

```bash
scripts/bench-build.sh --format yaml $YOUR_PROBLEM_FILE
```
### `scripts/bench-local-exec.sh`

To directly execute a single problem file (no logs or reporting):

```bash
scripts/bench-local-exec.sh $YOUR_PROBLEM_FILE
```
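For instance, with the trivial problem that ships in the repo:

```bash
scripts/bench-local-exec.sh problems/0_trivial.jl
```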
The `executor` dir describes the environment in which the problems are executed. There's a Julia project file and a Dockerfile; the idea is that if we need anything else (e.g., a Python dependency, like in one task we discussed), we can add it there instead of the main project.
The commands below use `docker`; depending on your system, you may need to prefix them with `sudo`. Alternatively, consider adding a `docker` wrapper script like this one to a dir on your PATH:

```bash
#!/usr/bin/env bash
sudo -p '[sudo docker] Password:' docker "$@"
```
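One way to install the wrapper (the location is just an example; any dir on your PATH works):

```bash
# Write the wrapper script and make it executable.
mkdir -p ~/.local/bin
cat > ~/.local/bin/docker <<'EOF'
#!/usr/bin/env bash
sudo -p '[sudo docker] Password:' docker "$@"
EOF
chmod +x ~/.local/bin/docker
```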
This takes a moment (having initialized the local environment first doesn't speed it up):

```bash
scripts/executor-docker-build.sh
```
Use Julia:

```bash
julia --project=executor -e 'using Pkg; Pkg.add("MyPackage")'
```

Or interactively (press `]` at the `julia>` prompt to enter Pkg mode):

```
julia --project=executor
] add MyPackage
```
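Either way the change lands in the `executor` project file; the Docker image most likely bakes in its own copy of that environment, so rebuild it afterwards:

```bash
scripts/executor-docker-build.sh
```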
Drop into a `julia` shell:

```bash
docker run --rm -it jlbench-executor
```

Run a local problem:

```bash
docker run --rm -i jlbench-executor - < $YOUR_PROBLEM_FILE
```

Drop into a bash shell:

```bash
docker run --rm -it --entrypoint /bin/bash jlbench-executor
```
Best do this with the `jlbench` env activated. You'll need to build a JSONL file with the problems first.
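A plausible way to produce it is via the build script (hypothetical: only `--format yaml` is confirmed above, so check `--help` for the real flags):

```bash
# Hypothetical invocation: assumes a jsonl output format and multi-file input.
mkdir -p out
scripts/bench-build.sh --format jsonl problems/*.jl > out/problems.jsonl
```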
Check out `prl_ml` locally and install it:

```bash
pip install -e ../prl_ml
```

`prl_ml` doesn't list its own deps; I added the necessary ones to `requirements.txt`.
Use the generated JSONL problems to sample solutions from `gpt-4o-mini`:

```bash
# Set OPENAI_API_KEY first.
python3 -m prl_ml.batched_lm_generation.gpt4o_chatcoder \
    --output-dir out/raw-responses-gpt-4o-mini \
    --model-name gpt-4o-mini \
    --completion-limit 1 \
    --temperature 0.2 \
    --extra-columns tests \
    --dataset 'jsonl:./out/problems.jsonl'
```
See https://nuprl.github.io/prl_ml/batched_lm_generation/ for more details on the Python command, including how to use a different model.
Then extract the completions into an experiment directory:

```bash
python3 -m prl_ml.batched_lm_generation.completion_extraction \
    out/raw-responses-gpt-4o-mini \
    out/experiment-gpt-4o-mini
```
This script wraps a command taken from `prl_ml`. It evaluates answers in a single "experiment" directory (generated as above) and writes `.result.json.gz` files with the evaluation results to the same directory:

```bash
sudo scripts/experiment-evaluate.sh \
    --tests-fields tests \
    out/experiment-gpt-4o-mini
```
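To peek at a result afterwards (assuming the files are plain gzipped JSON, as the extension suggests; the filename below is a placeholder):

```bash
# Pretty-print one evaluation result; substitute a real filename.
gunzip -c out/experiment-gpt-4o-mini/example.result.json.gz | python3 -m json.tool
```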