This document is a guide to replicating the experiments for exLong.
Here are the artifacts required to replicate our experiments and results:
- Training
  - Training dataset: the processed dataset (prompts) used to train the exLong models.
- Evaluation (see the unpacking sketch after this list)
  - ne2e-test.tar.gz: evaluates exLong under the developer-oriented use case (data are in directory `rq1-eval/`). exLong's results are in Table IV and Table V.
  - machine-view.tar.gz: evaluates exLong under the machine-oriented use case (data are in directory `rq2/`). exLong's results are in Table IX and Figure 4.
  - netest-diversity.tar.gz: ablation study on how different nEBTs affect the model's performance (data are in directory `netest-diversity/`). Results are in Table VII.
- Models: the trained exLong checkpoints, which can be used directly for running inference/evaluation.
  - exLong-with-name (7B and 13B): exLong model used in Table IV, Table VI, and Table VIII.
  - exLong-no-name (7B): exLong model used in Table V.
  - exLong-with-name w.o. stack trace (7B): exLong no stack trace model used in Table VI.
  - exLong-with-name w.o. stack trace & guard expr (7B): exLong no stack trace & no guard expr model used in Table VI.
  - exLong-with-name w.o. stack trace & guard expr & EBT (7B): exLong no stack trace & no guard expr & no EBT model used in Table VI.
  - exLong-with-name w.o. stack trace & guard expr & EBT (13B): exLong 13B no stack trace & no guard expr & no EBT model used in Table VIII.
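As a convenience, the evaluation tarballs listed above can be unpacked in place. This is a minimal sketch assuming the archives sit in the repository root and expand into the directories named above; adjust paths if your layout differs:

```bash
# Unpack the evaluation artifacts (assumed to be in the repo root).
# Expected to expand into rq1-eval/, rq2/, and netest-diversity/ per the list above.
for f in ne2e-test.tar.gz machine-view.tar.gz netest-diversity.tar.gz; do
    tar -xzf "$f"
done
```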
- Environment and dependencies

  First make sure you have installed all the required dependencies described in README.md. Note that if you want to train exLong from Code Llama, extra dependencies need to be installed.
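  A minimal environment sketch, assuming a conda-based setup; the environment name and the `requirements.txt` path below are assumptions, not the repo's documented layout (README.md is authoritative):

  ```bash
  # Hypothetical environment setup; defer to README.md for the exact steps.
  conda create -n exlong python=3.10 -y
  conda activate exlong
  pip install -r requirements.txt  # assumed dependency file name
  # Training from Code Llama additionally needs axolotl (see README.md).
  ```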
- Experiments setup

  Set up the default directory structure and prepare the dataset as described here.
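  As a quick sanity check after following the linked setup (only `_work/exp/` is directly confirmed by the output paths used later in this guide; the rest of the layout comes from the linked instructions):

  ```bash
  # Verify the working directory skeleton exists.
  mkdir -p _work/exp
  ls _work
  ```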
- Train the exLong models (use `axolotl` for training)
  - Training exLong with EBT test name in the prompt
    Note: `conditionnestack2e` is the setup name for exLong.

    ```bash
    cd python/
    accelerate launch -m axolotl.cli.train configs/axolotl/axolotl-conditionnestack2e-with-name-7b.yaml
    ```
    You will see checkpoints in directory `_work/exp/conditionnestack2e-with-name-ft/lora-codellama-7b/`.
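    To sanity-check the run, list the output directory; the `checkpoint-<step>/` naming below is the Hugging Face Trainer convention that axolotl follows, an assumption rather than something this guide specifies:

    ```bash
    # Expect LoRA adapter checkpoints such as checkpoint-500/ (HF Trainer convention).
    ls _work/exp/conditionnestack2e-with-name-ft/lora-codellama-7b/
    ```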
  - Training exLong w.o. EBT name

    ```bash
    cd python/
    accelerate launch -m axolotl.cli.train configs/axolotl/axolotl-conditionnestack2e-no-name-7b.yaml
    ```
    You will see checkpoints in directory `_work/exp/conditionnestack2e-no-name-ft/lora-codellama-7b/`.
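  To train both variants back to back, a simple wrapper over the two documented configs works; this uses nothing beyond the commands already shown above:

  ```bash
  # Train both exLong variants sequentially using the documented configs.
  cd python/
  for cfg in conditionnestack2e-with-name-7b conditionnestack2e-no-name-7b; do
      accelerate launch -m axolotl.cli.train "configs/axolotl/axolotl-${cfg}.yaml"
  done
  ```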
- Run inference and evaluation
2.1 Developer-oriented use case
- Run inference on developer-oriented use case (4th row in Table IV, Table V)
  ```bash
  cd python/
  # Run evaluation on the selected 434 examples in the test set
  python -m etestgen.codellama.CodeLLaMA --config_file configs/codellama-7b-conditionnestack2e-with-name-ft.yaml run_gen --split real-test
  ```
  You will see the model outputs in `_work/exp/conditionnestack2e-with-name-ft/lora-codellama-7b/real-test-set-model-outputs.jsonl` (alongside the checkpoints in the same directory).
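  Each line of the `.jsonl` file is one JSON record; to peek at the first generation (the field names are repo-specific and not documented here):

  ```bash
  # Pretty-print the first output record; the schema is defined by etestgen.
  head -n 1 _work/exp/conditionnestack2e-with-name-ft/lora-codellama-7b/real-test-set-model-outputs.jsonl \
      | python -m json.tool
  ```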
- Run evaluation
  - Similarity metrics

    ```bash
    python -m etestgen.llm.eval --eval_set test --config_file configs/codellama-7b-conditionnestack2e-with-name-ft.yaml eval_llm_sim
    ```
  - Functional correctness metrics

    ```bash
    python -m etestgen.llm.eval --eval_set test --config_file configs/codellama-7b-conditionnestack2e-with-name-ft.yaml eval_runtime_metrics
    ```
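  The two evaluations share everything but the final sub-command, so a small wrapper avoids repeating the config path; this is just a convenience around the two commands above:

  ```bash
  # Run both metric suites for the developer-oriented use case.
  CFG=configs/codellama-7b-conditionnestack2e-with-name-ft.yaml
  for cmd in eval_llm_sim eval_runtime_metrics; do
      python -m etestgen.llm.eval --eval_set test --config_file "$CFG" "$cmd"
  done
  ```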
2.2 Machine-oriented use case
- Run inference on machine-oriented use case (1st row in Table IX)
  ```bash
  cd python/
  python -m etestgen.codellama.CodeLLaMA --config_file configs/eval-codellama-7b-machine-view-conditionnestack2e-all-no-name.yaml run_gen

  # Evaluation 1: all covered projects
  python -m etestgen.llm.eval --config_file configs/eval-codellama-7b-machine-view-conditionnestack2e-all-no-name.yaml eval_runtime_metrics
  # You will see eval results in `results/model-results/conditionnestack2e-all-no-name-ft-lora-codellama-7b-eval-rq2-runtime-metrics.json`

  # Evaluation 2: intersection projects
  python -m etestgen.llm.eval --eval_set rq2 --config_file configs/eval-codellama-7b-machine-view-conditionnestack2e-all-no-name.yaml eval_subset_llm_results --subset_id_file ../results/tool-results/intersect-ids.json
  # You will see eval results in `results/model-results/conditionnestack2e-all-no-name-ft-lora-codellama-7b-eval-rq2-intersect-runtime-metrics.json`
  ```
  You will see model generations in `_work/exp/conditionnestack2e-all-no-name-ft/lora-codellama-7b/rq2-model-outputs.jsonl`.
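  The two runtime-metric result files (paths taken from the comments above) can be pretty-printed for a quick comparison; run this from the repository root, which the `../results/...` path in the commands above suggests is one level up from `python/`:

  ```bash
  # Inspect the RQ2 runtime metrics for all covered vs. intersection projects.
  python -m json.tool results/model-results/conditionnestack2e-all-no-name-ft-lora-codellama-7b-eval-rq2-runtime-metrics.json
  python -m json.tool results/model-results/conditionnestack2e-all-no-name-ft-lora-codellama-7b-eval-rq2-intersect-runtime-metrics.json
  ```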
- Run evaluation
  - Similarity metrics

    ```bash
    python -m etestgen.llm.eval --eval_set test --config_file configs/eval-codellama-7b-machine-view-conditionnestack2e-all-no-name.yaml eval_llm_sim
    ```
  - Functional correctness metrics

    ```bash
    python -m etestgen.llm.eval --eval_set test --config_file configs/eval-codellama-7b-machine-view-conditionnestack2e-all-no-name.yaml eval_runtime_metrics
    ```
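  As a final sanity check, the number of RQ2 generations should match the number of records in the outputs file mentioned above (one JSON object per line):

  ```bash
  # Count RQ2 generations (one JSON record per line).
  wc -l < _work/exp/conditionnestack2e-all-no-name-ft/lora-codellama-7b/rq2-model-outputs.jsonl
  ```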