Schema Reinforcement Learning

Data • Model • Performance • Training • Evaluation • Paper • Citation

Welcome to the official repository for Schema Reinforcement Learning, containing the dataset, training scripts, and evaluation code from our paper.

What's New

  • [2025/02/27] Our paper is now released on arXiv! Check it out here!
  • [2025/02/26] SchemaBench is now released!

Data

SchemaBench is intended solely for research and educational purposes and should not be construed as reflecting the opinions or views of the creators, owners, or contributors of this dataset. The statistics of the schemas used in SchemaBench are shown below:

[Figure: statistics of the schemas in SchemaBench]
We crawled 40K+ real-world schema files from JSON Schema Store and GitHub and constructed SchemaBench from them. Below we present our data cleaning and construction pipeline, together with common cases.

[Figure: data cleaning and construction pipeline]
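The core requirement behind these tasks is that a model's output both parses as JSON and validates against the target schema. As an illustration only (this is not the benchmark's actual scoring code), a minimal check with the jsonschema package might look like this; the example schema is a toy one:

import json
from jsonschema import Draft202012Validator

def is_valid_output(model_output: str, schema: dict) -> bool:
    # The output must parse as JSON and satisfy the target schema.
    try:
        instance = json.loads(model_output)
    except json.JSONDecodeError:
        return False
    return Draft202012Validator(schema).is_valid(instance)

# Toy schema for illustration; real SchemaBench schemas are far more complex.
schema = {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}
print(is_valid_output('{"name": "SchemaBench"}', schema))  # True
print(is_valid_output('{"name": 42}', schema))             # False: wrong type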
Data Release

Please download our dataset from one of the following links: Google Drive or Tsinghua Cloud. Simply copy the data files into the corresponding folders in the repo. The file structure is as follows:

├── /schemabench/
│  └── /data/
│     ├── /custom/                         // Custom Formats
│     ├── /schema/                         // Complex Schema
│     ├── custom_append.jsonl
│     └── translation_test.jsonl           // Escape Translation
├── /train/
│  └── /data/
│     ├── mix_train_no_collected_json.json // SFT - w/o collected json
│     ├── mix_train.json                   // SFT - w/ collected json
│     ├── train_with_tool_ToS.parquet      // SRL - training set
│     └── val_with_tool_ToS.parquet        // SRL - validation set

Please make sure you have downloaded all data files and placed them in the right directories.
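As a quick sanity check, the snippet below (run from the repository root) verifies that the files from the tree above are in place:

from pathlib import Path

# Paths taken from the file structure shown above.
EXPECTED = [
    "schemabench/data/custom_append.jsonl",
    "schemabench/data/translation_test.jsonl",
    "train/data/mix_train_no_collected_json.json",
    "train/data/mix_train.json",
    "train/data/train_with_tool_ToS.parquet",
    "train/data/val_with_tool_ToS.parquet",
]

missing = [p for p in EXPECTED if not Path(p).exists()]
print("All expected data files are in place." if not missing else f"Missing: {missing}")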

Model🤗

We release the LLaMA-3.2 3B SRL model for anyone who wants to use it.
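A minimal usage sketch with Hugging Face transformers is shown below; the model id is a placeholder for the checkpoint linked above, and the schema in the prompt is a toy example:

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "thunlp/LLaMA-3.2-3B-SRL"  # placeholder id; use the released checkpoint linked above

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# Toy schema; ask the model to produce a conforming JSON object.
schema = '{"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}'
messages = [{"role": "user", "content": f"Generate a JSON object that conforms to this schema:\n{schema}"}]

inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))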

Performance📈

We evaluate the performance of several models on SchemaBench. The results are shown below:

Complex, Custom, Escape, and Overall together make up Schema-only Generation; GSM8K, MATH500, MMLU, and ARC-C make up Schema-constrained Reasoning.

| Model | Complex | Custom | Escape | Overall | GSM8K | MATH500 | MMLU | ARC-C |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4o | 84.47 | 61.56 | 37.14 | 61.06 | 97.80 | 41.40 | 86.16 | 97.01 |
| GPT-4o-mini | 68.86 | 46.17 | 16.89 | 43.98 | 86.13 | 31.80 | 49.41 | 77.65 |
| Qwen-2.5 7B | 72.42 | 43.60 | 11.11 | 42.38 | 94.54 | 38.60 | 74.43 | 91.21 |
| MiniCPM-3 4B | 53.88 | 20.29 | 9.13 | 27.77 | 69.22 | 33.40 | 66.58 | 88.31 |
| LLaMA-3.1 8B | 64.26 | 33.07 | 12.02 | 36.45 | 95.91 | 85.60 | 71.83 | 84.98 |
| LLaMA-3.1 8B SFT | 74.56 | 46.64 | 60.58 | 60.59 | 89.46 | 63.80 | 66.97 | 84.56 |
| - w/o Collected JSON | 70.84 | 42.06 | 60.35 | 57.75 | 78.39 | 46.00 | 58.87 | 75.68 |
| LLaMA-3.1 8B SRL | 90.48 | 78.67 | 69.86 | 79.67 | 90.90 | 88.00 | 70.74 | 84.81 |
| LLaMA-3.2 3B | 49.84 | 27.31 | 8.37 | 28.51 | 80.97 | 35.40 | 62.38 | 79.27 |
| LLaMA-3.2 3B SFT | 71.71 | 45.52 | 52.21 | 56.48 | 82.94 | 44.40 | 61.50 | 78.41 |
| - w/o Collected JSON | 72.42 | 42.83 | 54.82 | 56.69 | 78.85 | 36.20 | 59.11 | 75.68 |
| LLaMA-3.2 3B SRL | 82.25 | 66.13 | 69.10 | 72.50 | 84.23 | 43.20 | 57.99 | 78.24 |

Training

✨Here is an overview of our training pipeline:

[Figure: training pipeline overview]
Install

Clone this repository and navigate to the SchemaReinforcementLearning folder.

git clone [email protected]:thunlp/SchemaReinforcementLearning.git
cd SchemaReinforcementLearning

Initialize the environment (Python 3.11):

bash scripts/init_env.sh

Data Preparation

Download the data files from the Data Release section and place them in the directories described there.

Fine-Tuning

bash scripts/train_sft.sh

If you want to use the SFT data without collected JSON, please run the following command:

bash scripts/train_sft_no_collected_json.sh

Schema Reinforcement Learning (SRL)

We use a modified version of PRIME for SRL, which is already included in this repo as a submodule. To train the SRL model, please run the following command:

bash scripts/train_srl.sh

You can find your trained models in the train/results directory by default.

Evaluation

Before evaluating performance on SchemaBench, you should initialize the config file for local model inference. We use CodeLinker for inference, which currently supports any OpenAI-compatible server for evaluation. To initialize the config file, first:

cp private_example.toml private.toml

Then fill in private.toml with your API key and base URL if needed. After that, you can run the following evaluation script:

bash scripts/test_schemabench.sh
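If the evaluation cannot reach your model server, a quick way to confirm the endpoint works is a direct request with the openai Python client; the model name, key, and base URL below are illustrative and should match what you put in private.toml:

from openai import OpenAI

# Illustrative values; use the API key and base URL from your private.toml.
client = OpenAI(api_key="sk-...", base_url="http://localhost:8000/v1")

resp = client.chat.completions.create(
    model="llama-3.2-3b-srl",  # whatever model name your server exposes
    messages=[{"role": "user", "content": 'Return {"ok": true} as a JSON object.'}],
)
print(resp.choices[0].message.content)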

If you need to run the evaluation on a subset of SchemaBench, you can modify the test_category variable in the test_schemabench.sh script. Currently you can choose from ['all', 'schema', 'reasoning'], as well as any single sub-task (e.g., set test_category to schema to run only the schema tasks).

Citation

If you find our work helpful, please consider citing our paper:

@misc{lu2025learninggeneratestructuredoutput,
      title={Learning to Generate Structured Output with Schema Reinforcement Learning}, 
      author={Yaxi Lu and Haolun Li and Xin Cong and Zhong Zhang and Yesai Wu and Yankai Lin and Zhiyuan Liu and Fangming Liu and Maosong Sun},
      year={2025},
      eprint={2502.18878},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.18878}, 
}
