This folder contains implementation of interactive EditSQL parser, which uses EditSQL as a base semantic parser in our MISP framework:
- Please follow 2. General Environment Setup and set up the environment/data;
- For testing interactive EditSQL on the fly (our EMNLP'19 setting), see 3. MISP with EditSQL;
- For learning EditSQL from user interaction (our EMNLP'20 setting), see 4. Learning EditSQL from user interaction (EMNLP'20).
The implementation is adapted from the EditSQL repository. Please cite the following papers if you use the code:
@inproceedings{yao2020imitation,
title={An Imitation Game for Learning Semantic Parsers from User Interaction},
author={Yao, Ziyu and Tang, Yiqi and Yih, Wen-tau and Sun, Huan and Su, Yu},
booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year={2020}
}
@inproceedings{yao2019model,
title={Model-based Interactive Semantic Parsing: A Unified Framework and A Text-to-SQL Case Study},
author={Yao, Ziyu and Su, Yu and Sun, Huan and Yih, Wen-tau},
booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
pages={5450--5461},
year={2019}
}
@InProceedings{zhang2019editing,
author = "Rui Zhang, Tao Yu, He Yang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, Dragomir Radev",
title = "Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
year = "2019",
address = "Hong Kong, China"
}
-
Please install the Anaconda environment from
gpu-py3.yml
:conda env create -f gpu-py3.yml
-
Download the Glove word embedding from here (use
wget
for command line) and put it asEditSQL/word_emb/glove.840B.300d.txt
. -
Download Pretrained BERT model from here as
EditSQL/model/bert/data/annotated_wikisql_and_PyTorch_bert_param/pytorch_model_uncased_L-12_H-768_A-12.bin
. If using command line:
gdown https://drive.google.com/u/0/uc?id=1f_LEWVgrtZLRuoiExJa5fNzTS8-WcAX9
We have the pre-processed and cleaned Spider data available: data_clean.tar.
Please download and uncompress it via tar -xvf data_clean.tar
as a folder EditSQL/data_clean
.
Note that the training set has been cleaned with its size reduced (see our paper, Appendix B.3 for details).
We explain how to build and test EditSQL under MISP following our EMNLP'19 setting.
To train EditSQL on the full training set, please revise SETTING=''
(empty string) in scripts/editsql/pretrain.sh.
In the main directory, run:
bash scripts/editsql/pretrain.sh
To test EditSQL (trained on the full training set) regularly, in scripts/editsql/test.sh,
please revise SETTING=''
(empty string) to ensure the LOGDIR
loads the desired model checkpoint.
In the main directory, run:
bash scripts/editsql/test.sh
To test EditSQL (trained on the full training set) with human interaction under the MISP framework, in scripts/editsql/test_with_interaction.sh,
revise SETTING='full_train'
to ensure the LOGDIR
loads the desired model checkpoint.
In the main directory, run:
bash scripts/editsql/test_with_interaction.sh
Before interactive learning, we pretrain the EditSQL parser with 10% of the full training set.
Please ensure SETTING='_10p'
in scripts/editsql/pretrain.sh.
Then in the main directory, run:
bash scripts/editsql/pretrain.sh
When the training is finished, please rename and move the best model checkpoint from EditSQL/logs_clean/logs_spider_editsql_10p/pretraining/save_X
to EditSQL/logs_clean/logs_spider_editsql_10p/model_best.pt
.
You can also use our pretrained checkpoint: logs_clean.tar.
Please download and uncompress the content as EditSQL/logs_clean/ogs_spider_editsql_10p/model_best.pt
.
To test the pretrained parser without user interaction, see 3.2 Model testing without interaction.
To test the pretrained parser with simulated user interaction, see 3.3 Model testing with simulated user interaction.
Make sure SETTING=online_pretrain_10p
is set in the scripts.
The training script for each algorithm can be found below. Please run them in the main directory.
Algorithm | Script |
---|---|
MISP_NEIL | scripts/editsql/misp_neil.sh |
Full Expert | scripts/editsql/full_expert.sh |
Self Train | scripts/editsql/self_train_0.5.sh |
MISP_NEIL* | scripts/editsql/misp_neil_perfect.sh |