Add support for sequence labeling #2718
This PR adds support for sequence labeling tasks: chunking, through the IOB scheme, and tagging. This was mentioned in #1675 and could be useful for the community.
The main issue is that there is no widespread agreement on how to prompt language models to perform these tasks. In an attempt to standardize this, it seems that wrapping chunks/words in <>-delimited tags is a common choice in the literature (some references at the end), and that is how the code of this PR handles sequence labeling. Basically, a dataset should be prepared accordingly, outside of lm-evaluation-harness, to contain input texts and in-text annotated outputs, like this:
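(Hypothetical illustration only: the tag inventories and examples depend on the dataset being prepared; the actual conventions are spelled out in the guide mentioned below.)

```
# chunking (e.g. NER, IOB-style chunks):
input:  John Smith works at Google in California .
output: <PER>John Smith</PER> works at <ORG>Google</ORG> in <LOC>California</LOC> .

# tagging (e.g. PoS):
input:  The dog barks .
output: <DET>The</DET> <NOUN>dog</NOUN> <VERB>barks</VERB> <PUNCT>.</PUNCT>
```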
Then the language model is prompted to write an output text given the input text, with few-shot examples to elicit the expected format. From the outputs, the IOB/tagging labels are extracted and passed to seqeval to get metrics for sequence labeling evaluation (currently just `overall_f1`); a rough sketch of this extract-and-score flow is shown below.
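To make the extraction and scoring concrete, here is a minimal sketch, not the PR's actual implementation: the `<PER>`/`<ORG>` tag convention, the `annotated_text_to_iob` helper, and the whitespace tokenization are all illustrative assumptions; the real conventions live in `docs/sequence_labeling.md`.

```python
# Illustrative sketch only: tag format and tokenization are assumptions,
# not the PR's actual extraction code.
import re

from seqeval.metrics import f1_score

TAG_RE = re.compile(r"<(?P<label>\w+)>(?P<span>.*?)</(?P=label)>")


def annotated_text_to_iob(text: str) -> list[str]:
    """Map an in-text annotated output to IOB labels, one per
    whitespace-separated token."""
    labels = []
    pos = 0
    for match in TAG_RE.finditer(text):
        # Tokens outside any <...>...</...> span are labeled O.
        labels.extend("O" for _ in text[pos:match.start()].split())
        # Tokens inside a span get B-/I- prefixes plus the tag label.
        for i, _ in enumerate(match.group("span").split()):
            labels.append(("B-" if i == 0 else "I-") + match.group("label"))
        pos = match.end()
    labels.extend("O" for _ in text[pos:].split())
    return labels


reference = "<PER>John Smith</PER> works at <ORG>Google</ORG> ."
prediction = "<PER>John Smith</PER> works at Google ."

y_true = [annotated_text_to_iob(reference)]   # [["B-PER", "I-PER", "O", "O", "B-ORG", "O"]]
y_pred = [annotated_text_to_iob(prediction)]  # [["B-PER", "I-PER", "O", "O", "O", "O"]]
print(f1_score(y_true, y_pred))  # ~0.667: the PER chunk matches, the ORG chunk is missed
```

seqeval scores at the chunk level, so a predicted entity only counts as correct when both its span and its label match the reference exactly.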
I created a guide in `docs/sequence_labeling.md` to illustrate how to prepare the datasets (moving from the IOB/tagging annotation format to in-text annotated outputs) and how to create new sequence labeling tasks; all the details are there.

References about prompting for sequence labeling:
- Wang, S., Sun, X., Li, X., Ouyang, R., Wu, F., Zhang, T., Li, J., & Wang, G. (2023). GPT-NER: Named Entity Recognition via Large Language Models.
- Naguib, M., Tannier, X., & Névéol, A. (2024). Few-shot clinical entity recognition in English, French and Spanish: masked language models outperform generative model prompting. In Findings of the Association for Computational Linguistics: EMNLP 2024 (pp. 6829–6852). Association for Computational Linguistics.
- Hu, Y., Chen, Q., Du, J., Peng, X., Keloth, V. K., Zuo, X., Zhou, Y., Li, Z., Jiang, X., Lu, Z., Roberts, K., & Xu, H. (2024). Improving Large Language Models for Clinical Named Entity Recognition via Prompt Engineering.
- Li, M., & Zhang, R. (2024). How far is Language Model from 100% Few-shot Named Entity Recognition in Medical Domain.
- Yan, F., Yu, P., & Chen, X. (2024). LTNER: Large Language Model Tagging for Named Entity Recognition with Contextualized Entity Marking. In Pattern Recognition: 27th International Conference, ICPR 2024, Kolkata, India, December 1–5, 2024, Proceedings, Part XIX (pp. 399–411). Springer-Verlag.
- Laskar, M., Bari, M., Rahman, M., Bhuiyan, M., Joty, S., & Huang, J. (2023). A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 431–469). Association for Computational Linguistics.
- Machado, M., & Ruiz, E. (2024). Evaluating large language models for the tasks of PoS tagging within the Universal Dependency framework. In Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1 (pp. 454–460). Association for Computational Linguistics.
- Stüssi, E., & Ströbel, P. (2024). Part-of-Speech Tagging of 16th-Century Latin with GPT. In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024) (pp. 196–206). Association for Computational Linguistics.