Many models struggle with long-context tasks, often exhibiting the "lost in the middle" problem. To address this, we introduce an approach called "Paraphrasing the Original Text". Through a dedicated supervised fine-tuning stage that incorporates paraphrasing information into the training samples, we improve the model's retrieval capability in long-context scenarios. Our approach is efficient, requiring only minimal overhead: fine-tuning on just 9k samples for a single epoch.
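As an illustration only (the exact schema of the released dataset is not reproduced here, and all field names are assumptions), a paraphrasing-augmented sample could pair a long context with a target answer that first restates the supporting original passage before answering:

```python
# Hypothetical illustration of a paraphrasing-augmented training sample;
# field names and structure are assumptions, not the released dataset schema.
sample = {
    "context": "<long document with the relevant passage buried in the middle>",
    "question": "When was the bridge completed?",
    # The target output first paraphrases/quotes the supporting original text,
    # then gives the answer, encouraging the model to retrieve before responding.
    "answer": (
        "The original text states: 'Construction of the bridge finished in 1937.' "
        "Therefore, the bridge was completed in 1937."
    ),
}
```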
- Use the QLoRA method to train the model on our dataset (a configuration sketch follows this list):
`train_with_paraphrasing.py`
- Merge the LoRA weights into the base model (a merge sketch also follows this list):
`merge_lora.py`
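A minimal sketch of the QLoRA setup, assuming the standard `transformers` + `peft` + `bitsandbytes` stack; the base model name, LoRA rank, and target modules here are placeholders, and the actual hyperparameters live in `train_with_paraphrasing.py`:

```python
# Hypothetical QLoRA configuration sketch; hyperparameters are examples,
# not the repo's actual settings.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "Qwen/Qwen1.5-4B-Chat"  # placeholder; any supported base model works

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: 4-bit quantized base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,                                   # example rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```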
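And a sketch of the merge step, assuming `peft`'s `merge_and_unload`; the paths below are placeholders and `merge_lora.py` may take different arguments:

```python
# Hypothetical sketch of merging LoRA adapter weights back into the base model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "path/to/base_model"      # placeholder paths
adapter_path = "path/to/lora_adapter"
output_path = "path/to/merged_model"

base = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype="auto")
merged = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()

merged.save_pretrained(output_path)
AutoTokenizer.from_pretrained(base_model_path).save_pretrained(output_path)
```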
Continuously updating...
| Model | Link |
|---|---|
| llama3-8b-chinese-chat-32k | link |
| Qwen-14b-chat-yarn-32k | link |
| Qwen1.5-4b-chat-paraph | link |