Title: Add financial_mmlu_ko Task for Korean Financial Multiple-Choice Benchmark
Description:
This pull request adds a new task, financial_mmlu_ko, to the lm-evaluation-harness repository. The task evaluates models on a Korean financial multiple-choice dataset in which the number of answer candidates varies per question, so it is implemented with the generate_until output type rather than a fixed-choice format.
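For context, the sketch below shows one way a variable-length candidate list can be rendered into a prompt for a generate_until task. It is a minimal illustration only; the function name and document fields ("question", "choices") are assumptions for this example and are not taken from the PR's actual config.

```python
# Hypothetical prompt builder for a variable-choice question.
# Field names ("question", "choices") are assumptions, not the PR's schema.
def doc_to_text(doc: dict) -> str:
    # Number the candidates 1..N; N varies per question, which is why a
    # fixed-choice multiple_choice output type does not fit this dataset.
    lines = [doc["question"]]
    for i, choice in enumerate(doc["choices"], start=1):
        lines.append(f"{i}. {choice}")
    lines.append("정답:")  # "Answer:" in Korean
    return "\n".join(lines)
```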
Key Details:
- The process_results function extracts a numerical answer from the model output and compares it to the gold answer (a sketch is shown below).
- This task is not derived from an academic paper; it serves as a practical benchmark for evaluating financial-domain knowledge in Korean language models.
- The README has been updated with all necessary details according to the standard template.
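As a rough illustration of the answer-extraction step described above, a minimal process_results could look like the following. The regex, the "answer" field, and the "exact_match" metric key are assumptions for this sketch, not the merged implementation.

```python
import re

def process_results(doc: dict, results: list[str]) -> dict:
    # results[0] holds the model's generated continuation for this doc.
    generation = results[0]
    # Take the first standalone number in the output as the chosen option.
    match = re.search(r"\d+", generation)
    predicted = int(match.group()) if match else -1
    gold = int(doc["answer"])  # assumed gold option index
    return {"exact_match": 1.0 if predicted == gold else 0.0}
```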
Please review and merge this PR.
Thank you!
Usage:
lm_eval --model hf --model_args pretrained=LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct,trust_remote_code=True --tasks financial_mmlu_ko --num_fewshot=3 --device cuda:7
hf (pretrained=LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 3, batch_size: 1
lm_eval --model hf --model_args pretrained=meta-llama/Llama-3.1-8B-Instruct --tasks financial_mmlu_ko --num_fewshot=3 --device cuda:7
hf (pretrained=meta-llama/Llama-3.1-8B-Instruct), gen_kwargs: (None), limit: None, num_fewshot: 3, batch_size: 1