Add Task (Financial mmlu ko) #2699

choics2623 · 2025-02-14T09:20:36Z

English:

Title: Add financial-mmlu-ko Task for Korean Financial Multiple-Choice Benchmark

Description:
This pull request adds a new task, financial-mmlu-ko, to the lm-evaluation-harness repository. The task evaluates models on a Korean financial multiple-choice dataset where the number of answer candidates varies per question, so it is implemented using the generate_until output type.
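Because the number of answer candidates varies per question, the prompt has to enumerate whatever options each record carries. The following is only a minimal sketch of that idea: the field names `question` and `choices`, the helper name `format_prompt`, and the trailing `Answer:` cue are assumptions made for illustration, not the dataset's actual columns or this PR's prompt template.

```python
def format_prompt(doc: dict) -> str:
    """Render a question followed by a variable number of numbered options.

    `question` and `choices` are hypothetical field names.
    """
    lines = [doc["question"].strip()]
    # Number the options from 1 so the model can answer with a single integer.
    for idx, choice in enumerate(doc["choices"], start=1):
        lines.append(f"{idx}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)
```

With numbered options, a free-form generation can be scored by reading back the number the model emits, regardless of how many options a given question has.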

Key Details:

Dataset Sources:
- Korean Wikipedia Finance Category
- Bank of Korea Economic Research Reports
- 경제배움e: 퀴즈로 배우는 시사.경제 ("Learning Current Affairs and Economics Through Quizzes")
Dataset Composition:
- 104 manually curated questions
- 315 GPT-4 generated questions (subsequently verified by experts)
Implementation:
- A custom process_results function extracts numerical answers from model outputs and compares them to the gold answers.
- Few-shot examples can be provided as system prompts or context.
Integration:
- Task integration was contributed by choics2623
- The dataset was created by allganize on Hugging Face.
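The answer-extraction step described under Implementation can be sketched roughly as follows. This is an illustrative sketch rather than the code in this PR: it assumes the gold label is stored under a hypothetical `answer` field, that the first integer in the generation is the model's choice, and that the metric is reported under the key `acc`. The `process_results(doc, results)` signature follows the harness convention of returning a dict of metric values for one example.

```python
import re


def extract_choice(text: str):
    """Return the first integer found in the model output, or None if absent."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def process_results(doc: dict, results: list) -> dict:
    """Score one example: 1.0 if the extracted number equals the gold answer."""
    prediction = extract_choice(results[0])  # results[0] is the generated text
    gold = int(doc["answer"])  # "answer" is a hypothetical field name
    return {"acc": 1.0 if prediction == gold else 0.0}
```

Taking the first integer is a deliberately simple rule; it would misread outputs that restate the question number or quote a figure before giving the answer, so a stricter pattern may be preferable in practice.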

This task is not derived from an academic paper but serves as a practical benchmark for evaluating financial domain knowledge in Korean language models. The README has been updated with all necessary details according to the standard template.

Please review and merge this PR.

Thank you!

Usage:
lm_eval --model hf --model_args pretrained=LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct,trust_remote_code=True --tasks financial_mmlu_ko --num_fewshot=3 --device cuda:7
hf (pretrained=LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 3, batch_size: 1

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
financial_mmlu_ko	1	none	3	acc	↑	0.7429	±	0.0205

lm_eval --model hf --model_args pretrained=meta-llama/Llama-3.1-8B-Instruct --tasks financial_mmlu_ko --num_fewshot=3 --device cuda:7
hf (pretrained=meta-llama/Llama-3.1-8B-Instruct), gen_kwargs: (None), limit: None, num_fewshot: 3, batch_size: 1

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
financial_mmlu_ko	1	none	3	acc	↑	0.6813	±	0.0219

CLAassistant · 2025-02-14T09:20:43Z

All committers have signed the CLA.

choics2623 requested review from baberabb and lintangsutawika as code owners February 14, 2025 09:20

remove cuda device assertion (EleutherAI#2680)

99970a8

choics2623 force-pushed the financial_mmlu_ko branch from a62784e to 885a7ba Compare February 17, 2025 00:59

choics2623 added 4 commits February 20, 2025 00:45

Add financial_mmlu_ko task

9fca922

financial_mmlu_ko task v1.0, changed account name

378fcc1

edited lm_eval/task/README.md, changed account name

8cd6f3e

Apply pre-commit fixes, changed account name

daf0195

choics2623 force-pushed the financial_mmlu_ko branch from 885a7ba to daf0195 Compare February 20, 2025 00:53
