Pull requests: EleutherAI/lm-evaluation-harness

Capture gen_kwargs from CLI in squad_completion
#2727 opened Feb 23, 2025 by ksurya
Groundcocoa
#2724 opened Feb 22, 2025 by HarshKohli
Add cocoteros_es task in spanish_bench
#2721 opened Feb 21, 2025 by sgs97ua
Add support for sequence labeling
#2718 opened Feb 20, 2025 by jogonba2
New healthcare benchmark: careqa
#2714 opened Feb 19, 2025 by PabloAgustin
Add AIBE task and utilities
#2712 opened Feb 18, 2025 by parimalthakre01
Add Task (Financial mmlu ko)
#2699 opened Feb 14, 2025 by choics2623
add o3-mini support
#2697 opened Feb 14, 2025 by HelloJocelynLu
add audio modality (qwen2 audio only)
#2689 opened Feb 12, 2025 by artemorloff
Add generation variants of some tasks
#2688 opened Feb 11, 2025 by baberabb
Convert gen tasks to multiple_choice
#2670 opened Feb 4, 2025 by baberabb (Draft)
[hf-multimodal] pass kwargs to self.processor
#2667 opened Jan 31, 2025 by baberabb
Add from dataframe
#2655 opened Jan 25, 2025 by AMindToThink
humaneval instruct
#2650 opened Jan 22, 2025 by baberabb
Easily evaluate models steered by SAEs
#2641 opened Jan 21, 2025 by AMindToThink
Include all test files in sdist
#2634 opened Jan 19, 2025 by booxter
Add loncxt tasks
#2629 opened Jan 17, 2025 by baberabb (Draft)
Added EU20 task suite
#2620 opened Jan 10, 2025 by KlaudiaTH
change to single process for bootstrap_stderr
#2593 opened Dec 23, 2024 by zhuyuhua-v
Added caseHOLD task
#2570 opened Dec 16, 2024 by zolastro