EleutherAI / lm-evaluation-harness Public

Notifications You must be signed in to change notification settings
Fork 2.1k
Star 7.9k

Code
Issues 367
Pull requests 108
Actions
Projects 1
Security
Insights

Additional navigation options

Code
Issues
Pull requests
Actions
Projects
Security
Insights

Issues: EleutherAI/lm-evaluation-harness

reproduce llama 3 evals

#2557 opened Dec 10, 2024 by baberabb

Open 6

Labels 10 Milestones 1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

367 Open 889 Closed

Author

Filter by author

Label

Filter by label

Use alt + click/return to exclude labels

or ⇧ + click/return for logical OR

Projects

Filter by project

Milestones

Filter by milestone

Assignee

Filter by who’s assigned

Assigned to nobody

Sort

Sort by

Newest Oldest Most commented Least commented Recently updated Least recently updated Best match

Most reactions

Issues list

Batching and generate_until special tokens

#2723 opened Feb 21, 2025 by sjmielke

Get acc_norm for HF models in log_samples

#2722 opened Feb 21, 2025 by Kartik21

Encountering assert len(indices) == len(inputs) error when using Qwen2vl for MMMU evaluation

#2720 opened Feb 21, 2025 by Ben81828

How to preprocess a document with the assistance of a tokenizer from a specific Model

#2717 opened Feb 20, 2025 by p1nksnow

Different models on same tasks gives same results when cache is active bug

Something isn't working.

#2715 opened Feb 19, 2025 by salvatore-cipolla

Importing a local module in a task included with include_path

#2713 opened Feb 19, 2025 by joaormfsilva

[Accuracy gap with official model card due to wrong parsing]

#2707 opened Feb 17, 2025 by Monstertail

Inconsistent Behavior with max_tokens, Post-Processing, and Cache Options

#2702 opened Feb 15, 2025 by ntlm1686

vLLM CUDA OOM for loglikelihood, but not for generate_until asking questions

For asking for clarification / support on library usage.

#2698 opened Feb 14, 2025 by lsjlsj5846

Feature request: allow peft revision separate from base model revision

#2696 opened Feb 13, 2025 by iuliaturc

Support Arabic Dataset

#2693 opened Feb 13, 2025 by ziadwaelai

Strip the input for the three tasks: FDA, SWDE, and SQuAD_completion. validation

For validation of task implementations.

#2690 opened Feb 12, 2025 by Doraemonzzz

Add o3-mini support

#2685 opened Feb 11, 2025 by HelloJocelynLu

Eval support for DeepSeek-R1 like reasoning models

#2682 opened Feb 9, 2025 by Nithanaroy

ValueError: Trying to set a tensor of shape torch.Size([896, 768]) in "weight" (which has shape torch.Size([896, 4864])), this looks incorrect

#2677 opened Feb 7, 2025 by aqe670

add_bos_token causes very unstable results for quantized llama3-70B asking questions

For asking for clarification / support on library usage.

#2676 opened Feb 7, 2025 by wenhuach21

Use AWS Bedrock Models

#2669 opened Feb 3, 2025 by nrcoleman

Support processor_kwargs for hf-multimodal

#2666 opened Jan 30, 2025 by nikg4

Issue running Squadv2 in LM-Evaluation-Harness

#2664 opened Jan 30, 2025 by pythonLoader

maximum sequence length

#2657 opened Jan 27, 2025 by Raghadalr02

List of num_fewshots

#2656 opened Jan 25, 2025 by AMindToThink

Question about humaneval

#2648 opened Jan 22, 2025 by Shiguang-Guo

error with .nemo model. invalid strategy name: strategy=<nemo.collections.nlp.parts.nlp_overrides.NLPDDPStrategy object at 0x7fd7b9f79f60>. It must be either a string or an instance of lightning.pytorch.strategies.Strategy

#2647 opened Jan 21, 2025 by songwang41

Llama-3.1-8B-Instruct performance on minerva_math

#2646 opened Jan 21, 2025 by sky-fly97

add test for main.py

#2639 opened Jan 20, 2025 by baberabb

Previous 1 2 3 4 5 … 14 15 Next

Previous Next

ProTip! Find all open issues with in progress development work with linked:pr.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly