We need to add the following datasets from LightEval.
abstract_narrative_understanding: Tests understanding of abstract narratives.
analogical_similarity: Tests analogical reasoning.
arithmetic_bb: Basic arithmetic reasoning.
cause_and_effect: Tests understanding of causal relationships.
chess_state_tracking: Logical reasoning in a structured domain.
common_morpheme: Tests understanding of word structures.
contextual_parametric_knowledge_conflicts: Tests contextual understanding.
dyck_languages: Tests understanding of nested structures.
elementary_math_qa: Elementary math question answering.
formal_fallacies_syllogisms_negation: Logical fallacy detection.
general_knowledge: General knowledge question answering.
geometric_shapes: Understanding of geometric concepts.
logical_deduction: Logical deduction tasks.
mathematical_induction: Mathematical induction reasoning.
metaphor_understanding: Understanding of metaphors.
natural_instructions: Natural language instruction following.
object_counting: Counting objects in a description.
qa_wikidata: Question answering using Wikidata.
reasoning_about_colored_objects: Reasoning about colored objects.
simple_arithmetic_json: Simple arithmetic tasks.
tracking_shuffled_objects: Tracking objects in a shuffled sequence.
unit_conversion: Unit conversion tasks.
Harness Tasks (harness):
bbh:logical_deduction_three_objects : Logical deduction with three objects.
bbh:movie_recommendation : Movie recommendation based on preferences.
bbh:navigate : Navigation tasks based on descriptions.
bbh:ruin_names : Picking the humorous edit that "ruins" an artist or movie name.
bbh:salient_translation_error_detection : Detecting salient translation errors.
bbh:snarks : Identifying which of two statements is sarcastic.
bbh:sports_understanding : Understanding of sports concepts.
bbh:temporal_sequences : Understanding of temporal sequences.
bbh:tracking_shuffled_objects_three_objects : Tracking shuffled objects.
HELM Tasks (helm):
bigbench:auto_debugging : Debugging code based on descriptions.
bigbench:code_line_description : Describing lines of code.
bigbench:conceptual_combinations : Understanding conceptual combinations.
bigbench:conlang_translation : Translating constructed languages.
bigbench:emoji_movie : Identifying movies from emoji descriptions.
bigbench:linguistics_puzzles : Solving linguistic puzzles.
bigbench:logical_deduction-three_objects : Logical deduction with three objects.
bigbench:misconceptions_russian : Identifying misconceptions in Russian.
bigbench:novel_concepts : Understanding novel concepts.
bigbench:symbol_interpretation : Interpreting symbolic representations.
bigbench:vitaminc_fact_verification : Fact verification.
bigbench:winowhy : Judging explanations given for Winograd Schema answers.
Leaderboard Tasks (leaderboard):
arc:challenge : AI2 Reasoning Challenge.
gsm8k: GSM8K: Grade school math word problems.
hellaswag: HellaSwag: Commonsense sentence completion.
mmlu:high_school_mathematics : Mathematics at the high school level.
mmlu:high_school_physics : Physics at the high school level.
mmlu:high_school_biology : Biology at the high school level.
mmlu:high_school_chemistry : Chemistry at the high school level.
mmlu:high_school_computer_science : Computer Science at the high school level.
mmlu:high_school_psychology : Psychology at the high school level.
mmlu:high_school_us_history : US History at the high school level.
mmlu:high_school_world_history : World History at the high school level.
truthfulqa:mc : TruthfulQA with multiple choice answers.
winogrande: WinoGrande: Winograd-schema-style pronoun resolution.
LightEval Tasks (lighteval):
agieval:aqua-rat : AQuA-RAT: Algebraic word problems with rationales.
agieval:gaokao-mathqa : Gaokao math questions.
blimp:adjunct_island : BLiMP syntactic tasks.
blimp:animate_subject_passive : BLiMP syntactic tasks.
blimp:causative : BLiMP syntactic tasks.
blimp:complex_NP_island : BLiMP syntactic tasks.
blimp:determiner_noun_agreement_1 : BLiMP syntactic tasks.
blimp:drop_argument : BLiMP syntactic tasks.
blimp:ellipsis_n_bar_1 : BLiMP syntactic tasks.
blimp:existential_there_object_raising : BLiMP syntactic tasks.
blimp:inchoative : BLiMP syntactic tasks.
blimp:left_branch_island_echo_question : BLiMP syntactic tasks.
blimp:matrix_question_npi_licensor_present : BLiMP syntactic tasks.
blimp:npi_present_1 : BLiMP syntactic tasks.
blimp:passive_1 : BLiMP syntactic tasks.
blimp:principle_A_c_command : BLiMP syntactic tasks.
blimp:regular_plural_subject_verb_agreement_1 : BLiMP syntactic tasks.
blimp:sentential_negation_npi_licensor_present : BLiMP syntactic tasks.
blimp:superlative_quantifiers_1 : BLiMP syntactic tasks.
blimp:tough_vs_raising_1 : BLiMP syntactic tasks.
blimp:wh_island : BLiMP syntactic tasks.
blimp:wh_questions_subject_gap : BLiMP syntactic tasks.
coqa: CoQA: Conversational Question Answering.
gsm8k: GSM8K: Grade school math word problems.
lambada:openai : LAMBADA language modeling task.
math:algebra : Math algebra questions.
math:geometry : Math geometry questions.
math:prealgebra : Math pre-algebra questions.
mathqa: Math question answering.
piqa: PIQA: Physical commonsense reasoning.
super_glue:boolq : SuperGLUE boolean questions.
super_glue:cb : SuperGLUE CommitmentBank (three-class textual entailment).
super_glue:copa : SuperGLUE causal reasoning.
super_glue:rte : SuperGLUE recognizing textual entailment.
super_glue:wic : SuperGLUE word in context.
super_glue:wsc : SuperGLUE Winograd Schema Challenge.
truthfulqa:gen : TruthfulQA generative.
Original Tasks (original):
arc:c:simple : ARC-Easy: Simple science questions.
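
For anyone picking this up, here is a minimal sketch of how a subset of the tasks above could be turned into task specification strings. It assumes the pipe-separated `suite|task|num_fewshot|truncate_fewshot` layout; that layout, the helper name `build_task_specs`, and the sample selection in `TASKS_BY_SUITE` are assumptions for illustration and should be checked against the LightEval version in use.

```python
# Hypothetical helper, not part of LightEval: renders the suite/task names
# from this issue as "suite|task|num_fewshot|truncate_fewshot" strings.
# The pipe-separated layout is an assumption; verify it before relying on it.

# A small sample of the tasks listed in this issue, grouped by suite.
TASKS_BY_SUITE = {
    "harness": ["bbh:navigate", "bbh:snarks"],
    "helm": ["bigbench:emoji_movie", "bigbench:winowhy"],
    "leaderboard": ["arc:challenge", "gsm8k", "hellaswag"],
    "lighteval": ["coqa", "piqa", "super_glue:boolq"],
    "original": ["arc:c:simple"],
}


def build_task_specs(tasks_by_suite, num_fewshot=0, truncate_fewshot=0):
    """Return one spec string per (suite, task) pair, in insertion order."""
    specs = []
    for suite, tasks in tasks_by_suite.items():
        for task in tasks:
            specs.append(f"{suite}|{task}|{num_fewshot}|{truncate_fewshot}")
    return specs


if __name__ == "__main__":
    # Print the specs one per line so they can be pasted into a tasks file.
    for spec in build_task_specs(TASKS_BY_SUITE):
        print(spec)
```

Keeping the suite as the dictionary key means a task name that appears under more than one suite (such as `gsm8k` above) still yields distinct spec strings.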