Add more datasets #8

sauravpanda · 2025-01-23T06:59:02Z

We need to add the following datasets from Lighteval.

abstract_narrative_understanding: Tests understanding of abstract narratives.

analogical_similarity: Tests analogical reasoning.

arithmetic_bb: Basic arithmetic reasoning.

cause_and_effect: Tests understanding of causal relationships.

chess_state_tracking: Logical reasoning in a structured domain.

common_morpheme: Tests understanding of word structures.

contextual_parametric_knowledge_conflicts: Tests contextual understanding.

dyck_languages: Tests understanding of nested structures.

elementary_math_qa: Elementary math question answering.

formal_fallacies_syllogisms_negation: Logical fallacy detection.

general_knowledge: General knowledge question answering.

geometric_shapes: Understanding of geometric concepts.

logical_deduction: Logical deduction tasks.

mathematical_induction: Mathematical induction reasoning.

metaphor_understanding: Understanding of metaphors.

natural_instructions: Natural language instruction following.

object_counting: Counting objects in a description.

qa_wikidata: Question answering using Wikidata.

reasoning_about_colored_objects: Reasoning about colored objects.

simple_arithmetic_json: Simple arithmetic tasks.

tracking_shuffled_objects: Tracking objects in a shuffled sequence.

unit_conversion: Unit conversion tasks.

Harness Tasks (harness):
bbh:logical_deduction_three_objects : Logical deduction with three objects.

bbh:movie_recommendation : Movie recommendation based on preferences.

bbh:navigate : Navigation tasks based on descriptions.

bbh:ruin_names : Picking the humorous one-character edit of an artist, band, or movie name.

bbh:salient_translation_error_detection : Detecting salient translation errors.

bbh:snarks : Identifying which of two near-identical statements is sarcastic.

bbh:sports_understanding : Understanding of sports concepts.

bbh:temporal_sequences : Understanding of temporal sequences.

bbh:tracking_shuffled_objects_three_objects : Tracking shuffled objects.

HELM Tasks (helm):
bigbench:auto_debugging : Debugging code based on descriptions.

bigbench:code_line_description : Describing lines of code.

bigbench:conceptual_combinations : Understanding conceptual combinations.

bigbench:conlang_translation : Translating constructed languages.

bigbench:emoji_movie : Identifying movies from emoji descriptions.

bigbench:linguistics_puzzles : Solving linguistic puzzles.

bigbench:logical_deduction-three_objects : Logical deduction with three objects.

bigbench:misconceptions_russian : Identifying misconceptions in Russian.

bigbench:novel_concepts : Understanding novel concepts.

bigbench:symbol_interpretation : Interpreting symbolic representations.

bigbench:vitaminc_fact_verification : Fact verification.

bigbench:winowhy : Judging the reasons given for Winograd schema answers.

Leaderboard Tasks (leaderboard):
arc:challenge : AI2 Reasoning Challenge.

gsm8k: Grade school math word problems.

hellaswag: HellaSwag: Can a Machine Really Finish Your Sentence?

mmlu:high_school_mathematics : Mathematics at the high school level.

mmlu:high_school_physics : Physics at the high school level.

mmlu:high_school_biology : Biology at the high school level.

mmlu:high_school_chemistry : Chemistry at the high school level.

mmlu:high_school_computer_science : Computer Science at the high school level.

mmlu:high_school_psychology : Psychology at the high school level.

mmlu:high_school_us_history : US History at the high school level.

mmlu:high_school_world_history : World History at the high school level.

truthfulqa:mc : TruthfulQA with multiple choice answers.

winogrande: Winograd schema tasks.

LightEval Tasks (lighteval):
agieval:aqua-rat : AQuA-RAT: Algebraic word problems with rationales.

agieval:gaokao-mathqa : Gaokao math questions.

blimp:adjunct_island : BLiMP syntactic tasks.

blimp:animate_subject_passive : BLiMP syntactic tasks.

blimp:causative : BLiMP syntactic tasks.

blimp:complex_NP_island : BLiMP syntactic tasks.

blimp:determiner_noun_agreement_1 : BLiMP syntactic tasks.

blimp:drop_argument : BLiMP syntactic tasks.

blimp:ellipsis_n_bar_1 : BLiMP syntactic tasks.

blimp:existential_there_object_raising : BLiMP syntactic tasks.

blimp:inchoative : BLiMP syntactic tasks.

blimp:left_branch_island_echo_question : BLiMP syntactic tasks.

blimp:matrix_question_npi_licensor_present : BLiMP syntactic tasks.

blimp:npi_present_1 : BLiMP syntactic tasks.

blimp:passive_1 : BLiMP syntactic tasks.

blimp:principle_A_c_command : BLiMP syntactic tasks.

blimp:regular_plural_subject_verb_agreement_1 : BLiMP syntactic tasks.

blimp:sentential_negation_npi_licensor_present : BLiMP syntactic tasks.

blimp:superlative_quantifiers_1 : BLiMP syntactic tasks.

blimp:tough_vs_raising_1 : BLiMP syntactic tasks.

blimp:wh_island : BLiMP syntactic tasks.

blimp:wh_questions_subject_gap : BLiMP syntactic tasks.

coqa: CoQA: Conversational Question Answering.

gsm8k: Grade school math word problems.

lambada:openai : LAMBADA language modeling task.

math:algebra : Math algebra questions.

math:geometry : Math geometry questions.

math:prealgebra : Math pre-algebra questions.

mathqa: Math question answering.

piqa: PIQA: Physical commonsense reasoning.

super_glue:boolq : SuperGLUE boolean questions.

super_glue:cb : SuperGLUE CommitmentBank (textual entailment).

super_glue:copa : SuperGLUE causal reasoning.

super_glue:rte : SuperGLUE recognizing textual entailment.

super_glue:wic : SuperGLUE word in context.

super_glue:wsc : SuperGLUE winograd schema challenges.

truthfulqa:gen : TruthfulQA generative.

Original Tasks (original):

arc:c:simple : ARC-Challenge science questions (simple prompt variant).
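
For whoever picks this up, here is a minimal sketch of turning the suite/task pairs above into task spec strings. It assumes the `suite|task|num_fewshot|truncate_fewshot` format that Lighteval has used for task selection; check the installed version before relying on it. `TASKS_BY_SUITE` and `build_task_specs` are hypothetical names for illustration, and only a handful of the tasks listed above are included.

```python
# Hypothetical helper, not part of Lighteval: builds task spec strings
# from the suite/task names listed in this issue.
# Assumed spec format: "suite|task|num_fewshot|truncate_fewshot".

# A small sample of the tasks above, grouped by suite.
TASKS_BY_SUITE = {
    "harness": ["bbh:navigate", "bbh:snarks"],
    "helm": ["bigbench:auto_debugging", "bigbench:winowhy"],
    "leaderboard": ["arc:challenge", "gsm8k", "hellaswag"],
    "lighteval": ["coqa", "piqa", "super_glue:boolq"],
    "original": ["arc:c:simple"],
}


def build_task_specs(tasks_by_suite, num_fewshot=0, truncate_fewshot=0):
    """Return one spec string per (suite, task) pair, in insertion order."""
    specs = []
    for suite, tasks in tasks_by_suite.items():
        for task in tasks:
            specs.append(f"{suite}|{task}|{num_fewshot}|{truncate_fewshot}")
    return specs


if __name__ == "__main__":
    # Comma-join the specs so they can be passed as a single argument.
    print(",".join(build_task_specs(TASKS_BY_SUITE)))
```

Keeping the list as data grouped by suite means each new dataset from this issue is a one-line addition, and the few-shot settings can be changed in one place.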
