Add more datasets #8

Open
sauravpanda opened this issue Jan 23, 2025 · 0 comments

We need to add the following datasets from Lighteval.

abstract_narrative_understanding: Tests understanding of abstract narratives.

analogical_similarity: Tests analogical reasoning.

arithmetic_bb: Basic arithmetic reasoning.

cause_and_effect: Tests understanding of causal relationships.

chess_state_tracking: Tracking the board state through a sequence of chess moves.

common_morpheme: Tests understanding of word structures.

contextual_parametric_knowledge_conflicts: Tests contextual understanding.

dyck_languages: Tests understanding of nested structures.

elementary_math_qa: Elementary math question answering.

formal_fallacies_syllogisms_negation: Logical fallacy detection.

general_knowledge: General knowledge question answering.

geometric_shapes: Understanding of geometric concepts.

logical_deduction: Logical deduction tasks.

mathematical_induction: Mathematical induction reasoning.

metaphor_understanding: Understanding of metaphors.

natural_instructions: Natural language instruction following.

object_counting: Counting objects in a description.

qa_wikidata: Question answering using Wikidata.

reasoning_about_colored_objects: Reasoning about colored objects.

simple_arithmetic_json: Simple arithmetic tasks.

tracking_shuffled_objects: Tracking objects in a shuffled sequence.

unit_conversion: Unit conversion tasks.

Harness Tasks (harness):
bbh:logical_deduction_three_objects : Logical deduction with three objects.

bbh:movie_recommendation : Movie recommendation based on preferences.

bbh:navigate : Navigation tasks based on descriptions.

bbh:ruin_names : Selecting the humorous edit that "ruins" a movie or artist name.

bbh:salient_translation_error_detection : Detecting salient translation errors.

bbh:snarks : Identifying which of two similar statements is sarcastic.

bbh:sports_understanding : Understanding of sports concepts.

bbh:temporal_sequences : Understanding of temporal sequences.

bbh:tracking_shuffled_objects_three_objects : Tracking shuffled objects.

HELM Tasks (helm):
bigbench:auto_debugging : Debugging code based on descriptions.

bigbench:code_line_description : Describing lines of code.

bigbench:conceptual_combinations : Understanding conceptual combinations.

bigbench:conlang_translation : Translating constructed languages.

bigbench:emoji_movie : Identifying movies from emoji descriptions.

bigbench:linguistics_puzzles : Solving linguistic puzzles.

bigbench:logical_deduction-three_objects : Logical deduction with three objects.

bigbench:misconceptions_russian : Identifying misconceptions in Russian.

bigbench:novel_concepts : Understanding novel concepts.

bigbench:symbol_interpretation : Interpreting symbolic representations.

bigbench:vitaminc_fact_verification : Fact verification.

bigbench:winowhy : Judging whether a stated reason justifies the answer to a Winograd Schema question.

Leaderboard Tasks (leaderboard):
arc:challenge : AI2 Reasoning Challenge.

gsm8k: GSM8K: Grade school math word problems.

hellaswag: HellaSwag: Can a Machine Really Finish Your Sentence? (commonsense sentence completion)

mmlu:high_school_mathematics : Mathematics at the high school level.

mmlu:high_school_physics : Physics at the high school level.

mmlu:high_school_biology : Biology at the high school level.

mmlu:high_school_chemistry : Chemistry at the high school level.

mmlu:high_school_computer_science : Computer Science at the high school level.

mmlu:high_school_psychology : Psychology at the high school level.

mmlu:high_school_us_history : US History at the high school level.

mmlu:high_school_world_history : World History at the high school level.

truthfulqa:mc : TruthfulQA with multiple choice answers.

winogrande: WinoGrande: Large-scale Winograd-style pronoun resolution.

LightEval Tasks (lighteval):
agieval:aqua-rat : AQuA-RAT: Algebraic word problems with rationales.

agieval:gaokao-mathqa : Gaokao math questions.

blimp:adjunct_island : BLiMP syntactic tasks.

blimp:animate_subject_passive : BLiMP syntactic tasks.

blimp:causative : BLiMP syntactic tasks.

blimp:complex_NP_island : BLiMP syntactic tasks.

blimp:determiner_noun_agreement_1 : BLiMP syntactic tasks.

blimp:drop_argument : BLiMP syntactic tasks.

blimp:ellipsis_n_bar_1 : BLiMP syntactic tasks.

blimp:existential_there_object_raising : BLiMP syntactic tasks.

blimp:inchoative : BLiMP syntactic tasks.

blimp:left_branch_island_echo_question : BLiMP syntactic tasks.

blimp:matrix_question_npi_licensor_present : BLiMP syntactic tasks.

blimp:npi_present_1 : BLiMP syntactic tasks.

blimp:passive_1 : BLiMP syntactic tasks.

blimp:principle_A_c_command : BLiMP syntactic tasks.

blimp:regular_plural_subject_verb_agreement_1 : BLiMP syntactic tasks.

blimp:sentential_negation_npi_licensor_present : BLiMP syntactic tasks.

blimp:superlative_quantifiers_1 : BLiMP syntactic tasks.

blimp:tough_vs_raising_1 : BLiMP syntactic tasks.

blimp:wh_island : BLiMP syntactic tasks.

blimp:wh_questions_subject_gap : BLiMP syntactic tasks.

coqa: CoQA: Conversational Question Answering.

gsm8k: GSM8K: Grade school math word problems.

lambada:openai : LAMBADA language modeling task.

math:algebra : Math algebra questions.

math:geometry : Math geometry questions.

math:prealgebra : Math pre-algebra questions.

mathqa: Math question answering.

piqa: PIQA: Physical commonsense reasoning.

super_glue:boolq : SuperGLUE boolean questions.

super_glue:cb : SuperGLUE CommitmentBank (three-way textual entailment).

super_glue:copa : SuperGLUE causal reasoning.

super_glue:rte : SuperGLUE recognizing textual entailment.

super_glue:wic : SuperGLUE word in context.

super_glue:wsc : SuperGLUE winograd schema challenges.

truthfulqa:gen : TruthfulQA generative.

Original Tasks (original):

arc:c:simple : ARC-Challenge science questions with a simple prompt format.
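
Since some task names appear under more than one suite (gsm8k is listed under both leaderboard and lighteval), it may help to track them as suite-qualified strings. Below is a minimal sketch, assuming the `suite|task|num_fewshot|truncate` string convention that Lighteval has used for task selection; the exact format may differ between Lighteval versions, so check it against the installed one. `TASKS_BY_SUITE` and `build_task_specs` are hypothetical names for illustration, and only a small subset of the tasks above is included.

```python
from typing import Dict, List

# Hypothetical subset of the tasks listed in this issue, keyed by suite.
TASKS_BY_SUITE: Dict[str, List[str]] = {
    "harness": ["bbh:navigate", "bbh:snarks"],
    "helm": ["bigbench:auto_debugging"],
    "leaderboard": ["arc:challenge", "gsm8k"],
    "lighteval": ["gsm8k", "piqa"],
    "original": ["arc:c:simple"],
}


def build_task_specs(
    tasks_by_suite: Dict[str, List[str]],
    num_fewshot: int = 0,
    truncate_fewshot: int = 0,
) -> List[str]:
    """Build suite-qualified task strings.

    Assumed format: "suite|task|num_fewshot|truncate_fewshot".
    """
    specs = []
    for suite, tasks in tasks_by_suite.items():
        for task in tasks:
            specs.append(f"{suite}|{task}|{num_fewshot}|{truncate_fewshot}")
    return specs


def tasks_in_multiple_suites(tasks_by_suite: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return task names that are listed under more than one suite."""
    suites_for_task: Dict[str, List[str]] = {}
    for suite, tasks in tasks_by_suite.items():
        for task in tasks:
            suites_for_task.setdefault(task, []).append(suite)
    return {t: s for t, s in suites_for_task.items() if len(s) > 1}


if __name__ == "__main__":
    for spec in build_task_specs(TASKS_BY_SUITE):
        print(spec)
    print(tasks_in_multiple_suites(TASKS_BY_SUITE))
```

Keeping the suite in the identifier means the two gsm8k entries stay distinct when we wire them into the dataset loader.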
