Add o3-mini support #2685

HelloJocelynLu · 2025-02-11T06:57:46Z

Hi developers,

I've encountered an issue while trying to run o3-mini on a benchmark using lm-evaluation-harness. The error states: "Unsupported parameter: 'temperature' is not supported with this model."

The problem occurs because the OpenAIChatCompletion API automatically includes the 'temperature' parameter, which isn't compatible with o3-mini. This can be seen in the code here:

lm-evaluation-harness/lm_eval/models/openai_completions.py

Lines 275 to 291 in a40fe42

    
           temperature = gen_kwargs.pop("temperature", 0) 
        
           stop = handle_stop_sequences(gen_kwargs.pop("until", ["<|endoftext|>"]), eos) 
        
           if not isinstance(stop, (list, tuple)): 
        
               stop = [stop] 
        
           output = { 
        
               "messages": messages, 
        
               "model": self.model, 
        
               "max_completion_tokens": max_tokens, 
        
               "temperature": temperature, 
        
               "stop": stop[:4], 
        
               "seed": seed, 
        
               **gen_kwargs, 
        
           } 
        
           if "o1" in self.model: 
        
               output.pop("stop") 
        
               output["temperature"] = 1 
        
           return output

To reproduce the issue:

lm_eval --model openai-chat-completions --model_args model=o3-mini --tasks mmlu_flan_cot_zeroshot_college_chemistry --limit 2 --output output/openai-4o --apply_chat_template --log_samples # fails
lm_eval --model openai-chat-completions --model_args model=o1-preview --tasks mmlu_flan_cot_zeroshot_college_chemistry --limit 2 --output output/openai-4o --apply_chat_template --log_samples # works

Maybe we need something like:

if "o3" in self.model: 
     output.pop("temperature")

Other discussion: ai-christianson/RA.Aid#70

baberabb · 2025-02-13T11:28:23Z

Hi! Would you be interested in submitting a PR? we can modify the condition here to include all the o models

HelloJocelynLu mentioned this issue Feb 11, 2025

[Example] Add Chemistry Task from Global-MMLU Dataset deepprinciple/lm-evaluation-harness#2

Merged

HelloJocelynLu linked a pull request Feb 14, 2025 that will close this issue

add o3-mini support #2697

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add o3-mini support #2685

Add o3-mini support #2685

HelloJocelynLu commented Feb 11, 2025

baberabb commented Feb 13, 2025

Add o3-mini support #2685

Add o3-mini support #2685

Comments

HelloJocelynLu commented Feb 11, 2025

baberabb commented Feb 13, 2025