Leaderboard Update, adding new model firefunction-v2-FC (#476)
This PR adds the new model `firefunction-v2-FC` to the leaderboard,
thanks to the support from @pgarbacki in #470.
This PR only adds an entry to the leaderboard; the scores of all other
models on the leaderboard remain unchanged.
HuanzhiMao authored Jun 22, 2024
1 parent c32c7a0 commit 633bd5c
Showing 3 changed files with 38 additions and 37 deletions.
69 changes: 35 additions & 34 deletions data.csv
@@ -11,37 +11,38 @@ Rank,Overall Acc,Model,Model Link,Organization,License,AST Summary,Exec Summary,
10,83.88%,Meta-Llama-3-70B-Instruct (Prompt),https://llama.meta.com/llama3,Meta,Meta Llama 3 Community,87.74%,85.32%,81.45%,91.00%,50.00%,68.00%,93.00%,91.50%,85.00%,91.76%,95.00%,87.14%,88.00%,84.00%,77.50%,69.17%,1.1,0.18,N/A,N/A
11,82.94%,GPT-4o-2024-05-13 (FC),https://openai.com/index/hello-gpt-4o/,OpenAI,Proprietary,85.23%,80.37%,78.91%,88.25%,56.00%,50.00%,90.00%,87.50%,84.50%,86.47%,94.00%,75.71%,78.00%,82.00%,75.00%,81.25%,2.33,2.09,2.52,6.93
12,82.88%,GPT-4-turbo-2024-04-09 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,85.56%,78.61%,74.73%,91.25%,33.00%,26.00%,90.00%,89.50%,88.00%,82.94%,93.00%,68.57%,88.00%,76.00%,67.50%,88.75%,4.78,5.48,6.37,18.51
Removed rows (old ranks 13-46):
13,81.82%,Claude-3-Sonnet-20240229 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,87.40%,86.76%,83.09%,92.00%,53.00%,72.00%,89.00%,88.00%,89.50%,93.53%,96.00%,90.00%,92.00%,84.00%,77.50%,51.25%,2.13,1.95,1.16,3.08
14,81.35%,Mistral-Medium-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,83.76%,73.47%,80.55%,90.25%,54.00%,56.00%,92.00%,84.00%,78.50%,65.88%,96.00%,22.86%,76.00%,82.00%,70.00%,88.33%,1.76,2.75,2.15,6.38
15,80.53%,GPT-4o-2024-05-13 (Prompt),https://openai.com/index/hello-gpt-4o/,OpenAI,Proprietary,77.15%,77.62%,85.09%,90.75%,68.00%,74.00%,84.00%,78.50%,61.00%,90.00%,95.00%,82.86%,78.00%,70.00%,72.50%,82.50%,2.67,1.15,0.78,2.67
16,80.47%,Functionary-Medium-v2.4 (FC),https://huggingface.co/meetkai/functionary-medium-v2.4,MeetKai,MIT,85.61%,75.71%,79.45%,88.00%,55.00%,60.00%,90.50%,87.50%,85.00%,68.82%,85.00%,45.71%,84.00%,80.00%,70.00%,74.17%,N/A,2.49,2.69,7.45
17,80.35%,Gemini-1.5-Flash-Preview-0514 (FC),https://deepmind.google/technologies/gemini/flash/,Google,Proprietary,81.48%,74.57%,80.91%,91.00%,58.00%,46.00%,93.50%,78.00%,73.50%,81.76%,94.00%,64.29%,90.00%,54.00%,72.50%,79.58%,0.07,1.0,0.49,1.54
18,80.35%,Command-R-Plus (Prompt) (Optimized),https://txt.cohere.com/command-r-plus-microsoft-azure,Cohere For AI,cc-by-nc-4.0,83.60%,86.74%,82.91%,89.75%,64.00%,66.00%,88.50%,81.00%,82.00%,92.94%,97.00%,87.14%,90.00%,84.00%,80.00%,54.17%,1.9,1.27,0.93,3.24
19,80.29%,Command-R-Plus (Prompt) (Original),https://txt.cohere.com/command-r-plus-microsoft-azure,Cohere For AI,cc-by-nc-4.0,83.89%,86.24%,82.55%,89.75%,63.00%,64.00%,90.00%,80.00%,83.00%,92.94%,98.00%,85.71%,88.00%,84.00%,80.00%,53.75%,1.9,1.32,0.94,3.25
20,79.94%,Functionary-Small-v2.4 (FC),https://huggingface.co/meetkai/functionary-small-v2.4,MeetKai,MIT,83.55%,76.31%,82.18%,91.75%,56.00%,58.00%,88.50%,82.00%,81.50%,78.24%,96.00%,52.86%,82.00%,80.00%,65.00%,67.92%,N/A,2.43,2.55,7.18
21,79.76%,Command-R-Plus (FC) (Optimized),https://txt.cohere.com/command-r-plus-microsoft-azure,Cohere For AI,cc-by-nc-4.0,85.15%,77.17%,79.09%,89.75%,46.00%,60.00%,91.00%,88.00%,82.50%,81.18%,95.00%,61.43%,86.00%,74.00%,67.50%,63.75%,1.12,1.9,1.34,4.0
22,77.47%,Claude-3-Opus-20240229 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,73.06%,71.27%,82.73%,89.50%,61.00%,72.00%,91.50%,58.00%,60.00%,90.59%,97.00%,81.43%,94.00%,38.00%,62.50%,82.50%,30.87,12.92,3.95,20.48
23,76.47%,Claude-instant-1.2 (Prompt),https://www.anthropic.com/news/releasing-claude-instant-1-2,Anthropic,Proprietary,78.95%,77.93%,79.82%,87.00%,56.00%,70.00%,85.50%,83.00%,67.50%,84.71%,94.00%,71.43%,80.00%,82.00%,65.00%,57.50%,0.45,1.21,0.69,2.22
24,76.47%,Claude-3.5-Sonnet-20240620 (FC),https://www.anthropic.com/news/claude-3-5-sonnet,Anthropic,Proprietary,72.69%,59.51%,85.27%,91.50%,67.00%,72.00%,92.00%,59.00%,54.50%,97.06%,98.00%,95.71%,88.00%,18.00%,35.00%,78.33%,4.73,3.59,2.37,7.98
25,74.29%,Claude-3-Haiku-20240307 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,79.10%,70.49%,84.91%,93.50%,55.00%,76.00%,91.50%,84.50%,55.50%,92.94%,100.00%,82.86%,94.00%,70.00%,25.00%,34.58%,0.18,1.0,0.49,1.72
26,71.41%,Claude-2.1 (Prompt),https://www.anthropic.com/news/claude-2-1,Anthropic,Proprietary,66.05%,62.17%,80.18%,88.75%,54.00%,64.00%,76.00%,55.50%,52.50%,71.18%,90.00%,44.29%,84.00%,46.00%,47.50%,83.33%,4.81,3.27,2.13,7.38
27,70.94%,Command-R-Plus (FC) (Original),https://txt.cohere.com/command-r-plus-microsoft-azure,Cohere For AI,cc-by-nc-4.0,80.85%,73.19%,74.91%,84.50%,45.00%,58.00%,90.00%,82.00%,76.50%,81.76%,92.00%,67.14%,88.00%,68.00%,55.00%,24.17%,1.09,1.9,0.99,3.99
28,68.76%,Mistral-large-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,64.73%,60.01%,66.91%,89.50%,5.00%,10.00%,94.50%,25.50%,72.00%,83.53%,99.00%,61.43%,96.00%,8.00%,52.50%,84.17%,2.47,3.02,2.94,8.85
29,67.29%,Nexusflow-Raven-v2 (FC),https://huggingface.co/Nexusflow/NexusRaven-V2-13B,Nexusflow,Apache 2.0,65.19%,73.74%,75.27%,80.75%,56.00%,70.00%,86.00%,41.50%,58.00%,66.47%,95.00%,25.71%,92.00%,74.00%,62.50%,57.50%,N/A,2.17,1.45,5.09
30,67.00%,Gemini-1.0-Pro-001 (FC),https://deepmind.google/technologies/gemini/#introduction,Google,Proprietary,56.77%,56.74%,79.09%,93.00%,42.00%,42.00%,92.50%,30.00%,25.50%,86.47%,89.00%,82.86%,84.00%,44.00%,12.50%,80.00%,0.13,1.27,1.0,3.39
31,65.88%,DBRX-Instruct (Prompt),https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm,Databricks,Databricks Open Model,66.62%,74.92%,64.00%,75.75%,30.00%,38.00%,71.50%,72.00%,59.00%,71.18%,80.00%,58.57%,86.00%,80.00%,62.50%,55.83%,1.25,0.64,0.41,1.34
32,65.18%,Snowflake/snowflake-arctic-instruct (Prompt),https://huggingface.co/Snowflake/snowflake-arctic-instruct,Snowflake,apache-2.0,61.09%,80.04%,62.36%,67.50%,42.00%,62.00%,69.00%,59.00%,54.00%,87.65%,91.00%,82.86%,86.00%,74.00%,72.50%,59.58%,N/A,0.98,0.56,2.13
33,64.35%,Mistral-large-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,71.49%,64.93%,81.45%,89.50%,62.00%,56.00%,93.50%,31.50%,79.50%,94.71%,95.00%,94.29%,92.00%,8.00%,65.00%,0.00%,1.97,2.07,1.33,4.97
34,63.88%,GPT-3.5-Turbo-0125 (FC),https://platform.openai.com/docs/models/gpt-3-5-turbo,OpenAI,Proprietary,74.74%,81.38%,61.45%,63.50%,53.00%,62.00%,66.00%,90.50%,81.00%,93.53%,95.00%,91.43%,80.00%,82.00%,70.00%,2.08%,0.19,1.27,0.74,2.47
35,60.41%,Mistral-small-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,65.32%,52.62%,81.27%,90.50%,56.00%,58.00%,96.00%,39.00%,45.00%,96.47%,100.00%,91.43%,92.00%,12.00%,10.00%,0.00%,0.48,1.14,0.81,2.52
36,60.12%,Meta-Llama-3-8B-Instruct (Prompt),https://llama.meta.com/llama3,Meta,Meta Llama 3 Community,62.65%,69.95%,55.09%,58.50%,45.00%,48.00%,73.50%,58.00%,64.00%,75.29%,79.00%,70.00%,74.00%,68.00%,62.50%,43.33%,0.24,0.04,N/A,N/A
37,59.71%,Hermes-2-Pro-Mistral-7B (FC),https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,NousResearch,apache-2.0,70.25%,55.62%,72.00%,81.75%,42.00%,54.00%,80.50%,67.00%,61.50%,56.47%,78.00%,25.71%,70.00%,56.00%,40.00%,10.83%,0.49,0.08,N/A,N/A
38,59.24%,Claude-3-Sonnet-20240229 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,44.18%,43.32%,76.73%,86.00%,49.00%,58.00%,88.00%,6.00%,6.00%,85.29%,96.00%,70.00%,88.00%,0.00%,0.00%,81.67%,3.44,3.25,1.46,6.85
39,53.82%,Claude-3-Haiku-20240307 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,45.05%,46.79%,86.18%,95.50%,60.00%,64.00%,93.50%,0.50%,0.00%,91.18%,96.00%,84.29%,94.00%,2.00%,0.00%,20.83%,0.29,1.49,0.61,2.4
40,53.65%,FireFunction-v1 (FC),https://huggingface.co/fireworks-ai/firefunction-v1,Fireworks,Apache 2.0,40.75%,39.79%,70.00%,90.25%,13.00%,22.00%,93.00%,0.00%,0.00%,71.18%,95.00%,37.14%,88.00%,0.00%,0.00%,73.33%,N/A,1.69,1.53,4.61
41,53.59%,GPT-4-0613 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,39.20%,38.53%,63.82%,86.50%,4.00%,2.00%,93.00%,0.00%,0.00%,64.12%,95.00%,20.00%,90.00%,0.00%,0.00%,91.67%,10.37,3.49,3.27,10.88
42,52.65%,Mistral-tiny-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,49.66%,36.16%,55.64%,70.00%,26.00%,0.00%,56.50%,47.50%,39.00%,27.65%,46.00%,1.43%,20.00%,62.00%,35.00%,83.75%,0.13,1.45,1.41,4.39
43,43.71%,Gemma-7b-it (Prompt),https://blog.google/technology/developers/gemma-open-models/,Google,gemma-terms-of-use,41.05%,31.75%,42.18%,47.75%,29.00%,24.00%,48.00%,30.00%,44.00%,30.00%,44.00%,10.00%,32.00%,40.00%,25.00%,70.83%,0.37,0.06,N/A,N/A
44,40.76%,Mistral-Small-2402 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,40.33%,38.03%,5.82%,6.00%,6.00%,4.00%,8.00%,79.00%,68.50%,34.12%,6.00%,74.29%,20.00%,68.00%,30.00%,98.33%,0.64,1.11,0.95,3.03
45,40.41%,Deepseek-v1.5 (Prompt),https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5,Deepseek,Deepseek License,38.44%,30.89%,39.27%,50.00%,4.00%,24.00%,49.00%,37.00%,28.50%,37.06%,38.00%,35.71%,38.00%,36.00%,12.50%,57.08%,3.24,0.53,N/A,N/A
46,23.71%,Mistral-small-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,2.62%,34.37%,2.00%,2.75%,0.00%,0.00%,2.50%,3.00%,3.00%,56.47%,79.00%,24.29%,70.00%,6.00%,5.00%,99.58%,0.97,3.06,1.8,6.23
Added rows (new ranks 13-47):
13,81.88%,FireFunction-v2 (FC),https://huggingface.co/fireworks-ai/firefunction-v2,Fireworks,Apache 2.0,86.44%,80.26%,85.27%,94.50%,59.00%,64.00%,91.00%,89.50%,80.00%,93.53%,94.00%,92.86%,88.00%,72.00%,67.50%,56.67%,N/A,1.04,0.81,1.97
14,81.82%,Claude-3-Sonnet-20240229 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,87.40%,86.76%,83.09%,92.00%,53.00%,72.00%,89.00%,88.00%,89.50%,93.53%,96.00%,90.00%,92.00%,84.00%,77.50%,51.25%,2.13,1.95,1.16,3.08
15,81.35%,Mistral-Medium-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,83.76%,73.47%,80.55%,90.25%,54.00%,56.00%,92.00%,84.00%,78.50%,65.88%,96.00%,22.86%,76.00%,82.00%,70.00%,88.33%,1.76,2.75,2.15,6.38
16,80.53%,GPT-4o-2024-05-13 (Prompt),https://openai.com/index/hello-gpt-4o/,OpenAI,Proprietary,77.15%,77.62%,85.09%,90.75%,68.00%,74.00%,84.00%,78.50%,61.00%,90.00%,95.00%,82.86%,78.00%,70.00%,72.50%,82.50%,2.67,1.15,0.78,2.67
17,80.47%,Functionary-Medium-v2.4 (FC),https://huggingface.co/meetkai/functionary-medium-v2.4,MeetKai,MIT,85.61%,75.71%,79.45%,88.00%,55.00%,60.00%,90.50%,87.50%,85.00%,68.82%,85.00%,45.71%,84.00%,80.00%,70.00%,74.17%,N/A,2.49,2.69,7.45
18,80.35%,Gemini-1.5-Flash-Preview-0514 (FC),https://deepmind.google/technologies/gemini/flash/,Google,Proprietary,81.48%,74.57%,80.91%,91.00%,58.00%,46.00%,93.50%,78.00%,73.50%,81.76%,94.00%,64.29%,90.00%,54.00%,72.50%,79.58%,0.07,1.0,0.49,1.54
19,80.35%,Command-R-Plus (Prompt) (Optimized),https://txt.cohere.com/command-r-plus-microsoft-azure,Cohere For AI,cc-by-nc-4.0,83.60%,86.74%,82.91%,89.75%,64.00%,66.00%,88.50%,81.00%,82.00%,92.94%,97.00%,87.14%,90.00%,84.00%,80.00%,54.17%,1.9,1.27,0.93,3.24
20,80.29%,Command-R-Plus (Prompt) (Original),https://txt.cohere.com/command-r-plus-microsoft-azure,Cohere For AI,cc-by-nc-4.0,83.89%,86.24%,82.55%,89.75%,63.00%,64.00%,90.00%,80.00%,83.00%,92.94%,98.00%,85.71%,88.00%,84.00%,80.00%,53.75%,1.9,1.32,0.94,3.25
21,79.94%,Functionary-Small-v2.4 (FC),https://huggingface.co/meetkai/functionary-small-v2.4,MeetKai,MIT,83.55%,76.31%,82.18%,91.75%,56.00%,58.00%,88.50%,82.00%,81.50%,78.24%,96.00%,52.86%,82.00%,80.00%,65.00%,67.92%,N/A,2.43,2.55,7.18
22,79.76%,Command-R-Plus (FC) (Optimized),https://txt.cohere.com/command-r-plus-microsoft-azure,Cohere For AI,cc-by-nc-4.0,85.15%,77.17%,79.09%,89.75%,46.00%,60.00%,91.00%,88.00%,82.50%,81.18%,95.00%,61.43%,86.00%,74.00%,67.50%,63.75%,1.12,1.9,1.34,4.0
23,77.47%,Claude-3-Opus-20240229 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,73.06%,71.27%,82.73%,89.50%,61.00%,72.00%,91.50%,58.00%,60.00%,90.59%,97.00%,81.43%,94.00%,38.00%,62.50%,82.50%,30.87,12.92,3.95,20.48
24,76.47%,Claude-instant-1.2 (Prompt),https://www.anthropic.com/news/releasing-claude-instant-1-2,Anthropic,Proprietary,78.95%,77.93%,79.82%,87.00%,56.00%,70.00%,85.50%,83.00%,67.50%,84.71%,94.00%,71.43%,80.00%,82.00%,65.00%,57.50%,0.45,1.21,0.69,2.22
25,76.47%,Claude-3.5-Sonnet-20240620 (FC),https://www.anthropic.com/news/claude-3-5-sonnet,Anthropic,Proprietary,72.69%,59.51%,85.27%,91.50%,67.00%,72.00%,92.00%,59.00%,54.50%,97.06%,98.00%,95.71%,88.00%,18.00%,35.00%,78.33%,4.73,3.59,2.37,7.98
26,74.29%,Claude-3-Haiku-20240307 (Prompt),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,79.10%,70.49%,84.91%,93.50%,55.00%,76.00%,91.50%,84.50%,55.50%,92.94%,100.00%,82.86%,94.00%,70.00%,25.00%,34.58%,0.18,1.0,0.49,1.72
27,71.41%,Claude-2.1 (Prompt),https://www.anthropic.com/news/claude-2-1,Anthropic,Proprietary,66.05%,62.17%,80.18%,88.75%,54.00%,64.00%,76.00%,55.50%,52.50%,71.18%,90.00%,44.29%,84.00%,46.00%,47.50%,83.33%,4.81,3.27,2.13,7.38
28,70.94%,Command-R-Plus (FC) (Original),https://txt.cohere.com/command-r-plus-microsoft-azure,Cohere For AI,cc-by-nc-4.0,80.85%,73.19%,74.91%,84.50%,45.00%,58.00%,90.00%,82.00%,76.50%,81.76%,92.00%,67.14%,88.00%,68.00%,55.00%,24.17%,1.09,1.9,0.99,3.99
29,68.76%,Mistral-large-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,64.73%,60.01%,66.91%,89.50%,5.00%,10.00%,94.50%,25.50%,72.00%,83.53%,99.00%,61.43%,96.00%,8.00%,52.50%,84.17%,2.47,3.02,2.94,8.85
30,67.35%,Nexusflow-Raven-v2 (FC),https://huggingface.co/Nexusflow/NexusRaven-V2-13B,Nexusflow,Apache 2.0,65.19%,73.89%,75.27%,80.75%,56.00%,70.00%,86.00%,41.50%,58.00%,67.06%,95.00%,27.14%,92.00%,74.00%,62.50%,57.50%,N/A,2.07,1.23,4.48
31,67.00%,Gemini-1.0-Pro-001 (FC),https://deepmind.google/technologies/gemini/#introduction,Google,Proprietary,56.77%,56.74%,79.09%,93.00%,42.00%,42.00%,92.50%,30.00%,25.50%,86.47%,89.00%,82.86%,84.00%,44.00%,12.50%,80.00%,0.13,1.27,1.0,3.39
32,65.88%,DBRX-Instruct (Prompt),https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm,Databricks,Databricks Open Model,66.62%,74.92%,64.00%,75.75%,30.00%,38.00%,71.50%,72.00%,59.00%,71.18%,80.00%,58.57%,86.00%,80.00%,62.50%,55.83%,1.25,0.64,0.41,1.34
33,65.18%,Snowflake/snowflake-arctic-instruct (Prompt),https://huggingface.co/Snowflake/snowflake-arctic-instruct,Snowflake,apache-2.0,61.09%,80.04%,62.36%,67.50%,42.00%,62.00%,69.00%,59.00%,54.00%,87.65%,91.00%,82.86%,86.00%,74.00%,72.50%,59.58%,N/A,0.98,0.56,2.13
34,64.35%,Mistral-large-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,71.49%,64.93%,81.45%,89.50%,62.00%,56.00%,93.50%,31.50%,79.50%,94.71%,95.00%,94.29%,92.00%,8.00%,65.00%,0.00%,1.97,2.07,1.33,4.97
35,63.88%,GPT-3.5-Turbo-0125 (FC),https://platform.openai.com/docs/models/gpt-3-5-turbo,OpenAI,Proprietary,74.74%,81.38%,61.45%,63.50%,53.00%,62.00%,66.00%,90.50%,81.00%,93.53%,95.00%,91.43%,80.00%,82.00%,70.00%,2.08%,0.19,1.27,0.74,2.47
36,60.41%,Mistral-small-2402 (FC Any),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,65.32%,52.62%,81.27%,90.50%,56.00%,58.00%,96.00%,39.00%,45.00%,96.47%,100.00%,91.43%,92.00%,12.00%,10.00%,0.00%,0.48,1.14,0.81,2.52
37,60.12%,Meta-Llama-3-8B-Instruct (Prompt),https://llama.meta.com/llama3,Meta,Meta Llama 3 Community,62.65%,69.95%,55.09%,58.50%,45.00%,48.00%,73.50%,58.00%,64.00%,75.29%,79.00%,70.00%,74.00%,68.00%,62.50%,43.33%,0.24,0.04,N/A,N/A
38,59.71%,Hermes-2-Pro-Mistral-7B (FC),https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,NousResearch,apache-2.0,70.25%,55.62%,72.00%,81.75%,42.00%,54.00%,80.50%,67.00%,61.50%,56.47%,78.00%,25.71%,70.00%,56.00%,40.00%,10.83%,0.49,0.08,N/A,N/A
39,59.24%,Claude-3-Sonnet-20240229 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,44.18%,43.32%,76.73%,86.00%,49.00%,58.00%,88.00%,6.00%,6.00%,85.29%,96.00%,70.00%,88.00%,0.00%,0.00%,81.67%,3.44,3.25,1.46,6.85
40,53.82%,Claude-3-Haiku-20240307 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,45.05%,46.79%,86.18%,95.50%,60.00%,64.00%,93.50%,0.50%,0.00%,91.18%,96.00%,84.29%,94.00%,2.00%,0.00%,20.83%,0.29,1.49,0.61,2.4
41,53.65%,FireFunction-v1 (FC),https://huggingface.co/fireworks-ai/firefunction-v1,Fireworks,Apache 2.0,40.75%,39.79%,70.00%,90.25%,13.00%,22.00%,93.00%,0.00%,0.00%,71.18%,95.00%,37.14%,88.00%,0.00%,0.00%,73.33%,N/A,1.69,1.53,4.61
42,53.59%,GPT-4-0613 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,39.20%,38.53%,63.82%,86.50%,4.00%,2.00%,93.00%,0.00%,0.00%,64.12%,95.00%,20.00%,90.00%,0.00%,0.00%,91.67%,10.37,3.49,3.27,10.88
43,52.65%,Mistral-tiny-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,49.66%,36.16%,55.64%,70.00%,26.00%,0.00%,56.50%,47.50%,39.00%,27.65%,46.00%,1.43%,20.00%,62.00%,35.00%,83.75%,0.13,1.45,1.41,4.39
44,43.71%,Gemma-7b-it (Prompt),https://blog.google/technology/developers/gemma-open-models/,Google,gemma-terms-of-use,41.05%,31.75%,42.18%,47.75%,29.00%,24.00%,48.00%,30.00%,44.00%,30.00%,44.00%,10.00%,32.00%,40.00%,25.00%,70.83%,0.37,0.06,N/A,N/A
45,40.76%,Mistral-Small-2402 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,40.33%,38.03%,5.82%,6.00%,6.00%,4.00%,8.00%,79.00%,68.50%,34.12%,6.00%,74.29%,20.00%,68.00%,30.00%,98.33%,0.64,1.11,0.95,3.03
46,40.41%,Deepseek-v1.5 (Prompt),https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5,Deepseek,Deepseek License,38.44%,30.89%,39.27%,50.00%,4.00%,24.00%,49.00%,37.00%,28.50%,37.06%,38.00%,35.71%,38.00%,36.00%,12.50%,57.08%,3.24,0.53,N/A,N/A
47,23.71%,Mistral-small-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,2.62%,34.37%,2.00%,2.75%,0.00%,0.00%,2.50%,3.00%,3.00%,56.47%,79.00%,24.29%,70.00%,6.00%,5.00%,99.58%,0.97,3.06,1.8,6.23
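
The rank shift in this hunk (one new row at rank 13, every later row moved down by one) can be reproduced with a short script. This is a minimal sketch, not the repository's actual update tooling, which this commit does not show. The function name `insert_and_rerank`, the `start_rank` parameter, and the three-column excerpt are assumptions made for illustration; the real `data.csv` has many more columns.

```python
import csv
import io


def insert_and_rerank(csv_text, new_row, start_rank=1):
    """Append new_row, sort by Overall Acc (descending), renumber Rank."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.append(new_row)
    # "81.88%" -> 81.88. Python's sort is stable (also with reverse=True),
    # so rows with equal accuracy keep their existing relative order.
    rows.sort(key=lambda r: float(r["Overall Acc"].rstrip("%")), reverse=True)
    for rank, row in enumerate(rows, start=start_rank):
        row["Rank"] = str(rank)
    return rows


# Three-column excerpt of the rows around the insertion point (illustrative).
SAMPLE = """Rank,Overall Acc,Model
12,82.88%,GPT-4-turbo-2024-04-09 (FC)
13,81.82%,Claude-3-Sonnet-20240229 (Prompt)
14,81.35%,Mistral-Medium-2312 (Prompt)
"""

NEW_ROW = {"Rank": "", "Overall Acc": "81.88%", "Model": "FireFunction-v2 (FC)"}

ranked = insert_and_rerank(SAMPLE, NEW_ROW, start_rank=12)
for row in ranked:
    print(row["Rank"], row["Overall Acc"], row["Model"])
```

Ranks are assigned sequentially rather than shared between ties, which matches how the rows at 80.35% and 76.47% are numbered in the table above.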
4 changes: 2 additions & 2 deletions leaderboard.html
@@ -94,7 +94,7 @@ <h2>Leaderboard</h2>
</p>
<div style="margin-bottom: 15px;">
<button id="expand-btn" onclick="toggleExpand()">Expand/Collapse Table</button>
-<span style="margin-left: 10px;"><b><i style="font-size: 1.0em;">Last updated: 2024-06-15 <a
+<span style="margin-left: 10px;"><b><i style="font-size: 1.0em;">Last updated: 2024-06-18 <a
href="https://github.com/ShishirPatil/gorilla/tree/main/berkeley-function-call-leaderboard#changelog">[Change
Log]</a></i></b></span>
</div>
@@ -199,7 +199,7 @@ <h2>Error Type Analysis</h2>
(coming soon).
</p>

-<div id="a2b1a3d1-eab1-44e8-9b03-eeee1fac4c36" class="plotly-graph-div" style="height:100%; width:100%;"></div>
+<div id="2191f1ab-02ae-4c30-a990-cdc03fdd9e53" class="plotly-graph-div" style="height:100%; width:100%;"></div>
</div>

<!-- API Explorer Section -->
2 changes: 1 addition & 1 deletion treemap_2.js

Large diffs are not rendered by default.
