[BFCL] Update Leaderboard after adding GLM-4-9B (#475)
Co-authored-by: CharlieJCJ
Co-authored-by: Huanzhi Mao
3 people authored Jul 7, 2024
1 parent 1209b03 commit 84e1b3e
Showing 3 changed files with 12 additions and 11 deletions.
17 changes: 9 additions & 8 deletions data.csv
@@ -38,11 +38,12 @@ Rank,Overall Acc,Model,Model Link,Organization,License,AST Summary,Exec Summary,
 37,60.12%,Meta-Llama-3-8B-Instruct (Prompt),https://llama.meta.com/llama3,Meta,Meta Llama 3 Community,62.65%,69.95%,55.09%,58.50%,45.00%,48.00%,73.50%,58.00%,64.00%,75.29%,79.00%,70.00%,74.00%,68.00%,62.50%,43.33%,0.24,0.04,N/A,N/A
 38,59.71%,Hermes-2-Pro-Mistral-7B (FC),https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B,NousResearch,apache-2.0,70.25%,55.62%,72.00%,81.75%,42.00%,54.00%,80.50%,67.00%,61.50%,56.47%,78.00%,25.71%,70.00%,56.00%,40.00%,10.83%,0.49,0.08,N/A,N/A
 39,59.24%,Claude-3-Sonnet-20240229 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,44.18%,43.32%,76.73%,86.00%,49.00%,58.00%,88.00%,6.00%,6.00%,85.29%,96.00%,70.00%,88.00%,0.00%,0.00%,81.67%,3.44,3.25,1.46,6.85
-40,53.82%,Claude-3-Haiku-20240307 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,45.05%,46.79%,86.18%,95.50%,60.00%,64.00%,93.50%,0.50%,0.00%,91.18%,96.00%,84.29%,94.00%,2.00%,0.00%,20.83%,0.29,1.49,0.61,2.4
-41,53.65%,FireFunction-v1 (FC),https://huggingface.co/fireworks-ai/firefunction-v1,Fireworks,Apache 2.0,40.75%,39.79%,70.00%,90.25%,13.00%,22.00%,93.00%,0.00%,0.00%,71.18%,95.00%,37.14%,88.00%,0.00%,0.00%,73.33%,N/A,1.69,1.53,4.61
-42,53.59%,GPT-4-0613 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,39.20%,38.53%,63.82%,86.50%,4.00%,2.00%,93.00%,0.00%,0.00%,64.12%,95.00%,20.00%,90.00%,0.00%,0.00%,91.67%,10.37,3.49,3.27,10.88
-43,52.65%,Mistral-tiny-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,49.66%,36.16%,55.64%,70.00%,26.00%,0.00%,56.50%,47.50%,39.00%,27.65%,46.00%,1.43%,20.00%,62.00%,35.00%,83.75%,0.13,1.45,1.41,4.39
-44,43.71%,Gemma-7b-it (Prompt),https://blog.google/technology/developers/gemma-open-models/,Google,gemma-terms-of-use,41.05%,31.75%,42.18%,47.75%,29.00%,24.00%,48.00%,30.00%,44.00%,30.00%,44.00%,10.00%,32.00%,40.00%,25.00%,70.83%,0.37,0.06,N/A,N/A
-45,40.76%,Mistral-Small-2402 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,40.33%,38.03%,5.82%,6.00%,6.00%,4.00%,8.00%,79.00%,68.50%,34.12%,6.00%,74.29%,20.00%,68.00%,30.00%,98.33%,0.64,1.11,0.95,3.03
-46,40.41%,Deepseek-v1.5 (Prompt),https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5,Deepseek,Deepseek License,38.44%,30.89%,39.27%,50.00%,4.00%,24.00%,49.00%,37.00%,28.50%,37.06%,38.00%,35.71%,38.00%,36.00%,12.50%,57.08%,3.24,0.53,N/A,N/A
-47,23.71%,Mistral-small-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,2.62%,34.37%,2.00%,2.75%,0.00%,0.00%,2.50%,3.00%,3.00%,56.47%,79.00%,24.29%,70.00%,6.00%,5.00%,99.58%,0.97,3.06,1.8,6.23
+40,54.88%,GLM-4-9b-Chat (FC),https://huggingface.co/THUDM/glm-4-9b-chat,THUDM,glm-4,38.69%,44.12%,63.27%,87.00%,0.00%,0.00%,91.50%,0.00%,0.00%,86.47%,90.00%,81.43%,90.00%,0.00%,0.00%,87.50%,N/A,0.13,N/A,N/A
+41,53.82%,Claude-3-Haiku-20240307 (FC tools-2024-04-04),https://www.anthropic.com/news/claude-3-family,Anthropic,Proprietary,45.05%,46.79%,86.18%,95.50%,60.00%,64.00%,93.50%,0.50%,0.00%,91.18%,96.00%,84.29%,94.00%,2.00%,0.00%,20.83%,0.29,1.49,0.61,2.4
+42,53.65%,FireFunction-v1 (FC),https://huggingface.co/fireworks-ai/firefunction-v1,Fireworks,Apache 2.0,40.75%,39.79%,70.00%,90.25%,13.00%,22.00%,93.00%,0.00%,0.00%,71.18%,95.00%,37.14%,88.00%,0.00%,0.00%,73.33%,N/A,1.69,1.53,4.61
+43,53.59%,GPT-4-0613 (FC),https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo,OpenAI,Proprietary,39.20%,38.53%,63.82%,86.50%,4.00%,2.00%,93.00%,0.00%,0.00%,64.12%,95.00%,20.00%,90.00%,0.00%,0.00%,91.67%,10.37,3.49,3.27,10.88
+44,52.65%,Mistral-tiny-2312 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,49.66%,36.16%,55.64%,70.00%,26.00%,0.00%,56.50%,47.50%,39.00%,27.65%,46.00%,1.43%,20.00%,62.00%,35.00%,83.75%,0.13,1.45,1.41,4.39
+45,43.71%,Gemma-7b-it (Prompt),https://blog.google/technology/developers/gemma-open-models/,Google,gemma-terms-of-use,41.05%,31.75%,42.18%,47.75%,29.00%,24.00%,48.00%,30.00%,44.00%,30.00%,44.00%,10.00%,32.00%,40.00%,25.00%,70.83%,0.37,0.06,N/A,N/A
+46,40.76%,Mistral-Small-2402 (Prompt),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,40.33%,38.03%,5.82%,6.00%,6.00%,4.00%,8.00%,79.00%,68.50%,34.12%,6.00%,74.29%,20.00%,68.00%,30.00%,98.33%,0.64,1.11,0.95,3.03
+47,40.41%,Deepseek-v1.5 (Prompt),https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5,Deepseek,Deepseek License,38.44%,30.89%,39.27%,50.00%,4.00%,24.00%,49.00%,37.00%,28.50%,37.06%,38.00%,35.71%,38.00%,36.00%,12.50%,57.08%,3.24,0.53,N/A,N/A
+48,23.71%,Mistral-small-2402 (FC Auto),https://docs.mistral.ai/guides/model-selection/,Mistral AI,Proprietary,2.62%,34.37%,2.00%,2.75%,0.00%,0.00%,2.50%,3.00%,3.00%,56.47%,79.00%,24.29%,70.00%,6.00%,5.00%,99.58%,0.97,3.06,1.8,6.23
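
The data.csv change above inserts one new row (GLM-4-9b-Chat at 54.88%) and shifts the rank of every lower-scoring model down by one. The commit does not show how the file was regenerated, so the following is only a minimal sketch of that kind of update, assuming rows are ordered by the `Overall Acc` column in descending order. The helper name `insert_and_rerank` and its `first_rank` parameter are hypothetical and not part of the repository.

```python
import csv
import io


def insert_and_rerank(csv_text: str, new_row: dict, first_rank: int = 1) -> str:
    """Add new_row, sort by 'Overall Acc' descending, renumber 'Rank'.

    Hypothetical helper: assumes the CSV has 'Rank' and 'Overall Acc'
    columns and that accuracies look like '54.88%'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    rows = list(reader)
    rows.append(dict(new_row))

    # Strip the percent sign before comparing; list.sort is stable,
    # so rows with equal accuracy keep their existing relative order.
    rows.sort(key=lambda r: float(r["Overall Acc"].rstrip("%")), reverse=True)

    # Renumber ranks consecutively, starting at first_rank.
    for rank, row in enumerate(rows, start=first_rank):
        row["Rank"] = str(rank)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# A three-column excerpt of the table, starting at rank 39.
excerpt = (
    "Rank,Overall Acc,Model\n"
    "39,59.24%,Claude-3-Sonnet-20240229 (FC tools-2024-04-04)\n"
    "40,53.82%,Claude-3-Haiku-20240307 (FC tools-2024-04-04)\n"
    "41,53.65%,FireFunction-v1 (FC)\n"
)
updated = insert_and_rerank(
    excerpt,
    {"Overall Acc": "54.88%", "Model": "GLM-4-9b-Chat (FC)"},
    first_rank=39,
)
print(updated)
```

On this excerpt the new row lands at rank 40 and the two rows below it move to 41 and 42, which matches the renumbering visible in the diff.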
4 changes: 2 additions & 2 deletions leaderboard.html
@@ -94,7 +94,7 @@ <h2>Leaderboard</h2>
 </p>
 <div style="margin-bottom: 15px;">
 <button id="expand-btn" onclick="toggleExpand()">Expand/Collapse Table</button>
-<span style="margin-left: 10px;"><b><i style="font-size: 1.0em;">Last updated: 2024-06-22 <a
+<span style="margin-left: 10px;"><b><i style="font-size: 1.0em;">Last updated: 2024-06-25 <a
 href="https://github.com/ShishirPatil/gorilla/tree/main/berkeley-function-call-leaderboard#changelog">[Change
 Log]</a></i></b></span>
 </div>
@@ -199,7 +199,7 @@ <h2>Error Type Analysis</h2>
 (coming soon).
 </p>

-<div id="2191f1ab-02ae-4c30-a990-cdc03fdd9e53" class="plotly-graph-div" style="height:100%; width:100%;"></div>
+<div id="6bb4fe79-0bc6-4a6d-8e10-a103cd3a10a2" class="plotly-graph-div" style="height:100%; width:100%;"></div>
 </div>

 <!-- API Explorer Section -->
2 changes: 1 addition & 1 deletion treemap_2.js

Large diffs are not rendered by default.
