[BFCL Leaderboard] Hot Fix Treemap Visualization (#588)
This PR hot-fixes some minor issues with the treemap on the leaderboard
and the Plotly visualization in the blog post.
It **does not** affect any scores.

---------

Co-authored-by: Charlie Cheng-Jie Ji <[email protected]>
HuanzhiMao and CharlieJCJ authored Aug 19, 2024
1 parent 83d7f08 commit 0a40be4
Showing 7 changed files with 24 additions and 13 deletions.
2 changes: 1 addition & 1 deletion assets/blog12_chart_data.json

Large diffs are not rendered by default.

9 changes: 7 additions & 2 deletions assets/blog12_chart_function.js
@@ -10,7 +10,7 @@ function createHistogram(containerId, data, category) {
 height: 450,
 width: 550,
 barmode: 'overlay',
-xaxis: { title: category, range: [0, 100] },
+xaxis: { title: category, range: [0, 100]},
 yaxis: { title: 'Frequency' },
 margin: { l: 100, r: 100, b: 50, t: 80, pad: 4 },
 legend: { // Adjust legend position
@@ -25,7 +25,12 @@ function createHistogram(containerId, data, category) {
 const plotData = data.slice(0, 2).map(trace => ({
 ...trace,
 x: trace.x.map(x => parseFloat(x)),
-opacity: 0.7
+opacity: 0.7,
+xbins: {
+start: 0,
+end: 100,
+size: 10
+},
 }));
 
 Plotly.newPlot(containerId, plotData, layout, { responsive: true });
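The `xbins` change pins every trace drawn by `createHistogram` to the same ten bins of width 10 over [0, 100]. A plausible motivation, not stated in the PR: with `barmode: 'overlay'`, Plotly does not force traces into a shared bin group, so each trace could otherwise be auto-binned with its own edges, which makes overlaid bars hard to compare. The sketch below imitates fixed-width binning in plain JavaScript. `binCounts`, `modelA`, and `modelB` are hypothetical names that are not in the repository, and the edge convention (left-closed bins, with `end` folded into the last bin) is an assumption, not Plotly's documented rule.

```javascript
// Hypothetical helper (not part of blog12_chart_function.js): counts how many
// scores fall into each fixed-width bin, mirroring xbins { start: 0, end: 100, size: 10 }.
// Assumed edge convention: bins are [lo, lo + size), except the last bin,
// which also includes `end` so that a perfect score of 100 is counted.
function binCounts(values, { start, end, size }) {
  const nBins = Math.ceil((end - start) / size);
  const counts = new Array(nBins).fill(0);
  for (const raw of values) {
    const v = parseFloat(raw); // scores may arrive as strings, as in createHistogram
    if (Number.isNaN(v) || v < start || v > end) continue; // drop unparsable or out-of-range values
    // Clamp v === end into the last bin instead of opening an extra bin.
    const idx = Math.min(Math.floor((v - start) / size), nBins - 1);
    counts[idx] += 1;
  }
  return counts;
}

// Two overlaid traces share identical bin edges, so bar i of each trace
// covers the same score range (illustrative scores, not leaderboard data).
const modelA = ["12.5", "47", "88.1", "100"];
const modelB = ["15", "42.3", "81", "95.5"];
console.log(binCounts(modelA, { start: 0, end: 100, size: 10 }));
console.log(binCounts(modelB, { start: 0, end: 100, size: 10 }));
```

Because both traces use the same edges, the two count arrays can be compared element by element, which is what an overlaid histogram visually implies.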
9 changes: 7 additions & 2 deletions blogs/8_berkeley_function_calling_leaderboard.html
@@ -99,7 +99,7 @@ <h4 class="text-center" style="margin: 0px;">
 <p></p>
 </h4>
 </div>
-<b><i style="font-size: 1.0em;">Last updated: 2024-06-06 <a
+<b><i style="font-size: 1.0em;">Last updated: 2024-08-19 <a
 href="https://github.com/ShishirPatil/gorilla/tree/main/berkeley-function-call-leaderboard#changelog">[Change
 Log]</a></i></b>
 <br></br>
@@ -124,7 +124,11 @@ <h4 class="text-center" style="margin: 0px;">
 and we also evaluate the model's ability to withhold picking any function when the right
 function is not available. And one more thing - the leaderboard now also includes cost and
 latency for all the different models!
-
+<br>
+<br>
+On Aug 19th, 2024, we released the BFCL V2 dataset, featuring enterprise-contributed data,
+tackling issues like bias and data contamination, and focuses on dynamic, real-world scenarios.
+Check out the <i>BFCL V2 · Live </i><a href="https://gorilla.cs.berkeley.edu/blogs/12_bfcl_v2_live.html">Blog Post</a> for more details.
 </p>
 <p>
 Quick Links:
@@ -134,6 +138,7 @@ <h4 class="text-center" style="margin: 0px;">
 <li>BFCL Evaluation Dataset: <a
 href="https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard">
 HuggingFace Dataset 🤗</a></li>
+<li>BFCL V2 Live: <a href=https://gorilla.cs.berkeley.edu/blogs/12_bfcl_v2_live.html>Blog Post</a></li>
 <li>Gradio Demo: <a
 href="https://huggingface.co/spaces/gorilla-llm/berkeley-function-calling-leaderboard">
 HuggingFace Space 🤗 </a></li>
7 changes: 4 additions & 3 deletions leaderboard.html
@@ -162,8 +162,9 @@ <h2>BFCL Leaderboard</h2>
 the <a href="https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html#cost">blog</a>.
 </p>
 <p>
-<b>AST Summary</b> is the unweighted average of the four test categories under AST Evaluation.
-<b>Exec Summary</b> is the unweighted average of the four test categories under Exec Evaluation.
+<b>AST Summary</b> is the <b>unweighted</b> average of the four test categories under AST Evaluation.
+<b>Exec Summary</b> is the <b>unweighted</b> average of the four test categories under Exec Evaluation.
+<b>Overall Accuracy</b> is the <b>unweighted</b> average of all the sub-categories.
 </p>
 <p>
 Click on column header to sort. If you would like to add
@@ -206,7 +207,7 @@ <h2>Error Type Analysis</h2>
 (coming soon).
 </p>
 
-<div id="9c21ef60-cc47-4c37-9387-87d72992be0a" class="plotly-graph-div" style="height:100%; width:100%;"></div>
+<div id="ce9db8d5-4929-41aa-b79e-3a9206ca7089" class="plotly-graph-div" style="height:100%; width:100%;"></div>
 </div>
 
 <!-- API Explorer Section -->
6 changes: 3 additions & 3 deletions leaderboard_live.html
@@ -154,8 +154,8 @@ <h2>BFCL Live Leaderboard</h2>
 the <a href="https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html#cost">blog</a>.
 </p>
 <p>
-<b>AST Summary</b> is the unweighted average of the four test categories under AST Evaluation.
-<b>Exec Summary</b> is the unweighted average of the four test categories under Exec Evaluation.
+<b>AST Summary</b> is the <b>weighted</b> average of the four test categories under AST Evaluation.
+<b>Overall Accuracy</b> is the <b>weighted</b> average of all the sub-categories.
 </p>
 <p>
 Click on column header to sort. If you would like to add
@@ -198,7 +198,7 @@ <h2>Error Type Analysis</h2>
 (coming soon).
 </p>
 
-<div id="a5bbb836-d2ea-4728-a6eb-6d1f64ee741f" class="plotly-graph-div" style="height:100%; width:100%;"></div>
+<div id="cb08818b-c85b-431e-9f21-ca304f0e4183" class="plotly-graph-div" style="height:100%; width:100%;"></div>
 </div>
 
 <!-- API Explorer Section -->
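After this change, `leaderboard.html` describes its summaries as <b>unweighted</b> averages, while `leaderboard_live.html` describes <b>weighted</b> ones. The PR does not say what the weights are. The sketch below assumes each category is weighted by its number of test entries, and every category name, accuracy, and count in it is hypothetical, chosen only to show how the two definitions diverge when category sizes differ.

```javascript
// Illustrative per-category results (hypothetical numbers, not BFCL scores).
const categories = [
  { name: "simple", accuracy: 90, count: 400 },
  { name: "multiple", accuracy: 80, count: 200 },
  { name: "parallel", accuracy: 70, count: 200 },
  { name: "parallel_multiple", accuracy: 60, count: 200 },
];

// Unweighted: each category contributes equally, regardless of its size.
function unweightedAverage(cats) {
  return cats.reduce((sum, c) => sum + c.accuracy, 0) / cats.length;
}

// Weighted: assumed here to weight each category by its number of test entries.
function weightedAverage(cats) {
  const total = cats.reduce((sum, c) => sum + c.count, 0);
  return cats.reduce((sum, c) => sum + c.accuracy * c.count, 0) / total;
}

console.log(unweightedAverage(categories)); // 75
console.log(weightedAverage(categories)); // 78
```

With these made-up numbers the weighted mean is higher because the largest category also has the highest accuracy; the two definitions agree only when all categories have the same size.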
2 changes: 1 addition & 1 deletion treemap_live.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion treemap_main.js

Large diffs are not rendered by default.
