[Agent-Arena] Navigate the blog link of LmSys to Elo Blog. (#670)
This change updates the Agent Arena blog so that the link in the Elo
section points to a different LMSYS blog post.
arthbohra authored Oct 3, 2024
1 parent c9e0ea3 commit ae1699b
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion blogs/14_agent_arena.html
@@ -580,7 +580,7 @@ <h3>⚖️ Evaluating Agents with the Extended Bradley-Terry Model</h3>
<div class="body">
<h3>The Extended Bradley-Terry Model</h3>
<p>
-Agent Arena uses the <a href="https://blog.lmarena.ai/blog/2024/redteam-arena/">Bradley-Terry extension</a>, which allows us to compare different agents based on their subcomponents, including tools, models, and frameworks. Instead of just evaluating the agents atomically, we also assess the performance of each individual subcomponent. This allows us to more accurately pinpoint where an agent's strength lies. For example, our first agent could be a combination of LangChain, Brave-Search, and GPT-4o-2024-08-06, while the second agent could be LlamaIndex, Wikipedia, and Claude-3-5-Sonnet-20240620.
+Agent Arena uses the <a href="https://blog.lmarena.ai/blog/2024/extended-arena/">Bradley-Terry extension</a>, which allows us to compare different agents based on their subcomponents, including tools, models, and frameworks. Instead of just evaluating the agents atomically, we also assess the performance of each individual subcomponent. This allows us to more accurately pinpoint where an agent's strength lies. For example, our first agent could be a combination of LangChain, Brave-Search, and GPT-4o-2024-08-06, while the second agent could be LlamaIndex, Wikipedia, and Claude-3-5-Sonnet-20240620.

Therefore, we propose the following observation model for the Extended Bradley-Terry Model. Given <code>P_1</code>,

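The changed paragraph describes rating an agent through its subcomponents (tool, model, framework) rather than as an atomic unit. The diff cuts off before the blog's observation model, so the following is only a minimal sketch of that idea under an assumption that is not confirmed by this commit: an agent's strength is the sum of its component ratings, and the win probability is the usual logistic Bradley-Terry form. The function names, hyperparameters, and placeholder component names are all hypothetical, and the battle outcomes are invented toy data.

```python
import math
from collections import defaultdict


def agent_strength(agent, ratings):
    """Assumed additive form: an agent's strength is the sum of its component ratings."""
    return sum(ratings[c] for c in agent)


def win_probability(agent_a, agent_b, ratings):
    """Logistic (Bradley-Terry) probability that agent_a beats agent_b."""
    diff = agent_strength(agent_a, ratings) - agent_strength(agent_b, ratings)
    return 1.0 / (1.0 + math.exp(-diff))


def fit_ratings(battles, lr=0.1, epochs=500, l2=0.01):
    """Fit component ratings by gradient ascent on an L2-regularised log-likelihood.

    battles: list of (agent_a, agent_b, a_won), where each agent is a tuple
    of component names and a_won is True when agent_a won the battle.
    """
    ratings = defaultdict(float)  # every component starts at rating 0
    for _ in range(epochs):
        grad = defaultdict(float)
        for a, b, a_won in battles:
            # Residual between the observed outcome and the predicted probability.
            err = (1.0 if a_won else 0.0) - win_probability(a, b, ratings)
            for c in a:
                grad[c] += err
            for c in b:
                grad[c] -= err
        for c, g in grad.items():
            ratings[c] += lr * (g - l2 * ratings[c])
    return dict(ratings)


# Placeholder components and invented outcomes, for illustration only.
agent_1 = ("framework_a", "tool_a", "model_a")
agent_2 = ("framework_b", "tool_b", "model_b")
toy_battles = [(agent_1, agent_2, True)] * 3 + [(agent_1, agent_2, False)]

toy_ratings = fit_ratings(toy_battles)
print(win_probability(agent_1, agent_2, toy_ratings))
```

Because each battle updates every component of both agents, a component that appears in many different agents accumulates evidence across all of them, which is what would let such a model attribute strength to an individual tool, model, or framework. With only two fixed agents, as in the toy data, the split between components is not identifiable; the L2 term simply spreads the rating evenly.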
