Commit e168616

Add getting started page for magentic one (microsoft#4887)
1 parent e8797a2 commit e168616

6 files changed

+179
-6
lines changed

python/packages/autogen-core/docs/src/index.md

+24
@@ -69,6 +69,30 @@ pip install "autogen-agentchat==0.4.0.dev13"
6969
<a class="sd-sphinx-override sd-btn sd-text-wrap sd-btn-secondary reference internal" href="user-guide/agentchat-user-guide/migration-guide.html"><span class="doc">Migration Guide (0.2.x to 0.4.x)</span></a>
7070

7171
:::
72+
73+
:::{grid-item-card}
74+
:shadow: none
75+
:margin: 2 0 0 0
76+
:columns: 12 12 12 12
77+
78+
<div class="sd-card-title sd-font-weight-bold docutils">
79+
80+
{fas}`book;pst-color-primary`
81+
Magentic-One </div>
82+
Magentic-One is a generalist multi-agent system for solving open-ended web and file-based tasks across a variety of domains.
83+
84+
+++
85+
86+
87+
```{button-ref} user-guide/agentchat-user-guide/magentic-one
88+
:color: secondary
89+
90+
Get Started
91+
```
92+
93+
:::
94+
95+
7296
:::{grid-item-card} {fas}`palette;pst-color-primary` Studio
7397
:shadow: none
7498
:margin: 2 0 0 0

python/packages/autogen-core/docs/src/user-guide/agentchat-user-guide/index.md

+7
@@ -31,6 +31,12 @@ How to install AgentChat
3131
Build your first agent
3232
:::
3333

34+
:::{grid-item-card} {fas}`book;pst-color-primary` Magentic-One
35+
:link: ./magentic-one.html
36+
37+
Get started with Magentic-One
38+
:::
39+
3440
:::{grid-item-card} {fas}`graduation-cap;pst-color-primary` Tutorial
3541
:link: ./tutorial/models.html
3642

@@ -56,6 +62,7 @@ How to migrate from AutoGen 0.2.x to 0.4.x.
5662
5763
installation
5864
quickstart
65+
magentic-one
5966
migration-guide
6067
```
6168

@@ -0,0 +1,134 @@
1+
---
2+
myst:
3+
html_meta:
4+
"description lang=en": |
5+
User Guide for AgentChat, a high-level API for AutoGen
6+
---
7+
8+
# Magentic-One
9+
10+
[Magentic-One](https://aka.ms/magentic-one-blog) is a generalist multi-agent system for solving open-ended web and file-based tasks across a variety of domains. It represents a significant step forward for multi-agent systems, achieving competitive performance on a number of agentic benchmarks (see the [technical report](https://arxiv.org/abs/2411.04468) for full details).
11+
12+
13+
When originally released in [November 2024](https://aka.ms/magentic-one-blog), Magentic-One was [implemented directly on the `autogen-core` library](https://github.com/microsoft/autogen/tree/main/python/packages/autogen-magentic-one). We have now ported Magentic-One to use `autogen-agentchat`, providing a more modular and easier-to-use interface.
14+
15+
16+
To this end, the Magentic-One orchestrator {py:class}`~autogen_agentchat.teams.MagenticOneGroupChat` is now simply an AgentChat team, supporting all standard AgentChat agents and features. Likewise, Magentic-One's {py:class}`~autogen_ext.agents.web_surfer.MultimodalWebSurfer`, {py:class}`~autogen_ext.agents.file_surfer.FileSurfer`, and {py:class}`~autogen_ext.agents.magentic_one.MagenticOneCoderAgent` agents are now broadly available as AgentChat agents, to be used in any AgentChat workflows.
17+
18+
19+
Lastly, there is a helper class, {py:class}`~autogen_ext.teams.magentic_one.MagenticOne`, which bundles all of this together as it was in the paper with minimal configuration.
20+
21+
22+
Find additional information about Magentic-One in our [blog post](https://aka.ms/magentic-one-blog) and [technical report](https://arxiv.org/abs/2411.04468).
23+
24+
![](../../images/autogen-magentic-one-example.png)
25+
26+
**Example**: The figure above illustrates the Magentic-One multi-agent team completing a complex task from the GAIA benchmark. Magentic-One's Orchestrator agent creates a plan, delegates tasks to other agents, and tracks progress towards the goal, dynamically revising the plan as needed. The Orchestrator can delegate tasks to a FileSurfer agent to read and handle files, a WebSurfer agent to operate a web browser, or a Coder or Computer Terminal agent to write or execute code, respectively.
27+
28+
```{caution}
29+
Using Magentic-One involves interacting with a digital world designed for humans, which carries inherent risks. To minimize these risks, consider the following precautions:
30+
31+
1. **Use Containers**: Run all tasks in Docker containers to isolate the agents and prevent direct system attacks.
32+
2. **Virtual Environment**: Use a virtual environment to run the agents and prevent them from accessing sensitive data.
33+
3. **Monitor Logs**: Closely monitor logs during and after execution to detect and mitigate risky behavior.
34+
4. **Human Oversight**: Run the examples with a human in the loop to supervise the agents and prevent unintended consequences.
35+
5. **Limit Access**: Restrict the agents' access to the internet and other resources to prevent unauthorized actions.
36+
6. **Safeguard Data**: Ensure that the agents do not have access to sensitive data or resources that could be compromised. Do not share sensitive information with the agents.
37+
Be aware that agents may occasionally attempt risky actions, such as recruiting humans for help or accepting cookie agreements without human involvement. Always ensure agents are monitored and operate within a controlled environment to prevent unintended consequences. Moreover, be cautious that Magentic-One may be susceptible to prompt injection attacks from webpages.
38+
```
39+
40+
## Getting started
41+
42+
Install the required packages:
43+
```bash
# Quote the requirement specifiers so shells such as zsh do not expand the brackets.
pip install "autogen-agentchat==0.4.0.dev13" "autogen-ext[magentic-one]==0.4.0.dev13"

# If using the MultimodalWebSurfer, you also need to install playwright dependencies:
playwright install --with-deps chromium
```
49+
50+
If you haven't done so already, go through the AgentChat tutorial to learn about the concepts of AgentChat.
51+
52+
Then, you can try swapping out a {py:class}`autogen_agentchat.teams.SelectorGroupChat` with {py:class}`~autogen_agentchat.teams.MagenticOneGroupChat`. For example:
53+
54+
```python
import asyncio
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import MagenticOneGroupChat
from autogen_agentchat.ui import Console


async def main() -> None:
    # The same model client is shared by the agent and the orchestrator.
    model_client = OpenAIChatCompletionClient(model="gpt-4o")

    assistant = AssistantAgent(
        "Assistant",
        model_client=model_client,
    )
    # MagenticOneGroupChat orchestrates the listed agents.
    team = MagenticOneGroupChat([assistant], model_client=model_client)
    # Stream the team's messages to the console as the task runs.
    await Console(team.run_stream(task="Provide a different proof for Fermat's Last Theorem"))


asyncio.run(main())
```
75+
76+
Or, use the Magentic-One agents in a team:
77+
78+
```{caution}
79+
The example code may download files from the internet, execute code, and interact with web pages. Ensure you are in a safe environment before running the example code.
80+
```
81+
82+
```python
import asyncio
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.teams import MagenticOneGroupChat
from autogen_agentchat.ui import Console
from autogen_ext.agents.web_surfer import MultimodalWebSurfer


async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o")

    # The web surfer drives a browser, so the Playwright dependencies must be installed.
    surfer = MultimodalWebSurfer(
        "WebSurfer",
        model_client=model_client,
    )
    team = MagenticOneGroupChat([surfer], model_client=model_client)
    # Stream the team's messages to the console as the task runs.
    await Console(team.run_stream(task="What is the UV index in Melbourne today?"))


asyncio.run(main())
```
103+
104+
105+
## Architecture
106+
107+
![](../../images/autogen-magentic-one-agents.png)
108+
109+
Magentic-One is based on a multi-agent architecture in which a lead Orchestrator agent is responsible for high-level planning, directing other agents, and tracking task progress. The Orchestrator begins by creating a plan to tackle the task, gathering needed facts and educated guesses in a Task Ledger that it maintains. At each step of its plan, the Orchestrator creates a Progress Ledger where it self-reflects on task progress and checks whether the task is completed. If the task is not yet completed, it assigns one of Magentic-One's other agents a subtask to complete. After the assigned agent completes its subtask, the Orchestrator updates the Progress Ledger and continues in this way until the task is complete. If the Orchestrator finds that progress is not being made for enough steps, it can update the Task Ledger and create a new plan. This is illustrated in the figure above; the Orchestrator's work is thus divided into an outer loop where it updates the Task Ledger and an inner loop where it updates the Progress Ledger.
110+
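The two nested loops described above can be sketched in plain Python. This is only a toy illustration under stated assumptions, not the `MagenticOneGroupChat` implementation: the `TaskLedger` and `ProgressLedger` classes, the `orchestrate` function, the stall and replan limits, and the stub agents are all hypothetical names invented for this sketch, and no LLM is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types for this sketch: a stub agent maps a subtask to a result,
# or returns None when it fails to make progress.
Agent = Callable[[str], Optional[str]]
Plan = List[Tuple[str, str]]  # (agent name, subtask) pairs


@dataclass
class TaskLedger:
    """Outer-loop state: gathered facts plus the current plan."""
    facts: List[str] = field(default_factory=list)
    plan: Plan = field(default_factory=list)


@dataclass
class ProgressLedger:
    """Inner-loop state: finished results and consecutive stalled steps."""
    completed: List[str] = field(default_factory=list)
    stalled_steps: int = 0


def orchestrate(
    task: str,
    agents: Dict[str, Agent],
    make_plan: Callable[[str, List[str]], Plan],
    max_stalls: int = 2,
    max_replans: int = 3,
) -> List[str]:
    task_ledger = TaskLedger(facts=[f"task: {task}"])
    for _ in range(max_replans + 1):  # outer loop: (re)build the plan
        task_ledger.plan = make_plan(task, task_ledger.facts)
        progress = ProgressLedger()
        pending = list(task_ledger.plan)
        while pending:  # inner loop: delegate one subtask at a time
            agent_name, subtask = pending[0]
            result = agents[agent_name](subtask)
            if result is None:
                progress.stalled_steps += 1
                if progress.stalled_steps >= max_stalls:
                    # Record what went wrong, then fall back to the outer loop.
                    task_ledger.facts.append(f"stalled on: {subtask}")
                    break
            else:
                progress.stalled_steps = 0
                progress.completed.append(result)
                pending.pop(0)
        else:
            return progress.completed  # every subtask finished
    raise RuntimeError("no progress after replanning")


# Stub agents standing in for WebSurfer and FileSurfer.
def web_surfer(subtask: str) -> Optional[str]:
    return None if "paywalled" in subtask else f"web: {subtask}"


def file_surfer(subtask: str) -> Optional[str]:
    return f"file: {subtask}"


def make_plan(task: str, facts: List[str]) -> Plan:
    # Replan around the obstacle once a stall has been recorded as a fact.
    if any(fact.startswith("stalled on") for fact in facts):
        return [("FileSurfer", "read local copy")]
    return [("WebSurfer", "open paywalled page")]


result = orchestrate(
    "find the report",
    {"WebSurfer": web_surfer, "FileSurfer": file_surfer},
    make_plan,
)
print(result)
```

In this run the stub WebSurfer stalls twice on the first plan, the stall is written to the Task Ledger, and the second plan routes the work to the stub FileSurfer instead. In the real system the planning, progress checks, and agent selection are all driven by model calls rather than fixed rules.
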
111+
Overall, Magentic-One consists of the following agents:
112+
- Orchestrator: The lead agent responsible for task decomposition and planning, directing other agents in executing subtasks, tracking overall progress, and taking corrective actions as needed.
113+
- WebSurfer: This is an LLM-based agent that is proficient in commanding and managing the state of a Chromium-based web browser. With each incoming request, the WebSurfer performs an action on the browser, then reports on the new state of the web page. The action space of the WebSurfer includes navigation (e.g., visiting a URL, performing a web search); web page actions (e.g., clicking and typing); and reading actions (e.g., summarizing or answering questions). The WebSurfer relies on the accessibility tree of the browser and on set-of-marks prompting to perform its actions.
114+
- FileSurfer: This is an LLM-based agent that commands a markdown-based file preview application to read local files of most types. The FileSurfer can also perform common navigation tasks such as listing the contents of directories and navigating a folder structure.
115+
- Coder: This is an LLM-based agent specialized through its system prompt for writing code, analyzing information collected from the other agents, or creating new artifacts.
116+
- ComputerTerminal: Finally, ComputerTerminal provides the team with access to a console shell where the Coder’s programs can be executed, and where new programming libraries can be installed.
117+
118+
Together, Magentic-One’s agents provide the Orchestrator with the tools and capabilities that it needs to solve a broad variety of open-ended problems, as well as the ability to autonomously adapt to, and act in, dynamic and ever-changing web and file-system environments.
119+
120+
While the default multimodal LLM we use for all agents is GPT-4o, Magentic-One is model agnostic and can incorporate heterogeneous models to support different capabilities or meet different cost requirements when getting tasks done. For example, it can use different LLMs and SLMs and their specialized versions to power different agents. We recommend a strong reasoning model for the Orchestrator agent, such as GPT-4o. In a different configuration of Magentic-One, we also experiment with using OpenAI o1-preview for the outer loop of the Orchestrator and for the Coder, while other agents continue to use GPT-4o.
121+
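One way to picture the mixed-model configuration described above is as a per-role lookup with a default. This is a minimal sketch, not part of the Magentic-One API: the role names, `MODEL_OVERRIDES`, and `model_for` are hypothetical, and only the model names come from the text.

```python
from typing import Dict, Optional

# Default model for every agent, as stated in the text.
DEFAULT_MODEL = "gpt-4o"

# Hypothetical role names; the overrides mirror the alternative configuration
# that uses o1-preview for the Orchestrator's outer loop and for the Coder.
MODEL_OVERRIDES: Dict[str, str] = {
    "orchestrator_outer_loop": "o1-preview",
    "coder": "o1-preview",
}


def model_for(role: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Return the model name for a role, falling back to the default."""
    table = MODEL_OVERRIDES if overrides is None else overrides
    return table.get(role, DEFAULT_MODEL)


roles = [
    "orchestrator_outer_loop",
    "orchestrator_inner_loop",
    "web_surfer",
    "file_surfer",
    "coder",
]
assignments = {role: model_for(role) for role in roles}
for role, model in assignments.items():
    print(f"{role}: {model}")
```

Each resulting model name could then be passed to a separate model client for the corresponding agent, which keeps the cost and capability trade-off in one place.
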
122+
## Citation
123+
124+
```
@misc{fourney2024magenticonegeneralistmultiagentsolving,
      title={Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks},
      author={Adam Fourney and Gagan Bansal and Hussein Mozannar and Cheng Tan and Eduardo Salinas and Erkang Zhu and Friederike Niedtner and Grace Proebsting and Griffin Bassman and Jack Gerrits and Jacob Alber and Peter Chang and Ricky Loynd and Robert West and Victor Dibia and Ahmed Awadallah and Ece Kamar and Rafah Hosn and Saleema Amershi},
      year={2024},
      eprint={2411.04468},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2411.04468},
}
```

python/packages/autogen-core/docs/src/user-guide/agentchat-user-guide/migration-guide.md

+8-6
@@ -6,10 +6,12 @@ The `v0.4` version contains breaking changes. Please read this guide carefully.
66
We still maintain the `v0.2` version in the `0.2` branch; however,
77
we highly recommend you upgrade to the `v0.4` version.
88

9-
> **Note**: We no longer have admin access to the `pyautogen` PyPI package, and
10-
> the releases from that package are no longer from Microsoft since version 0.2.34.
11-
> To continue use the `v0.2` version of AutoGen, install it using `autogen-agentchat~=0.2`.
12-
> Please read our [clarification statement](https://github.com/microsoft/autogen/discussions/4217) regarding forks.
9+
```{note}
10+
We no longer have admin access to the `pyautogen` PyPI package, and
11+
the releases from that package are no longer from Microsoft since version 0.2.34.
12+
To continue using the `v0.2` version of AutoGen, install it using `autogen-agentchat~=0.2`.
13+
Please read our [clarification statement](https://github.com/microsoft/autogen/discussions/4217) regarding forks.
14+
```
1315

1416
## What is `v0.4`?
1517

@@ -571,8 +573,8 @@ asyncio.run(main())
571573
```
572574

573575
When using tool-equipped agents inside a group chat such as
574-
{py:class}`~autogen_agentchat.teams.RoundRobinGroupChat`,
575-
you simply do the same as above to add tools to the agents, and create a
576+
{py:class}`~autogen_agentchat.teams.RoundRobinGroupChat`,
577+
you simply do the same as above to add tools to the agents, and create a
576578
group chat with the agents.
577579

578580
## Chat Result
