Agentic memory #5227

Open

rickyloynd-microsoft wants to merge 82 commits into main from agentic_memory

+2,908 −13
Commits (82, all by rickyloynd-microsoft):

- 442a9d8 initial checkin
- f8584cd support for extensive evaluations
- 607e7ff Enhance retrieval with task generalization and insight validation
- b045636 Support TRAPI client.
- 63b28d7 Restoring earlier results, and general cleanup.
- b921d83 Merge branch 'refs/heads/main' into agentic_memory
- 9dfb074 Modify imports after merge from main.
- 93a5ca4 Log model and token counts.
- 2cb9344 Only instantiate the client once.
- 878f458 Fix bug that was duplicating insights across trials.
- 21562f1 Add the Grader class.
- 3a40b30 Adjustments for comparison tests.
- 8622c5e Test generalization over multiple tasks.
- 20b26c1 Add teachability and a test for it.
- 9d47227 Learning from demonstration, in-progress.
- 52d4e00 In memory retrieval, validate insights separately rather than together.
- 6b15777 Finish learning from demonstration.
- a18674c Added RecordableChatCompletionClient as a guardrail during refactoring.
- 52e213e Ran 3 evals with session recording and replay.
- a440b0a Add results to recorded sessions, including session length.
- cab51f1 Use yaml file for eval settings.
- d91e58c Simplify paths and other settings.
- f1d7a2f Renamed the memory classes.
- 17d4c42 Apprentice.
- 19654e8 Moved test into the evaluator, and removed eval.py's other util funct…
- 7aa20c1 renaming
- 83a7ddc Rerouted calls to AgenticMemoryController through FastLearner.
- 3047c1c Replace task_assignment_callback with AgentWrapper.
- 1f20b79 Segregate files into subfolders, eval framework vs. implementation, etc.
- de4c12b Rename FastLearner subclass to Apprentice, and import it only as spec…
- a9d6108 Refactoring, preparatory to removing eval_framework from the branch a…
- d67e2cc Remove the outdated final_format_instructions parameter.
- 6470fd8 Move tasks into yaml files.
- b025199 Move client support to a subdir.
- 4f9267c Move evaluations to a separate dir.
- db34844 single line
- c780852 Add baseline evaluation for the no-memory case.
- fa688f7 Merge branch 'refs/heads/main' into agentic_memory
- 43bda2f Support o1 models
- be081b3 simplification of client creation code
- 29d1494 simplify folder structure
- 8e9a550 Move task data strings out of the eval functions.
- b3fe084 simplify page_log
- 077615f simplify page_log
- 8847168 simplify page_log
- 4091ab3 conventional logging terminology
- 3865cff control logger enabling
- 6c73674 add logging to string map
- db5e07b simplify logging
- 07cb3f0 simplify logging
- e88bd69 Merge branch 'refs/heads/main' into agentic_memory
- 9b3f77d merge from main
- a0dee67 Changes made by poe check.
- 7e359e9 docstrings etc.
- 9466ea8 docstrings etc.
- 4ec9bff docstrings etc.
- 76c16f9 docstrings etc.
- a8cd0d7 docstrings etc.
- ed7fae1 docstrings etc.
- 93de858 docstrings etc.
- 1a309f9 docstrings etc.
- 8993aa1 docstrings etc.
- fa60d5a Simplify naming
- 882d578 Simplify tests
- 00cbb8c standardize logging levels
- 88294d2 Remove Evaluator class
- 7d0ed63 sample code
- 5b3876f readme
- 21220d4 readme fixes
- 232ed0f samples readme
- 87ee27b readme files
- b21d140 readme files
- 1e88eb6 remove ame
- a3addc1 readme
- c6ffa43 comment out api_key lines
- 8f66612 Optional disabling of prefix caching (to decorrelate repeated runs)
- 491964f Merge branch 'refs/heads/main' into agentic_memory
- 2ed08ae Remove unnecessary instantiation of Grader
- f879487 Updated image using git-lfs
- 60f8ad3 Merge branch 'agentic_memory' of github.com:microsoft/autogen into ag…
- ed0a4a6 Merge branch 'refs/heads/main' into agentic_memory
- f0eceef installation fixes
python/packages/autogen-ext/src/autogen_ext/agentic_memory/README.md (82 additions, 0 deletions)

# Agentic Memory

This AutoGen extension provides an implementation of agentic memory, which we define as a
broad ability for AI agents to accomplish tasks more effectively by learning quickly and continually (over the long term).
This is distinct from what RAG or long context windows can provide.
While still under active research and development, this implementation of agentic memory
can be attached to virtually any unmodified AI agent, and is designed to enable agents that:

* Remember guidance, corrections, and demonstrations provided by users.
* Succeed more frequently on tasks after finding successful solutions to similar tasks.
* Learn and adapt quickly to changing circumstances to enable workflows that are dynamic and self-healing.

The implementation is also intended to:

* Be general purpose, unconstrained by types and schemas required by standard databases.
* Augment, rather than interfere with, an agent’s special capabilities, such as powerful reasoning, long-horizon autonomy, and tool handling.
* Operate in both foreground and background modes, so that an agent can discuss tasks with a user (in the foreground)
  and then work productively on those tasks (in the background) while the user does other things.
* Allow for fine-grained transparency and auditing of individual memories by human users or other agents.
* Allow agents to be personalized (to a single user) as well as specialized (to a subject, domain, or project).
  The benefits of personalization scale linearly with the number of users, but the benefits of domain specialization
  can scale quadratically with the number of users working in that domain, as insights gained from interactions with one user
  can benefit other users in similar situations.
* Support multiple memory banks dynamically attached to an agent at runtime.
* Enable enforcement of security boundaries at the level of individual memory banks.
* Allow users to download and port memory banks between agents and systems.

![agentic_memory.png](../../../imgs/agentic_memory.png)

The block diagram above outlines the key components of our baseline agentic memory architecture,
which augments an agent or team with the agentic memory mechanisms.

The **Agentic Memory Controller** implements the fast-learning methods described below,
and manages communication with an **Agentic Memory Bank** containing a vector DB and associated structures.

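To make the memory bank's role more concrete, here is a minimal, self-contained sketch of its core data structure. Free-form topics are each embedded as a vector and mapped to the id of an insight. This is not the extension's implementation. The bag-of-words `embed` function stands in for a real embedding model, a Python list stands in for the vector DB, and all names (`ToyMemoryBank`, `embed`, `cosine_distance`) are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity of two sparse count vectors (1.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


@dataclass
class ToyMemoryBank:
    # Each entry pairs an embedded topic with the id of the insight it maps to.
    topic_index: List[Tuple[Counter, str]] = field(default_factory=list)
    insights: Dict[str, str] = field(default_factory=dict)

    def add_insight(self, insight: str, topics: List[str]) -> str:
        insight_id = str(len(self.insights) + 1)
        self.insights[insight_id] = insight
        for topic in topics:
            self.topic_index.append((embed(topic), insight_id))
        return insight_id

    def query(self, topic: str, n_results: int = 5) -> List[Tuple[str, float]]:
        """Return (insight_id, distance) pairs for the stored topics nearest to `topic`."""
        query_vec = embed(topic)
        scored = [(insight_id, cosine_distance(query_vec, vec)) for vec, insight_id in self.topic_index]
        return sorted(scored, key=lambda pair: pair[1])[:n_results]


bank = ToyMemoryBank()
bank.add_insight("Check units before comparing quantities.", ["unit conversion", "physics word problems"])
# The nearest stored topic is "unit conversion", which maps to insight "1".
print(bank.query("unit conversion errors", n_results=1))
```

Because several topics can point to the same insight, a single insight can be reached from many differently worded queries. This is the property that the retrieval steps described below rely on.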
The **Apprentice** is a placeholder for whatever app wraps the combination of agentic memory plus an arbitrary agent or team.
Some applications will use the Apprentice class, while others will instantiate and use the Agentic Memory Controller directly.

The agent or team may interact with an **Environment** such as a web browser.
We’ve successfully run agentic memory with a simple AssistantAgent,
the Magentic-One orchestrator, and the GitHub Copilot Chat agent.

## Memory Creation and Storage

Each stored memory is an insight (in text form) crafted to help the agent accomplish future tasks that are similar
to some task encountered in the past. If the user provides advice for solving a given task,
the advice is extracted and stored as an insight. If the user demonstrates how to perform a task,
the task and demonstration are stored together as an insight that could be applied to similar but different tasks.
If the agent is given a task (free of side effects) and some means of determining success or failure,
the memory controller repeats the following learning loop in the background some number of times:

1. Test the agent on the task a few times to check for a failure.
2. If a failure is found, analyze the agent’s response in order to:
    1. Diagnose the failure of reasoning or missing information,
    2. Phrase a general piece of advice, such as what a teacher might give to a student,
    3. Temporarily append this advice to the task description,
    4. Return to step 1.
    5. If some piece of advice succeeds in helping the agent solve the task a number of times, add the advice as an insight to memory.
3. For each insight to be stored in memory, an LLM is prompted to generate a set of free-form, multi-word topics related to the insight. Each topic is embedded as a fixed-length vector and stored in a vector DB that maps it to the topic’s related insight.

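The background learning loop can be outlined in code. The following is a minimal sketch under stated assumptions, not the controller's actual implementation. `run_agent`, `grade`, and `generate_advice` are hypothetical callables that stand in for the agent, the success check, and the LLM prompts that diagnose a failure and phrase advice. The default attempt and trial counts are arbitrary.

```python
from typing import Callable, List, Optional


def learn_from_failure(
    task: str,
    run_agent: Callable[[str], str],  # hypothetical: returns the agent's response to a prompt
    grade: Callable[[str, str], bool],  # hypothetical: True if the response solves the task
    generate_advice: Callable[[str, str, List[str]], str],  # hypothetical LLM call
    max_attempts: int = 5,
    n_trials: int = 3,
) -> Optional[str]:
    """Return advice that lets the agent pass every trial, or None if none is found or needed."""
    failed_advice: List[str] = []
    advice = ""
    for _ in range(max_attempts):
        # Step 1: test the agent several times, with any candidate advice appended to the task.
        prompt = task if not advice else f"{task}\n\nAdvice: {advice}"
        failure = None
        for _ in range(n_trials):
            response = run_agent(prompt)
            if not grade(task, response):
                failure = response
                break
        if failure is None:
            # Step 2, sub-step 5: the advice worked on every trial (None if no advice was needed).
            return advice or None
        # Step 2, sub-steps 1-3: diagnose the failure and phrase new advice, avoiding repeats.
        if advice:
            failed_advice.append(advice)
        advice = generate_advice(task, failure, failed_advice)
    return None
```

Any advice returned would then go through step 3 (topic generation and storage), which is omitted here.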
## Memory Retrieval and Usage

When the agent is given a task, the memory controller performs the following steps:

1. The task is rephrased into a generalized form.
2. A set of free-form, multi-word query topics is generated from the generalized task.
3. A potentially large number of previously stored topics, those most similar to each query topic, are retrieved from the vector DB along with the insights they map to.
4. These candidate insights are filtered by the aggregate similarity of their stored topics to the query topics.
5. In the final filtering stage, an LLM is prompted to return only those insights that seem potentially useful in solving the task at hand.

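The retrieval steps can be outlined as a single function. This is a minimal sketch, not the controller's code. `generalize`, `generate_topics`, `search`, and `is_useful` are hypothetical callables that stand in for the LLM prompts and the vector-DB lookup. Step 4 assumes the same scoring that `AgenticMemoryBank.get_relevant_insights` uses: each match contributes `relevance_threshold - distance`, and insights with a negative total are dropped.

```python
from typing import Callable, Dict, List, Tuple

Match = Tuple[str, str, float]  # (stored_topic, insight_id, distance)


def retrieve_insights(
    task: str,
    generalize: Callable[[str], str],
    generate_topics: Callable[[str], List[str]],
    search: Callable[[str], List[Match]],
    is_useful: Callable[[str, str], bool],
    insights: Dict[str, str],
    relevance_threshold: float = 1.0,  # hypothetical value
) -> List[str]:
    # Step 1: rephrase the task into a generalized form.
    general_task = generalize(task)
    # Step 2: generate query topics from the generalized task.
    query_topics = generate_topics(general_task)
    # Step 3: collect the stored topics nearest to each query topic.
    matches = [match for topic in query_topics for match in search(topic)]
    # Step 4: sum (threshold - distance) per insight and drop negative totals.
    relevance: Dict[str, float] = {}
    for _, insight_id, distance in matches:
        relevance[insight_id] = relevance.get(insight_id, 0.0) + relevance_threshold - distance
    ranked = sorted((iid for iid, score in relevance.items() if score >= 0), key=lambda iid: -relevance[iid])
    # Step 5: a final LLM check keeps only insights judged useful for the original task.
    return [insights[iid] for iid in ranked if is_useful(task, insights[iid])]
```

The cheap similarity filter runs before the LLM check, so the more expensive LLM call only sees a short list of candidates.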
Retrieved insights that pass the filtering steps are listed under a heading like
“Important insights that may help solve tasks like this”, then appended to the task description before it is passed to the agent as usual.

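A minimal sketch of this final step follows. `format_task_with_insights` and `INSIGHTS_HEADING` are hypothetical names. The heading text comes from the sentence above, and the controller's exact layout may differ.

```python
from typing import List

# Heading text taken from the README; the controller's exact wording and layout may differ.
INSIGHTS_HEADING = "Important insights that may help solve tasks like this:"


def format_task_with_insights(task: str, insights: List[str]) -> str:
    """Append retrieved insights to a task description under a heading (hypothetical helper)."""
    if not insights:
        return task  # Nothing passed the filters, so the task is sent unchanged.
    bullet_lines = [f"- {insight}" for insight in insights]
    return "\n".join([task, "", INSIGHTS_HEADING, *bullet_lines])


print(format_task_with_insights("Convert 5 miles to km.", ["Check units before answering."]))
```

The agent receives an ordinary task string, so the agent itself does not need to be modified.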
## Setup and Usage

Install AutoGen and its extension package as follows:

`pip install "autogen-ext[agentic-memory]"`

We provide [sample code](../../../../../samples/agentic_memory) to illustrate the following forms of memory-based fast learning:

* Agent learning from user advice and corrections
* Agent learning from user demonstrations
* Agent learning from its own experience
python/packages/autogen-ext/src/autogen_ext/agentic_memory/__init__.py (7 additions, 0 deletions)

```python
from .grader import Grader
from .page_logger import PageLogger
from .apprentice import Apprentice
from .agent_wrapper import AgentWrapper
from .agentic_memory_controller import AgenticMemoryController

__all__ = ["Apprentice", "PageLogger", "Grader", "AgentWrapper", "AgenticMemoryController"]
```
python/packages/autogen-ext/src/autogen_ext/agentic_memory/_agentic_memory_bank.py (154 additions, 0 deletions)

```python
import os
import pickle
from dataclasses import dataclass
from typing import Dict, List, Optional

from ._string_similarity_map import StringSimilarityMap
from .page_logger import PageLogger


@dataclass
class Insight:
    """
    Represents a task-completion insight, which is a string that may help solve a task.
    """

    id: str
    insight_str: str
    task_str: Optional[str]  # None when the insight was stored without its task.
    topics: List[str]


class AgenticMemoryBank:
    """
    Stores task-completion insights in a vector DB for later retrieval.

    Args:
        - settings: Settings for the memory bank.
        - reset: True to clear the DB before starting.
        - logger: The PageLogger object to use for logging.

    Methods:
        - reset: Forces immediate deletion of all contents, in memory and on disk.
        - save_insights: Saves the current insight structures (possibly empty) to disk.
        - contains_insights: Returns True if the memory bank contains any insights.
        - add_insight: Adds an insight to the memory bank, given topics related to the insight, and optionally the task.
        - add_task_with_solution: Adds a task-solution pair to the memory bank, to be retrieved together later.
        - get_relevant_insights: Returns any insights from the memory bank that appear sufficiently relevant to the given task topics.
    """

    def __init__(self, settings: Dict, reset: bool, logger: PageLogger) -> None:
        self.settings = settings
        self.logger = logger
        self.logger.enter_function()

        memory_dir_path = os.path.expanduser(self.settings["path"])
        self.relevance_conversion_threshold = self.settings["relevance_conversion_threshold"]
        self.n_results = self.settings["n_results"]
        self.distance_threshold = self.settings["distance_threshold"]

        path_to_db_dir = os.path.join(memory_dir_path, "string_map")
        self.path_to_dict = os.path.join(memory_dir_path, "uid_insight_dict.pkl")

        self.string_map = StringSimilarityMap(reset=reset, path_to_db_dir=path_to_db_dir, logger=self.logger)

        # Load or create the associated insight dict on disk.
        self.uid_insight_dict = {}
        self.last_insight_id = 0
        if (not reset) and os.path.exists(self.path_to_dict):
            self.logger.info("\nLOADING INSIGHTS FROM DISK {}".format(self.path_to_dict))
            self.logger.info("    Location = {}".format(self.path_to_dict))
            with open(self.path_to_dict, "rb") as f:
                self.uid_insight_dict = pickle.load(f)
                self.last_insight_id = len(self.uid_insight_dict)
                self.logger.info("\n{} INSIGHTS LOADED".format(len(self.uid_insight_dict)))

        # Clear the DB if requested.
        if reset:
            self._reset_insights()

        self.logger.leave_function()

    def reset(self) -> None:
        """
        Forces immediate deletion of all contents, in memory and on disk.
        """
        self.string_map.reset_db()
        self._reset_insights()

    def _reset_insights(self) -> None:
        """
        Forces immediate deletion of the insights, in memory and on disk.
        """
        self.uid_insight_dict = {}
        self.save_insights()

    def save_insights(self) -> None:
        """
        Saves the current insight structures (possibly empty) to disk.
        """
        self.string_map.save_string_pairs()
        with open(self.path_to_dict, "wb") as file:
            pickle.dump(self.uid_insight_dict, file)

    def contains_insights(self) -> bool:
        """
        Returns True if the memory bank contains any insights.
        """
        return len(self.uid_insight_dict) > 0

    def _map_topics_to_insight(self, topics: List[str], insight_id: str, insight: Insight) -> None:
        """
        Adds a mapping in the vector DB from each topic to the insight.
        """
        self.logger.enter_function()
        self.logger.info("\nINSIGHT\n{}".format(insight.insight_str))
        for topic in topics:
            self.logger.info("\n TOPIC = {}".format(topic))
            self.string_map.add_input_output_pair(topic, insight_id)
        self.uid_insight_dict[insight_id] = insight
        self.logger.leave_function()

    def add_insight(self, insight_str: str, topics: List[str], task_str: Optional[str] = None) -> None:
        """
        Adds an insight to the memory bank, given topics related to the insight, and optionally the task.
        """
        self.last_insight_id += 1
        id_str = str(self.last_insight_id)
        insight = Insight(id=id_str, insight_str=insight_str, task_str=task_str, topics=topics)
        self._map_topics_to_insight(topics, id_str, insight)

    def add_task_with_solution(self, task: str, solution: str, topics: List[str]) -> None:
        """
        Adds a task-solution pair to the memory bank, to be retrieved together later as a combined insight.
        This is useful when the insight is a demonstration of how to solve a given type of task.
        """
        self.last_insight_id += 1
        id_str = str(self.last_insight_id)
        # Combine the task and its solution into a single example-style insight string.
        insight_str = "Example task:\n\n{}\n\nExample solution:\n\n{}".format(task, solution)
        insight = Insight(id=id_str, insight_str=insight_str, task_str=task, topics=topics)
        self._map_topics_to_insight(topics, id_str, insight)

    def get_relevant_insights(self, task_topics: List[str]) -> Dict[str, float]:
        """
        Returns any insights from the memory bank that appear sufficiently relevant to the given task topics.
        """
        # Process the matching topics to build a dict of insight-relevance pairs.
        matches = []  # Each match is a tuple: (topic, insight_id, distance)
        insight_relevance_dict = {}
        for topic in task_topics:
            matches.extend(self.string_map.get_related_string_pairs(topic, self.n_results, self.distance_threshold))
        for match in matches:
            relevance = self.relevance_conversion_threshold - match[2]
            insight_id = match[1]
            insight_str = self.uid_insight_dict[insight_id].insight_str
            if insight_str in insight_relevance_dict:
                insight_relevance_dict[insight_str] += relevance
            else:
                insight_relevance_dict[insight_str] = relevance

        # Filter out insights with overall relevance below zero.
        for insight in list(insight_relevance_dict.keys()):
            if insight_relevance_dict[insight] < 0:
                del insight_relevance_dict[insight]

        return insight_relevance_dict
```
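The scoring in `get_relevant_insights` can be written as a formula. For an insight $i$, let $M_i$ be the matches returned by the string map that point to $i$, let $d_m$ be the distance of match $m$, and let $\theta$ be `relevance_conversion_threshold`:

$$
R(i) = \sum_{m \in M_i} \left(\theta - d_m\right), \qquad \text{insight } i \text{ is kept iff } R(i) \ge 0.
$$

Here is a worked example with a hypothetical $\theta = 0.7$ (not a value taken from the PR's settings). An insight matched by two query topics at distances $0.3$ and $0.5$ scores $0.4 + 0.2 = 0.6$ and is kept. An insight matched once at distance $0.9$ scores $-0.2$ and is dropped. Because the contributions are summed, an insight reached from several query topics can outrank one with a single closer match.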
Rather than using dictionary settings, we should either flatten the settings in the constructor, or use a config class that is a Pydantic BaseModel for validation and serializable configs. See the existing example in `autogen_agentchat.agents.AssistantAgent`. If this class (and others in this PR) implements the ComponentConfig, you can easily load the configurations from a file to create an object of the class.
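One way to act on this suggestion is sketched below under stated assumptions. It uses a Pydantic model whose fields mirror the four keys that the `AgenticMemoryBank` constructor currently reads from `settings`. The class name `MemoryBankConfig`, the validation bounds, and the example values are hypothetical. The sketch does not show AutoGen's ComponentConfig loading machinery.

```python
from pydantic import BaseModel, Field, ValidationError


class MemoryBankConfig(BaseModel):
    """Hypothetical typed replacement for the settings dict read by AgenticMemoryBank.__init__."""

    path: str
    relevance_conversion_threshold: float
    n_results: int = Field(gt=0)  # must be a positive count
    distance_threshold: float = Field(ge=0.0)  # distances are non-negative


# Example values are hypothetical, not taken from the PR's settings files.
settings = {"path": "~/agentic_memory/bank", "relevance_conversion_threshold": 1.7, "n_results": 25, "distance_threshold": 100}
config = MemoryBankConfig(**settings)

try:
    MemoryBankConfig(**{**settings, "n_results": 0})
except ValidationError as err:
    print(err)  # Reports that n_results violates the gt=0 constraint.
```

Every field is required, so a missing or misspelled key fails validation when the config is constructed. With a plain dict, the same mistake would surface later as a `KeyError`.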