Course Related Links:
Authors
- Joffrey Thomas
- Ben Burtenshaw
- Thomas Simonini
```bash
# Make sure you have git-lfs installed (https://git-lfs.com)
brew install git-lfs
# Hugging Face repositories/spaces are now submodules of this repository
git lfs install
# Log in with a token (needed to push code back to the Hugging Face repository)
# https://discuss.huggingface.co/t/cant-push-to-new-space/35319/4
huggingface-cli login
# Create and activate a virtual environment
# https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/
python -m venv .venv
source .venv/bin/activate
# Install dependencies
# https://huggingface.co/docs/transformers/installation
pip install -r requirements.txt
pip install -r spaces/Unit_1-First_Agent/requirements.txt
# Set up the Hugging Face API key (https://hf.co/settings/tokens)
cp .env.example .env
```
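With `.env` in place, the token can be loaded in Python before calling the Hub. A minimal sketch, assuming the variable is named `HF_TOKEN` in `.env` (check `.env.example` for the actual name) and that `python-dotenv` is installed:

```python
import os

from dotenv import load_dotenv       # python-dotenv
from huggingface_hub import login

load_dotenv()                         # read variables from .env in the current directory
login(token=os.environ["HF_TOKEN"])   # "HF_TOKEN" is an assumed variable name
```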
| Week | Unit | Topic | Lectures | Quiz | Assignments | Others |
|---|---|---|---|---|---|---|
| - | 0 | Welcome to the Course | Welcome To The Agents Course! Introduction to the Course and Q&A - YouTube | - | - | - |
| 2025/2/10~2/16 | 1 | Introduction to Agents | - | Unit 1 Quiz | First Agent | Unit 1 Notebook, Try Dummy Agent and smolagents |
| 2025/2/17~2/23 | Bonus | Fine-tune your agent | - | - | - | - |
| 2025/2/24~3/9 | 2 | Frameworks (2_frameworks) | - | - | - | - |
| 2025/3/10~3/31 | 3 | Use Cases (3_use_cases) | - | - | - | - |
| 2025/4/1~4/30 | 4 | Final Assignment with Benchmark (4_final_assignment_with_benchmark) | - | - | - | - |
Welcome, guidelines, necessary tools, and course overview.
- Welcome To The Agents Course! Introduction to the Course and Q&A - YouTube (2025/2/13 00:00 UTC+8)
Definition of agents, LLMs, model family tree, and special tokens.
- Introduction to Agents
- Understanding Agents
- What is an Agent, and how does it work?
- How do Agents make decisions using reasoning and planning?
- The Role of LLMs (Large Language Models) in Agents
- How LLMs serve as the “brain” behind an Agent.
- How LLMs structure conversations via the Messages system.
- Tools and Actions
- How Agents use external tools to interact with the environment.
- How to build and integrate tools for your Agent.
- The Agent Workflow:
- Think → Act → Observe.
- Understanding Agents
- What is an Agent?
- An Agent is a system that leverages an AI model to interact with its environment in order to achieve a user-defined objective. It combines reasoning, planning, and the execution of actions (often via external tools) to fulfill tasks.
- The Brain (AI Model)
- LLM (Large Language Model): e.g. GPT-4 from OpenAI, Llama from Meta, Gemini from Google, ...
- VLM (Vision Language Model)
- The Body (Capabilities and Tools)
- To summarize, an Agent is a system that uses an AI Model (typically an LLM) as its core reasoning engine to:
- Understand natural language: Interpret and respond to human instructions in a meaningful way.
- Reason and plan: Analyze information, make decisions, and devise strategies to solve problems.
- Interact with its environment: Gather information, take actions, and observe the results of those actions.
- Small Quiz (ungraded) (Quick Quiz 1)
- What are LLMs?
- Messages and Special Tokens
- openai-python/chatml.md at release-v0.28.0 · openai/openai-python (ChatML template format)
- Chat Templates (the `chat_template` is usually stored in the model's tokenizer) - 🤗 Transformers
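To see what a chat template does, here is a hedged sketch using 🤗 Transformers (`apply_chat_template` is the real tokenizer method; the model name is just an example, and the rendered tokens depend on the model):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is an agent?"},
]

# Render the messages with the model's own chat_template, which inserts
# the model-specific special tokens for us.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```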
- What are Tools?
- A Tool should contain:
- A textual description of what the function does.
- A Callable (something to perform an action).
- Arguments with typings.
- (Optional) Outputs with typings.
- The tool description is injected into the system prompt; it must state precisely (see the sketch after this list):
- What the tool does
- What exact inputs it expects
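A minimal, framework-free sketch of these ingredients; `get_weather` and `to_prompt_description` are hypothetical helpers, shown only to make the mapping to the system prompt concrete:

```python
import inspect

def get_weather(location: str) -> str:
    """Get the current weather at a location and return a short textual report."""
    # Dummy implementation for illustration; a real tool would call a weather API.
    return f"Sunny and 24°C in {location}"

def to_prompt_description(fn) -> str:
    """Render a tool as the textual description injected into the system prompt."""
    signature = inspect.signature(fn)
    return f"Tool: {fn.__name__}{signature} - {fn.__doc__}"

print(to_prompt_description(get_weather))
# Tool: get_weather(location: str) -> str - Get the current weather at a location ...
```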
- Quick Self-Check (ungraded) (Quick Quiz 2)
- Understanding AI Agents through the Thought-Action-Observation Cycle
- Agents work in a continuous cycle of thinking (Thought) → acting (Act) → observing (Observe); a loop sketch follows this list.
- Thought: The LLM part of the Agent decides what the next step should be.
- Action: The agent takes an action, by calling the tools with the associated arguments.
- Observation: The model reflects on the response from the tool.
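The cycle is easiest to see as a loop. A hedged sketch, where `llm_generate`, `is_final_answer`, `parse_action`, and `execute_tool` are hypothetical placeholders for the model call and the tool plumbing:

```python
def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # Thought: the LLM decides what the next step should be.
        reply = llm_generate(messages)           # hypothetical model call
        if is_final_answer(reply):               # hypothetical termination check
            return reply
        # Action: call the chosen tool with the parsed arguments.
        tool_name, args = parse_action(reply)    # hypothetical stop-and-parse
        observation = execute_tool(tool_name, args)
        # Observation: append the result so the next Thought can use it.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
    return "Stopped after reaching the step limit."
```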
- Thought: Internal Reasoning and the Re-Act Approach
- ReAct (papers.cool): “Reasoning” (Think) with “Acting” (Act)
- ReAct is introduced in the course as a simple prompting technique that appends “Let’s think step by step” before letting the LLM decode the next tokens (strictly speaking, that trick is zero-shot chain-of-thought; ReAct proper interleaves reasoning traces with tool-using actions).
- There has recently been a lot of interest in reasoning strategies. This is what's behind models like DeepSeek R1 or OpenAI's o1, which have been fine-tuned to "think before answering".
- Actions: Enabling the Agent to Engage with Its Environment
- One key method for implementing actions is the Stop and Parse Approach. This method ensures that the agent’s output is structured and predictable (a sketch follows this list):
- Generation in a Structured Format: The agent outputs its intended action in a clear, predetermined format (JSON or code).
- Halting Further Generation: Once the action is complete, the agent stops generating additional tokens. This prevents extra or erroneous output.
- Parsing the Output: An external parser reads the formatted action, determines which Tool to call, and extracts the required parameters.
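A hedged sketch of stop-and-parse, assuming the model was prompted to emit an `Action:` line followed by a JSON payload and that generation halts at a fixed stop sequence (all names here are illustrative):

```python
import json

STOP_SEQUENCE = "Observation:"  # generation halts here so results are never hallucinated

# Example of what the model might emit before hitting the stop sequence.
raw_output = """Thought: I should look up the weather first.
Action:
{"tool": "get_weather", "arguments": {"location": "Paris"}}
"""

def parse_action(text: str) -> tuple[str, dict]:
    """Extract the tool name and arguments from the structured Action block."""
    json_part = text.split("Action:", 1)[1].strip()
    action = json.loads(json_part)
    return action["tool"], action["arguments"]

tool_name, args = parse_action(raw_output)
print(tool_name, args)  # get_weather {'location': 'Paris'}
```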
- An alternative approach is using Code Agents. The idea: instead of outputting a simple JSON object, a Code Agent generates an executable code block, typically in a high-level language like Python (see the sketch after the list of advantages).
- Expressiveness: Code can naturally represent complex logic, including loops, conditionals, and nested functions, providing greater flexibility than JSON.
- Modularity and Reusability: Generated code can include functions and modules that are reusable across different actions or tasks.
- Enhanced Debuggability: With a well-defined programming syntax, code errors are often easier to detect and correct.
- Direct Integration: Code Agents can integrate directly with external libraries and APIs, enabling more complex operations such as data processing or real-time decision making.
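For contrast, a hedged sketch of the code-agent idea: the model emits a Python snippet instead of JSON, and the framework executes it and captures what it prints as the Observation (real frameworks such as smolagents sandbox this execution far more carefully):

```python
import contextlib
import io

# Imagine the LLM produced this code block as its "action".
generated_code = """
temps = [21, 24, 19]
print(f"Average temperature: {sum(temps) / len(temps):.1f}°C")
"""

def run_code_action(code: str) -> str:
    """Execute generated code and return its stdout as the Observation."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # illustration only; never exec untrusted code unsandboxed
    return buffer.getvalue().strip()

print(run_code_action(generated_code))  # Average temperature: 21.3°C
```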
- Observe: Integrating Feedback to Reflect and Adapt
- Observations are how an Agent perceives the consequences of its actions.
- Collects Feedback: Receives data or confirmation that its action was successful (or not).
- Appends Results: Integrates the new information into its existing context, effectively updating its memory.
- Adapts its Strategy: Uses this updated context to refine subsequent thoughts and actions.
- After performing an action, the framework follows these steps in order:
- Parse the action to identify the function(s) to call and the argument(s) to use.
- Execute the action.
- Append the result as an Observation.
- Dummy Agent Library (hallucination issue => an OpenAI model or a slightly bigger model works)
- dummy_agent_library.ipynb · agents-course/notebooks at main (open in Google Colab) => Modified Version
- The `chat` method is the RECOMMENDED method to use in order to ensure a smooth transition between models (see the sketch below).
- If we use the `text_generation` method, we need to format the prompt properly ourselves (e.g. insert the special tokens for the specific model).
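A hedged sketch of the two call styles with `huggingface_hub.InferenceClient` (both methods exist in the library; the model is a gated example from the course, so access approval and a valid token may be required):

```python
from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Llama-3.2-3B-Instruct")

# Recommended: the chat method applies the chat template / special tokens for us.
messages = [{"role": "user", "content": "The capital of France is"}]
response = client.chat_completion(messages, max_tokens=20)
print(response.choices[0].message.content)

# Lower-level: with text_generation WE must insert the model's special tokens.
prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "The capital of France is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)
print(client.text_generation(prompt, max_new_tokens=20))
```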
- Let’s Create Our First Agent Using smolagents (failed to use the course's HfApiModel API endpoint => currently using an OpenAI model (a local model might be too weak) or the slightly bigger one, same as in the Dummy Agent part); a minimal sketch follows the links below.
- smolagents is a library that focuses on CodeAgent, a kind of agent that performs “Actions” through code blocks, and then “Observes” results by executing the code.
- Introducing smolagents: simple agents that write actions in code.
- Agent process - YouTube
- Duplicate this space: First Agent Template - a Hugging Face Space by agents-course
- Modify this incomplete code: app.py · agents-course/First_agent_template at main
- First Agent Template - a Hugging Face Space by daviddwlee84
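A minimal smolagents sketch close to the template's app.py (hedged: class names vary across smolagents versions, e.g. HfApiModel was later renamed InferenceClientModel; the timezone tool mirrors the one in the template):

```python
from smolagents import CodeAgent, HfApiModel, tool

@tool
def get_current_time_in_timezone(timezone: str) -> str:
    """A tool that fetches the current local time in a specified timezone.

    Args:
        timezone: A valid timezone name (e.g., 'Asia/Taipei').
    """
    from datetime import datetime
    from zoneinfo import ZoneInfo
    return datetime.now(ZoneInfo(timezone)).strftime("%Y-%m-%d %H:%M:%S")

model = HfApiModel()  # or point at an OpenAI-compatible endpoint, as noted above
agent = CodeAgent(tools=[get_current_time_in_timezone], model=model)
agent.run("What time is it in Taipei right now?")
```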
- Unit 1 Quiz
- Get your certificate
- Conclusion
Fine-tune an Agent to do function calling (i.e., to be able to call tools based on the user prompt).
- Introduction - Hugging Face Agents Course
- What is Function Calling? => Function calling is a way for an LLM to take actions on its environment (a sketch of the message format appears at the end of this section).
- Function calling and other API updates | OpenAI <= It was first introduced in GPT-4 and then reproduced in other models.
- Let’s Fine-Tune your model for function-calling
- Conclusion
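As a reference for the chat format involved, a hedged sketch of what a function-calling exchange looks like at the message level (OpenAI-style schema; the exact JSON keys and special tokens vary per model and provider):

```python
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {   # The model does not answer directly; it emits a structured call instead.
        "role": "assistant",
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
        }],
    },
    {   # The application executes the function and feeds the result back.
        "role": "tool",
        "tool_call_id": "call_1",
        "content": "Sunny and 24°C",
    },
    # The model then writes the final natural-language answer from this context.
]
```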
Overview of smolagents, LangChain, LangGraph, and LlamaIndex.
SQL, code, retrieval, and on-device agents using various frameworks.
Automated evaluation of agents and leaderboard with student results.
Packages
- smolagents
Videos
- Deep Dive into LLMs like ChatGPT - YouTube
Courses