Feature/tool calling #6757

Open
SeanScripts wants to merge 11 commits into dev

Conversation

SeanScripts (Contributor) commented Feb 18, 2025

Checklist:

I've been working on implementing tool/function calling for models that support it. This is a major feature that enables LLMs to use tools to do a lot of things they normally aren't very good at.

  • Added "tools" folder with some example tools as JSON files, and a "presets" folder within it for tool presets as simple text lists.
  • Added a new tab next to Instruction template for Tools
  • Within this new tab, added a few overall options
    • Confirm tool use: Enabled by default and recommended, pauses execution so you can confirm before running a tool by pressing the "continue" button
    • Tools in user message: Changes the position of the tools, as an option in some instruct templates. I haven't really tested this option, but it might help with some models
    • Max consecutive tool uses: When you have "Confirm tool use" disabled, prevents the model from getting into an endless loop of calling the same tool, which happens sometimes with small models
  • Added a tool selection preset with save/delete options, which just saves a text file under "tools/presets/" with the selected tools, so you can select groups of tools
  • Multiselect checkbox group of the available tools to be enabled in the current session. Clicking one of these will display the details in the menu to the right. (I wanted something like a radio button for the selected tool vs. checkboxes for the enabled ones, and this has the problem of requiring you to enable a tool to select it... which I think I'd rather avoid, given that currently it executes the tool code when it's enabled, to add the function to the globals.) There is also a dropdown menu for the tool filename, which is much safer.
  • On the right-hand menu, there are fields for the selected tool, which can also be used to add a new tool, along with options to save/delete the tool (a sketch of a complete definition follows this list):
    • Name
    • Type (function, not sure about other options)
    • Description (to tell the LLM what the tool does and how to use it)
    • Parameters (JSON object describing the input parameters for the tool)
    • Action (Python code that is executed to define the function which runs when the tool is used by a model; the function name must match the tool name, and it must return a string)
  • Added the functionality to run tool calls in instruct mode with the right models, but nothing for the API yet. Tool calls are processed by checking the model output for JSON and seeing whether the fields match what is required for the tool call. Some models use different formats or special tokens for tool calling, so I'm using the instruct template to hopefully make it work across different models despite the differences. Verified for Llama 3 and Qwen 2.5 at least.
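
For reference, here is a minimal sketch of what a complete tool definition might look like, written as a Python dict mirroring the JSON files under "tools/". The field contents are illustrative guesses based on the descriptions above (with roll_dice as the model); the actual example files in this PR are the authoritative format.

```python
# Illustrative sketch of a tool definition; field names follow the list above.
# The example JSON files shipped with this PR are authoritative.
roll_dice_tool = {
    "name": "roll_dice",
    "type": "function",
    "description": "Roll one or more dice with a given number of sides and return the results.",
    "parameters": {
        "type": "object",
        "properties": {
            "sides": {"type": "number", "description": "Number of sides on each die."},
            "count": {"type": "number", "description": "How many dice to roll."},
        },
        "required": ["sides"],
    },
    # The action defines a function whose name matches the tool name and returns a string.
    "action": (
        "import random\n"
        "def roll_dice(sides, count=1):\n"
        "    rolls = [random.randint(1, int(sides)) for _ in range(int(count))]\n"
        "    return ', '.join(str(r) for r in rolls)\n"
    ),
}
```

Enabling a tool then executes its action code to add the function to the globals, as noted in the checkbox-group item above.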

Examples:
Gradio UI for tools (roll_dice example shown)
[Screenshot: tools_tab]

(Model is Qwen 2.5 32B Instruct Q4 in these tests)

Dice example, showing before and after you confirm execution:
[Screenshot: roll_dice_tool_call_example]

Code interpreter example, showing the displayed code:
[Screenshot: code_interpreter_tool_call_example]

Tutorial on how to test tool use:

  • Load a model that supports tool calling, e.g. Llama 3 or Qwen 2.5 (any size). The backend used for loading shouldn't matter.
  • Navigate to the new Tools tab (under Parameters, next to Instruction template)
  • Keep "Confirm tool use" checked (I recommend this for safety, especially if you're testing the code interpreter)
  • Choose a tool selection preset, or check any of the example tools I added, e.g. roll_dice.
  • If you want to add your own custom tool, check the examples for an idea of the format required.
    • Tool name and description are required; they're what the model uses to determine what the tool does and how to use it, though it can also see the parameters to know the format requirements. Support for few-shot examples might be a good addition in the future, since small models can't always do zero-shot tool use very well.
    • Tool parameters follow the format from the OpenAI function calling spec. The validation I have in place checks the "properties" field for different input parameters and the "required" field for a list of required parameters. For each property, it checks the "type" field as one of a few standard options like number and string.
    • Tool action is the code that runs when you enable the tool (check the box to add it to the active tools the model can use). This must define the function for your tool, which returns a string (for now; other types could be nice later), and the name of this function must match the tool name. This is arbitrary code that gets executed, so make sure the tools you define and run are safe. For tools that let the LLM execute arbitrary code (like the code interpreter), use caution and read the code before letting it run, as backdoors are technically possible. There isn't currently any environment isolation here, so tools could be used to read/write/delete your local files (which could also be useful for things like RAG) or make API calls to external sites. It just depends on what code you're running, so use common sense and check that the code is reasonable and safe.
  • After selecting the tools you want, assuming the code for them was loaded successfully, start a chat in instruct mode and just ask for something in plain language that would require using the tool, e.g. "Roll a D20". There is no guarantee the model will actually choose to use the tool, but if it's enabled and the model's instruct template includes the tools, and it was tuned to recognize when to use tools, then it often will.
  • If the model generates a tool call (or more than one), it will get replaced by a JSON object following the OpenAI spec, which assigns it a tool call ID (a rough sketch of this detection step follows this list). At this point, if you have the "confirm tool use" option selected (recommended), execution will pause, and you can check what tool call will be made. For the code interpreter tool, the code will also be displayed with syntax highlighting.
  • To accept the tool call, press the "continue" button. This will run the tool and display the tool result. Then the LLM will continue generating and generate a response with this tool call and result in the context, so that it can interpret the result. At this point it can also choose to make another tool call and repeat the process.
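
To make the detection step concrete, here is a rough sketch of the kind of check described above: scan the model output for a JSON object, verify that it names an enabled tool and supplies the required parameters, and wrap it in an OpenAI-style tool call object with an assigned ID. The helper name and exact structure are illustrative, not the actual code in this PR.

```python
import json
import re
import uuid

def extract_tool_call(output_text, enabled_tools):
    """Illustrative sketch: find a JSON object in the model output that matches an
    enabled tool. enabled_tools maps tool name -> tool definition (as in the earlier
    sketch). Returns an OpenAI-style tool call dict, or None if nothing matches."""
    # Naive scan for a JSON object; the real implementation also has to handle
    # model-specific formats and special tokens via the instruct template.
    for match in re.finditer(r"\{.*\}", output_text, re.DOTALL):
        try:
            candidate = json.loads(match.group(0))
        except json.JSONDecodeError:
            continue
        name = candidate.get("name")
        args = candidate.get("arguments", candidate.get("parameters", {}))
        if name not in enabled_tools or not isinstance(args, dict):
            continue
        required = enabled_tools[name].get("parameters", {}).get("required", [])
        if not all(param in args for param in required):
            continue
        # Wrap in the OpenAI tool call format and assign a call ID.
        return {
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {"name": name, "arguments": json.dumps(args)},
        }
    return None
```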

Models tested: Llama 3.2 Instruct (1B and 7B), Qwen 2.5 Instruct (7B and 32B). The 32B model obviously worked best for this; the small models tend to have issues with knowing when to use the tools and will often call them repeatedly or make up nonexistent ones.

The code interpreter tool looks at the output that was printed during execution, so if the code ends with just a variable without printing it (as if in a notebook), it won't work. This is something that needs to be fixed. Things like matplotlib also won't work, though if you save a plot to a file it will (just make sure not to overwrite existing files...). The weather tool is just a placeholder, and the random number generation tools are also just examples to show how it works, but one could make useful tools for lots of different things.
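
For anyone wondering where the printed-output limitation comes from: a code interpreter action along these lines presumably captures stdout around an exec() call, roughly as in the sketch below (my guess at the general shape, not the actual implementation in this PR). A bare trailing expression produces no stdout, which is why it comes back empty; compiling the final statement in "eval" mode, the way notebooks do, would be one way to fix it.

```python
import contextlib
import io

def code_interpreter(code):
    """Run Python code and return whatever it printed (illustrative sketch only)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # No sandboxing, mirroring the caveat in the tutorial above: the code
            # runs with full access to the local environment.
            exec(code, {})
    except Exception as e:
        return f"Error: {e}"
    output = buffer.getvalue()
    # A bare trailing expression (notebook style) prints nothing, so nothing is
    # captured; this is the limitation described above.
    return output if output else "(no output was printed)"
```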

This is a work in progress, but the basic functionality is in place to be able to create and run tools in the UI. I could reorganize it better and put some stuff in html_generator.py rather than directly in chat.py, so that it displays in a special way in the UI rather than just appearing as text with some markdown blocks. It also only works in Instruct mode right now. Token bias after detecting the start of a tool call would be nice to ensure it formats the tool call properly, though from what I've seen with these modern models, that hasn't really been an issue, surprisingly.

I still need to add the OpenAI API extension support for this. I've been focusing on getting it working in the UI, but I understand a lot of people want the API support. I haven't really looked at the extension though, so if anyone has any ideas on how to add it now that it's working in the UI, please leave a comment, and let's collaborate.
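
For reference on the API side: in the OpenAI chat completions format, the client sends tool definitions in a "tools" list and receives the model's calls back under "tool_calls", so the extension would presumably need to accept and emit something along these lines. This is a sketch of a request body based on the OpenAI spec, not on any existing code in this repo.

```python
# Sketch of an OpenAI-style chat completions request with tools (per the OpenAI
# spec); the model name is a placeholder.
request_body = {
    "model": "Qwen2.5-32B-Instruct",
    "messages": [{"role": "user", "content": "Roll a D20"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "roll_dice",
                "description": "Roll one or more dice and return the results.",
                "parameters": {
                    "type": "object",
                    "properties": {"sides": {"type": "number"}},
                    "required": ["sides"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}
# The response would carry message["tool_calls"] entries with an id, type
# "function", and function name/arguments (a JSON string); the client then sends
# back a {"role": "tool", "tool_call_id": ..., "content": ...} message with the
# tool result.
```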

There are lots of things to add to improve this feature. But it is functional, at least, aside from the API, and people might like to test it and suggest improvements.

I don't have the option to merge to a new branch, so I set this PR to dev, but I suggest making a separate branch for this, as it changes quite a lot about the main generation code and should be tested more.

Relevant issues this (partially) addresses:
#4286
#4455
#6539

Please let me know what you think! :)

oobabooga and others added 11 commits February 14, 2025 23:25

  • Maybe I should make these commits less massive... There's still some work to be done on cleaning things up and fixing some significant bugs, but it's at least working at a basic level.
  • Add TODOs for some bugs, add some warnings and additional information about usage
  • Instead of replacing it with the tool call JSON object, keep both the original and modified tool call. This is helpful for seeing how exactly the tool call was generated, which is useful for seeing things like token probabilities. This may cause duplication in the generated prompt, though.