|
180 | 180 | "**NB the default usage response is to return zero values**"
|
181 | 181 | ]
|
182 | 182 | },
|
183 |
| - { |
184 |
| - "cell_type": "markdown", |
185 |
| - "metadata": {}, |
186 |
| - "source": [ |
187 |
| - "## Caching Wrapper\n", |
188 |
| - "\n", |
189 |
| - "`autogen_core` implements a {py:class}`~autogen_core.models.ChatCompletionCache` that can wrap any {py:class}`~autogen_core.models.ChatCompletionClient`. Using this wrapper avoids incurring token usage when querying the underlying client with the same prompt multiple times. \n", |
190 |
| - "\n", |
191 |
| - "{py:class}`~autogen_core.models.ChatCompletionCache` uses a {py:class}`~autogen_core.CacheStore` protocol to allow duck-typing any storage object that has a pair of `get` & `set` methods (such as `redis.Redis` or `diskcache.Cache`)." |
192 |
| - ] |
193 |
| - }, |
194 |
| - { |
195 |
| - "cell_type": "code", |
196 |
| - "execution_count": null, |
197 |
| - "metadata": {}, |
198 |
| - "outputs": [], |
199 |
| - "source": [ |
200 |
| - "from typing import Any, Dict, Optional\n", |
201 |
| - "\n", |
202 |
| - "from autogen_core import CacheStore\n", |
203 |
| - "from autogen_core.models import CHAT_CACHE_VALUE_TYPE, ChatCompletionCache\n", |
204 |
| - "\n", |
205 |
| - "\n", |
206 |
| - "# Simple CacheStore implementation using in-memory dict,\n", |
207 |
| - "# you can also use redis.Redis or diskcache.Cache\n", |
208 |
| - "class DictStore(CacheStore[CHAT_CACHE_VALUE_TYPE]):\n", |
209 |
| - " def __init__(self) -> None:\n", |
210 |
| - " self._store: dict[str, CHAT_CACHE_VALUE_TYPE] = {}\n", |
211 |
| - "\n", |
212 |
| - " def get(self, key: str, default: Optional[CHAT_CACHE_VALUE_TYPE] = None) -> Optional[CHAT_CACHE_VALUE_TYPE]:\n", |
213 |
| - " return self._store.get(key, default)\n", |
214 |
| - "\n", |
215 |
| - " def set(self, key: str, value: CHAT_CACHE_VALUE_TYPE) -> None:\n", |
216 |
| - " self._store[key] = value\n", |
217 |
| - "\n", |
218 |
| - "\n", |
219 |
| - "cached_client = ChatCompletionCache(model_client, DictStore())\n", |
220 |
| - "response = await cached_client.create(messages=messages)\n", |
221 |
| - "\n", |
222 |
| - "cached_response = await cached_client.create(messages=messages)\n", |
223 |
| - "print(cached_response.cached)" |
224 |
| - ] |
225 |
| - }, |
226 |
| - { |
227 |
| - "cell_type": "markdown", |
228 |
| - "metadata": {}, |
229 |
| - "source": [ |
230 |
| - "Inspecting `cached_client.total_usage()` (or `model_client.total_usage()`) before and after a cached response should yield idential counts.\n", |
231 |
| - "\n", |
232 |
| - "Note that the caching is sensitive to the exact arguments provided to `cached_client.create` or `cached_client.create_stream`, so changing `tools` or `json_output` arguments might lead to a cache miss." |
233 |
| - ] |
234 |
| - }, |
235 | 183 | {
|
236 | 184 | "cell_type": "markdown",
|
237 | 185 | "metadata": {},
|
|
373 | 321 | "```"
|
374 | 322 | ]
|
375 | 323 | },
|
| 324 | + { |
| 325 | + "cell_type": "markdown", |
| 326 | + "metadata": {}, |
| 327 | + "source": [ |
| 328 | + "## Caching Wrapper\n", |
| 329 | + "\n", |
| 330 | + "`autogen_core` implements a {py:class}`~autogen_core.models.ChatCompletionCache` that can wrap any {py:class}`~autogen_core.models.ChatCompletionClient`. Using this wrapper avoids incurring token usage when querying the underlying client with the same prompt multiple times. \n", |
| 331 | + "\n", |
| 332 | + "{py:class}`~autogen_core.models.ChatCompletionCache` uses a {py:class}`~autogen_core.CacheStore` protocol to allow duck-typing any storage object that has a pair of `get` & `set` methods (such as `redis.Redis` or `diskcache.Cache`). Here's an example of using `diskcache` for local caching:" |
| 333 | + ] |
| 334 | + }, |
| 335 | + { |
| 336 | + "cell_type": "code", |
| 337 | + "execution_count": null, |
| 338 | + "metadata": {}, |
| 339 | + "outputs": [ |
| 340 | + { |
| 341 | + "name": "stdout", |
| 342 | + "output_type": "stream", |
| 343 | + "text": [ |
| 344 | + "True\n" |
| 345 | + ] |
| 346 | + } |
| 347 | + ], |
| 348 | + "source": [ |
| 349 | + "from typing import Any, Dict, Optional\n", |
| 350 | + "\n", |
| 351 | + "from autogen_core.models import ChatCompletionCache\n", |
| 352 | + "from diskcache import Cache\n", |
| 353 | + "\n", |
| 354 | + "diskcache_client = Cache(\"/tmp/diskcache\")\n", |
| 355 | + "\n", |
| 356 | + "cached_client = ChatCompletionCache(model_client, diskcache_client)\n", |
| 357 | + "response = await cached_client.create(messages=messages)\n", |
| 358 | + "\n", |
| 359 | + "cached_response = await cached_client.create(messages=messages)\n", |
| 360 | + "print(cached_response.cached)" |
| 361 | + ] |
| 362 | + }, |
| 363 | + { |
| 364 | + "cell_type": "markdown", |
| 365 | + "metadata": {}, |
| 366 | + "source": [ |
| 367 | + "Inspecting `cached_client.total_usage()` (or `model_client.total_usage()`) before and after a cached response should yield idential counts.\n", |
| 368 | + "\n", |
| 369 | + "Note that the caching is sensitive to the exact arguments provided to `cached_client.create` or `cached_client.create_stream`, so changing `tools` or `json_output` arguments might lead to a cache miss." |
| 370 | + ] |
| 371 | + }, |
376 | 372 | {
|
377 | 373 | "cell_type": "markdown",
|
378 | 374 | "metadata": {},
|
|
673 | 669 | "name": "python",
|
674 | 670 | "nbconvert_exporter": "python",
|
675 | 671 | "pygments_lexer": "ipython3",
|
676 |
| - "version": "3.12.7" |
| 672 | + "version": "3.12.1" |
677 | 673 | }
|
678 | 674 | },
|
679 | 675 | "nbformat": 4,
|
|