From a41498255aae9f24ebb9b2c17ed6ee9e7302cc6f Mon Sep 17 00:00:00 2001
From: Ariya Hidayat
Date: Sat, 28 Dec 2024 17:35:57 -0800
Subject: [PATCH] README: Cortex's instructions

---
 README.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 74e28e8..b7bdd58 100644
--- a/README.md
+++ b/README.md
@@ -37,7 +37,7 @@ echo "Translate into German: thank you" | ./ask-llm.py
 
 ## Using Local LLM Servers
 
-Supported local LLM servers include [llama.cpp](https://github.com/ggerganov/llama.cpp), [Jan](https://jan.ai), [Ollama](https://ollama.com), [LocalAI](https://localai.io), [LM Studio](https://lmstudio.ai), and [Msty](https://msty.app).
+Supported local LLM servers include [llama.cpp](https://github.com/ggerganov/llama.cpp), [Jan](https://jan.ai), [Ollama](https://ollama.com), [Cortex](https://cortex.so), [LocalAI](https://localai.io), [LM Studio](https://lmstudio.ai), and [Msty](https://msty.app).
 
 To utilize [llama.cpp](https://github.com/ggerganov/llama.cpp) locally with its inference engine, load a quantized model like [Llama-3.2 3B](https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF) or [Phi-3.5 Mini](https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF). Then set the `LLM_API_BASE_URL` environment variable:
 ```bash
@@ -58,6 +58,12 @@ export LLM_API_BASE_URL=http://127.0.0.1:11434/v1
 export LLM_CHAT_MODEL='llama3.2'
 ```
 
+To use [Cortex](https://cortex.so) for local inference, pull a model (such as `llama3.2` or `phi-3.5`, among [many others](https://cortex.so/models/)), ensure that its API server is running, and then configure these environment variables:
+```bash
+export LLM_API_BASE_URL=http://localhost:39281/v1
+export LLM_CHAT_MODEL='llama3.2:3b-gguf-q4-km'
+```
+
 For [LocalAI](https://localai.io), initiate its container and adjust the environment variable `LLM_API_BASE_URL`:
 ```bash
 docker run -ti -p 8080:8080 localai/localai llama-3.2-3b-instruct:q4_k_m