
Commit 2d7f9ae

docs: fix links in docs (ai-dynamo#256)

Co-authored-by: Anant Sharma <[email protected]>

Parent: 27abe13

10 files changed: +14 -14 lines

README.md (+1 -1)

@@ -41,7 +41,7 @@ The following examples require a few system level packages.
 apt-get update
 DEBIAN_FRONTEND=noninteractive apt-get install -yq python3-dev libucx0
-pip install ai-dynamo nixl vllm==0.7.2+dynamo
+pip install ai-dynamo[all]
 ```

 > [!NOTE]
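To sanity-check the consolidated install command above, the standard pip introspection commands will show the meta-package and everything the `[all]` extra pulled in (purely illustrative; exact package contents vary by release):

```bash
# Confirm the ai-dynamo meta-package is installed.
pip show ai-dynamo

# List the dynamo-related packages the [all] extra brought along.
pip list | grep -i dynamo
```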

components/metrics/README.md (+2 -2)

@@ -65,7 +65,7 @@ metrics --component my_component --endpoint my_endpoint
 ### Real Worker

 To run a more realistic deployment to gather metrics from,
-see the examples in [deploy/examples/llm](deploy/examples/llm).
+see the examples in [examples/llm](../../examples/llm).

 For example, for a VLLM + KV Routing based deployment that
 exposes statistics on an endpoint labeled
@@ -88,7 +88,7 @@ endpoint name used for python-based workers that register a `KvMetricsPublisher`

 To visualize the metrics being exposed on the Prometheus endpoint,
 see the Prometheus and Grafana configurations in
-[deploy/metrics](deploy/metrics):
+[deploy/metrics](../../deploy/metrics):
 ```bash
 docker compose -f deploy/docker-compose.yml --profile metrics up -d
 ```
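Once the metrics profile is up, a quick reachability check against Prometheus confirms the stack is healthy. The port below is an assumption (Prometheus's default), so substitute whatever `deploy/docker-compose.yml` actually maps:

```bash
# Prometheus ships a built-in health route; 9090 is its default port
# and an assumption about this compose file's mapping.
curl -s http://localhost:9090/-/healthy
```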

deploy/dynamo/sdk/docs/sdk/README.md (+1 -1)

@@ -11,7 +11,7 @@

 # Introduction

-Dynamo is a flexible and performant distributed inferencing solution for large-scale deployments. It is an ecosystem of tools, frameworks, and abstractions that makes the design, customization, and deployment of frontier-level models onto datacenter-scale infrastructure easy to reason about and optimized for your specific inferencing workloads. Dynamo's core is written in Rust and contains a set of well-defined Python bindings. Docs and examples for those can be found [here](../../../../README.md).
+Dynamo is a flexible and performant distributed inferencing solution for large-scale deployments. It is an ecosystem of tools, frameworks, and abstractions that makes the design, customization, and deployment of frontier-level models onto datacenter-scale infrastructure easy to reason about and optimized for your specific inferencing workloads. Dynamo's core is written in Rust and contains a set of well-defined Python bindings. Docs and examples for those can be found [here](../../../../../README.md).

 Dynamo SDK is a layer on top of the core. It is a Python framework that makes it easy to create inference graphs and deploy them locally and onto a target K8s cluster. The SDK was heavily inspired by [BentoML's](https://github.com/bentoml/BentoML) open source deployment patterns and leverages many of its core primitives. The Dynamo CLI is a companion tool that allows you to spin up an inference pipeline locally, containerize it, and deploy it. You can find a toy hello-world example [here](../../README.md).

docs/guides/README.md (+1 -1)

@@ -64,7 +64,7 @@ Distributed deployment where prefill and decode are done by separate workers that

 ### Prerequisites

-Start required services (etcd and NATS) using [Docker Compose](/deploy/docker-compose.yml)
+Start required services (etcd and NATS) using [Docker Compose](../../deploy/docker-compose.yml)
 ```bash
 docker compose -f deploy/docker-compose.yml up -d
 ```
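Before moving past the prerequisites, `docker compose ps` (a standard Compose subcommand) will confirm both services came up; the exact service names come from the compose file itself:

```bash
# List the compose services and their state; the etcd and NATS
# containers should both report running/healthy.
docker compose -f deploy/docker-compose.yml ps
```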

docs/kv_cache_manager.md (+1 -1)

@@ -12,7 +12,7 @@ The Dynamo KV Cache Manager feature addresses this challenge by enabling the off
 The Dynamo KV Cache Manager uses advanced caching policies that prioritize placing frequently accessed data in GPU memory, while less accessed data is moved to shared CPU memory, SSDs, or networked object storage. It incorporates eviction policies that strike a balance between over-caching (which can introduce lookup latencies) and under-caching (which leads to missed lookups and KV cache re-computation).
 Additionally, this feature can manage KV cache across multiple GPU nodes, supporting both distributed and disaggregated inference serving, and offers hierarchical caching capabilities, creating offloading strategies at the GPU, node, and cluster levels.

-The Dynamo KV Cache Manager is designed to be framework-agnostic to support various backends, including TensorRT-LLM, vLLM, and SGLang, and to facilitate the scaling of KV cache storage across large, distributed clusters using NVLink, NVIDIA Quantum switches, and NVIDIA Spectrum switches. It integrates with [NIXL](https://github.com/ai-dynamo/nixl/blob/omrik/documentation/docs/nixl.md) to enable data transfers across different worker instances and storage backends.
+The Dynamo KV Cache Manager is designed to be framework-agnostic to support various backends, including TensorRT-LLM, vLLM, and SGLang, and to facilitate the scaling of KV cache storage across large, distributed clusters using NVLink, NVIDIA Quantum switches, and NVIDIA Spectrum switches. It integrates with [NIXL](https://github.com/ai-dynamo/nixl/blob/main/docs/nixl.md) to enable data transfers across different worker instances and storage backends.

 ## Design
examples/llm/README.md (+1 -1)

@@ -64,7 +64,7 @@ sequenceDiagram

 ### Prerequisites

-Start required services (etcd and NATS) using [Docker Compose](/deploy/docker-compose.yml)
+Start required services (etcd and NATS) using [Docker Compose](../../deploy/docker-compose.yml)
 ```bash
 docker compose -f deploy/docker-compose.yml up -d
 ```

launch/README.md (+1 -1)

@@ -77,7 +77,7 @@ E.g. https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/blob/main/Llama

 Download model file:
 ```
-curl -L -o Llama-3.2-3B-Instruct-Q4_K_M.gguf "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/blob/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf?download=true"
+curl -L -o Llama-3.2-3B-Instruct-Q4_K_M.gguf "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf?download=true"
 ```

 ## Run a model from local file
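The `blob/` form of the URL serves Hugging Face's HTML viewer page, while `resolve/` serves the raw file, which is what this hunk corrects. Checking the first four bytes of the download is a quick way to confirm you got the model and not a web page, since every GGUF file opens with the ASCII magic `GGUF`:

```bash
# A real GGUF download prints "GGUF"; an accidentally saved
# HTML page would start with "<!DOCTYPE" or "<html" instead.
head -c 4 Llama-3.2-3B-Instruct-Q4_K_M.gguf; echo
```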

lib/bindings/python/README.md (+1 -1)

@@ -50,7 +50,7 @@ maturin develop --uv

 ## Pre-requisite

-See [README.md](/lib/runtime/README.md).
+See [README.md](../../runtime/README.md#️-prerequisites).

 ## Hello World Example

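The hunk header shows this change sits just below the bindings build step. For context, a typical build-and-smoke-test loop looks roughly like the following; the `import dynamo` module name is an assumption, not taken from this diff:

```bash
cd lib/bindings/python
# Build the Rust extension and install it into the active virtualenv.
maturin develop --uv
# Smoke-test the bindings; the module name here is an assumption.
python -c "import dynamo"
```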
lib/runtime/README.md (+2 -2)

@@ -44,7 +44,7 @@ cargo test

 The simplest way to deploy the pre-requisite services is using
 [docker-compose](https://docs.docker.com/compose/install/linux/),
-defined in the project's root [docker-compose.yml](docker-compose.yml).
+defined in [deploy/docker-compose.yml](../../deploy/docker-compose.yml).

 ```
 docker-compose up -d
@@ -109,7 +109,7 @@ Annotated { data: Some("d"), id: None, event: None, comment: None }

 #### Python

-See the [README.md](/lib/bindings/python/README.md) for details
+See the [README.md](../bindings/python/README.md) for details

 The Python and Rust `hello_world` client and server examples are interchangeable,
 so you can start the Python `server.py` and talk to it from the Rust `client`.
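A sketch of that mix-and-match, with the caveat that the script path and the Cargo binary name `client` are assumptions about the example layout rather than details from this diff:

```bash
# Start the Python hello_world server in the background
# (path is an assumption about the examples layout).
python server.py &

# Drive the same endpoint from the Rust client; the binary
# name "client" is likewise an assumption.
cargo run --bin client
```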

support_matrix.md (+3 -3)

@@ -39,12 +39,12 @@ If you are using a **GPU**, the following GPU models and architectures are suppo
 | **Dependency** | **Version** |
 |------------------|-------------|
 |**Base Container**| 25.01 |
-| **vLLM** |0.7.2+dynamo*|
+|**ai-dynamo-vllm**| 0.7.2* |
 |**TensorRT-LLM** | 0.19.0** |
 |**NIXL** | 0.1.0 |

 > **Note**:
-> - *v0.7.2+dynamo is a customized patch of v0.7.2 from vLLM.
+> - *ai-dynamo-vllm v0.7.2 is a customized patch of v0.7.2 from vLLM.
 > - **The specific version of TensorRT-LLM (planned v0.19.0) that will be supported by Dynamo is subject to change.

@@ -54,4 +54,4 @@ If you are using a **GPU**, the following GPU models and architectures are suppo
 - **Wheels**: Pre-built Python wheels are only available for **x86_64 Linux**. No wheels are available for other platforms at this time.
 - **Container Images**: We distribute only the source code for container images, and only **x86_64 Linux** is supported for these. Users must build the container image from source if they require it.

-Once you've confirmed that your platform and architecture are compatible, you can install **Dynamo** by following the instructions in the [Quick Start Guide](https://github.com/ai-dynamo/dynamo/?tab=readme-ov-file#quick-start).
+Once you've confirmed that your platform and architecture are compatible, you can install **Dynamo** by following the instructions in the [Quick Start Guide](https://github.com/ai-dynamo/dynamo/blob/main/README.md#installation).
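Since both the wheels and the container images are x86_64 Linux only, a two-line pre-flight check before following the Quick Start Guide saves a failed install:

```bash
# Both values must match the support matrix:
uname -s   # expect: Linux
uname -m   # expect: x86_64
```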
