Commit 7e469c9

Adding Design Principles
1 parent 16ded66 commit 7e469c9

2 files changed: +53 -0 lines changed


Diff for: mkdocs.yml

+1
@@ -52,6 +52,7 @@ nav:
   - Introduction: index.md
   - Concepts:
     API Overview: concepts/api-overview.md
+    Design Principles: concepts/design-principles.md
     Conformance: concepts/conformance.md
     Roles and Personas: concepts/roles-and-personas.md
   - Implementations: implementations.md

Diff for: site-src/concepts/design-principles.md

+52
@@ -0,0 +1,52 @@
# Design Principles

These principles guide our efforts to build flexible [Gateway API] extensions
that empower the development of high-performance [AI Inference] routing
technologies, balancing rapid delivery with long-term growth.

!!! note "Inference Gateways"

    For simplicity, we'll refer to Gateway API Gateways that are
    composed together with AI Inference extensions as "Inference Gateways"
    throughout this document.

[Gateway API]: https://github.com/kubernetes-sigs/gateway-api
[AI Inference]: https://www.arm.com/glossary/ai-inference

## Prioritize stability of the core interfaces

The interfaces between components are the most critical part of this project. To encourage both controller and extension developers to integrate with this project, we need to prioritize the stability of these interfaces.
Although we can extend these interfaces in the future, it is critical that the core be stable as soon as possible.

When describing "core interfaces", we are referring to both of the following:

### 1. Gateway -> Endpoint Picker
At a high level, this defines how a Gateway provides information to an Endpoint Picker, and how the Endpoint Picker selects endpoint(s) that the Gateway should route to.
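
The shape of this contract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RequestInfo`, `Endpoint`, `EndpointPicker`, `FirstCandidatePicker`, and `route` are hypothetical names invented here, not types defined by this project, and the real interface may carry different information.

```python
from dataclasses import dataclass, field
from typing import Protocol, Sequence


@dataclass
class RequestInfo:
    """Hypothetical summary of what a Gateway hands to the picker."""
    model: str
    headers: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical routable backend, such as one model server replica."""
    address: str
    port: int


class EndpointPicker(Protocol):
    """The contract: request information in, chosen endpoint(s) out."""

    def pick(self, request: RequestInfo,
             candidates: Sequence[Endpoint]) -> list[Endpoint]: ...


class FirstCandidatePicker:
    """Trivial picker used only to exercise the contract."""

    def pick(self, request: RequestInfo,
             candidates: Sequence[Endpoint]) -> list[Endpoint]:
        return list(candidates[:1])


def route(picker: EndpointPicker, request: RequestInfo,
          candidates: Sequence[Endpoint]) -> list[Endpoint]:
    """Gateway side: delegate the choice, then route to the result."""
    return picker.pick(request, candidates)
```

Because the Gateway in this sketch depends only on the `pick` signature, a picker can be replaced without touching the Gateway, which is the practical benefit of keeping this interface stable.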

### 2. Endpoint Picker -> Model Server Framework
This defines what an Endpoint Picker should expect from a compatible Model Server Framework, with a focus on health checks and metrics.
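
As a rough illustration of why health checks and metrics matter to an Endpoint Picker, the sketch below filters out unhealthy servers and then prefers the one with the shortest queue. The `ServerStatus` fields and the `least_loaded` helper are assumptions made for this example; the actual signals a Model Server Framework must expose are defined by the project, not by this snippet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServerStatus:
    """Hypothetical per-server snapshot; field names are illustrative."""
    address: str
    healthy: bool      # result of the most recent health check
    queue_depth: int   # requests currently waiting on this server


def least_loaded(statuses: list[ServerStatus]) -> Optional[ServerStatus]:
    """Return the healthy server with the shortest queue, or None."""
    healthy = [s for s in statuses if s.healthy]
    return min(healthy, key=lambda s: s.queue_depth, default=None)
```

A server that fails its health check is never chosen here, even if its queue is empty, so the two signals play different roles: health gates eligibility, and metrics rank what remains.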


## Our presets are finely tuned

Our defaults, shaped by extensive experience with leading model serving platforms and APIs, are designed to give most Inference Gateway users a great experience without extensive configuration or customization.


## Encourage innovation via extensibility

This project is largely based on the idea that extensibility will enable innovation. With that in mind, we should make it as easy as possible for AI researchers to experiment with custom scheduling and routing logic. They should not need to know how to build a Kubernetes controller or how to replicate a full networking stack. Instead, all the information needed to make a routing decision should be provided in an accessible format, with clear guidelines and examples of how to customize routing logic.
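
One way to picture this is a framework that collects endpoint state and accepts a plain scoring function as the experiment. The sketch below is hypothetical: `EndpointState`, `Scorer`, `choose`, and the example scorer are names invented for illustration and do not describe this project's actual extension mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EndpointState:
    """Hypothetical view of one endpoint, pre-collected for the author."""
    address: str
    queue_depth: int
    cache_hit: bool  # whether this endpoint already holds the prompt prefix


# A routing experiment is just a function from endpoint state to a score.
Scorer = Callable[[EndpointState], float]


def choose(endpoints: Sequence[EndpointState], scorer: Scorer) -> EndpointState:
    """Framework side: apply the supplied scorer and take the best endpoint."""
    if not endpoints:
        raise ValueError("no endpoints to choose from")
    return max(endpoints, key=scorer)


def prefer_cache_then_short_queue(e: EndpointState) -> float:
    """Example experiment: reward cache hits, penalize long queues."""
    return (10.0 if e.cache_hit else 0.0) - e.queue_depth
```

In this arrangement the experiment author writes only `prefer_cache_then_short_queue`; nothing in it touches controllers or networking, and a different idea can be tried by passing a different function to `choose`.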


## Objectives over instructions

The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too prescriptive about _how_ an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation-specific APIs and grow into standards as the concepts become more stable and widespread.
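
The distinction can be made concrete with a small sketch in which the API states only a target and each implementation is free to choose its own strategy for meeting it. Everything here is assumed for illustration: `Objective.max_latency_ms`, `Candidate`, and the two strategies are hypothetical and are not fields or algorithms defined by this API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Objective:
    """What should be achieved; the field name is hypothetical."""
    max_latency_ms: float


@dataclass(frozen=True)
class Candidate:
    address: str
    predicted_latency_ms: float


# How the objective is met is left to each implementation.
Strategy = Callable[[list[Candidate]], Optional[Candidate]]


def fastest(cands: list[Candidate]) -> Optional[Candidate]:
    return min(cands, key=lambda c: c.predicted_latency_ms, default=None)


def first_listed(cands: list[Candidate]) -> Optional[Candidate]:
    return cands[0] if cands else None


def pick(objective: Objective, cands: list[Candidate],
         strategy: Strategy) -> Optional[Candidate]:
    """Keep candidates that satisfy the objective, then let the
    implementation-chosen strategy decide among them."""
    eligible = [c for c in cands
                if c.predicted_latency_ms <= objective.max_latency_ms]
    return strategy(eligible)
```

Both strategies satisfy the same `Objective`, so an implementation could swap one for the other, or for a technique that does not exist yet, without any change to what the user declared.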


## Composable components and reducing reinvention
While it may be tempting to develop an entirely new AI-focused Gateway, many essential routing capabilities are already well established in Kubernetes. Our focus is on creating a layer of composable components that can be assembled with other Kubernetes components. This approach empowers engineers to use our solution as a building block, combining established technologies like Gateway API with our extensible model to build higher-level solutions.


## Additions to the API should be carefully prioritized

Every addition to the API should take the principles described above into account. Given that the goal of the API is to encourage a highly extensible ecosystem, each additional feature in the API raises the barrier to entry for any new controller or extension. Our top priority should be to focus on concepts that we expect to be broadly implementable and useful. The extensible nature of this API will allow each individual implementation to experiment with new features via custom flags or APIs before they become part of the core API surface.
