Commit e49c726

Add roadmap. (skypilot-org#1317)
* Add roadmap.
* Update ROADMAP.md
1 parent 99fe087 commit e49c726

1 file changed: +59 -0 lines changed

Diff for: ROADMAP.md

@@ -0,0 +1,59 @@
# SkyPilot Roadmap

This doc lists general directions of interest to facilitate community contributions.

Note that
- This list is not meant to be comprehensive (i.e., new work items of interest may pop up)
- Even though listed under a specific version, not all items need to be completed before we ship that version (i.e., some items can go into future versions)

## v0.3

### Managed Spot
- Minimize the cost of the controller
- Support running the spot controller on an existing/local cluster
- Reduce the fixed cost of the controller (e.g., allow setting the controller VM type)
- Support a higher number of pending/concurrent jobs
- Framework-specific guides to add checkpointing/reloading using SkyPilot Storage

### Smarter Optimizer
- Fine-grained optimizer: pick zones in cheapest-first order
- Better account for data egress time/cost
- Consider buckets/Storage objects in file_mounts
- Optimize data placement for SkyPilot Storage local uploads
- Use the optimizer to decide the bucket location

### Programmatic API
- Refactor/extend the current API to *make it easy to programmatically use SkyPilot*
- Expose core classes in docs

### Support more clouds
- Refactor interfaces to ease adding new clouds
- IBM Cloud
- Explore support for low-cost clouds (e.g., Lambda Labs/RunPod/Jarvis Labs)

### On-prem
- Make the on-prem feature more robust
- Design for switching between cloud and on-prem
- Explore/design a "local mode" to run SkyPilot tasks locally

### Faster launch speed
- Consider a more minimal image
- Investigate Azure launch speed

### k8s support
- Ray-on-k8s backend
- Open question: launch a new k8s cluster, or launch SkyPilot tasks on an existing k8s cluster?

### Cost: Optimization, Tracking, and Reporting
- Track and show costs related to a job/cluster
- For managed spot jobs, track and show % savings vs. on-demand
- Optimizer: take disk costs into account

### Serverless
- Design and prototype a "serverless jobs" submission API and CLI
- Initial use case: hundreds of hyperparameter tuning trials

### Backend
- Support heterogeneous node types in a cluster (e.g., in RL, CPU actor(s) and GPU learner(s) in the same cluster)
- Support CPUs as resource requirements
- General robustness/UX improvements

0 commit comments
