|
| 1 | +# SkyPilot Roadmap |
| 2 | + |
| 3 | +This doc lists general directions of interest to facilitate community contributions. |
| 4 | + |
| 5 | +Note that: |
| 6 | +- This list is not meant to be comprehensive (i.e., new work items of interest may pop up) |
| 7 | +- Even though listed under a specific version, not all items need to be completed before we ship that version (i.e., some items can go into future versions) |
| 8 | + |
| 9 | +## v0.3 |
| 10 | + |
| 11 | +### Managed Spot |
| 12 | +- Minimize the cost of the controller |
| 13 | + - Support running the spot controller on an existing/local cluster |
| 14 | + - Reduce the fixed cost of the controller (e.g., allow setting the controller VM type) |
| 15 | +- Support a higher number of pending/concurrent jobs |
| 16 | +- Write framework-specific guides for adding checkpointing/reloading with SkyPilot Storage |
| 17 | + |
| 18 | +### Smarter Optimizer |
| 19 | +- Fine-grained optimizer: try zones in order of increasing price |
| 20 | +- Better account for data egress time/cost |
| 21 | + - Consider buckets/Storage objects in file_mounts |
| 22 | +- Optimize data placement for SkyPilot Storage local uploads |
| 23 | + - Use the optimizer to decide the bucket location |
| 24 | + |
| 25 | +### Programmatic API |
| 26 | +- Refactor/extend the current API to *make it easy to programmatically use SkyPilot* |
| 27 | +- Expose core classes in docs |
| 28 | + |
| 29 | +### Support more clouds |
| 30 | +- Refactor interfaces to make adding new clouds easier |
| 31 | +- IBM Cloud |
| 32 | +- Explore support for low-cost clouds (e.g., Lambda Labs, RunPod, Jarvis Labs) |
| 33 | + |
| 34 | +### On-prem |
| 35 | +- Make the on-prem feature more robust |
| 36 | +- Design for switching between cloud and on-prem |
| 37 | +- Explore/design a "local mode" to run SkyPilot tasks locally |
| 38 | + |
| 39 | +### Faster launching speed |
| 40 | +- Consider a more minimal image |
| 41 | +- Azure speed investigation |
| 42 | + |
| 43 | +### k8s support |
| 44 | +- Ray-on-k8s backend |
| 45 | + - Open question: launch a new k8s cluster, or launch SkyPilot tasks on an existing k8s cluster? |
| 46 | + |
| 47 | +### Cost: Optimization, Tracking, and Reporting |
| 48 | +- Track and show costs related to a job/cluster |
| 49 | +- For managed spot jobs, track and show percentage savings vs. on-demand |
| 50 | +- Optimizer: take into account disk costs |
| 51 | + |
| 52 | +### Serverless |
| 53 | +- Design and prototype a "serverless jobs" submission API and CLI |
| 54 | + - Initial use case: hundreds of hyperparameter tuning trials |
| 55 | + |
| 56 | +### Backend |
| 57 | +- Support heterogeneous node types in a cluster (e.g., in RL, CPU actor(s) and GPU learner(s) in the same cluster) |
| 58 | +- Support CPUs as resource requirements |
| 59 | +- General robustness/UX improvements |