Issues: kubeflow/trainer
Issues list
Explore uv project manager for Kubeflow Python SDK
area/sdk
kind/discussion
kind/feature
#2462
opened Feb 28, 2025 by
andreyvelich
KEP-2170: Revisit TrainJob Created condition status type
kind/feature
#2459
opened Feb 28, 2025 by
tenzen-y
KEP-2170: Add Kubeflow Trainer Pipeline Framework Concept page to Documentation
kind/documentation
#2458
opened Feb 28, 2025 by
tenzen-y
Distributed training with multiple pods, with multi-GPU in each pod
#2456
opened Feb 28, 2025 by
githubthunder
Managing Pod Lifecycle in Distributed Training with TFJob
kind/feature
lifecycle/needs-triage
#2454
opened Feb 27, 2025 by
mnmhouse
Strategies for Deleting Successful Pods without Affecting Task Execution in TFJob
kind/bug
lifecycle/needs-triage
#2453
opened Feb 27, 2025 by
mnmhouse
Add unit tests that cover the pkg/apply package
area/testing
good first issue
help wanted
#2452
opened Feb 26, 2025 by
astefanutti
Support TensorFlow Runtime
area/gsoc
area/runtime
kind/feature
#2443
opened Feb 17, 2025 by
Electronic-Waste
Support JAX Runtimes
area/gsoc
area/runtime
kind/feature
#2442
opened Feb 17, 2025 by
Electronic-Waste
Export Models to Kubeflow Model Registry
area/gsoc
area/storage
kind/feature
#2438
opened Feb 14, 2025 by
Electronic-Waste
Support Volcano Scheduler in Kubeflow Trainer
area/gsoc
kind/feature
#2437
opened Feb 14, 2025 by
Electronic-Waste
Support Kubernetes v1.32
good first issue
help wanted
kind/feature
#2434
opened Feb 13, 2025 by
astefanutti
Enable GPU Testing for LLM Blueprints
area/gsoc
area/testing
kind/feature
#2432
opened Feb 11, 2025 by
andreyvelich
Reconsider pre-training and post-training phases for the Training Runtimes
kind/discussion
kind/feature
#2430
opened Feb 11, 2025 by
andreyvelich
Add the Config API for Kubeflow Trainer controller manager
area/api
good first issue
help wanted
kind/feature
#2420
opened Feb 5, 2025 by
andreyvelich
Add migration guide from Training Operator to Kubeflow Trainer V2
area/docs
good first issue
help wanted
#2412
opened Feb 4, 2025 by
andreyvelich
Some Prometheus metrics not being reported properly
kind/bug
lifecycle/needs-triage
#2408
opened Jan 31, 2025 by
ishaan-mehta
Cap nproc_per_node based on the CPU resources of the node for PyTorch TrainJob
#2407
opened Jan 31, 2025 by
astefanutti
Training Operator V2 Installation - Certificate error
kind/bug
lifecycle/needs-triage
#2404
opened Jan 25, 2025 by
Sharathmk99
KEP-2401: Kubeflow LLM Trainer V2
area/runtime
kind/feature
#2401
opened Jan 23, 2025 by
Electronic-Waste
[SDK] add option to specify pip flags
area/sdk
kind/feature
#2398
opened Jan 22, 2025 by
KPostOffice