These are my GPUs. I have tried different optimisers and tried switching mixed_precision to fp16, and it is definitely running on all 4 GPUs. It runs out of memory as soon as I get here:
```
2025-01-29 00:13:30,236 [INFO] Moving the diffusion transformer to GPU in torch.bfloat16 precision.
```
Here are my configs:
{ "mixed_precision": "bf16", #also tried fp16 "model_type": "lora", "pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev", "gradient_checkpointing": true, "cache_dir": "cache", "set_grads_to_none": true, "gradient_accumulation_steps": 1, "resume_from_checkpoint": "latest", "snr_gamma": 5, "num_train_epochs": 0, "max_train_steps": 10000, "metadata_update_interval": 65, "optimizer": "bnb-adamw8bit", #also tried adamw_bf16 "learning_rate": 0.0001, "lr_scheduler": "polynomial", "seed": 42, "lr_warmup_steps": 100, "output_dir": "output/models", "non_ema_revision": false, "aspect_bucket_rounding": 2, "inference_scheduler_timestep_spacing": "trailing", "training_scheduler_timestep_spacing": "trailing", "report_to": "wandb", "lr_end": 1e-08, "compress_disk_cache": true, "push_to_hub": true, "hub_model_id": "simpletuner-lora", "push_checkpoints_to_hub": true, "model_family": "flux", "disable_benchmark": false, "train_batch": 1, "max_workers": 32, "read_batch_size": 25, "write_batch_size": 64, "caption_dropout_probability": 0.1, "torch_num_threads": 8, "image_processing_batch_size": 1, "vae_batch_size": 1, "validation_prompt": "A photo-realistic image of a blonde woman", "num_validation_images": 1, "validation_num_inference_steps": 20, "validation_seed": 42, "minimum_image_size": 0, "resolution": 256, "validation_resolution": "1024x1024", "resolution_type": "pixel_area", "lycoris_config": "config/lycoris_config.json", "lora_type": "lycoris", "base_model_precision": "fp8-quanto", "checkpointing_steps": 500, "checkpoints_total_limit": 5, "validation_steps": 500, "tracker_run_name": "simpletuner-lora", "tracker_project_name": "flux-training", "validation_guidance": 3.0, "validation_guidance_real": 1.0, "validation_guidance_rescale": 0.0, "validation_negative_prompt": "blurry, cropped, ugly" }
{"algo": "lora", "multiplier": 1.0, "linear_dim": 4096, "linear_alpha": 1, "factor": 12, "apply_preset": {"target_module": ["Attention", "FeedForward"], "module_algo_map": {"Attention": {"factor": 8}, "FeedForward": {"factor": 4}}}}
And the environment I launch with:

```bash
export TRAINING_NUM_PROCESSES=4
export TRAINING_NUM_MACHINES=1
export ACCELERATE_MACHINE_RANK=0
export ACCELERATE_NUM_MACHINES=1
export ACCELERATE_EXTRA_ARGS=--multi_gpu
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
export SIMPLETUNER_LOG_LEVEL=INFO
export SIMPLETUNER_TRAINING_LOOP_LOG_LEVEL=INFO
```
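In case it helps with diagnosing this, below is a minimal VRAM watcher I can run in a second terminal during training, so I can see which of the four GPUs fills up at the moment the transformer is moved to the device. It is just a sketch that assumes nothing beyond the standard `torch.cuda.mem_get_info` API; the file name and the 2-second poll interval are arbitrary:

```python
# watch_vram.py - poll device-wide free/total VRAM for every visible GPU.
# torch.cuda.mem_get_info reports driver-level numbers, so it also sees
# memory allocated by the training processes running in other terminals.
import time

import torch

GIB = 1024 ** 3

while True:
    lines = []
    for i in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(i)
        used = (total - free) / GIB
        lines.append(f"cuda:{i} {used:5.1f}/{total / GIB:.1f} GiB used")
    print(" | ".join(lines), flush=True)
    time.sleep(2)
```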
What could be the reason, and what else could I try to avoid running into OOM?