These are my GPUs. I have tried different optimisers and tried switching mixed_precision to fp16, and it is definitely running on all 4 GPUs. It runs out of memory as soon as I get here:
```
2025-01-29 00:13:30,236 [INFO] Moving the diffusion transformer to GPU in torch.bfloat16 precision.
```
Here are my configs:
{ "mixed_precision": "bf16", #also tried fp16 "model_type": "lora", "pretrained_model_name_or_path": "black-forest-labs/FLUX.1-dev", "gradient_checkpointing": true, "cache_dir": "cache", "set_grads_to_none": true, "gradient_accumulation_steps": 1, "resume_from_checkpoint": "latest", "snr_gamma": 5, "num_train_epochs": 0, "max_train_steps": 10000, "metadata_update_interval": 65, "optimizer": "bnb-adamw8bit", #also tried adamw_bf16 "learning_rate": 0.0001, "lr_scheduler": "polynomial", "seed": 42, "lr_warmup_steps": 100, "output_dir": "output/models", "non_ema_revision": false, "aspect_bucket_rounding": 2, "inference_scheduler_timestep_spacing": "trailing", "training_scheduler_timestep_spacing": "trailing", "report_to": "wandb", "lr_end": 1e-08, "compress_disk_cache": true, "push_to_hub": true, "hub_model_id": "simpletuner-lora", "push_checkpoints_to_hub": true, "model_family": "flux", "disable_benchmark": false, "train_batch": 1, "max_workers": 32, "read_batch_size": 25, "write_batch_size": 64, "caption_dropout_probability": 0.1, "torch_num_threads": 8, "image_processing_batch_size": 1, "vae_batch_size": 1, "validation_prompt": "A photo-realistic image of a blonde woman", "num_validation_images": 1, "validation_num_inference_steps": 20, "validation_seed": 42, "minimum_image_size": 0, "resolution": 256, "validation_resolution": "1024x1024", "resolution_type": "pixel_area", "lycoris_config": "config/lycoris_config.json", "lora_type": "lycoris", "base_model_precision": "fp8-quanto", "checkpointing_steps": 500, "checkpoints_total_limit": 5, "validation_steps": 500, "tracker_run_name": "simpletuner-lora", "tracker_project_name": "flux-training", "validation_guidance": 3.0, "validation_guidance_real": 1.0, "validation_guidance_rescale": 0.0, "validation_negative_prompt": "blurry, cropped, ugly" }
{"algo": "lora", "multiplier": 1.0, "linear_dim": 4096, "linear_alpha": 1, "factor": 12, "apply_preset": {"target_module": ["Attention", "FeedForward"], "module_algo_map": {"Attention": {"factor": 8}, "FeedForward": {"factor": 4}}}}
And the environment I launch with:

```bash
export TRAINING_NUM_PROCESSES=4
export TRAINING_NUM_MACHINES=1
export ACCELERATE_MACHINE_RANK=0
export ACCELERATE_NUM_MACHINES=1
export ACCELERATE_EXTRA_ARGS=--multi_gpu
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
export SIMPLETUNER_LOG_LEVEL=INFO
export SIMPLETUNER_TRAINING_LOOP_LOG_LEVEL=INFO
```
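In case it helps with diagnosing this, below is a minimal VRAM watcher I can run in a second terminal during training, so I can see which of the four GPUs fills up at the moment the transformer is moved to the device. It is just a sketch that assumes nothing beyond the standard `torch.cuda.mem_get_info` API; the file name and the 2-second poll interval are arbitrary:

```python
# watch_vram.py - poll device-wide free/total VRAM for every visible GPU.
# torch.cuda.mem_get_info reports driver-level numbers, so it also sees
# memory allocated by the training processes running in other terminals.
import time

import torch

GIB = 1024 ** 3

while True:
    lines = []
    for i in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(i)
        used = (total - free) / GIB
        lines.append(f"cuda:{i} {used:5.1f}/{total / GIB:.1f} GiB used")
    print(" | ".join(lines), flush=True)
    time.sleep(2)
```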
What could be the reason, and what else could I try to avoid running into OOM?