Gradient accumulation with distributed training #20578

IvanUkhov · 2024-12-02T16:12:29Z


Does anybody know if the implemented gradient accumulation can run under tf.distribute.MirroredStrategy?

IvanUkhov · 2024-12-03T10:21:34Z

Opened an issue: #20582.


abhaskumarsinha · 2024-12-03T14:33:13Z

If you are using the TensorFlow backend, you can do gradient accumulation the old TF way with it.


IvanUkhov · 2024-12-03T14:35:00Z

What is this old way?


roebel · 2025-03-20T10:08:41Z

I wanted to try this as well, but the problem currently appears to be that MirroredStrategy is broken in the latest versions of Keras; see #21061.

Besides that, I don't think there is an official old way. That's why a few wrappers appeared in the wild, for example this one: https://gradientaccumulator.readthedocs.io/en/latest/background/gradient_accumulation.html
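
For anyone new to the technique, here is a minimal framework-free sketch of what gradient accumulation computes. It is not the Keras implementation and not the linked wrapper's code; the function names (`grad_mse`, `accumulated_grad`) and the toy data are made up for illustration, and the sketch assumes equally sized micro-batches and a mean-reduced loss. It says nothing about how this interacts with MirroredStrategy.

```python
# Pure-Python sketch of gradient accumulation for a 1-D model y ~ w * x.
# All names here are hypothetical and exist only for this illustration.


def grad_mse(w, xs, ys):
    """Gradient of the mean squared error (1/n) * sum((w*x - y)^2) w.r.t. w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average the per-micro-batch gradients over the whole batch."""
    assert len(xs) % micro_batch_size == 0, "sketch assumes equal-size micro-batches"
    steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        # Accumulate instead of applying an optimizer update right away.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi])
    # One averaged gradient is handed to the optimizer after `steps` micro-batches.
    return total / steps


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_mse(w, xs, ys)                               # one large batch
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)  # two micro-batches
print(full, accum)
```

Under those assumptions the two values agree, which is why accumulating over several small batches can stand in for one large batch when memory is tight.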
