
FSDP2 + CPU Offload + AdamW8bit issue #1931

Open · psinger opened this issue Mar 21, 2025 · 2 comments

psinger commented Mar 21, 2025

I am having a strange issue with the low-bit optimizer when combining FSDP2 and CPU offloading:

torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_method lerp(*(DTensor(local_tensor=FakeTensor(..., device='cuda:0', size=(2, 1536)), device_mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)), DTensor(local_tensor=FakeTensor(..., size=(2, 1536)), device_mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)), 0.09999999999999998), **{}): got RuntimeError('Unhandled FakeTensor Device Propagation for aten.lerp.Scalar, found two different devices cuda:0, cpu')
https://github.com/pytorch/ao/blob/main/torchao/optim/adam.py#L129

It works fine without CPU offloading, but fails when it is enabled.
All parameters are on the cpu device.

It works fine with regular AdamW.

Torch and torchao are on nightly.

Any ideas? Thanks
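For context, here is a minimal sketch of a setup that should match the description above. It assumes two GPUs launched via torchrun; the toy model, dimensions, and batch size are made up for illustration (1536 only mirrors the shape in the traceback):

```python
# Hypothetical repro sketch: FSDP2 with CPU offload + torchao's 8-bit AdamW.
# Launch with: torchrun --nproc_per_node=2 repro.py
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import CPUOffloadPolicy, fully_shard
from torchao.optim import AdamW8bit

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())

# Toy model; shard each layer, then the root, with CPU offload enabled.
model = nn.Sequential(nn.Linear(1536, 1536), nn.Linear(1536, 1536))
for layer in model:
    fully_shard(layer, offload_policy=CPUOffloadPolicy())
fully_shard(model, offload_policy=CPUOffloadPolicy())

optim = AdamW8bit(model.parameters(), lr=1e-4)

x = torch.randn(8, 1536, device="cuda")
model(x).sum().backward()
optim.step()  # the mixed cpu/cuda lerp error surfaces here
```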

supriyar (Contributor) commented
cc @gau-nernst @weifengpy

gau-nernst (Collaborator) commented
Sorry for the late reply. I can look into this later this week...
