
FSDP2 + CPU Offload + AdamW8bit issue #1931

Open · psinger opened this issue Mar 21, 2025 · 2 comments

psinger commented Mar 21, 2025

I am having a strange issue with the low-bit optimizer when combining FSDP2 and CPU offloading:

torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_method lerp(*(DTensor(local_tensor=FakeTensor(..., device='cuda:0', size=(2, 1536)), device_mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)), DTensor(local_tensor=FakeTensor(..., size=(2, 1536)), device_mesh=DeviceMesh('cuda', [0, 1]), placements=(Shard(dim=0),)), 0.09999999999999998), **{}): got RuntimeError('Unhandled FakeTensor Device Propagation for aten.lerp.Scalar, found two different devices cuda:0, cpu')
https://github.com/pytorch/ao/blob/main/torchao/optim/adam.py#L129

It works fine without CPU offloading, but fails when it is enabled.
All parameters are on the cpu device.

It works fine with regular AdamW.

Torch and torchao are on nightly.

Any ideas? Thanks
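For context, here is a minimal sketch of a setup that should match the description above. It assumes two GPUs launched via torchrun; the toy model, dimensions, and batch size are made up for illustration (1536 only mirrors the shape in the traceback):

```python
# Hypothetical repro sketch: FSDP2 with CPU offload + torchao's 8-bit AdamW.
# Launch with: torchrun --nproc_per_node=2 repro.py
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import CPUOffloadPolicy, fully_shard
from torchao.optim import AdamW8bit

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())

# Toy model; shard each layer, then the root, with CPU offload enabled.
model = nn.Sequential(nn.Linear(1536, 1536), nn.Linear(1536, 1536))
for layer in model:
    fully_shard(layer, offload_policy=CPUOffloadPolicy())
fully_shard(model, offload_policy=CPUOffloadPolicy())

optim = AdamW8bit(model.parameters(), lr=1e-4)

x = torch.randn(8, 1536, device="cuda")
model(x).sum().backward()
optim.step()  # the mixed cpu/cuda lerp error surfaces here
```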

supriyar (Contributor) commented
cc @gau-nernst @weifengpy

gau-nernst (Collaborator) commented
Sorry for the late reply. I can look into this later this week...
