Gaudi: Fix the pipeline failed issue with hpu device #36990
base: main
Conversation
Signed-off-by: yuanwu <[email protected]>
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the
Signed-off-by: yuanwu <[email protected]>
Regarding the make fixup issue: line 812 and line 813 cannot be swapped, because importing habana_frameworks.torch replaces some torch functions.
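The ordering constraint can be illustrated with a small, self-contained simulation (the module and function names below are hypothetical stand-ins, not the real habana_frameworks API): importing the patcher module mutates an already-imported module in place, so any code that inspects the module before the patch runs will not see the added attributes.

```python
import types

# Hypothetical stand-in for "torch" before habana_frameworks patches it.
fake_torch = types.ModuleType("fake_torch")

def simulate_habana_import(mod):
    # Simulates the side effect of "import habana_frameworks.torch":
    # it attaches an "hpu" namespace to the already-imported module.
    mod.hpu = types.SimpleNamespace(is_available=lambda: True)

before = hasattr(fake_torch, "hpu")   # False: the patch has not run yet
simulate_habana_import(fake_torch)
after = hasattr(fake_torch, "hpu")    # True: attribute was added in place
```

This is why the two import lines cannot be reordered: the patch must run before anything relies on the patched attributes.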
@IlyasMoutawwakil Please help review.
I believe that starting from PyTorch 2.6 and Synapse 1.20 (the one we targeted for upstream), torch.hpu doesn't need patching by habana_frameworks.
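If that is the case, a version-independent way to handle both worlds is to feature-detect rather than assume (a sketch; `hpu_backend_registered` is a hypothetical helper, not an existing API):

```python
import types

def hpu_backend_registered(torch_module):
    # True when the torch module already exposes an hpu namespace,
    # whether built in (newer PyTorch/Synapse) or patched in by
    # habana_frameworks in lazy mode.
    return hasattr(torch_module, "hpu")

# Demonstration with stand-in modules:
bare = types.ModuleType("bare_torch")
patched = types.ModuleType("patched_torch")
patched.hpu = types.SimpleNamespace(is_available=lambda: True)
```

Feature detection avoids hard-coding version thresholds that may differ between lazy and non-lazy modes.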
What's the actual issue here? Please provide the code to reproduce it; the error alone is not enough.
Okay I see what's happening here. I only targeted non-lazy mode when integrating hpu in transformers, and weirdly:

```
root@05f434bf385a:/home/ubuntu/workspace/transformers# python
Python 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.hpu
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 2681, in __getattr__
    raise AttributeError(f"module '{__name__}' has no attribute '{name}'")
AttributeError: module 'torch' has no attribute 'hpu'
```

vs

```
root@05f434bf385a:/home/ubuntu/workspace/transformers# PT_HPU_LAZY_MODE=0 python
Python 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Calling add_step_closure function does not have any effect. It's lazy mode only functionality. (warning logged once)
Calling mark_step function does not have any effect. It's lazy mode only functionality. (warning logged once)
Calling iter_mark_step function does not have any effect. It's lazy mode only functionality. (warning logged once)
>>> torch.hpu
<module 'habana_frameworks.torch.hpu' from '/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py'>
```

This is a very implicit behavior that I can't see documented anywhere, so the least we could do is make it explicit here with something like this:

```python
import os

import torch

if os.environ.get("PT_HPU_LAZY_MODE", "1") == "1":
    # import habana_frameworks.torch in lazy mode to patch torch with torch.hpu
    import habana_frameworks.torch

if not hasattr(torch, "hpu") or not torch.hpu.is_available():
    return False
```
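The gating condition itself can be unit-tested in isolation (a sketch; `needs_habana_patch` is a hypothetical helper factoring out the environment check discussed above, not part of the transformers API):

```python
import os

def needs_habana_patch(env=None):
    # PT_HPU_LAZY_MODE defaults to "1" (lazy mode), in which case
    # importing habana_frameworks.torch is required to get torch.hpu.
    env = os.environ if env is None else env
    return env.get("PT_HPU_LAZY_MODE", "1") == "1"
```

Passing the environment as a parameter keeps the check testable without mutating the real process environment.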
Ok.
Signed-off-by: yuanwu <[email protected]>
Done
@yuanwu2017 please run make style |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
What does this PR do?
Fixes the pipeline failure when using the hpu device.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.