[warning] loading model fails! #18

Open
Godlovecui opened this issue Nov 8, 2024 · 1 comment

Comments


Godlovecui commented Nov 8, 2024

Hello,
The quantized model file has the suffix ".safetensor". However, when I run ./bin/llama_example, it prints the warning below:
image
Where can I get these ".bin" files?
Thank you!

Contributor

lswzjuer commented Feb 5, 2025

Hello, where did you get the W8A16 model weights? The checkpoints we open-sourced are simulated-quantization checkpoints, which are intended only for accuracy verification. For actual inference, the weights must first go through a packing preprocessing step.
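
For readers unfamiliar with the distinction, here is a minimal sketch of the general idea, not this project's packing tool. It assumes symmetric per-row int8 weight quantization, and the byte layout in `pack` (one float32 scale followed by raw int8 values) is purely hypothetical; the real ".bin" format expected by `llama_example` may differ. A simulated checkpoint stores the dequantized floats, while a packed one stores the integers and the scale.

```python
import struct

def quantize_row(row):
    """Symmetric int8 quantization of one weight row: returns (ints, scale)."""
    max_abs = max(abs(w) for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale

def simulate(row):
    """Simulated quantization: quantize, then dequantize back to floats.
    The result is still stored as floating point, so it only checks accuracy."""
    q, scale = quantize_row(row)
    return [v * scale for v in q]

def pack(row):
    """Hypothetical packed layout: float32 scale, then the int8 values."""
    q, scale = quantize_row(row)
    return struct.pack(f"<f{len(q)}b", scale, *q)

def unpack(blob):
    """Inverse of pack(): recover the dequantized floats from the bytes."""
    n = len(blob) - 4  # 4 bytes are taken by the float32 scale
    scale, *q = struct.unpack(f"<f{n}b", blob)
    return [v * scale for v in q]

row = [0.5, -1.0, 0.25, 0.0]
sim = simulate(row)   # floats, same size as the original weights
blob = pack(row)      # 4 bytes of scale + 1 byte per weight
```

Both forms describe the same values up to rounding, but only the packed form actually shrinks the weights to 8 bits, which is why an inference binary would expect it rather than the simulated checkpoint.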
