
Add kernel support for AArch64 specific GGUF files, i.e. Q4_0_*_* #799

Open

smpurkis opened this issue Sep 27, 2024 · 2 comments
Labels
new feature New feature or request

@smpurkis
Hello,

llama.cpp recently added support for an AArch64-specific type of GGUF and AArch64-specific matmul kernels. Here is the merged PR: ggerganov/llama.cpp#5780

Namely, the Q4_0_8_8, Q4_0_4_8, and the more generic Q4_0_4_4 GGUF model formats.
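For context, here is a rough Rust sketch of the block layouts involved, based on llama.cpp's `block_q4_0` and `block_q4_0x4` structs (the Rust rendering is illustrative only, not code from either project):

```rust
use half::f16; // f16 scale type, as used by both ggml and Candle

/// Weights per Q4_0 block (QK4_0 in ggml).
const QK4_0: usize = 32;

/// Standard Q4_0 block: one f16 scale plus 16 bytes of
/// nibble-packed 4-bit quants (two weights per byte).
#[repr(C)]
pub struct BlockQ4_0 {
    pub d: f16,
    pub qs: [u8; QK4_0 / 2],
}

/// The Q4_0_4_4 format repacks four Q4_0 blocks taken from four
/// consecutive rows into one interleaved tile (llama.cpp's
/// block_q4_0x4), so an AArch64 NEON kernel can process a 4-row
/// tile at once; Q4_0_4_8 and Q4_0_8_8 are the analogous tiles
/// for the i8mm and SVE paths.
#[repr(C)]
pub struct BlockQ4_0x4 {
    pub d: [f16; 4],             // one scale per source block
    pub qs: [u8; 4 * QK4_0 / 2], // 64 bytes of interleaved quants
}
```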

@smpurkis smpurkis added the new feature New feature or request label Sep 27, 2024
@smpurkis smpurkis changed the title Add kernel support for AArch64 specific GGUF files Add kernel support for AArch64 specific GGUF files, i.e. Q4_0_*_* Sep 27, 2024
@EricLBuehler (Owner)

@smpurkis thanks for the reference. Taking a look; this is on the radar.

@smpurkis (Author)

@EricLBuehler I looked through the code and saw that Candle is used for quantized tensors, so I've started work on adding the datatype to Candle: huggingface/candle#2605

Could I get some guidance on whether that is the right place to add it?
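If Candle turns out to be the right layer, the change would presumably start around candle-core's `GgmlDType` enum. A minimal sketch of the shape of such a change, assuming the new variants reuse llama.cpp's ggml tensor-type ids (GGML_TYPE_Q4_0_4_4 = 31, _4_8 = 32, _8_8 = 33); this is not the actual patch in huggingface/candle#2605:

```rust
// Minimal sketch only; the real enum in candle-core/src/quantized/mod.rs
// has many more variants (F32, F16, Q4_0, ..., Q8K) and more methods.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GgmlDType {
    Q4_0_4_4,
    Q4_0_4_8,
    Q4_0_8_8,
}

impl GgmlDType {
    /// GGUF/ggml tensor-type id (assumed to match llama.cpp's
    /// GGML_TYPE_Q4_0_4_4 = 31, Q4_0_4_8 = 32, Q4_0_8_8 = 33).
    pub fn to_u32(self) -> u32 {
        match self {
            Self::Q4_0_4_4 => 31,
            Self::Q4_0_4_8 => 32,
            Self::Q4_0_8_8 => 33,
        }
    }
}
```

Beyond registering the dtype, the AArch64-specific dequantize/matmul kernels would presumably live alongside Candle's existing quantized kernels, with mistral.rs picking them up through its Candle dependency.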
