
Add kernel support for AArch64 specific GGUF files, i.e. Q4_0_*_* #799

Open

smpurkis opened this issue Sep 27, 2024 · 2 comments
Labels
new feature New feature or request

@smpurkis
Hello,

llama.cpp recently added support for an AArch64-specific type of GGUF and AArch64-specific matmul kernels. Here is the merged PR: ggerganov/llama.cpp#5780

Namely, the Q4_0_8_8, Q4_0_4_8, and the more generic Q4_0_4_4 GGUF model formats.
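For context, here is a rough Rust sketch of the block layouts involved, based on llama.cpp's `block_q4_0` and `block_q4_0x4` structs (the Rust rendering is illustrative only, not code from either project):

```rust
use half::f16; // f16 scale type, as used by both ggml and Candle

/// Weights per Q4_0 block (QK4_0 in ggml).
const QK4_0: usize = 32;

/// Standard Q4_0 block: one f16 scale plus 16 bytes of
/// nibble-packed 4-bit quants (two weights per byte).
#[repr(C)]
pub struct BlockQ4_0 {
    pub d: f16,
    pub qs: [u8; QK4_0 / 2],
}

/// The Q4_0_4_4 format repacks four Q4_0 blocks taken from four
/// consecutive rows into one interleaved tile (llama.cpp's
/// block_q4_0x4), so an AArch64 NEON kernel can process a 4-row
/// tile at once; Q4_0_4_8 and Q4_0_8_8 are the analogous tiles
/// for the i8mm and SVE paths.
#[repr(C)]
pub struct BlockQ4_0x4 {
    pub d: [f16; 4],             // one scale per source block
    pub qs: [u8; 4 * QK4_0 / 2], // 64 bytes of interleaved quants
}
```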

@smpurkis smpurkis added the new feature New feature or request label Sep 27, 2024
@smpurkis smpurkis changed the title Add kernel support for AArch64 specific GGUF files Add kernel support for AArch64 specific GGUF files, i.e. Q4_0_*_* Sep 27, 2024
@EricLBuehler (Owner)

@smpurkis thanks for the reference. Taking a look; this is on the radar.

@smpurkis (Author)

@EricLBuehler I looked through the code and saw that Candle is used for quantized tensors, so I've started work on adding the datatype to Candle: huggingface/candle#2605

Could I get some guidance on whether that is the right place to add it?
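If Candle turns out to be the right layer, the change would presumably start around candle-core's `GgmlDType` enum. A minimal sketch of the shape of such a change, assuming the new variants reuse llama.cpp's ggml tensor-type ids (GGML_TYPE_Q4_0_4_4 = 31, _4_8 = 32, _8_8 = 33); this is not the actual patch in huggingface/candle#2605:

```rust
// Minimal sketch only; the real enum in candle-core/src/quantized/mod.rs
// has many more variants (F32, F16, Q4_0, ..., Q8K) and more methods.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GgmlDType {
    Q4_0_4_4,
    Q4_0_4_8,
    Q4_0_8_8,
}

impl GgmlDType {
    /// GGUF/ggml tensor-type id (assumed to match llama.cpp's
    /// GGML_TYPE_Q4_0_4_4 = 31, Q4_0_4_8 = 32, Q4_0_8_8 = 33).
    pub fn to_u32(self) -> u32 {
        match self {
            Self::Q4_0_4_4 => 31,
            Self::Q4_0_4_8 => 32,
            Self::Q4_0_8_8 => 33,
        }
    }
}
```

Beyond registering the dtype, the AArch64-specific dequantize/matmul kernels would presumably live alongside Candle's existing quantized kernels, with mistral.rs picking them up through its Candle dependency.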
