Releases · ggml-org/llama.cpp
b4721
server: fix type promotion typo causing crashes w/ --jinja w/o tools …
b4720
vulkan: initial support for IQ1_S and IQ1_M quantizations (#11528)
* vulkan: initial support for IQ1_S and IQ1_M quantizations
* vulkan: define MMV kernels for IQ1 quantizations
* devops: increase timeout of Vulkan tests again
* vulkan: simplify ifdef for init_iq_shmem
b4719
llguidance build fixes for Windows (#11664)
* setup windows linking for llguidance; thanks @phil-scott-78
* add build instructions for windows and update script link
* change VS Community link from DE to EN
* whitespace fix
b4718
opencl: Fix rope and softmax (#11833)
* opencl: fix `ROPE`
* opencl: fix `SOFT_MAX`
* Add fp16 variant
* opencl: enforce subgroup size for `soft_max`
b4717
cuda : add ampere to the list of default architectures (#11870)
b4716
docker : drop to CUDA 12.4 (#11869)
* docker : drop to CUDA 12.4
* docker : update readme [no ci]
b4714
ggml: optimize some vec dot functions for LoongArch ASX (#11842)
* Optimize ggml_vec_dot_q3_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q4_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q6_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q5_K_q8_K for LoongArch ASX
* Optimize ggml_vec_dot_q2_K_q8_K for LoongArch ASX
* Optimize mul_sum_i8_pairs_float for LoongArch ASX
* Optimize ggml_vec_dot_iq4_xs_q8_K for LoongArch ASX
b4713
vulkan: linux builds + small subgroup size fixes (#11767)
* mm subgroup size
* upload vulkan x86 builds
b4712
llama-bench : fix unexpected global variable initialize sequence issu…
b4710
llamafile: use member variable instead of constant for iq4nlt (#11780)