Describe the bug

When I start the engine, it loads everything, but it crashes after the first prompt.

Reproduction

I use this command:

./start_linux.sh --model llama-2-7b-chat.Q8_0.gguf --share

The model was downloaded from here: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF

Screenshot

No response

Logs
$ ./start_linux.sh --model llama-2-7b-chat.Q8_0.gguf --share
OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
10:26:32-717305 INFO     Starting Text generation web UI
10:26:32-721474 WARNING  The gradio "share link" feature uses a proprietary executable to create a reverse tunnel. Use it with care.
10:26:32-722758 WARNING  You are potentially exposing the web UI to the entire internet without any access password. You can create one with the "--gradio-auth" flag like this: --gradio-auth username:password Make sure to replace username:password with your own.
10:26:33-523404 INFO     Loading "llama-2-7b-chat.Q8_0.gguf"
10:26:34-444027 INFO     llama.cpp weights detected: "models/llama-2-7b-chat.Q8_0.gguf"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    yes
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 8 CUDA devices:
  Device 0: NVIDIA H200, compute capability 9.0, VMM: yes
  Device 1: NVIDIA H200, compute capability 9.0, VMM: yes
  Device 2: NVIDIA H200, compute capability 9.0, VMM: yes
  Device 3: NVIDIA H200, compute capability 9.0, VMM: yes
  Device 4: NVIDIA H200, compute capability 9.0, VMM: yes
  Device 5: NVIDIA H200, compute capability 9.0, VMM: yes
  Device 6: NVIDIA H200, compute capability 9.0, VMM: yes
  Device 7: NVIDIA H200, compute capability 9.0, VMM: yes
llama_model_load_from_file: using device CUDA0 (NVIDIA H200) - 141931 MiB free
llama_model_load_from_file: using device CUDA1 (NVIDIA H200) - 141913 MiB free
llama_model_load_from_file: using device CUDA2 (NVIDIA H200) - 141913 MiB free
llama_model_load_from_file: using device CUDA3 (NVIDIA H200) - 141913 MiB free
llama_model_load_from_file: using device CUDA4 (NVIDIA H200) - 141913 MiB free
llama_model_load_from_file: using device CUDA5 (NVIDIA H200) - 141913 MiB free
llama_model_load_from_file: using device CUDA6 (NVIDIA H200) - 141913 MiB free
llama_model_load_from_file: using device CUDA7 (NVIDIA H200) - 141913 MiB free
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from models/llama-2-7b-chat.Q8_0.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv  0: general.architecture str = llama
llama_model_loader: - kv  1: general.name str = LLaMA v2
llama_model_loader: - kv  2: llama.context_length u32 = 4096
llama_model_loader: - kv  3: llama.embedding_length u32 = 4096
llama_model_loader: - kv  4: llama.block_count u32 = 32
llama_model_loader: - kv  5: llama.feed_forward_length u32 = 11008
llama_model_loader: - kv  6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv  7: llama.attention.head_count u32 = 32
llama_model_loader: - kv  8: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv  9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: general.file_type u32 = 7
llama_model_loader: - kv 11: tokenizer.ggml.model str = llama
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 13: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 18: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q8_0: 226 tensors
llm_load_vocab: control token: 2 '</s>' is not marked as EOG
llm_load_vocab: control token: 1 '<s>' is not marked as EOG
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 3
llm_load_vocab: token to piece cache size = 0.1684 MB
llm_load_print_meta: format = GGUF V2
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 4096
llm_load_print_meta: n_embd_v_gqa = 4096
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 6.74 B
llm_load_print_meta: model size = 6.67 GiB (8.50 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_print_meta: EOG token = 2 '</s>'
llm_load_print_meta: max token length = 48
llm_load_tensors: tensor 'token_embd.weight' (q8_0) (and 0 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CUDA0 model buffer size = 1025.47 MiB
llm_load_tensors: CUDA1 model buffer size = 820.38 MiB
llm_load_tensors: CUDA2 model buffer size = 820.38 MiB
llm_load_tensors: CUDA3 model buffer size = 820.38 MiB
llm_load_tensors: CUDA4 model buffer size = 820.38 MiB
llm_load_tensors: CUDA5 model buffer size = 820.38 MiB
llm_load_tensors: CUDA6 model buffer size = 820.38 MiB
llm_load_tensors: CUDA7 model buffer size = 748.11 MiB
llm_load_tensors: CPU_Mapped model buffer size = 132.81 MiB
..................................................................................................
llama_new_context_with_model: n_seq_max = 1
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_ctx_per_seq = 4096
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: kv_size = 4096, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 32, can_shift = 1
llama_kv_cache_init: layer 0: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 1: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 2: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 3: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 4: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 5: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 6: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 7: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 8: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 9: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 10: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 11: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 12: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 13: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 14: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 15: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 16: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 17: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 18: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 19: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 20: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 21: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 22: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 23: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 24: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 25: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 26: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 27: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 28: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 29: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 30: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: layer 31: n_embd_k_gqa = 4096, n_embd_v_gqa = 4096
llama_kv_cache_init: CUDA0 KV buffer size = 320.00 MiB
llama_kv_cache_init: CUDA1 KV buffer size = 256.00 MiB
llama_kv_cache_init: CUDA2 KV buffer size = 256.00 MiB
llama_kv_cache_init: CUDA3 KV buffer size = 256.00 MiB
llama_kv_cache_init: CUDA4 KV buffer size = 256.00 MiB
llama_kv_cache_init: CUDA5 KV buffer size = 256.00 MiB
llama_kv_cache_init: CUDA6 KV buffer size = 256.00 MiB
llama_kv_cache_init: CUDA7 KV buffer size = 192.00 MiB
llama_new_context_with_model: KV self size = 2048.00 MiB, K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 0.12 MiB
llama_new_context_with_model: pipeline parallelism enabled (n_copies=4)
llama_new_context_with_model: CUDA0 compute buffer size = 352.01 MiB
llama_new_context_with_model: CUDA1 compute buffer size = 352.01 MiB
llama_new_context_with_model: CUDA2 compute buffer size = 352.01 MiB
llama_new_context_with_model: CUDA3 compute buffer size = 352.01 MiB
llama_new_context_with_model: CUDA4 compute buffer size = 352.01 MiB
llama_new_context_with_model: CUDA5 compute buffer size = 352.01 MiB
llama_new_context_with_model: CUDA6 compute buffer size = 352.01 MiB
llama_new_context_with_model: CUDA7 compute buffer size = 352.02 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 40.02 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 9
CUDA : ARCHS = 500,520,530,600,610,620,700,720,750,800,860,870,890,900 | FORCE_MMQ = 1 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 |
CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
Model metadata: {'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.context_length': '4096', 'general.name': 'LLaMA v2', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '11008', 'llama.attention.layer_norm_rms_epsilon': '0.000001', 'llama.rope.dimension_count': '128', 'llama.attention.head_count': '32', 'tokenizer.ggml.bos_token_id': '1', 'llama.block_count': '32', 'llama.attention.head_count_kv': '32', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '7'}
Using fallback chat format: llama-2
10:27:28-354925 INFO     Loaded "llama-2-7b-chat.Q8_0.gguf" in 54.83 seconds.
10:27:28-357109 INFO     LOADER: "llama.cpp"
10:27:28-358048 INFO     TRUNCATION LENGTH: 4096
10:27:28-358900 INFO     INSTRUCTION TEMPLATE: "Alpaca"
Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://27a58bdff74870c9ec.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)

CUDA error: operation not supported
  current device: 0, in function ggml_backend_cuda_cpy_tensor_async at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:2258
  cudaMemcpyPeerAsync(dst->data, cuda_ctx_dst->device, src->data, cuda_ctx_src->device, ggml_nbytes(dst), cuda_ctx_src->stream())
/home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:70: CUDA error
/home/sp/text-generation-webui/installer_files/env/lib/python3.11/site-packages/llama_cpp_cuda/lib/libggml-base.so(+0x1684b)[0x7d5da623b84b]
/home/sp/text-generation-webui/installer_files/env/lib/python3.11/site-packages/llama_cpp_cuda/lib/libggml-base.so(ggml_abort+0x158)[0x7d5da623bbf8]
/home/sp/text-generation-webui/installer_files/env/lib/python3.11/site-packages/llama_cpp_cuda/lib/libggml-cuda.so(+0x5faf6)[0x7d5b27e5faf6]
/home/sp/text-generation-webui/installer_files/env/lib/python3.11/site-packages/llama_cpp_cuda/lib/libggml-cuda.so(+0x637df)[0x7d5b27e637df]
/home/sp/text-generation-webui/installer_files/env/lib/python3.11/site-packages/llama_cpp_cuda/lib/libggml-base.so(ggml_backend_sched_graph_compute_async+0x3cc)[0x7d5da625175c]
/home/sp/text-generation-webui/installer_files/env/lib/python3.11/site-packages/llama_cpp_cuda/lib/libllama.so(+0x50d50)[0x7d5d67ad7d50]
/home/sp/text-generation-webui/installer_files/env/lib/python3.11/site-packages/llama_cpp_cuda/lib/libllama.so(+0x57f4a)[0x7d5d67adef4a]
/home/sp/text-generation-webui/installer_files/env/lib/python3.11/site-packages/llama_cpp_cuda/lib/libllama.so(llama_decode+0x2b)[0x7d5d67adfa8b]
/home/sp/text-generation-webui/installer_files/env/lib/python3.11/lib-dynload/../../libffi.so.8(+0xa052)[0x7d5e8422d052]
/home/sp/text-generation-webui/installer_files/env/lib/python3.11/lib-dynload/../../libffi.so.8(+0x8925)[0x7d5e8422b925]
/home/sp/text-generation-webui/installer_files/env/lib/python3.11/lib-dynload/../../libffi.so.8(ffi_call+0xde)[0x7d5e8422c06e]
/home/sp/text-generation-webui/installer_files/env/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so(+0x92e5)[0x7d5e8423d2e5]
/home/sp/text-generation-webui/installer_files/env/lib/python3.11/lib-dynload/_ctypes.cpython-311-x86_64-linux-gnu.so(+0x1267e)[0x7d5e8424667e]
python(_PyObject_MakeTpCall+0x27c)[0x50452c]
python(_PyEval_EvalFrameDefault+0x6a6)[0x511a76]
python[0x555ce1]
python(_PyEval_EvalFrameDefault+0x538)[0x511908]
python[0x555ce1]
python(_PyEval_EvalFrameDefault+0x538)[0x511908]
python[0x555ce1]
python(_PyEval_EvalFrameDefault+0x538)[0x511908]
python[0x5581df]
python[0x5579ce]
python(PyObject_Call+0x12c)[0x5430ac]
python(_PyEval_EvalFrameDefault+0x47c0)[0x515b90]
python(_PyFunction_Vectorcall+0x173)[0x539153]
python(_PyEval_EvalFrameDefault+0x47c0)[0x515b90]
python[0x5581df]
python[0x557a20]
python[0x62a8a3]
python[0x5fa3c4]
/lib/x86_64-linux-gnu/libc.so.6(+0x9ca94)[0x7d5e8549ca94]
/lib/x86_64-linux-gnu/libc.so.6(+0x129c3c)[0x7d5e85529c3c]
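The abort happens inside ggml_backend_cuda_cpy_tensor_async when cudaMemcpyPeerAsync returns "operation not supported", i.e. while llama.cpp is copying a tensor directly from one GPU to another during the first decode. A quick way to confirm that the multi-GPU copy path is the trigger would be to rerun on a single GPU; this is a hedged sketch, not something from the log above, and it only assumes the same start command plus the standard CUDA_VISIBLE_DEVICES variable for masking devices:

# Hypothetical single-GPU run: with only one visible device, llama.cpp
# never issues a GPU-to-GPU peer copy, so a clean run would point at the
# cross-device copy path rather than the model or the loader.
CUDA_VISIBLE_DEVICES=0 ./start_linux.sh --model llama-2-7b-chat.Q8_0.gguf --share

The Q8_0 7B model is only 6.67 GiB, so it fits comfortably on a single H200 with about 141 GiB free.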
System Info

I run the engine on 8x H200:

$ nvidia-smi
Wed Jan 15 10:22:18 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    On  |   00000000:01:00.0 Off |                    0 |
| N/A   31C    P0             76W /  700W |     114MiB / 143771MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H200                    On  |   00000000:02:00.0 Off |                    0 |
| N/A   28C    P0             74W /  700W |     132MiB / 143771MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H200                    On  |   00000000:03:00.0 Off |                    0 |
| N/A   29C    P0             75W /  700W |     132MiB / 143771MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H200                    On  |   00000000:04:00.0 Off |                    0 |
| N/A   30C    P0             75W /  700W |     132MiB / 143771MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA H200                    On  |   00000000:05:00.0 Off |                    0 |
| N/A   32C    P0             76W /  700W |     132MiB / 143771MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA H200                    On  |   00000000:06:00.0 Off |                    0 |
| N/A   31C    P0             74W /  700W |     132MiB / 143771MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA H200                    On  |   00000000:07:00.0 Off |                    0 |
| N/A   33C    P0             77W /  700W |     132MiB / 143771MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA H200                    On  |   00000000:08:00.0 Off |                    0 |
| N/A   30C    P0             73W /  700W |     132MiB / 143771MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Everything runs inside a confidential virtual machine, and NVSwitch/NVLink is not available:

$ nvidia-smi topo -m
        GPU0   GPU1   GPU2   GPU3   GPU4   GPU5   GPU6   GPU7   CPU Affinity   NUMA Affinity   GPU NUMA ID
GPU0     X     PHB    PHB    PHB    PHB    PHB    PHB    PHB    0-31           0               N/A
GPU1    PHB     X     PHB    PHB    PHB    PHB    PHB    PHB    0-31           0               N/A
GPU2    PHB    PHB     X     PHB    PHB    PHB    PHB    PHB    0-31           0               N/A
GPU3    PHB    PHB    PHB     X     PHB    PHB    PHB    PHB    0-31           0               N/A
GPU4    PHB    PHB    PHB    PHB     X     PHB    PHB    PHB    0-31           0               N/A
GPU5    PHB    PHB    PHB    PHB    PHB     X     PHB    PHB    0-31           0               N/A
GPU6    PHB    PHB    PHB    PHB    PHB    PHB     X     PHB    0-31           0               N/A
GPU7    PHB    PHB    PHB    PHB    PHB    PHB    PHB     X     0-31           0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
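The topology matrix shows only PHB links between the GPUs (no NVLink), and confidential virtual machines commonly restrict direct GPU-to-GPU (peer-to-peer) transfers, so it is plausible that the peer copy ggml attempts here simply is not permitted in this environment. One hedged way to see what the driver reports is the peer-to-peer view of nvidia-smi; the -p2p option exists in recent nvidia-smi builds, but whether this driver's build accepts it is an assumption:

# Hypothetical check: print the pairwise peer-to-peer read-capability matrix.
# If the driver reports peer access as not supported between the devices,
# that matches the "operation not supported" error from cudaMemcpyPeerAsync.
nvidia-smi topo -p2p r

If peer access is indeed unavailable, that would also explain why loading succeeds (each GPU only receives its own layers from the host) while the first prompt crashes as soon as activations have to move between GPUs.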