Sglang integration: Fix dtype mismatch #456

yichiche · 2025-02-05T07:00:38Z

Cast all int64 values to int32 to fix a compatibility issue with the sglang interface.
Add support for tkl.bf16 to broaden data type handling.
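
For context on the int64 to int32 cast, here is a minimal standard-library sketch of why a narrowing cast is usually guarded by a range check. It is not code from this PR: `narrow_to_int32` is a hypothetical helper, and how the sglang side actually performs the cast is an assumption not shown in this thread.

```python
import ctypes

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def wrap_to_int32(value: int) -> int:
    """Unchecked narrowing: ctypes keeps only the low 32 bits, so large values wrap."""
    return ctypes.c_int32(value).value


def narrow_to_int32(values):
    """Hypothetical helper: checked narrowing that refuses values outside int32."""
    narrowed = []
    for v in values:
        if not INT32_MIN <= v <= INT32_MAX:
            raise OverflowError(f"{v} does not fit in int32")
        narrowed.append(int(v))
    return narrowed


if __name__ == "__main__":
    # Typical index values survive the cast unchanged.
    print(narrow_to_int32([0, 17, 4096]))
    # An unchecked cast of 2**31 silently becomes a negative number.
    print(wrap_to_int32(2**31))
```

Block-table entries and sequence indices are normally far below 2**31, so the cast is expected to be lossless in practice; the check only makes that assumption explicit.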

harsh-nod · 2025-02-05T16:40:47Z

iree/turbine/kernel/wave/templates/extend_attention.py

@@ -162,12 +163,12 @@ def extend_attention(
            N_KV, H_KV, D_KV, ADDRESS_SPACE, wave_input_dtype, v_cache_layout
        ],
        block_table: tkl.Memory[
-            S, N_KV, GLOBAL_ADDRESS_SPACE, wave_size_dtype, block_table_layout
+            S, N_KV, GLOBAL_ADDRESS_SPACE, tkl.i32, block_table_layout
        ],
        request_indices: tkl.Memory[S, GLOBAL_ADDRESS_SPACE, wave_size_dtype],


Since this is a very specific signature, I think you can remove wave_size_dtype and set it to tkl.i64. This will also require some changes to the tests. Would you like to fix this, or do you want me to take it over?

harsh-nod · 2025-02-05T16:41:17Z

iree/turbine/kernel/wave/templates/extend_attention.py

@@ -202,11 +204,15 @@ def first_loop(
                elements_per_thread=LOAD_ELEMS_PER_THREAD_QK,
                mapping=q_mapping,
            )
+            if wave_input_dtype == tkl.bf16:
+                q_reg = tkw.cast(tkw.cast(q_reg, tkl.f32), tkl.f16)


I realize now that this is not necessary, so we can remove this here and elsewhere.

raikonenfnu · 2025-02-05T19:33:11Z

iree/turbine/kernel/wave/templates/extend_attention.py

@@ -215,12 +221,14 @@ def first_loop(
            e_delta = tkw.exp2(x_j - m_j)
            e_init = partial_sum * e_delta_max
            d_j = tkw.sum(e_delta, e_init, dim=N_KV)
-            imm_f16 = tkw.cast(e_delta, wave_input_dtype)


I think we should also keep this as imm_f16 = tkw.cast(e_delta, wave_input_dtype), just because we may have bf16 inputs.
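
As background for why casting to wave_input_dtype is not interchangeable with a hard-coded f16 cast, the sketch below emulates the two 16-bit formats with the standard library. It only illustrates the number formats and is not the kernel's behavior: `to_bf16` and `to_f16` are hypothetical helpers, bf16 conversion is done by truncation rather than rounding, and f16 overflow is mapped to infinity to mimic a saturating hardware cast.

```python
import struct


def to_bf16(x: float) -> float:
    """Emulate bf16 by keeping the top 16 bits of the f32 encoding (truncation)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def to_f16(x: float) -> float:
    """Emulate an IEEE half-precision cast; out-of-range values become infinity."""
    try:
        return struct.unpack(">e", struct.pack(">e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")


if __name__ == "__main__":
    x = 1.0e5
    as_bf16 = to_bf16(x)
    # bf16 shares f32's 8-bit exponent, so the magnitude is kept (with coarse precision).
    print("bf16:", as_bf16)
    # f16 has a 5-bit exponent and a maximum finite value of 65504, so this overflows.
    print("f16 :", to_f16(as_bf16))
```

The formats trade range for precision differently (bf16: 8 exponent bits, 7 mantissa bits; f16: 5 exponent bits, 10 mantissa bits), which is why the intermediate dtype is best tied to the input dtype rather than fixed.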

Sglang integration: Fix dtype mismatch

9b7f29b
