Support partial vector extension instructions #545

vestata · 2025-01-24T14:54:11Z

Add support for the RISC-V "V" Vector Extension. This pull request implements decoding for 585 out of 616 version 1.0 spec vector instructions, with partial interpreter implementation.

The decoding method for vector instructions, including vector configuration and load/store instructions, follows the approach used in rv32emu. A new rvv_jumptable is introduced to handle the remaining arithmetic instructions.

The interpreter implementation is tested using the riscv-vector-tests repository, subject to the current limitations outlined in that repo. This PR includes partial support for vector load/store instructions and single-width arithmetic instructions. The implementation now supports different settings for sew, lmul, and vector masking.

Vector instructions passing the tests include:

vle8.v, vle16.v, vle32.v
vse8.v, vse16.v, vse32.v
vadd.vv, vadd.vx, vadd.vi
vsub.vv, vsub.vx, vsub.vi
vand.vv, vand.vx, vand.vi
vor.vv, vor.vx, vor.vi
vxor.vv, vxor.vx, vxor.vi
vsll.vv, vsll.vx, vsll.vi
vmul.vv, vmul.vx, vmul.vi

Close #504

Summary by Bito

Major implementation of RISC-V Vector Extension support, adding comprehensive instruction decoding for vector operations including load/store, arithmetic, and floating-point instructions. Introduces vector configuration settings, register handling, and constant optimization support. Implements 585 out of 616 version 1.0 spec vector instructions with support for various data widths (8-bit to 64-bit) and addressing modes.

Unit tests added: False

Estimated effort to review (1-5, lower is better): 5

jserv

Benchmarks

Benchmark suite	Current: `4da7057`	Previous: `1627a4b`	Ratio
`Dhrystone`	`1340` Average DMIPS over 10 runs	`1333` Average DMIPS over 10 runs	`0.99`
`Coremark`	`949.671` Average iterations/sec over 10 runs	`952.577` Average iterations/sec over 10 runs	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

jserv · 2025-01-24T15:59:33Z

src/decode.c


-    /* standard uncompressed instruction */
-    const uint32_t index = (insn & INSN_6_2) >> 2;
+static inline bool op_000000(rv_insn_t *ir, const uint32_t insn)


op_000000 looks misleading. Can you improve its naming scheme?

The naming scheme is based on the funct6 field listed in riscv-v-spec/inst-table.adoc. Since each funct6 value may cover OPI, OPM, or OPF functions, which often correspond to unrelated operations, I chose to name the handlers directly after the funct6 value for consistency.

This might seem unclear without additional context. To improve clarity, I could add comments explaining the naming convention for each op_<funct6> handler. Would this address your concern?
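
To illustrate why a single funct6 value can cover unrelated operations, here is a minimal sketch of splitting an OP-V instruction word into funct6 and an operand category. The bit positions and funct3 encodings follow the RVV 1.0 instruction format; the names vcat_t, v_funct6, and v_category are hypothetical and are not taken from this PR.

```c
#include <stdint.h>

/* Hypothetical category tag: funct3 selects the operand category. */
typedef enum { VCAT_OPI, VCAT_OPM, VCAT_OPF, VCAT_CFG } vcat_t;

/* funct6 occupies bits 31:26 of an OP-V instruction word. */
static inline uint32_t v_funct6(uint32_t insn)
{
    return insn >> 26;
}

/* funct3 (bits 14:12) distinguishes the OPI, OPM, and OPF forms. */
static inline vcat_t v_category(uint32_t insn)
{
    switch ((insn >> 12) & 0x7) {
    case 0: /* OPIVV */
    case 3: /* OPIVI */
    case 4: /* OPIVX */
        return VCAT_OPI;
    case 2: /* OPMVV */
    case 6: /* OPMVX */
        return VCAT_OPM;
    case 1: /* OPFVV */
    case 5: /* OPFVF */
        return VCAT_OPF;
    default: /* 7: OPCFG, the vector configuration instructions */
        return VCAT_CFG;
    }
}
```

For example, 0x022180D7 (vadd.vv v1, v2, v3) has funct6 000000 and falls in the OPI category, while the same funct6 encodes vredsum.vs under OPMVV and vfadd.vv under OPFVV. A handler keyed on funct6 alone therefore has to dispatch on funct3 as well.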

vacantron · 2025-01-24T16:35:00Z

The interpreter implementation is tested using the riscv-vector-tests repository

Could we create a CI job for this, for example using RISCOF?

vestata · 2025-01-25T11:05:44Z

The interpreter implementation is tested using the riscv-vector-tests repository

Could we create a CI job for this, for example using RISCOF?

I'm not familiar with RISCOF, but I'll look into it and give it a try.

eleanorLYJ · 2025-01-25T14:24:01Z

Suggest using git rebase -i to squash the commit into the previous one instead of adding a new commit.

vestata · 2025-01-26T14:39:37Z

Thank you all for your feedback and suggestions! I will fix the typos, add a newline at the end of files, and remove any unnecessary elements. I also noticed that the current code does not fully meet the contributing guidelines, so I will make sure to address those issues. In addition, I will add more detailed comments in src/decode.c and src/rv32_template.c and ensure the formatting is correct.

Since some of the code was misplaced from the beginning, and as @eleanorLYJ mentioned, there are non-compliant comments in an early commit, I’m considering running git rebase -i over everything from the start. Do you have any suggestions or concerns about that approach?

I’d appreciate your guidance. Thank you!

howjmay · 2025-01-27T00:19:52Z

src/rv32_template.c

+    }                                                                        \
+}
+
+#define VMV_LOOP(des, op1, op2, op, SHIFT, MASK, i, j, itr, vm)             \


May I ask where this one is used?

The VMV_LOOP macro is used in the implementation of vmv_v_i (at src/rv32_template.c, line 6366), because the riscv-vector-tests frequently use vmv_v_i to clear bits in vector registers during each test. It serves as a quick implementation of the vmv_v_i instruction.

Additionally, I noticed that the implementations of vmv_v_* (representing vmv_v_v, vmv_v_x, and vmv_v_i) can be refactored to reuse existing macros such as VV_LOOP, VX_LOOP, VI_LOOP, and their _LEFT variants (collectively referred to as V*_LOOP and V*_LOOP_LEFT). I will remove the VMV_LOOP and related _LEFT macros accordingly. Thank you for pointing this out!

bito-code-review · 2025-02-06T03:01:45Z

src/decode.c

+    if (decode_funct3(insn) != 0b010) {
+        uint8_t eew = decode_eew(insn);
+        ir->eew = 8 << eew;


Consider validating eew before shift operations

Consider adding validation for eew value before using it in shift operations. The decode_eew() function can return -1 for invalid values, which could lead to undefined behavior when used in 8 << eew.

Code suggestion

Check the AI-generated fix before applying

Suggested change

-    if (decode_funct3(insn) != 0b010) {
-        uint8_t eew = decode_eew(insn);
-        ir->eew = 8 << eew;
+    if (decode_funct3(insn) != 0b010) {
+        uint8_t eew = decode_eew(insn);
+        if (eew < 0) {
+            return false;
+        }
+        ir->eew = 8 << eew;

Code Review Run #47f569
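
As a point of reference for the eew discussion, here is a minimal sketch of decoding the width field of a vector load/store into an element width in bits. The mapping follows the RVV 1.0 width encoding; the function name vls_eew_bits and the -1 error convention are assumptions for illustration and may differ from the PR's decode_eew(). The result is kept in a signed int, because a value stored in a uint8_t can never compare below zero.

```c
#include <stdint.h>

/* Hypothetical helper: returns EEW in bits (8, 16, 32, 64) for a vector
 * load/store, or -1 when the width field is not a vector encoding. */
static int vls_eew_bits(uint32_t insn)
{
    switch ((insn >> 12) & 0x7) { /* width field, bits 14:12 */
    case 0:
        return 8;
    case 5:
        return 16;
    case 6:
        return 32;
    case 7:
        return 64;
    default:
        return -1; /* scalar FP load/store widths, e.g. 0b010 for flw */
    }
}
```

A caller would test the signed result before using it, so an invalid width never reaches a shift or a store.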

bito-code-review · 2025-02-06T03:01:47Z

src/decode.c

+{
+    ir->vs2 = decode_rs2(insn);
+    ir->imm = decode_v_imm(insn);
+    ir->vd = decode_rd(insn);
+    ir->vm = decode_vm(insn);
+}


Consider adding parameter validation checks

Consider validating the input parameters before accessing them. The decode_vitype function directly accesses instruction fields without any validation of ir or insn parameters.

Code suggestion

Check the AI-generated fix before applying

Suggested change

-{
-    ir->vs2 = decode_rs2(insn);
-    ir->imm = decode_v_imm(insn);
-    ir->vd = decode_rd(insn);
-    ir->vm = decode_vm(insn);
-}
+{
+    if (!ir) {
+        return;
+    }
+    ir->vs2 = decode_rs2(insn);
+    ir->imm = decode_v_imm(insn);
+    ir->vd = decode_rd(insn);
+    ir->vm = decode_vm(insn);
+}

Code Review Run #47f569

bito-code-review · 2025-02-06T03:01:50Z

src/decode.c

+        /* FIXME: Implement the decoding for vmv<nr>r. */
+    case 4:


Consider implementing vmv<nr>r instruction decoding

The vmv<nr>r instruction decoding is marked with a FIXME comment but has no implementation. This could lead to undefined behavior when this instruction is encountered.

Code suggestion

Check the AI-generated fix before applying

Suggested change

-        /* FIXME: Implement the decoding for vmv<nr>r. */
-    case 4:
+        /* Not implemented */
+        return false;
+    case 4:

Code Review Run #47f569

bito-code-review · 2025-02-06T03:01:51Z

src/decode.c

+static inline bool op_101110(rv_insn_t *ir, const uint32_t insn)
+{
+    switch (decode_funct3(insn)) {


Consider consolidating vector instruction decoding patterns

Consider consolidating similar switch-case patterns across functions op_101110 through op_111100. Many functions follow a similar pattern of decoding vector instructions with repeated code structure.

Code suggestion

Check the AI-generated fix before applying

Suggested change

-static inline bool op_101110(rv_insn_t *ir, const uint32_t insn)
-{
-    switch (decode_funct3(insn)) {
+static inline bool decode_vector_op(rv_insn_t *ir, const uint32_t insn, const rv_vec_op_t *op_table, size_t table_size)
+{
+    uint32_t funct3 = decode_funct3(insn);
+    if (funct3 >= table_size) {
+        return false;
+    }
+    const rv_vec_op_t *op = &op_table[funct3];
+    if (!op->decode_fn) {
+        return false;
+    }
+    op->decode_fn(ir, insn);
+    ir->opcode = op->opcode;
+    return true;
+}

Code Review Run #47f569

Add decode stage for RISC-V "V" Vector extension instructions from version 1.0, excluding VXUNARY0, VRFUNARY0, VWFUNARY0, VFUNARY1, vmv<nr>r, and VFUNARY0. This commit focuses on the decode stage to ensure correct instruction parsing before proceeding to the execution stage. Verification is currently done through hand-written code.

Modify Makefile to support VLEN configuration via make ENABLE_EXT_V=1 VLEN=<value>. The default value for VLEN is 128, and the current implementation only supports VLEN=128. Setting ENABLE_EXT_V=1 also enables ENABLE_EXT_F=1, as vector load/store instructions share the same opcodes as load_fp and store_fp.

Add support for vset{i}vl{i} instructions following the RISC-V vector extension version 1.0. Simplify avlmax calculation by directly computing avlmax = lmul * vlen / sew instead of converting to floating-point as described in the specification.
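
The integer-only computation described above can be sketched as follows. This is a minimal illustration, not the PR's code: the helper name vlmax_from_vtype is hypothetical, the vlmul and vsew field positions follow the RVV 1.0 vtype layout, and fractional LMUL values are handled with right shifts so that no floating-point conversion is needed.

```c
#include <stdint.h>

/* Hypothetical helper: VLMAX = LMUL * VLEN / SEW using integer shifts only.
 * Returns 0 for reserved vlmul/vsew encodings. */
static uint32_t vlmax_from_vtype(uint32_t vtype, uint32_t vlen)
{
    uint32_t vlmul = vtype & 0x7;       /* vtype[2:0] */
    uint32_t vsew = (vtype >> 3) & 0x7; /* vtype[5:3] */

    if (vlmul == 4 || vsew > 3)
        return 0; /* reserved encodings */

    uint32_t sew = 8u << vsew; /* 8, 16, 32, or 64 bits */

    if (vlmul < 4)
        return (vlen << vlmul) / sew; /* LMUL = 1, 2, 4, 8 */

    /* Fractional LMUL: 5 -> 1/8, 6 -> 1/4, 7 -> 1/2 */
    return (vlen >> (8 - vlmul)) / sew;
}
```

With VLEN = 128, SEW = 32 and LMUL = 2 give 8 elements, and SEW = 8 with LMUL = 1/2 also gives 8.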

Implement vle8_v, vle16_v, vle32_v, vse8_v, vse16_v, and vse32_v, using loop unrolling to handle a word at a time. The implementation assumes VLEN = 128.

There are two types of illegal instructions:
1. When eew is narrower than csr_vl: set vill in vtype to 1 and the other bits to 0, and set csr_vl to 0.
2. When LMUL > 1 and the instruction tries to access a vector register numbered higher than 31: use assert to handle this case.
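
The two illegal-instruction cases can be sketched like this. The struct vstate_t and the helpers mark_vill and vreg_group_in_range are hypothetical stand-ins for the emulator state and are not the PR's code; the vill position (the top bit of vtype, bit 31 on RV32) follows the RVV 1.0 specification.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant CSR state. */
typedef struct {
    uint32_t csr_vtype;
    uint32_t csr_vl;
} vstate_t;

/* Case 1: set vill, clear every other vtype bit, and zero vl. */
static void mark_vill(vstate_t *s)
{
    s->csr_vtype = 1u << 31; /* vill is bit XLEN-1; XLEN = 32 assumed */
    s->csr_vl = 0;
}

/* Case 2: a group of nregs registers starting at base must end at v31
 * or below. Returns 1 when the group is in range, 0 otherwise. */
static int vreg_group_in_range(unsigned base, unsigned nregs)
{
    return base + nregs <= 32;
}
```

A caller could wrap the second check in assert(), matching the approach described in the commit message.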

To emulate vector registers of length VLEN using an array of uint32_t, we first handle different SEW values (8, 16, 32) using sew_*b_handler. Inside the handler, the V*_LOOP macro expands to process different VL values and operand types, along with its corresponding V*_LOOP_LEFT. The goal is to maximize code reuse by defining individual operations next to their respective vector instructions, which can be easily applied using the OPT() macro.

V*_LOOP execution steps:
1. Copy the operand op1 (op2).
2. Align op1 to the right.
3. Perform the specified operation between op1 and op2.
4. Mask the result according to the corresponding SEW.
5. Shift the result left to align with the corresponding position.
6. Accumulate the result.

In vector register groups, registers should follow the pattern v2*n, v2*n+1 when lmul = 2, etc. The current implementation allows using any vector registers except those exceeding v31.

For vector masking, if the corresponding mask bit is 0, the value of the destination vector register is preserved. The process is as follows:
1. Copy the destination register.
2. Clear the bits corresponding to VL.
3. Store the computed result in ans.
4. Update the destination register with ans.

If ir->vm == 0, vector masking is activated.
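
The loop steps and the masking rule can be sketched as plain functions operating on one 32-bit word. This is an illustration under assumptions, not the PR's macros: packed_add and merge_masked are hypothetical names, addition stands in for the per-instruction operation, and the mask bits are assumed to be already aligned so that bit i controls element i of the word.

```c
#include <stdint.h>

/* Element-wise add over one packed 32-bit word. sew must be 8, 16, or 32. */
static uint32_t packed_add(uint32_t w1, uint32_t w2, unsigned sew)
{
    uint32_t mask = (sew == 32) ? 0xFFFFFFFFu : ((1u << sew) - 1u);
    uint32_t acc = 0;

    for (unsigned shift = 0; shift < 32; shift += sew) {
        uint32_t a = (w1 >> shift) & mask; /* copy and right-align op1 */
        uint32_t b = (w2 >> shift) & mask; /* copy and right-align op2 */
        uint32_t r = (a + b) & mask;       /* operate, then mask to SEW */
        acc |= r << shift;                 /* shift back and accumulate */
    }
    return acc;
}

/* Masked merge: element i takes new_w when mask bit i is 1 and keeps
 * old_w when it is 0. sew must be 8, 16, or 32. */
static uint32_t merge_masked(uint32_t old_w, uint32_t new_w,
                             uint32_t mask_bits, unsigned sew)
{
    uint32_t elem = (sew == 32) ? 0xFFFFFFFFu : ((1u << sew) - 1u);
    uint32_t out = old_w;
    unsigned i = 0;

    for (unsigned shift = 0; shift < 32; shift += sew, i++) {
        if ((mask_bits >> i) & 1u) {
            out &= ~(elem << shift);        /* clear the destination slot */
            out |= new_w & (elem << shift); /* store the computed element */
        }
    }
    return out;
}
```

Because each element is masked before it is shifted back, a carry out of one element never spills into its neighbor, which is the purpose of steps 4 and 5.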

bito-code-review · 2025-02-06T18:29:24Z

src/rv32_template.c

+    vssub_vv,
+    {
+        for (int i = 0; i < 4; i++) {
+            rv->V[rv_reg_zero][i] = 0;
+        }
+    },


Consider extracting repeated vector init code

Consider refactoring the repetitive vector register initialization pattern. The same code block for (int i = 0; i < 4; i++) { rv->V[rv_reg_zero][i] = 0; } appears in multiple vector operations which could be extracted into a helper function.

Code suggestion

Check the AI-generated fix before applying

Suggested change

-    vssub_vv,
-    {
-        for (int i = 0; i < 4; i++) {
-            rv->V[rv_reg_zero][i] = 0;
-        }
-    },
+static inline void init_vector_reg_zero(riscv_t *rv) {
+    for (int i = 0; i < 4; i++) {
+        rv->V[rv_reg_zero][i] = 0;
+    }
+}
+    vssub_vv,
+    {
+        init_vector_reg_zero(rv);
+    },

Code Review Run #005f29

bito-code-review · 2025-02-06T18:29:25Z

src/rv32_template.c

+#define op_sll(a, b) \
+    ((a) << ((b) & ((8 << ((rv->csr_vtype >> 3) & 0b111)) - 1)))


Consider adding bounds check for shift

The shift amount calculation in op_sll macro may need bounds checking to prevent undefined behavior when shifting by amounts >= bit width. Consider adding explicit bounds check.

Code suggestion

Check the AI-generated fix before applying

Suggested change

-#define op_sll(a, b) \
-    ((a) << ((b) & ((8 << ((rv->csr_vtype >> 3) & 0b111)) - 1)))
+#define op_sll(a, b) do { \
+    int _shift = (b) & ((8 << ((rv->csr_vtype >> 3) & 0b111)) - 1); \
+    _shift = _shift >= sizeof(a) * 8 ? sizeof(a) * 8 - 1 : _shift; \
+    ((a) << _shift); } while(0)

Code Review Run #005f29

bito-code-review · 2025-02-06T18:29:26Z

src/rv32_template.c

+RVOP(
+    vlseg8e8_v,
+    {
+        for (int i = 0; i < 4; i++) {
+            rv->V[rv_reg_zero][i] = 0;
+        }
+    },
+    GEN({/* no operation */}))


Consider consolidating vector register initialization code

Consider consolidating the repeated vector register initialization code blocks into a reusable helper function to reduce code duplication. Each vector instruction implementation currently contains identical initialization logic.

Code suggestion

Check the AI-generated fix before applying

Suggested change

-RVOP(
-    vlseg8e8_v,
-    {
-        for (int i = 0; i < 4; i++) {
-            rv->V[rv_reg_zero][i] = 0;
-        }
-    },
-    GEN({/* no operation */}))
+static void init_vector_reg_zero(riscv_t *rv) {
+    for (int i = 0; i < 4; i++) {
+        rv->V[rv_reg_zero][i] = 0;
+    }
+}
+RVOP(
+    vlseg8e8_v,
+    {
+        init_vector_reg_zero(rv);
+    },
+    GEN({/* no operation */}))

Code Review Run #005f29

bito-code-review · 2025-02-06T18:29:27Z

src/rv32_template.c

+RVOP(
+    vadc_vvm,
+    {
+        for (int i = 0; i < 4; i++) {
+            rv->V[rv_reg_zero][i] = 0;
+        }
+    },


Consider refactoring repeated vector zero initialization

Consider refactoring the repeated code pattern that sets rv->V[rv_reg_zero][i] = 0 across multiple vector operations. This pattern appears in multiple RVOP definitions and could be extracted into a helper function to improve maintainability.

Code suggestion

Check the AI-generated fix before applying

Suggested change

-RVOP(
-    vadc_vvm,
-    {
-        for (int i = 0; i < 4; i++) {
-            rv->V[rv_reg_zero][i] = 0;
-        }
-    },
+static inline void zero_vector_register(riscv_t *rv) {
+    for (int i = 0; i < 4; i++) {
+        rv->V[rv_reg_zero][i] = 0;
+    }
+}
+RVOP(
+    vadc_vvm,
+    {
+        zero_vector_register(rv);
+    },

Code Review Run #005f29

bito-code-review · 2025-02-06T18:29:28Z

src/rv32_template.c

+RVOP(
+    vaaddu_vv,
+    {
+        for (int i = 0; i < 4; i++) {
+            rv->V[rv_reg_zero][i] = 0;
+        }
+    },
+    GEN({/* no operation */}))


Consider consolidating repeated vector zero initialization

Consider refactoring the repeated code block that sets rv->V[rv_reg_zero][i] = 0 across multiple vector operations. This pattern appears in almost every vector operation implementation which could be consolidated into a helper function.

Code suggestion

Check the AI-generated fix before applying

Suggested change

-RVOP(
-    vaaddu_vv,
-    {
-        for (int i = 0; i < 4; i++) {
-            rv->V[rv_reg_zero][i] = 0;
-        }
-    },
-    GEN({/* no operation */}))
+static inline void zero_vector_reg(riscv_t *rv) {
+    for (int i = 0; i < 4; i++) {
+        rv->V[rv_reg_zero][i] = 0;
+    }
+}
+RVOP(
+    vaaddu_vv,
+    {
+        zero_vector_reg(rv);
+    },
+    GEN({/* no operation */}))

Code Review Run #005f29

bito-code-review · 2025-02-06T18:29:29Z

src/decode.c

+        uint8_t eew = decode_eew(insn);
+        ir->eew = 8 << eew;


Consider adding eew value validation

Consider adding error handling for invalid eew values. Currently, if decode_eew() returns an invalid value, it could lead to undefined behavior when calculating ir->eew = 8 << eew.

Code suggestion

Check the AI-generated fix before applying

Suggested change

-        uint8_t eew = decode_eew(insn);
-        ir->eew = 8 << eew;
+        uint8_t eew = decode_eew(insn);
+        if (eew > 3) {
+            return false;
+        }
+        ir->eew = 8 << eew;

Code Review Run #005f29

vestata · 2025-02-06T18:45:12Z

In src/rv32_template.c, the implementation of vector instruction operations using OPT() relies heavily on macros, including macros whose parameters are expanded more than once. To address this issue, I am currently refactoring them into functions while maintaining the same logic.
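
As a rough illustration of that kind of refactoring, here is a function form of a shift-left element operation. It is a sketch under assumptions rather than the refactored code from this PR: the name op_sll_fn is hypothetical, and sew is passed in explicitly (8, 16, or 32) instead of being read from the vtype CSR inside the expression.

```c
#include <stdint.h>

/* Hypothetical function form of the shift-left operation. Only the low
 * log2(sew) bits of the shift amount are used, so the shift count always
 * stays below the element width. Each argument is evaluated exactly once,
 * unlike a macro that expands a parameter several times. */
static inline uint32_t op_sll_fn(uint32_t a, uint32_t b, uint32_t sew)
{
    return a << (b & (sew - 1));
}
```

The caller is still expected to mask the result to SEW bits, as in the loop steps described in the commit message.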

jserv · 2025-02-06T18:58:40Z

src/decode.c

@@ -306,6 +306,87 @@ static inline uint16_t c_decode_cbtype_imm(const uint16_t insn)
 }
 #endif /* RV32_HAS(EXT_C) */

+#if RV32_HAS(EXT_V) /* RV32_HAS(EXT_V) */


Replace the comment /* RV32_HAS(EXT_V) */ with something like "Vector extension."

jserv · 2025-02-06T18:59:42Z

src/decode.c

@@ -1971,67 +2384,2039 @@ static inline bool op_cfsw(rv_insn_t *ir, const uint32_t insn)
 #define op_cflwsp OP_UNIMP
 #endif /* RV32_HAS(EXT_C) && RV32_HAS(EXT_F) */

-/* handler for all unimplemented opcodes */
-static inline bool op_unimp(rv_insn_t *ir UNUSED, uint32_t insn UNUSED)
+#if RV32_HAS(EXT_V) /* RV32_HAS(EXT_V) */


Ditto. Refine the comment.

jserv · 2025-02-06T19:02:20Z

src/rv32_template.c

@@ -2988,3 +2988,5129 @@ RVOP(
    }))

 #endif
+
+#if RV32_HAS(EXT_V)
+#define LEN ((VLEN) >> (5))


Define LEN carefully; such a generic name may lead to redefinitions.
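
One possible way to reduce the risk, shown only as a sketch: use a prefixed, guarded name instead of LEN. The name VREG_U32_COUNT is hypothetical, and the VLEN default of 128 is taken from the commit message in this PR.

```c
#ifndef VLEN
#define VLEN 128 /* default stated in the commit message */
#endif

/* Hypothetical prefixed name: number of 32-bit words per vector register. */
#ifndef VREG_U32_COUNT
#define VREG_U32_COUNT ((VLEN) >> 5)
#endif
```

An alternative is to keep the short name local and #undef it at the end of the vector section of the file.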

jserv

Rebase onto the latest master branch, which resolves the recent CI failures.

jserv reviewed Jan 24, 2025

jserv mentioned this pull request Jan 24, 2025

Add decoder for RVV instructions #501

Closed

jserv requested review from howjmay and vacantron January 24, 2025 15:56

jserv added this to the release-2025.1 milestone Jan 24, 2025

jserv requested review from RinHizakura, visitorckw, Risheng1128, ChinYikMing and eleanorLYJ January 24, 2025 16:00

jserv changed the title ~~Add RVV extension support~~ Support partial vector extension instructions Jan 24, 2025

ChinYikMing reviewed Jan 24, 2025

eleanorLYJ reviewed Jan 25, 2025

visitorckw reviewed Jan 25, 2025

ChinYikMing reviewed Jan 26, 2025

howjmay reviewed Jan 27, 2025
