replace switch_s_plus_2a with branch macro #130

Open: zohassadar wants to merge 3 commits into master

zohassadar (Collaborator):

See commit notes for details.

Interestingly, this change causes the following test to panic:

    // loop until pc is at the instruction after the jsr copyOamStagingToOam
    while emu.registers.pc != nmi_label + 25 {
        emu.step();
        if emu.ppu.current_scanline == 261 {
            panic!("render took too long!");
        }
    }

The first anomaly is that nmi_label + 25 points to line 15 here, not past the OAM staging copy as the comment describes.

        dec sleepCounter
        lda sleepCounter
        cmp #$FF
        bne @jumpOverIncrement
        inc sleepCounter
@jumpOverIncrement:
        jsr copyOamStagingToOam

The second anomaly is that sleepCounter appears to be affected by where emu.run_until_vblank() stops. It looks as though the decrement happens in one frame but the matching increment does not, so subsequent frames see a steadily decrementing sleep counter. If run_until_vblank is replaced with a custom function that stops at scanline 243 instead, the test passes.
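The sleepCounter snippet is a saturating decrement: dec can wrap 0 to $FF, and the cmp #$FF / inc pair undoes that wrap. As a baseline for what the test should observe when the whole NMI runs each frame, here is a minimal Rust model of one pass. The function name sleep_counter_tick is illustrative, not something from the codebase.

```rust
/// Models one pass of the sleepCounter update in the NMI:
///     dec sleepCounter / lda sleepCounter / cmp #$FF
///     bne @jumpOverIncrement / inc sleepCounter
/// `dec` wraps 0 to $FF; the compare-and-increment undoes that wrap,
/// so the counter stops at 0 instead of underflowing.
fn sleep_counter_tick(counter: u8) -> u8 {
    let decremented = counter.wrapping_sub(1); // dec sleepCounter
    if decremented == 0xFF {
        // cmp #$FF matched, bne not taken: inc sleepCounter
        decremented.wrapping_add(1)
    } else {
        // bne @jumpOverIncrement taken
        decremented
    }
}

fn main() {
    let mut counter: u8 = 3;
    let mut history = Vec::new();
    for _ in 0..5 {
        counter = sleep_counter_tick(counter);
        history.push(counter);
    }
    // Counts down to 0, then holds there.
    assert_eq!(history, vec![2, 1, 0, 0, 0]);

    // Same result as a saturating decrement for every starting value.
    for start in 0..=255u8 {
        assert_eq!(sleep_counter_tick(start), start.saturating_sub(1));
    }
    println!("{:?}", history);
}
```

If only the dec half of a pass is observed before the emulator stops, the value seen will be one lower than this model predicts for that frame.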

Uses a macro that takes the destinations as arguments and builds a high-byte
table and a low-byte table of each destination address offset by -1, then
pushes the selected address onto the stack and uses rts for the actual branch.
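A minimal Rust sketch of the stack trick described above, assuming the tables hold destination - 1: pushing the high byte and then the low byte leaves them in the order rts pulls them (low first), and rts adds 1 to the pulled address. The function names and addresses are hypothetical.

```rust
/// Builds the high-byte and low-byte tables for a list of destinations,
/// storing each address minus one because `rts` adds 1 to the pulled address.
fn build_tables(destinations: &[u16]) -> (Vec<u8>, Vec<u8>) {
    let hi: Vec<u8> = destinations
        .iter()
        .map(|&addr| (addr.wrapping_sub(1) >> 8) as u8)
        .collect();
    let lo: Vec<u8> = destinations
        .iter()
        .map(|&addr| (addr.wrapping_sub(1) & 0xFF) as u8)
        .collect();
    (hi, lo)
}

/// Simulates `ldx dest / lda hiBytes,x / pha / lda loBytes,x / pha / rts`
/// and returns the program counter after the `rts`.
fn dispatch(dest: u8, hi: &[u8], lo: &[u8]) -> u16 {
    let x = dest as usize; // ldx dest
    let mut stack: Vec<u8> = Vec::new(); // last element = top of stack
    stack.push(hi[x]); // lda hiBytes,x / pha
    stack.push(lo[x]); // lda loBytes,x / pha
    // rts: pull the low byte, then the high byte, then add 1.
    let pcl = stack.pop().unwrap() as u16;
    let pch = stack.pop().unwrap() as u16;
    ((pch << 8) | pcl).wrapping_add(1)
}

fn main() {
    // Hypothetical destinations; 0x8100 exercises the page carry
    // (0x80FF is stored, and rts brings it back to 0x8100).
    let destinations = [0x8000u16, 0x8100, 0x9ABC, 0xC123];
    let (hi, lo) = build_tables(&destinations);
    assert_eq!((hi[1], lo[1]), (0x80, 0xFF));
    for (i, &addr) in destinations.iter().enumerate() {
        assert_eq!(dispatch(i as u8, &hi, &lo), addr);
    }
    println!("all destinations reached");
}
```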

The macro expands to 11-12 bytes per branch with no shared code, compared to
the previous 5-6 bytes per branch that called a centralized routine.

Reduces the cycle count from 54-57 cycles per branch to 23-26. The variance
comes from possible page-boundary crossings and from whether the dest variable
is in zero page. Normal game logic involves 16 branches per frame.
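The byte and cycle ranges above can be checked by tallying the per-instruction figures from the Previous and New listings in these notes. This Rust sketch assumes standard 6502 encodings (switchTmp1/switchTmp2 in zero page; dest 2 bytes/3 cycles in zero page, 3 bytes/4 cycles absolute) and counts only per-branch code, not the destination tables or the shared routine's bytes.

```rust
/// (mnemonic, (min bytes, max bytes), (min cycles, max cycles)).
/// Indexed and indirect-indexed loads take one extra cycle on a page crossing.
type Op = (&'static str, (u32, u32), (u32, u32));

// Per-branch call site of the old scheme.
const OLD_CALL_SITE: [Op; 2] = [
    ("lda dest", (2, 3), (3, 4)),
    ("jsr switch_s_plus_2a", (3, 3), (6, 6)),
];

// Shared routine of the old scheme (its bytes are paid only once).
const OLD_ROUTINE: [Op; 14] = [
    ("asl a", (1, 1), (2, 2)),
    ("tay", (1, 1), (2, 2)),
    ("iny", (1, 1), (2, 2)),
    ("pla", (1, 1), (4, 4)),
    ("sta switchTmp1", (2, 2), (3, 3)),
    ("pla", (1, 1), (4, 4)),
    ("sta switchTmp2", (2, 2), (3, 3)),
    ("lda (switchTmp1),y", (2, 2), (5, 6)),
    ("tax", (1, 1), (2, 2)),
    ("iny", (1, 1), (2, 2)),
    ("lda (switchTmp1),y", (2, 2), (5, 6)),
    ("sta switchTmp2", (2, 2), (3, 3)),
    ("stx switchTmp1", (2, 2), (3, 3)),
    ("jmp (switchTmp1)", (3, 3), (5, 5)),
];

// Per-branch expansion of the new macro (tables not included).
const NEW_MACRO: [Op; 6] = [
    ("ldx dest", (2, 3), (3, 4)),
    ("lda hiBytes,x", (3, 3), (4, 5)),
    ("pha", (1, 1), (3, 3)),
    ("lda loBytes,x", (3, 3), (4, 5)),
    ("pha", (1, 1), (3, 3)),
    ("rts", (1, 1), (6, 6)),
];

/// Sums the (bytes, cycles) ranges over a sequence of instructions.
fn totals(ops: &[Op]) -> ((u32, u32), (u32, u32)) {
    ops.iter().fold(
        ((0, 0), (0, 0)),
        |((b0, b1), (c0, c1)), &(_, (ob0, ob1), (oc0, oc1))| {
            ((b0 + ob0, b1 + ob1), (c0 + oc0, c1 + oc1))
        },
    )
}

fn main() {
    let (old_site_bytes, old_site_cycles) = totals(&OLD_CALL_SITE);
    let (_, routine_cycles) = totals(&OLD_ROUTINE);
    let old_cycles = (
        old_site_cycles.0 + routine_cycles.0,
        old_site_cycles.1 + routine_cycles.1,
    );
    let (new_bytes, new_cycles) = totals(&NEW_MACRO);

    assert_eq!(old_site_bytes, (5, 6));
    assert_eq!(old_cycles, (54, 57));
    assert_eq!(new_bytes, (11, 12));
    assert_eq!(new_cycles, (23, 26));
    println!("old: {:?} cycles, new: {:?} cycles", old_cycles, new_cycles);
}
```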

Supports up to 16 destinations but can easily be extended; the largest in
use is the playstate branch, with 12 destinations.

Previous:
        lda dest                ; 3/4
        jsr switch_s_plus_2a    ; 6

switch_s_plus_2a:
        asl a                   ; 2
        tay                     ; 2
        iny                     ; 2
        pla                     ; 4
        sta switchTmp1          ; 3
        pla                     ; 4
        sta switchTmp2          ; 3
        lda (switchTmp1),y      ; 5/6
        tax                     ; 2
        iny                     ; 2
        lda (switchTmp1),y      ; 5/6
        sta switchTmp2          ; 3
        stx switchTmp1          ; 3
        jmp (switchTmp1)        ; 5
                                ; 54-57
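To make the old routine's pointer arithmetic concrete: jsr pushes the address of its own last byte, and asl a / tay / iny turn the index into 2*A + 1, so the routine reads a little-endian address starting just past the jsr. The Rust model below assumes the destination table sits inline right after the jsr (a layout inferred from the routine, not shown in these notes); all addresses are hypothetical.

```rust
/// Models where `switch_s_plus_2a` jumps, assuming the call site is
///     lda dest
///     jsr switch_s_plus_2a   ; 3 bytes starting at `jsr_addr`
///     .addr target0, target1, ...
fn switch_s_plus_2a(mem: &[u8], jsr_addr: u16, a: u8) -> u16 {
    // jsr pushes jsr_addr + 2 (its own last byte); the two
    // pla / sta pairs copy that into switchTmp1/switchTmp2.
    let pointer = jsr_addr.wrapping_add(2);
    // asl a / tay / iny: Y = 2*A + 1, so index 0 reads pointer + 1.
    // Y is 8-bit, which bounds how large the table can be.
    let mut y = (a << 1).wrapping_add(1);
    // lda (switchTmp1),y / tax: low byte of the destination
    let lo = mem[pointer.wrapping_add(y as u16) as usize];
    // iny / lda (switchTmp1),y: high byte of the destination
    y = y.wrapping_add(1);
    let hi = mem[pointer.wrapping_add(y as u16) as usize];
    // sta switchTmp2 / stx switchTmp1 / jmp (switchTmp1)
    u16::from_le_bytes([lo, hi])
}

fn main() {
    let mut mem = vec![0u8; 0x10000];
    let jsr_addr: u16 = 0x8000; // hypothetical call-site address
    mem[0x8000] = 0x20; // jsr opcode
    mem[0x8001] = 0x00; // hypothetical routine address $E000
    mem[0x8002] = 0xE0;
    let targets = [0x9000u16, 0x9123, 0xA0FF];
    for (i, &t) in targets.iter().enumerate() {
        let [lo, hi] = t.to_le_bytes();
        let entry = 0x8003 + 2 * i; // inline table right after the jsr
        mem[entry] = lo;
        mem[entry + 1] = hi;
    }
    for (i, &t) in targets.iter().enumerate() {
        assert_eq!(switch_s_plus_2a(&mem, jsr_addr, i as u8), t);
    }
    println!("inline table dispatch ok");
}
```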

New:
        ldx dest                ; 3/4
        lda hiBytes,x           ; 4/5
        pha                     ; 3
        lda loBytes,x           ; 4/5
        pha                     ; 3
        rts                     ; 6
                                ; 23-26
zohassadar (Collaborator, Author):

Forgot to mention: if the macro is padded with at least 4 nops, the test also passes.

Emits a warning when only a single destination is defined, but still
optimizes the branch to a plain jmp instruction.

For 2 to 4 destinations, the macro instead counts the X register down and uses
beq and jmp instructions, saving at least 7 and up to 18 cycles compared with
the rts method.
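The maximum cycle counts given for the 2-4 destination expansions can be reproduced with a small Rust model. It assumes the worst case those maximums appear to use: ldx dest in absolute addressing (4 cycles) and every taken beq crossing a page (4 cycles); an untaken beq and dex are 2 cycles each, and jmp is 3.

```rust
/// Cycles to reach destination `i` of an n-way chain (n = 2..=4):
///     ldx dest / beq addr0 / dex / beq addr1 / ... / jmp a(n-1)
/// Worst-case timings assumed: absolute `ldx`, page-crossing taken `beq`.
fn chain_cycles(n: u32, i: u32) -> u32 {
    assert!((2..=4).contains(&n) && i < n, "chain covers 2 to 4 destinations");
    const LDX_ABS: u32 = 4;
    const BEQ_NOT_TAKEN: u32 = 2;
    const BEQ_TAKEN_PAGE_CROSS: u32 = 4;
    const DEX: u32 = 2;
    const JMP: u32 = 3;
    if i == n - 1 {
        // Falls through all n-1 beq and n-2 dex, then the final jmp.
        LDX_ABS + (n - 1) * BEQ_NOT_TAKEN + (n - 2) * DEX + JMP
    } else {
        // i untaken beq and i dex, then the taken beq to addr_i and its jmp.
        LDX_ABS + i * (BEQ_NOT_TAKEN + DEX) + BEQ_TAKEN_PAGE_CROSS + JMP
    }
}

/// Returns (slowest destination index, its cycle count) for an n-way chain.
fn worst_case(n: u32) -> (u32, u32) {
    (0..n)
        .map(|i| (i, chain_cycles(n, i)))
        .max_by_key(|&(_, cycles)| cycles)
        .unwrap()
}

fn main() {
    assert_eq!(worst_case(2), (0, 11));
    assert_eq!(worst_case(3), (1, 15));
    assert_eq!(worst_case(4), (2, 19));
    // Against the rts method's 26-cycle maximum, the 4-way chain saves 7.
    assert_eq!(26 - worst_case(4).1, 7);
    println!("{:?}", (2..=4).map(worst_case).collect::<Vec<_>>());
}
```

Under these assumptions the slowest targets are a0, a1 and a2 for 2, 3 and 4 destinations, at 11, 15 and 19 cycles, which lines up with the maximums in the expansion list.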

ROM space is saved for 1-3 destinations. With 4 destinations, the chain uses
2 more bytes than the rts method.

How each option expands:

1 destination: 3 cycles
    jmp a0

2 destinations: maximum 11 cycles to a0
    ldx dest
    beq addr0
    jmp a1
addr0:
    jmp a0

3 destinations: maximum 15 cycles to a1
    ldx dest
    beq addr0
    dex
    beq addr1
    jmp a2
addr1:
    jmp a1
addr0:
    jmp a0

4 destinations: maximum 19 cycles to a2
    ldx dest
    beq addr0
    dex
    beq addr1
    dex
    beq addr2
    jmp a3
addr2:
    jmp a2
addr1:
    jmp a1
addr0:
    jmp a0

5+ destinations: 23-26 cycles
    ldx dest
    lda hiBytes,x
    pha
    lda loBytes,x
    pha
    rts
hiBytes:
    .byte >(a0-1),>(a1-1),>(a2-1),>(a3-1),>(a4-1)
loBytes:
    .byte <(a0-1),<(a1-1),<(a2-1),<(a3-1),<(a4-1)
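The selection rule in this commit (a plain jmp for one destination, an X-countdown chain for 2-4, rts tables for 5 or more) can be sketched as a Rust generator that emits ca65-style source matching the listing. This is a hypothetical model, not the macro itself; the tables it emits include the -1 offset that rts requires.

```rust
/// Hypothetical model of the branch macro's expansion rule.
/// `dest` is the index variable; `targets` are destination labels.
fn expand_branch(dest: &str, targets: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    let n = targets.len();
    match n {
        0 => panic!("at least one destination is required"),
        // The real macro also emits a warning in this case.
        1 => out.push(format!("    jmp {}", targets[0])),
        2..=4 => {
            out.push(format!("    ldx {}", dest));
            // Test X == 0, then count X down and test again.
            for i in 0..n - 1 {
                if i > 0 {
                    out.push("    dex".to_string());
                }
                out.push(format!("    beq addr{}", i));
            }
            // Fall-through case: the last destination.
            out.push(format!("    jmp {}", targets[n - 1]));
            // Branch landing pads, highest index first as in the listing.
            for i in (0..n - 1).rev() {
                out.push(format!("addr{}:", i));
                out.push(format!("    jmp {}", targets[i]));
            }
        }
        _ => {
            out.push(format!("    ldx {}", dest));
            for op in ["lda hiBytes,x", "pha", "lda loBytes,x", "pha", "rts"] {
                out.push(format!("    {}", op));
            }
            // Tables hold each destination minus 1, since rts adds 1.
            let hi: Vec<String> = targets.iter().map(|t| format!(">({}-1)", t)).collect();
            let lo: Vec<String> = targets.iter().map(|t| format!("<({}-1)", t)).collect();
            out.push("hiBytes:".to_string());
            out.push(format!("    .byte {}", hi.join(",")));
            out.push("loBytes:".to_string());
            out.push(format!("    .byte {}", lo.join(",")));
        }
    }
    out
}

fn main() {
    let two = expand_branch("dest", &["a0", "a1"]);
    assert_eq!(
        two,
        vec!["    ldx dest", "    beq addr0", "    jmp a1", "addr0:", "    jmp a0"]
    );
    for line in expand_branch("dest", &["a0", "a1", "a2", "a3", "a4"]) {
        println!("{}", line);
    }
}
```

Label names are copied from the listing; a real macro would need per-expansion labels (for example via ca65's .local inside the macro) so that several expansions can coexist in one file.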