
Optimize call result push to the stack #602

Open: wants to merge 3 commits into base: master


chfast (Collaborator) commented Oct 13, 2020

No description provided.

chfast requested review from axic and gumb0 on October 13, 2020 17:47
codecov bot commented Oct 13, 2020

Codecov Report

Merging #602 (dcd8e2f) into master (8482896) will decrease coverage by 0.00%.
The diff coverage is n/a.

❗ Current head dcd8e2f differs from the pull request's most recent head 90bd063. Consider uploading reports for commit 90bd063 to get more accurate results.

@@            Coverage Diff             @@
##           master     #602      +/-   ##
==========================================
- Coverage   99.28%   99.27%   -0.01%     
==========================================
  Files          86       86              
  Lines       13221    13221              
==========================================
- Hits        13126    13125       -1     
- Misses         95       96       +1     
Flag Coverage Δ
rust 98.58% <ø> (-0.10%) ⬇️
spectests 89.98% <ø> (ø)
unittests 99.21% <ø> (ø)
unittests-32 99.31% <ø> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
bindings/rust/src/lib.rs 98.85% <0.00%> (-0.10%) ⬇️
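The coverage percentages in the Coverage Diff follow directly from the hit and miss counts. A minimal Python sketch, assuming Codecov's usual definition of line coverage as hits / (hits + misses + partials); no partial lines are listed in this report, so partials default to 0:

```python
def coverage_percent(hits: int, misses: int, partials: int = 0) -> float:
    """Line coverage as a percentage: hits / (hits + misses + partials)."""
    tracked = hits + misses + partials
    if tracked == 0:
        raise ValueError("no tracked lines")
    return 100.0 * hits / tracked


# Counts from the Coverage Diff above (13221 lines in both columns).
before = coverage_percent(hits=13126, misses=95)
after = coverage_percent(hits=13125, misses=96)
print(f"{before:.2f}% -> {after:.2f}% ({after - before:+.2f}%)")
```

A single line moving from hit to miss shifts coverage by about 0.0076 percentage points, which rounds to the -0.01% shown in the diff.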

chfast (Collaborator, Author) commented Oct 13, 2020

Clang 11

Benchmark                                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
fizzy/execute/blake2b/512_bytes_rounds_1_mean                     -0.0471         -0.0471            86            82            86            82
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    -0.0561         -0.0561          1306          1232          1306          1232
fizzy/execute/ecpairing/onepoint_mean                             +0.0807         +0.0807        401835        434259        401839        434263
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   -0.0354         -0.0354           101            98           101            98
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  -0.0459         -0.0459          1488          1420          1488          1420
fizzy/execute/memset/256_bytes_mean                               -0.0019         -0.0019             7             7             7             7
fizzy/execute/memset/60000_bytes_mean                             +0.0040         +0.0040          1580          1586          1580          1586
fizzy/execute/mul256_opt0/input1_mean                             -0.0336         -0.0336            29            28            29            28
fizzy/execute/ramanujan_pi/33_runs_mean                           +0.0016         +0.0016           115           115           115           115
fizzy/execute/sha1/512_bytes_rounds_1_mean                        -0.0011         -0.0011            90            90            90            90
fizzy/execute/sha1/512_bytes_rounds_16_mean                       -0.0034         -0.0034          1253          1248          1253          1248
fizzy/execute/sha256/512_bytes_rounds_1_mean                      +0.0524         +0.0523            84            88            84            88
fizzy/execute/sha256/512_bytes_rounds_16_mean                     +0.0542         +0.0542          1149          1211          1149          1211
fizzy/execute/taylor_pi/pi_1000000_runs_mean                      -0.0017         -0.0017         42370         42300         42371         42301
fizzy/execute/micro/eli_interpreter/exec105_mean                  -0.0080         -0.0080             5             5             5             5
fizzy/execute/micro/factorial/20_mean                             +0.0012         +0.0012             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                             -0.0055         -0.0055          5425          5395          5425          5395
fizzy/execute/micro/host_adler32/1_mean                           -0.0024         -0.0024             0             0             0             0
fizzy/execute/micro/host_adler32/1000_mean                        +0.0091         +0.0091            35            36            35            36
fizzy/execute/micro/spinner/1_mean                                +0.0239         +0.0239             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             +0.0002         +0.0002            10            10            10            10

GCC 10

Benchmark                                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
fizzy/execute/blake2b/512_bytes_rounds_1_mean                     -0.0468         -0.0468            87            83            87            83
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    -0.0500         -0.0500          1316          1250          1316          1250
fizzy/execute/ecpairing/onepoint_mean                             -0.0422         -0.0422        410735        393403        410738        393405
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   -0.0474         -0.0474           101            97           101            97
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  -0.0620         -0.0620          1503          1410          1503          1410
fizzy/execute/memset/256_bytes_mean                               -0.0594         -0.0594             7             7             7             7
fizzy/execute/memset/60000_bytes_mean                             -0.0638         -0.0638          1599          1497          1599          1497
fizzy/execute/mul256_opt0/input1_mean                             -0.0595         -0.0595            29            27            29            27
fizzy/execute/ramanujan_pi/33_runs_mean                           -0.0391         -0.0391           131           126           131           126
fizzy/execute/sha1/512_bytes_rounds_1_mean                        -0.0450         -0.0450            94            90            94            90
fizzy/execute/sha1/512_bytes_rounds_16_mean                       -0.0465         -0.0465          1314          1253          1314          1253
fizzy/execute/sha256/512_bytes_rounds_1_mean                      -0.1120         -0.1120           100            88           100            88
fizzy/execute/sha256/512_bytes_rounds_16_mean                     -0.1107         -0.1107          1367          1216          1367          1216
fizzy/execute/taylor_pi/pi_1000000_runs_mean                      -0.0119         -0.0119         40540         40059         40541         40059
fizzy/execute/micro/eli_interpreter/exec105_mean                  -0.0232         -0.0232             5             5             5             5
fizzy/execute/micro/factorial/20_mean                             -0.0319         -0.0319             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                             -0.0041         -0.0041          5230          5208          5230          5208
fizzy/execute/micro/host_adler32/1_mean                           +0.0009         +0.0009             0             0             0             0
fizzy/execute/micro/host_adler32/1000_mean                        +0.0113         +0.0113            29            30            29            30
fizzy/execute/micro/spinner/1_mean                                -0.0549         -0.0549             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             -0.0260         -0.0260            10            10            10            10
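In these tables the first two numeric columns are relative changes in wall-clock and CPU time, and the last four are the old and new mean times. Assuming the output comes from Google Benchmark's tools/compare.py (the thread does not say so, but the column layout matches), the change is (new - old) / |old|, so negative values are speedups. A minimal sketch recomputing a few GCC 10 rows; the zero-baseline guard is a simplification of ours, not necessarily what the tool does:

```python
def relative_change(old: float, new: float) -> float:
    """Relative change (new - old) / |old|; negative means the new build is faster."""
    if old == 0:
        # Simplified guard (an assumption); the real tool may handle a zero baseline differently.
        if new == 0:
            return 0.0
        raise ValueError("relative change undefined for a zero baseline")
    return (new - old) / abs(old)


# Printed means from the GCC 10 table (Time Old, Time New), already rounded to integers.
rows = {
    "sha256/512_bytes_rounds_16": (1367, 1216),  # table shows -0.1107
    "ecpairing/onepoint": (410735, 393403),      # table shows -0.0422
    "host_adler32/1000": (29, 30),               # table shows +0.0113
}
for name, (old, new) in rows.items():
    print(f"{name:<28} {relative_change(old, new):+.4f}")
```

Because the printed times are rounded, recomputing from them only approximates the table. For long benchmarks the match is close, but host_adler32/1000 (29 to 30) recomputes to +0.0345 against the printed +0.0113, and rows such as host_adler32/1 report a change even though both times print as 0.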

chfast added the optimization (Performance optimization) label on Oct 13, 2020
axic (Member) commented Oct 13, 2020

I'd merge #554 first / instead of this.

chfast (Collaborator, Author) commented Oct 13, 2020

> I'd merge #554 first / instead of this.

This is the only change in #554 that has any effect; the compiler's optimizer already does the rest.

chfast mentioned this pull request on Oct 20, 2020
Review comment on lib/fizzy/execute.cpp (outdated, resolved)
axic (Member) commented Oct 20, 2020

>> I'd merge #554 first / instead of this.
>
> This is the only change in #554 that has any effect; the compiler's optimizer already does the rest.

I like #554 because it makes it apparent that these are two different code paths, which we should optimise separately. We can wait for the std::function work first, though.

chfast (Collaborator, Author) commented Oct 20, 2020

> I like #554 because it makes it apparent that these are two different code paths, which we should optimise separately. We can wait for the std::function work first, though.

Inlining gives you the same effect. Commit 11a3ee4 does not change the generated assembly at all.

axic (Member) commented Oct 20, 2020

>> I like #554 because it makes it apparent that these are two different code paths, which we should optimise separately. We can wait for the std::function work first, though.
>
> Inlining gives you the same effect. Commit 11a3ee4 does not change the generated assembly at all.

I totally understand that, and I didn't mean it would change anything right now, but it makes it apparent to us, when reading the code, that the two paths may diverge.

axic (Member) commented Nov 6, 2020

The question about #554 is moot now, since #616 was merged.

axic mentioned this pull request on Nov 6, 2020

axic (Member) left a comment


Would need to squash prior to merging.

chfast (Collaborator, Author) commented Nov 6, 2020

Benchmark                                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
fizzy/execute/blake2b/512_bytes_rounds_1_mean                     -0.0009         -0.0009           221           221           221           221
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    +0.0059         +0.0058          3357          3377          3356          3376
fizzy/execute/ecpairing/onepoint_mean                             +0.0274         +0.0276       1172130       1204208       1171467       1203788
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   +0.0330         +0.0327           265           273           265           273
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  +0.0341         +0.0340          3901          4034          3901          4033
fizzy/execute/memset/256_bytes_mean                               +0.0473         +0.0470            20            21            20            21
fizzy/execute/memset/60000_bytes_mean                             +0.0501         +0.0501          4407          4628          4407          4627
fizzy/execute/mul256_opt0/input1_mean                             -0.0678         -0.0679            92            86            92            86
fizzy/execute/ramanujan_pi/33_runs_mean                           +0.0012         +0.0012           438           439           438           439
fizzy/execute/sha1/512_bytes_rounds_1_mean                        +0.0697         +0.0700           237           254           237           254
fizzy/execute/sha1/512_bytes_rounds_16_mean                       +0.0700         +0.0700          3308          3540          3308          3539
fizzy/execute/sha256/512_bytes_rounds_1_mean                      +0.1390         +0.1388           244           278           244           278
fizzy/execute/sha256/512_bytes_rounds_16_mean                     +0.1457         +0.1457          3369          3859          3368          3859
fizzy/execute/taylor_pi/pi_1000000_runs_mean                      -0.1457         -0.1457        118667        101382        118653        101369
fizzy/execute/micro/eli_interpreter/exec105_mean                  -0.0354         -0.0355            13            12            13            12
fizzy/execute/micro/factorial/20_mean                             -0.0095         -0.0098             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                             -0.1910         -0.1911         13832         11191         13831         11189
fizzy/execute/micro/host_adler32/1_mean                           +0.3003         +0.3002             0             0             0             0
fizzy/execute/micro/host_adler32/1000_mean                        +0.0337         +0.0335            58            60            58            60
fizzy/execute/micro/icall_hash/1000_steps_mean                    +0.0122         +0.0122           124           125           124           125
fizzy/execute/micro/spinner/1_mean                                +0.0286         +0.0283             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             +0.0343         +0.0347            17            18            17            18

axic force-pushed the optimize_call_result_push branch from c0abd7b to 71bcab2 on May 21, 2022 10:52
axic (Member) commented May 21, 2022

Rebased.

axic requested a review from gumb0 on May 21, 2022 10:52
Review comment on lib/fizzy/execute.cpp (outdated, resolved)
axic force-pushed the optimize_call_result_push branch from 71bcab2 to c7d1c71 on May 21, 2022 12:16
axic (Member) commented May 21, 2022

@chfast, merge or not?

axic force-pushed the optimize_call_result_push branch from c7d1c71 to 62311e8 on May 23, 2022 08:22
chfast force-pushed the optimize_call_result_push branch from 62311e8 to 7dd0274 on May 23, 2022 21:35
chfast (Collaborator, Author) commented May 23, 2022

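This is no good: with GCC 12 and Clang 14 most benchmarks get slower (the Clang 14 overall geometric mean is +6.6%).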

GCC 12

Benchmark                                                             Time             CPU      Time Old      Time New       CPU Old       CPU New
fizzy/execute/blake2b/512_bytes_rounds_1_mean                      +0.2228         +0.2228            67            82            67            82
fizzy/execute/blake2b/512_bytes_rounds_16_mean                     +0.2179         +0.2179          1011          1231          1011          1231
fizzy/execute/ecpairing/onepoint_mean                              +0.1895         +0.1895        327477        389546        327477        389545
fizzy/execute/keccak256/512_bytes_rounds_1_mean                    +0.2082         +0.2082            76            92            76            92
fizzy/execute/keccak256/512_bytes_rounds_16_mean                   +0.1316         +0.1316          1105          1250          1105          1250
fizzy/execute/memset/256_bytes_mean                                +0.0068         +0.0068             6             6             6             6
fizzy/execute/memset/60000_bytes_mean                              +0.0027         +0.0027          1279          1282          1279          1282
fizzy/execute/mul256_opt0/input1_mean                              -0.0010         -0.0010            25            25            25            25
fizzy/execute/ramanujan_pi/33_runs_mean                            +0.0076         +0.0076            98            99            98            99
fizzy/execute/sha1/512_bytes_rounds_1_mean                         +0.0285         +0.0286            74            76            74            76
fizzy/execute/sha1/512_bytes_rounds_16_mean                        +0.0314         +0.0314          1033          1065          1033          1065
fizzy/execute/sha256/512_bytes_rounds_1_mean                       +0.0059         +0.0059            73            73            73            73
fizzy/execute/sha256/512_bytes_rounds_16_mean                      +0.0126         +0.0126          1000          1012          1000          1012
fizzy/execute/taylor_pi/pi_1000000_runs_mean                       -0.0002         -0.0002         38223         38214         38223         38215
fizzy/execute/micro/eli_interpreter/exec105_mean                   +0.1523         +0.1523             4             5             4             5
fizzy/execute/micro/factorial/20_mean                              +0.0109         +0.0109             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                              +0.0437         +0.0437          4378          4570          4378          4570
fizzy/execute/micro/host_adler32/1_mean                            -0.0127         -0.0127             0             0             0             0
fizzy/execute/micro/host_adler32/1000_mean                         -0.0102         -0.0102            26            26            26            26
fizzy/execute/micro/icall_hash/1000_steps_mean                     +0.0098         +0.0098            60            61            60            61
fizzy/execute/micro/spinner/1_mean                                 +0.0789         +0.0789             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                              +0.0441         +0.0441             9             9             9             9

Clang 14

Benchmark                                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
fizzy/execute/blake2b/512_bytes_rounds_1_mean                     +0.0447         +0.0447            68            72            68            72
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    +0.0405         +0.0405          1035          1077          1035          1077
fizzy/execute/ecpairing/onepoint_mean                             +0.0598         +0.0598        337338        357516        337340        357517
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   +0.0868         +0.0869            82            89            82            89
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  +0.0960         +0.0960          1191          1305          1191          1305
fizzy/execute/memset/256_bytes_mean                               +0.0337         +0.0337             6             6             6             6
fizzy/execute/memset/60000_bytes_mean                             +0.0349         +0.0349          1258          1302          1258          1302
fizzy/execute/mul256_opt0/input1_mean                             +0.0013         +0.0013            23            23            23            23
fizzy/execute/ramanujan_pi/33_runs_mean                           +0.0665         +0.0665            95           102            95           102
fizzy/execute/sha1/512_bytes_rounds_1_mean                        +0.0486         +0.0486            74            78            74            78
fizzy/execute/sha1/512_bytes_rounds_16_mean                       +0.0449         +0.0449          1033          1079          1033          1079
fizzy/execute/sha256/512_bytes_rounds_1_mean                      +0.3413         +0.3413            71            95            71            95
fizzy/execute/sha256/512_bytes_rounds_16_mean                     +0.4002         +0.4002           976          1366           976          1366
fizzy/execute/taylor_pi/pi_1000000_runs_mean                      +0.0045         +0.0045         37441         37610         37442         37611
fizzy/execute/micro/eli_interpreter/exec105_mean                  +0.0661         +0.0661             4             4             4             4
fizzy/execute/micro/factorial/20_mean                             -0.0245         -0.0245             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                             +0.0001         +0.0001          4939          4940          4939          4940
fizzy/execute/micro/host_adler32/1_mean                           +0.0037         +0.0037             0             0             0             0
fizzy/execute/micro/host_adler32/1000_mean                        +0.0241         +0.0241            28            29            28            29
fizzy/execute/micro/icall_hash/1000_steps_mean                    +0.0531         +0.0531            64            67            64            67
fizzy/execute/micro/spinner/1_mean                                +0.0531         +0.0531             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             +0.0676         +0.0676             8             8             8             8
OVERALL_GEOMEAN                                                   +0.0661         +0.0661             0             0             0             0
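The OVERALL_GEOMEAN row condenses the whole run into a single number. For a fixed set of benchmarks, the ratio of the two geometric means equals the geometric mean of the per-benchmark ratios new/old, so the overall change can be sketched as below. This is an illustration, not the tool's code: compare.py presumably takes the geomeans over its raw timings (likely in seconds, which would explain why the row's time columns print as 0), so recomputing from the rounded means above only approximates +0.0661.

```python
import math
from typing import Sequence


def geomean_change(old: Sequence[float], new: Sequence[float]) -> float:
    """Geometric mean of the per-benchmark ratios new/old, minus 1."""
    if len(old) != len(new) or not old:
        raise ValueError("need two non-empty sequences of equal length")
    if any(t <= 0 for t in (*old, *new)):
        raise ValueError("times must be positive to take logarithms")
    # Average the log-ratios, then exponentiate: exp(mean(log(new_i / old_i))).
    mean_log_ratio = sum(math.log(n / o) for o, n in zip(old, new)) / len(old)
    return math.exp(mean_log_ratio) - 1.0


# Hypothetical two-benchmark run: one 10% slower, one 10% faster.
# The geometric mean of the ratios 1.1 and 0.9 is sqrt(0.99), slightly below 1.
print(f"{geomean_change([100.0, 100.0], [110.0, 90.0]):+.4f}")
```

Using a geometric mean means a 2x slowdown and a 2x speedup cancel exactly, whereas averaging the relative changes (+1.0 and -0.5) would report a net slowdown.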

chfast force-pushed the optimize_call_result_push branch from 7dd0274 to 90bd063 on May 24, 2022 12:37
Labels: optimization (Performance optimization)
3 participants