feat: set of fixes and improvements for vega-market-sim #471

daniel1302 · 2023-07-29T13:01:27Z

Description

I really need this pipeline to test my nullchain changes in the core.

Changes

Added a generic function to the tools.retry function that retries the given code and returns the result or throws an exception when all of the attempts fail.
Added support for the following parameters: PARALLEL_WORKERS, TEST_FUNCTION, and LOG_LEVEL - They allow driving the integration tests script. Jenkins now has support for it - see the related PR.
Added support for the parallel run of the scenarios
- There is a pytest-dist library that has been already included in the poetry.lock. I have tested those PRs against that run as well
- There is a limitation when we use parallel run of the scenarios, where pytest does not print logs to the STDOUT anymore. I have added conftest.py, and some params in the pytest.ini. Those settings enable writing test logs into the file. Now files are saved in the `Jenkins artifacts (you can confirm it here: https://jenkins.ops.vega.xyz/job/common/job/vega-market-sim/2607/artifact/test_logs/ ) - one file per worker.
Disabled the test_rl_run test for the integration run. It takes a very long time to run this single scenario. There is a following issue here(Re-enable the test_rl_run scenario #472) to run it
The biggest problem was a huge memory leak that caused us to need 64GB of RAM on the server to fit the run. Now we need 16 GB max, with 10 parallel runs the pipeline uses 12 GB. On docker, it could work(I am not sure if it was) because docker manages fork/workers differently. When We disabled docker it started causing huge issues. I have implemented a fix, but this fix is not ideal. I have left a comment block that describes limitations. Here is a follow-up issue to fix it properly: TBD...
Changed sleep time, there is no reason to sleep 0.0001s and send up to 1000 requests per second. New sleep is 0.05 and time increases exponentially. Moreover, I have also decreased the number of tries - see details below.
Increased time for waiting for the proposal from 0.8 sec to about 2.8 sec. The timeout formula is now 0.05 * 1.1**i

Sleep Changes

The old sleep formula was: [0.0005 * 1.01**attempt for attempt in range(625)]
The new formula is: [0.05 * 1.03**attempt for attempt in range(100)]

Max wait:

for new formula: sum([0.05 * 1.03**i for i in range(100)]) == 30.364386634760475 - 30.4 sec.
for old formula: sum([0.0005 * 1.01**i for i in range(625)]) == 25.05895385147325 - 25.1 sec

The difference is that instead of 625 we send max 100 requests.

When we were sleeping 0.0005 seconds it was sending about 250 requests, (sum([0.0005 * 1.01**i for i in range(260)]) - 0.6145492739027707) sometimes more - this is my observation
Now it is sending usually 1 request, in some scenarios 3-to 4 requests.

Because we were sending thousands of requests without reason we saw a lot of RPC Connection issues.

Testing

Breaking Changes

No breaking changes.

Closes

close vegaprotocol/devops-infra#1531 .

Related PRs:

vega_sim/api/helpers.py

tests/integration/utils/fixtures.py

TomMcL

LGTM, need to reverse engineer how this is fixing datanode still, but I'll take a working node!

daniel1302 added 2 commits July 28, 2023 17:12

feat: add retry logic

e94bae8

feat: add retry logic for queue forwarder

7f453c5

daniel1302 mentioned this pull request Jul 29, 2023

feat: improvements for vega-market-sim job vegaprotocol/jenkins-shared-library#571

Merged

feat: fix the retain_log_files

61e4a58

daniel1302 force-pushed the improve-market-sim branch from 80ea56d to 61e4a58 Compare July 29, 2023 13:33

feat: add retry to kill containers

1de5fa9

daniel1302 force-pushed the improve-market-sim branch from 2f72d4c to 1de5fa9 Compare July 29, 2023 16:15

daniel1302 mentioned this pull request Jul 29, 2023

Re-enable the test_rl_run scenario #472

Open

daniel1302 commented Jul 29, 2023

View reviewed changes

vega_sim/api/helpers.py Show resolved Hide resolved

feat: small changes after self peer review

ca5be5d

daniel1302 force-pushed the improve-market-sim branch from 61eab43 to ca5be5d Compare July 29, 2023 17:32

daniel1302 marked this pull request as ready for review July 29, 2023 17:35

daniel1302 requested a review from a team as a code owner July 29, 2023 17:35

fix: linting issues

fe42db9

daniel1302 force-pushed the improve-market-sim branch from 07b71fd to fe42db9 Compare July 29, 2023 17:40

daniel1302 mentioned this pull request Jul 29, 2023

feat: nullchain changes for snapshot-compatibility pipeline vegaprotocol/vega#8887

Merged

TomMcL reviewed Jul 31, 2023

View reviewed changes

tests/integration/utils/fixtures.py Show resolved Hide resolved

chore: Minor tidy up

d1bd113

TomMcL approved these changes Jul 31, 2023

View reviewed changes

TomMcL mentioned this pull request Jul 31, 2023

Investigate what fixes datanode #475

Closed

TomMcL merged commit bfc8439 into develop Jul 31, 2023

TomMcL deleted the improve-market-sim branch July 31, 2023 10:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: set of fixes and improvements for vega-market-sim #471

feat: set of fixes and improvements for vega-market-sim #471

daniel1302 commented Jul 29, 2023 •

edited

Loading

TomMcL left a comment

feat: set of fixes and improvements for vega-market-sim #471

feat: set of fixes and improvements for vega-market-sim #471

Conversation

daniel1302 commented Jul 29, 2023 • edited Loading

Description

Description

Changes

Sleep Changes

Testing

Breaking Changes

Closes

Related PRs:

TomMcL left a comment

Choose a reason for hiding this comment

daniel1302 commented Jul 29, 2023 •

edited

Loading