Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: set of fixes and improvements for vega-market-sim #471

Merged
merged 7 commits into from
Jul 31, 2023

Conversation

daniel1302
Copy link
Contributor

@daniel1302 daniel1302 commented Jul 29, 2023

Description

Description

I really need this pipeline to test my nullchain changes in the core.

Changes

  • Added a generic function to the tools.retry function that retries the given code and returns the result or throws an exception when all of the attempts fail.
  • Added support for the following parameters: PARALLEL_WORKERS, TEST_FUNCTION, and LOG_LEVEL - They allow driving the integration tests script. Jenkins now has support for it - see the related PR.
  • Added support for the parallel run of the scenarios
    • There is a pytest-dist library that has been already included in the poetry.lock. I have tested those PRs against that run as well
    • There is a limitation when we use parallel run of the scenarios, where pytest does not print logs to the STDOUT anymore. I have added conftest.py, and some params in the pytest.ini. Those settings enable writing test logs into the file. Now files are saved in the `Jenkins artifacts (you can confirm it here: https://jenkins.ops.vega.xyz/job/common/job/vega-market-sim/2607/artifact/test_logs/ ) - one file per worker.
  • Disabled the test_rl_run test for the integration run. It takes a very long time to run this single scenario. There is a following issue here(Re-enable the test_rl_run scenario #472) to run it
  • The biggest problem was a huge memory leak that caused us to need 64GB of RAM on the server to fit the run. Now we need 16 GB max, with 10 parallel runs the pipeline uses 12 GB. On docker, it could work(I am not sure if it was) because docker manages fork/workers differently. When We disabled docker it started causing huge issues. I have implemented a fix, but this fix is not ideal. I have left a comment block that describes limitations. Here is a follow-up issue to fix it properly: TBD...
  • Changed sleep time, there is no reason to sleep 0.0001s and send up to 1000 requests per second. New sleep is 0.05 and time increases exponentially. Moreover, I have also decreased the number of tries - see details below.
  • Increased time for waiting for the proposal from 0.8 sec to about 2.8 sec. The timeout formula is now 0.05 * 1.1**i

Sleep Changes

  • The old sleep formula was: [0.0005 * 1.01**attempt for attempt in range(625)]
  • The new formula is: [0.05 * 1.03**attempt for attempt in range(100)]

Max wait:

  • for new formula: sum([0.05 * 1.03**i for i in range(100)]) == 30.364386634760475 - 30.4 sec.
  • for old formula: sum([0.0005 * 1.01**i for i in range(625)]) == 25.05895385147325 - 25.1 sec

The difference is that instead of 625 we send max 100 requests.

  • When we were sleeping 0.0005 seconds it was sending about 250 requests, (sum([0.0005 * 1.01**i for i in range(260)]) - 0.6145492739027707) sometimes more - this is my observation
  • Now it is sending usually 1 request, in some scenarios 3-to 4 requests.

Because we were sending thousands of requests without reason we saw a lot of RPC Connection issues.

Testing

image

Breaking Changes

No breaking changes.

Closes

close vegaprotocol/devops-infra#1531 .

Related PRs:

@daniel1302 daniel1302 force-pushed the improve-market-sim branch from 80ea56d to 61e4a58 Compare July 29, 2023 13:33
@daniel1302 daniel1302 force-pushed the improve-market-sim branch from 61eab43 to ca5be5d Compare July 29, 2023 17:32
@daniel1302 daniel1302 marked this pull request as ready for review July 29, 2023 17:35
@daniel1302 daniel1302 requested a review from a team as a code owner July 29, 2023 17:35
Copy link
Contributor

@TomMcL TomMcL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, need to reverse engineer how this is fixing datanode still, but I'll take a working node!

@TomMcL TomMcL merged commit bfc8439 into develop Jul 31, 2023
@TomMcL TomMcL deleted the improve-market-sim branch July 31, 2023 10:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants