Skaled on the lagging node waits only 6 seconds to receive the batch of block #830

Open
oleksandrSydorenkoJ opened this issue Mar 4, 2024 · 3 comments
Labels
bug Something isn't working epic:archive-node

oleksandrSydorenkoJ commented Mar 4, 2024

Describe the bug
Related to skalenetwork/skaled#1669
When an archival node spins up, startup always begins with catchup. On a chain with a large number of blocks, the sending node may take around 30 seconds to build the binary block batch. However, skaled on the archival node (the receiver) expects to receive the blocks within just 6 seconds.
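
The mismatch described above can be reproduced in miniature. The following is a minimal sketch, not skaled's actual code: the function name `try_catchup`, the 8-byte length header, and the payload are all hypothetical, and the sender's batch-building time is modelled as a plain sleep. Times are scaled down by a factor of 100 (30 s becomes 0.30 s, 6 s becomes 0.06 s, 2 minutes becomes 1.20 s).

```python
import socket
import struct
import threading
import time


def try_catchup(build_seconds: float, read_timeout_seconds: float) -> bool:
    """Return True if the receiver gets the batch header before its read timeout."""
    receiver, sender = socket.socketpair()

    def serve() -> None:
        # Hypothetical sender: "building the batch" is modelled as a sleep.
        time.sleep(build_seconds)
        payload = b"blocks"
        try:
            # Assumed wire format: 8-byte big-endian length, then the payload.
            sender.sendall(struct.pack("!Q", len(payload)) + payload)
        except OSError:
            # The receiver gave up and closed the connection first.
            pass
        finally:
            sender.close()

    worker = threading.Thread(target=serve)
    worker.start()
    receiver.settimeout(read_timeout_seconds)
    try:
        header = receiver.recv(8)
        return len(header) == 8
    except socket.timeout:
        # Corresponds to the "Peer read timeout" seen by the receiver.
        return False
    finally:
        receiver.close()
        worker.join()


print(try_catchup(0.30, 0.06))  # short timeout: receiver gives up first
print(try_catchup(0.30, 1.20))  # longer timeout: header arrives in time
```

With the short timeout the receiver closes the connection while the sender is still building the batch, so the sender's later write fails, which matches the pair of symptoms reported on the two nodes.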

Versions:
skaled:3.18.0

Environment:
Active MEDIUM-type schain with at least 1 million blocks (almost no load, about 20k transactions in total)
debug-behavior-apis enabled
log-level: DEBUG
Archival node with a whitelisted IP for the chain and debug log level

To Reproduce

  1. Init the archival node
     skale sync-node init --archive --catchup --historic-state init-env
  2. Check the skaled logs on the archival node
  3. Check the skaled logs on the core node where the archival node requested the batch of blocks

Expected behavior
Skaled should wait for the default 2 minutes to receive a large block batch.

Actual state:
Skaled on the archival node (receiver) waits only 6 seconds, while the core node (sender) hangs for about 30 seconds serializing the binary batch of blocks.

Logs:
The archival node:

[2024-03-04 15:25:41.425] [16:main] [error] 0:!Exception: CatchupClientAgent:Catchupc step 2: can not read catchup response
[2024-03-04 15:25:41.426] [16:main] [error] 0: !Caused by: nlohmann:Read catchup response:Could not read header len from:1.1.1.1
[2024-03-04 15:25:41.426] [16:main] [error] 0:  !Caused by: IO:Peer read timeout
[2024-03-04 15:25:52.689] [16:main] [error] 0:!Exception: CatchupClientAgent:Catchupc step 2: can not read catchup response
[2024-03-04 15:25:52.689] [16:main] [error] 0: !Caused by: nlohmann:Read catchup response:Could not read header len from:2.2.2.2
[2024-03-04 15:25:52.689] [16:main] [error] 0:  !Caused by: IO:Peer read timeout
[2024-03-04 15:26:03.697] [16:main] [error] 0:!Exception: CatchupClientAgent:Catchupc step 2: can not read catchup response
[2024-03-04 15:26:03.697] [16:main] [error] 0: !Caused by: nlohmann:Read catchup response:Could not read header len from:3.3.3.3
[2024-03-04 15:26:03.697] [16:main] [error] 0:  !Caused by: IO:Peer read timeout
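
As a quick check on the log above, the spacing between consecutive failures can be computed from the timestamps. This is a small sketch using only the three timestamps quoted in the log; the helper name `gaps_in_seconds` is hypothetical. Note that each gap covers the read timeout plus whatever else happens between attempts (for example, connecting to the next peer), so it is only an upper bound on the timeout itself.

```python
from datetime import datetime

# Timestamps of the three "can not read catchup response" errors quoted above.
error_times = [
    "2024-03-04 15:25:41.425",
    "2024-03-04 15:25:52.689",
    "2024-03-04 15:26:03.697",
]


def gaps_in_seconds(stamps):
    """Return the time between consecutive timestamps, in seconds."""
    parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in stamps]
    return [round((b - a).total_seconds(), 3) for a, b in zip(parsed, parsed[1:])]


gaps = gaps_in_seconds(error_times)
print(gaps)  # [11.264, 11.008]
```

Each attempt fails roughly 11 seconds after the previous one, far short of the 2-minute wait described under expected behavior.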

The core node:
3_18_0_skaled_core_prepare_serialized_batch.txt

@oleksandrSydorenkoJ oleksandrSydorenkoJ added the bug Something isn't working label Mar 4, 2024
@DmytroNazarenko DmytroNazarenko added this to the SKALE 2.5 milestone Mar 4, 2024
@oleksandrSydorenkoJ oleksandrSydorenkoJ changed the title Skaled on the lagging node waits only 10 seconds to receive the batch of block Skaled on the lagging node waits only 6 seconds to receive the batch of block Mar 5, 2024

PolinaKiporenko commented Mar 5, 2024

@oleksandrSydorenkoJ please check on 3.17.1 version

oleksandrSydorenkoJ (author) commented

The same result on 3.17.1:

[2024-03-05 12:14:14.354] [config] [warning] Node:21:Thread:140545280173824:ptr<std::createBlockCatchupResponse has been stuck for 24211 ms
[2024-03-05 12:14:15.354] [config] [warning] Node:21:Thread:140545280173824:CatchupServerAgent::processNextAvailableConnection has been stuck for 25264 ms
[2024-03-05 12:14:15.354] [config] [warning] Node:21:Thread:140545280173824:ptr<std::createBlockCatchupResponse has been stuck for 25211 ms
[2024-03-05 12:14:15.668] [21:main] [info] 1069569:RETURNED_CATCHUP_BLOCKS:15062:CRT:25525
[2024-03-05 12:14:15.726] [21:main] [error] 1069569:!Exception: CatchupServerAgent:Could not send serialized binary
[2024-03-05 12:14:15.726] [21:main] [error] 1069569: !Caused by: IO:Destination unexpectedly closed connection
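
To put the sender-side numbers next to the receiver's timeout, the "stuck for N ms" values can be pulled out of the warnings above. This is a rough sketch: the constant `RECEIVER_TIMEOUT_MS` is a hypothetical name, and its value of 6000 ms is taken from the 6 seconds stated in the issue title, not from the skaled source.

```python
import re

# The three watchdog warnings quoted in the core node log above.
core_log = """\
[2024-03-05 12:14:14.354] [config] [warning] Node:21:Thread:140545280173824:ptr<std::createBlockCatchupResponse has been stuck for 24211 ms
[2024-03-05 12:14:15.354] [config] [warning] Node:21:Thread:140545280173824:CatchupServerAgent::processNextAvailableConnection has been stuck for 25264 ms
[2024-03-05 12:14:15.354] [config] [warning] Node:21:Thread:140545280173824:ptr<std::createBlockCatchupResponse has been stuck for 25211 ms
"""

# Assumption: the receiver's read timeout is the 6 seconds from the issue title.
RECEIVER_TIMEOUT_MS = 6000

stuck_ms = [int(ms) for ms in re.findall(r"has been stuck for (\d+) ms", core_log)]
longest_ms = max(stuck_ms)
exceeds_timeout = longest_ms > RECEIVER_TIMEOUT_MS

print(longest_ms)                                  # 25264
print(round(longest_ms / RECEIVER_TIMEOUT_MS, 1))  # 4.2
print(exceeds_timeout)                             # True
```

The sender was still busy more than four times longer than the receiver was willing to wait, which fits the "Destination unexpectedly closed connection" error that follows once the batch is finally ready.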


PolinaKiporenko commented Mar 5, 2024

To unblock QA, prepare a custom build with the timeout increased to 8.
