Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] Selection of transition ID in finalize. #2596

Merged
merged 17 commits into from
Jan 17, 2025
Merged

Conversation

d0cd
Copy link
Collaborator

@d0cd d0cd commented Jan 16, 2025

Motivation

The motivating issue for this PR is here.

The original PR is here.

From the original discussion:

Fundamentally, while the call graph is constructed correctly, it is used in an unsound way to find the transition ID that corresponds to Future that is passed in as an operand to an await command (source). The child_transition_id linked above is used to initialize the FinalizeRegisters. The transition_id in FinalizeRegisters is only used to initialize an RNG in the rand instructions. The purpose is the ensure that on-chain logic (in finalize scopes) have a unique seed each time they are invoked.

The current implementation has the following issues:

  1. In the case where an async function calls a mix of async and non-async functions, a non-existent transition is used in the call graph. Specifically it must an async function call that itself makes an async function call that precedes a non-async call. This results in the transaction getting rejected. In this case, the impacted programs are ones that use a specific mix of async and non-async calls.
  2. In the case where only async calls are made, but are awaited in a different order than the call graph, an incorrect transition ID is used for the FinalizeRegisters. The only place where the transition ID is used is in the rand.chacha instruction. In this case, the impacted programs are ones that make async calls to programs with rand.chacha but await the futures in a different order.

There are a number of possible ways this could be addressed:

  1. Update the implementation to get the correct transition ID. This would require more information to be passed in during finalization.
  2. Add a transition ID to the Future. This would be a breaking change in the data format, but conceptually sound.
  3. Fix the use of the transition ID to not require the precise sub-transition, while maintaining the invariant that RNG seeds are unique.

This PR proposes a solution to this issue by introducing a nonce to FinalizeRegisters. This nonce is used to seed the rand commands. Furthermore, instead of attempting to determine the child_transition_id that corresponds to an awaited Future from the call graph, this approach uses the main transition ID to initialize all FinalizeRegisters for a given transaction. The main transition ID along with the nonce ensures that each finalize context uses a unique seed for the RNG, while removing the need to correctly determine the transition ID for a given Future (a complicated process).

Migration

This proposal has been written to migrate at N::CONSENSUS_V2_HEIGHT.
Given timelines, it's more likely that a new N::CONSENSUS_V3_HEIGHT would need to be introduced. The migration would follow the process introduced in ARC-0042.

Test Plan

This PR includes a test, whose expected output demonstrates the the failure and the fix after CONSENSUS_V2_HEIGHT is reached.

Included is the CI branch and the CI diff.

Note that the test_rand integration test expectation changed because the program integration tests now use the test feature which has different block metadata, thus changing the blockhash, thus changing the on-chain RNG seed.

Impact

As stated above, there are two classes of programs that are impacted by this issue:

  1. those that use a mix of async and non-async calls in a specific way.
  2. that make async calls to programs with rand.chacha but await the futures in a different order that the async functions were called.

In scanning all Aleo programs deployed on Mainnet as of 2/11/25 8AM PT:

  1. There are 13 functions among all programs deployed to mainnet, which contain a non-async call, followed by an async call. In the above functions, none of the async calls take a Future as input, thus do not exhibit this issue.

  2. There are 28 programs that use the rand.chacha command and 0 programs that import, and consequently, call them.

This was confirmed by a static analyzer on all programs on mainnet at the above date and time and double-checked by manual audit.

Related PRs

To Reviewers

An important function of this PR is to provide an understanding of how async execution mechanism works and why this PR is needed. If reviewers need context or clarification, please feel to post your questions in this thread. I am also happy to do a call explaining the original and proposed design/code.

Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM engineering-wise

Base automatically changed from feat/increase_validators to staging January 17, 2025 00:02
@raychu86 raychu86 merged commit 8038932 into staging Jan 17, 2025
2 checks passed
@raychu86 raychu86 deleted the fix/rng-finalize branch January 17, 2025 02:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants