New V1CheckpointLogReplayScanner #743

Open
sebastiantia opened this issue Mar 13, 2025 · 0 comments · May be fixed by #744
Assignees
Labels
enhancement New feature or request

Comments

@sebastiantia
Collaborator

sebastiantia commented Mar 13, 2025

Please describe why this is necessary.

These changes are part of checkpoint write support.

To build a V1 checkpoint, we need to:

  1. Perform log segment replay to retrieve all action batches that make up the table's state.
  2. Scan and filter each batch so that it includes only the actions to be written to the V1 checkpoint file.

The FileActionsVisitor and NonFileActionsVisitor visitors need to be applied to each batch.

Introduce V1CheckpointLogReplayScanner, a component that uses the new visitors to filter actions during log replay, keeping only those necessary for constructing a V1 checkpoint.

Describe the functionality you are proposing.

This scanner should:

  * Retain only the most recent protocol and metadata actions.
  * Deduplicate transaction actions per app ID.
  * Remove duplicate file actions based on path and unique ID.
  * Exclude tombstones older than minimum_file_retention_timestamp.
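
The four rules above can be sketched with a small in-memory model. This is only an illustration, not the proposed implementation: `Action`, `CheckpointFilter`, `keep`, and `filter_batch` are hypothetical names, the actions are simplified stand-ins for the real log schema, and the sketch assumes that actions arrive newest first (so the first occurrence of a key is the one to keep) and that a tombstone is "older" when its deletion timestamp is strictly less than `minimum_file_retention_timestamp`.

```rust
use std::collections::HashSet;

/// Simplified stand-ins for Delta log actions (hypothetical, not the kernel's types).
#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Protocol,
    Metadata,
    Txn { app_id: String, version: i64 },
    Add { path: String, dv_id: Option<String> },
    Remove { path: String, dv_id: Option<String>, deletion_timestamp: i64 },
}

/// Carries the "already seen" state across batches during replay.
#[derive(Debug, Default)]
pub struct CheckpointFilter {
    minimum_file_retention_timestamp: i64,
    seen_protocol: bool,
    seen_metadata: bool,
    seen_app_ids: HashSet<String>,
    seen_files: HashSet<(String, Option<String>)>,
}

impl CheckpointFilter {
    pub fn new(minimum_file_retention_timestamp: i64) -> Self {
        Self { minimum_file_retention_timestamp, ..Default::default() }
    }

    /// Returns true if the action should be written to the checkpoint.
    /// Assumes actions are visited newest first.
    pub fn keep(&mut self, action: &Action) -> bool {
        match action {
            // Only the first (most recent) protocol and metadata survive.
            Action::Protocol => !std::mem::replace(&mut self.seen_protocol, true),
            Action::Metadata => !std::mem::replace(&mut self.seen_metadata, true),
            // One transaction action per app ID.
            Action::Txn { app_id, .. } => self.seen_app_ids.insert(app_id.clone()),
            // File actions are keyed on (path, unique ID).
            Action::Add { path, dv_id } => {
                self.seen_files.insert((path.clone(), dv_id.clone()))
            }
            Action::Remove { path, dv_id, deletion_timestamp } => {
                // Record the key even when the tombstone is dropped, so an
                // older add for the same file is not resurrected.
                let first_seen = self.seen_files.insert((path.clone(), dv_id.clone()));
                first_seen && *deletion_timestamp >= self.minimum_file_retention_timestamp
            }
        }
    }

    /// Filters one batch, updating the seen-state for later (older) batches.
    pub fn filter_batch(&mut self, batch: Vec<Action>) -> Vec<Action> {
        batch.into_iter().filter(|a| self.keep(a)).collect()
    }
}
```

Keeping the seen-state on one struct that outlives individual batches is what lets a duplicate in an older commit be dropped after a newer commit has already supplied the same key.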

Additionally, introduce v1_checkpoint_actions_iter to leverage this scanner when iterating through log actions.
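
One possible shape for such an iterator, as a rough sketch: the function below lazily applies a scanning step to each replayed batch, passes errors through, and skips batches that end up empty. The signature is an assumption made for illustration; the generic parameters `B` (batch type), `E` (error type), and the `scan` closure stand in for whatever batch, error, and scanner types the real API ends up using.

```rust
/// Lazily filters replayed batches for a V1 checkpoint (hypothetical signature).
///
/// `scan` returns `Ok(Some(batch))` with the filtered batch, `Ok(None)` when
/// nothing in the batch belongs in the checkpoint, or `Err` on failure.
pub fn v1_checkpoint_actions_iter<B, E, S>(
    batches: impl Iterator<Item = Result<B, E>>,
    mut scan: S,
) -> impl Iterator<Item = Result<B, E>>
where
    S: FnMut(B) -> Result<Option<B>, E>,
{
    batches.filter_map(move |batch| {
        // Upstream errors skip `scan` and are yielded as-is; `transpose`
        // turns `Ok(None)` into "skip this batch".
        batch.and_then(&mut scan).transpose()
    })
}
```

Because the adaptor is lazy, a caller can stream filtered batches straight into a checkpoint writer without materializing the whole table state in memory.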

Additional context

No response

@sebastiantia sebastiantia added the enhancement New feature or request label Mar 13, 2025
@sebastiantia sebastiantia changed the title New LogReplayForV1Checkpoint New LogReplayScannerForV1Checkpoint Mar 13, 2025
@sebastiantia sebastiantia changed the title New LogReplayScannerForV1Checkpoint New V1CheckpointReplayScanner Mar 13, 2025
@sebastiantia sebastiantia changed the title New V1CheckpointReplayScanner New V1CheckpointLogReplayScanner Mar 13, 2025
@sebastiantia sebastiantia self-assigned this Mar 13, 2025