Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: parallel plan and apply also in a single workspace (rebased) #5264

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

plentydone
Copy link

what

@finnag did all the real work here. I just rebased and regenerated mocks. I'm new to this codebase, but I've reviewed the code and it seems reasonable to me, and I've reviewed all the commits between October and today and I see no logical conflicts. If in fresh review any changes are needed, I'm happy to do so.

From the original PR description:

- Add more thorough locking around Clone() calls, covering all of these phases:
   Am I on the right commit
   Merge with upstream
   Clone if necessary
- Reduce the number of remote git operations when planning or applying in parallel
   Clean up the Clone() method, split into Clone() and MergeAgain()

For parallel mode to work, you must either set the environment variable `TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE` to something, or check in your .hcl files. Otherwise terraform cannot run in parallel.

why

The Clone call had several race conditions where it could miss clones or delete the working directory under running processes causing failures.

tests

I ran make test-all fmt lint

references

This is just #3670 rebased

@plentydone plentydone requested review from a team as code owners January 23, 2025 04:28
@plentydone plentydone requested review from jamengual, lukemassa and nitrocode and removed request for a team January 23, 2025 04:28
@github-actions github-actions bot added go Pull requests that update Go code provider/github provider/gitlab labels Jan 23, 2025
@dosubot dosubot bot added the feature New functionality/enhancement label Jan 23, 2025
@plentydone plentydone changed the title Parallel plans fix: parallel plan and apply also in a single workspace (rebased) Jan 23, 2025
@plentydone
Copy link
Author

I'm not sure what the etiquette is around signing off on this as the committer given that I just rebased someone else's work from their own MR.

@jamengual jamengual added the waiting-on-review Waiting for a review from a maintainer label Jan 23, 2025
@jamengual
Copy link
Contributor

I'm not sure what the etiquette is around signing off on this as the committer given that I just rebased someone else's work from their own MR.

just sign it with yours

We generate a list of all interesting directories, so we can target
the locks to the affected directories instead of using a (too) global lock

Signed-Off-By: Andrew Carter <[email protected]>
There is a race condition here where we test if we are current, and
only then if we are not current we grab the lock. In the meantime,
that information could be stale.

Extend the lock to cover all operations, and unconditionally wait for
the lock. We can't assume anything can be skipped if we have to wait for
the lock.

Signed-Off-By: Andrew Carter <[email protected]>
All Clone() calls that have signaled an interest in merging
before another Clone() checks whether a merge is necessary
can skip their own checks.

This should reduce the thundering herd problem at the
beginning of large paralell runs.

Signed-Off-By: Andrew Carter <[email protected]>
Clone is now a NOP if the PR has not changed, and loses its second
return value, the MergedAgain flag.

MergeAgain must be called explicitly in the only location that
cares about this flag, just before planning.

This cleans up the code for Clone and re-merging a bit.

Also regenerated mocks

Signed-Off-By: Andrew Carter <[email protected]>
@jseiser
Copy link

jseiser commented Jan 27, 2025

While we wait on this to get merged, we wanted to test running it locally. Are the images built, ever pushed to ghcr?

docker pull ghcr.io/runatlantis/atlantis:pr-5264-debian                                                         
Error response from daemon: manifest unknown

@jamengual
Copy link
Contributor

While we wait on this to get merged, we wanted to test running it locally. Are the images built, ever pushed to ghcr?

docker pull ghcr.io/runatlantis/atlantis:pr-5264-debian                                                         
Error response from daemon: manifest unknown

sha256:5f2412560425761d177fab5e1e878e3d9e08c0bce7a248d65eff88efde593b84

@jseiser
Copy link

jseiser commented Jan 27, 2025

sha256:5f2412560425761d177fab5e1e878e3d9e08c0bce7a248d65eff88efde593b84

Any chance you can paste syntax on this? Everything Im attempting is apparently wrong.

@jseiser
Copy link

jseiser commented Jan 28, 2025

 docker pull ghcr.io/runatlantis/atlantis@sha256:5f2412560425761d177fab5e1e878e3d9e08c0bce7a248d65eff88efde593b84                                                                                                                            
Error response from daemon: manifest unknown   

@jamengual
Copy link
Contributor

jamengual commented Jan 28, 2025 via email

@jseiser
Copy link

jseiser commented Feb 4, 2025

Any update on this?

@jseiser
Copy link

jseiser commented Feb 10, 2025

@jamengual

We just had an almost 2 hours long production deployment, so we would really love to be able to test this image, but it still doesnt seem to be built?

Thanks,

@jamengual
Copy link
Contributor

@jamengual

We just had an almost 2 hours long production deployment, so we would really love to be able to test this image, but it still doesnt seem to be built?

Thanks,

I just updated the branch, it should build the new image in a few minutes.

Signed-off-by: Luke Massa <[email protected]>
Signed-off-by: Luke Massa <[email protected]>
@lukemassa
Copy link
Contributor

FYI There were a few issues with the PR; firstly an additional callsite to Clone() added since this was written that I adjusted to expect the new number of results, secondly a test was failing, also due to the change in return values from Clone. I just lending a hand fixing up the PR

@jseiser
Copy link

jseiser commented Feb 11, 2025

Just wanted to report. We were able to deploy this image

ghcr.io/runatlantis/atlantis:dev-debian-cf0b3f3

We then created a scaled down reproduction of our monorepo. Each bottom folder you see, is a state.

└── fake
    ├── global
    │   └── user
    └── us-gov-east-1
        ├── deploys
        │   ├── f1
        │   │   ├── child
        │   │   └── parent
        │   └── f2
        │       ├── child
        │       └── parent
        └── ops
            └── network

We created a simple atlantis.yml

version: 3
automerge: true
autodiscover:
  mode: disabled
delete_source_branch_on_merge: true
parallel_plan: true
parallel_apply: true
abort_on_execution_order_fail: true
projects:
  - name: fake/global/user
    dir: fake/global/user
    execution_order_group: 1

  - name: fake/us-gov-east-1/ops/network
    dir: fake/us-gov-east-1/ops/network
    execution_order_group: 2

  - name: fake/us-gov-east-1/deploys/f1/parent
    dir: fake/us-gov-east-1/deploys/f1/parent
    execution_order_group: 3

  - name: fake/us-gov-east-1/deploys/f2/parent
    dir: fake/us-gov-east-1/deploys/f2/parent
    execution_order_group: 3

  - name: fake/us-gov-east-1/deploys/f1/child
    dir: fake/us-gov-east-1/deploys/f1/child
    execution_order_group: 4

  - name: fake/us-gov-east-1/deploys/f2/child
    dir: fake/us-gov-east-1/deploys/f2/child
    execution_order_group: 4

Things worked perfectly. On initial deployment, the plans happened in order, but would fail since we are calling the state of of 1 into order group 2 etc. Once things were deployed, creating a 2nd MR to update all the tags happened in the proper order. We also did not get any errors related to multiple states trying to access providers etc.

This would be a huge workflow improvement for our organization.

Copy link
Contributor

@jamengual jamengual left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we ahh documentation for this and maybe some examples on how to use it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New functionality/enhancement go Pull requests that update Go code provider/github provider/gitlab waiting-on-review Waiting for a review from a maintainer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants