Release 21.12.0 #1193

Closed · 3 tasks done
chadwhitacre opened this issue Dec 7, 2021 · 69 comments

@chadwhitacre commented Dec 7, 2021

22.1.0

This release carries more risk than usual because of #796. I'm making a ticket here to remember some things to pay attention to.

@chadwhitacre

The changelog for tomorrow is actually going to be tricky because we don't have a PR related to #1196, and craft's auto mechanism assumes PRs/commits for every changelog entry. I read the craft code and tried adding an "Unreleased" section, but that simply gets retitled to the new version without actually listing all of the new material.

@chadwhitacre commented Dec 15, 2021

What I am going to try is manually modifying the CHANGELOG.md on the release/21.12.0 branch after the prepare action is run but before ... oh wait, CalVer releases are auto-accepted. I guess we need a release blocker? Nope, that prevents even the action-prepare-release from completing.

Okay, plan C: Make a dummy PR related to the milestone in order to work with the existing changelog infra.

@chadwhitacre commented Dec 15, 2021

prep milestones

I see these to account for:

https://github.com/getsentry/self-hosted/milestone/8

https://github.com/getsentry/self-hosted/milestone/9

https://github.com/getsentry/self-hosted/milestone/10

Skip

Merp

  • Revert "Rename onpremise to self-hosted" (5495fe2)
  • Rename onpremise to self-hosted (9ad05d8)

@chadwhitacre

Bloop. Can't retroactively skip commits (re: the garbage commits 5495fe2 and 9ad05d8).

@chadwhitacre

$ craft prepare --no-push 21.12.0
$ head -n45 CHANGELOG.md

Changelog

21.12.0

Support Docker Compose v2 (ongoing)

Self-hosted Sentry mostly works with Docker Compose v2 (in addition to v1 >= 1.28.0). There is one more bug we are trying to squash.

By: @chadwhitacre (#1179)

Prevent Component Drift

When a user runs the install.sh script, they get the latest version of the Sentry, Snuba, Relay and Symbolicator projects. However there is no guarantee they have pulled the latest self-hosted version first, and running an old one may cause problems. To mitigate this, we now perform a check during installation that the user is on the latest commit if they are on the master branch. You can disable this check with --skip-commit-check.

By: @chadwhitacre (#1191), @aminvakil (#1186)

React to log4shell

Self-hosted Sentry is not vulnerable to the log4shell vulnerability.

By: @chadwhitacre (#1203)

Forum → Issues

In the interest of reducing sources of truth and restarting the fire of the self-hosted Sentry community, we deprecated the Discourse forum in favor of GitHub Issues.

By: @chadwhitacre (#1167, #1160, #1159)

Rename onpremise to self-hosted (ongoing)

In the beginning we used the term "on-premise" and over time we introduced the term "self-hosted." In an effort to regain some consistency for both branding and developer mental overhead purposes we are standardizing on the term "self-hosted." This release includes a fair portion of the work in code towards this, hopefully a future release will include the remainder. The effect of this as a self-hosted user will be that you will see orphaned blah blah blah you need to clean those up floo flah how

By: @chadwhitacre (#1169)

Add support for custom DotEnv file

There are several ways to configure self-hosted Sentry and one of them is the .env file. In this release we add support for a .env.custom file that is git-ignored to make it easier for you to override keys configured this way with custom values. Thanks to @Sebi94nbg for the contribution!

By: @Sebi94nbg (#1113)

Various fixes & improvements

@chadwhitacre

Gosh shouldn't we link to the changelogs for all of the components, as well? I'm thinking of #1131.

@chadwhitacre

Milestones ready.

@BYK commented Dec 15, 2021

@chadwhitacre you could have blocked the release, prepared locally, edited the changelog in the release branch, and then finalized the release.

I think we should document this process and potentially find a way to easily edit the changelog before it goes out.

The issue is that automation and manual editing are at odds, so the best I can think of is a mechanism like you suggested: allow an "Unreleased" section which gets prepended (or appended?) to the auto-generated changelog.

Happy to dive into the Craft code at some point if this sounds like a good idea.
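
For illustration, here is a rough TypeScript sketch of the mechanism described above — a hypothetical helper, not actual craft code: hand-written notes live under an "Unreleased" heading and get prepended to the entry craft generates for the new version.

    // Hypothetical helper (not craft code): pull manual notes out of an
    // "## Unreleased" section and prepend them to the generated entry.
    function prependUnreleasedNotes(changelog: string, generatedEntry: string): string {
      const lines = changelog.split("\n");
      const start = lines.findIndex((line) => line.trim() === "## Unreleased");
      if (start === -1) return generatedEntry;
      // The manual section runs until the next "## " heading or the end of the file.
      let end = lines.length;
      for (let i = start + 1; i < lines.length; i++) {
        if (lines[i].startsWith("## ")) {
          end = i;
          break;
        }
      }
      const manualNotes = lines.slice(start + 1, end).join("\n").trim();
      return manualNotes ? `${manualNotes}\n\n${generatedEntry}` : generatedEntry;
    }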

@chadwhitacre

Sentry failure:

Error:  Validation Failed: ***"resource":"Release","code":"custom","field":"body","message":"body is too long (maximum is 125000 characters)"***

Presuming this has to do with the changelog.

@chadwhitacre

Snuba and Relay are out, fwiw.

chadwhitacre added the release-blocker label ("Any issue open with this tag will stop CalVer releases from happening") on Dec 15, 2021

@chadwhitacre

Here's the failing call:

      const created = await this.github.repos.createRelease(
        createReleaseParams
      );

Here's the payload:

    const createReleaseParams = {
      draft: false,
      name: tag,
      owner: this.githubConfig.owner,
      prerelease: isPreview,
      repo: this.githubConfig.repo,
      tag_name: tag,
      target_commitish: revision,
      ...changes,
    };

...changes 🧐
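
For context, `changes` carries the generated changelog body, which is what blew past GitHub's 125,000-character limit. A minimal sketch of the kind of guard one could add before the call — illustrative only, reusing the client from the snippet above; this is not craft's actual fix:

    // GitHub rejects release bodies over 125,000 characters (the validation
    // error above), so cap the generated changelog before creating the release.
    const MAX_RELEASE_BODY = 125_000;

    function capReleaseBody(body: string): string {
      if (body.length <= MAX_RELEASE_BODY) {
        return body;
      }
      const notice = "\n\n_Changelog truncated; see CHANGELOG.md for the full list._";
      return body.slice(0, MAX_RELEASE_BODY - notice.length) + notice;
    }

    const created = await this.github.repos.createRelease({
      ...createReleaseParams,
      body: capReleaseBody(createReleaseParams.body ?? ""),
    });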

@chadwhitacre commented Dec 15, 2021

Okay, I did a little poking at getsentry/craft#327. I'm not going to try to engineer a fix for that atm.

Also, I have a review on getsentry/craft#335, but I don't think I'm going to try to use that for this release, in the interest of minimizing variability.

I think my plan at this point is to:

  • Manually modify the changelog on the release/21.12.0 branch in sentry. Due to 327 it's not going to get merged back to master anyway. 🙄
  • Manually make a PR to sentry after the release to update the changelog for the last three releases (including today's).
  • Re-accept publish: getsentry/sentry@21.12.0 (publish#700). As I read/understand it, craft will pick up the pre-existing branch in whatever state it's in.

@BYK commented Dec 15, 2021

> Manually modify the changelog on the release/21.12.0 branch in sentry. Due to 327 it's not going to get merged back to master anyway. 🙄

This is an excellent workaround. You can still use the publish flow for this.

> Manually make a PR to sentry after the release to update the changelog for the last three releases (including today's).

Is this because of the wrong merge base?

@BYK commented Dec 15, 2021

> Re-accept publish: getsentry/sentry@21.12.0 (publish#700). As I read/understand it, craft will pick up the pre-existing branch in whatever state it's in.

Yes, it will use the existing branch.

@chadwhitacre

> You can still use the publish flow for this.

Meaning "Re-accept [the publish ticket]"?

> Is this because of the wrong merge base?

Yes.

chadwhitacre added a commit to getsentry/sentry that referenced this issue Dec 15, 2021
@chadwhitacre

CI is green for the CHANGES repair on the branch, so I re-accepted the ticket. Technically this will be a different artifact than what we uploaded to PyPI and Docker, but the only delta is in the changelog, so meh.

@chadwhitacre

Was about to say ... need to whack the old 21.12.0 tag ...

@chadwhitacre

Here's the original 21.12.0 tag for reference.

@chadwhitacre

It was probably in a Slack thread that is gone now. That's a reason to do this here in GitHub instead.

@chadwhitacre

Oh gosh, I think I ended up recreating the release object in GitHub manually? And it was a total cluster?

@chadwhitacre

Yeah it was last month, this one has my name on it:

https://github.com/getsentry/sentry/releases/tag/21.11.0

@chadwhitacre

Sweet! Slack thread still exists! 👍

@chadwhitacre

I'm going to start with relay since it's simpler. My plan is to delete the 21.11.0 release and rerun the release via publish repo.

Going to do the same here.
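
For the record, the manual cleanup amounts to deleting the existing GitHub release object and its tag so the publish workflow can recreate both on the re-run. A rough sketch with octokit (assumed client and function names; this is the manual recovery path, not something craft does for you):

    import { Octokit } from "@octokit/rest";

    // Manual recovery sketch: remove the GitHub release object and its tag so a
    // re-accepted publish run can recreate them from scratch.
    async function deleteReleaseAndTag(
      octokit: Octokit,
      owner: string,
      repo: string,
      tag: string
    ): Promise<void> {
      const { data: release } = await octokit.repos.getReleaseByTag({ owner, repo, tag });
      await octokit.repos.deleteRelease({ owner, repo, release_id: release.id });
      // Deleting the release does not delete the tag; the ref has to go separately
      // (hence the "Forgot to remove the tag" step further down).
      await octokit.git.deleteRef({ owner, repo, ref: `tags/${tag}` });
    }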

@chadwhitacre commented Dec 15, 2021

Failed again ...

@chadwhitacre

Forgot to remove the tag.

@chadwhitacre

Made it past GitHub. 👍

@chadwhitacre

Okay! Relay is really out this time.

@chadwhitacre

Back to self-hosted ...

@chadwhitacre

Exfiltrating the steps to recover from "Cannot upload asset" for sentry in 21.11.0 as it was much more complicated:

On to Sentry ...
Tag and release deleted.
Hrm, actually, the publish ticket claims to have already published everything but GitHub.
I'm going to retag 21.11.0 and rerun.
Fail.
😞
Reference "refs/tags/21.11.0" already exists. Does tag "21.11.0" already exist?
If the tag exists it also wants the release to exist.
If the release were still there it would proceed, and delete assets before re-uploading (see the sketch below).
Since the tag would be in the same place the assets would end up the same as on the previous run.
So I either recreate the GH release object manually, or remove ...................................
I'm unraveling how SHA gets passed from the publish issue to the craft publish invocation.
I think I need to delete the tag and rerun. I think it pulls the commit from the ... no, it doesn't.
It deploys master every time, afaict.
I'm going to try manually creating a release object for 21.11.0 and rerunning to populate with assets.
I stubbed it out, I will see about a changelog later.
sentry artifacts are up, I'm going to restart onpremise and then work on sentry changelog.
Rerunning test suite for onpremise.
[changelog blah blah]
onpremise suite passed, rerunning publish for onpremise.
onpremise is out, so the release is done, except for the changelog on the sentry release.

Then I discovered getsentry/craft#327.
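
As a sketch of that "delete assets before re-uploading" behavior — illustrative octokit code under assumed names, not craft's implementation:

    import { Octokit } from "@octokit/rest";

    // If the release object still exists, remove a previously uploaded asset of
    // the same name so the re-run can upload a fresh copy without tripping over
    // "Cannot upload asset".
    async function deleteStaleAsset(
      octokit: Octokit,
      owner: string,
      repo: string,
      releaseId: number,
      assetName: string
    ): Promise<void> {
      const { data: assets } = await octokit.repos.listReleaseAssets({
        owner,
        repo,
        release_id: releaseId,
      });
      const stale = assets.find((asset) => asset.name === assetName);
      if (stale) {
        await octokit.repos.deleteReleaseAsset({ owner, repo, asset_id: stale.id });
      }
    }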

@chadwhitacre

Oh right #1171 ...

@chadwhitacre commented Dec 15, 2021

Oh right it was missing relay 21.12.0 ... 😅

@chadwhitacre

Green! Re-accepting getsentry/publish#704 ...

@chadwhitacre

Done.

@chadwhitacre commented Dec 15, 2021

Post-Mortem

This was anecdotally the most difficult self-hosted release since I started a year ago. Issues:

  1. snuba - no issues
  2. sentry-docs
    1. https://github.com/getsentry/publish/issues/705
  3. relay
    1. silent failure - not sure why
      1. if it recurs frequently, debug/address
      2. bare minimum: confirm that release actually exists once publish ticket closes
    2. intermittent asset upload failure from GitHub
      1. worked around by deleting the tag and release object and re-accepting the issue
      2. case contributing to "Tag more carefully in GitHub target" (craft#336)
      3. probably worth retrying ... "Improve GitHub asset upload retry algorithm" (craft#337)
  4. sentry
    1. Changelog too large, failed to create the GitHub release object
      1. Cut a new craft release with "Limit the number of leftovers listed" (craft#335) before the next self-hosted release
    2. "Last step of publish is selecting the wrong merge target" (craft#327)
      1. Make manual PRs to update the changelog post-release
    3. INC-87 - deleting/recreating the tag while addressing (i) led to a corrupted pip cache in the test image for getsentry, resulting in red CI
      1. New issue: "Tag more carefully in GitHub target" (craft#336)
  5. self-hosted
    1. Friction around the changelog due to "log4j vulnerability (CVE-2021-44228) - not vulnerable" (#1196) and garbage commits
      1. add release-blocker to the release ticket (i.e., this one), manually edit changelog on release branch, and proceed
    2. Slowed down by CI flakes under "Ensure cleanup crons are working" (#1171)

General pattern: something something tags something something. I feel like a number of failure modes end up with us having to recreate a tag, and this can actually be catastrophic (inc-87) ... getsentry/craft#336
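
For reference, a minimal sketch of what "tag more carefully" (craft#336) could look like, under the assumption that a tag should never be moved once created — illustrative only, not the actual craft change:

    import { Octokit } from "@octokit/rest";

    // Only create the tag if it is missing; refuse to proceed if it already
    // exists at a different commit, instead of deleting and recreating it.
    async function ensureTag(
      octokit: Octokit,
      owner: string,
      repo: string,
      tag: string,
      sha: string
    ): Promise<void> {
      try {
        const { data: ref } = await octokit.git.getRef({ owner, repo, ref: `tags/${tag}` });
        if (ref.object.sha !== sha) {
          throw new Error(
            `Tag ${tag} already exists at ${ref.object.sha}, expected ${sha}; refusing to move it.`
          );
        }
        return; // tag already points where we want it
      } catch (err: any) {
        if (err.status !== 404) {
          throw err; // real error (including the mismatch above), not "tag missing"
        }
      }
      await octokit.git.createRef({ owner, repo, ref: `refs/tags/${tag}`, sha });
    }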

@BYK commented Dec 16, 2021

> relay
> silent failure - not sure why

This looks like some terrible GitHub mishap. I actually got the Relay release email so my guess is either the job or the logs are corrupted. I'd file a support ticket with a link so they can investigate before the logs disappear.

> bare minimum: confirm that release actually exists once publish ticket closes

I don't think you need an extra step for this, as Craft should crash and cause the job to fail. If the release isn't actually out, the self-hosted release would fail anyway.
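
As a sketch of that bare-minimum check, it could be as small as this — a hypothetical post-publish assertion using octokit, not something craft or the publish repo currently does:

    import { Octokit } from "@octokit/rest";

    // Fail loudly if the publish run ended without a GitHub release for the tag;
    // a 404 here would have surfaced the silently-failed relay run.
    async function assertReleaseExists(
      octokit: Octokit,
      owner: string,
      repo: string,
      tag: string
    ): Promise<void> {
      try {
        await octokit.repos.getReleaseByTag({ owner, repo, tag });
      } catch (err: any) {
        if (err.status === 404) {
          throw new Error(
            `No GitHub release found for ${owner}/${repo} at tag ${tag}; publish may have failed silently.`
          );
        }
        throw err;
      }
    }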

> add release-blocker to the release ticket (i.e., this one), manually edit changelog on release branch, and proceed

I think this process needs to be documented and maybe even streamlined.

@chadwhitacre

> I'd file a support ticket

Done. I cc'd you. 👍

> self-hosted release would fail anyway

Yeah, that's in fact how I noticed that relay had not succeeded (even though it seemed it had). Hopefully a rare case. Not a big action item here, more just a note to self.

> I think this process needs to be documented and maybe even streamlined.

getsentry/publish#710 👍

@chadwhitacre

Sharing out GH support thread for posterity:

Hi Chad,

Since you are running the Docker action docker://getsentry/craft:latest, the runner will simply run docker run getsentry/craft:latest and wait for docker to exit. The runner will mark the step as succeeded as long as the entry process for the step exits with return code 0.

You might want to add some tracing and debugging for the container to see why it is exiting without properly erroring out. One thing we've noticed is that the bad run seems to go on for a long time compared to the good one.

[two attached screenshots comparing run durations: 25m vs 2m]

Best,

[GH support]

Could this be some system-level container killer scenario? Or a timeout? @BYK

Even if it is a system-level timeout or reaper, it seems that they are making us responsible for the container taking so long. Leaving this here for future discoverability in case of a repeat, but I don't think we need to take any further action now.

@chadwhitacre

My original post to them for context:

We saw behavior yesterday where a GitHub Action was marked successful but did not run to completion.

Here is the bad run, showing a truncated log:

[debug] [[status-provider/github]] Got status "success" for revision 91e58909b0dfeb3bbc464a12f10a2c1f714ced6b
[info] [[status-provider/github]] Revision 91e58909b0dfeb3bbc464a12f10a2c1f714ced6b has been built successfully.
[debug] [[artifact-provider/github]] Fetching artifact list for revision `91e58909b0dfeb3bbc464a12f10a2c1f714ced6b`.
[info] [[artifact-provider/github]] Fetching Github artifacts for getsentry/relay, revision 91e58909b0dfeb3bbc464a12f10a2c1f714ced6b
[debug] GET /repos/getsentry/relay/actions/artifacts?per_page=100&page=0 - 200 in 286ms
[debug] [[artifact-provider/github]] Requesting archive URL from Github...
[debug] GET /repos/getsentry/relay/actions/artifacts/126993957/zip - 200 in 20299ms
[debug] [[artifact-provider/github]] Downloading ZIP from Github artifacts...

Here is a good run for comparison, showing the continuation of the log:

[debug] [[status-provider/github]] Got status "success" for revision 2bef2ac8e891048e1f048cba1b5b7ec68d8a2192
[info] [[status-provider/github]] Revision 2bef2ac8e891048e1f048cba1b5b7ec68d8a2192 has been built successfully.
[debug] [[artifact-provider/github]] Fetching artifact list for revision `2bef2ac8e891048e1f048cba1b5b7ec68d8a2192`.
[info] [[artifact-provider/github]] Fetching Github artifacts for getsentry/relay, revision 2bef2ac8e891048e1f048cba1b5b7ec68d8a2192
[debug] GET /repos/getsentry/relay/actions/artifacts?per_page=100&page=0 - 200 in 298ms
[debug] [[artifact-provider/github]] Requesting archive URL from Github...
[debug] GET /repos/getsentry/relay/actions/artifacts/114545885/zip - 200 in 20178ms
[debug] [[artifact-provider/github]] Downloading ZIP from Github artifacts...
[info] [[artifact-provider/github]] Finished downloading.
[debug] [[artifact-provider/github]] Extracting "/tmp/craft-1IJdRJnUoFR6o" to "/tmp/craft-ruzZTh"...
[debug] [[artifact-provider/github]] Found 6 artifacts.
[info]  
[info] Available artifacts: 
┌──────────────────────────────┬──────────┬─────────┬─────────────┐
│ File Name                    │ Size     │ Updated │ ContentType │
├──────────────────────────────┼──────────┼─────────┼─────────────┤
│ relay-Darwin-x86_64          │ 30.98 MB │         │             │
├──────────────────────────────┼──────────┼─────────┼─────────────┤
│ relay-Darwin-x86_64-dsym.zip │ 58.00 MB │         │             │
├──────────────────────────────┼──────────┼─────────┼─────────────┤
│ relay-Linux-x86_64           │ 20.68 MB │         │             │
├──────────────────────────────┼──────────┼─────────┼─────────────┤
│ relay-Linux-x86_64-debug.zip │ 77.97 MB │         │             │
├──────────────────────────────┼──────────┼─────────┼─────────────┤
│ relay-Windows-x86_64-pdb.zip │ 35.34 MB │         │             │
├──────────────────────────────┼──────────┼─────────┼─────────────┤
│ relay-Windows-x86_64.exe     │ 18.52 MB │         │             │
└──────────────────────────────┴──────────┴─────────┴─────────────┘

The expected output of the run is a new release entry here. For the bad run, no release appeared on that page ...

... but! A collaborator reported that they did receive the release email for the bad run:

I actually got the Relay release email so my guess is either the job or the logs are corrupted. I'd file a support ticket with a link so they can investigate before the logs disappear.

... and here we are. :-)

github-actions bot locked and limited conversation to collaborators on Jan 14, 2022