Publish Observability SRE images to internal container registry #17401
base: feature/fedramp-high-8.x
c27ea34 to 1426454
This commit adds a step to the pull_request_pipeline Buildkite definition to push a Docker image to the Elastic container registry. It is added here to show that we have the proper credentials in CI to push the container where it needs to go. We will likely move this into the DRA pipeline once we are confident it is pushing to the correct place with a naming convention that works for all consumers and producers.
The general idea is to build the container with our Gradle task, then tag the resulting image with the git sha and a "latest" identifier. This would allow consumers to choose between an exact sha for a stream like 8.19.0 or the "latest". I will also need to factor in the case where we have the tag *without* the sha postfix.
Eventually we will want to fold this into the existing DRA pipeline for building/staging images, but for now it seems reasonable to handle this separately.
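The tagging idea in this commit message can be sketched roughly as follows. This is only an illustration: the registry path and image name are hypothetical (neither is stated in this PR), the `<version>-<sha>` tag shape and the literal `latest` tag are assumptions, and the function only builds `docker tag` / `docker push` argument lists without running anything.

```python
from typing import List

# Hypothetical registry path and image name, for illustration only.
REGISTRY = "registry.example.com/hypothetical/logstash-observability-sre"


def tag_and_push_commands(local_image: str, version: str, git_sha: str) -> List[List[str]]:
    """Return docker commands that tag one local image twice and push both tags."""
    targets = [
        f"{REGISTRY}:{version}-{git_sha}",  # exact build, pinned by sha (assumed shape)
        f"{REGISTRY}:latest",               # moving pointer (assumed literal tag)
    ]
    commands: List[List[str]] = []
    for target in targets:
        commands.append(["docker", "tag", local_image, target])
        commands.append(["docker", "push", target])
    return commands
```

A consumer who needs reproducibility would pull the sha-pinned tag, while one who just wants the newest build would pull the moving tag.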
1426454 to 297226b
At this time the POC is pushing successfully (see the passing build in Buildkite). I can verify that, with a proper docker login, I can pull the image:
Open questions:
If possible, a separate set of jobs (one for the daily snapshot, another for the release) would be best, since this is not tied to DRA and we may not want this new flow to impact DRA if something goes wrong.
My first reaction was to only build what's necessary for SRE, but having tested this locally myself, it's painful to use the x64 image when our laptops are aarch64, so I'd suggest building both.
For the current non-fips snapshot images we do:
So I suggest we use the same naming scheme. And then for staging images:
So we'd do:
@jsvd thanks for the review! After starting to make this a separate workflow I realized just how much duplication that would add. Instead I opted to add this to the DRA pipeline with the explicit guiding principle of "do not interfere with existing artifact generation/publishing". I ended up adding a step at the very end of the pipeline that is configured to "soft fail". This should ensure that we don't interfere with existing publishing while we iterate on this.
As far as the naming... I'm having trouble understanding the pattern we expect. What I came up with is this: For snapshot: For staging:
The idea there is that "snapshot" builds run frequently and we do not want to overwrite images, hence giving them a sha tag. Staging runs infrequently and we ultimately want only one image for the version there. I decided to include the sha there too so we can track the history of images pushed. The main thing I'm thinking of here is that as a consumer of the image I would want to use a tag like
This commit takes the POC from the pull_request_pipeline and adds it to the DRA pipeline. Notably, we take care not to disrupt anything about the existing DRA pipeline by making this step wait until after the artifacts are published and by setting soft_fail. While this is being introduced and stabilized we want to ensure the existing DRA pipeline continues to work without interruption. As we get more stability we can look at a tighter integration.
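A rough sketch of what such a non-interfering step could look like, expressed as a Python dict that serializes to a Buildkite step (Buildkite accepts JSON pipeline definitions as well as YAML). `depends_on` and `soft_fail` are real Buildkite step attributes; the label, key, script path, and the key of the publish step are hypothetical placeholders, not the values used in this PR.

```python
import json


def sre_image_step(publish_step_key: str) -> dict:
    """Build a step that runs only after publishing and never fails the build."""
    return {
        "label": "Observability SRE image",                       # hypothetical label
        "key": "observability-sre-image",                         # hypothetical key
        "command": ".buildkite/scripts/hypothetical-push-image.sh",  # hypothetical script
        "depends_on": publish_step_key,  # wait for the existing DRA publish step
        "soft_fail": True,               # a failure here does not fail the pipeline
    }


def render_pipeline(publish_step_key: str) -> str:
    """Serialize a one-step pipeline fragment as JSON."""
    return json.dumps({"steps": [sre_image_step(publish_step_key)]}, indent=2)
```

Keeping both attributes on the new step means the existing steps need no changes, which matches the stated goal of not touching current artifact publishing.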
457aba0 to 8cc2b90
DRA snapshot build: https://buildkite.com/elastic/logstash-dra-snapshot-pipeline/builds/2555 (note: I omitted the depends_on for testing so I don't have to wait an hour for publishing to happen).
Eventually we will want to do proper annotations with manifests but for now just add arch to the tag.
Regarding architecture... Currently I'm building and pushing for each architecture, but this creates a race condition where the slowest job seems to overwrite the image in the registry. To get around this for now, I've added the architecture to the image tag. I'm not sure how the release manager handles this. One option would be to explicitly publish manifest information after the images exist, but I'm not sure how to coordinate that across machines at this point. I figured that for now just requiring the architecture in the name would solve our immediate needs.
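The overwrite problem described above can be illustrated with a toy model of a registry, where a tag simply points at whichever image was pushed last. This is a simplified sketch, not how the registry is implemented; the tag shape and the architecture labels are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def push_all(pushes: List[Tuple[str, str]]) -> Dict[str, str]:
    """Simulate a registry: each tag ends up pointing at the last image pushed to it."""
    registry: Dict[str, str] = {}
    for tag, image in pushes:
        registry[tag] = image
    return registry


def arch_tag(base_tag: str, arch: str) -> str:
    """Make the tag unique per architecture (assumed '<tag>-<arch>' shape)."""
    return f"{base_tag}-{arch}"


# Both jobs push to the same tag: only the slower job's image survives.
shared = push_all([
    ("8.19.0-abc1234", "image built on x86_64"),
    ("8.19.0-abc1234", "image built on aarch64"),
])

# Each job pushes to its own tag: both images survive.
separate = push_all([
    (arch_tag("8.19.0-abc1234", "x86_64"), "image built on x86_64"),
    (arch_tag("8.19.0-abc1234", "aarch64"), "image built on aarch64"),
])
```

With a shared tag the result depends on job ordering, which is why a per-architecture suffix is a reasonable stopgap until manifests tie the images together.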
I'm good with that, that's exactly the tradeoff I was expecting we'd analyze and make an informed call on.
I believe we want both. e.g. the non-sha label convenience of doing
Currently our non fips images go through three steps:
For snapshot daily builds I'd expect the tag to have
For staging builds I'd expect a -sha, non-SNAPSHOT tag:
For promotion of RCs to GA, I'd expect a final job would add an extra non-sha'ed label to the latest staging build:
We should be able to push the individual arch-named images upstream and then create a manifest for the generic label encompassing the arch-named images, as described in https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/, in the chapter
This commit refactors the POC pipeline for pushing observability SRE containers to handle tag conflicts between target architectures. Cells with the respective architectures build containers and push them to the container registry with a unique identifier. Once those exist, a separate step uses the docker manifest command to annotate those images so that a container client can download the correct image based on its architecture. As a result, for every artifact there will be 2 images pushed (one for each arch) and N manifests pushed. The manifests will handle the final naming that the consumer would expect.
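As a sketch of the manifest step described in this commit message, the function below builds `docker manifest create` and `docker manifest push` argument lists for a sha-qualified manifest and a version-only manifest. The repository name, the `<version>-<sha>-<arch>` tag shape, and the architecture labels are assumptions for illustration; the commands are only constructed, not executed.

```python
from typing import List


def arch_image(repo: str, version: str, sha: str, arch: str) -> str:
    """Unique per-architecture tag pushed by each build job (assumed shape)."""
    return f"{repo}:{version}-{sha}-{arch}"


def manifest_commands(repo: str, version: str, sha: str, arches: List[str]) -> List[List[str]]:
    """Return docker commands that publish manifests over the per-arch images."""
    images = [arch_image(repo, version, sha, arch) for arch in arches]
    commands: List[List[str]] = []
    # One manifest pinned to the sha, one carrying only the version.
    for manifest_tag in (f"{repo}:{version}-{sha}", f"{repo}:{version}"):
        commands.append(["docker", "manifest", "create", manifest_tag, *images])
        commands.append(["docker", "manifest", "push", manifest_tag])
    return commands
```

Because the manifests reference images that must already be in the registry, this step has to run after every per-architecture push job has finished.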
I refactored the workflow to build and push architecture-specific images then use
So for example, in this build https://buildkite.com/elastic/logstash-dra-snapshot-pipeline/builds/2579 the following containers are built and pushed:
Once those jobs are done a new step is added which adds the following manifests:
I verified locally:
Regarding the naming... I'm not quite following why the sha would come before snapshot. I'm using the qualified version script to create the version. It seems like that is pretty standard and it is responsible for adding the
So for the stage where we actually build and push a container we get a unique ID with [VERSION][SHA][ARCH]. Then in the manifest stage we construct the unique [VERSION][SHA] and [VERSION] (which will be the de facto "latest").
I was just following the current naming convention we use for the snapshot and staging builds, as seen in https://artifacts-snapshot.elastic.co/logstash/8.18.0-15c9af3b/summary-8.18.0-SNAPSHOT.html, for example:
We can see in these examples:
I'm really struggling to understand this... I'll write out what I'm looking at and maybe you can help me wrap my head around it... So, for a snapshot build (let's stick with 8.18) I see https://buildkite.com/elastic/logstash-dra-snapshot-pipeline/builds/2580 This was triggered as a scheduled build for logstash branch 8.18 on sha According to that link there should be the following images:
I do not understand where the I want to be able to follow a pattern for this, but I'm just not understanding the pattern to follow 😅. As mentioned and implemented so far, the pattern I was proposing is to create two tags:
The "qualified version" comes from the shared https://github.com/elastic/logstash/blob/main/.buildkite/scripts/common/qualified-version.sh script and the unique identifier is the sha of the commit the container is built from.
To follow the existing tagging scheme more closely, this commit refactors the naming for images to include the build sha BEFORE the SNAPSHOT identifier. While this does not exactly follow the whole system that exists today for container images in DRA, it follows a pattern that is more similar. Ideally we can iterate to fold handling of this container into DRA, and in that case consumers would not need to update their patterns for identifying images.
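A minimal sketch of the renaming this commit describes, assuming the qualified version ends in `-SNAPSHOT` for snapshot builds and that components are joined with hyphens. The exact separators and the helper name are assumptions; the sha in the usage note is taken from the artifacts URL quoted earlier in the conversation.

```python
SNAPSHOT_SUFFIX = "-SNAPSHOT"


def tag_with_sha(qualified_version: str, sha: str) -> str:
    """Place the build sha before the SNAPSHOT identifier, if there is one."""
    if qualified_version.endswith(SNAPSHOT_SUFFIX):
        base = qualified_version[: -len(SNAPSHOT_SUFFIX)]
        return f"{base}-{sha}{SNAPSHOT_SUFFIX}"
    # Staging-style versions have no SNAPSHOT identifier, so the sha goes last.
    return f"{qualified_version}-{sha}"
```

Under these assumptions, `tag_with_sha("8.18.0-SNAPSHOT", "15c9af3b")` gives `8.18.0-15c9af3b-SNAPSHOT`, and `tag_with_sha("8.18.0", "15c9af3b")` gives `8.18.0-15c9af3b`.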
d14ba95 to 676bf86
💚 Build Succeeded
https://buildkite.com/elastic/logstash-dra-snapshot-pipeline/builds/2588 shows a build with the updated naming as discussed in Slack.
Release notes
[rn:skip]
What does this PR do?
Add pipelines for building and shipping the observability SRE image. Specifically, this adds a step to the DRA pipeline that largely follows the existing patterns and steps for artifact publishing. The notable difference is that currently we do a direct docker build and docker push workflow instead of the existing build/staging workflow with the release manager. A design goal for this iteration is to ensure that this new step does not interfere with any existing DRA steps. As such, we ensure this happens after DRA is published and that failures are marked with a soft_fail option so as not to interrupt any artifact publishing while we stabilize and iterate on this workflow.
Why is it important/What is the impact to the user?
N/A
Checklist
[ ] I have made corresponding changes to the documentation
[ ] I have added tests that prove my fix is effective or that my feature works
Related issues