[bre] Use a local registry #13

Open
wants to merge 20 commits into
base: databricks
from

@gabrielrussoc gabrielrussoc commented Nov 8, 2024

The main motivation for this pull request is to use docker pull instead of docker load for loading images.
Despite the name, nothing is pulled from the network: we pull from a local registry binary that we spin up with our images loaded.

This makes a lot of sense because docker pull is a well-optimised command that fetches only the missing layers and avoids a lot of extra work.
docker load, on the other hand, is very simple: it requires a tarball with ALL the layers and always writes them to the data directory, regardless of whether the directory already has them. That is exactly why the code was so complicated and tried to optimise this tarball by selecting only the missing layers. All of that is gone with docker pull.
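
To illustrate the difference, here is a minimal sketch of the "only the missing layers" selection that docker pull effectively performs for us. The function name `missing_layers` and the digests are hypothetical and are not taken from this PR; the real logic lives inside Docker.

```shell
# Hypothetical helper: print the layers of an image that the daemon lacks.
#   $1: newline-separated layer digests that make up the image
#   $2: newline-separated layer digests already present in the daemon
missing_layers() {
  printf '%s\n' "$1" | while IFS= read -r digest; do
    # Skip empty lines so an empty image list prints nothing.
    [ -n "$digest" ] || continue
    # -x matches the whole line, -F treats the digest as a fixed string.
    if ! printf '%s\n' "$2" | grep -qxF -- "$digest"; then
      printf '%s\n' "$digest"
    fi
  done
}
```

With docker load, every layer in the first list would be written again; with docker pull, only the layers printed by a selection like this one need to be transferred.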

At Databricks, we even tried our best to avoid docker load in universe by checking whether the image was already in the daemon before calling these rules; that check is now obsolete as well.

docker pull also works much better with RBE because it is aware of the snapshotter and can do a better job with storage (see comments in the code).

Of course, the downside is that we have to maintain a local registry binary. However, it is a very small and straightforward binary that implements a battle-tested API. It is not long-lived: all it does is store the layers and serve them to docker pull on request. It can only serve the layers of one particular target.
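
For context, the requests such a registry has to answer are few. The sketch below assumes the API in question is the Docker Registry HTTP API V2 and prints the paths a client requests while pulling one image. The function name, the `localhost:5000` address, the plain-HTTP scheme, and the repository name are illustrative assumptions, not details from this PR.

```shell
# Print the Registry HTTP API V2 URLs requested to pull a single image.
#   $1: registry address (hypothetical, e.g. localhost:5000)
#   $2: repository name
#   $3: tag or digest identifying the manifest
#   remaining arguments: blob digests referenced by the manifest
registry_pull_urls() {
  registry=$1 repo=$2 reference=$3
  shift 3
  # API version check.
  printf 'http://%s/v2/\n' "$registry"
  # The manifest lists the blobs (config and layers) of the image.
  printf 'http://%s/v2/%s/manifests/%s\n' "$registry" "$repo" "$reference"
  # One request per blob; a client only asks for blobs it does not have yet.
  for digest in "$@"; do
    printf 'http://%s/v2/%s/blobs/%s\n' "$registry" "$repo" "$digest"
  done
}
```

For example, `registry_pull_urls localhost:5000 foo latest sha256:abc` prints the version check, one manifest URL, and one blob URL.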

macOS notice: Unfortunately, this does not work well on macOS because we can't easily pull from local registries. Docker Desktop won't allow HTTP and won't trust self-signed HTTPS certificates unless users do a lot of manual configuration. So we keep the original behaviour on macOS. See https://stackoverflow.com/questions/76034521/docker-log-in-to-local-registry-with-docker-desktop-for-mac.
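
The platform split can be sketched as follows. This is a simplified illustration, not the code in this PR; `choose_image_loader` is a hypothetical name, and it only decides which strategy applies for a given `uname` value.

```shell
# Hypothetical helper: pick the image loading strategy for a platform.
#   $1: kernel name as printed by uname (e.g. Darwin, Linux)
choose_image_loader() {
  case "$1" in
    # macOS keeps the original docker load behaviour.
    Darwin) echo "load" ;;
    # Everything else pulls from the local registry.
    *)      echo "pull" ;;
  esac
}
```

A caller would typically pass the current platform, as in `choose_image_loader "$(uname)"`.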

tests

I ran multiple docker tests on both Linux and macOS:

bazel test //experimental/gabriel.russo:foo_docker --override_repository=io_bazel_rules_docker=$HOME/rules_docker --nocache_test_results

@gabrielrussoc gabrielrussoc changed the title Gabrielrussoc/local registry instead of load [bre] Use a local registry Nov 8, 2024
@gabrielrussoc gabrielrussoc force-pushed the gabrielrussoc/local-registry-instead-of-load branch from 5a6bb1e to f7ca769 Compare November 13, 2024 16:18
@@ -201,7 +201,7 @@ EOF

 # On macOS, clean all xattrs from the files we're going to load.
 if [ "$(uname)" == "Darwin" ]; then
-  echo "Cleaning xattrs from files on macOS..."
+  echo "Cleaning xattrs from files on macOS..." >&2
Author

@kevinqian-db this log line actually broke macOS, as I explained on Slack, because it polluted the output.
I'm just logging it to stderr to fix it.
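
A minimal sketch of why the redirect helps, assuming the script's stdout is captured by a caller: anything echoed to stdout becomes part of the captured value, while stderr stays out of it. The function name and the `image-id` value are hypothetical.

```shell
# Hypothetical script body: one diagnostic line plus one result line.
emit_result() {
  # Diagnostics go to stderr so they are not captured by $(...).
  echo "Cleaning xattrs from files on macOS..." >&2
  # Only the actual result goes to stdout.
  echo "image-id"
}

# Command substitution captures stdout only, so the log line is not included.
result=$(emit_result 2>/dev/null)
```

Without the `>&2`, `result` would contain both lines and anything parsing it would break.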
