Add back support for azblob cache
Signed-off-by: Pranav Pandit <[email protected]>
vangarp committed Feb 13, 2025
1 parent 6702365 commit 1d5af10
Showing 154 changed files with 24,266 additions and 6 deletions.
5 changes: 4 additions & 1 deletion Dockerfile
@@ -14,7 +14,7 @@ ARG DNSNAME_VERSION=v1.3.1
ARG NYDUS_VERSION=v2.2.4
ARG MINIO_VERSION=RELEASE.2022-05-03T20-36-08Z
ARG MINIO_MC_VERSION=RELEASE.2022-05-04T06-07-55Z
ARG AZURITE_VERSION=3.18.0
ARG AZURITE_VERSION=3.33.0
ARG GOTESTSUM_VERSION=v1.9.0
ARG DELVE_VERSION=v1.23.1

@@ -413,6 +413,9 @@ RUN apk add --no-cache shadow shadow-uidmap sudo vim iptables ip6tables dnsmasq
ARG NERDCTL_VERSION
RUN curl -Ls https://raw.githubusercontent.com/containerd/nerdctl/$NERDCTL_VERSION/extras/rootless/containerd-rootless.sh > /usr/bin/containerd-rootless.sh \
&& chmod 0755 /usr/bin/containerd-rootless.sh
ARG AZURITE_VERSION
RUN apk add --no-cache nodejs npm \
&& npm install -g azurite@${AZURITE_VERSION}
# The entrypoint script is needed for enabling nested cgroup v2 (https://github.com/moby/buildkit/issues/3265#issuecomment-1309631736)
RUN curl -Ls https://raw.githubusercontent.com/moby/moby/v25.0.1/hack/dind > /docker-entrypoint.sh \
&& chmod 0755 /docker-entrypoint.sh
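
The dev image installs Azurite, Microsoft's local Azure Storage emulator, so the azblob cache tests can run without a real storage account. As a rough sketch (Azurite's documented default development account and blob port are assumed here; this commit does not configure them), pointing BuildKit at a local emulator might look like:

```bash
# Start Azurite and point the azblob cache at its well-known development
# account; port 10000 is Azurite's default blob endpoint.
azurite --location /tmp/azurite --blobHost 127.0.0.1 &
export BUILDKIT_AZURE_STORAGE_ACCOUNT_URL=http://127.0.0.1:10000/devstoreaccount1
```
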
50 changes: 50 additions & 0 deletions README.md
@@ -67,6 +67,7 @@ Join `#buildkit` channel on [Docker Community Slack](https://dockr.ly/comm-slack)
- [Local directory](#local-directory-1)
- [GitHub Actions cache (experimental)](#github-actions-cache-experimental)
- [S3 cache (experimental)](#s3-cache-experimental)
- [Azure Blob Storage cache (experimental)](#azure-blob-storage-cache-experimental)
- [Consistent hashing](#consistent-hashing)
- [Metadata](#metadata)
- [Systemd socket activation](#systemd-socket-activation)
@@ -589,6 +590,55 @@ Other options are:
* `manifests_prefix=<prefix>`: set global prefix to store / read manifests on s3 (default: `manifests/`)
* `name=<manifest>`: name of the manifest to use (default `buildkit`)

#### Azure Blob Storage cache (experimental)

```bash
buildctl build ... \
--output type=image,name=docker.io/username/image,push=true \
--export-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,name=my_image \
--import-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,name=my_image
```

The following attributes are required:
* `account_url`: The Azure Blob Storage account URL (default: `$BUILDKIT_AZURE_STORAGE_ACCOUNT_URL`)

Storage locations:
* blobs: `<account_url>/<container>/<prefix><blobs_prefix>/<sha256>`, default: `<account_url>/<container>/blobs/<sha256>`
* manifests: `<account_url>/<container>/<prefix><manifests_prefix>/<name>`, default: `<account_url>/<container>/manifests/<name>`
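
For example, with a hypothetical account `myaccount` and the defaults above, cache entries resolve to:

```bash
# Hypothetical layout with the default container and prefixes:
# blob:     https://myaccount.blob.core.windows.net/buildkit-cache/blobs/sha256:<digest>
# manifest: https://myaccount.blob.core.windows.net/buildkit-cache/manifests/buildkit
```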

Azure Blob Storage configuration:
* `container`: The Azure Blob Storage container name (default: `buildkit-cache` or `$BUILDKIT_AZURE_STORAGE_CONTAINER` if set)
* `blobs_prefix`: Global prefix to store / read blobs on the Azure Blob Storage container (`<container>`) (default: `blobs/`)
* `manifests_prefix`: Global prefix to store / read manifests on the Azure Blob Storage container (`<container>`) (default: `manifests/`)

Azure Blob Storage authentication:

Two authentication options are supported:

* Default credentials, using any of the environment variables supported by the [Azure SDK for Go](https://docs.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication). The configuration must be available to the BuildKit daemon, not to the client.
* Secret access key, using the `secret_access_key` attribute to specify the primary or secondary [account key](https://docs.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage) for your Azure Blob Storage account.

> [!NOTE]
> Account name can also be specified with the `account_name` attribute (or `$BUILDKIT_AZURE_STORAGE_ACCOUNT_NAME`)
> if it is not part of the account URL host.
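
As an illustration of the two flows (the account URL, credentials, and image name below are placeholders):

```bash
# Option 1: default credentials. The Azure SDK environment variables must be
# set for the buildkitd daemon process, not for the buildctl client.
export AZURE_TENANT_ID=<tenant> AZURE_CLIENT_ID=<client> AZURE_CLIENT_SECRET=<secret>

# Option 2: account key passed directly as a cache attribute.
buildctl build ... \
  --export-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,secret_access_key=<account-key>,name=my_image
```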

`--export-cache` options:
* `type=azblob`
* `mode=<min|max>`: specify cache layers to export (default: `min`)
* `min`: only export layers for the resulting image
* `max`: export all the layers of all intermediate steps
* `prefix=<prefix>`: set global prefix to store / read files on the Azure Blob Storage container (`<container>`) (default: empty)
* `name=<manifest>`: specify name of the manifest to use (default: `buildkit`)
  * Multiple manifest names can be specified at the same time, separated by `;`. A standard use case is to use the git SHA1 as the name, plus the branch name as a duplicate, and load both with two `--import-cache` options (see the sketch after these lists).
* `ignore-error=<false|true>`: specify whether errors are ignored if the cache export fails (default: `false`)

`--import-cache` options:
* `type=azblob`
* `prefix=<prefix>`: set global prefix to store / read files on the Azure Blob Storage container (`<container>`) (default: empty)
* `blobs_prefix=<prefix>`: set global prefix to store / read blobs on the Azure Blob Storage container (`<container>`) (default: `blobs/`)
* `manifests_prefix=<prefix>`: set global prefix to store / read manifests on the Azure Blob Storage container (`<container>`) (default: `manifests/`)
* `name=<manifest>`: name of the manifest to use (default: `buildkit`)
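
As a sketch of the multiple-names pattern described above (`$GIT_SHA` and `$BRANCH` are placeholder shell variables):

```bash
# Export all intermediate layers under two manifest names, then consider both
# caches on the next build. The attribute is quoted because of the ";".
buildctl build ... \
  --export-cache "type=azblob,account_url=https://myaccount.blob.core.windows.net,mode=max,name=$GIT_SHA;$BRANCH" \
  --import-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,name=$GIT_SHA \
  --import-cache type=azblob,account_url=https://myaccount.blob.core.windows.net,name=$BRANCH
```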

### Consistent hashing

If you have multiple BuildKit daemon instances, but you don't want to use registry for sharing cache across the cluster,
214 changes: 214 additions & 0 deletions cache/remotecache/azblob/exporter.go
@@ -0,0 +1,214 @@
package azblob

import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"time"

"github.com/Azure/azure-sdk-for-go/sdk/azcore"
"github.com/Azure/azure-sdk-for-go/sdk/azcore/to"
"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blob"
"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/bloberror"
"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/blockblob"
"github.com/Azure/azure-sdk-for-go/sdk/storage/azblob/container"
"github.com/containerd/containerd/v2/core/content"
"github.com/containerd/containerd/v2/pkg/labels"
"github.com/moby/buildkit/cache/remotecache"
v1 "github.com/moby/buildkit/cache/remotecache/v1"
"github.com/moby/buildkit/session"
"github.com/moby/buildkit/solver"
"github.com/moby/buildkit/util/bklog"
"github.com/moby/buildkit/util/compression"
"github.com/moby/buildkit/util/progress"
digest "github.com/opencontainers/go-digest"
"github.com/pkg/errors"
)

// ResolveCacheExporterFunc for "azblob" cache exporter.
func ResolveCacheExporterFunc() remotecache.ResolveCacheExporterFunc {
return func(ctx context.Context, g session.Group, attrs map[string]string) (remotecache.Exporter, error) {
config, err := getConfig(attrs)
if err != nil {
return nil, errors.Wrap(err, "failed to create azblob config")
}

containerClient, err := createContainerClient(ctx, config)
if err != nil {
return nil, errors.Wrap(err, "failed to create container client")
}

cc := v1.NewCacheChains()
return &exporter{
CacheExporterTarget: cc,
chains: cc,
containerClient: containerClient,
config: config,
}, nil
}
}

var _ remotecache.Exporter = &exporter{}

type exporter struct {
solver.CacheExporterTarget
chains *v1.CacheChains
containerClient *container.Client
config *Config
}

func (ce *exporter) Name() string {
return "exporting cache to Azure Blob Storage"
}

func (ce *exporter) Finalize(ctx context.Context) (map[string]string, error) {
config, descs, err := ce.chains.Marshal(ctx)
if err != nil {
return nil, err
}

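	// For each layer referenced by the marshaled cache config, make sure the
	// compressed blob exists in the container (uploading it if missing), then
	// record the annotations (diffID, size, media type) that importers rely on.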
for i, l := range config.Layers {
dgstPair, ok := descs[l.Blob]
if !ok {
return nil, errors.Errorf("missing blob %s", l.Blob)
}
if dgstPair.Descriptor.Annotations == nil {
return nil, errors.Errorf("invalid descriptor without annotations")
}
		v, ok := dgstPair.Descriptor.Annotations[labels.LabelUncompressed]
		if !ok {
			return nil, errors.Errorf("invalid descriptor without uncompressed annotation")
		}
		diffID, err := digest.Parse(v)
		if err != nil {
			return nil, errors.Wrap(err, "failed to parse uncompressed annotation")
		}

key := blobKey(ce.config, dgstPair.Descriptor.Digest)

exists, err := blobExists(ctx, ce.containerClient, key)
if err != nil {
return nil, err
}

		bklog.G(ctx).Debugf("layer %s exists = %t", key, exists)

if !exists {
layerDone := progress.OneOff(ctx, fmt.Sprintf("writing layer %s", l.Blob))
ra, err := dgstPair.Provider.ReaderAt(ctx, dgstPair.Descriptor)
if err != nil {
err = errors.Wrapf(err, "failed to get reader for %s", dgstPair.Descriptor.Digest)
return nil, layerDone(err)
}
if err := ce.uploadBlobIfNotExists(ctx, key, content.NewReader(ra)); err != nil {
return nil, layerDone(err)
}
layerDone(nil)
}

la := &v1.LayerAnnotations{
DiffID: diffID,
Size: dgstPair.Descriptor.Size,
MediaType: dgstPair.Descriptor.MediaType,
}
if v, ok := dgstPair.Descriptor.Annotations["buildkit/createdat"]; ok {
var t time.Time
if err := (&t).UnmarshalText([]byte(v)); err != nil {
return nil, err
}
la.CreatedAt = t.UTC()
}
config.Layers[i].Annotations = la
}

dt, err := json.Marshal(config)
if err != nil {
return nil, errors.Wrap(err, "failed to marshal config")
}

for _, name := range ce.config.Names {
if innerError := ce.uploadManifest(ctx, manifestKey(ce.config, name), bytesToReadSeekCloser(dt)); innerError != nil {
return nil, errors.Wrapf(innerError, "error writing manifest %s", name)
}
}

return nil, nil
}

func (ce *exporter) Config() remotecache.Config {
return remotecache.Config{
Compression: compression.New(compression.Default),
}
}

// For uploading manifests, use the Upload API, which follows "last writer wins" semantics.
// This is slightly slower than the UploadStream call but is safe to call concurrently from multiple threads. Refer to:
// https://github.com/Azure/azure-sdk-for-go/issues/18490#issuecomment-1170806877
func (ce *exporter) uploadManifest(ctx context.Context, manifestKey string, reader io.ReadSeekCloser) error {
defer reader.Close()
blobClient := ce.containerClient.NewBlockBlobClient(manifestKey)

ctx, cnclFn := context.WithCancelCause(ctx)
ctx, _ = context.WithTimeoutCause(ctx, time.Minute*5, errors.WithStack(context.DeadlineExceeded))
defer cnclFn(errors.WithStack(context.Canceled))

_, err := blobClient.Upload(ctx, reader, &blockblob.UploadOptions{})
if err != nil {
		return errors.Wrapf(err, "failed to upload blob %s", manifestKey)
}

return nil
}

// For uploading blobs, use UploadStream with an access condition so that the upload happens only if the blob
// does not already exist. Since blobs are content addressable, this is the right behavior for blobs, and it gives
// a performance improvement over the Upload API used for uploading manifests.
func (ce *exporter) uploadBlobIfNotExists(ctx context.Context, blobKey string, reader io.Reader) error {
blobClient := ce.containerClient.NewBlockBlobClient(blobKey)

uploadCtx, cnclFn := context.WithCancelCause(ctx)
uploadCtx, _ = context.WithTimeoutCause(uploadCtx, time.Minute*5, errors.WithStack(context.DeadlineExceeded))
defer cnclFn(errors.WithStack(context.Canceled))

// Only upload if the blob doesn't exist
_, err := blobClient.UploadStream(uploadCtx, reader, &blockblob.UploadStreamOptions{
BlockSize: IOChunkSize,
Concurrency: IOConcurrency,
AccessConditions: &blob.AccessConditions{
ModifiedAccessConditions: &blob.ModifiedAccessConditions{
IfNoneMatch: to.Ptr(azcore.ETagAny),
},
},
})

if err == nil {
return nil
}

if bloberror.HasCode(err, bloberror.BlobAlreadyExists) {
return nil
}

	return errors.Wrapf(err, "failed to upload blob %s", blobKey)
}

var _ io.ReadSeekCloser = &readSeekCloser{}

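// readSeekCloser combines separately provided Reader, Seeker and Closer
// implementations; it lets a bytes.Reader be passed to the manifest Upload
// API, which requires an io.ReadSeekCloser.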
type readSeekCloser struct {
io.Reader
io.Seeker
io.Closer
}

func bytesToReadSeekCloser(dt []byte) io.ReadSeekCloser {
bytesReader := bytes.NewReader(dt)
return &readSeekCloser{
Reader: bytesReader,
Seeker: bytesReader,
Closer: io.NopCloser(bytesReader),
}
}