Writing a Reconciler (or Watcher)

A Reconciler is a logical pattern that takes the state of a set of objects in the API server and propagates changes to other components based on that state. This logic is asynchronous: a function is called whenever a change occurs in the set of resources the reconciler is watching, and it receives the current state of the object and the type of change that occurred (Create, Update, Delete). For more information, see the Asynchronous Business Logic section in Platform Concepts.

Reconciler vs Watcher

Both reconcilers and watchers are used for the reconciliation process. Which one you use comes down to preference and use case. Both are powered by the same informer design within an InformerController, with slightly different handling logic. Both also have an Opinionated variant that can wrap the interface.

The major difference between a Reconciler and a Watcher is that a Reconciler has a single function, Reconcile, which is called for every event for a kind, while a Watcher has a function for each event type (Add, Update, Delete). There are also a few minor differences in how these events are handled:

  • A Reconciler can return a response with an explicit "retry after this time period" message, while a Watcher will only return success/fail (nil or error).
  • A Watcher will give you the previous state of the resource on an update event, while a Reconcile event will not.
  • A Reconciler can pass state between retries in memory if storing state in the API server is failing. However, this state only persists until the operator is restarted, and should not be relied upon except in situations where the API server cannot be reached.

Considerations When Writing a Reconciler

When writing a reconciler, it's important to take a few things into consideration:

  • If you update the object that triggered the reconcile (or watch) event, that update will trigger another reconcile (or watch) event. Generally, favor updating only subresources (specifically status) and some metadata in your reconcile (or watch) events: a status update should not increase metadata.generation (only metadata.resourceVersion), which lets you filter these events out. operator.OpinionatedWatcher filters these events for you, but you will need to track this yourself in a Reconciler. If you prefer not to use OpinionatedWatcher, or want to do your own event filtering, keep in mind how updates made within your reconcile loop will be received.
  • The reconciler takes action on every consumed event. Returning early from a reconcile or watcher event that needs no work keeps your overall program logic simpler and cheaper.
  • All objects for the kind(s) you are watching are cached to memory by default. This can be customized by using a different informer implementation, such as operator.CustomCacheInformer. Custom informers can be used in simple.App with AppConfig.InformerConfig.InformerSupplier, or by using your own custom app.App implementation.
  • Don't rely on retries to track operator state; use the status subresource to track operator success/failure, so that your operator can work out state from a fresh start (a restart will remove all pending retries, which are stored purely in-memory). This also allows a user to track operator status by viewing the status subresource.
  • If your reconcile process makes requests for other resources, consider caching, as high-traffic objects may cause your application to have to make these requests extremely frequently.
  • If your operator has a watcher or reconciler that updates the resource in a deterministic way (such as adding a label based on the spec), consider adding mutation for the kind on your App instead, as it makes that process synchronous and will never leave the object in an intermediate state (and reduces calls to the API server from your operator). Mutation can be added for a kind in simple.App with AppConfig.ManagedKinds[].Mutator, or by implementing the behavior in Mutate if you're using a custom app.App implementation (don't forget to add mutation in your manifest as well).
  • When you have multiple versions of a kind, your reconciliation should only deal with one of them (typically the latest). Events are always delivered in the version requested by the operator's watch, regardless of the version the object was written as (so a user creating a v1 resource will still produce a v2 object in a watch for v2 of the kind).
  • CRDs have a built-in conversion mechanism that is roughly equivalent to running json.Marshal on the stored version and then json.Unmarshal into the requested version. If this is not good enough for your purposes, add conversion to your app (for simple.App, use AppConfig.Converters, or implement Convert if you're implementing app.App yourself). Don't forget to add conversion in your manifest as well.
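
The default conversion described in the last point can be approximated with the standard library. This is only a sketch of the round-trip idea, not the API server's actual implementation; SpecV1, SpecV2, and convert are hypothetical names, and the renamed field is an invented example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SpecV1 is a hypothetical stored version of a spec.
type SpecV1 struct {
	SomeInfo  string `json:"someInfo"`
	OtherInfo string `json:"otherInfo"`
}

// SpecV2 is a hypothetical newer version in which otherInfo was renamed to extraInfo.
type SpecV2 struct {
	SomeInfo  string `json:"someInfo"`
	ExtraInfo string `json:"extraInfo"`
}

// convert round-trips src through JSON into dst. Fields with matching JSON
// names carry over; anything else is dropped or left at its zero value.
func convert(src, dst any) error {
	raw, err := json.Marshal(src)
	if err != nil {
		return err
	}
	return json.Unmarshal(raw, dst)
}

func main() {
	v1 := SpecV1{SomeInfo: "a", OtherInfo: "b"}
	var v2 SpecV2
	if err := convert(v1, &v2); err != nil {
		panic(err)
	}
	// someInfo survives the round trip; the renamed field comes back empty.
	fmt.Printf("%+v\n", v2)
}
```

A rename like this one is the kind of change the round trip cannot handle on its own, which is where a custom converter becomes necessary.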

An Example Reconciler

Let's consider an example reconciler for a kind defined by the CUE:

{
	kind: "MyKind"
	current: "v1"
	versions: {
		"v1": {
			schema: {
				spec: {
					someInfo: string
					otherInfo: string
				}
				status: {
					lastAppliedGeneration: int
				}
			}
		}
	}
}

We want to build a reconciler that will send someInfo and otherInfo to some other system, but we only need to do this if someInfo or otherInfo change. Since a resource can change without the contents of spec being altered, we need a way to track whether the current spec has been applied. To do this here, we track lastAppliedGeneration in the status. generation is a Kubernetes API object metadata property which increments when the spec changes, so we can use it to check whether a given request carries a different version of the spec than the one we've already processed.

func NewMyKindReconciler(infoClient InfoClient, store *resource.TypedStore[*v1.MyKind]) operator.Reconciler {
	// operator.TypedReconciler implements operator.Reconciler but calls ReconcileFunc with an operator.TypedReconcileRequest
	// rather than an operator.ReconcileRequest, avoiding the need to cast resource.Object into our Go type.
	// We could also have a struct which implements operator.Reconciler, but this is easier for simple things.
	return &operator.TypedReconciler[*v1.MyKind]{ 
		ReconcileFunc: func(ctx context.Context, req operator.TypedReconcileRequest[*v1.MyKind]) (operator.ReconcileResult, error) {
			logging.FromContext(ctx).Info("Reconcile request", "name", req.Object.GetName(), "action", operator.ResourceActionFromReconcile(req.Action), "generation", req.Object.GetGeneration())
			// If we're deleting the object, tell InfoClient
			if req.Action == operator.ReconcileActionDeleted {
				err := infoClient.Delete(req.Object.GetNamespace(), req.Object.GetName())
				return operator.ReconcileResult{}, err
			}
			// If the last applied generation matches the current generation of the resource, we can ignore this reconcile request
			if req.Object.GetGeneration() == req.Object.Status.LastAppliedGeneration {
				return operator.ReconcileResult{}, nil
			}
			// Attempt to apply the state to the third-party service
			err := infoClient.ApplyInfo(req.Object.GetNamespace(), req.Object.GetName(), req.Object.Spec.SomeInfo, req.Object.Spec.OtherInfo)
			if err != nil {
				// Check the error, if it's retryable, tell the controller to try again in a bit
				if IsRetryable(err) {
					return operator.ReconcileResult{
						RequeueAfter: time.Minute,
					}, nil
				}
				// Otherwise, return the error. The controller's RetryPolicy will dictate if it should be retried, and after how long
				return operator.ReconcileResult{}, err
			}
			// Set status.lastAppliedGeneration
			req.Object.Status.LastAppliedGeneration = req.Object.GetGeneration()
			_, err = store.UpdateSubresource(ctx, req.Object.GetStaticMetadata().Identifier(), resource.SubresourceStatus, req.Object)
			if err != nil {
				return operator.ReconcileResult{}, err
			}
			return operator.ReconcileResult{}, nil
		},
	}
}

We could write a similar Watcher, with the caveat being that this function becomes split and duplicated amongst the watcher's methods for each action, and we can't control requeue behavior from the watcher response (we have to leave it up to the controller's RetryPolicy). A watcher version of this would look like:

func NewMyKindWatcher(infoClient InfoClient, store *resource.TypedStore[*v1.MyKind]) *simple.Watcher {
	// simple.Watcher implements operator.Watcher and calls the defined functions for each event.
	// We could also have a struct which implements operator.Watcher, but this is easier for simple things.
	return &simple.Watcher{
		AddFunc: func(ctx context.Context, obj resource.Object) error {
			logging.FromContext(ctx).Info("Add event", "name", obj.GetName(), "action", "add", "generation", obj.GetGeneration())
			// Cast the object
			mykind, ok := obj.(*v1.MyKind)
			if !ok {
				return fmt.Errorf("unable to cast object into *v1.MyKind")
			}
			// We still need to check the lastAppliedGeneration, as an add event can be called on operator startup, 
			// as the application doesn't have the state to know if the resource already existed.
			if mykind.GetGeneration() == mykind.Status.LastAppliedGeneration {
				return nil
			}
			// Attempt to apply the state to the third-party service
			err := infoClient.ApplyInfo(mykind.GetNamespace(), mykind.GetName(), mykind.Spec.SomeInfo, mykind.Spec.OtherInfo)
			if err != nil {
				// Return the error. The controller's RetryPolicy will dictate if it should be retried, and after how long
				return err
			}
			// Set status.lastAppliedGeneration
			mykind.Status.LastAppliedGeneration = mykind.GetGeneration()
			_, err = store.UpdateSubresource(ctx, mykind.GetStaticMetadata().Identifier(), resource.SubresourceStatus, mykind)
			if err != nil {
				return err
			}
			return nil
		},
		UpdateFunc: func(ctx context.Context, oldObj resource.Object, newObj resource.Object) error {
			logging.FromContext(ctx).Info("Update event", "name", newObj.GetName(), "action", "update", "generation", newObj.GetGeneration())
			// Cast the object
			mykind, ok := newObj.(*v1.MyKind)
			if !ok {
				return fmt.Errorf("unable to cast object into *v1.MyKind")
			}
			// If the last applied generation matches the current generation of the resource, we can ignore this update
			if mykind.GetGeneration() == mykind.Status.LastAppliedGeneration {
				return nil
			}
			// Attempt to apply the state to the third-party service
			err := infoClient.ApplyInfo(mykind.GetNamespace(), mykind.GetName(), mykind.Spec.SomeInfo, mykind.Spec.OtherInfo)
			if err != nil {
				// Return the error. The controller's RetryPolicy will dictate if it should be retried, and after how long
				return err
			}
			// Set status.lastAppliedGeneration
			mykind.Status.LastAppliedGeneration = mykind.GetGeneration()
			_, err = store.UpdateSubresource(ctx, mykind.GetStaticMetadata().Identifier(), resource.SubresourceStatus, mykind)
			if err != nil {
				return err
			}
			return nil
		},
		DeleteFunc: func(ctx context.Context, obj resource.Object) error {
			logging.FromContext(ctx).Info("Delete event", "name", obj.GetName(), "action", "delete", "generation", obj.GetGeneration())
			// Tell InfoClient the object was deleted. A watcher method returns only an error.
			return infoClient.Delete(obj.GetNamespace(), obj.GetName())
		},
	}
}