Node affinity for snapshots #1019
@dhess This is an interesting one. In this case (if I understand correctly), a new PVC is created from the snapshot, and the VolSync mover pod should then be the first consumer of that PVC. Normally I would have thought the pod would get scheduled automatically in the correct place, but maybe something else is going on. Does ZFS-LocalPV use the CSI topology feature? https://kubernetes-csi.github.io/docs/topology.html

One more question: when you create your original `sourcePVC` and then run your application pod, do you also need to manually configure that pod to run on the particular node where the PVC was provisioned?
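For context on how that scheduling normally works: with CSI topology, the external-provisioner records the volume's accessible topology as `nodeAffinity` on the PersistentVolume, and the scheduler then keeps consumers on a matching node. Below is a rough sketch of what a node-bound ZFS-LocalPV PV might look like; the topology key and all names are assumptions, not taken from this cluster.

```yaml
# Sketch of a PV provisioned by a node-local CSI driver (names assumed).
# The nodeAffinity block is what pins consumers of this volume to one node;
# the key "openebs.io/nodename" is assumed for illustration and may differ.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-1234-example            # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  csi:
    driver: zfs.csi.openebs.io      # ZFS-LocalPV CSI driver name
    volumeHandle: pvc-1234-example  # hypothetical handle
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: openebs.io/nodename   # assumed topology key
              operator: In
              values:
                - node-a
  persistentVolumeReclaimPolicy: Delete
  storageClassName: zfspv-pool
  volumeMode: Filesystem
```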
Hi @tesshuflower, thanks for the quick response.
I'm not familiar with CSI Topology, but from what I can tell, it seems it does. I'm guessing this `csi-provisioner` container spec is the relevant part:

```yaml
- args:
  - --csi-address=$(ADDRESS)
  - --v=5
  - --feature-gates=Topology=true
  - --strict-topology
  - --leader-election
  - --enable-capacity=true
  - --extra-create-metadata=true
  - --default-fstype=ext4
  env:
  - name: ADDRESS
    value: /var/lib/csi/sockets/pluginproxy/csi.sock
  - name: NAMESPACE
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.namespace
  - name: POD_NAME
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.name
  image: registry.k8s.io/sig-storage/csi-provisioner:v3.5.0
  imagePullPolicy: IfNotPresent
  name: csi-provisioner
  resources: {}
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  volumeMounts:
  - mountPath: /var/lib/csi/sockets/pluginproxy/
    name: socket-dir
```

Are there any particular topology keys I should use for compatibility with VolSync? Is the ZFS-LocalPV Helm chart's default configuration sufficient?
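For what it's worth, the upstream ZFS-LocalPV examples constrain placement with `allowedTopologies` on the StorageClass. A rough sketch follows; the class name, pool name, node names, and the `kubernetes.io/hostname` key are assumptions to be checked against the chart's actual values.

```yaml
# Sketch of a ZFS-LocalPV StorageClass restricted to specific nodes.
# All values below are assumptions for illustration, not the chart defaults.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zfspv-example               # hypothetical name
provisioner: zfs.csi.openebs.io
parameters:
  poolname: zfspv-pool              # assumed ZFS pool name
  fstype: zfs
allowedTopologies:
  - matchLabelExpressions:
      - key: kubernetes.io/hostname   # assumed topology key
        values:
          - node-a
          - node-b                    # hypothetical node names
```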
I think you're referring to statically provisioned PVCs here? If so, I'm not using those, so I'm not sure. All of the PVCs I'm trying to use as source PVCs for VolSync are dynamically provisioned as part of a StatefulSet.
@dhess There's nothing specific in VolSync that you should need to do to ensure compatibility. I guess normally I'd expect that the first consumer of a PVC (the VolSync mover pod in this case) should get automatically scheduled on a node where that PVC is accessible. It sounds like this is happening with your StatefulSet, for example. Maybe you could try something to help me understand: if you create a VolumeSnapshot for one of your source PVCs and then create a PVC from this snapshot (or do a clone instead of VolumeSnapshot + PVC if you're using `copyMethod: Clone`), does a pod that mounts that new PVC get scheduled onto the correct node?
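Roughly, that experiment might look like the following sketch. All object names, sizes, and classes here are placeholders, not values from this cluster.

```yaml
# Sketch of the suggested experiment: snapshot a source PVC, create a PVC
# from the snapshot, then run a throwaway pod as the first consumer.
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: source-pvc-snap-test        # hypothetical name
spec:
  volumeSnapshotClassName: my-zfspv-snapclass   # placeholder class
  source:
    persistentVolumeClaimName: my-source-pvc    # placeholder source PVC
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-from-snap-test
spec:
  storageClassName: my-zfspv-class  # placeholder storage class
  dataSource:
    name: source-pvc-snap-test
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                 # placeholder size
---
# If topology is reported correctly, the scheduler should place this pod on
# the node that holds the snapshot.
apiVersion: v1
kind: Pod
metadata:
  name: snap-consumer-test
spec:
  containers:
    - name: check
      image: busybox
      command: ["sh", "-c", "ls /data && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc-from-snap-test
```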
Ahh, I see what you mean now. I'll try an experiment and get back to you.
👋 democratic-csi/democratic-csi#329 seems to be a time-based race condition.
@danielsand I don't think this issue was specifically about the volumepopulator - would you be able to explain the scenario where you're hitting the issue?
Since I originally posted this issue, VolSync snapshots with ZFS-LocalPV have been working pretty reliably. However, we just ran into the issue (or at least a similar one) again, and I think it's possible that I misdiagnosed the original problem. This time the backup did not complete, and the `ReplicationSource` in question looks like this:

```yaml
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: db-primer-service-0
spec:
  sourcePVC: db-primer-service-0
  trigger:
    # 1 backup per hour
    schedule: "30 * * * *"
  restic:
    cacheStorageClassName: zfspv-pool-0
    copyMethod: Clone
    pruneIntervalDays: 7
    repository: restic-config-db-primer-service-0
    retain:
      hourly: 24
      daily: 7
      weekly: 1
    volumeSnapshotClassName: zfspv-snapclass
```

where `zfspv-pool-0` and `zfspv-snapclass` are the ZFS-LocalPV storage class and volume snapshot class, respectively.

In the last few months we've also added support for Mayastor to our cluster, and those PVCs are not tied to a particular node, so when I changed the cache storage class to Mayastor, the backup job ran and completed successfully:

```yaml
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: db-primer-service-0
spec:
  sourcePVC: db-primer-service-0
  trigger:
    # 1 backup per hour
    schedule: "30 * * * *"
  restic:
    cacheStorageClassName: mayastor-pool-0-repl-1
    copyMethod: Clone
    pruneIntervalDays: 7
    repository: restic-config-db-primer-service-0
    retain:
      hourly: 24
      daily: 7
      weekly: 1
    volumeSnapshotClassName: zfspv-snapclass
```

So I think that the problem here isn't with the source volume, but with the cache volume. I suspect that in order to reliably use a local PV storage class for cache volumes, there'll need to be some way to specify the topology of that volume. What's still puzzling is that all of our other `ReplicationSource`s with the same configuration have been working reliably.
@dhess Is your StorageClass using a `volumeBindingMode` of `WaitForFirstConsumer` or `Immediate`?
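For reference, here is the difference between the two binding modes, sketched as two hypothetical StorageClass definitions; the provisioner and parameters are assumptions based on ZFS-LocalPV's documented examples, not this cluster's configuration.

```yaml
# With Immediate, the PVC binds and the volume is provisioned as soon as the
# PVC is created, on a node chosen without regard to where the consuming pod
# (here, the VolSync mover) will later be scheduled.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zfspv-immediate-example     # hypothetical name
provisioner: zfs.csi.openebs.io
parameters:
  poolname: zfspv-pool              # assumed pool name
volumeBindingMode: Immediate
---
# With WaitForFirstConsumer, binding is delayed until a pod using the PVC is
# scheduled, so the cache volume should end up on the mover pod's node.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zfspv-wffc-example          # hypothetical name
provisioner: zfs.csi.openebs.io
parameters:
  poolname: zfspv-pool
volumeBindingMode: WaitForFirstConsumer
```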
The linked issue wasn't about the volumepopulator, just a reference to what is currently running on my end and what is working (CSI and volume snapshots work as they should). The volumepopulator is currently failing at random on my setup. I'll circle back when I push the topic again.
@danielsand I've created a separate issue #1255 to track this. I believe both issues are about storage drivers that create volumesnapshots/pvcs that are constrained to specific nodes, but I think your issue is related to using the volumepopulator, and this one is not.
Hi, thanks for this great project! We just started using it with our Rook/Ceph volumes, and it's working great.
It doesn't work so well with OpenEBS ZFS LocalPV (ZFS-LocalPV) volumes, however. ZFS-LocalPV has first-class support for CSI snapshotting and cloning, but VolSync can't figure out that the ZFS-LocalPV snapshot of a PVC mounted on, e.g., `node-a`, can also only be consumed from `node-a`. `copyMethod: Direct` doesn't help here for in-use volumes, because they can't be remounted. (Actually, I seem to recall that ZFS-LocalPV does support simultaneous pod mounts with a bit of extra configuration, but I'd prefer to use snapshots for proper PiT backups, anyway.)

Would it be difficult to add first-class support to VolSync for node-local provisioners with snapshotting support, like ZFS-LocalPV? Unless I'm missing something, it seems like it should be possible: since `copyMethod: Direct` can determine which node a PVC is mounted on and ensure the sync is performed from that node, then naïvely, it seems that an additional configuration option could be added to tell VolSync to mount a snapshot and run the sync operation on the same node where the source PVC is mounted.
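Purely as an illustration of the mechanism being requested (this is not an existing VolSync option, and all names below are hypothetical): once the source PV's node is known, a consumer of the snapshot-derived PVC could in principle be pinned to that node with an ordinary `nodeSelector`.

```yaml
# Illustration only: manually pinning a pod that consumes a snapshot-derived
# PVC to the node named in the source PV's nodeAffinity. VolSync does not
# currently expose such an option; names here are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: snapshot-consumer
spec:
  nodeSelector:
    kubernetes.io/hostname: node-a     # the node the source PVC lives on
  containers:
    - name: sync
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: snap-data
          mountPath: /data
  volumes:
    - name: snap-data
      persistentVolumeClaim:
        claimName: pvc-from-snapshot   # hypothetical PVC created from the snapshot
```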