Support nested BGP peering with calico-nodes running in local kubevirt VM pods #9875
Couldn't spot any glaring issues (though I understand you know of one!)
a8c2d78 to 484b1e2
{{- end}}
# For peer {{.Key}}
{{- if eq $data.ip ($node_ip) }}
# Skipping ourselves ({{$node_ip}})
Why would the node itself show up in the local WEP peers?
logCxt.Debug("Workload endpoint status file created")
epStatus, err := epstatus.GetWorkloadEndpointStatusFromFile(fileName)
if err != nil {
logCxt.WithError(err).Error("Failed to read endpoint status from file, it may just be created.")
This feels like it might be spammy, since we'll always race with Felix writing the file. Can you defer the error (only log an error if the file is still bad after more than 5s)?
if len(epStatus.Ipv4Nets) != 0 {
ip, _, err := net.ParseCIDR(epStatus.Ipv4Nets[0])
if err != nil {
log.WithError(err).Error("Workload endpoint status does not have a valid Ipv4Nets, ignore it for now")
I'd probably use Warn for this since you're handling the problem (by ignoring it).
"github.com/projectcalico/calico/libcalico-go/lib/backend/model"
)

var _ = Describe("ActiveBGPPeerCalculator", func() {
This component should be tested through the calculation graph FV suite so that we get the benefits of its "fuzzing" approach.
@@ -234,6 +243,8 @@ func newEndpointManager(
floatingIPsEnabled bool,
nft bool,
) *endpointManager {
nlHandle, _ := netlink.NewHandle()
This doesn't look right (ignoring the error, not shimmable). Use a netlinkshim.HandleManager, which has a mock alternative.
}

// Peer information that we track for each active local endpoint.
type EpPeerData struct {
Suggested change:
- type EpPeerData struct {
+ type EndpointBGPPeer struct {
I think spelling it out would help in the other files where this name is seen.
var err error
// If LocalBGPPeerIP has been updated, we need to remove old peer IP from all workload interfaces.
for ifaceName := range m.activeWlIfaceNameToID {
err = m.removeBGPPeerIPOnInterface(ifaceName, m.localBGPPeerIP)
Suspicious that we need to remove the old IP specifically; what if the desired IP changes while Felix is restarting? It seems we'd get stuck: after the restart, Felix would no longer know which old IP to remove.

addrs, err := m.nlHandle.AddrList(link, family)
if err != nil {
// Not sure why this would happen, but pass it up.
The link might be deleted from under you by the CNI plugin.
return nil
}

func lookupLink(nlHandle netlinkHandle, name string) (link netlink.Link, err error, notFound bool) {
I think you should use errors.Is(err, netlink.LinkNotFoundError) in the caller; that's more common to see.
if !errors.Is(err, fs.ErrExist) {
lastError = err
logrus.Error("IterActionNoOp")
Dev error left in?
22d1918 to 5cbb0f2
@@ -94,6 +94,7 @@ protocol bgp Global_10_192_0_3 from bgp_template {
calico_export_to_bgp_peers(true);
reject;
}; # Only want to export routes for workloads.
next hop self;
Change to a normal non-local peer; was that intended?
# Skipping global bgp peer (2001::102)

# Skipping global bgp peer (2001::103)

# Skipping global bgp peer (2001::104)
Wonder if we should be less noisy about skipping non-local peers? Just seems like we're adding a bit of cruft to every file instead of just saying "# No local peers configured."
}

// Given a new peer data, check and update the cache if needed.
func (abp *ActiveBGPPeerCalculator) checkAndUpdatePeerData(id model.WorkloadEndpointKey, newPeerData EndpointBGPPeer) {
Nit: please can you move these "leaf" functions below onEndpointUpdate()? I much prefer reading top-down to bottom-up.
logCxt := logrus.WithField("update", update)
switch id := update.Key.(type) {
case model.WorkloadEndpointKey:
if update.Value != nil {
We should use an InheritIndex for matching the WEPs. Some of their labels get inherited from the namespace/service account.
// It does not support assigning multiple IPs to the interface.

// ipNetStr is string format of net.IPNet.
type ipNetStr string
I think this type reinvents something we already have. Why not use Felix's ip.Addr type, which is comparable (so it can be used in map keys) and already has methods for converting to CIDR (and could easily be extended with IsIPv6Bootstrap(), for example)? I think Go's stdlib net.IP has some methods for checking the type of the address that you might be able to leverage.
// reset w.fsWatcher
w.fsWatcher = nil

if w.newFsnotifyWatcherErr {
Nit: prefer to avoid polluting prod code with test machinery. I'd add a field newFsnotifyWatcher func() error that can be shimmed, with the default impl being the real one.
// Start begins watching the directory.
func (w *FileWatcher) runWatcher() {
// Get current state of the directory and emit initial events.
w.scanDirectory()
I think your initial scan needs to be after you've started watching. Otherwise you might miss a file being created between here and starting the watch.
if err != nil {
log.WithError(err).Info("Error initializing fsnotify. Falling back to polling.")
} else {
defer w.fsWatcher.Close()
A defer in a loop is usually a bug: defer won't run until the function returns, so you'll stack up defers with each iteration.
My go-to answer is to move the loop body to a method.

currentState := make(map[string]os.FileInfo)

err := filepath.Walk(w.dir, func(path string, info os.FileInfo, err error) error {
Do you need to walk subdirs or is our directory flat? If flat, I think you could just do os.ReadDir()
func (w *FileWatcher) Stop() {
close(w.stopChan)
if w.fsWatcher != nil {
w.fsWatcher.Close()
This is also deferred in the main loop, so this could close it twice (and possibly race if it's not concurrency safe?)
Description
This PR adds support for allowing calico-node to peer with calico-node instances running inside KubeVirt VM pods locally, based on the labels of the VM pods.
API changes:
- Add LocalWorkloadSelector to the BGPPeer resource.
- Add localWorkloadPeeringIPV4 and localWorkloadPeeringIPV6 to BGPConfigurations.
Felix changes:
- Assign localWorkloadPeeringIP to the network interface of the workload selected by the BGPPeer.
Confd changes
libcalico-go changes
Related issues/PRs
Todos
Release Note
Reminder for the reviewer
Make sure that this PR has the correct labels and milestone set.
Every PR needs one docs-* label.
- docs-pr-required: This change requires a change to the documentation that has not been completed yet.
- docs-completed: This change has all necessary documentation completed.
- docs-not-required: This change has no user-facing impact and requires no docs.
Every PR needs one release-note-* label.
- release-note-required: This PR has user-facing changes. Most PRs should have this label.
- release-note-not-required: This PR has no user-facing changes.
Other optional labels:
- cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
- needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.