Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgpd: fix default instance when leaving the hidden state (backport 10.3) #18161

Closed
wants to merge 53 commits into from

Conversation

louis-6wind
Copy link
Contributor

backport of #18119

ton31337 and others added 30 commits February 4, 2025 16:37
Signed-off-by: Donatas Abraitis <[email protected]>
(cherry picked from commit c9a2928)
If we have IPv6-only network and no IPv4 addresses at all, then by default
0.0.0.0 is created which is treated as malformed according to RFC 6286.

Signed-off-by: Donatas Abraitis <[email protected]>
(cherry picked from commit 739f2b5)
Signed-off-by: Donatas Abraitis <[email protected]>
(cherry picked from commit 48560b5)
…/pr-17959

bgpd: Do not start BGP session if BGP identifier is not set (backport FRRouting#17959)
Memory is being leaked when processing the eoiu marker.
BGP is creating a dummy dest to contain the data but
it was never freed.  As well as the eoiu info was
not being freed either.

Signed-off-by: Donald Sharp <[email protected]>
(cherry picked from commit c6b7a99)
…/pr-18000

bgpd: Fix up memory leak in processing eoiu marker (backport FRRouting#18000)
msg_new takes a uint16_t, the length passed
down variable is a unsigned int, thus 32 bit.
It's possible, but highly unlikely, that the
msglen could be greater than 16 bit.
Let's just add some checks to ensure that
this could not happen.

Signed-off-by: Donald Sharp <[email protected]>
(cherry picked from commit 283cc51)
Signed-off-by: Donald Sharp <[email protected]>
(cherry picked from commit 4b96752)
Grab the count of streams in ibuf when it is protected
by a mutex.  Since this data is written to it in another
pthread.

Signed-off-by: Donald Sharp <[email protected]>
(cherry picked from commit f94ad53)
The dg_update_list access is controlled by the dg_mutex in all
other locations.  Let's just add a mutex usage around the initialization
of the dg_update_list even if it's part of the startup, just to keep
things consistent.

Signed-off-by: Donald Sharp <[email protected]>
(cherry picked from commit 19af3f3)
Use a memory allocation specific type for filter names (to help detect memory
leaks) and fix a memory leak when releasing peer memory.

Signed-off-by: Rafael Zalamena <[email protected]>
(cherry picked from commit d1440da)
…/pr-18038

pimd: fix memory leak and assign allocation type (backport FRRouting#18038)
Issue:
When there is no traffic for a group, the LHR and RP take the default KAT+Join timer expiry of
a maximum of 480 seconds to clear the S,G . However, in the FHR, we update the state from JOINED
to NOT Joined, downstream state from PPto NOINFO.  This restarts the ET timer, causing S,G on FHR to
take more than 10 minutes to age out.

In other words,
Consider a case where (S,G) is in Join state. When the traffic stops and the KAT (210) expires,
 the Join expiry timer restarts. At this time, if we receive a prune, the expectation is to set
 PPT to 0 (RFC 4601 sec 4.5.2).
 When the PPT expires, we move to the noinfo state and restart the expiry timer one more time. We remove the
 (S,G) entry only after ~10 minutes when there is no active traffic.

Summary:
KAT Join ET 210 + PP ET 210 + NOINFO ET 210.

Solution:
Delete the ifchannel when in noinfo state, and KAT is not running.

Ticket: FRRouting#13703

Signed-off-by: Rajesh Varatharaj <[email protected]>
(cherry picked from commit afed39e)
In case interface address is learnt during configuration, make sure to
run DR election when configuring PIM/PIM passive on interface.

Signed-off-by: Rafael Zalamena <[email protected]>
(cherry picked from commit 8644524)
…/pr-14105

pimd: Fix for FHR mroute taking longer to age out (backport FRRouting#14105)
While the loop is currently exited in all cases after using nexthop, it
is a footgun to have "nh" around to be reused in another iteration of
the loop.  This would leave nexthop with partial data from the previous
use.  Make it local where needed instead.

Signed-off-by: David Lamparter <[email protected]>
(cherry picked from commit ce7f5b2)
Zero out the 12 unused bytes (for the IPv6 address) when reading in an
IPv4 address.

Signed-off-by: David Lamparter <[email protected]>
(cherry picked from commit 95cf0b2)
Doesn't seem to break anything but really poor style to pass potentially
uninitialized data to hash_lookup.

Signed-off-by: David Lamparter <[email protected]>
(cherry picked from commit c88589f)
rmap_src wasn't initialized, so for IPv4 the unused 12 bytes would
contain whatever junk is on the stack on function entry.  Also move
the IPv4 parse before the IPv6 parse so if it's successful we can be
sure the other bytes haven't been touched.

Signed-off-by: David Lamparter <[email protected]>
(cherry picked from commit b666ee5)
When reading in a nexthop from ZAPI, only set the fields that actually
have meaning.  While it shouldn't happen to begin with, we can otherwise
carry padding garbage into the unused leftover union bytes.

Signed-off-by: David Lamparter <[email protected]>
(cherry picked from commit 4a0e141)
We were hashing 4 bytes of the address.  Even for IPv6 addresses.

Oops.

The reason this was done was to try to make it faster, but made a
complex maze out of everything.  Time for a refactor.

Signed-off-by: David Lamparter <[email protected]>
(cherry picked from commit 001fcfa)
In bgp route leak, when import vrf x is executed,
it creates bgp instance as hidden with asn value as unspecified.

When router bgp x is configured ensure the correct as,
asnotation is applied otherwise running config shows asn value as 0.

This can lead to frr-reload failure when any FRR config change.

Fix:
Move asn and asnotiation, as_pretty value in common done section,
so when bgp_create gets existing instance but before returning
update asn and required fields in common section.

In bgp_create(): when returning for hidden at least update asn
and required when bgp instance created implicitly due to vrf leak.

if (hidden) {
    bgp = bgp_old;
    goto peer_init; <<<
}

Before fix:
show running:

router bgp 0 vrf purple
 bgp router-id 10.10.3.11
 !
 address-family ipv4 unicast
  redistribute static
  import vrf blue
 exit-address-family
 !
 address-family ipv6 unicast
  import vrf blue
 exit-address-family
 !
 address-family l2vpn evpn
  advertise ipv4 unicast
  advertise ipv6 unicast
 exit-address-family
exit

Testing:

1) following snippet config:
router bgp 63420 vrf blue
 import vrf purple
router bgp 63420 vrf purple
 import vrf blue
2) restart frr leads to the running config with 0 asn value.

Signed-off-by: Chirag Shah <[email protected]>
(cherry picked from commit 2ff08af)
Blocking all signals on non-main threads is not the way to go, at least
the handlers for SIGSEGV, SIGBUS, SIGILL, SIGABRT and SIGFPE need to run
so we get backtraces.  Otherwise the process just exits.

Signed-off-by: David Lamparter <[email protected]>
(cherry picked from commit 13a6ac5)
…/pr-18081

bgpd: fix bgp vrf instance creation from implicit (backport FRRouting#18081)
Sometimes, NHRP receives L2 information on a cache entry with the
0.0.0.0 IP address. NHRP considers it as valid and updates the binding
with the new IP address.

> Feb 09 20:09:54 aws-sin-vpn01 nhrpd[2695]: [QQ0NK-1H449] Netlink: new-neigh 10.2.114.238 dev dmvpn1 lladdr 162.251.180.10 nud 0x2 cache used 0 type 4
> Feb 09 20:10:35 aws-sin-vpn01 nhrpd[2695]: [QQ0NK-1H449] Netlink: new-neigh 10.2.114.238 dev dmvpn1 lladdr 162.251.180.10 nud 0x4 cache used 1 type 4
> Feb 09 20:10:48 aws-sin-vpn01 nhrpd[2695]: [QQ0NK-1H449] Netlink: del-neigh 10.2.114.238 dev dmvpn1 lladdr 162.251.180.10 nud 0x4 cache used 1 type 4
> Feb 09 20:10:49 aws-sin-vpn01 nhrpd[2695]: [QQ0NK-1H449] Netlink: who-has 10.2.114.238 dev dmvpn1 lladdr (unspec) nud 0x1 cache used 1 type 4
> Feb 09 20:10:49 aws-sin-vpn01 nhrpd[2695]: [QVXNM-NVHEQ] Netlink: update binding for 10.2.114.238 dev dmvpn1 from c 162.251.180.10 peer.vc.nbma 162.251.180.10 to lladdr (unspec)
> Feb 09 20:10:49 aws-sin-vpn01 nhrpd[2695]: [QQ0NK-1H449] Netlink: new-neigh 10.2.114.238 dev dmvpn1 lladdr 0.0.0.0 nud 0x2 cache used 1 type 4
> Feb 09 20:11:30 aws-sin-vpn01 nhrpd[2695]: [QQ0NK-1H449] Netlink: new-neigh 10.2.114.238 dev dmvpn1 lladdr 0.0.0.0 nud 0x4 cache used 1 type 4

Actually, the 0.0.0.0 IP addressed mentiones in the 'who-has' message is
wrong because the nud state value means that value is incomplete and
should not be handled as a valid entry. Instead of considering it, fix
this by by invalidating the current binding. This step is necessary in
order to permit NHRP to trigger resolution requests again.

Signed-off-by: Philippe Guibert <[email protected]>
(cherry picked from commit 3202323)
When SRv6 is enabled and an SRv6 locator is specified in the BGP
configuration, BGP may attempt to request SRv6 locator information from
zebra before the connection is fully established. If this occurs, the
request fails with the following error:

```
2025/02/06 16:37:32 BGP: [HR66R-TWQYD][EC 100663302] srv6_manager_get_locator: invalid zclient socket
````

As a result, BGP is unable to obtain the locator information,
preventing SRv6 VPN from working.

This commit fixes the issue by ensuring BGP requests SRv6 locator
information once the connection with zebra is successfully established.

Signed-off-by: Carmine Scarpitta <[email protected]>
(cherry picked from commit 16640b6)
…/pr-18069

bgpd: Request SRv6 locator after zebra connection (backport FRRouting#18069)
…/pr-18078

nhrpd: fix dont consider incomplete L2 entry (backport FRRouting#18078)
…/pr-18060

lib: crash handlers must be allowed on threads (backport FRRouting#18060)
cscarpitta and others added 11 commits February 13, 2025 18:38
The `show ipv6 route json` command displays the IPv6 routing table in
JSON format, including SRv6 SIDs. For each SRv6 SID, it provides
behavior and SID attributes. However, it does not include the SID
structure.

This commit adds the SID structure to the SRv6 SID JSON output.

Signed-off-by: Carmine Scarpitta <[email protected]>
(cherry picked from commit 312f7b3)
The `static_srv6_sids` topotest verifies that staticd correctly
programs the SIDs in the zebra RIB. Currently, the topotest only
validates the programmed behavior and SID attributes.

This commit extends the topotest to also validate the SID structure.

Signed-off-by: Carmine Scarpitta <[email protected]>
(cherry picked from commit a6d02fe)
When a BGP instance with a manually assigned VPN label is deleted, the
label is not released from the Zebra label registry. As a result,
reapplying a configuration with the same manual label leads to VPN
prefix export failures.

For example, with the following configuration:

> router bgp 65000 vrf BLUE
>  address-family ipv4 unicast
>   label vpn export <int>

Release zebra label registry on unconfiguration.

Fixes: d162d5f ("bgpd: fix hardset l3vpn label available in mpls pool")
Signed-off-by: Louis Scalbert <[email protected]>
(cherry picked from commit d636362)
…/pr-18121

bgpd: release manual vpn label on instance deletion (backport FRRouting#18121)
…/pr-18064

staticd: Fix SRv6 SID installation and deletion (backport FRRouting#18064)
…/pr-18023

lib: fix false context information for SRv6 route (backport FRRouting#18023)
When unconfiguring a default BGP instance with VPN SAFI configurations,
the default BGP structure remains but enters a hidden state. Upon
reconfiguration, the instance name incorrectly appears as "VIEW ?"
instead of "VRF default". And the name_pretty pointer

The name_pretty pointer is replaced by another one with the incorrect
name. This also leads to a memory leak as the previous pointer is not
properly freed.

Do not rewrite the instance name.

Fixes: 4d0e7a4 ("bgpd: VRF-Lite fix default bgp delete")
Signed-off-by: Louis Scalbert <[email protected]>
bgp_process_queue_init() is not called in bgp_create() when leaving the
BGP instance hidden state because of the following goto:

>	if (hidden) {
>		bgp = bgp_old;
>		goto peer_init;
>	}

Upon reconfiguration of the default instance, the prefixes are never set
into a meta queue by mq_add_handler(). They are never processed for
zebra RIB installation and announcements of update/withdraw.

Do not delete the BGP process_queue when hiding.

Fixes: 4d0e7a4 ("bgpd: VRF-Lite fix default bgp delete")
Signed-off-by: Louis Scalbert <[email protected]>
Upon reconfiguration of the default instance, free the previous pointer.

> =================================================================
> ==1209420==ERROR: LeakSanitizer: detected memory leaks
>
> Direct leak of 4 byte(s) in 1 object(s) allocated from:
>     #0 0x7fbde0eaa037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>     #1 0x7fbde0874634 in qcalloc lib/memory.c:106
>     #2 0x55dcca019263 in bgp_rtc_plist_entry_asn_new bgpd/bgp_rtc.c:474
>     #3 0x55dcca0199f6 in bgp_rtc_plist_entry_add bgpd/bgp_rtc.c:556
>     FRRouting#4 0x55dcca01b078 in bgp_rtc_plist_entry_set bgpd/bgp_rtc.c:700
>     FRRouting#5 0x55dcca016421 in bgp_nlri_parse_rtc bgpd/bgp_rtc.c:56
>     FRRouting#6 0x55dcc9f39f61 in bgp_nlri_parse bgpd/bgp_packet.c:352
>     FRRouting#7 0x55dcc9f47628 in bgp_update_receive bgpd/bgp_packet.c:2485
>     FRRouting#8 0x55dcc9f54867 in bgp_process_packet bgpd/bgp_packet.c:4114
>     FRRouting#9 0x7fbde097aebc in event_call lib/event.c:1984
>     FRRouting#10 0x7fbde084710f in frr_run lib/libfrr.c:1246
>     FRRouting#11 0x55dcc9dd818b in main bgpd/bgp_main.c:557
>     FRRouting#12 0x7fbde044fd79 in __libc_start_main ../csu/libc-start.c:308

Fixes: 4d0e7a4 ("bgpd: VRF-Lite fix default bgp delete")
Signed-off-by: Louis Scalbert <[email protected]>
Test that leaving the hidden BGP instance state is working.

Signed-off-by: Louis Scalbert <[email protected]>
Creates the default VRF instance after the other VRF instances. The
default VRF instance is created in hidden state. Check that AS number
in show run is correctly written.

Signed-off-by: Louis Scalbert <[email protected]>
Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions github-actions bot added the rebase PR needs rebase label Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.