
Tips for Deployments

Andre Nathan edited this page Nov 1, 2022 · 25 revisions

Introduction

These are notes on a Gatekeeper deployment consisting of one Gatekeeper server and two Grantor servers. They assume Ubuntu 20.04 servers with Gatekeeper installed via packages.

This small deployment is meant to help new users get started with Gatekeeper, so they can evaluate it, write their policy, and incrementally grow their deployment from this first step.

The network topology is shown below, where the Gatekeeper server has its front port connected to a data center uplink and its back port connected to a router. The router works as a gateway for a number of servers which provide services to the Internet via the external network, while the internal network is used for administrative purposes. The Gatekeeper server uses a patched version of the Bird routing daemon to establish a full-routing BGP session with the uplink provider and an iBGP session with the router. The Grantor servers have their front port connected to the external network. Grantor servers do not have a back port configuration in Gatekeeper, and the internal network link is used solely for administrator access.

                                external network
                    +-------------------+-----------+------------+
                    |                   |           | front      | front
              +-----+------+       +----+---+  +----+----+  +----+----+
              |            |       |        |  |         |  |         |
uplink -------+ gatekeeper +-------+ router |  | grantor |  | grantor |
        front |            | back  |        |  |         |  |         |
              +-----+------+       +----+---+  +----+----+  +----+----+
                    |                   |           |            |
                    +-------------------+-----------+------------+
                                internal network

Gatekeeper front IPv4: 10.1.0.1/30
Gatekeeper front IPv6: 2001:db8:1::1/126

Gatekeeper back IPv4: 10.2.0.1/30
Gatekeeper back IPv6: fd00:2::1/126

Router IPv4 on Gatekeeper link: 10.2.0.2/30
Router IPv6 on Gatekeeper link: fd00:2::2/126

External network IPv4 CIDR: 1.2.3.0/24
External network IPv6 CIDR: 2001:db8:123::/48

Grantor front IPv4: 1.2.3.4 and 1.2.3.5
Grantor front IPv6: 2001:db8:123::4 and 2001:db8:123::5

Basic configuration

These steps can be performed for both Gatekeeper and Grantor servers, with the caveat that Grantors only have a front port, so any references to the back port can be ignored.

  1. Set up huge pages

The Gatekeeper server in this deployment has 256 GB of RAM. We reserve 16 GB for the kernel and allocate the remaining 240 GB in 1 GB huge pages. To pass the appropriate command line parameters to the kernel, edit /etc/default/grub and set GRUB_CMDLINE_LINUX_DEFAULT, running update-grub afterwards.

GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=1G hugepagesz=1G hugepages=240"
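The hugepage count follows directly from the memory budget; as a quick sketch of the arithmetic (using this deployment's 256 GB total and 16 GB kernel reservation):

```shell
# Hugepage count = total RAM minus the kernel reservation, in 1 GB pages.
total_gb=256
kernel_gb=16
echo "hugepages=$((total_gb - kernel_gb))"  # hugepages=240
```

After rebooting, grep HugePages_Total /proc/meminfo should report the reserved count.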
  2. Rename front and back ports

It's useful to have friendly interface names on machines with many NICs. We're going to name the Gatekeeper front and back ports, appropriately, "front" and "back", using systemd link files. In the link file, it's important to choose a Match section option that won't cause the kernel to rename the interface back after Gatekeeper takes control of it.

For this deployment, we used the PCI addresses of the interfaces, which can be obtained via udevadm:

# udevadm info /sys/class/net/<front port name> | grep ID_PATH=
E: ID_PATH=pci-0000:01:00.0

# udevadm info /sys/class/net/<back port name> | grep ID_PATH=
E: ID_PATH=pci-0000:02:00.0

Create systemd link files for the front and back interfaces (the latter only in the Gatekeeper server) and run update-initramfs -u afterwards. An example using the output from the above udevadm commands is given below:

# /etc/systemd/network/10-front.link
[Match]
Property=ID_PATH=pci-0000:01:00.0
[Link]
Name=front

# /etc/systemd/network/10-back.link
[Match]
Property=ID_PATH=pci-0000:02:00.0
[Link]
Name=back

Once these two changes are in place, reboot the machine for them to take effect. It's also important to remember that DPDK won't take over an interface that is in the UP state, so it's advised to remove the front and back interfaces from the operating system's network configuration (e.g. /etc/network/interfaces in Ubuntu).

Gatekeeper server configuration

Environment variables

The first step is to edit the /etc/gatekeeper/envvars file and set the GATEKEEPER_INTERFACES variable to the PCI addresses of the front and back interfaces:

GATEKEEPER_INTERFACES="01:00.0 02:00.0"

Main configuration

For the Gatekeeper server, set gatekeeper_server to true in /etc/gatekeeper/main_config.lua:

local gatekeeper_server = true

Gatekeeper is composed of multiple functional blocks, each one with its own Lua configuration script located in /etc/gatekeeper.

GK block: /etc/gatekeeper/gk.lua

In this file, we set the following variables:

local log_level = staticlib.c.RTE_LOG_NOTICE
local flow_ht_size = 250000000
local max_num_ipv4_rules = 2000000
local num_ipv4_tbl8s = 2000
local max_num_ipv6_rules = 400000
local num_ipv6_tbl8s = 100000

To calculate these values, we first generated IPv4 and IPv6 routing table dumps from full-routing BGP sessions, creating, respectively, the ipv4-ranges and ipv6-ranges text files, each containing one CIDR per line. The max_num_ipv[46]_rules and num_ipv[46]_tbl8s variables are then set to round numbers above the values reported by the gtctl estimate command, as described in the gtctl project's README file:

$ gtctl estimate -4 ipv4-ranges
ipv4: rules=1811522, tbl8s=1554

$ gtctl estimate -6 ipv6-ranges
ipv6: rules=313120, tbl8s=76228

The flow_ht_size variable is set close to the largest value that still allows Gatekeeper to boot. The larger the flow table, the better Gatekeeper can deal with complex attacks, since it can keep state for more flows. To estimate how much memory a given value will consume, multiply flow_ht_size by the number of NUMA nodes, by two (the default number of GK block instances per NUMA node), and by 256 bytes. The Gatekeeper server in this deployment has two Intel Xeon processors, that is, two NUMA nodes, so our setting consumes 250000000 * 2 * 2 * 256 bytes ≈ 238 GiB. Note that this value is an upper bound; actual consumption is lower. Finally, it is worth pointing out that this setup tracks 250000000 * 2 * 2 = 1 billion flows.
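The estimate above can be reproduced with a bit of shell arithmetic (a sketch using this deployment's figures; 256 bytes is the upper bound on per-flow state used in the text):

```shell
flow_ht_size=250000000
numa_nodes=2
gk_per_node=2    # default number of GK block instances per NUMA node
entry_bytes=256  # upper bound on per-flow state

echo "flows tracked: $((flow_ht_size * numa_nodes * gk_per_node))"
echo "memory upper bound: $((flow_ht_size * numa_nodes * gk_per_node * entry_bytes / 1024 / 1024 / 1024)) GiB"
```

This prints a 238 GiB upper bound for 1 billion tracked flows.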

Solicitor block: /etc/gatekeeper/sol.lua

By default, Gatekeeper limits the request bandwidth to 5% of the link capacity. In our deployment, the Gatekeeper server and the router use 10 Gbps interfaces, but the external network runs on 1 Gbps Ethernet. With this configuration, 5% of the link capacity would amount to 50% of the external network bandwidth, so we reduce the request bandwidth rate to 0.5% of the Gatekeeper link capacity:

local req_bw_rate = 0.005
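As a sanity check on the 0.005 figure, a sketch with this deployment's link speeds:

```shell
gk_link_mbps=10000  # 10 Gbps Gatekeeper link
ext_link_mbps=1000  # 1 Gbps external network

req_bw_mbps=$((gk_link_mbps * 5 / 1000))  # 0.5% of the Gatekeeper link
echo "request bandwidth: ${req_bw_mbps} Mbps"
echo "share of external link: $((req_bw_mbps * 100 / ext_link_mbps))%"
```

This yields 50 Mbps of request bandwidth, i.e. 5% of the external network capacity.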

Network block configuration: /etc/gatekeeper/net.lua

In this file, set the variables below according to your network setup; examples are given for a front port named front and a back port named back. In this deployment, the front port belongs to a VLAN and uses LACP, so we set the appropriate VLAN tags for IPv4 and IPv6 and the bonding mode to staticlib.c.BONDING_MODE_8023AD. The back port is not in a VLAN, nor does it use link aggregation. The back_mtu variable is set to a high value to account for IP-IP encapsulation in packets sent to the Grantor servers. Note that the MTU of every network interface in the path from the Gatekeeper server's back port to the Grantor servers' front ports should be set to this value (other network interfaces do not need to be reconfigured).

local user = "gatekeeper"

local front_ports = {"front"}
local front_ips = {"10.1.0.1/30", "2001:db8:1::1/126"}
local front_bonding_mode = staticlib.c.BONDING_MODE_8023AD
local front_ipv4_vlan_tag = 1234
local front_ipv6_vlan_tag = 1234
local front_vlan_insert = true
local front_mtu = 1500

local back_ports = {"back"}
local back_ips = {"10.2.0.1/30", "fd00:2::1/126"}
local back_bonding_mode = staticlib.c.BONDING_MODE_ROUND_ROBIN
local back_ipv4_vlan_tag = 0
local back_ipv6_vlan_tag = 0
local back_vlan_insert = false
local back_mtu = 2048

Other functional blocks

In the remaining Lua configuration files, we simply set the log_level variable. For production use, we specify the WARNING level:

local log_level = staticlib.c.RTE_LOG_WARNING

Configuring grantors in Gatekeeper

The Grantor servers must be configured using Gatekeeper's dynamic configuration mechanism.

As illustrated in the network topology description, the two Grantor servers have external IPv4 addresses 1.2.3.4 and 1.2.3.5 and external IPv6 addresses 2001:db8:123::4 and 2001:db8:123::5. The router's addresses in the interface connected to the Gatekeeper server's back port are 10.2.0.2 and fd00:2::2, and the external network IPv4 and IPv6 CIDR blocks are 1.2.3.0/24 and 2001:db8:123::/48, respectively.

Create the /etc/gatekeeper/grantors.lua file with the following script:

require "gatekeeper/staticlib"

local dyc = staticlib.c.get_dy_conf()

local addrs = {
  { gt_ip = '1.2.3.4', gw_ip = '10.2.0.2' },
  { gt_ip = '1.2.3.5', gw_ip = '10.2.0.2' },
}
dylib.add_grantor_entry_lb('1.2.3.0/24', addrs, dyc.gk)

local addrs = {
  { gt_ip = '2001:db8:123::4', gw_ip = 'fd00:2::2' },
  { gt_ip = '2001:db8:123::5', gw_ip = 'fd00:2::2' },
}
dylib.add_grantor_entry_lb('2001:db8:123::/48', addrs, dyc.gk)

return "gk: successfully configured grantors for 2 prefixes"

In other words, gt_ip corresponds to the public IP address associated to the Grantor server's front port, and gw_ip is the IP address in the router interface that is connected to the Gatekeeper server.

This script must be sent to Gatekeeper via the gkctl tool after Gatekeeper is started. The best way to do this is to configure a systemd override with an ExecStartPost command that runs gkctl, with a long enough timeout to account for the Gatekeeper startup delay. Run systemctl edit gatekeeper and insert the following content:

[Service]
ExecStartPost=/usr/sbin/gkctl -t 300 /etc/gatekeeper/grantors.lua
TimeoutStartSec=300

Start Gatekeeper

Simply start and enable Gatekeeper via systemd:

# systemctl start gatekeeper
# systemctl enable gatekeeper

Grantor server configuration

Main configuration

For the Grantor server, set gatekeeper_server to false in /etc/gatekeeper/main_config.lua:

local gatekeeper_server = false

GT block: /etc/gatekeeper/gt.lua

In this file, we set the following variables:

local n_lcores = 2
local lua_policy_file = "policy.lua"
local lua_base_directory = "/etc/gatekeeper"

Network block configuration: /etc/gatekeeper/net.lua

For Grantor servers, the network configuration is analogous to the one for the Gatekeeper servers, with the exception that there's no back port when running Gatekeeper in Grantor mode.

Here we assume no link aggregation and no VLAN configuration. Notice the MTU configuration matching the Gatekeeper server's back_mtu value.

local user = "gatekeeper"

local front_ports = {"front"}
local front_ips = {"1.2.3.4/24", "2001:db8:123::4/48"}
local front_bonding_mode = staticlib.c.BONDING_MODE_ROUND_ROBIN
local front_ipv4_vlan_tag = 0
local front_ipv6_vlan_tag = 0
local front_vlan_insert = false
local front_mtu = 2048

Other functional blocks

In the remaining Lua configuration files, we simply set the log_level variable. For production use, we specify the WARNING level:

local log_level = staticlib.c.RTE_LOG_WARNING

The policy script

The Grantor configuration in gt.lua points to a Lua policy script, a fundamental element of the Gatekeeper architecture. When a packet from a new flow arrives at the Gatekeeper server, it is forwarded to the Grantor server for a policy decision. In the simplest case, this decision is a binary choice of granting or declining packets belonging to this flow, along with the maximum bandwidth for the granted flows and the duration of each decision. However, the policy response is in fact a reference to a BPF program installed in the Gatekeeper server, which can not only accept or deny packets, but also control the bandwidth budget available to the flow and adapt its response according to changing traffic patterns. Once a BPF program has been assigned to the flow, further packets will be handled directly by the Gatekeeper server, according to the rules encoded in the program, and no new requests will be sent to the Grantor server until the flow expires.

The entry point of the policy script is a function called lookup_policy. It receives as arguments a packet information object, which allows policy decisions to be made based on layer 2, 3 and 4 header fields, and a policy object, which can be used to set bandwidth and duration limits for the policy decision. This function must return a boolean value indicating whether the policy decision is to grant or decline the flow.

In practice, we can use the decision_granted and decision_declined functions and their variations from the policylib Lua package to set the policy parameters (i.e. the BPF program index, the bandwidth budget and the duration of the decision) and return the appropriate boolean value. These functions set the BPF program index field of the policy decision to the granted and declined programs, respectively, both of which are bundled with a standard Gatekeeper installation.

In the example below, we will in fact use the decision_grantedv2 function, a simple wrapper for decision_grantedv2_will_full_params. It selects the more flexible grantedv2 BPF program, also included with Gatekeeper, which supports negative and secondary bandwidth settings, allows direct delivery to be selected, and can be reused in custom BPF programs. We will also use the decision_web function, another wrapper for decision_grantedv2_will_full_params, which selects the web BPF program that likewise comes with Gatekeeper. This example BPF program allows ICMP packets and incoming TCP segments destined to the HTTP, HTTPS, SSH and FTP-related ports. It also allows incoming TCP segments with source ports HTTP and HTTPS, as an example of how to allow replies to connections initiated from the server. These functions have the following signatures:

function policylib.{decision_granted,decision_grantedv2,decision_web}(
  policy,          -- the policy object
  tx_rate_kib_sec, -- maximum bandwidth in KiB/s
  cap_expire_sec,  -- policy decision (capability) duration, in seconds
  next_renewal_ms, -- how long until sending a renewal request for this flow, in milliseconds
  renewal_step_ms  -- when sending renewal requests, don't send more than one per this duration, in milliseconds.
)

function policylib.decision_grantedv2_will_full_params(
  program_index,     -- corresponds to the index of the bpf_programs table in gk.lua in the Gatekeeper server.
  policy,            -- the policy object
  tx1_rate_kib_sec,  -- maximum primary bandwidth in KiB/s
  tx2_rate_kib_sec,  -- maximum secondary bandwidth in KiB/s
  cap_expire_sec,    -- policy decision (capability) duration, in seconds
  next_renewal_ms,   -- how long until sending a renewal request for this flow, in milliseconds
  renewal_step_ms,   -- when sending renewal requests, don't send more than one per this duration, in milliseconds.
  direct_if_possible -- whether to enable direct delivery
)

function policylib.decision_declined(
  policy,    -- the policy object
  expire_sec -- policy decision (capability) duration, in seconds
)

As a practical example, we show below a policy script that is able to perform the following decisions:

  • Grant or decline flows based on their source IPv4 addresses, using labeled prefixes loaded from an external file;
  • Grant or decline flows based on their destination IPv4 addresses, allowing traffic to a subrange containing web servers;
  • Decline malformed packets;
  • Grant packets not matching the rules above, with limited bandwidth.

We start by requiring the libraries policylib from Gatekeeper and ffi from LuaJIT. Requiring policylib also gives us access to the lpmlib package, which contains functions to manipulate LPM (Longest Prefix Match) tables.

local policylib = require("gatekeeper/policylib")
local ffi = require("ffi")

Next, we define helper functions that represent our policy decisions. These functions take a policy argument, which has type struct ggu_policy, but which can be considered as an opaque object for our purposes, as it's simply forwarded to the functions policylib.decision_grantedv2 or policylib.decision_declined, described above.

-- Decline flows with malformed packets.
local function decline_malformed_packet(policy)
  return policylib.decision_declined(policy, 10)
end

-- Decline flows by policy decision.
local function decline(policy)
  return policylib.decision_declined(policy, 60)
end

-- Grant flow by policy decision.
local function grant(policy)
  return policylib.decision_grantedv2(
    policy,
    3072,   -- tx_rate_kib_sec = 3 MiB/s
    300,    -- cap_expire_sec = 5 minutes
    240000, -- next_renewal_ms = 4 minutes
    3000    -- renewal_step_ms = 3 seconds
  )
end

-- Grant flows destined to web servers by policy decision.
local function grant_web(policy)
  return policylib.decision_web(
    policy,
    3072,   -- tx_rate_kib_sec = 3 MiB/s
    300,    -- cap_expire_sec = 5 minutes
    240000, -- next_renewal_ms = 4 minutes
    3000    -- renewal_step_ms = 3 seconds
  )
end

-- Grant flow not matching any policy, with reduced bandwidth.
local function grant_unmatched(policy)
  return policylib.decision_grantedv2(
    policy,
    1024,   -- tx_rate_kib_sec = 1 MiB/s
    300,    -- cap_expire_sec = 5 minutes
    240000, -- next_renewal_ms = 4 minutes
    3000    -- renewal_step_ms = 3 seconds
  )
end

We then define a Lua table that maps indices to policy decisions. The indices correspond to the labels associated with network prefixes when they are inserted into the LPM (Longest Prefix Match) tables created below. Therefore, when inspecting a packet, we can look up its source and/or destination IP address in an LPM table and use the returned label to obtain the function that will grant or decline the flow.

In the table below, flows labeled 1 in the LPM table will be declined, while those labeled 2 and 3 will be granted, respectively, by the grantedv2 and web BPF programs. The grant_unmatched function is called statically and therefore is not referenced in the table.

local policy_decision_by_label = {
  [1] = decline,
  [2] = grant,
  [3] = grant_web,
}

The policy script continues with the definition of the aforementioned LPM tables, using the helper function new_lpm_from_file (shown after the calls here for expository purposes; in the actual script, the function must be defined before it is called). The fact that the src_lpm_ipv4 and dst_lpm_ipv4 variables are global (i.e. their declarations do not use the local keyword) is relevant, because it allows them to be accessed by other scripts. This is useful, for example, to update an LPM table or to print it for inspection.

The new_lpm_from_file function, given below, assumes the input file is in a two-column format, where the first column is a network prefix in CIDR notation and the second is its label. It uses functions from the lpmlib package to create and populate the LPM table. Given the policy_decision_by_label table above, the input file containing source address ranges should use label 1 for prefixes we want to decline and label 2 for those we want to grant access to. Similarly, the input file containing destination address ranges should attach label 3 to its prefixes.

src_lpm_ipv4 = new_lpm_from_file("/path/to/lpm/source/addresses/file")
dst_lpm_ipv4 = new_lpm_from_file("/path/to/lpm/destination/addresses/file")

function new_lpm_from_file(path)
  -- Find minimum values for num_rules and num_tbl8s.
  local num_rules = 0
  local num_tbl8s = 0

  local prefixes = {}
  for line in io.lines(path) do
    local prefix, label = string.match(line, "^(%S+)%s+(%d+)$")
    if not prefix or not label then
      error(path .. ": invalid line: " .. line)
    end
    -- Convert string in CIDR notation to IP address and prefix length.
    local ip_addr, prefix_len = lpmlib.str_to_prefix(prefix)
    num_rules = num_rules + 1
    num_tbl8s = num_tbl8s + lpmlib.lpm_add_tbl8s(ip_addr, prefix_len, prefixes)
  end

  -- Adjust parameters.
  local scaling_factor_rules = 2
  local scaling_factor_tbl8s = 2
  num_rules = math.max(1, scaling_factor_rules * num_rules)
  num_tbl8s = math.max(1, scaling_factor_tbl8s * num_tbl8s)

  -- Create and populate LPM table.
  local lpm = lpmlib.new_lpm(num_rules, num_tbl8s)
  for line in io.lines(path) do
    local prefix, label = string.match(line, "^(%S+)%s+(%d+)$")
    if not prefix or not label then
      error(path .. ": invalid line: " .. line)
    end
    -- Convert string in CIDR notation to IP address and prefix length.
    local ip_addr, prefix_len = lpmlib.str_to_prefix(prefix)
    lpmlib.lpm_add(lpm, ip_addr, prefix_len, tonumber(label))
  end

  return lpm
end
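For concreteness, a hypothetical input file for the source-address table might look like this (the first two prefixes are invented examples of ranges to decline; label 2 grants, per the policy_decision_by_label table):

```
198.51.100.0/24 1
203.0.113.0/25 1
100.90.80.0/24 2
```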

Finally, we implement the lookup_policy function. As described above, this is the entry point of the policy script, i.e., the function called by the Grantor server to obtain a policy decision for a given packet.

The function receives two arguments. The first is pkt_info, which is a gt_packet_headers struct, accessible from the policy script via the ffi module. These are the headers of the IP-in-IP encapsulated packet sent from Gatekeeper to Grantor. The second argument is policy, which we will simply pass along to the policy decision functions.

The lookup_policy function starts by checking if the inner packet is an IPv4 packet. In production we have IPv6-specific LPM tables and other policies, but for simplicity, in this example we will just apply the default policy for non-IPv4 traffic. The function then proceeds with an LPM table lookup for the source address of the incoming packet, which, if successful, will return a policy decision function that is then applied. Otherwise, the script attempts to obtain a policy by performing a lookup in the destination addresses LPM table. These two steps are performed by the helper functions lookup_src_lpm_ipv4_policy and lookup_dst_lpm_ipv4_policy, respectively, which are given below. Finally, if no policy is found, we apply the default policy decision function, grant_unmatched.

function lookup_policy(pkt_info, policy)
  if pkt_info.inner_ip_ver ~= policylib.c.IPV4 then
    return grant_unmatched(policy)
  end

  local fn = lookup_src_lpm_ipv4_policy(pkt_info)
  if fn then
    return fn(policy)
  end

  local fn = lookup_dst_lpm_ipv4_policy(pkt_info)
  if fn then
    return fn(policy)
  end

  return grant_unmatched(policy)
end

The lookup_src_lpm_ipv4_policy and lookup_dst_lpm_ipv4_policy functions perform lookups on the src_lpm_ipv4 and dst_lpm_ipv4 tables, respectively, which were populated with network prefixes loaded from input files, as described above. We use the ffi.cast function to obtain an IPv4 header, so that we can access the packet's source or destination IP address and look it up in the LPM table with lpmlib.lpm_lookup. This function returns the label of the matching network prefix, which is then used to obtain the associated policy decision function via the policy_decision_by_label Lua table. Note that lpmlib.lpm_lookup returns a negative number when no match is found; since policy_decision_by_label has no negative indices, the table lookup returns nil and lookup_policy proceeds without executing the then branch of the corresponding if statement.

function lookup_src_lpm_ipv4_policy(pkt_info)
  local ipv4_header = ffi.cast("struct rte_ipv4_hdr *", pkt_info.inner_l3_hdr)
  local label = lpmlib.lpm_lookup(src_lpm_ipv4, ipv4_header.src_addr)
  return policy_decision_by_label[label]
end

function lookup_dst_lpm_ipv4_policy(pkt_info)
  local ipv4_header = ffi.cast("struct rte_ipv4_hdr *", pkt_info.inner_l3_hdr)
  local label = lpmlib.lpm_lookup(dst_lpm_ipv4, ipv4_header.dst_addr)
  return policy_decision_by_label[label]
end

Finally, we add four helper functions to the policy script. These functions are not used by the policy itself, but by the dynamic configuration script that keeps the LPM table up to date. The add_src_v4_prefix and add_dst_v4_prefix functions take a prefix string in CIDR format and an integer label and insert them in the appropriate LPM table. The del_src_v4_prefix and del_dst_v4_prefix functions take a prefix string in CIDR format and remove them from the appropriate LPM table.

More details about dynamically updating the LPM table are given below.

function add_src_v4_prefix(prefix, label)
  local ip_addr, prefix_len = lpmlib.str_to_prefix(prefix)
  lpmlib.lpm_add(src_lpm_ipv4, ip_addr, prefix_len, label)
end

function add_dst_v4_prefix(prefix, label)
  local ip_addr, prefix_len = lpmlib.str_to_prefix(prefix)
  lpmlib.lpm_add(dst_lpm_ipv4, ip_addr, prefix_len, label)
end

function del_src_v4_prefix(prefix)
  local ip_addr, prefix_len = lpmlib.str_to_prefix(prefix)
  lpmlib.lpm_del(src_lpm_ipv4, ip_addr, prefix_len)
end

function del_dst_v4_prefix(prefix)
  local ip_addr, prefix_len = lpmlib.str_to_prefix(prefix)
  lpmlib.lpm_del(dst_lpm_ipv4, ip_addr, prefix_len)
end
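To illustrate, a minimal dynamic configuration script using one of these helpers might look as follows (a sketch: the prefix is an invented example, and we assume the script is sent to the Grantor's Gatekeeper process through the dynamic configuration mechanism, e.g. with gkctl):

```lua
-- Hypothetical runtime update: decline a newly identified source prefix.
local dyc = staticlib.c.get_dy_conf()

local function add_bad_prefix()
  -- Label 1 maps to the decline decision in policy_decision_by_label.
  add_src_v4_prefix("198.51.100.0/24", 1)
end

dylib.update_gt_lua_states_incrementally(dyc.gt, add_bad_prefix, false)

return "gt: added 198.51.100.0/24 to the source LPM table"
```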

Updating LPM tables with drib and gtctl

Fetching IP prefixes

The example policy script given above loads network prefixes and labels from a file. In practice, these prefixes are usually assembled from multiple online sources of unwanted source networks, such as Spamhaus' EDROP list or Team Cymru's Bogon list, so that flows originating from these prefixes can be declined.

These online unwanted prefix lists are continuously updated and may contain intersecting network blocks, so it makes sense to use a tool designed to fetch, merge and label them automatically, generating a file that can be consumed by the policy script. The Drib tool was developed for this purpose.

This tool aggregates IP prefixes from configurable online and offline sources and allows each source to be labeled with its own "class", which is just an arbitrary string. Once the prefixes are aggregated, Drib can render a template, feeding it with the prefixes and their respective class. We use the source class configuration in Drib as the label to be associated with a prefix when inserted in the policy's LPM table.

Going back to the policy script, recall the definition of the policy_decision_by_label variable:

local policy_decision_by_label = {
  [1] = decline,
  [2] = grant,
  [3] = grant_web,
}

This means prefixes labeled 1 will be declined, and those labeled 2 and 3 will be granted, according to the respective BPF programs. Below we show a Drib configuration file, /etc/drib/drib.yaml, that labels network blocks fetched from the EDROP and Bogons lists with a class value of 1. To make the example more complete, we also add a static network block labeled with a class value of 2 as an "office" network from which we always want to accept traffic. Finally, we add a static network block of web servers with a class value of 3, to which we want to accept web-related traffic according to the rules in the web BPF program.

Note that Drib supports specifying a group-scoped kind setting, which is a tag shared by all prefixes in a given group. We define the decline and grant groups with kind src for the source address prefixes and the servers group with kind dst for the destination address prefixes, and use the entry.kind field in templates that will generate Lua scripts that manipulate the src_lpm_ipv4 and dst_lpm_ipv4 LPM tables.

log_level: "warn"

bootstrap: {
  input: "/etc/drib/bootstrap.tpl",
  output: "/var/lib/drib/bootstrap_{proto}_{kind}",
}

ipv4: {
  decline: {
    priority: 30,
    kind: "src",

    edrop: {
      remote: {
        url: "https://www.spamhaus.org/drop/edrop.txt",
        check_interval: "12h",
        parser: {ranges: {one_per_line: {comment: ";"}}},
      },
      class: "1",
    },

    fullbogons: {
      remote: {
        url: "https://www.team-cymru.org/Services/Bogons/fullbogons-ipv4.txt",
        check_interval: "1d",
        parser: {ranges: {one_per_line: {comment: "#"}}},
      },
      class: "1",
    },
  },

  grant: {
    priority: 30,
    kind: "src",

    office: {
      range: "100.90.80.0/24",
      class: "2",
    },
  },

  servers: {
    priority: 20,
    kind: "dst",

    web: {
      range: "1.2.3.0/26",
      class: "3",
    },
  },
}

Given this configuration, the following bootstrap template file, /etc/drib/bootstrap.tpl, is used to generate input files in the format expected by the policy script, that is, a two-column file with a network prefix in CIDR format in the first column, and an integer label in the second one:

{% for entry in ranges -%}
{{entry.range}} {{entry.class}}
{% endfor -%}
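Given the drib.yaml above, the rendered src-kind file (e.g. /var/lib/drib/bootstrap_ipv4_src) would contain one prefix and label per line, exactly the format new_lpm_from_file expects; a sketch of its contents, with invented EDROP/Bogon prefixes:

```
198.51.100.0/24 1
203.0.113.0/24 1
100.90.80.0/24 2
```

The dst-kind file would contain the single line 1.2.3.0/26 3.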

A cron job is set up to run the drib aggregate command, which will download the EDROP and Bogon prefixes, merge them, exclude the office network range from the resulting set, and save a serialization of the result in what is called an aggregate file.
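The cron job can be as simple as the following (a sketch; the schedule and the cron file path are arbitrary choices, not mandated by Drib):

```
# /etc/cron.d/drib -- refresh the aggregate file periodically
0 */6 * * * root /usr/sbin/drib aggregate
```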

We tie everything together by calling the drib bootstrap --no-download command in a systemd override ExecStartPre command. This makes Drib read an existing aggregate file (generated by the aforementioned cron job) and render the above template. When Gatekeeper runs in Grantor mode, the policy script will then read the freshly rendered file containing the set of prefixes obtained from Drib.

The systemd override can be created with the systemctl edit gatekeeper command in the Grantor servers. Add the following content to the override file:

[Service]
ExecStartPre=/usr/sbin/drib bootstrap --no-download

This ensures the policy script loads up-to-date data when Gatekeeper starts in Grantor mode.

Updating LPM tables incrementally

The setup described above works well for the generation of an initial (bootstrap) list of prefixes on Gatekeeper startup. However, the EDROP and Bogons lists, as well as similar online unwanted prefix lists, are continually updated, and Gatekeeper's in-memory LPM tables should be kept up to date.

To do this, we use the gtctl tool, which parses Drib's aggregate files (generated by the cron job mentioned in the previous section) and compares the current one to an aggregate file saved from a previous run, producing the sets of newly inserted and removed prefixes. These sets are used as inputs to render policy update scripts, which gtctl then feeds into Gatekeeper via its dynamic configuration mechanism.

The policy update template, /etc/gtctl/policy_update.lua.tpl, simply generates calls to the add_src_v4_prefix, add_dst_v4_prefix, del_src_v4_prefix and del_dst_v4_prefix functions defined in the policy script. Note the use of the entry.kind field in the template so that the appropriate function is called.

local function update_lpm_tables()
{%- for entry in ipv4.remove %}
  del_{{entry.kind}}_v4_prefix("{{entry.range}}")
{%- endfor %}

{%- for entry in ipv4.insert %}
  add_{{entry.kind}}_v4_prefix("{{entry.range}}", {{entry.class}})
{%- endfor %}
end

local dyc = staticlib.c.get_dy_conf()
dylib.update_gt_lua_states_incrementally(dyc.gt, update_lpm_tables, false)
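For example, if one source prefix was removed and another inserted since the previous run, the rendered update script would look like this (prefixes invented for illustration):

```lua
local function update_lpm_tables()
  del_src_v4_prefix("198.51.100.0/24")
  add_src_v4_prefix("203.0.113.0/24", 1)
end

local dyc = staticlib.c.get_dy_conf()
dylib.update_gt_lua_states_incrementally(dyc.gt, update_lpm_tables, false)
```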

Depending on the number of updates, it might be necessary to create a new LPM table that can accommodate the new set of prefixes. In this case, gtctl uses a policy replacement template, /etc/gtctl/policy_replace.lua.tpl, to generate the script:

{{lpm_table}} = nil
collectgarbage()

{{lpm_table}} = {{lpm_table_constructor}}({{params.num_rules}}, {{params.num_tbl8s}})

local function update_lpm_tables()
{%- for entry in ipv4.insert %}
  add_{{entry.kind}}_v4_prefix("{{entry.range}}", {{entry.class}})
{%- endfor %}
end

local dyc = staticlib.c.get_dy_conf()
dylib.update_gt_lua_states_incrementally(dyc.gt, update_lpm_tables, false)

The template above references the params variable. This variable is created by gtctl after running a parameter-estimation script, /etc/gtctl/lpm_params.lua.tpl, which is also rendered from a template:

require "gatekeeper/staticlib"
require "gatekeeper/policylib"
require "gatekeeper/dylib"

local dyc = staticlib.c.get_dy_conf()

if dyc.gt == nil then
  return "Gatekeeper: failed to run as Grantor server\n"
end

local function get_lpm_params()
  local lcore = policylib.c.gt_lcore_id()
  local num_rules, num_tbl8s = {{lpm_params_function}}({{lpm_table}})
  return lcore .. ":" .. num_rules .. "," .. num_tbl8s .. "\n"
end

dylib.update_gt_lua_states_incrementally(dyc.gt, get_lpm_params, false)
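Each GT lua state appends one lcore:num_rules,num_tbl8s line to the script's output. A Python sketch of reducing such output to a single pair of table parameters, taking the largest values seen so the new table fits every lcore (the exact sizing policy is gtctl's own; this only shows the parsing):

```python
# Parse the "lcore:num_rules,num_tbl8s" lines produced by the
# parameter-estimation script, one per GT lua state, and keep the
# largest values seen across lcores.
def parse_lpm_params(output):
    num_rules = num_tbl8s = 0
    for line in output.strip().splitlines():
        _lcore, counts = line.split(":")
        rules, tbl8s = (int(x) for x in counts.split(","))
        num_rules = max(num_rules, rules)
        num_tbl8s = max(num_tbl8s, tbl8s)
    return num_rules, num_tbl8s

print(parse_lpm_params("2:1024,256\n3:2048,128\n"))  # (2048, 256)
```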

The gtctl configuration file, /etc/gtctl/gtctl.yaml, which references these templates, is shown below.

log_level: "warn"
remove_rendered_scripts: true
socket: "/var/run/gatekeeper/dyn_cfg.socket"
state_dir: "/var/lib/gtctl"

replace: {
  input: "/etc/gtctl/policy_replace.lua.tpl",
  output: "/var/lib/gtctl/policy_replace_{proto}_{kind}.{2i}.lua",
  max_ranges_per_file: 1500,
}

update: {
  input: "/etc/gtctl/policy_update.lua.tpl",
  output: "/var/lib/gtctl/policy_update_{proto}_{kind}.{2i}.lua",
  max_ranges_per_file: 1500,
}

lpm: {
  table_format: "{kind}_lpm_{proto}", # for this example's drib.yaml, yields "src_lpm_ipv4" and "dst_lpm_ipv4"

  parameters_script: {
    input: "/etc/gtctl/lpm_params.lua.tpl",
    output: "/var/lib/gtctl/lpm_params_{proto}_{kind}.lua",
  },

  ipv4: {
    lpm_table_constructor: "lpmlib.new_lpm",
    lpm_get_params_function: "lpmlib.lpm_get_paras",
  },

  ipv6: {
    lpm_table_constructor: "lpmlib.new_lpm6",
    lpm_get_params_function: "lpmlib.lpm6_get_paras",
  },
}

The only missing piece is a way to run gtctl once a new aggregate file has been generated by Drib. Our current solution relies on our configuration management tool, Puppet, to detect the new file and trigger a gtctl run:

file { '/var/lib/gtctl/aggregate.new':
  ensure => 'present',
  source => 'puppet:///drib/aggregate',
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  notify => Exec['gtctl'],
}

exec { 'gtctl':
  command     => 'gtctl dyncfg -a /var/lib/gtctl/aggregate.new',
  onlyif      => 'systemctl is-active gatekeeper',
  refreshonly => true,
}

Using custom BPF programs

Let's extend the example above with a new range of IPv4 addresses for recursive DNS servers. These are assumed to be for internal use only (i.e., they are used only by other servers and are not open to the Internet), and therefore should accept no external connections. However, for the servers to perform recursive DNS queries, replies to the packets they send to TCP and UDP port 53 must be allowed to reach them. In other words, the BPF program must accept incoming packets with TCP or UDP source port 53.

Create a new BPF program

We create a dns-recursive.c file with the following changes compared to the web.c file from the Gatekeeper repository.

  1. Include UDP headers: add the udp.h header file to the list of the program's includes near the top of the file:
#include <netinet/udp.h>
  2. Handle UDP traffic: this code grants access to UDP datagrams with source port 53, i.e., replies sent by other DNS servers to the queries made by our server. In the switch statement on the ctx->l4_proto field, add the following case:
case IPPROTO_UDP: {
    struct udphdr *udp_hdr;

    if (ctx->fragmented)
        goto secondary_budget;
    if (unlikely(pkt->l4_len < sizeof(*udp_hdr))) {
        /* Malformed UDP header. */
        return GK_BPF_PKT_RET_DECLINE;
    }
    udp_hdr = rte_pktmbuf_mtod_offset(pkt, struct udphdr *,
           pkt->l2_len + pkt->l3_len);

    /* Authorized external services. */
    switch (ntohs(udp_hdr->uh_sport)) {
    case 53:    /* DNS */
        break;
    default:
        return GK_BPF_PKT_RET_DECLINE;
    }

    goto forward;
}
  3. Handle TCP traffic: in the TCP section (below the comment "Only TCP packets from here on"), remove the whole switch statement on the TCP destination port (tcp_hdr->th_dport) and replace it with a switch statement on the TCP source port, analogous to the UDP source-port switch in the previous step.
/* Authorized external services. */
switch (ntohs(tcp_hdr->th_sport)) {
case 53:	/* DNS */
    if (tcp_hdr->syn && !tcp_hdr->ack) {
        /* No listening ports. */
        return GK_BPF_PKT_RET_DECLINE;
    }
    break;

default:
    return GK_BPF_PKT_RET_DECLINE;
}

To compile the program, it is necessary to build Gatekeeper by following the instructions in the Build from Source section of the README. Once Gatekeeper is compiled, run the following command:

$ GATEKEEPER_ROOT=/path/to/gatekeeper/repository
$ clang -O2 -target bpf \
    -I$GATEKEEPER_ROOT/include -I$GATEKEEPER_ROOT/bpf -Wno-int-to-void-pointer-cast \
    -o dns-recursive.bpf -c dns-recursive.c

Install a new BPF program

The resulting dns-recursive.bpf file must be uploaded to the Gatekeeper server and installed alongside the other BPF programs, by default in /etc/gatekeeper/bpf. Next, it must be added to the bpf_programs table in the gk.lua file. We add it with index 100, since indices below that number are reserved. In /etc/gatekeeper/gk.lua, the bpf_programs variable will look like this:

local bpf_programs = {
  [0] = "granted.bpf",
  [1] = "declined.bpf",
  [2] = "grantedv2.bpf",
  [3] = "web.bpf",
  [4] = "tcp-services.bpf",
  -- Add the line below:
  [100] = "dns-recursive.bpf",
}

The new BPF program will be loaded when Gatekeeper is restarted, but it is possible to load it dynamically using gkctl. Create the following Lua script in a file named insert-bpf-program.lua:

require "gatekeeper/staticlib"
require "gatekeeper/dylib"

local dyc = staticlib.c.get_dy_conf()

local path = "/etc/gatekeeper/bpf/dns-recursive.bpf"
local index = 100
local ret = dylib.c.gk_load_bpf_flow_handler(dyc.gk, index, path, true)
if ret < 0 then
  return "gk: failed to load BPF program " .. path .. " (" .. index .. ") in runtime"
end

return "gk: done"

Then load it into a running Gatekeeper instance with gkctl:

# gkctl insert-bpf-program.lua

Update the policy script

Now that we have a new BPF program installed in the Gatekeeper server, we can adapt our policy to use it. First, add the grant_dns function:

local function grant_dns(policy)
  return policylib.decision_grantedv2_will_full_params(
    100,    -- dns-recursive.bpf index in bpf_programs in gk.lua
    policy,
    10240,  -- primary bandwidth limit = 10 MiB/s
    512,    -- secondary bandwidth limit (5% of primary bandwidth)
    300,    -- cap_expire_sec = 5 minutes
    240000, -- next_renewal_ms = 4 minutes
    3000,   -- renewal_step_ms = 3 seconds
    true    -- direct_if_possible
  )
end
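The comments in grant_dns encode a few relationships that are easy to double-check, assuming rates in KiB/s and times in the units each parameter name indicates:

```python
# Sanity-check the grant_dns parameters: rates assumed in KiB/s, times
# in the units named by each parameter.
primary_kib_s = 10240          # 10240 KiB/s = 10 MiB/s
secondary_kib_s = 512          # 5% of the primary limit
cap_expire_sec = 300           # 5 minutes
next_renewal_ms = 240000       # 4 minutes

print(primary_kib_s / 1024)             # 10.0 (MiB/s)
print(secondary_kib_s / primary_kib_s)  # 0.05
print(cap_expire_sec / 60)              # 5.0 (minutes)
print(next_renewal_ms / 60000)          # 4.0 (minutes)
```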

Next, add this function to the policy_decision_by_label table so that it looks like this:

local policy_decision_by_label = {
  [1] = decline,
  [2] = grant,
  [3] = grant_web,
  -- Add the line below:
  [4] = grant_dns,
}

Install the new policy

Copy the new policy.lua file to the Grantor servers, replacing the previous one in /etc/gatekeeper/policy.lua. The new policy will be read when the gatekeeper service is restarted on the Grantor server, but we can also use gkctl to reload it on a running server. If Gatekeeper was installed from the provided Debian packages, the script /usr/share/gatekeeper/reload_policy.lua should be available on the Grantor server. Otherwise, it can be found in the gkctl/scripts directory of the Gatekeeper repository. Simply run the command below.

# gkctl /usr/share/gatekeeper/reload_policy.lua

Feed the new IPv4 range to the Grantor server

If Drib is being used to manage IP address ranges, add the recursive DNS IPv4 range to the servers block in /etc/drib/drib.yaml. Note the use of class 4 to match the index added to the policy_decision_by_label variable in the policy script.

servers: {
  # ...
  dns: {
    range: "1.2.3.64/29",
    class: "4",
  },
},

Complete custom BPF program

For completeness, the full code of the dns-recursive.c program is shown below.

#include <net/ethernet.h>
#include <netinet/tcp.h>
#include <netinet/udp.h>

#include "grantedv2.h"
#include "libicmp.h"

SEC("init") uint64_t
dns_init(struct gk_bpf_init_ctx *ctx)
{
	return grantedv2_init_inline(ctx);
}

SEC("pkt") uint64_t
dns_pkt(struct gk_bpf_pkt_ctx *ctx)
{
	struct grantedv2_state *state =
		(struct grantedv2_state *)pkt_ctx_to_cookie(ctx);
	struct rte_mbuf *pkt = pkt_ctx_to_pkt(ctx);
	uint32_t pkt_len = pkt->pkt_len;
	struct tcphdr *tcp_hdr;
	uint64_t ret = grantedv2_pkt_begin(ctx, state, pkt_len);

	if (ret != GK_BPF_PKT_RET_FORWARD) {
		/* Primary budget exceeded. */
		return ret;
	}

	/* Allowed L4 protocols. */
	switch (ctx->l4_proto) {
	case IPPROTO_ICMP:
		ret = check_icmp(ctx, pkt);
		if (ret != GK_BPF_PKT_RET_FORWARD)
			return ret;
		goto secondary_budget;

	case IPPROTO_ICMPV6:
		ret = check_icmp6(ctx, pkt);
		if (ret != GK_BPF_PKT_RET_FORWARD)
			return ret;
		goto secondary_budget;

	case IPPROTO_UDP: {
		struct udphdr *udp_hdr;

		if (ctx->fragmented)
			goto secondary_budget;
		if (unlikely(pkt->l4_len < sizeof(*udp_hdr))) {
			/* Malformed UDP header. */
			return GK_BPF_PKT_RET_DECLINE;
		}
		udp_hdr = rte_pktmbuf_mtod_offset(pkt, struct udphdr *,
		       pkt->l2_len + pkt->l3_len);

		/* Authorized external services. */
		switch (ntohs(udp_hdr->uh_sport)) {
		case 53:	/* DNS */
			break;
		default:
			return GK_BPF_PKT_RET_DECLINE;
		}

		goto forward;
	}

	case IPPROTO_TCP:
		break;

	default:
		return GK_BPF_PKT_RET_DECLINE;
	}

	/*
	 * Only TCP packets from here on.
	 */

	if (ctx->fragmented)
		goto secondary_budget;
	if (unlikely(pkt->l4_len < sizeof(*tcp_hdr))) {
		/* Malformed TCP header. */
		return GK_BPF_PKT_RET_DECLINE;
	}
	tcp_hdr = rte_pktmbuf_mtod_offset(pkt, struct tcphdr *,
	       pkt->l2_len + pkt->l3_len);

	/* Authorized external services. */
	switch (ntohs(tcp_hdr->th_sport)) {
	case 53:	/* DNS */
		if (tcp_hdr->syn && !tcp_hdr->ack) {
			/* No listening ports. */
			return GK_BPF_PKT_RET_DECLINE;
		}
		break;

	default:
		return GK_BPF_PKT_RET_DECLINE;
	}

	goto forward;

secondary_budget:
	ret = grantedv2_pkt_test_2nd_limit(state, pkt_len);
	if (ret != GK_BPF_PKT_RET_FORWARD)
		return ret;
forward:
	return grantedv2_pkt_end(ctx, state);
}