feat: revamp site + update docs (#762)
# Description

Please provide a brief description of the changes made in this pull
request.

- Revamped the site
- Updated the docs, which should close #755 and #507

## Related Issue

If this pull request is related to any issue, please mention it here.
Additionally, make sure that the issue is assigned to you before
submitting this pull request.

## Checklist

- [x] I have read the [contributing
documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See
[this
documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [x] I have updated the documentation, if necessary.
- [x] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed

Please add any relevant screenshots or GIFs to showcase the changes
made.

## Additional Notes

Add any additional notes or context about the pull request here.

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more
information on how to contribute to this project.
nddq authored Sep 23, 2024
1 parent 10a1bc5 commit dc8b505
Showing 59 changed files with 2,813 additions and 573 deletions.
1 change: 1 addition & 0 deletions .github/.markdownlint.json
@@ -1,6 +1,7 @@
{
"MD013": false,
"MD010": false,
"MD033": false,
"MD024": {
"siblings_only": true
}
7 changes: 6 additions & 1 deletion README.md
@@ -1,4 +1,9 @@
# Retina
<h1 align="center">
<picture>
<source media="(prefers-color-scheme: light)" srcset="site/static/img/retina-logo.svg">
<img src="site/static/img/retina-logo.svg" alt="Retina Logo" width="30%">
</picture>
</h1>

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=746962176)

28 changes: 16 additions & 12 deletions docs/02-Installation/03-Config.md
@@ -2,21 +2,25 @@

## Overview

To customize metrics and other options, you can modify Retina's ConfigMap called `retina-config`.
Defaults are specified for each component in *deploy/legacy/manifests/controller/helm/retina/values.yaml*.
To customize metrics and other options, modify the `retina-config` ConfigMap. Default settings for each component are specified in *deploy/legacy/manifests/controller/helm/retina/values.yaml*.

## Agent Config

* `enablePodLevel`: When this toggle is set to true, Retina will gather Advanced/Pod-Level metrics. Advanced metrics can attach Pod metadata to Retina's metrics.
* `remoteContext`: When this toggle is set to true, retina will watch Pods on the cluster.
* `enableAnnotations`: When this toggle is set to true, retina will gather metrics for the annotated resources. Namespaces or Pods can be annotated with `retina.sh=observe`. The operator and enableRetinaEndpoint for the operator should be enabled.
* `enabledPlugin_linux`: Array of enabled plugins for linux.
* `enabledPlugin_win`: Array of enabled plugins for windows.
* `metricsInterval`: the interval for which metrics will be gathered (in seconds). (@deprecated, use metricsIntervalDuration instead)
* `metricsIntervalDuration`: the interval for which metrics will be gathered (in duration)
* `dataAggregationLevel`: This config defines the level of data aggregation for Retina. See [Data Aggregation](../05-Concepts/data-aggregation.md) for more details.
* `enableTelemetry`: Enables telemetry for the agent for managed AKS clusters. Requires `buildinfo.ApplicationInsightsID` to be set if enabled.
* `enablePodLevel`: Enables gathering of advanced pod-level metrics, attaching pods' metadata to Retina's metrics.
* `remoteContext`: Enables Retina to watch Pods on the cluster.
* `enableAnnotations`: Enables gathering of metrics for annotated resources. Resources can be annotated with `retina.sh=observe`. Requires the operator and `enableRetinaEndpoint` to be enabled.
* `enabledPlugin`: List of enabled plugins.
* `metricsInterval`: Interval for gathering metrics (in seconds). (@deprecated, use `metricsIntervalDuration` instead)
* `metricsIntervalDuration`: Interval for gathering metrics (in `time.Duration`).
* `bypassLookupIPOfInterest`: If true, plugins like `packetparser` and `dropreason` will bypass IP lookup, generating an event for each packet regardless. `enableAnnotations` will not work if this is true.
* `dataAggregationLevel`: Defines the level of data aggregation for Retina. See [Data Aggregation](../05-Concepts/data-aggregation.md) for more details.

## Operator Config

* `installCRDs`: When this toggle is set, the operator will handle installing Retina-related CRDs.
* `enableRetinaEndpoint`: When this toggle is set, the operator will watch and update the cache with Pod metadata.
* `installCRDs`: Allows the operator to manage the installation of Retina-related CRDs.
* `enableTelemetry`: Enables telemetry for the operator in managed AKS clusters. Requires `buildinfo.ApplicationInsightsID` to be set if enabled.
* `captureDebug`: Toggles debug mode for captures. If true, the operator uses the image from the test container registry for the capture workload. Refer to *pkg/capture/utils/capture_image.go* for details on how the debug capture image version is selected.
* `captureJobNumLimit`: Sets the maximum number of jobs that can be created for each Capture.
* `enableRetinaEndpoint`: Allows the operator to monitor and update the cache with Pod metadata.
* `enableManagedStorageAccount`: Enables the use of a managed storage account for storing artifacts.
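
As an illustration only (not part of this commit), the agent options above could be modelled and sanity-checked as follows. This is a minimal sketch under stated assumptions: `AgentConfig` and `Validate` are hypothetical names rather than Retina's actual config types, `metricsIntervalDuration` is assumed to be supplied as a Go duration string such as `10s`, and the positive-interval check is an assumption. The only rule taken directly from the text is that `enableAnnotations` does not work when `bypassLookupIPOfInterest` is true.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// AgentConfig is a hypothetical, trimmed-down view of the agent options
// listed above; it is not Retina's real configuration struct.
type AgentConfig struct {
	EnablePodLevel           bool
	EnableAnnotations        bool
	BypassLookupIPOfInterest bool
	EnabledPlugin            []string
	MetricsIntervalDuration  time.Duration
}

// Validate rejects the combination the docs call out explicitly:
// enableAnnotations will not work if bypassLookupIPOfInterest is true.
// The positive-interval check is an assumption made for this sketch.
func (c AgentConfig) Validate() error {
	if c.EnableAnnotations && c.BypassLookupIPOfInterest {
		return errors.New("enableAnnotations does not work when bypassLookupIPOfInterest is true")
	}
	if c.MetricsIntervalDuration <= 0 {
		return errors.New("metricsIntervalDuration must be positive")
	}
	return nil
}

func main() {
	// Assumed input format: a Go duration string.
	interval, err := time.ParseDuration("10s")
	if err != nil {
		panic(err)
	}
	cfg := AgentConfig{
		EnablePodLevel:          true,
		EnableAnnotations:       true,
		EnabledPlugin:           []string{"dropreason", "packetparser"},
		MetricsIntervalDuration: interval,
	}
	fmt.Println("validation error:", cfg.Validate())
}
```

Checking the conflicting pair at load time surfaces the problem as an explicit error instead of annotations being silently ignored.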
2 changes: 1 addition & 1 deletion docs/03-Metrics/modes/basic.md
@@ -95,7 +95,7 @@ Possible values for `statistic_name` (for metric `tcp_connection_stats`):
- `TCPTimeouts`
- `TCPTSReorder`
- `ResetCount`
- and many others (full list [here](../plugins/linuxutil.md#label-values-for-tcp_connection_stats))
- and many others (full list [here](../plugins/Linux/linuxutil.md#label-values-for-tcp_connection_stats))

Possible values for `statistic_name` (for metric `ip_connection_stats`):

@@ -1,30 +1,31 @@
# `ciliumEventObserver` (Linux)
# `ciliumEventObserver`

Collect agent and perf events from cilium via monitor1_2 socket. This allows us to serve additional metrics and flows alongside Cilium events.

## Architecture

Cilium collects events and sends these events through the cilium monitor1_2 socket. These events can be categorized as Event Sample or Lost Record. Event samples can be broken down into different categories: Agent events or Perf Events.
Access Log events are events such as DNS resolutions matching a cilium node policy while Agent Events can be any cilium agent events.
Perf Events are bpf related events such as drop, trace, policy verdict, or capture events.

The cilium plugin listens on this socket for these events, decodes the payload, and reconstructs either an Agent Event or a Perf Event. These events are then decoded using a lightweight cilium parser. Once the events are decoded into flow objects, they are passed to the external channel. The retina daemon listens for these events and sends them to our monitor agent. Our hubble observer consumes these events and processes the flows using our own custom [parsers](https://github.com/microsoft/retina/tree/main/pkg/hubble/parser).

### Code locations

- Plugin and eBPF code: *pkg/plugin/ciliumeventobserver/*

## Metrics

The metrics depend on our custom parsers. For now, we have an L3/L4 parser and L7 parsers for DNS and HTTP.
We currently do not support Agent or Access Log events from cilium itself.
This [metrics reference](https://docs.cilium.io/en/stable/observability/metrics/#metrics-reference) from cilium can give an idea of what metrics can be added.

At the moment, we can see metrics such as:

| Name | Description | Extra Labels |
| ----------------------- | ----------------------- | ------------- |
| `hubble_drop_total` | Number of drops | destination, protocol, reason, source |
| `hubble_tcp_flags_total` | TCP flag occurrences | destination, family, flag, source |
| `hubble_metrics_http_handler_request_duration_seconds` | A histogram of latencies of Hubble metrics handler. | code, le |
| `hubble_flows_processed_total` | Total number of flows processed | destination, protocol, subtype, type, verdict |
| `hubble_metrics_http_handler_requests_total` | A counter for requests to Hubble metrics handler. | code |

## Architecture

Cilium collects events and sends these events through the cilium monitor1_2 socket. These events can be categorized as Event Sample or Lost Record. Event samples can be broken down into different categories: Agent events or Perf Events.
Access Log events are events such as DNS resolutions matching a cilium node policy while Agent Events can be any cilium agent events.
Perf Events are bpf related events such as drop, trace, policy verdict, or capture events.

The cilium plugin listens on this socket for these events, decodes the payload, and reconstructs either an Agent Event or a Perf Event. These events are then decoded using a lightweight cilium parser. Once the events are decoded into flow objects, they are passed to the external channel. The retina daemon listens for these events and sends them to our monitor agent. Our hubble observer consumes these events and processes the flows using our own custom [parsers](https://github.com/microsoft/retina/tree/main/pkg/hubble/parser).

### Code locations

- Plugin and eBPF code: *pkg/plugin/ciliumeventobserver/*
@@ -1,11 +1,7 @@
# `dns` (Linux)
# `dns`

Tracks incoming and outgoing DNS traffic, providing various metrics and details about the DNS queries and responses.

## Metrics

See metrics for [Basic Mode](../modes/basic.md#plugin-dns-linux) or [Advanced Mode](../modes/advanced.md#plugin-dns-linux).

## Architecture

This plugin uses [Inspektor Gadget](https://github.com/inspektor-gadget/inspektor-gadget)'s DNS Tracer to track DNS traffic and generate basic metrics derived from the captured events.
@@ -16,3 +12,7 @@ In [Advanced mode](https://retina.sh/docs/metrics/modes), the plugin further pro

- Plugin and eBPF code: *pkg/plugin/dns/*
- Module for extra Advanced metrics: *pkg/module/metrics/dns.go*

## Metrics

See metrics for [Basic Mode](../../modes/basic.md#plugin-dns-linux) or [Advanced Mode](../../modes/advanced.md#plugin-dns-linux).
@@ -1,22 +1,22 @@
# `dropreason` (Linux)
# `dropreason`

Counts number of packets/bytes dropped on a Node, along with the direction and reason for drop.

## Metrics

See metrics for [Basic Mode](../modes/basic.md#plugin-dropreason-linux) or [Advanced Mode](../modes/advanced.md#plugin-dropreason-linux).

## Architecture

The plugin utilizes eBPF to gather data.
The plugin generates Basic metrics from an eBPF result.
In Advanced mode (see [Metric Modes](../modes/modes.md)), the plugin turns this eBPF result into an enriched `Flow` (adding Pod information based on IP), then sends the `Flow` to an external channel so that a drops module can create extra Pod-Level metrics.
In Advanced mode (see [Metric Modes](../../modes/modes.md)), the plugin turns this eBPF result into an enriched `Flow` (adding Pod information based on IP), then sends the `Flow` to an external channel so that a drops module can create extra Pod-Level metrics.

### Code locations

- Plugin and eBPF code: *pkg/plugin/dropreason/*
- Module for extra Advanced metrics: *pkg/module/metrics/drops.go*

## Metrics

See metrics for [Basic Mode](../../modes/basic.md#plugin-dropreason-linux) or [Advanced Mode](../../modes/advanced.md#plugin-dropreason-linux).

### Data sources

This plugin reads data from variable eBPF progs writing into the same eBPF map called `metrics_map`.
@@ -1,13 +1,7 @@
# `infiniband` (Linux)
# `infiniband`

Gathers Nvidia Infiniband port counters and debug status parameters from /sys/class/infiniband and /sys/class/net (respectively).

## Metrics

Infiniband Port Counter Statistics

Infiniband Status Parameter Statistics

## Architecture

The plugin uses the following data sources:
@@ -19,6 +13,12 @@ The plugin uses the following data sources:

- Plugin code interfacing with the Infiniband driver: *pkg/plugin/infiniband/*

## Metrics

- Infiniband Port Counter Statistics

- Infiniband Status Parameter Statistics

## Label Values for Infiniband Port Counters

Below is a running list of all statistics for Infiniband port counters
@@ -1,11 +1,7 @@
# `linuxutil` (Linux)
# `linuxutil`

Gathers TCP/UDP statistics and network interface statistics from the `netstats` and `ethtool` Node utilities (respectively).

## Metrics

See metrics for [Basic Mode](../modes/basic.md#plugin-linuxutil-linux) (Advanced modes have identical metrics).

## Architecture

The plugin uses the following utilities as data sources:
@@ -21,30 +17,34 @@ The plugin uses the following utilities as data sources:

- Plugin code interfacing with the Node utilities: *pkg/plugin/linuxutil/*

## Metrics

See metrics for [Basic Mode](../../modes/basic.md#plugin-linuxutil-linux) (Advanced modes have identical metrics).

### Configuration (in Code)

Both `ethtool` and `netstat` data can be curated to remove unwanted data. The following options, defined in structs in *linuxutil.go*, control this filtering.

```go
type EthtoolOpts struct {
	// when true, only include keys with "err" or "drop" in their name
	errOrDropKeysOnly bool

	// when true, include all keys with value 0
	addZeroVal bool
}
```

```go
type NetstatOpts struct {
	// when true, only include the curated list of keys
	CuratedKeys bool

	// when true, include all keys with value 0
	AddZeroVal bool

	// when true, get only listening sockets
	ListenSock bool
}
```
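
As an illustration only (not part of this commit), the sketch below shows how options like these might be applied to a set of counters. `filterStats` is a hypothetical helper, not the function used in *linuxutil.go*. It assumes that zero-valued keys are dropped unless `addZeroVal` is set, and that `errOrDropKeysOnly` matches on the substrings `err` and `drop`.

```go
package main

import (
	"fmt"
	"strings"
)

// EthtoolOpts mirrors the two fields shown in the struct above.
type EthtoolOpts struct {
	errOrDropKeysOnly bool
	addZeroVal        bool
}

// filterStats is a hypothetical helper that applies EthtoolOpts to raw
// counters. The exact matching rules are assumptions of this sketch.
func filterStats(stats map[string]uint64, opts EthtoolOpts) map[string]uint64 {
	out := make(map[string]uint64)
	for name, val := range stats {
		// Drop zero-valued counters unless they were explicitly requested.
		if val == 0 && !opts.addZeroVal {
			continue
		}
		// Optionally keep only error/drop counters.
		if opts.errOrDropKeysOnly &&
			!strings.Contains(name, "err") &&
			!strings.Contains(name, "drop") {
			continue
		}
		out[name] = val
	}
	return out
}

func main() {
	stats := map[string]uint64{"rx_packets": 120, "rx_dropped": 3, "tx_errors": 0}
	fmt.Println(filterStats(stats, EthtoolOpts{errOrDropKeysOnly: true}))
}
```

Filtering before export keeps the number of emitted label values small, which is presumably the motivation for curating this data.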

22 changes: 22 additions & 0 deletions docs/03-Metrics/plugins/Linux/packetforward.md
@@ -0,0 +1,22 @@
# `packetforward`

Counts number of packets/bytes passing through the `eth0` interface of a Node, along with the direction of the packets.

## Architecture

`packetforward` uses an eBPF socket filter program on the host's primary interface to capture packets and generate basic metrics from the captured data.

### Code locations

- Plugin and eBPF code: *pkg/plugin/packetforward/*

## Metrics

See metrics for [Basic Mode](../../modes/basic.md#plugin-packetforward-linux) (Advanced modes have identical metrics).

:::note

The `adv_forward_count` and `adv_forward_bytes` metrics are NOT associated with the `packetforward` plugin, despite the similar names.
These metrics are associated with [`packetparser`](./packetparser.md).

:::
48 changes: 48 additions & 0 deletions docs/03-Metrics/plugins/Linux/packetparser.md
@@ -0,0 +1,48 @@
# `packetparser`

Captures TCP and UDP packets traveling to and from pods and nodes.

## Architecture

`packetparser` attaches a [`qdisc` (Queuing Discipline)](https://www.man7.org/linux/man-pages/man8/tc.8.html) of type `clsact` to each pod's virtual interface (`veth`) and the host's default interface (`device`). This setup enables the attachment of eBPF filter programs for both ingress and egress directions, allowing `packetparser` to capture individual packets traveling to and from the interfaces.

`packetparser` does not produce Basic metrics. In Advanced mode (refer to [Metric Modes](../../modes/modes.md)), the plugin transforms an eBPF result into an enriched `Flow` by adding Pod information based on IP. It then sends the `Flow` to an external channel, enabling *several modules* to generate Pod-Level metrics.
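
As an illustration only (not part of this commit), the flow described above (eBPF result, then an enriched `Flow`, then an external channel, then a module) can be sketched as below. All types and functions here (`rawEvent`, `Flow`, `enrich`, `forwardBytes`) are hypothetical simplifications. Retina's real `Flow` type and module interfaces are richer and are not reproduced here.

```go
package main

import "fmt"

// rawEvent stands in for a decoded eBPF result (hypothetical shape).
type rawEvent struct {
	SrcIP, DstIP string
	Bytes        uint64
}

// Flow is a simplified, hypothetical enriched flow carrying Pod identity.
type Flow struct {
	SrcPod, DstPod string
	Bytes          uint64
}

// enrich adds Pod information based on IP, as the plugin is described as doing.
// IPs missing from the lookup table yield an empty Pod name.
func enrich(ev rawEvent, podsByIP map[string]string) Flow {
	return Flow{
		SrcPod: podsByIP[ev.SrcIP],
		DstPod: podsByIP[ev.DstIP],
		Bytes:  ev.Bytes,
	}
}

// forwardBytes plays the role of a metrics module: it drains the external
// channel and aggregates bytes per source/destination Pod pair.
func forwardBytes(flows <-chan Flow) map[string]uint64 {
	totals := make(map[string]uint64)
	for f := range flows {
		totals[f.SrcPod+"->"+f.DstPod] += f.Bytes
	}
	return totals
}

func main() {
	pods := map[string]string{
		"10.0.0.4": "default/client",
		"10.0.0.7": "default/server",
	}
	external := make(chan Flow, 2)
	external <- enrich(rawEvent{SrcIP: "10.0.0.4", DstIP: "10.0.0.7", Bytes: 1500}, pods)
	external <- enrich(rawEvent{SrcIP: "10.0.0.4", DstIP: "10.0.0.7", Bytes: 500}, pods)
	close(external)
	fmt.Println(forwardBytes(external))
}
```

Decoupling the plugin from the modules through a channel lets several modules derive different Pod-level metrics from the same enriched flows.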

### Code locations

- Plugin and eBPF code: *pkg/plugin/packetparser/*
- Modules for extra Advanced metrics: see section below.

## Metrics

See metrics for [Advanced Mode](../../modes/advanced.md#plugin-packetparser-linux). For module information, see [below](#modules).

### Modules

#### Module: forward

Code path: *pkg/module/metrics/forward.go*

Metrics produced:

- `adv_forward_count`
- `adv_forward_bytes`

#### Module: tcpflags

Code path: *pkg/module/metrics/tcpflags.go*

Metrics produced:

- `adv_forward_count`
- `adv_forward_bytes`

#### Module: latency (API Server)

Code path: *pkg/module/metrics/latency.go*

Metrics produced:

- `adv_node_apiserver_latency`
- `adv_node_apiserver_no_response`
- `adv_node_apiserver_tcp_handshake_latency`
18 changes: 18 additions & 0 deletions docs/03-Metrics/plugins/Linux/tcpretrans.md
@@ -0,0 +1,18 @@
# `tcpretrans`

Measures retransmitted TCP packets.

## Architecture

The plugin utilizes eBPF to gather data.
The plugin does not generate Basic metrics.
In Advanced mode (see [Metric Modes](../../modes/modes.md)), the plugin turns an eBPF result into an enriched `Flow` (adding Pod information based on IP), then sends the `Flow` to an external channel so that a tcpretrans module can create Pod-Level metrics.

### Code locations

- Plugin and eBPF code: *pkg/plugin/tcpretrans/*
- Module for extra Advanced metrics: *pkg/module/metrics/tcpretrans.go*

## Metrics

See metrics for [Advanced Mode](../../modes/advanced.md#plugin-tcpretrans-linux).
@@ -1,15 +1,15 @@
# `hnsstats` (Windows)
# `hnsstats`

Gathers TCP statistics and counts number of packets/bytes forwarded or dropped in HNS and VFP.

## Metrics

See metrics for [Basic Mode](../modes/basic.md#plugin-hnsstats-windows) (Advanced modes have identical metrics).

## Architecture

Interfaces with a Windows Node's HNS (Host Networking System) and VFP (Virtual Filtering Platform).

### Code Locations

- Plugin code interfacing with HNS/VFP: *pkg/plugin/windows/hnsstats*

## Metrics

See metrics for [Basic Mode](../../modes/basic.md#plugin-hnsstats-windows) (Advanced modes have identical metrics).
19 changes: 0 additions & 19 deletions docs/03-Metrics/plugins/packetforward.md

This file was deleted.
