Skip to content

Commit

Permalink
Merge branch 'main' into feat/vethwatcher
Browse files Browse the repository at this point in the history
  • Loading branch information
nddq authored Jan 17, 2025
2 parents cfe881e + 2d30432 commit 537f194
Show file tree
Hide file tree
Showing 7 changed files with 117 additions and 70 deletions.
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -45,3 +45,5 @@ image-metadata-*.json
*results*.json
netperf-*.json
netperf-*.csv

.certs/
3 changes: 3 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -549,6 +549,9 @@ get-certs:
hubble config set tls true
hubble config set tls-server-name instance.hubble-relay.cilium.io

# Replaces every '.' in $(1) with '\.'
escape_dot = $(subst .,\.,$(1))

.PHONY: clean-certs
clean-certs:
rm -rf $(CERT_DIR)
Expand Down
4 changes: 3 additions & 1 deletion docs/02-Installation/01-Setup.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,9 @@ Note: you can also run captures with just the [CLI](./02-CLI.md).

## Installation

Requires Helm version >= v3.8.0.
### Requirements

- Helm version >= v3.8.0.

### Basic Mode

Expand Down
59 changes: 47 additions & 12 deletions docs/02-Installation/03-Config.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,25 +2,60 @@

## Overview

To customize metrics and other options, modify the `retina-config` ConfigMap. Default settings for each component are specified in *deploy/legacy/manifests/controller/helm/retina/values.yaml*.
### Default Configuration

## Agent Config
Default settings for each component are specified in [Values file](../../deploy/legacy/manifests/controller/helm/retina/values.yaml).

### Deployed Configuration

Configuration of an active Retina deployment can be seen in `retina-config` and `retina-operator-config` configmaps.

```shell
kubectl get configmap retina-config -n kube-system -o yaml
kubectl get configmap retina-operator-config -n kube-system -o yaml
```

### Updating Configuration

If the Retina installation was done via Helm, configuration updates should be done via `helm upgrade` defining the specific attribute name and value as part of the command.

The example below enables gathering of advance pod-level metrics.

```shell
VERSION=$( curl -sL https://api.github.com/repos/microsoft/retina/releases/latest | jq -r .name)
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
--version $VERSION \
--namespace kube-system \
--set image.tag=$VERSION \
--set operator.tag=$VERSION \
--set logLevel=info \
--set enabledPlugin_linux="\[dropreason\,packetforward\,linuxutil\,dns\]"
--set enablePodLevel=true
```

## General Configuration

Apply to both Agent and Operator.

* `enableTelemetry`: Enables telemetry for the agent for managed AKS clusters. Requires `buildinfo.ApplicationInsightsID` to be set if enabled.
* `enablePodLevel`: Enables gathering of advanced pod-level metrics, attaching pods' metadata to Retina's metrics.
* `remoteContext`: Enables Retina to watch Pods on the cluster.
* `enableAnnotations`: Enables gathering of metrics for annotated resources. Resources can be annotated with `retina.sh=observe`. Requires the operator and `enableRetinaEndpoint` to be enabled.
* `enabledPlugin`: List of enabled plugins.

## Agent Configuration

* `logLevel`: Define the level of logs to store.
* `enabledPlugin_linux`: List of enabled plugins.
* `metricsInterval`: Interval for gathering metrics (in seconds). (@deprecated, use `metricsIntervalDuration` instead)
* `metricsIntervalDuration`: Interval for gathering metrics (in `time.Duration`).
* `enablePodLevel`: Enables gathering of advanced pod-level metrics, attaching pods' metadata to Retina's metrics.
* `enableConntrackMetrics`: Enables conntrack metrics for packets and bytes forwarded/received.
* `enableAnnotations`: Enables gathering of metrics for annotated resources. Resources can be annotated with `retina.sh=observe`. Requires the operator and `operator.enableRetinaEndpoint` to be enabled.
* `bypassLookupIPOfInterest`: If true, plugins like `packetparser` and `dropreason` will bypass IP lookup, generating an event for each packet regardless. `enableAnnotations` will not work if this is true.
* `dataAggregationLevel`: Defines the level of data aggregation for Retina. See [Data Aggregation](../05-Concepts/data-aggregation.md) for more details.

## Operator Config
## Operator Configuration

* `installCRDs`: Allows the operator to manage the installation of Retina-related CRDs.
* `enableTelemetry`: Enables telemetry for the operator in managed AKS clusters. Requires `buildinfo.ApplicationInsightsID` to be set if enabled.
* `captureDebug`: Toggles debug mode for captures. If true, the operator uses the image from the test container registry for the capture workload. Refer to *pkg/capture/utils/capture_image.go* for details on how the debug capture image version is selected.
* `captureJobNumLimit`: Sets the maximum number of jobs that can be created for each Capture.
* `enableRetinaEndpoint`: Allows the operator to monitor and update the cache with Pod metadata.
* `enableManagedStorageAccount`: Enables the use of a managed storage account for storing artifacts.
* `operator.installCRDs`: Allows the operator to manage the installation of Retina-related CRDs.
* `operator.enableRetinaEndpoint`: Allows the operator to monitor and update the cache with Pod metadata.
* `capture.captureDebug`: Toggles debug mode for captures. If true, the operator uses the image from the test container registry for the capture workload. Refer to [Capture Image file](../../pkg/capture/utils/capture_image.go) for details on how the debug capture image version is selected.
* `capture.captureJobNumLimit`: Sets the maximum number of jobs that can be created for each Capture.
* `capture.enableManagedStorageAccount`: Enables the use of a managed storage account for storing artifacts.
9 changes: 7 additions & 2 deletions docs/02-Installation/04-prometheus.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ Prometheus is an open-source system monitoring and alerting toolkit originally b

1. Create a Kubernetes cluster.
2. Install Retina DaemonSet (see [Quick Installation](./01-Setup.md)).
3. Clone [Retina Repository](https://github.com/microsoft/retina) or download [Prometheus Values File](../../deploy/legacy/prometheus/values.yaml).

## Install Prometheus via Helm

Expand All @@ -19,13 +20,17 @@ Prometheus is an open-source system monitoring and alerting toolkit originally b
1. Install the Prometheus chart

```shell
helm install prometheus -n kube-system -f deploy/legacy/prometheus/values.yaml prometheus-community/kube-prometheus-stack
# The value of VALUE_FILE_PATH is relative to the repo root folder. Update this according to the location of your file.
VALUE_FILE_PATH=deploy/legacy/prometheus/values.yaml
helm install prometheus -n kube-system -f $VALUE_FILE_PATH prometheus-community/kube-prometheus-stack
```

Or if you already have the chart installed, upgrade how you see fit, providing the new job name as an additional scrape config, ex:

```shell
helm upgrade prometheus -n kube-system -f deploy/legacy/prometheus/values.yaml prometheus-community/kube-prometheus-stack
# The value of VALUE_FILE_PATH is relative to the repo root folder. Update this according to the location of your file.
VALUE_FILE_PATH=deploy/legacy/prometheus/values.yaml
helm upgrade prometheus -n kube-system -f $VALUE_FILE_PATH prometheus-community/kube-prometheus-stack
```

> Note: Grafana and kube-state metrics may schedule on Windows nodes, the current chart doesn't have node affinity for those components. Some manual intervention may be required.
Expand Down
34 changes: 17 additions & 17 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ require (
github.com/Azure/go-autorest/autorest/date v0.3.0 // indirect
github.com/Azure/go-autorest/logger v0.2.1 // indirect
github.com/Azure/go-autorest/tracing v0.6.0 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.3.1 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.3.2 // indirect
github.com/BurntSushi/toml v1.3.2 // indirect
github.com/MakeNowJust/heredoc v1.0.0 // indirect
github.com/Masterminds/goutils v1.1.1 // indirect
Expand All @@ -48,18 +48,18 @@ require (
github.com/armon/go-metrics v0.4.1 // indirect
github.com/asaskevich/govalidator v0.0.0-20230301143203-a9d515a09cc2 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.6.7 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.23 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.27 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.27 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.24 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.28 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.28 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.1 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.3.27 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.3.28 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.1 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.4.8 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.8 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.18.8 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.24.9 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.8 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.33.7 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.5.0 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.9 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.18.9 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.24.10 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.9 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.33.8 // indirect
github.com/aws/smithy-go v1.22.1 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/blang/semver/v4 v4.0.0 // indirect
Expand Down Expand Up @@ -262,7 +262,7 @@ require (
golang.org/x/sync v0.10.0
golang.org/x/sys v0.29.0
golang.org/x/term v0.28.0 // indirect
google.golang.org/protobuf v1.36.1
google.golang.org/protobuf v1.36.3
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/api v0.30.3
Expand All @@ -279,7 +279,7 @@ require (
github.com/Azure/azure-container-networking/zapai v0.0.3
github.com/Azure/azure-sdk-for-go v68.0.0+incompatible
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.17.0
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.8.0
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.8.1
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerservice/armcontainerservice/v4 v4.8.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/dashboard/armdashboard v1.2.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/monitor/armmonitor v0.11.0
Expand All @@ -289,10 +289,10 @@ require (
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.5.0
github.com/Microsoft/hcsshim v0.12.0-rc.3
github.com/Sytten/logrus-zap-hook v0.1.0
github.com/aws/aws-sdk-go-v2 v1.32.8
github.com/aws/aws-sdk-go-v2/config v1.28.11
github.com/aws/aws-sdk-go-v2/credentials v1.17.52
github.com/aws/aws-sdk-go-v2/service/s3 v1.72.2
github.com/aws/aws-sdk-go-v2 v1.33.0
github.com/aws/aws-sdk-go-v2/config v1.29.0
github.com/aws/aws-sdk-go-v2/credentials v1.17.53
github.com/aws/aws-sdk-go-v2/service/s3 v1.73.0
github.com/cakturk/go-netstat v0.0.0-20200220111822-e5b49efee7a5
github.com/cilium/cilium v1.16.0-pre.1.0.20240403152809-b9853ecbcaeb
github.com/cilium/ebpf v0.16.0
Expand Down
Loading

0 comments on commit 537f194

Please sign in to comment.