test(mc): Multi-Cloud multi-cluster single Grafana #1322

Merged
merged 11 commits into from
Feb 11, 2025
62 changes: 46 additions & 16 deletions test/multicloud/README.md
Expand Up @@ -4,23 +4,29 @@ This project leverages [OpenTofu](https://opentofu.org/docs/intro/) Infrastructu

![Architecture Diagram](./diagrams/diagram.svg)

An example Hubble UI visualization on GKE dataplane v1 (no Cilium). [See GKE network overview doc](https://cloud.google.com/kubernetes-engine/docs/concepts/network-overview).

## Modules available

* [aks](./modules/aks/)
* [gke](./modules/gke/)
* [kind](./modules/kind/)
* [retina](./modules/retina/)
* [aks](./modules/aks/): Deploy an Azure Kubernetes Service cluster.
* [gke](./modules/gke/): Deploy a Google Kubernetes Engine cluster.
* [kind](./modules/kind/): Deploy a KIND cluster.
* [helm-release](./modules/helm-release/): Deploy a Helm chart; used to deploy Retina and Prometheus.
* [kubernetes-lb](./modules/kubernetes-lb/): Create a Kubernetes Service of type `LoadBalancer`; used to expose Prometheus.
* [grafana](./modules/grafana/): Set up multiple Prometheus data sources in Grafana Cloud.
* [aks-nsg](./modules/aks-nsg/): Inbound and outbound rules for the AKS Load Balancer.
* [gke-firewall](./modules/gke-firewall/): Inbound and outbound rules for the GKE Load Balancer.

## Prerequisites

* [OpenTofu installation guide](https://opentofu.org/docs/intro/install/)

* AKS:

1. create an Azure account
2. [Install az](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli)
1. Create an Azure account.
2. [Install az](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli).

To deploy an AKS cluster and install retina, create file `live/retina-aks/terraform.tfvars` with the Azure TenantID and SubscriptionID
To deploy an AKS cluster and install Retina, create the file `live/retina-aks/terraform.tfvars` with the Azure tenant ID and subscription ID.

```sh
# example values
Expand All @@ -30,10 +36,10 @@ This project leverages [OpenTofu](https://opentofu.org/docs/intro/) Infrastructu

* GKE:

1. create a gcloud account, project and enable billing
2. create a service account and service account key
3. [Enable Kubernetes Engine API](https://console.developers.google.com/apis/api/container.googleapis.com/overview?project=mc-retina)
4. [Install gcloud](https://cloud.google.com/sdk/docs/install)
1. Create a gcloud account and a project, and enable billing.
2. Create a service account and a service account key.
3. [Enable Kubernetes Engine API](https://console.developers.google.com/apis/api/container.googleapis.com/overview?project=mc-retina).
4. [Install gcloud](https://cloud.google.com/sdk/docs/install).

To deploy a GKE cluster, export the `GOOGLE_APPLICATION_CREDENTIALS` environment variable so that it points to the path of your [service account key](https://cloud.google.com/iam/docs/keys-create-delete).

Expand All @@ -42,12 +48,25 @@ This project leverages [OpenTofu](https://opentofu.org/docs/intro/) Infrastructu
export GOOGLE_APPLICATION_CREDENTIALS=/Users/srodi/src/retina/test/multicloud/live/retina-gke/service-key.json
```

* Grafana:

1. Set up a [Grafana Cloud free account](https://grafana.com/pricing/) and start an instance.
2. Create a [Service Account](https://grafana.com/docs/grafana/latest/administration/service-accounts/#create-a-service-account-in-grafana).
3. Export the `GRAFANA_AUTH` environment variable containing the service account token.

```sh
# example
export GRAFANA_AUTH=glsa_s0MeRan0mS7r1ng_1ab2c345
```

* Kind:

1. Install Docker on the host machine.
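
The prerequisites above can be checked before running any stack. The following is a minimal sketch, not part of this project: the helper names `missing_prereqs` and `missing_env` are hypothetical, and the sketch only checks that commands exist on `PATH` and that variables are non-empty, not that credentials are valid.

```sh
# Hypothetical helper: print every command that is not on PATH
# and return non-zero if at least one is missing.
missing_prereqs() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: print every named environment variable that is
# unset or empty and return non-zero if at least one is missing.
missing_env() {
  missing=0
  for name in "$@"; do
    # eval-based indirect expansion keeps this POSIX sh compatible
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset variable: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_prereqs tofu az gcloud docker` followed by `missing_env GOOGLE_APPLICATION_CREDENTIALS GRAFANA_AUTH` would report anything from the list above that is not yet set up on the host.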

## Quickstart

![Hubble on GKE v1 dataplane (no Cilium)](./diagrams/mc-gke-hubble.png)

The following Make targets can be used to manage the lifecycle of each stack.

### Create
Expand Down Expand Up @@ -93,15 +112,20 @@ make test

## Providers references

* [GKE resource documentation](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster)
* [AKS resource documentation](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_cluster)
* [Kind resource documentation](https://registry.terraform.io/providers/tehcyx/kind/latest/docs/resources/cluster)
Resource documentation:

* [GKE](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster)
* [AKS](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_cluster)
* [Kind](https://registry.terraform.io/providers/tehcyx/kind/latest/docs/resources/cluster)
* [Helm Release](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release)
* [Kubernetes LB Service](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/service)
* [Grafana Data Source](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/data_source)

## Troubleshooting

If the test fails due to a timeout, verify that the resource was actually created by the provider; if it was, you can import it into the OpenTofu state.

Here is an example on how to import resources for `modules/gke`
Here is an example of how to import resources for `modules/gke`:

```sh
# move to the stack directory
Expand All @@ -110,4 +134,10 @@ tofu import module.gke.google_container_cluster.gke europe-west2/test-gke-cluste
tofu import module.gke.google_service_account.default projects/mc-retina/serviceAccounts/[email protected]
```

>Note: each resource documentation contains a section on how to import resources into the State. [Example for google_container_cluster resource](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster#import)
>Note: the documentation for each resource contains a section on how to import it into the state. [Example for google_container_cluster resource](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster#import).
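
When several resources may need importing after a timeout, it can help to skip the ones OpenTofu already tracks. The sketch below is an assumption-laden illustration rather than part of this project: `import_if_missing` is a hypothetical helper that relies only on `tofu state list` and `tofu import`, and it must be run from the stack directory.

```sh
# Hypothetical helper: import a resource only when its address is not
# already tracked in the OpenTofu state of the current directory.
import_if_missing() {
  addr=$1
  id=$2
  # `tofu state list` prints one resource address per line; an empty or
  # missing state is treated as "nothing tracked yet".
  state=$(tofu state list 2>/dev/null || true)
  # -x matches the whole line, -F treats the address as a literal string.
  if printf '%s\n' "$state" | grep -qxF -- "$addr"; then
    echo "already in state: $addr"
  else
    tofu import "$addr" "$id"
  fi
}
```

A call would look like `import_if_missing module.gke.google_container_cluster.gke "<location>/<cluster-name>"`, where the placeholders stand for the values of your own cluster.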

## Multi-Cloud

The [live/](./live/) directory contains the multi-cloud / multi-cluster stacks that deploy the clusters, install Retina and Prometheus, expose each Prometheus instance through a load balancer, and configure a Grafana Cloud instance to consume the Prometheus data sources, so that multiple clusters can be visualized in a single Grafana dashboard.

![Architecture Diagram](./diagrams/diagram-mc.svg)
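
Before pointing Grafana Cloud at a cluster, it may be useful to confirm that the exposed Prometheus endpoint is reachable from outside. The following is a rough sketch under stated assumptions: the helper names are hypothetical, plain HTTP is assumed, and port 9090 is only the Prometheus default, so the load balancers in these stacks may use a different port. The `/-/ready` path is Prometheus's standard readiness endpoint.

```sh
# Hypothetical helper: build the readiness URL for a Prometheus endpoint.
# Assumes plain HTTP; port 9090 is the Prometheus default and may differ
# from the port the load balancer actually exposes.
prometheus_ready_url() {
  host=$1
  port=${2:-9090}
  echo "http://${host}:${port}/-/ready"
}

# Hypothetical helper: succeed only when the endpoint answers with an
# HTTP success status within five seconds.
prometheus_is_ready() {
  curl --silent --fail --max-time 5 --output /dev/null \
    "$(prometheus_ready_url "$@")"
}
```

With the external IP of a load balancer in hand, `prometheus_is_ready <external-ip> && echo reachable` gives a quick signal that the firewall or NSG rules allow inbound traffic.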
2 changes: 2 additions & 0 deletions test/multicloud/diagrams/diagram-mc.svg
Binary file added test/multicloud/diagrams/mc-gke-hubble-ui.png
Binary file added test/multicloud/diagrams/mc-gke-hubble.png