test(mc): Multi-Cloud multi-cluster single Grafana (#1322)
# Description

Create a multi-cloud, multi-cluster deployment where each cluster runs
Prometheus and Retina. Each cluster exposes Prometheus through a load
balancer. Both load balancers are connected to a single Grafana instance
to visualize Retina network observability metrics.

* Add module for Kubernetes load balancer service used by AKS and GKE
Prometheus instances
* Automate the data source config via Grafana module
* Add module for Azure Network Security Group
* Add module for Google Cloud Firewall
* Update retina-gke and retina-aks live stacks


![grafana-mc](https://github.com/user-attachments/assets/b24138cb-9b03-4d46-8231-ebba530ce486)

## Related Issue

#1267 

## Checklist

- [x] I have read the [contributing
documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See
[this
documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [x] I have updated the documentation, if necessary.
- [x] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed


![image](https://github.com/user-attachments/assets/fc9ec2b5-9ca5-4a41-bff4-bb97c23bd67d)


![image](https://github.com/user-attachments/assets/2f4779cc-8677-4bc0-9a65-faebcddb0c94)

## Additional Notes

Add any additional notes or context about the pull request here.

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more
information on how to contribute to this project.
SRodi authored Feb 11, 2025
1 parent 819a9ee commit fce4894
Showing 56 changed files with 2,610 additions and 362 deletions.
62 changes: 46 additions & 16 deletions test/multicloud/README.md
@@ -4,23 +4,29 @@ This project leverages [OpenTofu](https://opentofu.org/docs/intro/) Infrastructu

![Architecture Diagram](./diagrams/diagram.svg)

An example Hubble UI visualization on GKE dataplane v1 (no Cilium). [See GKE network overview doc](https://cloud.google.com/kubernetes-engine/docs/concepts/network-overview).

## Modules available

* [aks](./modules/aks/)
* [gke](./modules/gke/)
* [kind](./modules/kind/)
* [retina](./modules/retina/)
* [aks](./modules/aks/): Deploy Azure Kubernetes Service cluster.
* [gke](./modules/gke/): Deploy Google Kubernetes Engine cluster.
* [kind](./modules/kind/): Deploy KIND cluster.
* [helm-release](./modules/helm-release/): Deploy a Helm Chart, used to deploy Retina and Prometheus.
* [kubernetes-lb](./modules/kubernetes-lb/): Create a Kubernetes Service of type Load Balancer, used to expose Prometheus.
* [grafana](./modules/grafana/): Set up multiple Prometheus data sources in Grafana Cloud.
* [aks-nsg](./modules/aks-nsg/): Inbound and outbound rules for AKS Load Balancer.
* [gke-firewall](./modules/gke-firewall/): Inbound and outbound rules for GKE Load Balancer.

## Prerequisites

* [OpenTofu installation guide](https://opentofu.org/docs/intro/install/)

* AKS:

1. create an Azure account
2. [Install az](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli)
1. Create an Azure account.
2. [Install az](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli).

To deploy an AKS cluster and install retina, create file `live/retina-aks/terraform.tfvars` with the Azure TenantID and SubscriptionID
To deploy an AKS cluster and install retina, create file `live/retina-aks/terraform.tfvars` with the Azure TenantID and SubscriptionID.

```sh
# example values
@@ -30,10 +36,10 @@ This project leverages [OpenTofu](https://opentofu.org/docs/intro/) Infrastructu

* GKE:

1. create a gcloud account, project and enable billing
2. create a service account and service account key
3. [Enable Kubernetes Engine API](https://console.developers.google.com/apis/api/container.googleapis.com/overview?project=mc-retina)
4. [Install gcloud](https://cloud.google.com/sdk/docs/install)
1. create a gcloud account, project and enable billing.
2. create a service account and service account key.
3. [Enable Kubernetes Engine API](https://console.developers.google.com/apis/api/container.googleapis.com/overview?project=mc-retina).
4. [Install gcloud](https://cloud.google.com/sdk/docs/install).

To deploy a GKE cluster, export the `GOOGLE_APPLICATION_CREDENTIALS` environment variable so that it points to the path of your [service account key](https://cloud.google.com/iam/docs/keys-create-delete).

@@ -42,12 +48,25 @@ This project leverages [OpenTofu](https://opentofu.org/docs/intro/) Infrastructu
export GOOGLE_APPLICATION_CREDENTIALS=/Users/srodi/src/retina/test/multicloud/live/retina-gke/service-key.json
```

* Grafana

1. Set up a [Grafana Cloud free account](https://grafana.com/pricing/) and start an instance.
2. Create a [Service Account](https://grafana.com/docs/grafana/latest/administration/service-accounts/#create-a-service-account-in-grafana).
3. Export the `GRAFANA_AUTH` environment variable containing the service account token.

```sh
# example
export GRAFANA_AUTH=glsa_s0MeRan0mS7r1ng_1ab2c345
```

* Kind:

1. Docker installed on the host machine
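
The prerequisites above can be checked before running any stack. The following is a minimal sketch, not part of the repository: `require_env`, `require_cmd`, and `check_prerequisites` are hypothetical helper names, and the list of commands assumes you intend to use all three stacks (AKS, GKE, and Kind).

```sh
#!/bin/sh
# Sketch: report which prerequisites are missing.
# All function names here are hypothetical helpers, not repository code.

# Succeeds when the named environment variable is set and non-empty.
require_env() {
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "missing environment variable: $1" >&2
    return 1
  fi
}

# Succeeds when the named command is available on PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing command: $1" >&2
    return 1
  fi
}

# Runs every check and returns non-zero if any of them failed.
check_prerequisites() {
  status=0
  for cmd in tofu az gcloud docker; do
    require_cmd "$cmd" || status=1
  done
  for var in GOOGLE_APPLICATION_CREDENTIALS GRAFANA_AUTH; do
    require_env "$var" || status=1
  done
  return "$status"
}
```

Calling `check_prerequisites` prints one line per missing tool or variable to stderr and returns a non-zero status if anything is missing, so it can gate a `make` invocation.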

## Quickstart

![Hubble on GKE v1 dataplane (no Cilium)](./diagrams/mc-gke-hubble.png)

The following Make targets can be used to manage each stack lifecycle.

### Create
@@ -93,15 +112,20 @@ make test

## Providers references

* [GKE resource documentation](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster)
* [AKS resource documentation](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_cluster)
* [Kind resource documentation](https://registry.terraform.io/providers/tehcyx/kind/latest/docs/resources/cluster)
Resource documentation:

* [GKE](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster)
* [AKS](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_cluster)
* [Kind](https://registry.terraform.io/providers/tehcyx/kind/latest/docs/resources/cluster)
* [Helm Release](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release)
* [Kubernetes LB Service](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/service)
* [Grafana Data Source](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/data_source)

## Troubleshooting

If the test fails due to a timeout, verify that the resource was created by the provider; if it was, you can import it into the OpenTofu state.

Here is an example on how to import resources for `modules/gke`
Here is an example on how to import resources for `modules/gke`:

```sh
# move to the stack directory
@@ -110,4 +134,10 @@ tofu import module.gke.google_container_cluster.gke europe-west2/test-gke-cluste
tofu import module.gke.google_service_account.default projects/mc-retina/serviceAccounts/[email protected]
```

>Note: each resource documentation contains a section on how to import resources into the State. [Example for google_container_cluster resource](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster#import)
>Note: each resource documentation contains a section on how to import resources into the State. [Example for google_container_cluster resource](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster#import).
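
When re-running after a timeout, it can help to skip resources that are already tracked. The sketch below wraps `tofu state list` and `tofu import` in a hypothetical helper, `import_if_missing`; it is an illustration under the assumption that you run it from the stack directory, not something the repository provides.

```sh
# Sketch: import a resource only when it is not already tracked in the state.
# import_if_missing is a hypothetical helper; run it from the stack directory.
import_if_missing() {
  address="$1"   # resource address, e.g. module.gke.google_container_cluster.gke
  id="$2"        # provider-specific import ID, see the resource's "Import" docs

  # `tofu state list` prints one tracked resource address per line.
  if tofu state list | grep -Fxq -- "$address"; then
    echo "already in state: $address"
  else
    tofu import "$address" "$id"
  fi
}
```

A call would look like `import_if_missing module.gke.google_container_cluster.gke "<location>/<cluster-name>"`, where the placeholders stand for your own values.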

## Multi-Cloud

The [live/](./live/) directory contains the multi-cloud / multi-cluster stacks that deploy the clusters, install Retina and Prometheus, expose each Prometheus instance through a load balancer, and configure a Grafana Cloud instance with the Prometheus data sources so that multiple clusters can be visualized in a single Grafana dashboard.

![Architecture Diagram](./diagrams/diagram-mc.svg)
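
Before pointing Grafana at a cluster, you may want to confirm that its Prometheus load balancer answers queries. The following is a rough sketch using the Prometheus HTTP API; `prometheus_query_url` and `check_prometheus` are hypothetical helpers, and port 9090 (the Prometheus default) is an assumption that may not match the port exposed by the Service in your stack.

```sh
# Sketch: check that a Prometheus instance behind a load balancer answers queries.
# prometheus_query_url and check_prometheus are hypothetical helpers.
# Port 9090 is the Prometheus default and is an assumption; use the port your Service exposes.

# Prints the HTTP API URL for an instant query of the `up` metric.
prometheus_query_url() {
  host="$1"
  port="${2:-9090}"
  echo "http://${host}:${port}/api/v1/query?query=up"
}

# Succeeds when the endpoint responds with a successful query result.
check_prometheus() {
  curl -fsS --max-time 10 "$(prometheus_query_url "$@")" | grep -q '"status":"success"'
}
```

The external IP of a load balancer Service can be read with `kubectl get service <name> -n <namespace> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'` and passed to `check_prometheus`; the Service name and namespace are placeholders for your own values.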
2 changes: 2 additions & 0 deletions test/multicloud/diagrams/diagram-mc.svg
Binary file added test/multicloud/diagrams/mc-gke-hubble-ui.png
Binary file added test/multicloud/diagrams/mc-gke-hubble.png