Commit
test(mc): Multi-Cloud multi-cluster single Grafana (#1322)
# Description

Create a multi cloud multi cluster deployment where each cluster has a deployment of Prometheus and Retina. Each cluster exposes Prometheus as a load balancer. Both load balancers are connected to a single instance of Grafana to visualize retina network observability metrics.

* Add module for Kubernetes load balancer service used by AKS and GKE Prometheus instances
* Automate the data source config via Grafana module
* Add module for Azure Network Security Group
* Add module for Google Cloud Firewall
* Update retina-gke and retina-aks live stacks

![grafana-mc](https://github.com/user-attachments/assets/b24138cb-9b03-4d46-8231-ebba530ce486)

## Related Issue

#1267

## Checklist

- [x] I have read the [contributing documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See [this documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification) on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [x] I have updated the documentation, if necessary.
- [x] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed

![image](https://github.com/user-attachments/assets/fc9ec2b5-9ca5-4a41-bff4-bb97c23bd67d)
![image](https://github.com/user-attachments/assets/2f4779cc-8677-4bc0-9a65-faebcddb0c94)

## Additional Notes

Add any additional notes or context about the pull request here.

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more information on how to contribute to this project.
Showing 56 changed files with 2,610 additions and 362 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,23 +4,29 @@ This project leverages [OpenTofu](https://opentofu.org/docs/intro/) Infrastructu | |
|
||
![Architecture Diagram](./diagrams/diagram.svg) | ||
|
||
An example Hubble UI visualization on GKE dataplane v1 (no Cilium). [See GKE network overview doc](https://cloud.google.com/kubernetes-engine/docs/concepts/network-overview). | ||
|
||
## Modules available | ||
|
||
* [aks](./modules/aks/) | ||
* [gke](./modules/gke/) | ||
* [kind](./modules/kind/) | ||
* [retina](./modules/retina/) | ||
* [aks](./modules/aks/): Deploy Azure Kubernetes Service cluster. | ||
* [gke](./modules/gke/): Deploy Google Kubernetes Engine cluster. | ||
* [kind](./modules/kind/): Deploy KIND cluster. | ||
* [helm-release](./modules/helm-release/): Deploy a Helm Chart, used to deploy Retina and Prometheus. | ||
* [kubernetes-lb](./modules/kubernetes-lb/): Create a Kubernetes Service of type Load Balancer, used to expose Prometheus. | ||
* [grafana](./modules/grafana/): Set up multiple Prometheus data sources in Grafana Cloud. | ||
* [aks-nsg](./modules/aks-nsg/): Inbound and outbound rules for AKS Load Balancer. | ||
* [gke-firewall](./modules/gke-firewall/): Inbound and outbound rules for GKE Load Balancer. | ||
|
||
## Prerequisites | ||
|
||
* [OpenTofu installation guide](https://opentofu.org/docs/intro/install/) | ||
|
||
* AKS: | ||
|
||
1. create an Azure account | ||
2. [Install az](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli) | ||
1. Create an Azure account. | ||
2. [Install az](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli). | ||
|
||
To deploy an AKS cluster and install retina, create file `live/retina-aks/terraform.tfvars` with the Azure TenantID and SubscriptionID | ||
To deploy an AKS cluster and install retina, create file `live/retina-aks/terraform.tfvars` with the Azure TenantID and SubscriptionID. | ||
|
||
```sh | ||
# example values | ||
|
@@ -30,10 +36,10 @@ This project leverages [OpenTofu](https://opentofu.org/docs/intro/) Infrastructu | |
|
||
* GKE: | ||
|
||
1. create a gcloud account, project and enable billing | ||
2. create a service account and service account key | ||
3. [Enable Kubernetes Engine API](https://console.developers.google.com/apis/api/container.googleapis.com/overview?project=mc-retina) | ||
4. [Install gcloud](https://cloud.google.com/sdk/docs/install) | ||
1. create a gcloud account, project and enable billing. | ||
2. create a service account and service account key. | ||
3. [Enable Kubernetes Engine API](https://console.developers.google.com/apis/api/container.googleapis.com/overview?project=mc-retina). | ||
4. [Install gcloud](https://cloud.google.com/sdk/docs/install). | ||
|
||
To deploy a GKE cluster export `GOOGLE_APPLICATION_CREDENTIALS` env variable to point to the path where your [service account key](https://cloud.google.com/iam/docs/keys-create-delete) is located. | ||
|
||
|
@@ -42,12 +48,25 @@ This project leverages [OpenTofu](https://opentofu.org/docs/intro/) Infrastructu | |
export GOOGLE_APPLICATION_CREDENTIALS=/Users/srodi/src/retina/test/multicloud/live/retina-gke/service-key.json | ||
``` | ||
|
||
* Grafana | ||
|
||
1. Set up a [Grafana Cloud free account](https://grafana.com/pricing/) and start an instance. | ||
2. Create a [Service Account](https://grafana.com/docs/grafana/latest/administration/service-accounts/#create-a-service-account-in-grafana). | ||
3. Export the `GRAFANA_AUTH` environment variable containing the service account token. | ||
|
||
```sh | ||
# example | ||
export GRAFANA_AUTH=glsa_s0MeRan0mS7r1ng_1ab2c345 | ||
``` | ||
|
||
* Kind: | ||
|
||
1. Docker installed on the host machine | ||
|
||
## Quickstart | ||
|
||
![Hubble on GKE v1 dataplane (no Cilium)](./diagrams/mc-gke-hubble.png) | ||
|
||
The following Make targets can be used to manage each stack lifecycle. | ||
|
||
### Create | ||
|
@@ -93,15 +112,20 @@ make test | |
|
||
## Providers references | ||
|
||
* [GKE resource documentation](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster) | ||
* [AKS resource documentation](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_cluster) | ||
* [Kind resource documentation](https://registry.terraform.io/providers/tehcyx/kind/latest/docs/resources/cluster) | ||
Resources documentation: | ||
|
||
* [GKE](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster) | ||
* [AKS](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_cluster) | ||
* [Kind](https://registry.terraform.io/providers/tehcyx/kind/latest/docs/resources/cluster) | ||
* [Helm Release](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | ||
* [Kubernetes LB Service](https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs/resources/service) | ||
* [Grafana Data Source](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/data_source) | ||
|
||
## Troubleshooting | ||
|
||
If the test fails due to a timeout, check whether the provider actually created the resource; if it did, you can import it into the OpenTofu state. | ||
|
||
Here is an example on how to import resources for `modules/gke` | ||
Here is an example on how to import resources for `modules/gke`: | ||
|
||
```sh | ||
# move to the stack directory | ||
|
@@ -110,4 +134,10 @@ tofu import module.gke.google_container_cluster.gke europe-west2/test-gke-cluste | |
tofu import module.gke.google_service_account.default projects/mc-retina/serviceAccounts/[email protected] | ||
``` | ||
|
||
>Note: each resource documentation contains a section on how to import resources into the State. [Example for google_container_cluster resource](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster#import) | ||
>Note: each resource documentation contains a section on how to import resources into the State. [Example for google_container_cluster resource](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster#import). | ||
|
||
## Multi-Cloud | ||
|
||
The [live/](./live/) directory contains the multi-cloud / multi-cluster stacks to deploy clusters, install Retina, install Prometheus, expose each Prometheus instance using a load balancer, and configure a Grafana Cloud instance to consume the Prometheus data sources, so that multiple clusters can be visualized in a single Grafana dashboard. | ||
|
||
![Architecture Diagram](./diagrams/diagram-mc.svg) |
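
The Multi-Cloud section above depends on each cluster's Prometheus being reachable through its load balancer before Grafana Cloud can use it as a data source. The sketch below shows one way to check that by hand; it is not part of the commit. The helper names, kube context, namespace, and Service name are hypothetical placeholders. Port 9090 is only the Prometheus default, so the port exposed by the `kubernetes-lb` module may differ. The `/-/ready` path is the standard Prometheus readiness endpoint.

```sh
# Hypothetical helper: build the Prometheus readiness URL for a load balancer address.
# The port defaults to 9090 (Prometheus default); the stack may expose a different one.
prometheus_ready_url() {
  host="$1"
  port="${2:-9090}"
  printf 'http://%s:%s/-/ready\n' "$host" "$port"
}

# Hypothetical helper: print the external IP of a Service of type LoadBalancer.
# Arguments: kube context, namespace, Service name (all placeholders, not from the commit).
lb_external_ip() {
  kubectl --context "$1" --namespace "$2" get service "$3" \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Hypothetical helper: exit 0 only if Prometheus behind the given host reports ready.
check_prometheus() {
  curl --fail --silent --show-error --max-time 10 \
    "$(prometheus_ready_url "$1" "${2:-9090}")"
}
```

For example, `check_prometheus "$(lb_external_ip my-aks-context default prometheus-lb)"` would probe the AKS side, assuming those placeholder names match your deployment. A failure usually points to the network security group or firewall rules rather than to Grafana.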