High level summary: This setup assumes you're creating the "host" of a self-managed ArgoCD and walks through the bootstrap. Once completed, adding and managing clusters should be done only via git operations.
The setup is broken into a few steps to prevent a self-managed ArgoCD from accidentally destroying the "core" foundation (mainly CRDs and the namespace). Once set up, everything is managed via changes in this repo (or yours).
Note: This repo uses features that require ArgoCD >= 2.9 (set in the config) for inline kustomize patching and kustomize OCI registry support.
- Secret data lives in GCP (Secret Manager)
- All clusters can talk to GCP via Workload Identity (TODO: instructions)
- A GitHub application is used to represent the connection between ArgoCD and GitHub (scanning repos, leaving comments, etc.)
- kubectl installed (https://kubernetes.io/docs/tasks/tools/#kubectl)
By doing so, future bootstrapping is simplified and, for the most part, cloud agnostic.
Any other secret manager can be substituted for GCP Secret Manager, but each new provider requires answers for IAM, security, and access. Another consideration is how to get "secret zero" into this setup, which is easily solved with gcloud.
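For example, "secret zero" can be read once at bootstrap time with your own gcloud credentials (the secret name below is just a placeholder):
# read the latest version of a bootstrap secret (name is illustrative)
gcloud secrets versions access latest --secret="my-bootstrap-secret"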
Install ArgoCD using kustomize (the preferred method) and then use an ArgoCD Application to wire up the self-management.
Install any CRDs that you might use OUTSIDE of the ArgoCD automation. This prevents accidental deletions or chicken-and-egg problems when cleaning up essential resources.
View exactly what will be installed with:
# gets the names of each CRD yaml generated
kubectl kustomize workloads/WIP-00-crds/config/base | grep '^  name: '
Or review workloads/WIP-00-crds/config/base
Actually install the CRDs
IMPORTANT: Don't forget to upgrade the CRDs as the applications that use them are updated.
# versioned argocd crds - move to script to update pre-installed versions
kubectl apply -k "https://github.com/argoproj/argo-cd/manifests/crds?ref=v2.13.0-rc1"
kubectl apply -k workloads/WIP-00-crds/config/base
Output similar to:
customresourcedefinition.apiextensions.k8s.io/applications.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/applicationsets.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/appprojects.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/gatewayclasses.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/gateways.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/grpcroutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/httproutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/referencegrants.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/tcproutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/tlsroutes.gateway.networking.k8s.io created
customresourcedefinition.apiextensions.k8s.io/udproutes.gateway.networking.k8s.io created
Again, this is done a little out-of-band to prevent errors or accidental cleanup issues later.
# create namespace outside of workload to prevent deletion on cleanup
kubectl create ns argocd
# install argoCD, manually, the first time
# dry-run with `kubectl kustomize workloads/01-argocd/config/base/`
kubectl kustomize workloads/01-argocd/config/base/ | kubectl apply -f -
Wait for pods to be running: while true; do kubectl get pods -n argocd; sleep 5; done
Expected output:
NAME                                                READY   STATUS      RESTARTS   AGE
argocd-application-controller-0                     1/1     Running     0          105s
argocd-applicationset-controller-68575b9586-h9pdl   1/1     Running     0          98s
argocd-bouncer-gglvp                                0/1     Completed   0          105s
argocd-dex-server-5f7559bf46-rb9nv                  1/1     Running     0          98s
argocd-notifications-controller-56b9589db6-442zl    1/1     Running     0          90s
argocd-redis-566dfdccd6-k9qgb                       1/1     Running     0          98s
argocd-repo-server-7c5bc489dc-zvm5g                 1/1     Running     0          98s
argocd-server-f8f577d6d-fk46g                       2/2     Running     0          98s
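Instead of the watch loop, you can also block until the core components finish rolling out (the resource names match the pods listed above):
# wait for the ArgoCD deployments / statefulset to finish rolling out
kubectl -n argocd rollout status deploy/argocd-server --timeout=300s
kubectl -n argocd rollout status deploy/argocd-repo-server --timeout=300s
kubectl -n argocd rollout status statefulset/argocd-application-controller --timeout=300s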
Getting the admin password and logging in:
kubectl -n argocd get secrets argocd-initial-admin-secret \
-o jsonpath='{.data.password}' | base64 -d
kubectl port-forward svc/argocd-server -n argocd 8080:80
# LOGIN
# User: admin
# Password: (From above get secret)
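Optionally, if you have the argocd CLI installed, the same login can be done from the terminal while the port-forward above is running. A sketch; depending on how argocd-server terminates TLS you may need --plaintext instead of --insecure:
# capture the initial admin password and log in via the CLI
ARGO_PWD=$(kubectl -n argocd get secrets argocd-initial-admin-secret -o jsonpath='{.data.password}' | base64 -d)
argocd login localhost:8080 --username admin --password "$ARGO_PWD" --insecure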
Add the ApplicationSets to create apps and reinstall ArgoCD so it becomes self-managed.
IMPORTANT NOTE: This will install any applications that are represented by "in-cluster" (in our case: https://github.com/jimangel/cd/blob/main/clusters/argocd-us-tx-local-gpu-box.yaml). To be specific, this installs anything under "workloads:" + "deployed".
kubectl apply -f workloads/01-argocd/applicationset/argocd.yaml
From this point forward, ApplicationSets added within the workloads/* directory are discovered by ArgoCD, and all changes should be made via git.
WHY? For certs / DNS (and for reading GCP secrets).
- If not on GCP, you can BYO-workload-id via: https://www.jimangel.io/posts/gcp-workload-id-baremetal-kubernetes/
- NOTEBOOK: -gcloud + checks + create permissions plus yaml...
- If on GCP, follow the docs: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
I'm using a metal host, so I'll do the bare metal setup.
NOTE: This topic can get complex, but there is no requirement for which GCP project this belongs in. We're creating a pool of identities, and the subsequent BINDINGS we create are what matter.
We're essentially saying, "I trust this Kubernetes API server to mint tokens (service accounts) that we (cluster operators) can bind GCP IAM permissions to," such as individual bindings to Google secrets.
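As a minimal sketch of such a binding (PROJECT_NUMBER, POOL_ID, SECRET_NAME, and the external-secrets ServiceAccount subject are all assumptions; adjust to your pool and workloads):
# grant a single Kubernetes ServiceAccount access to one secret via workload identity federation
gcloud secrets add-iam-policy-binding SECRET_NAME \
  --role "roles/secretmanager.secretAccessor" \
  --member "principal://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/${POOL_ID}/subject/system:serviceaccount:external-secrets:external-secrets"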
DO WE MANUALLY CREATE THE LB WILDCARD CERTS? If so, cert manager is not a requirement...
How can we manage certificates?
Our main goal is to allow the cluster that hosts ArgoCD to read GCP secrets, access GCP resources, and/or optionally deploy apps to other GCP clusters using GCP GKE Connect.
The first use case is cert-manager, which uses Google Cloud DNS for certificate creation.
ONLY FIX CERT MANAGER / ATTRIBUTES? // ONLY FIX ARGOCD reaching? // what's the main use case here?
At this point you have a cluster with the Gateway API CRDs and ArgoCD installed. Next, set up a GitHub App for interacting with GitHub.
IMPORTANT: Grant the app "Commit statuses (read and write)" and "Contents (read only)"
Install it on your repo:
Configure Argo:
# https://github.com/settings/apps
export REPO_NAME="https://github.com/jimangel/cd.git"
export GH_APP_ID=########
# go to https://github.com/settings/installations and check the URL in "configure"
# might be different for orgs
export GH_INSTALL_ID=#
export PRIV_KEY_PATH="$HOME/Downloads/argocdbot-512.2023-10-06.private-key.pem"
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: github-app-repo
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  url: "${REPO_NAME}"
  githubAppID: "${GH_APP_ID}"
  githubAppInstallationID: "${GH_INSTALL_ID}"
  githubAppPrivateKey: |
$(cat $PRIV_KEY_PATH | sed 's/^/    /')
EOF
Note: When using GitHub Apps, always use an HTTPS URL for "repoURL" (to match the URL here).
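To sanity-check the connection (or review it in the UI under Settings > Repositories):
# confirm the repository secret exists
kubectl -n argocd get secret github-app-repo
# or, if logged in with the argocd CLI, list configured repos and their connection state
argocd repo list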
Expose /api/webhook (details in "rotate secrets" script for local / ngrok)
# Create webhook secret (a plain string, since the ExternalSecret below maps the whole value)
echo -ne '123HA$H123' | gcloud secrets create gh-webhook-string --data-file=-
export PROJECT_ID=YOUR_PROJECT
gcloud secrets add-iam-policy-binding gh-webhook-string --member "serviceAccount:cloudydemo-secret-admin@$PROJECT_ID.iam.gserviceaccount.com" --role "roles/secretmanager.secretAccessor"
# create an ExternalSecret to merge the webhook value into argocd-secret
cat <<EOF | kubectl apply -f -
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: github-webhook-password
  namespace: argocd
spec:
  secretStoreRef:
    kind: ClusterSecretStore
    name: gcp-backend
  target:
    name: argocd-secret
    creationPolicy: Merge
  data:
  - secretKey: "webhook.github.secret"
    remoteRef:
      key: gh-webhook-string
EOF
# validate
kubectl get secret argocd-secret -n argocd -oyaml
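You can also decode just the merged key (dots in the key name must be escaped in jsonpath):
# print the merged webhook secret value
kubectl -n argocd get secret argocd-secret \
  -o jsonpath='{.data.webhook\.github\.secret}' | base64 -d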
export GITHUB_HOOK_URL="https://external-domain.example.com"
# returns 400 because not a "real" webhook event
curl -s -o /dev/null -w "%{http_code}" -d '{}' ${GITHUB_HOOK_URL}
# returns 200 because it's a "real" webhook event (ping)
curl -s -o /dev/null -w "%{http_code}" -H 'X-GitHub-Event: ping' -d '{}' ${GITHUB_HOOK_URL}
# the first request produces no logs; the second produces "Ignoring webhook event" in the argocd-server logs.
# restart argocd-server to pick up the merged webhook secret
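# (sketch) restarting only the server deployment is typically sufficient
kubectl -n argocd rollout restart deployment argocd-server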
| Workload Name | Sync Wave | Sync Type | Core Config | Custom Shared Config | README.md |
|---|---|---|---|---|---|
| argocd | 0 | kustomize | base/ | overlays/ (Metal, GKE) | link |
| argorollouts | 0 | kustomize-remote | base/ | n/a | link |
| certmanager | 0 | helm-remote | (Helm URL in AppSet) | n/a | link |
| echo-server | 0 | helm-local | local-helm/ | n/a | link |
| external-secrets-operator | 0 | helm-remote | (Helm URL in AppSet) | n/a | link |
| gatekeeper | 0 | helm-remote-plus-yaml-local | (Helm URL in AppSet), raw/ (yaml) | n/a | link |
| gateway-api-crds | -1 | kustomize-remote | base/ | overlays/ (Staging) | link |
| gateway-api-istio-ingress | 0 | kustomize | (GitHub URL in AppSet) | overlays/ (Staging) | link |
| oss-istio | 0 | kustomize-nested-helm-remote | base/ | overlays/ (Not In Use) | link |
| kube-prometheus-stack | 0 | helm-remote-plus-helm-local | (Helm URL in AppSet) | environments/ (Prod / Staging), (addon-chart-gateway/ & helm-values/) | link |
| metrics-server | 0 | helm-remote | (Helm URL in AppSet) | n/a | link |
- IF YOU USE IT, LABEL IT (label Application CRDs with the selectors used so it's easy to debug, filter, and search)
- flat structure for easy debugging (clusters or workloads)
- clusters opt-in to workloads
- the config for workloads should be similar in structure
  - kustomize has base / overlays
  - helm has values files per environment
- do yourself a favor, document how to test local workloads in the folder (README.md)
  - Include the whys, hows, and whats
  - Maybe include a bash one-liner to show what values are used (grep + bash on the cluster struct or appset)
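For example, a rough one-liner like this (the path is hypothetical) can show which values files or overlays a workload references:
# hypothetical path -- adjust to the workload you are inspecting
grep -RniE 'valueFiles|overlays' workloads/kube-prometheus-stack/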
The best approach is to create a lower environment and play around "wide open".
- All workloads include a README.md with a dry-run example
- Group like applications and objects in single workloads
- AppSet required, config preferred, environment subsets only if required
- each workload has a defined "sync type" for future reference
# TODO: debugging workloads (flowchart)
# helpful to look at the appset controller logs (# of apps synced)
kubectl -n argocd logs -f -l app.kubernetes.io/name=argocd-applicationset-controller
# helpful to get the most recent status / events (maybe a template issue)
kubectl get appset self-managed-argocd -n argocd -o yaml
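# list the generated Applications and their sync/health status
kubectl get applications -n argocd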
Detected changes to resource applications.argoproj.io which is currently being deleted.
From https://stackoverflow.com/questions/71164538/argocd-application-resource-stuck-at-deletion
# kubectl patch crd applications.argoproj.io -p '{"metadata": {"finalizers": null}}' --type merge
kubectl delete -k "https://github.com/argoproj/argo-cd/manifests/crds?ref=v2.13.0-rc1"
kubectl apply -k "https://github.com/argoproj/argo-cd/manifests/crds?ref=v2.13.0-rc1"
Maybe create 2 flow charts (creating workloads, debugging workloads)
- leave as many "defaults" as you can (copy upstream repos with regular sync / diff audit)
- try to avoid unused files or resources
- try to avoid confusing names
- try to avoid deviating too far from the standard ops (like creating too many selectors or assuming new ways to select apps)
- important to keep it as simple as possible
- unique cluster env in dir/yaml
- shared cluster(s) env in workloads/*/config
- unique application configuration in workloads/*/ApplicationSets
- by design, ApplicationSets should be horizontally scalable (adding new clusters can opt-in and inherit accordingly)
- pay attention to resource management (prune vs. not + appset vs app pruning + secrets)
- Include a bit about ignoreDifferences and when it comes into play (example below)
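As a hedged illustration (not this repo's actual config), an Application that tolerates HPA-managed replica drift on Deployments might carry a fragment like:
# example Application spec fragment -- ignore replica counts managed by an HPA
spec:
  ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
    - /spec/replicas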
Since the first install was "bootstrapped," adding new clusters is a matter of:
- Create the config yaml file in clusters/
- Add the connection information to a secret in ArgoCD (sketch below)
- Sync!
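The connection secret can be declared like any other ArgoCD cluster secret. A sketch; the name, server URL, and auth config are placeholders:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: my-new-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
stringData:
  name: my-new-cluster
  server: https://kubernetes.example.com
  config: |
    {
      "bearerToken": "<token>",
      "tlsClientConfig": { "caData": "<base64-encoded CA>" }
    }
EOF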
Export your PROJECT_ID, then:
# create fleet service account for interacting with the clusters
gcloud iam service-accounts create argocd-fleet-admin --project $PROJECT_ID
gcloud projects add-iam-policy-binding $PROJECT_ID --member "serviceAccount:argocd-fleet-admin@${PROJECT_ID}.iam.gserviceaccount.com" --role roles/gkehub.gatewayEditor
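With the fleet service account bound, a cluster registered to the fleet can be reached through the Connect Gateway (the membership name is a placeholder):
# fetch a kubeconfig entry that routes through the GKE Connect Gateway
gcloud container fleet memberships get-credentials MEMBERSHIP_NAME --project $PROJECT_ID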
TBD (one:one / one:many / take-what-you-need)
I ran into an issue where the etcd state didn't match the state / status of ArgoCD. I ultimately had to remove finalizers from resources and reapply...
kubectl edit crd applications.argoproj.io
# reapply argocd
Security branch settings / lint requirements:
- remove charts created folder via .gitignore
- improve workload selection / declaration + helm values (argoproj/argo-cd#11982 - allow selector + nested helm workload values )
- create secrets rotation / creation tooling
- update external secrets to use helm values (simplify setup): https://external-secrets.io/v0.7.0/api/secretstore/
- Document the service accounts used in GCP / dns a bit better (gcloud iam service-accounts create cloudydemo-dns01-solver + GCP GSM)
- Move certmanager secret to git (kubectl -n cert-manager create secret generic clouddns-dns01-solver-svc-acct --from-file=$HOME/key.json)
- Add additional certmanager resources to workload (cluster issuer, wildcard requests, etc)
- move environments config for appset into "config" directory and update kube-prometheus-stack
- rename or append 01, 02 to high priority workloads / sync wave (like argoCD -> secrets -> istio -> gateway -> certs etc)