Push to GitHub, Deployed to Kubernetes: CI/CD and Canary Rollouts on Raspberry Pi



Previously, I built a GitOps pipeline where every change flows through Git. This week, I closed the loop: code push to GitHub triggers a CI pipeline, the image gets scanned for vulnerabilities, lands in a container registry, and Argo Rollouts delivers it to the cluster using canary traffic splitting — all without touching kubectl.

If you're coming in fresh, start with the GitOps article. This post picks up where that left off.

Forking Podinfo: owning the image

Podinfo was already running in my cluster as the demo workload, deployed via the upstream Helm chart. But the upstream chart pulls stefanprodan's image from GHCR: I don't control when it changes, I can't attach a CI pipeline to it, and I can't scan it before it hits the cluster.

So I forked stefanprodan/podinfo to charliepoker/podinfo and pointed the Helm values at my own GHCR image. From this point, every push to my fork triggers a build, and ArgoCD only deploys images I've built and scanned myself.

The CI pipeline

The workflow does three things: build a multi-architecture container image (amd64 + arm64), scan it with Trivy, and push to GHCR. The order matters. Trivy scans the image between the build and the push — if the scan finds HIGH or CRITICAL vulnerabilities, the pipeline fails, and nothing gets published.

The first successful run took almost 15 minutes. Nearly all of that was the arm64 build step — GitHub Actions runners are amd64, so building an arm64 image means QEMU emulation, and QEMU is slow.

Cross-compiling: 15 minutes to 2 minutes

The fix was cross-compilation. Instead of emulating an arm64 environment to compile Go code, I used Docker Buildx's --platform argument and Go's built-in cross-compile support (GOOS=linux GOARCH=arm64). The Go compiler runs natively on the amd64 runner and emits arm64 binaries quickly; the same compile under QEMU emulation is slow.

The result: the same pipeline went from 15 minutes down to under 3.
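
To make the platform-to-compiler mapping concrete, here is a minimal Python sketch that turns a Docker platform string into the GOOS/GOARCH environment a Go cross-compile would use. The function name go_env_for_platform is hypothetical and this is not the workflow from my repo; it only illustrates how "linux/arm64" becomes GOOS=linux GOARCH=arm64.

```python
from __future__ import annotations


def go_env_for_platform(platform: str) -> dict[str, str]:
    """Translate a Docker platform string (e.g. "linux/arm64") into Go build env vars."""
    parts = platform.split("/")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected platform string: {platform!r}")
    env = {"GOOS": parts[0], "GOARCH": parts[1]}
    # 32-bit ARM carries a variant such as "v7", which Go expects as GOARM=7.
    if len(parts) == 3 and parts[1] == "arm":
        env["GOARM"] = parts[2].lstrip("v")
    return env


if __name__ == "__main__":
    # Print the command line a native (non-emulated) cross-compile would use.
    for platform in ("linux/amd64", "linux/arm64"):
        env = go_env_for_platform(platform)
        assignments = " ".join(f"{key}={value}" for key, value in env.items())
        print(f"{platform}: {assignments} go build ./...")
```

The point of the design is that the compiler always runs on the runner's own architecture; only the target it emits code for changes.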

The Trivy gate

A CI pipeline that only scans when you remember to check the results isn't a gate. It's a suggestion. I wanted the pipeline to fail — red X, blocked push, no image in the registry — when a vulnerability exists.

Proving this was harder than I expected. My first instinct was to use an old Alpine base image, thinking end-of-life images would be full of known CVEs. They're not. Trivy has no advisory data for unsupported distros, so alpine:3.14 scans completely clean. Same with ubuntu:20.04. EOL doesn't mean vulnerable; it means unmaintained, which is a different problem that Trivy doesn't catch.

I also tried adding a requirements.txt with a known-vulnerable Python package. Nothing. When scanning an image, Trivy looks at OS packages and at what is actually installed or compiled in, such as Go binaries; it doesn't parse dependency files sitting loose in the filesystem.

What actually worked: compiling a tiny Go binary that imports github.com/dgrijalva/jwt-go@v3.2.0, which has CVE-2020-26160 (an access restriction bypass, severity HIGH). Trivy's gobinary analyzer picks up the vulnerability from the compiled binary, flags it, and the pipeline exits with code 1.
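
The gate decision itself is simple enough to sketch. Trivy can enforce it directly through its --severity and --exit-code options, so the Python below is not what the pipeline runs; it is a hedged illustration of the logic, applied to a hand-written sample shaped like Trivy's JSON report (a Results list whose entries carry a Vulnerabilities list). The target name app/vuln-demo and the helper names are assumptions made for the example.

```python
from __future__ import annotations

BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}

# Hand-written sample, not captured Trivy output. The target name is hypothetical.
SAMPLE_REPORT = {
    "Results": [
        {
            "Target": "app/vuln-demo",
            "Type": "gobinary",
            "Vulnerabilities": [
                {
                    "VulnerabilityID": "CVE-2020-26160",
                    "PkgName": "github.com/dgrijalva/jwt-go",
                    "Severity": "HIGH",
                }
            ],
        }
    ]
}


def blocking_findings(report: dict) -> list[tuple[str, str, str]]:
    """Return (id, package, severity) for every HIGH or CRITICAL finding."""
    findings = []
    for result in report.get("Results") or []:
        # A clean target may have no Vulnerabilities key, or a null value.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING_SEVERITIES:
                findings.append(
                    (vuln.get("VulnerabilityID", "?"), vuln.get("PkgName", "?"), vuln["Severity"])
                )
    return findings


def gate_exit_code(report: dict) -> int:
    """1 blocks the push, 0 lets the image through."""
    return 1 if blocking_findings(report) else 0


if __name__ == "__main__":
    for vuln_id, package, severity in blocking_findings(SAMPLE_REPORT):
        print(f"{severity}: {vuln_id} in {package}")
    print(f"gate would exit with code {gate_exit_code(SAMPLE_REPORT)}")
```

Because the push step only runs after a zero exit code, a single blocking finding is enough to keep the image out of the registry.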

That red X is the whole point. The image never reaches GHCR. Once I'd captured the proof, I removed the test binary and pushed a clean Dockerfile. The next run passed, and the gate was proven both ways.

ArgoCD Image Updater: closing the automation gap

At this point, a new image lands in GHCR every time I push code. But someone still needs to update the image tag in the Helm values file and push that change to Gitea. That's what Image Updater automates.

The current version (v1.2.1) uses a CRD-based approach — you create an ImageUpdater custom resource that tells the controller which image to watch, which tag pattern to match, and where to write the update. I configured it to watch ghcr.io/charliepoker/podinfo for tags matching sha-*, and write back to the Gitea repo via Git.
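
As a rough picture of what "tags matching sha-*" means, the sketch below filters a list of tags with a shell-style pattern. It is an illustration only: the tag values are made up, and Python's fnmatch stands in for whatever matcher the Image Updater controller actually uses.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def matching_tags(tags: list[str], pattern: str = "sha-*") -> list[str]:
    """Keep only the tags that match the pattern, preserving their order."""
    return [tag for tag in tags if fnmatchcase(tag, pattern)]


if __name__ == "__main__":
    # Hypothetical registry listing; only the sha-prefixed tags are candidates.
    registry_tags = ["latest", "sha-1a2b3c4", "6.7.0", "sha-9f8e7d6"]
    print(matching_tags(registry_tags))
```

Restricting the watch to commit-derived tags keeps floating tags such as latest from ever being written back to Git.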

The first time a new image tag appeared in GHCR, the Image Updater picked it up, generated a commit, and pushed it directly to Gitea. The commit author shows as argocd-image-updater and the message reads "build: automatic update of podinfo" — a bot commit that updates the image tag in values.yaml and nothing else.

ArgoCD sees the new commit, syncs, and the cluster updates. Push code → CI builds → Trivy scans → GHCR stores → Image Updater commits → ArgoCD deploys. No human in the loop after the initial push.

Argo Rollouts: canary with Gateway API traffic splitting

The last piece is progressive delivery. Instead of flipping 100% of traffic to the new version instantly, Argo Rollouts gradually shifts traffic using the Gateway API. 10% to the canary, pause, observe. 50%, pause, observe. Then 100% and promote.
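
The arithmetic behind those steps can be sketched in a few lines. In the Gateway API, each backend's share of traffic is its weight divided by the sum of the weights on the route, so a 10% canary step corresponds to weights of 90 and 10. The simulation below is a hedged illustration with hypothetical names (route_weights, simulate); it does not talk to a cluster or reproduce the Rollouts controller.

```python
from __future__ import annotations

import random

CANARY_STEPS = [10, 50, 100]  # canary percentages, as in the rollout described above


def route_weights(canary_percent: int) -> dict[str, int]:
    """Weights for the stable and canary backends at a given canary step."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {"stable": 100 - canary_percent, "canary": canary_percent}


def simulate(weights: dict[str, int], requests: int, rng: random.Random) -> dict[str, int]:
    """Send `requests` simulated requests, picking a backend in proportion to its weight."""
    backends = list(weights)
    picks = rng.choices(backends, weights=[weights[b] for b in backends], k=requests)
    return {backend: picks.count(backend) for backend in backends}


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs print the same counts
    for step in CANARY_STEPS:
        counts = simulate(route_weights(step), 1000, rng)
        print(f"canary {step:>3}%: {counts}")
```

Individual requests are still assigned at random, which is why refreshing a browser during a pause gives a roughly proportional split rather than an exact one.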

Setting this up broke in about eight different ways. Most of them weren't Argo Rollouts bugs — they were integration-boundary failures between the Rollout controller, the Gateway API plugin, ArgoCD's multi-source sync, and the existing cluster state.

The biggest one: the Gateway API plugin needs RBAC access to ConfigMaps in the workload namespace, not just HTTPRoutes. Without it, the controller loops thousands of times with "configmaps is forbidden" errors and never scales a single pod. The docs' example RBAC uses */* ("for test environments only"), and when I narrowed it to just HTTPRoutes, I missed the ConfigMap dependency entirely. Pods sat at zero for almost a day before I found it in the controller logs.

Other highlights: the podinfo Helm chart has no deployment.enabled key (I guessed wrong and wasted a render cycle); multi-source ArgoCD apps with ref: values only sync the values file, not sibling manifests in the same directory; and the first rollout always skips the canary steps because there's no stable baseline to canary against.

Once everything was wired, the canary worked exactly as advertised. Push a colour change to trigger a new revision, and watch the weights shift in real time.

The proof is in the HTTPRoute weights and the browser. Refresh the page during the 50/50 pause, and you get a roughly even split between the two colours. The "Served by" line shows which pod answered — canary or stable. That's real traffic steering through the Gateway API, not just a counter incrementing.

The full pipeline

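Here's the complete flow, from code push to canary delivery: push to GitHub → CI builds the multi-arch image → Trivy scans it → the image lands in GHCR → Image Updater commits the new tag to Gitea → ArgoCD syncs → Argo Rollouts shifts traffic to the canary at 10%, 50%, then 100%.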

What I actually learned

Trivy doesn't catch what you think it catches. EOL base images scan clean. Lockfiles are ignored. The only reliable trigger for Go projects is a compiled binary with a vulnerable dependency baked in. If your gate doesn't fire when it should, it's not a gate.

Image Updater v1.2.1 is CRD-based. The internet is full of tutorials using annotations on the ArgoCD Application. Those don't work anymore. You need an ImageUpdater custom resource, and the schema has its own quirks: writeBackTarget sits three levels deep, at spec.writeBackConfig.gitConfig.writeBackTarget, not where you'd guess.

Multi-source sync is narrower than it looks. A ref: values source in a multi-source Application pulls one file, not a directory. If you add new manifests alongside the values file, they sit in Git forever unless you create a separate Application to sync them.

RBAC surface area is bigger than the feature name. The Gateway API plugin touches ConfigMaps, not just HTTPRoutes. Narrowing RBAC below the docs' example */* (which they label "for test environments only") requires actually reading the plugin source or learning the hard way from controller logs.

The first rollout doesn't canary. With no stable ReplicaSet, there's nothing to split traffic against. The first deploy goes straight to 100%. The clean canary demo happens on the second change.

What's next

Prometheus and Grafana are going in for metrics. Loki for logs. Kyverno for policy enforcement.

Repo: github.com/charliepoker/homelab-k8s-gitops

Thank You!