Zero-Downtime Deploys: A Blue–Green Playbook for Lean Teams

Parth Shah

Why Zero-Downtime Still Feels Like Black Magic

Nothing kills demo day like a 502. Yet many startups still “ship” by SSH’ing into a lone EC2 box. Blue–green deployments remove that single point of failure: run two identical environments, flip traffic when green is healthy, and your customers never see a blip.

TL;DR: Blue = current prod, Green = new version. Cut over when the health checks sing.


Architecture at a Glance

[Figure: Blue-Green Deployment Architecture]

Works the same on Cloud Run, Fly.io, or bare-metal K8s; just swap the ALB for your own load balancer.


1 | GitHub Actions Workflow

name: blue-green-deploy
on:
  push:
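    branches: [main]

env:
  # Assumption: the ECR registry host is stored as a repository variable named
  # ECR_REGISTRY; the build job reads env.ECR_REGISTRY, so it must be defined somewhere.
  ECR_REGISTRY: ${{ vars.ECR_REGISTRY }}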

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ env.ECR_REGISTRY }}/todo:${{ github.sha }}

  deploy:
    needs: build
    runs-on: ubuntu-latest
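    steps:
      # chart/ and scripts/smoke.sh live in the repo, so this job needs its own checkout.
      - uses: actions/checkout@v4
      # Credential and aws-region inputs are omitted here; supply them for your account.
      - uses: aws-actions/configure-aws-credentials@v4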
      - name: "Helm upgrade to green"
        run: |
          helm upgrade todo chart/ \
            --set image.tag=${{ github.sha }} \
            --set color=green
      - name: "Smoke test green"
        run: ./scripts/smoke.sh
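      - name: "Shift traffic"
        run: |
          # The AWS Load Balancer Controller reads weights from an actions.<name>
          # forwardConfig annotation; target-group-attributes does not accept them.
          # The action name "blue-green", the service names, and port 80 are assumptions,
          # and the Ingress backend must point at that action.
          kubectl patch ingress todo --type merge \
            -p '{"metadata":{"annotations":{"alb.ingress.kubernetes.io/actions.blue-green":"{\"type\":\"forward\",\"forwardConfig\":{\"targetGroups\":[{\"serviceName\":\"todo-blue\",\"servicePort\":80,\"weight\":0},{\"serviceName\":\"todo-green\",\"servicePort\":80,\"weight\":100}]}}"}}}'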

Key bits

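  • The color value tells the Helm chart which Deployment to roll out; color=green targets todo-green.
  • Smoke test hits /health on the green ingress hostname.
  • Traffic shifts only if the smoke test passes and the Prometheus SLOs are green (the alert gate described in section 4, not shown in this workflow).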
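The post does not show scripts/smoke.sh, so the following is only a minimal sketch, in Python, of what such a check might do: poll /health on the green hostname until it returns 200 or the attempts run out. The retry count, delay, and the injectable fetch parameter are illustrative assumptions, not the author's script.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET on url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, and similar


def smoke_test(
    base_url: str,
    attempts: int = 5,
    delay_s: float = 3.0,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll <base_url>/health; pass on the first 200, fail after `attempts` tries."""
    url = base_url.rstrip("/") + "/health"
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            time.sleep(delay_s)  # give green pods time to become ready
    return False
```

In a workflow step, the caller would exit non-zero when smoke_test(...) returns False so the job stops before the traffic shift. Passing fetch as a parameter keeps the retry logic testable without a live endpoint.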

2 | Database Schema Without Downtime

| Step | Pattern | Reason |
|---|---|---|
| 1 | Add the new column as nullable | Old code ignores it. |
| 2 | Deploy green reading + writing both columns | Green stays backward-compatible. |
| 3 | Migrate data in the background | Cron job → 10k rows/s. |
| 4 | Stop using the old column & flip traffic | All traffic on green; old column goes cold. |
| 5 | Delete the legacy column | After 1–2 weeks, once logs confirm zero reads. |

Pro tip: make each migration step idempotent and run it inside a transaction, so a failure rolls back cleanly and a rerun is safe.
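Steps 1 and 3 can be sketched as below. This is a hedged illustration using Python's built-in sqlite3 so it runs anywhere; the users table and the full_name and display_name columns are hypothetical, and a production Postgres or MySQL migration would use that engine's own tooling and syntax.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection) -> None:
    """Step 1: add the new nullable column; checking first makes a rerun a no-op."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(users)")}
    if "display_name" not in columns:
        conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def backfill(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Step 3: copy old -> new in small batches, touching only rows still NULL."""
    total = 0
    while True:
        with conn:  # one transaction per batch: commit on success, roll back on error
            cur = conn.execute(
                """
                UPDATE users SET display_name = full_name
                WHERE id IN (
                    SELECT id FROM users
                    WHERE display_name IS NULL AND full_name IS NOT NULL
                    LIMIT ?
                )
                """,
                (batch_size,),
            )
        if cur.rowcount == 0:
            return total  # nothing left to migrate
        total += cur.rowcount
```

Because the backfill only selects rows where the new column is still NULL, a crashed or repeated run picks up where the last one stopped instead of redoing work.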


3 | Feature Flags Keep You Honest

Blue–green solves infra rollbacks; feature flags solve product rollbacks.

  • Use GrowthBook or LaunchDarkly; DIY with a flags table if broke.
  • Default flag OFF in both blue & green → ramp once green is live.
  • Combine with userId % 100 < 1 to expose new features to 1 % traffic first.
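The ramp rule in the last bullet can be sketched as a small helper. This is an illustration only; the function names are made up here, and user IDs are assumed to be integers.

```python
def in_rollout(user_id: int, percent: int) -> bool:
    """True when user_id falls in the first `percent` of 100 buckets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return user_id % 100 < percent


def is_enabled(flag_on: bool, user_id: int, percent: int) -> bool:
    """A feature is visible only if the flag is ON and the user is inside the ramp."""
    return flag_on and in_rollout(user_id, percent)
```

With percent=1 this reproduces userId % 100 < 1. One design caveat: plain modulo puts the same users first in line for every flag, so teams that care about that often hash the flag name together with the user ID before bucketing.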

4 | Observability: The Three Lights

| Layer | Tool | SLO |
|---|---|---|
| Infra | Prometheus ➝ Grafana | Pod restart rate < 2 / h |
| App | OpenTelemetry traces | p95 latency < 250 ms |
| User | Sentry + RUM | JS error rate < 0.5 % |

Automation: a GitHub Actions step confirms Alertmanager is quiet before the traffic shift; any open severity=page alert aborts the deploy.
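A gate like this could look roughly as follows. It is a sketch under assumptions: it queries Alertmanager's v2 HTTP API (GET /api/v2/alerts), the base URL is whatever your cluster exposes, and treating only alerts in state "active" as blocking is one interpretation of "open".

```python
import json
import urllib.parse
import urllib.request


def fetch_alerts(alertmanager_url: str) -> list[dict]:
    """GET /api/v2/alerts, asking only for active, unsilenced, uninhibited page alerts."""
    query = urllib.parse.urlencode(
        {
            "active": "true",
            "silenced": "false",
            "inhibited": "false",
            "filter": 'severity="page"',
        }
    )
    url = f"{alertmanager_url.rstrip('/')}/api/v2/alerts?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def blocking_alerts(alerts: list[dict]) -> list[str]:
    """Names of alerts that should stop the deploy: severity=page and state active."""
    return [
        a["labels"].get("alertname", "<unnamed>")
        for a in alerts
        if a.get("labels", {}).get("severity") == "page"
        and a.get("status", {}).get("state") == "active"
    ]
```

The workflow step would call blocking_alerts(fetch_alerts(...)) and exit non-zero if the list is non-empty. The second filter repeats what the query already asks for, which keeps the decision correct even if the server-side filter is changed later.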


5 | Cost Check (Side Projects Scale)

| Resource | Blue | Green | Total / month |
|---|---|---|---|
| EKS t3.small (2 nodes each) | $24 | $24 | $48 |
| ALB | | | $18 |
| ECR storage | | | $4 |
| Grand total | | | $70 |

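Tight budget? Scale the blue Deployment to zero replicas ~5 min after cutover, for example with kubectl scale deployment todo-blue --replicas=0 (assuming blue is named todo-blue). A stock HorizontalPodAutoscaler cannot do this on its own: its minReplicas floor is 1 unless the alpha HPAScaleToZero feature gate is enabled.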


6 | When Things Go Sideways

  1. Smoke test fails → the Action aborts, green pods scale to 0, and Slack gets an incident ping.
  2. Post-cutover error rate spikes → kubectl patch ingress … blue:100,green:0 (one-liner rollback).
  3. DB migration stuck → Pause traffic shift; blue keeps running.

Mean time to recovery in prod: < 3 min over nine incidents last year.
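Rule 2 above can be automated with a small watcher. The sketch below is an assumption-laden illustration: the 0.5 % threshold is borrowed from the JS error-rate SLO in section 4, the requirement of three consecutive breaches is arbitrary, and roll_back stands in for whatever runs the rollback command.

```python
from typing import Callable, Iterable


def watch_and_roll_back(
    error_rates: Iterable[float],
    roll_back: Callable[[], None],
    threshold: float = 0.005,
    breaches_needed: int = 3,
) -> bool:
    """Call roll_back() once after `breaches_needed` consecutive samples above threshold.

    Returns True if a rollback was triggered, False if the samples ran out first.
    """
    consecutive = 0
    for rate in error_rates:
        # A healthy sample resets the streak, so a single blip does not trigger a rollback.
        consecutive = consecutive + 1 if rate > threshold else 0
        if consecutive >= breaches_needed:
            roll_back()
            return True
    return False
```

Requiring consecutive breaches trades a slightly slower reaction for fewer false rollbacks; tune both numbers against your own traffic.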


Highlight Reel

| KPI | Before (classic in-place) | After blue-green |
|---|---|---|
| Deploys / week | 2 | 14 |
| User-visible errors during deploy | 7–12 504s | 0 |
| Avg. rollback time | 38 min | 1.2 min |

Final Checklist

  • Health probes on /ready & /live endpoints
  • Idempotent schema migrations with feature flags
  • Automated rollback cut-over script
  • Alert block gate in the CI workflow
  • Post-deploy smoke + synthetic user tests

Zero downtime is less about magic and more about boring, repeatable scripts. Bake them once, and shipping becomes as routine as "git push".

Tags: CI/CD, DevOps, Kubernetes