
Why I Run Traefik With Static Config in Production


Dynamic config looks flexible on the slide deck, but every outage I've traced in the last year came from a CRD reconcile loop. So I'm defaulting to static config now — and this post walks through why.

The pitch, and its quiet cost

Dynamic providers let Traefik discover services the moment they appear: label a deployment, and the IngressRoute materialises. Beautiful in staging; brittle in prod. Every controller you wire up adds:

  • A reconcile loop that cannot be dry-run from the operator's laptop.
  • A fan-in of changes from every namespace the controller's ServiceAccount can see.
  • A surface for "the operator typed the wrong label" to become "every request 502'd for 90 seconds."

The pattern I've landed on

Static traefik.yml + a file-provider watching /etc/traefik/dynamic/*.yml. Every route for every service is in a YAML file that lives next to the systemd unit of the service that owns it. Changes are:

  1. Edit the YAML file in the repo that owns the service.
  2. Open a PR. The diff is reviewable — each file is scoped to one host.
  3. make deploy rsyncs the YAML into the file-provider directory. Traefik hot-reloads.

The per-service file is nothing more than routers and services:

http:
  routers:
    api:
      rule: Host(`api.example.com`) && PathPrefix(`/api/v1`)
      service: api
      tls:
        certResolver: letsencrypt
  services:
    api:
      loadBalancer:
        servers:
          - url: "http://127.0.0.1:8200"
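
For reference, the static side is small. Here's a minimal sketch of the traefik.yml that pairs with the file above; the entry-point names and directory path are from my setup and are assumptions, not the only valid layout:

```yaml
# traefik.yml — static config, read once at startup.
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"

providers:
  file:
    # Watch the directory that `make deploy` rsyncs into.
    # With watch: true, Traefik hot-reloads on file changes
    # without a process restart.
    directory: /etc/traefik/dynamic
    watch: true
```

Note there is no Kubernetes provider block at all: nothing outside that directory can create or mutate a route.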

What I keep dynamic

Two things earn dynamic config: cert rotation (ACME via Let's Encrypt) and rate-limit config for the auth endpoint. Everything else is declarative YAML under git.
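
For concreteness, here is a sketch of both pieces. The resolver name matches the certResolver referenced by the router above; the email, storage path, middleware name, and limit numbers are placeholders, not values from this post:

```yaml
# In traefik.yml (static): the ACME resolver that routers
# reference via `certResolver: letsencrypt`.
certificatesResolvers:
  letsencrypt:
    acme:
      email: ops@example.com          # placeholder
      storage: /etc/traefik/acme.json
      tlsChallenge: {}

# In a file under /etc/traefik/dynamic/ (dynamic): a rate-limit
# middleware a router can attach to the auth endpoint.
http:
  middlewares:
    auth-ratelimit:
      rateLimit:
        average: 10   # placeholder: sustained requests/sec
        burst: 20     # placeholder: short-term allowance
```

The rate-limit numbers are the one thing I tune live, which is why that fragment stays in the watched directory rather than baked into the static file.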

The concrete wins

  • A route review is a file diff, not a controller-state diff.
  • The on-call pager went quiet for ingress-shaped incidents.
  • Rolling back a bad route is git revert && make deploy — no kubectl rollout undo required.

Static config isn't sexy. It is, however, readable at 3am.