Deployment
Kubernetes deployment guide for ai-agents. The runtime is delivered as three Helm charts plus shared infra (CRDs, RBAC, secrets) installed via FluxCD.
Prerequisites
- Kubernetes cluster with FluxCD
- Traefik ingress controller
- Authelia identity provider
- External Secrets Operator (or SOPS-encrypted secrets)
- Longhorn storage (for the workspace + uploads PVC)
Namespace
All workloads run in ai-agents.
apiVersion: v1
kind: Namespace
metadata:
  name: ai-agents
Helm Charts (charts/)
| Chart | Templates | Image | Purpose |
|---|---|---|---|
| ai-agents-main | API Deployment, App Deployment, Services, Ingress, ConfigMap, RBAC, NetworkPolicies, PVC, Grafana dashboard, ServiceMonitor, CronJobs (cronjob-cleanup, cronjob-token-refresh) | apps.ai-agents + apps.ai-agents/app | Express API + operator + Next.js frontend |
| ai-agents-processor | Deployment, ConfigMap, RBAC, NetworkPolicy, PVC | apps.ai-agents/processor | Operator-only reconcile loop; can scale independently |
| ai-agents-webhook | Deployment, Service, Ingress, ConfigMap, RBAC, NetworkPolicy | apps.ai-agents/webhook | Webhook receiver, auto-instantiated per WebhookSource |
Webhook deployments are normally provisioned automatically: when a WebhookSource CR is created, the API server's WebhookSourceWatcher reconciles a Deployment + Service + Ingress for that source using the same image tag as the running API pod. The standalone ai-agents-webhook chart is available for manual deployments.
CRDs and Cluster Infra (k8s/infra/)
CRDs live in k8s/infra/ and are installed cluster-wide ahead of the workloads:
k8s/infra/
├── crd-inferencerequest.yaml
├── crd-aimodel.yaml
├── crd-aiagent.yaml
├── crd-aiconfig.yaml
├── crd-aiinstruction.yaml
├── crd-webhooksource.yaml
├── externalsecret-*.yaml
├── secretstore.yaml
└── networkpolicy-*.yaml
CRD group/version: labrats.work/v1alpha1.
Kustomize Layout (k8s/)
k8s/
├── base/ # Base ai-agents-main resources (Deployment-style)
├── infra/ # CRDs, ExternalSecrets, SecretStore, NetworkPolicies
├── instructions/ # AiInstruction CRs (global-* and local-*)
├── releases/ # Flux HelmRelease objects for the three charts
├── envs/
│   ├── dev/
│   └── prod/
└── overlays/
    └── production/
Shipped instructions:
| Type | Name | Priority | Focus |
|---|---|---|---|
| global | global-code-quality | 10 | Naming, clean code, TS patterns |
| global | global-commit-format | 20 | Commit conventions |
| global | global-security | 30 | Security best practices |
| global | global-documentation | 40 | Documentation requirements |
| global | global-testing | 50 | Testing standards |
| global | global-repo-compliance | 60 | Repository structure |
| local | local-developer-workflow | — | developer / codex-developer |
| local | local-reviewer-checklist | — | reviewer / codex-reviewer |
| local | local-docs-standards | — | docs |
| local | local-architect-design | — | architect |
| local | local-ops-runbook | — | ops |
| local | local-security-audit | — | security |
| local | local-triage-process | — | triage |
| local | local-testing-strategy | — | testing |
Required Secrets
The API server reads the following keys from the deployment-level Secret (managed via External Secrets Operator or SOPS):
| Key | Purpose |
|---|---|
| OIDC_CLIENT_SECRET | Authelia OIDC client secret |
| JWT_SECRET | HS256 signing key for the auth_token cookie |
| SUBMIT_API_KEYS | Comma-separated API keys for fast-path submission auth |
Backend credential secrets are optional and are referenced per model by AiModel.spec.credentialsRef (used only by the claude executor). Caller identity flows through the InferenceRequest (IR) apiKeyRef, which points at an API-key Secret; GitHub access uses the App credentials in github-app-credentials. There is no AiAccount pool (removed in spec 0003).
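As an illustration, the three keys in the table above could be synced by an ExternalSecret along these lines. This is a sketch under stated assumptions, not the manifest shipped in k8s/infra/: the resource names, the store name, the remote key path, and the property names are all hypothetical, and the API version depends on the installed External Secrets Operator release.

```yaml
# Sketch only. Names, remote paths, and properties are hypothetical.
apiVersion: external-secrets.io/v1beta1   # newer ESO releases also serve v1
kind: ExternalSecret
metadata:
  name: ai-agents-api            # hypothetical
  namespace: ai-agents
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: ai-agents              # hypothetical; the real store is defined in k8s/infra/secretstore.yaml
  target:
    name: ai-agents-api          # hypothetical name of the Secret the API pod reads
  data:
    - secretKey: OIDC_CLIENT_SECRET
      remoteRef:
        key: ai-agents/api       # hypothetical path in the backing store
        property: oidc_client_secret
    - secretKey: JWT_SECRET
      remoteRef:
        key: ai-agents/api
        property: jwt_secret
    - secretKey: SUBMIT_API_KEYS
      remoteRef:
        key: ai-agents/api
        property: submit_api_keys
```

With SOPS instead of External Secrets Operator, the same three keys would live in an encrypted Secret manifest that Flux decrypts at apply time.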
Configuration (env vars)
Each process reads its environment variables through src/config.ts:
| Var | Default | Used by |
|---|---|---|
| PORT | 3001 | all |
| LOG_LEVEL | info | all |
| WORKER_CONCURRENCY | 5 | API, Processor |
| APP_VERSION | dev | API |
| WORKSPACE_DIR | /workspace/jobs | API |
| ACCOUNTS_DIR | /workspace/.accounts | API |
| UPLOADS_DIR | /workspace/uploads | API, Processor |
| UPLOADS_PVC_NAME | (none) | API, Processor (mounts onto executor Jobs) |
| EXECUTOR_IMAGE | repo-pinned tag | API, Processor (executor Job spec) |
| EXECUTOR_SERVICE_ACCOUNT | (none) | API, Processor (executor Jobs) |
| JWT_SECRET | (none) | API |
| OIDC_ISSUER, OIDC_CLIENT_ID, OIDC_CLIENT_SECRET, OIDC_REDIRECT_URI | (none) | API |
| PROMETHEUS_URL | (none) | API (/api/stats/timeseries) |
| WEBHOOK_SOURCE_NAME | (none) | Webhook (required) |
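A container fragment showing how some of these variables might be set on the API Deployment. It is a minimal sketch: the container name and the Secret name ai-agents-api are hypothetical, and the chart's actual templates may wire values differently (for example through the ConfigMap).

```yaml
# Pod-spec fragment, not a complete manifest. Secret and container names are hypothetical.
containers:
  - name: api
    env:
      - name: PORT
        value: "3001"              # env values must be strings, so numbers are quoted
      - name: LOG_LEVEL
        value: info
      - name: WORKER_CONCURRENCY
        value: "5"
      - name: WORKSPACE_DIR
        value: /workspace/jobs
      - name: UPLOADS_DIR
        value: /workspace/uploads
      - name: JWT_SECRET           # no default, so it has to come from the Secret
        valueFrom:
          secretKeyRef:
            name: ai-agents-api
            key: JWT_SECRET
      - name: OIDC_CLIENT_SECRET
        valueFrom:
          secretKeyRef:
            name: ai-agents-api
            key: OIDC_CLIENT_SECRET
```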
Ingress
The unified deployment uses two Ingresses, split by auth requirement:
Public (ai-agents-public)
No middleware. Used by external services and probes.
| Path | Backend |
|---|---|
| /api/jobs/submit | API (:3001) |
| /api/events/ingest | API |
| /api/webhooks/* | API or per-source webhook pod |
| /health, /ready, /metrics | API |
Protected (ai-agents-protected)
Authelia ForwardAuth via:
traefik.ingress.kubernetes.io/router.middlewares: authelia-authelia@kubernetescrd
| Path | Backend |
|---|---|
| /api/* | API (:3001) |
| /* | App (:3000) |
The cross-namespace middleware reference requires allowCrossNamespace: true on Traefik's Kubernetes CRD provider.
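For orientation, the protected Ingress could look roughly like the following. The Ingress name, the annotation, and the ports come from this section; the host, the ingress class, and the Service names are assumptions made for the sketch.

```yaml
# Sketch only. Host, ingressClassName, and Service names are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ai-agents-protected
  namespace: ai-agents
  annotations:
    # <middleware namespace>-<middleware name>@kubernetescrd
    traefik.ingress.kubernetes.io/router.middlewares: authelia-authelia@kubernetescrd
spec:
  ingressClassName: traefik
  rules:
    - host: ai-agents.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: ai-agents-api
                port:
                  number: 3001
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ai-agents-app
                port:
                  number: 3000
```

When Traefik is installed from its official Helm chart, the cross-namespace setting is usually enabled through the chart value providers.kubernetesCRD.allowCrossNamespace: true. Traefik ranks routers by rule length by default, so the longer public paths such as /api/jobs/submit should take precedence over the /api prefix here; verify this against your Traefik version.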
Network Policies
Strict pod-to-pod and egress rules:
- API pod ingress: Traefik, App pod, plus client namespaces (e.g. github-ai-agents, reader-ai-agents).
- API pod egress: K8s API (CRDs/Jobs/Secrets), DNS, HTTPS (OIDC, GitHub, callbacks).
- App pod ingress: Traefik only.
- App pod egress: API pod, DNS.
- Executor Jobs: egress to the K8s API (credential write-back), DNS, HTTPS to the AI provider, plus optional GitHub.
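The App pod rules above (ingress from Traefik only, egress to the API pod and DNS) can be sketched as a single NetworkPolicy. The pod labels, the policy name, and the assumption that Traefik runs in a namespace called traefik are illustrative and not taken from the charts.

```yaml
# Sketch only. Labels, policy name, and the Traefik namespace are hypothetical.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-agents-app
  namespace: ai-agents
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: app
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik
      ports:
        - protocol: TCP
          port: 3000
  egress:
    # API pod in the same namespace
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: api
      ports:
        - protocol: TCP
          port: 3001
    # Cluster DNS
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Because policyTypes lists both Ingress and Egress, any traffic not matched by these rules is denied for the selected pods.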
Storage
The shared workspace PVC is ReadWriteMany (or ReadWriteOnce in single-replica setups). Layout:
/workspace/
├── .accounts/ # Account credential files (legacy path)
├── jobs/ # Per-job workspaces (ephemeral, cleanup CronJob below)
└── uploads/ # PDF and workspace uploads — mounted into Executor Jobs read-only
The PVC named by UPLOADS_PVC_NAME is mounted read-only into every executor Job at /uploads.
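A claim for the shared workspace might be declared as follows. This is a sketch: the claim name and size are hypothetical, and it assumes Longhorn's default StorageClass name, longhorn.

```yaml
# Sketch only. Claim name and requested size are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ai-agents-workspace
  namespace: ai-agents
spec:
  accessModes:
    - ReadWriteMany          # use ReadWriteOnce for single-replica setups
  storageClassName: longhorn
  resources:
    requests:
      storage: 20Gi
```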
An optional models-cache PVC (pvc-models-cache.yaml) can be mounted at /models on executor Jobs for local inference providers. When AiConfig.spec.providers[provider].modelUrl is set, the job executor adds an init container that downloads the GGUF model to the cache PVC (only if not already present). This avoids re-downloading large models for each job.
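The download-if-missing behaviour described above can be pictured as an init container like the one below. It is an illustrative sketch, not the spec the job executor generates: the image, the MODEL_URL variable, the target file name, and the volume and claim names are all assumptions.

```yaml
# Pod-spec fragment. Image, env var, file name, and volume names are hypothetical.
initContainers:
  - name: fetch-model
    image: curlimages/curl:8.8.0
    command:
      - sh
      - -c
      - |
        set -eu
        target=/models/model.gguf
        # Skip the download when a previous job already cached the file.
        if [ ! -f "$target" ]; then
          # Download to a temporary name so a partial file is never mistaken for a complete one.
          curl -fL -o "$target.part" "$MODEL_URL"
          mv "$target.part" "$target"
        fi
    env:
      - name: MODEL_URL
        value: https://example.com/model.gguf
    volumeMounts:
      - name: models-cache
        mountPath: /models
volumes:
  - name: models-cache
    persistentVolumeClaim:
      claimName: models-cache
```

The temporary-file-then-rename step matters when several jobs share the cache: a job that starts mid-download sees no model.gguf and does not read a truncated file.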
CronJobs
Shipped from ai-agents-main:
| CronJob | Purpose |
|---|---|
| cronjob-cleanup | Removes stale workspace directories that survived crashes. |
| cronjob-token-refresh | Triggers an OAuth refresh on idle accounts to prevent token expiry. |
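To make the cleanup job concrete, here is a minimal sketch of what such a CronJob could look like. The schedule, the image, the 24-hour age threshold, and the claim name are assumptions; the chart's actual template may use the project's own image and logic.

```yaml
# Sketch only. Schedule, image, age threshold, and claim name are hypothetical.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cronjob-cleanup
  namespace: ai-agents
spec:
  schedule: "0 * * * *"          # hourly
  concurrencyPolicy: Forbid      # never run two cleanups at once
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: busybox:1.36
              command:
                - sh
                - -c
                - |
                  # Remove per-job directories not modified in the last 24 hours.
                  find /workspace/jobs -mindepth 1 -maxdepth 1 -type d -mmin +1440 -exec rm -rf {} \;
              volumeMounts:
                - name: workspace
                  mountPath: /workspace
          volumes:
            - name: workspace
              persistentVolumeClaim:
                claimName: ai-agents-workspace
```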
Health Probes
All three processes expose the same liveness/readiness contract:
- Liveness: GET /health → 200 {"status":"ok"}
- Readiness: GET /ready → 200 {"status":"ok"}
Probes have no external dependencies — they confirm the process is up. Operator readiness is implicit (the informer caches populate at boot).
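In a Deployment, the contract above maps onto standard HTTP probes. The fragment below is a sketch for the API container; the container name, the port name, and the probe timings are assumptions.

```yaml
# Container fragment. Container name, port name, and timings are hypothetical.
containers:
  - name: api
    ports:
      - name: http
        containerPort: 3001
    livenessProbe:
      httpGet:
        path: /health
        port: http
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready
        port: http
      periodSeconds: 5
```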
Security Context
Standard hardened defaults:
securityContext:
  runAsNonRoot: true
  runAsUser: 1001
  runAsGroup: 1001
  fsGroup: 1001
  allowPrivilegeEscalation: false
  privileged: false
  capabilities: { drop: [ALL] }
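The list above is a summary: in a real pod spec these fields are split across two levels, because Kubernetes accepts fsGroup only in the pod-level securityContext, and allowPrivilegeEscalation, privileged, and capabilities only at container level. A sketch of the split, with a hypothetical container name:

```yaml
# Pod template fragment. The container name is hypothetical.
spec:
  securityContext:               # pod level (PodSecurityContext)
    runAsNonRoot: true
    runAsUser: 1001
    runAsGroup: 1001
    fsGroup: 1001                # volumes are made group-accessible to GID 1001
  containers:
    - name: api
      securityContext:           # container level (SecurityContext)
        allowPrivilegeEscalation: false
        privileged: false
        capabilities:
          drop:
            - ALL
```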
FluxCD Integration
The flux repo (labrats.work.hetzner.cluster.flux) references the three HelmReleases plus the kustomization at k8s/overlays/production. Image automation watches the latest tag in GHCR and rolls deployments forward when image.yml publishes a new tag.
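A HelmRelease for one of the three charts, together with an image policy, might look like the sketch below. The GitRepository name, the flux-system namespace, the image.tag values key, the policy name, and the intervals are assumptions; the API versions shown depend on the Flux release in the cluster.

```yaml
# Sketch only. Source name, namespaces, values keys, and intervals are hypothetical.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: ai-agents-main
  namespace: ai-agents
spec:
  interval: 10m
  chart:
    spec:
      chart: ./charts/ai-agents-main
      sourceRef:
        kind: GitRepository
        name: ai-agents
        namespace: flux-system
  values:
    image:
      # The trailing marker lets Flux image automation rewrite this tag in Git.
      tag: "1.2.3" # {"$imagepolicy": "flux-system:ai-agents-api:tag"}
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: ai-agents-api
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: ai-agents-api          # a matching ImageRepository pointing at GHCR is assumed
  policy:
    semver:
      range: ">=0.0.0"           # newest semver tag; releases carry no v prefix
```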
CI/CD Workflows (.github/workflows/)
| Workflow | Trigger | Action |
|---|---|---|
| image.yml | Push/PR on src/, app/, executor/, Dockerfiles; workflow_dispatch | Cut a semver release (no v prefix), then build & push 8 images via matrix: API, App, Executor-Claude, Executor-Codex, Executor-Bonsai-8B, Executor-Llama, Webhook, Processor. |
| build.yml | Push / PR to main | TypeScript lint, test, and build check. Node 20. |
| pr-checks.yml | PR | Branch-name (type/description) and PR-title (conventional commit) validation plus required labels. |
| pr-no-secrets.yml | PR | Secret scan for API keys, tokens, credentials in changed files. |
| docs-update.yml | Daily 03:15 UTC + manual | Submit a docs-update job to ai-agents that opens / updates docs/update-latest. |
| gemma-inference.yml | Manual dispatch | Submit a Gemma inference job. |
| qwen-inference.yml | Manual dispatch | Submit a Qwen inference job. |
| llama-cpp-builder.yml | Manual + weekly (Mon 02:00 UTC) | Build upstream llama.cpp binary for gemma/qwen executors. |
All workflows run on k8s-hetzner-arc self-hosted runners; images push to ghcr.io/labrats-work/.
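As a rough picture of the simplest of these workflows, build.yml could be structured as follows. The triggers, Node version, and runner label come from the table and the sentence above; the job name and the npm script names (lint, test, build) are assumptions about the repository's package.json.

```yaml
# Sketch only. Job name and npm script names are hypothetical.
name: build
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  build:
    runs-on: k8s-hetzner-arc     # self-hosted runner label
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run lint
      - run: npm test
      - run: npm run build
```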