Deployment

Kubernetes deployment guide for ai-agents. The runtime is delivered as three Helm charts plus shared infra (CRDs, RBAC, secrets) installed via FluxCD.

Prerequisites

Kubernetes cluster with FluxCD
Traefik ingress controller
Authelia identity provider
External Secrets Operator (or SOPS-encrypted secrets)
Longhorn storage (for the workspace + uploads PVC)

Namespace

All workloads run in ai-agents.

apiVersion: v1
kind: Namespace
metadata:
  name: ai-agents

Helm Charts (`charts/`)

Chart	Templates	Image	Purpose
`ai-agents-main`	API Deployment, App Deployment, Services, Ingress, ConfigMap, RBAC, NetworkPolicies, PVC, Grafana dashboard, ServiceMonitor, CronJobs (`cronjob-cleanup`, `cronjob-token-refresh`)	`apps.ai-agents` + `apps.ai-agents/app`	Express API + operator + Next.js frontend
`ai-agents-processor`	Deployment, ConfigMap, RBAC, NetworkPolicy, PVC	`apps.ai-agents/processor`	Operator-only reconcile loop, can scale independently
`ai-agents-webhook`	Deployment, Service, Ingress, ConfigMap, RBAC, NetworkPolicy	`apps.ai-agents/webhook`	Webhook receiver — auto-instantiated per `WebhookSource`

Webhook deployments are normally provisioned automatically: when a WebhookSource CR is created, the API server's WebhookSourceWatcher reconciles a Deployment + Service + Ingress for that source using the same image tag as the running API pod. The standalone ai-agents-webhook chart is available for manual deployments.
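
As a rough illustration of the trigger described above, a WebhookSource object could start from the skeleton below. Only the kind and the labrats.work/v1alpha1 group/version come from this guide. The object name is hypothetical, the resource is assumed to be namespaced, and the spec is left empty because its fields are defined in crd-webhooksource.yaml rather than here.

```yaml
# Minimal WebhookSource skeleton (sketch; not a complete or validated object).
apiVersion: labrats.work/v1alpha1
kind: WebhookSource
metadata:
  name: example-source   # hypothetical name
  namespace: ai-agents   # assumes the CRD is namespaced
spec: {}                 # fill in per k8s/infra/crd-webhooksource.yaml
```

An empty spec may be rejected if the CRD schema marks fields as required, so treat this as a starting point only.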

CRDs and Cluster Infra (`k8s/infra/`)

k8s/infra/ holds the CRDs (installed cluster-wide) plus the shared ExternalSecret, SecretStore, and NetworkPolicy manifests, all applied ahead of the workloads:

k8s/infra/
├── crd-inferencerequest.yaml
├── crd-aimodel.yaml
├── crd-aiagent.yaml
├── crd-aiconfig.yaml
├── crd-aiinstruction.yaml
├── crd-webhooksource.yaml
├── externalsecret-*.yaml
├── secretstore.yaml
└── networkpolicy-*.yaml

CRD group/version: labrats.work/v1alpha1.

Kustomize Layout (`k8s/`)

k8s/
├── base/                # Base ai-agents-main resources (Deployment-style)
├── infra/               # CRDs, ExternalSecrets, SecretStore, NetworkPolicies
├── instructions/        # AiInstruction CRs (global-* and local-*)
├── releases/            # Flux HelmRelease objects for the three charts
├── envs/
│   ├── dev/
│   └── prod/
└── overlays/
    └── production/
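
To show how these directories might compose, here is a sketch of what k8s/overlays/production/kustomization.yaml could contain. The resource list is an assumption derived from the directory names above, not a copy of the real file, and it presumes each referenced directory has its own kustomization.yaml.

```yaml
# k8s/overlays/production/kustomization.yaml (hypothetical contents)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: ai-agents     # target namespace for the workloads
resources:
  - ../../base           # ai-agents-main resources
  - ../../infra          # CRDs, ExternalSecrets, SecretStore, NetworkPolicies
  - ../../instructions   # AiInstruction CRs
```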

Shipped instructions:

Type	Name	Priority	Focus
global	`global-code-quality`	10	Naming, clean code, TS patterns
global	`global-commit-format`	20	Commit conventions
global	`global-security`	30	Security best practices
global	`global-documentation`	40	Documentation requirements
global	`global-testing`	50	Testing standards
global	`global-repo-compliance`	60	Repository structure
local	`local-developer-workflow`	—	developer / codex-developer
local	`local-reviewer-checklist`	—	reviewer / codex-reviewer
local	`local-docs-standards`	—	docs
local	`local-architect-design`	—	architect
local	`local-ops-runbook`	—	ops
local	`local-security-audit`	—	security
local	`local-triage-process`	—	triage
local	`local-testing-strategy`	—	testing

Required Secrets

The API server reads the following keys from the deployment-level Secret (managed via External Secrets Operator or SOPS):

Key	Purpose
`OIDC_CLIENT_SECRET`	Authelia OIDC client secret
`JWT_SECRET`	HS256 signing key for the `auth_token` cookie
`SUBMIT_API_KEYS`	Comma-separated API keys for fast-path submission auth
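
For orientation, the resulting Secret would have roughly the shape below, whichever tool produces it. The three keys come from the table; the Secret name is hypothetical and the values are placeholders.

```yaml
# Shape of the deployment-level Secret (sketch). Never commit real values
# unencrypted; use External Secrets Operator or SOPS as described above.
apiVersion: v1
kind: Secret
metadata:
  name: ai-agents-secrets   # hypothetical name
  namespace: ai-agents
type: Opaque
stringData:
  OIDC_CLIENT_SECRET: "<authelia-oidc-client-secret>"
  JWT_SECRET: "<random-hs256-signing-key>"
  SUBMIT_API_KEYS: "<key-one>,<key-two>"   # comma-separated list
```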

Backend credential Secrets are optional and are referenced per model by AiModel.spec.credentialsRef (used only by the claude executor). Caller identity flows through the InferenceRequest (IR) apiKeyRef, which points at an API-key Secret; GitHub access uses the App credentials in github-app-credentials. There is no AiAccount pool (removed in spec 0003).

Configuration (env vars)

Each process reads its settings from environment variables through src/config.ts:

Var	Default	Used by
`PORT`	`3001`	all
`LOG_LEVEL`	`info`	all
`WORKER_CONCURRENCY`	`5`	API, Processor
`APP_VERSION`	`dev`	API
`WORKSPACE_DIR`	`/workspace/jobs`	API
`ACCOUNTS_DIR`	`/workspace/.accounts`	API
`UPLOADS_DIR`	`/workspace/uploads`	API, Processor
`UPLOADS_PVC_NAME`	—	API, Processor (mounts onto executor Jobs)
`EXECUTOR_IMAGE`	repo-pinned tag	API, Processor (executor Job spec)
`EXECUTOR_SERVICE_ACCOUNT`	—	API, Processor (executor Jobs)
`JWT_SECRET`	—	API
`OIDC_ISSUER`, `OIDC_CLIENT_ID`, `OIDC_CLIENT_SECRET`, `OIDC_REDIRECT_URI`	—	API
`PROMETHEUS_URL`	—	API (`/api/stats/timeseries`)
`WEBHOOK_SOURCE_NAME`	—	Webhook (required)
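
A minimal sketch of how these variables could be wired into the API container. The variable names and the PORT, LOG_LEVEL, and WORKER_CONCURRENCY defaults come from the table; the Secret name, PVC name, and issuer URL are placeholders, not values from the charts.

```yaml
# Fragment of a container spec for the API process (sketch).
env:
  - name: PORT
    value: "3001"
  - name: LOG_LEVEL
    value: "info"
  - name: WORKER_CONCURRENCY
    value: "5"
  - name: UPLOADS_PVC_NAME
    value: "ai-agents-uploads"          # hypothetical PVC name
  - name: OIDC_ISSUER
    value: "https://auth.example.com"   # placeholder issuer URL
  - name: JWT_SECRET
    valueFrom:
      secretKeyRef:
        name: ai-agents-secrets         # hypothetical Secret name
        key: JWT_SECRET
  - name: OIDC_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: ai-agents-secrets         # hypothetical Secret name
        key: OIDC_CLIENT_SECRET
```

Environment values are quoted because Kubernetes requires them to be strings, even when they look numeric.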

Ingress

The unified deployment uses two Ingresses, split by auth requirement:

Public (`ai-agents-public`)

No middleware. Used by external services and probes.

Path	Backend
`/api/jobs/submit`	API (`:3001`)
`/api/events/ingest`	API
`/api/webhooks/*`	API or per-source webhook pod
`/health`, `/ready`, `/metrics`	API

Protected (`ai-agents-protected`)

Authelia ForwardAuth via:

traefik.ingress.kubernetes.io/router.middlewares: authelia-authelia@kubernetescrd

Path	Backend
`/api/*`	API (`:3001`)
`/*`	App (`:3000`)

The cross-namespace middleware reference requires allowCrossNamespace: true on Traefik's Kubernetes CRD provider.
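
The protected Ingress described above might be written as follows. The Ingress name, annotation, paths, and ports are taken from this section; the host, ingress class, and Service names are assumptions.

```yaml
# Sketch of the protected Ingress (Authelia ForwardAuth in front of API and App).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ai-agents-protected
  namespace: ai-agents
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: authelia-authelia@kubernetescrd
spec:
  ingressClassName: traefik            # assumed class name
  rules:
    - host: agents.example.com         # placeholder host
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: ai-agents-api    # hypothetical Service name
                port:
                  number: 3001
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ai-agents-app    # hypothetical Service name
                port:
                  number: 3000
```

On the Traefik side, the matching static-configuration option is providers.kubernetesCRD.allowCrossNamespace (CLI form: --providers.kubernetescrd.allowcrossnamespace=true). Traefik's default router priority is based on rule length, so the longer public paths such as /api/jobs/submit should win over the /api prefix here; verify this against the routers Traefik actually generates.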

Network Policies

Strict pod-to-pod and egress rules:

API pod ingress: Traefik, App pod, plus client namespaces (e.g. github-ai-agents, reader-ai-agents).
API pod egress: K8s API (CRDs/Jobs/Secrets), DNS, HTTPS (OIDC, GitHub, callbacks).
App pod ingress: Traefik only.
App pod egress: API pod, DNS.
Executor Jobs: egress to the K8s API (credential write-back), DNS, HTTPS to the AI provider, plus optional GitHub.
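
As one concrete instance of the rules above, the App pod policy (ingress from Traefik only, egress to the API pod and DNS) could be sketched like this. The pod labels, policy name, and Traefik namespace name are assumptions; the ports come from the Ingress section.

```yaml
# Sketch of the App pod NetworkPolicy. Labels and namespace names are hypothetical.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-policy
  namespace: ai-agents
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: app
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        # Same list item: the pod must match both selectors.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - protocol: TCP
          port: 3000
  egress:
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: api
      ports:
        - protocol: TCP
          port: 3001
    - to:
        - namespaceSelector: {}   # DNS in any namespace
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```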

Storage

The shared workspace PVC is ReadWriteMany (or ReadWriteOnce in single-replica setups). Layout:

/workspace/
├── .accounts/    # Account credential files (legacy path)
├── jobs/         # Per-job workspaces (ephemeral; stale ones are removed by cronjob-cleanup)
└── uploads/      # PDF and workspace uploads — mounted into Executor Jobs read-only

The PVC named by UPLOADS_PVC_NAME is mounted read-only into every executor Job at /uploads.
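
A sketch of the shared workspace claim, assuming Longhorn's StorageClass is named longhorn. The claim name and size are placeholders.

```yaml
# Sketch of the shared workspace PVC.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ai-agents-workspace      # hypothetical name
  namespace: ai-agents
spec:
  accessModes:
    - ReadWriteMany              # ReadWriteOnce is enough for single-replica setups
  storageClassName: longhorn     # assumed StorageClass name
  resources:
    requests:
      storage: 20Gi              # placeholder size
```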

An optional models-cache PVC (pvc-models-cache.yaml) can be mounted at /models on executor Jobs for local inference providers. When AiConfig.spec.providers[provider].modelUrl is set, the job executor adds an init container that downloads the GGUF model to the cache PVC (only if not already present). This avoids re-downloading large models for each job.
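
The download-if-missing behaviour can be pictured with an init container along these lines. This is a sketch of the idea, not the init container the job executor generates: the image, the MODEL_URL variable, the URL, and the volume and claim names are all hypothetical. Only the /models mount path comes from this guide.

```yaml
# Fragment of an executor Job pod spec (sketch).
initContainers:
  - name: fetch-model
    image: curlimages/curl        # hypothetical choice; pin a tag in practice
    env:
      - name: MODEL_URL
        value: "https://example.com/models/model.gguf"   # placeholder URL
    command: ["sh", "-c"]
    args:
      - |
        # Derive the cache file name from the last URL path segment.
        target="/models/${MODEL_URL##*/}"
        # Skip the download when the model is already cached.
        if [ ! -f "$target" ]; then
          curl -fL --retry 3 -o "$target.part" "$MODEL_URL"
          mv "$target.part" "$target"
        fi
    volumeMounts:
      - name: models-cache
        mountPath: /models
volumes:
  - name: models-cache
    persistentVolumeClaim:
      claimName: models-cache     # hypothetical claim name
```

Downloading to a .part file and renaming it afterwards keeps an interrupted transfer from being mistaken for a complete model on the next run.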

CronJobs

Shipped from ai-agents-main:

CronJob	Purpose
`cronjob-cleanup`	Removes stale workspace directories that survived crashes.
`cronjob-token-refresh`	Triggers an OAuth refresh on idle accounts to prevent token expiry.
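
To make the cleanup job concrete, here is one way such a CronJob could be written. The schedule, image, retention window, command, and claim name are all assumptions; the chart's actual template may look quite different.

```yaml
# Sketch of a workspace cleanup CronJob.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cronjob-cleanup
  namespace: ai-agents
spec:
  schedule: "0 * * * *"            # hourly (assumption)
  concurrencyPolicy: Forbid        # never run two cleanups at once
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: busybox:1.36  # hypothetical image
              # Remove job directories last modified more than 24 hours ago.
              command: ["find", "/workspace/jobs", "-mindepth", "1", "-maxdepth", "1",
                        "-type", "d", "-mmin", "+1440", "-exec", "rm", "-rf", "{}", ";"]
              volumeMounts:
                - name: workspace
                  mountPath: /workspace
          volumes:
            - name: workspace
              persistentVolumeClaim:
                claimName: ai-agents-workspace   # hypothetical claim name
```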

Health Probes

All three processes expose the same liveness/readiness contract:

Liveness: GET /health → 200 {"status":"ok"}
Readiness: GET /ready → 200 {"status":"ok"}

Probes have no external dependencies; they only confirm that the process is up. Operator readiness is implicit (the informer caches populate at boot).
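
In a container spec, the contract above maps onto probes like the following. The paths come from this section and the port from the PORT default; the timing values are assumptions to tune per environment.

```yaml
# Fragment of a container spec (sketch).
livenessProbe:
  httpGet:
    path: /health
    port: 3001
  initialDelaySeconds: 5   # assumed value
  periodSeconds: 10        # assumed value
readinessProbe:
  httpGet:
    path: /ready
    port: 3001
  periodSeconds: 5         # assumed value
```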

Security Context

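Standard hardened defaults. The listing combines pod-level and container-level fields, so split them accordingly when writing a pod spec:

securityContext:
  # Valid at pod or container level
  runAsNonRoot: true
  runAsUser: 1001
  runAsGroup: 1001
  # Pod level only (spec.securityContext)
  fsGroup: 1001
  # Container level only (spec.containers[].securityContext)
  allowPrivilegeEscalation: false
  privileged: false
  capabilities: { drop: [ALL] }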

FluxCD Integration

The flux repo (labrats.work.hetzner.cluster.flux) references the three HelmReleases plus the kustomization at k8s/overlays/production. Flux image automation watches GHCR for the newest tag and rolls the deployments forward whenever image.yml publishes one.
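
One of the three HelmReleases might look roughly like this. The release name matches the chart name from this guide; the source name, interval, and values keys are assumptions. The apiVersion assumes a recent Flux release, since older installations serve v2beta1 or v2beta2 instead.

```yaml
# Sketch of a Flux HelmRelease for the main chart.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: ai-agents-main
  namespace: ai-agents
spec:
  interval: 10m                        # assumed reconcile interval
  chart:
    spec:
      chart: ./charts/ai-agents-main   # chart path inside the Git source
      sourceRef:
        kind: GitRepository
        name: ai-agents                # hypothetical source name
        namespace: flux-system
  values:
    image:
      tag: "1.2.3"                     # hypothetical values key and tag
```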

CI/CD Workflows (`.github/workflows/`)

Workflow	Trigger	Action
`image.yml`	Push/PR on `src/`, `app/`, `executor/`, Dockerfiles; workflow_dispatch	Cut a semver release (no `v` prefix), then build & push 8 images via matrix: API, App, Executor-Claude, Executor-Codex, Executor-Bonsai-8B, Executor-Llama, Webhook, Processor.
`build.yml`	Push / PR to main	TypeScript lint, test, and build check. Node 20.
`pr-checks.yml`	PR	Branch-name (type/description) and PR-title (conventional commit) validation plus required labels.
`pr-no-secrets.yml`	PR	Secret scan for API keys, tokens, credentials in changed files.
`docs-update.yml`	Daily 03:15 UTC + manual	Submit a docs-update job to ai-agents that opens / updates `docs/update-latest`.
`gemma-inference.yml`	Manual dispatch	Submit a Gemma inference job.
`qwen-inference.yml`	Manual dispatch	Submit a Qwen inference job.
`llama-cpp-builder.yml`	Manual + weekly (Mon 02:00 UTC)	Build upstream llama.cpp binary for gemma/qwen executors.

All workflows run on k8s-hetzner-arc self-hosted runners; images push to ghcr.io/labrats-work/.