AI Agents

Deployment

Kubernetes deployment guide for ai-agents. The runtime is delivered as three Helm charts plus shared infra (CRDs, RBAC, secrets) installed via FluxCD.

Prerequisites

  • Kubernetes cluster with FluxCD
  • Traefik ingress controller
  • Authelia identity provider
  • External Secrets Operator (or SOPS-encrypted secrets)
  • Longhorn storage (for the workspace + uploads PVC)

Namespace

All workloads run in ai-agents.

apiVersion: v1
kind: Namespace
metadata:
  name: ai-agents

Helm Charts (charts/)

| Chart | Templates | Image | Purpose |
| --- | --- | --- | --- |
| ai-agents-main | API Deployment, App Deployment, Services, Ingress, ConfigMap, RBAC, NetworkPolicies, PVC, Grafana dashboard, ServiceMonitor, CronJobs (cronjob-cleanup, cronjob-token-refresh) | apps.ai-agents + apps.ai-agents/app | Express API + operator + Next.js frontend |
| ai-agents-processor | Deployment, ConfigMap, RBAC, NetworkPolicy, PVC | apps.ai-agents/processor | Operator-only reconcile loop, can scale independently |
| ai-agents-webhook | Deployment, Service, Ingress, ConfigMap, RBAC, NetworkPolicy | apps.ai-agents/webhook | Webhook receiver, auto-instantiated per WebhookSource |

Webhook deployments are normally provisioned automatically: when a WebhookSource CR is created, the API server's WebhookSourceWatcher reconciles a Deployment + Service + Ingress for that source using the same image tag as the running API pod. The standalone ai-agents-webhook chart is available for manual deployments.

CRDs and Cluster Infra (k8s/infra/)

CRDs live in k8s/infra/ and are installed cluster-wide ahead of the workloads:

k8s/infra/
├── crd-inferencerequest.yaml
├── crd-aimodel.yaml
├── crd-aiagent.yaml
├── crd-aiconfig.yaml
├── crd-aiinstruction.yaml
├── crd-webhooksource.yaml
├── externalsecret-*.yaml
├── secretstore.yaml
└── networkpolicy-*.yaml

CRD group/version: labrats.work/v1alpha1.
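
As a rough illustration of how a custom resource in this group is addressed, the skeleton below uses the group/version stated above and the AiInstruction kind. The namespace is an assumption, and the spec is deliberately left empty because its fields are defined by crd-aiinstruction.yaml and are not reproduced in this guide.

```yaml
# Skeleton only: apiVersion and kind come from this guide; everything under
# spec is defined by the CRD schema and intentionally omitted here.
apiVersion: labrats.work/v1alpha1
kind: AiInstruction
metadata:
  name: global-code-quality
  namespace: ai-agents        # assumption: CRs live alongside the workloads
spec: {}                      # placeholder; a real CR needs the fields the CRD requires
```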

Kustomize Layout (k8s/)

k8s/
├── base/                # Base ai-agents-main resources (Deployment-style)
├── infra/               # CRDs, ExternalSecrets, SecretStore, NetworkPolicies
├── instructions/        # AiInstruction CRs (global-* and local-*)
├── releases/            # Flux HelmRelease objects for the three charts
├── envs/
│   ├── dev/
│   └── prod/
└── overlays/
    └── production/
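
A production overlay in a layout like this usually composes the shared directories. The following is a minimal sketch of what k8s/overlays/production/kustomization.yaml might contain; the resource list is an assumption (it presumes base/ and instructions/ each carry their own kustomization.yaml), not a copy of the repository file.

```yaml
# Hypothetical contents of k8s/overlays/production/kustomization.yaml.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: ai-agents
resources:
  - ../../base            # base ai-agents-main resources
  - ../../instructions    # AiInstruction CRs (global-* and local-*)
```

The infra/ directory is left out of this sketch because CRDs are installed cluster-wide ahead of the workloads.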

Shipped instructions:

| Type | Name | Priority | Focus |
| --- | --- | --- | --- |
| global | global-code-quality | 10 | Naming, clean code, TS patterns |
| global | global-commit-format | 20 | Commit conventions |
| global | global-security | 30 | Security best practices |
| global | global-documentation | 40 | Documentation requirements |
| global | global-testing | 50 | Testing standards |
| global | global-repo-compliance | 60 | Repository structure |
| local | local-developer-workflow | | developer / codex-developer |
| local | local-reviewer-checklist | | reviewer / codex-reviewer |
| local | local-docs-standards | | docs |
| local | local-architect-design | | architect |
| local | local-ops-runbook | | ops |
| local | local-security-audit | | security |
| local | local-triage-process | | triage |
| local | local-testing-strategy | | testing |

Required Secrets

The API server reads the following keys from the deployment-level Secret (managed via External Secrets Operator or SOPS):

| Key | Purpose |
| --- | --- |
| OIDC_CLIENT_SECRET | Authelia OIDC client secret |
| JWT_SECRET | HS256 signing key for the auth_token cookie |
| SUBMIT_API_KEYS | Comma-separated API keys for fast-path submission auth |
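
When External Secrets Operator is used, the three keys above could be materialised with an ExternalSecret along these lines. This is a sketch under assumptions: the resource names, the SecretStore name, and the remote key paths are all hypothetical, and newer operator releases may expect apiVersion external-secrets.io/v1 instead of v1beta1.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: ai-agents-api              # hypothetical name
  namespace: ai-agents
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: ai-agents-store          # hypothetical; must match your SecretStore
  target:
    name: ai-agents-api            # hypothetical name of the resulting Secret
  data:
    - secretKey: OIDC_CLIENT_SECRET
      remoteRef:
        key: ai-agents/oidc-client-secret   # hypothetical remote path
    - secretKey: JWT_SECRET
      remoteRef:
        key: ai-agents/jwt-secret           # hypothetical remote path
    - secretKey: SUBMIT_API_KEYS
      remoteRef:
        key: ai-agents/submit-api-keys      # hypothetical remote path
```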

Backend credential secrets are optional and referenced per model by AiModel.spec.credentialsRef (used only by the claude executor). Caller identity flows via the InferenceRequest (IR) apiKeyRef, which points to an API-key Secret; GitHub access uses the App credentials in github-app-credentials. There is no AiAccount pool (removed in spec 0003).

Configuration (env vars)

Each process reads from src/config.ts:

| Var | Default | Used by |
| --- | --- | --- |
| PORT | 3001 | all |
| LOG_LEVEL | info | all |
| WORKER_CONCURRENCY | 5 | API, Processor |
| APP_VERSION | dev | API |
| WORKSPACE_DIR | /workspace/jobs | API |
| ACCOUNTS_DIR | /workspace/.accounts | API |
| UPLOADS_DIR | /workspace/uploads | API, Processor |
| UPLOADS_PVC_NAME | | API, Processor (mounts onto executor Jobs) |
| EXECUTOR_IMAGE | repo-pinned tag | API, Processor (executor Job spec) |
| EXECUTOR_SERVICE_ACCOUNT | | API, Processor (executor Jobs) |
| JWT_SECRET | | API |
| OIDC_ISSUER, OIDC_CLIENT_ID, OIDC_CLIENT_SECRET, OIDC_REDIRECT_URI | | API |
| PROMETHEUS_URL | | API (/api/stats/timeseries) |
| WEBHOOK_SOURCE_NAME | | Webhook (required) |
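
To make the table concrete, a container spec might wire a few of these variables as shown below. This is only a sketch: the PVC name and the Secret name are hypothetical, and the literal values simply repeat the defaults listed above.

```yaml
# Fragment of a container spec (not a complete manifest).
env:
  - name: PORT
    value: "3001"                  # env values must be strings, hence the quotes
  - name: LOG_LEVEL
    value: info
  - name: WORKER_CONCURRENCY
    value: "5"
  - name: UPLOADS_PVC_NAME
    value: ai-agents-uploads       # hypothetical PVC name
  - name: JWT_SECRET
    valueFrom:
      secretKeyRef:
        name: ai-agents-api        # hypothetical Secret name
        key: JWT_SECRET
```

Keeping secrets in secretKeyRef entries rather than in a ConfigMap means they never appear in plain text in the rendered chart values.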

Ingress

The unified deployment uses two Ingresses, split by auth requirement:

Public (ai-agents-public)

No middleware. Used by external services and probes.

| Path | Backend |
| --- | --- |
| /api/jobs/submit | API (:3001) |
| /api/events/ingest | API |
| /api/webhooks/* | API or per-source webhook pod |
| /health, /ready, /metrics | API |

Protected (ai-agents-protected)

Authelia ForwardAuth via:

traefik.ingress.kubernetes.io/router.middlewares: authelia-authelia@kubernetescrd

| Path | Backend |
| --- | --- |
| /api/* | API (:3001) |
| /* | App (:3000) |

The cross-namespace middleware reference requires allowCrossNamespace: true on Traefik's Kubernetes CRD provider.
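
The two pieces involved can be sketched as follows: the Traefik Helm values that enable cross-namespace references, and a protected Ingress carrying the annotation. The hostname and Service names are hypothetical; only the Ingress name, annotation value, paths, and ports come from this guide.

```yaml
# Traefik Helm chart values (separate file from the Ingress below).
providers:
  kubernetesCRD:
    allowCrossNamespace: true
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ai-agents-protected
  namespace: ai-agents
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: authelia-authelia@kubernetescrd
spec:
  rules:
    - host: agents.example.com          # hypothetical hostname
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: ai-agents-api     # hypothetical Service name
                port:
                  number: 3001
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ai-agents-app     # hypothetical Service name
                port:
                  number: 3000
```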

Network Policies

Strict pod-to-pod and egress rules:

  • API pod ingress: Traefik, App pod, plus client namespaces (e.g. github-ai-agents, reader-ai-agents).
  • API pod egress: K8s API (CRDs/Jobs/Secrets), DNS, HTTPS (OIDC, GitHub, callbacks).
  • App pod ingress: Traefik only.
  • App pod egress: API pod, DNS.
  • Executor Jobs: egress to the K8s API (credential write-back), DNS, HTTPS to the AI provider, plus optional GitHub.
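
As an illustration, the App pod rules above (ingress from Traefik only, egress to the API pod and DNS) could be written roughly like this. The pod labels and the traefik and kube-system namespace names are assumptions; the shipped policies may select pods differently.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-agents-app                    # hypothetical name
  namespace: ai-agents
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: app   # hypothetical label
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: traefik   # assumes Traefik's namespace
      ports:
        - protocol: TCP
          port: 3000
  egress:
    - to:                                # API pod in the same namespace
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: api       # hypothetical label
      ports:
        - protocol: TCP
          port: 3001
    - to:                                # cluster DNS
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```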

Storage

The shared workspace PVC is ReadWriteMany (or ReadWriteOnce in single-replica setups). Layout:

/workspace/
├── .accounts/    # Account credential files (legacy path)
├── jobs/         # Per-job workspaces (ephemeral, cleanup CronJob below)
└── uploads/      # PDF and workspace uploads — mounted into Executor Jobs read-only

The PVC named by UPLOADS_PVC_NAME is mounted read-only into every executor Job at /uploads.
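
A shared workspace claim on Longhorn might look like the sketch below. The claim name, storage class name, and size are assumptions; switch accessModes to ReadWriteOnce for single-replica setups.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ai-agents-workspace      # hypothetical name
  namespace: ai-agents
spec:
  accessModes:
    - ReadWriteMany              # or ReadWriteOnce for a single replica
  storageClassName: longhorn     # assumes the default Longhorn StorageClass name
  resources:
    requests:
      storage: 20Gi              # hypothetical size
```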

An optional models-cache PVC (pvc-models-cache.yaml) can be mounted at /models on executor Jobs for local inference providers. When AiConfig.spec.providers[provider].modelUrl is set, the job executor adds an init container that downloads the GGUF model to the cache PVC (only if not already present). This avoids re-downloading large models for each job.
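
The download-if-missing behaviour described above can be sketched as a pod-spec fragment. This is not the init container the job executor actually generates: the image, environment variable names, model URL, and claim name are all hypothetical.

```yaml
# Fragment of an executor Job pod spec (illustrative only).
initContainers:
  - name: fetch-model
    image: curlimages/curl       # hypothetical image; pin a tag in practice
    command:
      - sh
      - -c
      - |
        # Skip the download when the model is already cached on the PVC.
        if [ ! -f "/models/${MODEL_FILE}" ]; then
          # Write to a temp name first so a failed download is not mistaken for a cached model.
          curl -fL -o "/models/${MODEL_FILE}.part" "${MODEL_URL}"
          mv "/models/${MODEL_FILE}.part" "/models/${MODEL_FILE}"
        fi
    env:
      - name: MODEL_URL
        value: https://example.com/models/model.gguf   # hypothetical URL
      - name: MODEL_FILE
        value: model.gguf
    volumeMounts:
      - name: models-cache
        mountPath: /models
volumes:
  - name: models-cache
    persistentVolumeClaim:
      claimName: models-cache    # hypothetical claim name
```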

CronJobs

Shipped from ai-agents-main:

| CronJob | Purpose |
| --- | --- |
| cronjob-cleanup | Removes stale workspace directories that survived crashes. |
| cronjob-token-refresh | Triggers an OAuth refresh on idle accounts to prevent token expiry. |

Health Probes

All three processes expose the same liveness/readiness contract:

  • Liveness: GET /health returns 200 {"status":"ok"}
  • Readiness: GET /ready returns 200 {"status":"ok"}

Probes have no external dependencies — they confirm the process is up. Operator readiness is implicit (the informer caches populate at boot).
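
In a container spec, this contract maps onto HTTP probes roughly as follows. The port assumes the default PORT of 3001, and the timing values are illustrative rather than taken from the charts.

```yaml
# Fragment of a container spec.
livenessProbe:
  httpGet:
    path: /health
    port: 3001
  initialDelaySeconds: 5     # illustrative timing
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 3001
  periodSeconds: 5           # illustrative timing
```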

Security Context

Standard hardened defaults:

securityContext:
  runAsNonRoot: true
  runAsUser: 1001
  runAsGroup: 1001
  fsGroup: 1001
  allowPrivilegeEscalation: false
  privileged: false
  capabilities: { drop: [ALL] }
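
Note that in a raw pod spec these fields sit at two levels: fsGroup is only valid in the pod-level securityContext, while allowPrivilegeEscalation, privileged, and capabilities are only valid per container. A sketch of the split, with a hypothetical container name and image:

```yaml
# Fragment of a pod template spec.
spec:
  securityContext:                 # pod level
    runAsNonRoot: true
    runAsUser: 1001
    runAsGroup: 1001
    fsGroup: 1001
  containers:
    - name: api                    # hypothetical container name
      image: example.invalid/ai-agents:tag   # placeholder image reference
      securityContext:             # container level
        allowPrivilegeEscalation: false
        privileged: false
        capabilities:
          drop: [ALL]
```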

FluxCD Integration

The flux repo (labrats.work.hetzner.cluster.flux) references the three HelmReleases plus the kustomization at k8s/overlays/production. Image automation watches the latest tag in GHCR and rolls deployments forward when image.yml publishes a new tag.
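
For orientation, a HelmRelease with an image-automation marker and a matching ImagePolicy might look like the sketch below. Resource names, the GitRepository source, the chart path, the values key, and the tag are hypothetical, and the API versions depend on the installed Flux release.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: ai-agents-main
  namespace: ai-agents
spec:
  interval: 10m
  chart:
    spec:
      chart: ./charts/ai-agents-main
      sourceRef:
        kind: GitRepository
        name: ai-agents            # hypothetical source name
        namespace: flux-system
  values:
    image:
      # Hypothetical values key and tag; the marker lets image automation rewrite it.
      tag: 1.2.3 # {"$imagepolicy": "flux-system:ai-agents-api:tag"}
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: ai-agents-api              # hypothetical policy name
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: ai-agents-api            # hypothetical ImageRepository for the GHCR image
  policy:
    semver:
      range: ">=0.0.0"             # selects the highest semver tag (no v prefix)
```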

CI/CD Workflows (.github/workflows/)

| Workflow | Trigger | Action |
| --- | --- | --- |
| image.yml | Push/PR on src/, app/, executor/, Dockerfiles; workflow_dispatch | Cut a semver release (no v prefix), then build & push 8 images via matrix: API, App, Executor-Claude, Executor-Codex, Executor-Bonsai-8B, Executor-Llama, Webhook, Processor. |
| build.yml | Push / PR to main | TypeScript lint, test, and build check. Node 20. |
| pr-checks.yml | PR | Branch-name (type/description) and PR-title (conventional commit) validation plus required labels. |
| pr-no-secrets.yml | PR | Secret scan for API keys, tokens, credentials in changed files. |
| docs-update.yml | Daily 03:15 UTC + manual | Submit a docs-update job to ai-agents that opens / updates docs/update-latest. |
| gemma-inference.yml | Manual dispatch | Submit a Gemma inference job. |
| qwen-inference.yml | Manual dispatch | Submit a Qwen inference job. |
| llama-cpp-builder.yml | Manual + weekly (Mon 02:00 UTC) | Build upstream llama.cpp binary for gemma/qwen executors. |

All workflows run on k8s-hetzner-arc self-hosted runners; images push to ghcr.io/labrats-work/.
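
A workflow matching the build.yml row (lint, test, build on Node 20, self-hosted runner) could be sketched as below. The npm script names are assumptions about package.json, and the real workflow may differ in its steps.

```yaml
name: build
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  build:
    runs-on: k8s-hetzner-arc       # self-hosted runner label from this guide
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run lint          # assumes a "lint" script exists
      - run: npm test
      - run: npm run build         # assumes a "build" script exists
```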