AI Agents

Architecture

Internals of the ai-agents execution engine. The engine uses Kubernetes CRDs as the sole state store — no Redis, no BullMQ, no database.

Three Deployable Processes

One TypeScript codebase produces three images, each with its own entrypoint:

| Process | Entrypoint | Dockerfile | Image | Role |
| --- | --- | --- | --- | --- |
| API server | src/index.ts | Dockerfile | apps.ai-agents | Express API + operator + WebSocket terminal |
| Processor | src/processor.ts | Dockerfile.processor | apps.ai-agents/processor | Operator-only reconcile loop with /health and /ready |
| Webhook | src/webhook.ts | Dockerfile.webhook | apps.ai-agents/webhook | Per-source webhook receiver that creates InferenceRequest CRs |

Webhook deployments are auto-provisioned by the API server when a WebhookSource CR is created (see src/k8s/webhook-sources.ts). Each WebhookSource gets its own Deployment, Service, and Ingress.

The frontend (app/) is a separate Next.js process that proxies /api/* to the API server.

GitHub event ──► Webhook pod ──► InferenceRequest CR
                                         │
                                         ▼
            ┌──────────────────────────────────────────┐
            │  API server / Processor (operator loop)  │
            │  watches IR CRs, creates k8s Jobs        │
            └──────────────────────────────────────────┘
                                         │
                                         ▼
                              ephemeral k8s Job
                              (executor container)

CRDs (group: labrats.work, version: v1alpha1)

| Kind | Purpose |
| --- | --- |
| InferenceRequest | One job record. Phases: Pending → Running → Succeeded / Failed / Cancelled. |
| AiModel | One per logical model. Holds providers[] (inference endpoints) and tracks activeJobs / activeJobIds[] against maxConcurrency for concurrency control. Auto-populated by model discovery (spec 0010). |
| AiAgent | Named agent role. Holds default model/effort/provider, assigned instructions, event triggers, and chain configuration. |
| AiConfig | Singleton named default. Holds global model/effort, per-provider overrides, executor image and resource overrides. |
| AiInstruction | Reusable prompt fragment. Scope global (applied to all) or local (assigned per-agent). |
| WebhookSource | Configures a webhook receiver. Auto-provisions Deployment + Service + Ingress per source. |

Prompts are not stored inline in IR specs (CRD size limit ≈1 MiB). Each IR holds a promptRef pointing to a per-job ConfigMap ({irName}-prompt).
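The indirection above can be sketched as a small helper that builds the per-job ConfigMap and the matching promptRef. This is a minimal sketch, not the project's code: the function name, the namespace parameter, and the manifest type are hypothetical; only the `{irName}-prompt` naming and the `prompt.txt` key come from this document.

```typescript
// Hypothetical helper: builds the prompt ConfigMap manifest and the promptRef
// that the InferenceRequest spec would carry instead of the prompt text itself.
interface PromptRef {
  configMapName: string;
  key: string;
}

interface ConfigMapManifest {
  apiVersion: "v1";
  kind: "ConfigMap";
  metadata: { name: string; namespace: string };
  data: Record<string, string>;
}

// Key used for the prompt text inside the ConfigMap.
const PROMPT_KEY = "prompt.txt";

function buildPromptConfigMap(
  irName: string,
  namespace: string, // assumption: the ConfigMap lives alongside the IR
  prompt: string,
): { manifest: ConfigMapManifest; promptRef: PromptRef } {
  const name = `${irName}-prompt`;
  return {
    manifest: {
      apiVersion: "v1",
      kind: "ConfigMap",
      metadata: { name, namespace },
      data: { [PROMPT_KEY]: prompt },
    },
    promptRef: { configMapName: name, key: PROMPT_KEY },
  };
}
```

The IR spec then stays small regardless of prompt length, since it only carries the two strings in promptRef.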

InferenceRequest (full shape, src/k8s/inference-request-types.ts)

spec: {
  source: string                    // "github", "manual", "dashboard", "github-actions", ...
  type: string                      // "implement-issue", "code-review", "document-extraction", ...
  agentRole?: string                // matches AiAgent.metadata.name or trigger
  priority?: number                 // default 2; lower = higher priority
  promptRef: { configMapName, key } // ConfigMap holding the prompt
  workspaceSetup?: {
    gitRepo, gitRef, gitDepth, githubToken
  }
  workspaceFiles?:   [{ name, data }]      // base64-encoded
  workspaceUploads?: [{ id, name }]        // RWX uploads PVC references
  config?: { model, effort, fast, provider, timeoutMs, maxTurns, executorImage, reactEnabled }
  callbackUrl?: string
  pipeline?: { type: "pdf-sections" }
  chain?: {
    chainId, chainDepth, parentRequestName, chainPath[]
    nextAgentTrigger, nextAgentCondition, nextFailurePolicy
  }
  retryPolicy?: { maxAttempts, backoffMs }
  metadata?: Record<string, unknown>
  errors?: [{ context, message, stack, timestamp }]
}

status: {
  phase: "Pending"|"Running"|"Succeeded"|"Failed"|"Cancelled"
  attempts, accountName, k8sJobName, executorPodName
  startedAt, completedAt
  result: { exitCode, timedOut, budgetExceeded, budgetReason, usage{...}, provider, model, effort, fast }
  failedReason, logs[], conditions[]
}

Operator (src/operator/)

Controller (controller.ts)

  • Informer-driven plus a 5-second poll fallback (DEFAULT_POLL_INTERVAL_MS = 5_000).
  • Default concurrency is 5; override with WORKER_CONCURRENCY.
  • Tracks in-flight reconciles in an internal Set of IR names (processing).
  • Pending IRs are sorted by priority ascending, then by creationTimestamp.
  • Running IRs are reconciled before Pending IRs so status polling stays fresh.
  • New events (informer add/update) are debounced via setImmediate into the next reconcile pass.
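The ordering rules in this list can be illustrated with a pure function that picks which IRs a reconcile pass would work on. It is a sketch under assumptions: `planReconcilePass` and `IrSummary` are hypothetical names, and the idea that the concurrency budget covers Running and Pending IRs alike is a guess, not something this document states.

```typescript
type Phase = "Pending" | "Running" | "Succeeded" | "Failed" | "Cancelled";

// Hypothetical flattened view of an InferenceRequest for scheduling purposes.
interface IrSummary {
  name: string;
  phase: Phase;
  priority?: number; // default 2; lower = higher priority
  creationTimestamp: string; // RFC 3339 timestamp
}

const DEFAULT_PRIORITY = 2;

function planReconcilePass(
  irs: IrSummary[],
  processing: ReadonlySet<string>, // IR names with an in-flight reconcile
  concurrency = 5,
): string[] {
  // Assumption: in-flight reconciles count against the concurrency budget.
  const free = Math.max(0, concurrency - processing.size);
  const eligible = irs.filter((ir) => !processing.has(ir.name));

  // Running IRs go first so status polling stays fresh.
  const running = eligible.filter((ir) => ir.phase === "Running");

  // Pending IRs: priority ascending, then oldest first.
  const pending = eligible
    .filter((ir) => ir.phase === "Pending")
    .sort((a, b) => {
      const byPriority =
        (a.priority ?? DEFAULT_PRIORITY) - (b.priority ?? DEFAULT_PRIORITY);
      if (byPriority !== 0) return byPriority;
      return Date.parse(a.creationTimestamp) - Date.parse(b.creationTimestamp);
    });

  return [...running, ...pending].slice(0, free).map((ir) => ir.name);
}
```

Terminal-phase IRs are never selected, and an IR already present in the processing set is skipped until its reconcile finishes.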

Reconciler (reconciler.ts)

| Phase in | Action | Phase out |
| --- | --- | --- |
| Pending | acquire an AiModel slot → write prompt ConfigMap → create k8s Job → record accountName/k8sJobName | Running |
| Running | poll Job status → stream pod logs → on terminal status, parse usage and run callbacks | Succeeded / Failed |
| Failed (cancel/retry) | Retry endpoint resets to Pending; cancel sets Cancelled and deletes Job | Pending / Cancelled |

On terminal status the reconciler also calls maybeChainNextAgent() to optionally enqueue a child IR.

Prompt Builder (prompt-builder.ts)

Resolution at job start:

config.model    : IR.spec.config.model
                  → AiAgent.spec.defaultModel
                  → AiConfig.spec.providers[provider].model
                  → AiConfig.spec.model
                  → CLI default (no flag passed)

config.effort   : IR.spec.config.effort
                  → AiAgent.spec.defaultEffort
                  → AiConfig.spec.providers[provider].effort
                  → AiConfig.spec.effort
                  → "medium"

config.provider : IR.spec.config.provider
                  → AiAgent.spec.provider
                  → "claude"
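The fallback chains above amount to per-field nullish coalescing. The sketch below is illustrative only: the interface names and `resolveConfig` are hypothetical, and the field shapes are reduced to what the chains mention.

```typescript
interface ModelEffort {
  model?: string;
  effort?: string;
}

// Hypothetical reduced views of IR.spec.config, AiAgent.spec and AiConfig.spec.
interface IrConfig extends ModelEffort {
  provider?: string;
}

interface AgentSpec {
  defaultModel?: string;
  defaultEffort?: string;
  provider?: string;
}

interface AiConfigSpec extends ModelEffort {
  providers?: Record<string, ModelEffort>;
}

interface ResolvedConfig {
  provider: string;
  effort: string;
  model: string | undefined; // undefined: pass no model flag, CLI default applies
}

function resolveConfig(
  ir: IrConfig = {},
  agent: AgentSpec = {},
  cfg: AiConfigSpec = {},
): ResolvedConfig {
  // Provider is resolved first because the per-provider overrides depend on it.
  const provider = ir.provider ?? agent.provider ?? "claude";
  const perProvider = cfg.providers?.[provider];
  return {
    provider,
    model: ir.model ?? agent.defaultModel ?? perProvider?.model ?? cfg.model,
    effort:
      ir.effort ??
      agent.defaultEffort ??
      perProvider?.effort ??
      cfg.effort ??
      "medium",
  };
}
```

Each field falls through independently, so an IR can pin only the effort while still inheriting the agent's model.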

Instruction injection (instructionWatcher.resolveForAgent()):

  1. Collect all global AiInstructions sorted by priority ascending.
  2. Append local instructions listed in the agent's spec.instructions[].
  3. Prepend each as an <instruction name="…" scope="…">…</instruction> XML block to the prompt.

Pipeline jobs (currently only pdf-sections) skip instruction injection.
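A rough sketch of the injection steps follows. The `Instruction` shape, the function name, the blank-line separator between blocks, and the absence of XML escaping are all assumptions made for illustration; the real resolveForAgent() may differ.

```typescript
// Hypothetical reduced view of an AiInstruction.
interface Instruction {
  name: string;
  scope: "global" | "local";
  priority: number;
  content: string;
}

function injectInstructions(
  prompt: string,
  all: Instruction[],
  agentInstructionNames: string[], // the agent's spec.instructions[]
  isPipeline = false,
): string {
  // Pipeline jobs skip instruction injection entirely.
  if (isPipeline) return prompt;

  // Step 1: global instructions, priority ascending.
  const globals = all
    .filter((i) => i.scope === "global")
    .sort((a, b) => a.priority - b.priority);

  // Step 2: local instructions, in the order the agent lists them.
  const localByName = new Map(
    all.filter((i) => i.scope === "local").map((i) => [i.name, i] as const),
  );
  const locals = agentInstructionNames
    .map((name) => localByName.get(name))
    .filter((i): i is Instruction => i !== undefined);

  // Step 3: prepend each as an XML block (no escaping in this sketch).
  const blocks = [...globals, ...locals].map(
    (i) => `<instruction name="${i.name}" scope="${i.scope}">${i.content}</instruction>`,
  );
  return [...blocks, prompt].join("\n\n");
}
```

Local instructions that the agent does not list are ignored, and unknown names in the agent's list are silently dropped in this sketch.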

Chain (chain.ts)

After job completion, maybeChainNextAgent():

  1. Returns immediately if spec.chain.nextAgentTrigger is unset.
  2. Loop guard: refuses if nextAgentTrigger already appears in chainPath.
  3. Depth guard: MAX_CHAIN_DEPTH = 10.
  4. Condition: evaluates nextAgentCondition (always | on-success | on-failure) against exitCode === 0.
  5. Failure policy: stop | skip | notify (only blocks chaining when the policy says so).
  6. Captures up to 4,000 chars of parent stdout, wraps it in <chain-context>…</chain-context>, and prepends it to the parent prompt to form the child prompt.
  7. Creates a child IR with chainDepth + 1, extended chainPath, and the next agent's own next config (resolved from its eventTriggers).
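The guards and the context capture can be sketched as two pure functions. Several details here are assumptions rather than documented behavior: the depth comparison uses `>=`, an unset condition is treated as `always`, the last 4,000 characters of stdout are kept (the document does not say which end is trimmed), and the failure policy step is left out because its exact semantics are not described here. `decideChain` and `buildChildPrompt` are hypothetical names.

```typescript
type Condition = "always" | "on-success" | "on-failure";

// Hypothetical reduced view of spec.chain.
interface ChainSpec {
  chainDepth: number;
  chainPath: string[];
  nextAgentTrigger?: string;
  nextAgentCondition?: Condition;
}

const MAX_CHAIN_DEPTH = 10;
const MAX_CONTEXT_CHARS = 4_000;

type ChainDecision =
  | { chain: true; trigger: string }
  | { chain: false; reason: string };

function decideChain(chain: ChainSpec | undefined, exitCode: number): ChainDecision {
  const trigger = chain?.nextAgentTrigger;
  if (!chain || !trigger) return { chain: false, reason: "no next agent" };

  // Loop guard: never revisit a trigger already on the path.
  if (chain.chainPath.includes(trigger)) return { chain: false, reason: "loop" };

  // Depth guard (assumption: a chain at the limit may not grow further).
  if (chain.chainDepth >= MAX_CHAIN_DEPTH) return { chain: false, reason: "max depth" };

  // Condition check against exitCode === 0.
  const succeeded = exitCode === 0;
  const condition = chain.nextAgentCondition ?? "always"; // assumed default
  if (condition === "on-success" && !succeeded) return { chain: false, reason: "condition" };
  if (condition === "on-failure" && succeeded) return { chain: false, reason: "condition" };

  return { chain: true, trigger };
}

function buildChildPrompt(parentPrompt: string, parentStdout: string): string {
  // Assumption: keep the tail of stdout, where final results usually appear.
  const context = parentStdout.slice(-MAX_CONTEXT_CHARS);
  return `<chain-context>\n${context}\n</chain-context>\n\n${parentPrompt}`;
}
```

Returning a reason string alongside the negative decision makes it easy to log why a chain stopped.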

See Agent Chains for configuration details.

Job Executor (src/k8s/job-executor.ts)

Each IR transition to Running creates one ephemeral k8s batch/v1 Job.

| Resource | Created per IR | Purpose |
| --- | --- | --- |
| ConfigMap {jobName}-prompt | yes | Holds prompt text at key prompt.txt, mounted at /prompt. |
| Secret {jobName}-git | conditional | Holds repo-url (clone URL with token), mounted at /git-config. |
| Job {jobName} | yes | Runs the executor container. |

Mounts on the executor pod:

| Path | Source |
| --- | --- |
| /credentials | The AiModel's optional credentialsRef Secret (key CREDENTIALS_SECRET_KEY, default token). Mounted only when set; used by the claude executor, while qwen-code/codex authenticate from the caller's apiKeyRef. |
| /prompt | Prompt ConfigMap, key prompt.txt. |
| /git-config | Optional git clone URL Secret. |
| /uploads | Read-only mount of the shared RWX uploads PVC, when workspaceUploads is set. |
| /models | Optional models-cache PVC mount for local inference model downloads (when modelUrl is configured). |

Defaults:

  • Image: EXECUTOR_IMAGE env (e.g. ghcr.io/labrats-work/apps.ai-agents/executor:<tag>), overridable per-job via AiConfig.spec.executorImage or per-model via AiConfig.spec.providers[provider].modelImages[model].
  • Resources: requests cpu: 500m, memory: 1Gi; limits cpu: 2, memory: 4Gi (Claude/Codex). Local inference providers (vllm/gemma/qwen) use cpu: 2, memory: 2Gi requests; cpu: 4, memory: 4Gi limits. Override with AiConfig.spec.executorResources.
  • activeDeadlineSeconds = ceil(timeoutMs / 1000) + 60 (default timeoutMs = 3 300 000 ms / 55 min).
  • Labels: app.kubernetes.io/managed-by=ai-agents, app.kubernetes.io/component=executor, ai-agents.labrats.work/job-id=<truncated jobId>.
  • ServiceAccount: from EXECUTOR_SERVICE_ACCOUNT env. Must allow PATCHing the credentials Secret (used by the executor entrypoint to persist OAuth refresh on exit).
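As a worked example of the deadline formula in the list above, the following sketch computes activeDeadlineSeconds from a timeout in milliseconds. The constant and function names are hypothetical; the formula and the default timeout are the ones stated above.

```typescript
const DEFAULT_TIMEOUT_MS = 3_300_000; // 55 min
const DEADLINE_GRACE_SECONDS = 60; // headroom on top of the CLI timeout

function activeDeadlineSeconds(timeoutMs: number = DEFAULT_TIMEOUT_MS): number {
  // Round up so a fractional second never shortens the allowed runtime.
  return Math.ceil(timeoutMs / 1000) + DEADLINE_GRACE_SECONDS;
}
```

With the default timeout this gives 3300 + 60 = 3360 seconds, so Kubernetes terminates the Job about a minute after the in-container timeout would have fired.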

The Job is cleaned up by the executor on completion via JSON patch back to the IR status.

Executor Container (executor/)

One image per provider. Claude/Codex use node:20-alpine with pre-installed CLIs. Gemma/Qwen use python:3.11-slim with compiled llama-cli and quantized GGUF models. All run as user executor (uid 1001).

Entrypoint flow (executor/entrypoint.sh):

  1. Raise file descriptor limit (claude-code spawns many fsnotify watchers).
  2. Credentials:
    • PROVIDER=claude → copy /credentials/token into $HOME/.claude/.credentials.json (mode 0600). On EXIT, PATCH the Secret to persist refreshed tokens.
    • PROVIDER=codex → read /credentials/token into CODEX_API_KEY.
    • PROVIDER=gemma|qwen → skip (no credentials needed for local inference).
  3. Git clone: read repo URL from /git-config/repo-url (or GIT_REPO_URL env); shallow clone with depth from GIT_DEPTH (default 1).
  4. Workspace uploads: if WORKSPACE_UPLOADS env is set (JSON [{id,name}]), copy /uploads/{id}/{name} into the workspace.
  5. Prompt: read from /prompt/prompt.txt (or PROMPT_TEXT env). For local inference providers, staged uploads are prepended to the prompt so the model sees the full context.
  6. CLI invocation (wrapped in timeout ${TIMEOUT_SEC}s):
    • claude: claude --print --verbose --dangerously-skip-permissions --output-format stream-json --no-session-persistence [--model M] [--effort E] [--fast] [--max-turns N]
    • codex: codex exec --json --full-auto --skip-git-repo-check [--model M] [--config model_reasoning_effort=E]
    • gemma|qwen: llama-cli -m <model.gguf> -p <prompt> -st -n TOKENS -t THREADS -c CTX --temp T --top-p P --top-k K --no-display-prompt -ngl 0. When REACT_ENABLED=true, runs react_loop.py instead (iterative tool-calling via ReAct pattern with workspace auto-scan).
    • mock: log metadata, exit 0 (used in tests).
  7. Exit code is propagated; 124 means timeout.
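The entrypoint itself is a shell script; the TypeScript below only mirrors how the optional flags in step 6 are assembled for the claude and codex providers, and how exit code 124 is read in step 7. `buildCliArgs`, `CliOptions`, and `describeExit` are hypothetical names, and the flag strings are copied from the list above rather than checked against the CLIs.

```typescript
interface CliOptions {
  model?: string;
  effort?: string;
  fast?: boolean;
  maxTurns?: number;
}

function buildCliArgs(provider: "claude" | "codex", opts: CliOptions = {}): string[] {
  if (provider === "claude") {
    const args = [
      "claude", "--print", "--verbose", "--dangerously-skip-permissions",
      "--output-format", "stream-json", "--no-session-persistence",
    ];
    // Optional flags are appended only when a value was resolved.
    if (opts.model) args.push("--model", opts.model);
    if (opts.effort) args.push("--effort", opts.effort);
    if (opts.fast) args.push("--fast");
    if (opts.maxTurns !== undefined) args.push("--max-turns", String(opts.maxTurns));
    return args;
  }
  const args = ["codex", "exec", "--json", "--full-auto", "--skip-git-repo-check"];
  if (opts.model) args.push("--model", opts.model);
  if (opts.effort) args.push("--config", `model_reasoning_effort=${opts.effort}`);
  return args;
}

// 124 is the exit status the timeout wrapper reports when the limit is hit.
function describeExit(exitCode: number): "ok" | "timeout" | "error" {
  if (exitCode === 0) return "ok";
  return exitCode === 124 ? "timeout" : "error";
}
```

When no model is resolved, no model flag is passed and the CLI's own default applies, which matches the last step of the model fallback chain.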

Model Slots & Credentials (src/k8s/models.ts)

There is no AiAccount pool — it was removed in spec 0003; credential and concurrency tracking moved onto AiModel. A watcher-driven informer keeps a local cache of AiModel CRs.

  • Status states (AiModelCR.status.state): idle, busy, unavailable, error. spec.paused: true keeps a model out of rotation.
  • Concurrency: spec.maxConcurrency caps simultaneous jobs per model (0 = unlimited); status.activeJobs / status.activeJobIds[] track in-flight IRs.
  • Acquisition (acquire()) uses an optimistic JSON Patch with a test op on status.activeJobs to avoid races; on conflict the next candidate is tried.
  • Selection filters out paused / error / unavailable models and those at max concurrency, optionally filters by model name, then sorts by load ratio ascending, then least-recently-used. A provider URL is chosen from spec.providers[] (skipping unavailable ones).
  • Crash recovery: on startup, models whose activeJobIds reference IRs no longer Running are reconciled back to idle.
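Selection and optimistic acquisition can be sketched as follows. This is an illustration under assumptions: `ModelView`, `lastUsedAt`, and the function names are hypothetical; treating an unlimited model (maxConcurrency = 0) as load ratio 0 is a guess; and the patch assumes status.activeJobIds already exists as an array.

```typescript
// Hypothetical flattened view of an AiModel CR.
interface ModelView {
  name: string;
  paused?: boolean;
  state: "idle" | "busy" | "unavailable" | "error";
  maxConcurrency: number; // 0 = unlimited
  activeJobs: number;
  lastUsedAt?: string; // hypothetical field backing the LRU tiebreak
}

interface JsonPatchOp {
  op: "test" | "replace" | "add";
  path: string;
  value: unknown;
}

function loadRatio(m: ModelView): number {
  // Assumption: unlimited models always count as unloaded.
  return m.maxConcurrency === 0 ? 0 : m.activeJobs / m.maxConcurrency;
}

function lastUsedMs(m: ModelView): number {
  return m.lastUsedAt ? Date.parse(m.lastUsedAt) : 0; // never used sorts first
}

function rankCandidates(models: ModelView[], wantedName?: string): ModelView[] {
  return models
    .filter((m) => !m.paused && m.state !== "error" && m.state !== "unavailable")
    .filter((m) => m.maxConcurrency === 0 || m.activeJobs < m.maxConcurrency)
    .filter((m) => wantedName === undefined || m.name === wantedName)
    .sort((a, b) => loadRatio(a) - loadRatio(b) || lastUsedMs(a) - lastUsedMs(b));
}

// JSON Patch (RFC 6902): the "test" op makes the whole patch fail if another
// writer changed activeJobs since this view was read.
function acquirePatch(m: ModelView, irName: string): JsonPatchOp[] {
  return [
    { op: "test", path: "/status/activeJobs", value: m.activeJobs },
    { op: "replace", path: "/status/activeJobs", value: m.activeJobs + 1 },
    { op: "add", path: "/status/activeJobIds/-", value: irName },
  ];
}
```

A caller would walk the ranked list, submit the patch for each candidate in turn, and move on to the next one whenever the API server rejects the patch because the test op no longer holds.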

Credentials

  • Caller identity: every IR carries spec.apiKeyRef (required) → executor AI_AGENTS_API_KEY for the MCP callback. Sourced from the request's Bearer token (submit/events/ingest), WebhookSource.spec.apiKeyRef (webhook events), AiAgent.apiKeyRef (triggers), or inherited (chains). No long-lived executor service credential. (spec 0007)
  • Backend token: optional AiModel.spec.credentialsRef → /credentials/token, consumed only by the claude executor. qwen-code/codex use the caller's key, so discovered models need none.
  • GitHub: short-lived App installation tokens minted per webhook event; webhooks HMAC-verified with the App webhookSecret. No PAT.

Webhook Ingest

GitHub webhooks land at POST /api/webhooks/github on the webhook pod (or the API pod for the unified deployment).

src/sources/github/index.ts flow:

  1. Look up the WebhookSource CR by spec.type === "github".
  2. Read webhookSecret, appId, privateKey, installationId from the credentials Secret referenced by spec.credentialsRef.
  3. Verify HMAC-SHA-256 (x-hub-signature-256 header vs raw body).
  4. Normalize x-github-event + payload.action into a canonical event name.
  5. Public-repo filter: if payload.repository.private === false, deny unless allowPublicRepos: true or repository.full_name is in allowedPublicRepos. Returns 200 with matched: 0 on deny (not 4xx).
  6. Match agents via agentWatcher.findByEvent(source, event) plus mention parsing (@ai-{role}, @agents-assemble).
  7. Create one InferenceRequest per matched agent via irClient.create(...).
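Steps 3 and 5 of this flow can be sketched with Node's built-in crypto module. The function and interface names are hypothetical and the code is not taken from src/sources/github/index.ts; it only shows one common way to compare an x-hub-signature-256 header against the raw body and to apply the public-repo rule described above.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Step 3: the header has the form "sha256=<hex digest of the raw body>".
function verifyGithubSignature(
  rawBody: Buffer | string,
  signatureHeader: string | undefined,
  webhookSecret: string,
): boolean {
  if (!signatureHeader || !signatureHeader.startsWith("sha256=")) return false;
  const expected =
    "sha256=" + createHmac("sha256", webhookSecret).update(rawBody).digest("hex");
  const received = Buffer.from(signatureHeader);
  const computed = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return received.length === computed.length && timingSafeEqual(received, computed);
}

// Hypothetical reduced views of the source's policy fields and the payload.
interface RepoPolicy {
  allowPublicRepos?: boolean;
  allowedPublicRepos?: string[];
}

interface RepoInfo {
  private: boolean;
  full_name: string;
}

// Step 5: only repos explicitly marked public are subject to the filter.
function isRepoAllowed(repo: RepoInfo, policy: RepoPolicy): boolean {
  if (repo.private !== false) return true;
  return (
    policy.allowPublicRepos === true ||
    (policy.allowedPublicRepos ?? []).includes(repo.full_name)
  );
}
```

The signature must be computed over the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace and break the comparison.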

Workspace and Uploads

The shared RWX uploads PVC is mounted read-write by the API pod and read-only by executor Jobs. Files referenced via workspaceUploads are copied from /uploads/{id}/{name} into the workspace before the CLI runs. The PVC name comes from UPLOADS_PVC_NAME.

Per-job ConfigMaps and git Secrets are deleted after Job termination via owner references and TTL.

Metrics and Logs

  • GET /metrics exposes Prometheus metrics (registry from src/metrics.ts). The dashboard timeseries page reads from PROMETHEUS_URL.
  • GET /api/jobs/:id/logs is a Server-Sent Events stream that:
    1. Streams executor pod logs live (follow: true) when an executorPodName is recorded.
    2. Falls back to tailing the persisted log file in the uploads PVC (used by pipeline jobs).
    3. Closes with {done:true,status:…} once the IR reaches a terminal phase.

Config Defaults Resolution

Job submission spec.config
        │
        ▼ (per-field fallback)
AiAgent.spec.defaultModel / defaultEffort / provider
        │
        ▼
AiConfig.spec.providers[provider].model / effort
        │
        ▼
AiConfig.spec.model / effort
        │
        ▼
Hard defaults: provider="claude", effort="medium"

AiConfig.spec.executorImage and executorResources apply uniformly to all jobs unless an IR explicitly overrides them.