Architecture
Internals of the ai-agents execution engine. The engine uses Kubernetes CRDs as the sole state store — no Redis, no BullMQ, no database.
Three Deployable Processes
One TypeScript codebase produces three images, each with its own entrypoint:
| Process | Entrypoint | Dockerfile | Image | Role |
|---|---|---|---|---|
| API server | src/index.ts | Dockerfile | apps.ai-agents | Express API + operator + WebSocket terminal |
| Processor | src/processor.ts | Dockerfile.processor | apps.ai-agents/processor | Operator-only reconcile loop with /health and /ready |
| Webhook | src/webhook.ts | Dockerfile.webhook | apps.ai-agents/webhook | Per-source webhook receiver that creates InferenceRequest CRs |
Webhook deployments are auto-provisioned by the API server when a WebhookSource CR is created (see src/k8s/webhook-sources.ts). Each WebhookSource gets its own Deployment, Service, and Ingress.
The frontend (app/) is a separate Next.js process that proxies /api/* to the API server.
GitHub event ──► Webhook pod ──► InferenceRequest CR
│
▼
┌──────────────────────────────────────────┐
│ API server / Processor (operator loop) │
│ watches IR CRs, creates k8s Jobs │
└──────────────────────────────────────────┘
│
▼
ephemeral k8s Job
(executor container)
CRDs (group: labrats.work, version: v1alpha1)
| Kind | Purpose |
|---|---|
InferenceRequest | One job record. Phases: Pending → Running → Succeeded / Failed / Cancelled. |
AiModel | One per logical model. Holds providers[] (inference endpoints) and tracks activeJobs / activeJobIds[] against maxConcurrency for concurrency control. Auto-populated by model discovery (spec 0010). |
AiAgent | Named agent role. Holds default model/effort/provider, assigned instructions, event triggers, and chain configuration. |
AiConfig | Singleton named default. Holds global model/effort, per-provider overrides, executor image and resource overrides. |
AiInstruction | Reusable prompt fragment. Scope global (applied to all) or local (assigned per-agent). |
WebhookSource | Configures a webhook receiver. Auto-provisions Deployment + Service + Ingress per source. |
Prompts are not stored inline in IR specs (CRD size limit ≈1 MiB). Each IR holds a promptRef pointing to a per-job ConfigMap ({irName}-prompt).
InferenceRequest (full shape, src/k8s/inference-request-types.ts)
spec: {
source: string // "github", "manual", "dashboard", "github-actions", ...
type: string // "implement-issue", "code-review", "document-extraction", ...
agentRole?: string // matches AiAgent.metadata.name or trigger
priority?: number // default 2; lower = higher priority
promptRef: { configMapName, key } // ConfigMap holding the prompt
workspaceSetup?: {
gitRepo, gitRef, gitDepth, githubToken
}
workspaceFiles?: [{ name, data }] // base64-encoded
workspaceUploads?: [{ id, name }] // RWX uploads PVC references
config?: { model, effort, fast, provider, timeoutMs, maxTurns, executorImage, reactEnabled }
callbackUrl?: string
pipeline?: { type: "pdf-sections" }
chain?: {
chainId, chainDepth, parentRequestName, chainPath[]
nextAgentTrigger, nextAgentCondition, nextFailurePolicy
}
retryPolicy?: { maxAttempts, backoffMs }
metadata?: Record<string, unknown>
errors?: [{ context, message, stack, timestamp }]
}
status: {
phase: "Pending"|"Running"|"Succeeded"|"Failed"|"Cancelled"
attempts, accountName, k8sJobName, executorPodName
startedAt, completedAt
result: { exitCode, timedOut, budgetExceeded, budgetReason, usage{...}, provider, model, effort, fast }
failedReason, logs[], conditions[]
}
Operator (src/operator/)
Controller (controller.ts)
- Informer-driven plus a 5-second poll fallback (
DEFAULT_POLL_INTERVAL_MS = 5_000). - Default concurrency is 5; override with
WORKER_CONCURRENCY. - Tracks in-flight reconciles in an internal
Setof IR names (processing). - Pending IRs are sorted by
priorityascending, then bycreationTimestamp. - Running IRs are reconciled before Pending IRs so status polling stays fresh.
- New events (informer
add/update) are debounced viasetImmediateinto the next reconcile pass.
Reconciler (reconciler.ts)
| Phase in | Action | Phase out |
|---|---|---|
Pending | acquire AiAccount → write prompt ConfigMap → create k8s Job → record accountName/k8sJobName | Running |
Running | poll Job status → stream pod logs → on terminal status, parse usage and run callbacks | Succeeded / Failed |
Failed (cancel/retry) | Retry endpoint resets to Pending; cancel sets Cancelled and deletes Job | — |
On terminal status the reconciler also calls maybeChainNextAgent() to optionally enqueue a child IR.
Prompt Builder (prompt-builder.ts)
Resolution at job start:
config.model : IR.spec.config.model
→ AiAgent.spec.defaultModel
→ AiConfig.spec.providers[provider].model
→ AiConfig.spec.model
→ CLI default (no flag passed)
config.effort : IR.spec.config.effort
→ AiAgent.spec.defaultEffort
→ AiConfig.spec.providers[provider].effort
→ AiConfig.spec.effort
→ "medium"
config.provider : IR.spec.config.provider
→ AiAgent.spec.provider
→ "claude"
Instruction injection (instructionWatcher.resolveForAgent()):
- Collect all
globalAiInstructions sorted bypriorityascending. - Append
localinstructions listed in the agent'sspec.instructions[]. - Prepend each as an
<instruction name="…" scope="…">…</instruction>XML block to the prompt.
Pipeline jobs (currently only pdf-sections) skip instruction injection.
Chain (chain.ts)
After job completion, maybeChainNextAgent():
- Returns immediately if
spec.chain.nextAgentTriggeris unset. - Loop guard — refuses if
nextAgentTriggeralready appears inchainPath. - Depth guard —
MAX_CHAIN_DEPTH = 10. - Condition — evaluates
nextAgentCondition(always|on-success|on-failure) againstexitCode === 0. - Failure policy —
stop|skip|notify(only blocks chaining when policy says so). - Captures up to 4,000 chars of parent stdout, wraps it in
<chain-context>…</chain-context>, and prepends it to the parent prompt to form the child prompt. - Creates a child IR with
chainDepth + 1, extendedchainPath, and the next agent's ownnextconfig (resolved from itseventTriggers).
See Agent Chains for configuration details.
Job Executor (src/k8s/job-executor.ts)
Each IR transition to Running creates one ephemeral k8s batch/v1 Job.
| Resource | Created per IR | Purpose |
|---|---|---|
ConfigMap {jobName}-prompt | yes | Holds prompt text at key prompt.txt, mounted at /prompt. |
Secret {jobName}-git | conditional | Holds repo-url (clone URL with token), mounted at /git-config. |
Job {jobName} | yes | Runs the executor container. |
Mounts on the executor pod:
| Path | Source |
|---|---|
/credentials | The AiModel's optional credentialsRef Secret (key CREDENTIALS_SECRET_KEY, default token). Mounted only when set — used by the claude executor; qwen-code/codex authenticate from the caller's apiKeyRef. |
/prompt | Prompt ConfigMap, key prompt.txt. |
/git-config | Optional git clone URL Secret. |
/uploads | Read-only mount of the shared RWX uploads PVC, when workspaceUploads is set. |
/models | Optional models-cache PVC mount for local inference model downloads (when modelUrl is configured). |
Defaults:
- Image:
EXECUTOR_IMAGEenv (e.g.ghcr.io/labrats-work/apps.ai-agents/executor:<tag>), overridable per-job viaAiConfig.spec.executorImageor per-model viaAiConfig.spec.providers[provider].modelImages[model]. - Resources: requests
cpu: 500m, memory: 1Gi; limitscpu: 2, memory: 4Gi(Claude/Codex). Local inference providers (vllm/gemma/qwen) usecpu: 2, memory: 2Girequests;cpu: 4, memory: 4Gilimits. Override withAiConfig.spec.executorResources. activeDeadlineSeconds = ceil(timeoutMs / 1000) + 60(defaulttimeoutMs= 3 300 000 ms / 55 min).- Labels:
app.kubernetes.io/managed-by=ai-agents,app.kubernetes.io/component=executor,ai-agents.labrats.work/job-id=<truncated jobId>. - ServiceAccount: from
EXECUTOR_SERVICE_ACCOUNTenv. Must allow PATCHing the credentials Secret (used by the executor entrypoint to persist OAuth refresh on exit).
The Job is cleaned up by the executor on completion via JSON patch back to the IR status.
Executor Container (executor/)
One image per provider. Claude/Codex use node:20-alpine with pre-installed CLIs. Gemma/Qwen use python:3.11-slim with compiled llama-cli and quantized GGUF models. All run as user executor (uid 1001).
Entrypoint flow (executor/entrypoint.sh):
- Raise file descriptor limit (claude-code spawns many fsnotify watchers).
- Credentials:
PROVIDER=claude→ copy/credentials/tokeninto$HOME/.claude/.credentials.json(mode 0600). On EXIT, PATCH the Secret to persist refreshed tokens.PROVIDER=codex→ read/credentials/tokenintoCODEX_API_KEY.PROVIDER=gemma|qwen→ skip (no credentials needed for local inference).
- Git clone: read repo URL from
/git-config/repo-url(orGIT_REPO_URLenv); shallow clone with depth fromGIT_DEPTH(default 1). - Workspace uploads: if
WORKSPACE_UPLOADSenv is set (JSON[{id,name}]), copy/uploads/{id}/{name}into the workspace. - Prompt: read from
/prompt/prompt.txt(orPROMPT_TEXTenv). For local inference providers, staged uploads are prepended to the prompt so the model sees the full context. - CLI invocation (wrapped in
timeout ${TIMEOUT_SEC}s):claude—claude --print --verbose --dangerously-skip-permissions --output-format stream-json --no-session-persistence [--model M] [--effort E] [--fast] [--max-turns N]codex—codex exec --json --full-auto --skip-git-repo-check [--model M] [--config model_reasoning_effort=E]gemma|qwen—llama-cli -m <model.gguf> -p <prompt> -st -n TOKENS -t THREADS -c CTX --temp T --top-p P --top-k K --no-display-prompt -ngl 0. WhenREACT_ENABLED=true, runsreact_loop.pyinstead (iterative tool-calling via ReAct pattern with workspace auto-scan).mock— log metadata, exit 0 (used in tests).
- Exit code is propagated;
124means timeout.
Model Slots & Credentials (src/k8s/models.ts)
There is no AiAccount pool — it was removed in spec 0003; credential and concurrency tracking moved onto AiModel. A watcher-driven informer keeps a local cache of AiModel CRs.
- Status states (
AiModelCR.status.state):idle,busy,unavailable,error.spec.paused: truekeeps a model out of rotation. - Concurrency:
spec.maxConcurrencycaps simultaneous jobs per model (0 = unlimited);status.activeJobs/status.activeJobIds[]track in-flight IRs. - Acquisition (
acquire()) uses an optimistic JSON Patch with atestop onstatus.activeJobsto avoid races; on conflict the next candidate is tried. - Selection filters out
paused/error/unavailablemodels and those at max concurrency, optionally filters by model name, then sorts by load ratio ascending, then least-recently-used. A provider URL is chosen fromspec.providers[](skippingunavailableones). - Crash recovery: on startup, models whose
activeJobIdsreference IRs no longerRunningare reconciled back to idle.
Credentials
- Caller identity: every IR carries
spec.apiKeyRef(required) → executorAI_AGENTS_API_KEYfor the MCP callback. Sourced from the request's Bearer token (submit/events/ingest),WebhookSource.spec.apiKeyRef(webhook events),AiAgent.apiKeyRef(triggers), or inherited (chains). No long-lived executor service credential. (spec 0007) - Backend token: optional
AiModel.spec.credentialsRef→/credentials/token, consumed only by theclaudeexecutor.qwen-code/codexuse the caller's key, so discovered models need none. - GitHub: short-lived App installation tokens minted per webhook event; webhooks HMAC-verified with the App
webhookSecret. No PAT.
Webhook Ingest
GitHub webhooks land at POST /api/webhooks/github on the webhook pod (or the API pod for the unified deployment).
src/sources/github/index.ts flow:
- Look up the
WebhookSourceCR byspec.type === "github". - Read
webhookSecret,appId,privateKey,installationIdfrom the credentials Secret referenced byspec.credentialsRef. - Verify HMAC-SHA-256 (
x-hub-signature-256header vs raw body). - Normalize
x-github-event+payload.actioninto a canonical event name. - Public-repo filter: if
payload.repository.private === false, deny unlessallowPublicRepos: trueorrepository.full_nameis inallowedPublicRepos. Returns200withmatched: 0on deny (not 4xx). - Match agents via
agentWatcher.findByEvent(source, event)plus mention parsing (@ai-{role},@agents-assemble). - Create one
InferenceRequestper matched agent viairClient.create(...).
Workspace and Uploads
The shared RWX uploads PVC is mounted read-write by the API pod and read-only by executor Jobs. Files referenced via workspaceUploads are copied from /uploads/{id}/{name} into the workspace before the CLI runs. The PVC name comes from UPLOADS_PVC_NAME.
Per-job ConfigMaps and git Secrets are deleted after Job termination via owner references and TTL.
Metrics and Logs
GET /metricsexposes Prometheus metrics (registryfromsrc/metrics.ts). The dashboard timeseries page reads fromPROMETHEUS_URL.GET /api/jobs/:id/logsis a Server-Sent Events stream that:- Streams executor pod logs live (
follow: true) when anexecutorPodNameis recorded. - Falls back to tailing the persisted log file in the uploads PVC (used by pipeline jobs).
- Closes with
{done:true,status:…}once the IR reaches a terminal phase.
- Streams executor pod logs live (
Config Defaults Resolution
Job submission spec.config
│
▼ (per-field fallback)
AiAgent.spec.defaultModel / defaultEffort / provider
│
▼
AiConfig.spec.providers[provider].model / effort
│
▼
AiConfig.spec.model / effort
│
▼
Hard defaults: provider="claude", effort="medium"
AiConfig.spec.executorImage and executorResources apply uniformly to all jobs unless an IR explicitly overrides them.