Agent Chains & Cascading Delegation
Cascading agent delegation enables multi-step workflows in which the completion of one agent can automatically trigger the next agent in a defined chain. This pattern supports pipelines such as triage → developer → reviewer without manual intervention.
Overview
When a job completes, the system can automatically queue a child job with the next agent in the chain, depending on the configured progression condition (most commonly, only when the parent succeeds). The child job receives context from the parent's execution, enabling each agent to build on previous work.
Initial event
↓
[Triage Agent] → analyzes issue
↓
[Developer Agent] ← receives triage output + issue
↓
[Reviewer Agent] ← receives dev output + issue
↓
Completion
Defining Chains
Chains are defined in an agent's eventTriggers array as part of the trigger configuration. Each trigger can define the next agent to run and conditions for progression.
ChainStep Structure
interface ChainStep {
  // Trigger name of next agent (e.g., "ai-developer")
  agentTriggerName: string;
  // Progression condition
  condition:
    | "always"      // Run regardless of parent outcome
    | "on-success"  // Only if parent exits with code 0
    | "on-failure"; // Only if parent exits with non-zero code
  // Failure handling; optional, defaults to "stop"
  failurePolicy?:
    | "stop"        // Stop the chain on failure
    | "skip"        // Skip this agent and continue with next
    | "notify";     // Continue but send notification
}
Example Agent Configuration
apiVersion: labrats.work/v1alpha1
kind: AiAgent
metadata:
  name: triage
  namespace: ai-agents
spec:
  name: Issue Triage
  triggerName: ai-triage
  defaultModel: haiku
  defaultEffort: low
  enabled: true
  eventTriggers:
    - source: github
      events: ["issues.opened"]
      next:
        agentTriggerName: ai-developer
        condition: on-success
        failurePolicy: stop
Chain Tracking
Each job carries metadata about its position in a chain:
| Field | Type | Purpose |
|---|---|---|
| chainId | string | Unique identifier for the entire chain (shared by all jobs in it) |
| chainDepth | number | Position in chain (0 = initiator, 1 = first delegated, etc.) |
| parentJobId | string | ID of job that triggered this one (or null if initiator) |
| chainPath | string[] | Ordered list of agent trigger names traversed (loop detection) |
These fields are exposed in the job API for filtering and tracking multi-step workflows:
# Get all jobs in a chain (URL quoted so the shell does not interpret "?")
curl "https://ai-agents.hcl.labrats.work/api/jobs?chainId=abc123" \
  -H "Authorization: Bearer <token>"

# Response includes chain metadata
{
  "id": "job-2",
  "chainId": "abc123",
  "chainDepth": 1,
  "parentJobId": "job-1",
  "chainPath": ["ai-triage", "ai-developer"],
  ...
}
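The relationship between a parent's chain fields and its child's can be sketched as follows. This is a minimal illustration, not the system's actual code: the helper names initiatorMeta and childMeta are hypothetical, and the sketch assumes that chainPath includes the initiator's own trigger name, which matches the sample response above.

```typescript
// Chain metadata fields, as listed in the Chain Tracking table.
interface ChainMeta {
  chainId: string;
  chainDepth: number;
  parentJobId: string | null;
  chainPath: string[];
}

// Hypothetical helper: metadata for the job that starts a chain.
function initiatorMeta(chainId: string, agentTriggerName: string): ChainMeta {
  return { chainId, chainDepth: 0, parentJobId: null, chainPath: [agentTriggerName] };
}

// Hypothetical helper: metadata for a child job delegated from `parent`.
// The chainId is shared, the depth grows by one, and the path is extended
// without mutating the parent's array.
function childMeta(parent: ChainMeta, parentJobId: string, nextAgent: string): ChainMeta {
  return {
    chainId: parent.chainId,
    chainDepth: parent.chainDepth + 1,
    parentJobId,
    chainPath: [...parent.chainPath, nextAgent],
  };
}

const root = initiatorMeta("abc123", "ai-triage");
const child = childMeta(root, "job-1", "ai-developer");
console.log(JSON.stringify(child, null, 2));
```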
Context Propagation
When the next agent is queued, the parent job's output is captured and passed as context. This enables the child agent to build on previous work:
Parent Output Capture
- Capture size: Up to 4,000 characters of parent's stdout
- Format: Wrapped in a <chain-context> block
- Placement: Prepended to child's original prompt
Example
Parent (Triage) stdout:
Issue Analysis:
- Type: Bug in authentication flow
- Priority: High
- Affected components: LoginForm, AuthService
Child (Developer) receives:
<chain-context>
Previous agent output:
Issue Analysis:
- Type: Bug in authentication flow
- Priority: High
- Affected components: LoginForm, AuthService
</chain-context>
[Original developer prompt...]
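The prompt assembly shown above could look roughly like the sketch below. The function name buildChildPrompt is hypothetical, and two details are assumptions rather than documented behavior: that the first 4,000 characters are the ones kept when output is longer, and that length is counted in UTF-16 code units (what String.prototype.slice uses).

```typescript
// Capture limit stated in this section.
const MAX_CONTEXT_CHARS = 4000;

// Hypothetical helper: wraps the parent's stdout in a <chain-context> block
// and prepends it to the child's original prompt.
// Assumption: when the output is too long, the beginning is kept.
function buildChildPrompt(parentStdout: string, originalPrompt: string): string {
  const captured = parentStdout.slice(0, MAX_CONTEXT_CHARS);
  return [
    "<chain-context>",
    "Previous agent output:",
    captured,
    "</chain-context>",
    originalPrompt,
  ].join("\n");
}

const prompt = buildChildPrompt(
  "Issue Analysis:\n- Type: Bug in authentication flow\n- Priority: High",
  "[Original developer prompt...]",
);
console.log(prompt);
```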
Safety Limits
Chains have built-in safeguards to prevent infinite loops and resource exhaustion:
| Limit | Value | Purpose |
|---|---|---|
| Max chain depth | 10 | Prevents runaway delegation chains |
| Output capture | 4000 chars | Bounds context size |
| Loop detection | chainPath dedup | Prevents agent A → B → A cycles |
If a chain exceeds max depth or would create a loop, the job fails with a clear error message.
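A pre-queue check along these lines would enforce both limits. It is a sketch under assumptions: the function name checkDelegation is hypothetical, the error strings only echo the messages quoted under Troubleshooting, and whether a job at depth exactly 10 is allowed is not stated in this document (the sketch allows it and rejects anything deeper).

```typescript
// Limit taken from the Safety Limits table.
const MAX_CHAIN_DEPTH = 10;

type SafetyResult = { ok: true } | { ok: false; error: string };

// Hypothetical helper: decides whether `nextAgent` may be queued as a child
// of a job with the given chainPath and chainDepth.
function checkDelegation(chainPath: string[], chainDepth: number, nextAgent: string): SafetyResult {
  // Loop detection: the next agent must not already appear in the path.
  if (chainPath.indexOf(nextAgent) !== -1) {
    const cycle = [...chainPath, nextAgent].join(" -> ");
    return { ok: false, error: "Loop detected: " + cycle };
  }
  // Depth limit: the child would sit one level below its parent.
  if (chainDepth + 1 > MAX_CHAIN_DEPTH) {
    return { ok: false, error: "Max chain depth exceeded (" + MAX_CHAIN_DEPTH + ")" };
  }
  return { ok: true };
}

console.log(checkDelegation(["ai-triage", "ai-developer"], 1, "ai-triage"));
console.log(checkDelegation(["ai-triage", "ai-developer"], 1, "ai-reviewer"));
```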
Failure Handling
Each chain step can define how to handle failures from the previous agent:
Stop (default)
Chain terminates immediately on failure. Useful for strict dependency chains.
failurePolicy: stop
Skip
If the agent fails, move on to the next agent in the chain (if one is defined). Useful for optional enrichment steps.
failurePolicy: skip
Notify
Continue chain but emit an event for monitoring/alerting.
failurePolicy: notify
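The three policies can be read as a small decision table. The sketch below encodes only what the descriptions above say; the function name applyFailurePolicy and the FailureDecision shape are hypothetical, and how the policy interacts with the progression condition is not specified here, so it is deliberately left out.

```typescript
type FailurePolicy = "stop" | "skip" | "notify";

// Hypothetical result shape: what the chain does after a failure.
interface FailureDecision {
  continueChain: boolean; // whether the chain keeps going
  notify: boolean;        // whether a monitoring event is emitted
}

// Maps a failure policy to the behavior described in this section.
// "stop" is the default when no policy is configured.
function applyFailurePolicy(policy: FailurePolicy = "stop"): FailureDecision {
  switch (policy) {
    case "stop":
      return { continueChain: false, notify: false };
    case "skip":
      return { continueChain: true, notify: false };
    case "notify":
      return { continueChain: true, notify: true };
  }
}

console.log(applyFailurePolicy("notify"));
```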
Progression Conditions
Control when the next agent runs based on parent's exit status:
Always
Run next agent regardless of parent outcome. Useful for notifications or cleanup agents.
condition: always
On Success
Run next agent only if parent exits with code 0. Most common for dependent workflows.
condition: on-success
On Failure
Run next agent only if parent exits with non-zero code. Useful for error handling/escalation.
condition: on-failure
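Expressed as code, the three conditions reduce to a check on the parent's exit code. The function name shouldRunNext is hypothetical; the rules themselves are the ones listed above.

```typescript
type Condition = "always" | "on-success" | "on-failure";

// Returns true when the next agent should be queued for the given
// progression condition and parent exit code.
function shouldRunNext(condition: Condition, parentExitCode: number): boolean {
  switch (condition) {
    case "always":
      return true;
    case "on-success":
      return parentExitCode === 0;
    case "on-failure":
      return parentExitCode !== 0;
  }
}

console.log(shouldRunNext("on-success", 0));
console.log(shouldRunNext("on-failure", 0));
```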
Monitoring Chains
View Chain in Dashboard
The job detail page shows:
- Full chain path (all agents traversed)
- Chain ID for grouping related jobs
- Parent job ID for tracing context
- Current depth in chain
Query Chains via API
# All jobs in a chain
GET /api/jobs?chainId=abc123
# Jobs by agent
GET /api/jobs?agentRole=ai-developer
# Single job with chain metadata
GET /api/jobs/<job-id>
Response includes:
{
  "id": "job-2",
  "type": "implement-issue",
  "agentRole": "ai-developer",
  "chainId": "abc123",
  "chainDepth": 1,
  "parentJobId": "job-1",
  "chainPath": ["ai-triage", "ai-developer"],
  "status": "completed",
  "result": "...",
  ...
}
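A small client for these queries might look like the following sketch. The helper names (jobsUrl, orderByDepth, fetchChain) are hypothetical, and the sketch assumes the list endpoint returns a JSON array of job objects, which this document does not confirm. It relies on the global fetch and URLSearchParams available in current Node.js and browsers.

```typescript
// Subset of the job fields shown in the response above.
interface JobSummary {
  id: string;
  chainId: string;
  chainDepth: number;
  parentJobId: string | null;
  chainPath: string[];
}

// Builds a /api/jobs URL with the documented chainId / agentRole filters.
function jobsUrl(baseUrl: string, filters: { chainId?: string; agentRole?: string }): string {
  const params = new URLSearchParams();
  if (filters.chainId) params.set("chainId", filters.chainId);
  if (filters.agentRole) params.set("agentRole", filters.agentRole);
  const query = params.toString();
  return query ? `${baseUrl}/api/jobs?${query}` : `${baseUrl}/api/jobs`;
}

// Orders jobs from initiator (depth 0) to the deepest child, without mutating the input.
function orderByDepth(jobs: JobSummary[]): JobSummary[] {
  return [...jobs].sort((a, b) => a.chainDepth - b.chainDepth);
}

// Fetches every job in a chain, ordered by depth.
// Assumption: the endpoint responds with a JSON array of jobs.
async function fetchChain(baseUrl: string, chainId: string, token: string): Promise<JobSummary[]> {
  const res = await fetch(jobsUrl(baseUrl, { chainId }), {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return orderByDepth((await res.json()) as JobSummary[]);
}

console.log(jobsUrl("https://ai-agents.hcl.labrats.work", { chainId: "abc123" }));
```

Sorting by chainDepth gives the order in which agents ran, which is usually the most convenient view when tracing how context flowed through a chain.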
Troubleshooting
Chain stuck at depth 0
Problem: Next agent never queued despite on-success condition.
Debug steps:
- Check parent job's exit code (should be 0 for on-success)
- Verify next agent's trigger name exists
- Check next agent is enabled (enabled: true)
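The debug steps above can be run as one check. This is an illustrative sketch only: diagnoseStalledChain and the AgentInfo shape are hypothetical, and the inputs would have to be gathered by hand from the job API and the agent manifests.

```typescript
// Minimal view of an agent manifest for this check (hypothetical shape).
interface AgentInfo {
  triggerName: string;
  enabled: boolean;
}

// Returns a list of likely reasons why an on-success step was never queued.
// An empty list means none of the three documented causes applies.
function diagnoseStalledChain(
  parentExitCode: number,
  nextTriggerName: string,
  agents: AgentInfo[],
): string[] {
  const problems: string[] = [];
  if (parentExitCode !== 0) {
    problems.push(`Parent exited with code ${parentExitCode}; on-success requires 0`);
  }
  const target = agents.find((a) => a.triggerName === nextTriggerName);
  if (!target) {
    problems.push(`No agent has triggerName "${nextTriggerName}"`);
  } else if (!target.enabled) {
    problems.push(`Agent "${nextTriggerName}" is disabled (enabled: false)`);
  }
  return problems;
}

console.log(
  diagnoseStalledChain(0, "ai-developer", [{ triggerName: "ai-developer", enabled: false }]),
);
```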
Loop detected / Chain depth exceeded
Problem: Job failed with "Loop detected" or "Max chain depth exceeded"
Causes:
- Agent configured to call itself (directly or indirectly)
- Chain too long (>10 steps)
Resolution:
- Review next.agentTriggerName in all agents' eventTriggers
- Simplify chain design (break into multiple separate workflows)
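Reviewing next.agentTriggerName by hand gets tedious with many agents, so a static check over the manifests can help. The sketch below is an assumption-laden illustration: findCycle and the AgentConfig shape are hypothetical, and it only follows the triggerName and eventTriggers[].next.agentTriggerName fields shown in this document.

```typescript
// Only the manifest fields needed to follow chain links.
interface AgentConfig {
  triggerName: string;
  eventTriggers?: { next?: { agentTriggerName: string } }[];
}

// Returns the first delegation cycle found as a list of trigger names
// (for example ["a", "b", "a"]), or null when the configuration is acyclic.
function findCycle(agents: AgentConfig[]): string[] | null {
  // Build the delegation graph: trigger name -> trigger names it can hand off to.
  const edges = new Map<string, string[]>();
  for (const agent of agents) {
    const targets = (agent.eventTriggers ?? [])
      .map((t) => t.next?.agentTriggerName)
      .filter((name): name is string => name !== undefined);
    edges.set(agent.triggerName, targets);
  }

  // Depth-first walk that tracks the current path, mirroring chainPath.
  const visit = (node: string, path: string[]): string[] | null => {
    const seenAt = path.indexOf(node);
    if (seenAt !== -1) return [...path.slice(seenAt), node];
    for (const next of edges.get(node) ?? []) {
      const cycle = visit(next, [...path, node]);
      if (cycle) return cycle;
    }
    return null;
  };

  for (const agent of agents) {
    const cycle = visit(agent.triggerName, []);
    if (cycle) return cycle;
  }
  return null;
}

const looping: AgentConfig[] = [
  { triggerName: "ai-developer", eventTriggers: [{ next: { agentTriggerName: "ai-reviewer" } }] },
  { triggerName: "ai-reviewer", eventTriggers: [{ next: { agentTriggerName: "ai-developer" } }] },
];
console.log(findCycle(looping));
```

Running a check like this before deploying manifests surfaces loops at configuration time instead of as a failed job at runtime.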
Child job missing parent context
Problem: Child doesn't have parent output in <chain-context> block.
Causes:
- Parent output was empty (output shorter than 4,000 characters should still appear in full)
- Pipeline jobs (e.g., pdf-sections) don't use chains; they use a different execution path
Check:
- View parent job logs in API detail endpoint
- Verify parent executed successfully
Best Practices
- Keep chains shallow: prefer 2-3 agents per chain. Longer chains are harder to debug.
- Use descriptive trigger names: names like ai-triage, ai-developer, and ai-reviewer are clearer than agent-1 and agent-2.
- Plan failure modes: decide upfront whether a failure in step 1 should block step 2 (use stop) or proceed anyway (use skip).
- Monitor chain metrics: track how often chains complete successfully and where they fail, then adjust conditions and policies accordingly.
- Test chains in dev: use the API directly to test a new chain before deploying to production.
- Document the workflow: add comments to agent manifests explaining the chain and its purpose.
Example:
# Triage agent for initial assessment
# ↓ on-success → Developer agent (implements fix)
# ↓ on-success → Reviewer agent (code review)
Examples
Issue → Fix → Review
---
# 1. Triage agent analyzes issue
apiVersion: labrats.work/v1alpha1
kind: AiAgent
metadata:
  name: issue-triage
spec:
  name: Issue Triage
  triggerName: ai-triage
  defaultModel: haiku
  eventTriggers:
    - source: github
      events: ["issues.opened"]
      next:
        agentTriggerName: ai-developer
        condition: on-success
---
# 2. Developer implements fix
apiVersion: labrats.work/v1alpha1
kind: AiAgent
metadata:
  name: developer
spec:
  name: Developer
  triggerName: ai-developer
  defaultModel: sonnet
  eventTriggers:
    - source: manual
      next:
        agentTriggerName: ai-reviewer
        condition: on-success
---
# 3. Reviewer reviews code
apiVersion: labrats.work/v1alpha1
kind: AiAgent
metadata:
  name: reviewer
spec:
  name: Code Reviewer
  triggerName: ai-reviewer
  defaultModel: sonnet
Error Escalation
Run escalation agent only on failures:
apiVersion: labrats.work/v1alpha1
kind: AiAgent
metadata:
  name: primary-agent
spec:
  name: Primary Agent
  triggerName: ai-primary
  eventTriggers:
    - source: github
      events: ["issues.labeled"]
      filters:
        label: "help-wanted"
      next:
        agentTriggerName: ai-escalate
        condition: on-failure
        failurePolicy: stop