Using GX10 local models from your machine (codex / Qwen Code)
Run your local codex or Qwen Code CLI against the on-prem GX10 models
(e.g. gpt-oss-120b) through the ai-agents inference proxy — no OpenAI/Anthropic
cloud, no GPU on your machine. Inference runs on the GX10; the proxy handles
auth, routing, and tool-call formatting.
your laptop (codex / qwen)
│ OpenAI-compatible HTTPS + API key
▼
ai-agents proxy (chat.ai-agents-dev.hcl.labrats.work)
│ resolves the model → backend, normalizes tool calls
▼
GX10 vLLM (gpt-oss-120b, …)
Prerequisites
- codex CLI ≥ 0.13 (uses the Responses API) and/or Qwen Code CLI (
qwen). - Network access to
https://chat.ai-agents-dev.hcl.labrats.work(reachable off-cluster; the proxy enforces API-key auth). - A proxy API key (below).
1. Get an API key
The proxy rejects unauthenticated requests (401). Get a key one of:
- From the cluster (if you have
kubectl):kubectl -n ai-agents-dev get secret qwen-code-proxy-credentials \ -o jsonpath='{.data.key}' | base64 -d - From an admin, or issue a personal key via the dashboard (
POST /api/api-keys).
Then export it (both CLIs read OPENAI_API_KEY):
export OPENAI_API_KEY="<proxy-key>"
2. See available models
curl -s -H "Authorization: Bearer $OPENAI_API_KEY" \
https://chat.ai-agents-dev.hcl.labrats.work/v1/models | jq '.data[].id'
⚠️ The GX10 serves one model at a time (single-active).
gpt-oss-120bis the currently active model — use it.qwen3-coder-30bis registered but resolves to an inactive backend right now; requesting it will fail until the GX10 active model is swapped. Ask in #infra before relying on a non-active model.
3. codex CLI (Responses API)
codex ≥ 0.13 only speaks the Responses API (wire_api = "responses"), which the
proxy now supports.
Add to ~/.codex/config.toml:
[model_providers.gx10]
name = "GX10 ai-agents proxy" # required — codex errors if empty
base_url = "https://chat.ai-agents-dev.hcl.labrats.work/v1"
wire_api = "responses"
env_key = "OPENAI_API_KEY"
model = "gpt-oss-120b"
model_provider = "gx10"
Run it:
export OPENAI_API_KEY="<proxy-key>"
codex # interactive (prompts before running tools)
codex exec "summarize README.md" # non-interactive
Per-invocation (no config edit):
codex -m gpt-oss-120b \
-c model_provider=gx10 \
-c 'model_providers.gx10.name="GX10"' \
-c 'model_providers.gx10.base_url="https://chat.ai-agents-dev.hcl.labrats.work/v1"' \
-c 'model_providers.gx10.wire_api="responses"' \
-c 'model_providers.gx10.env_key="OPENAI_API_KEY"'
4. Qwen Code CLI (chat-completions)
qwen uses chat-completions. Add an openai provider in ~/.qwen/settings.json:
{
"security": { "auth": { "selectedType": "openai" } },
"model": { "name": "gpt-oss-120b" },
"modelProviders": {
"openai": [{
"id": "gpt-oss-120b",
"name": "GX10 ai-agents proxy",
"envKey": "OPENAI_API_KEY",
"baseUrl": "https://chat.ai-agents-dev.hcl.labrats.work/v1"
}]
}
}
export OPENAI_API_KEY="<proxy-key>"
qwen -p "list the files here and summarize"
5. Verify (one-shot tool call)
codex exec --dangerously-bypass-approvals-and-sandbox \
"run 'echo it-works' with your shell tool and report the output"
You should see the command execute and it-works reported back — that confirms
the model is reaching the GX10 and tool calling works.
Troubleshooting
| Symptom | Cause / fix |
|---|---|
404 Cannot POST /v1/responses | Proxy too old, or wire_api not responses. Needs proxy ≥ 0.122.0 (dev) and wire_api = "responses". |
401 | Missing/invalid OPENAI_API_KEY. |
provider name must not be empty (codex) | Set name on the [model_providers.*] block. |
wire_api = "chat" is no longer supported (codex) | codex ≥ 0.13 is responses-only — use wire_api = "responses". |
Model not found / backend errors | You requested a non-active GX10 model (see §2) — use gpt-oss-120b. |
| Falls back to OpenAI/Aliyun cloud | base_url/provider not selected — ensure model_provider/provider points at gx10 and the base_url is set. |
Notes
- This targets the dev proxy (
ai-agents-dev). A prod endpoint requires a separate promotion. - Tool calling is normalized by the proxy (the GX10 emits Hermes-style calls, surfaced as standard OpenAI/Responses
tool_calls), so standard OpenAI-compatible clients work. - gpt-oss reasoning currently appears inline in message content rather than as a separate reasoning item — harmless for tool loops, occasionally chatty.