Gateway Queue And Recovery

This page explains how the live gateway process persists queue state, tracks the currently attached upstream instance, and avoids replaying work across unsafe continuity changes.

Mental Model

The gateway is small on purpose.

It keeps a durable queue plus a read-optimized status snapshot.
It owns one active execution slot for terminal-mutating work.
It treats the managed agent behind it as a replaceable upstream instance, not as the durable identity of the session.
That is why it tracks an epoch and blocks replay when continuity becomes uncertain.

Queue Storage Model

The durable queue lives in queue.sqlite.

Current stored request states:

accepted
running
completed
failed
coalesced

Current queue-depth reporting counts only accepted and running items. Completed, failed, and coalesced records remain useful for history and diagnostics, but they are not part of active queue depth.
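The depth rule can be illustrated with a minimal sketch. The `requests` table and its `request_id` and `state` columns are assumptions made for illustration; this page does not document the real queue.sqlite schema.

```python
import sqlite3

# Only these two states count toward active queue depth (from the text above).
ACTIVE_STATES = ("accepted", "running")


def active_queue_depth(conn: sqlite3.Connection) -> int:
    """Count records that still occupy the active queue."""
    placeholders = ", ".join("?" for _ in ACTIVE_STATES)
    row = conn.execute(
        f"SELECT COUNT(*) FROM requests WHERE state IN ({placeholders})",
        ACTIVE_STATES,
    ).fetchone()
    return row[0]


# Hypothetical schema, used only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (request_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [
        ("r1", "completed"),
        ("r2", "failed"),
        ("r3", "coalesced"),
        ("r4", "running"),
        ("r5", "accepted"),
    ],
)
depth = active_queue_depth(conn)
print(depth)  # 2: only r4 and r5 are active
```

The three terminal rows stay in the table for history and diagnostics but do not contribute to the count.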

Opt-in gateway diagnostic logs do not live in queue.sqlite. They are cleanup-sensitive JSONL files under logs/diagnostics/ and are useful for route-boundary and mailbox-operation postmortems. The queue database remains the durable authority for accepted work, terminal state, and gateway-owned notifier audit history.

Provider-native pending input is a different state. surface.pending_input=yes comes only from the tracked provider TUI and means that the CLI visibly holds submitted user input behind its active turn. An unsubmitted composer draft, an accepted row in queue.sqlite, and an explicit Houmao prompt-submission note are each separate facts. None of them sets provider-native pending input.

Current-Instance State

The gateway writes run/current-instance.json with:

process id,
bound host,
bound port,
managed_agent_instance_epoch,
optional managed_agent_instance_id.

This file tells the runtime which live gateway process published the current listener, and it lets the gateway notice when the upstream managed-agent instance behind the same session changed.
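As a rough sketch, a reader of this file might parse it as follows. Only managed_agent_instance_epoch and managed_agent_instance_id are key names taken from this page; the `pid`, `host`, and `port` keys and the `CurrentInstance` type are hypothetical stand-ins for the process id, bound host, and bound port fields.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CurrentInstance:
    pid: int   # hypothetical key name for the process id
    host: str  # hypothetical key name for the bound host
    port: int  # hypothetical key name for the bound port
    managed_agent_instance_epoch: int
    managed_agent_instance_id: Optional[str] = None  # optional per the list above


def parse_current_instance(text: str) -> CurrentInstance:
    """Parse the JSON payload, tolerating a missing optional instance id."""
    data = json.loads(text)
    return CurrentInstance(
        pid=int(data["pid"]),
        host=str(data["host"]),
        port=int(data["port"]),
        managed_agent_instance_epoch=int(data["managed_agent_instance_epoch"]),
        managed_agent_instance_id=data.get("managed_agent_instance_id"),
    )


# Illustrative payload only; the values are made up.
sample = (
    '{"pid": 4242, "host": "127.0.0.1", "port": 43123, '
    '"managed_agent_instance_epoch": 1}'
)
instance = parse_current_instance(sample)
print(instance.managed_agent_instance_epoch, instance.managed_agent_instance_id)
```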

Request Admission And Serial Execution

The gateway worker loop is intentionally serialized.

only one queue item can hold the active terminal-mutation slot at a time,
new requests are first persisted as accepted,
the worker coalesces adjacent pending control intents before promotion,
the worker promotes the next effective eligible request to running,
completion updates the record to completed or failed and appends an event.

sequenceDiagram
    participant CLI as Runtime CLI
    participant GW as Gateway
    participant Q as queue.sqlite
    participant Be as Agent terminal
    CLI->>GW: POST /v1/requests
    GW->>Q: insert accepted record
    GW-->>CLI: accepted response
    opt adjacent control-intent run
        GW->>Q: mark superseded records<br/>coalesced
    end
    GW->>Q: promote effective record<br/>to running
    GW->>Be: submit_prompt or interrupt
    alt backend call succeeds
        GW->>Q: mark completed
    else backend call fails
        GW->>Q: mark failed
    end

POST /v1/control/prompt bypasses this durable worker queue. Its TUI admission policies inspect the latest tracked snapshot: ready_only requires prompt-ready posture and no pending input, if_no_pending checks only that pending input is decisively absent, and always bypasses both tracked checks. This decision is observational. The gateway does not reserve a provider queue slot or hold its lock while waiting for a repaint, so two closely spaced conditional calls can both dispatch before the provider surface changes.
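The three policies can be read as a pure function over the latest tracked snapshot. The sketch below is illustrative only: `TrackedSnapshot`, its fields, and `admits` are hypothetical names, and it assumes that ready_only, like if_no_pending, treats an unknown pending-input reading as not admissible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrackedSnapshot:
    prompt_ready: bool
    # True = pending input visible, False = decisively absent, None = unknown.
    pending_input: Optional[bool]


def admits(policy: str, snap: TrackedSnapshot) -> bool:
    """Return whether a direct prompt would be dispatched under the policy."""
    if policy == "always":
        return True  # bypasses both tracked checks
    no_pending = snap.pending_input is False  # unknown does not count as absent
    if policy == "if_no_pending":
        return no_pending
    if policy == "ready_only":
        return snap.prompt_ready and no_pending
    raise ValueError(f"unknown admission policy: {policy!r}")


busy = TrackedSnapshot(prompt_ready=False, pending_input=False)
unknown = TrackedSnapshot(prompt_ready=True, pending_input=None)
print(admits("ready_only", busy), admits("if_no_pending", busy))
print(admits("if_no_pending", unknown), admits("always", unknown))
```

Because the function only reads a snapshot and reserves nothing, two callers evaluating it against the same snapshot get the same answer, which is the race the paragraph above describes.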

Control-Intent Coalescing

The gateway treats a narrow set of queued records as coalescible control intents:

interrupt,
submit_prompt whose entire trimmed prompt is exactly /compact,
submit_prompt whose entire trimmed prompt is exactly /clear,
submit_prompt whose entire trimmed prompt is exactly /new.

This policy is intentionally conservative. It does not parse command prefixes inside ordinary prose, it does not coalesce multiline prompts that merely mention commands, and it does not apply to direct /v1/control/prompt because that route bypasses the durable queue.

When the oldest accepted queue record is a control intent, the worker scans the adjacent run of accepted control intents that share its managed_agent_instance_epoch. Ordinary prompts, internal mail_notifier_prompt records, unsupported request kinds, and records from a different epoch stop the scan. Within the run, duplicate interrupts collapse to one interrupt, and context-control prompts collapse to the strongest effective context action: /new supersedes /clear and /compact, and /clear supersedes /compact. If both an interrupt and a context action remain effective, the interrupt executes first and the context action executes afterward.

Rows removed from execution are not deleted. They are marked coalesced, get finished_at_utc, and store result_json with the superseding request or action. The gateway also appends a coalesced event listing the coalesced request ids and effective actions.
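The coalescing rules above can be sketched as a function over the ordered list of accepted records. This is a simplified model, not the gateway's code: the `Record` type and function names are hypothetical, and it assumes that the earliest interrupt and the earliest record carrying the strongest context action are the ones that survive, which this page does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Higher number = stronger context action (/new > /clear > /compact).
CONTEXT_STRENGTH = {"/compact": 1, "/clear": 2, "/new": 3}


@dataclass(frozen=True)
class Record:
    request_id: str
    kind: str
    epoch: int
    prompt: str = ""


def context_action(rec: Record) -> Optional[str]:
    """Return the context command only if the whole trimmed prompt is one."""
    if rec.kind != "submit_prompt":
        return None
    trimmed = rec.prompt.strip()
    return trimmed if trimmed in CONTEXT_STRENGTH else None


def is_control_intent(rec: Record) -> bool:
    return rec.kind == "interrupt" or context_action(rec) is not None


def coalesce_head(accepted: List[Record]) -> Tuple[List[Record], List[str]]:
    """Return (effective records in execution order, coalesced request ids)."""
    if not accepted or not is_control_intent(accepted[0]):
        return [], []
    epoch = accepted[0].epoch
    run: List[Record] = []
    for rec in accepted:
        # An ordinary prompt, another kind, or a different epoch ends the run.
        if rec.epoch != epoch or not is_control_intent(rec):
            break
        run.append(rec)
    interrupt = next((r for r in run if r.kind == "interrupt"), None)
    contexts = [r for r in run if r.kind != "interrupt"]
    strongest = max(
        contexts, key=lambda r: CONTEXT_STRENGTH[context_action(r)], default=None
    )
    # Interrupt first, then the surviving context action.
    effective = [r for r in (interrupt, strongest) if r is not None]
    kept = {r.request_id for r in effective}
    coalesced = [r.request_id for r in run if r.request_id not in kept]
    return effective, coalesced


accepted = [
    Record("a", "submit_prompt", 1, "/compact"),
    Record("b", "interrupt", 1),
    Record("c", "submit_prompt", 1, " /new "),
    Record("d", "interrupt", 1),
    Record("e", "submit_prompt", 1, "fix the tests"),
    Record("f", "interrupt", 1),
]
effective, coalesced = coalesce_head(accepted)
print([r.request_id for r in effective], coalesced)
```

In this example the ordinary prompt `e` ends the run, so `f` is untouched; `b` and `c` remain effective, and `a` and `d` would be the rows marked coalesced.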

Health Versus Upstream Availability

This split is easy to miss the first time you debug the system.

GET /health only asks whether the gateway control plane is alive.
GET /v1/status adds the managed-agent view: connectivity, recovery state, request admission, and surface eligibility.

That means a healthy gateway can still report:

managed_agent_connectivity=unavailable,
managed_agent_recovery=awaiting_rebind,
request_admission=blocked_unavailable.

The gateway is alive; the upstream session it fronts is not currently ready.

Epochs, Reconciliation, And Replay Blocking

The gateway increments managed_agent_instance_epoch when it sees a different current upstream instance id than the last one it recorded.

Consequences:

if the upstream instance did not change, the epoch stays stable,
if the upstream instance changed, the gateway enters reconciliation-oriented status,
requests accepted for the old epoch are not replayed blindly against the replacement instance.

Representative status after an instance change:

{
  "gateway_health": "healthy",
  "managed_agent_connectivity": "connected",
  "managed_agent_recovery": "reconciliation_required",
  "request_admission": "blocked_reconciliation",
  "managed_agent_instance_epoch": 2
}

This is a safety boundary, not just bookkeeping. It prevents the sidecar from silently delivering old queued intent to a new upstream instance whose continuity has not been positively established.
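A minimal sketch of the epoch rule follows. `EpochTracker` and its methods are hypothetical names, and the sketch assumes that the first observed instance id establishes the baseline without bumping the epoch and that epochs start at 1; neither detail is stated on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpochTracker:
    epoch: int = 1  # assumed starting value
    last_instance_id: Optional[str] = None
    reconciliation_required: bool = False

    def observe(self, instance_id: str) -> int:
        """Record the current upstream instance id and return the epoch."""
        changed = (
            self.last_instance_id is not None
            and instance_id != self.last_instance_id
        )
        if changed:
            self.epoch += 1
            self.reconciliation_required = True
        self.last_instance_id = instance_id
        return self.epoch

    def may_replay(self, request_epoch: int) -> bool:
        """Old-epoch work is never replayed while continuity is unproven."""
        return request_epoch == self.epoch and not self.reconciliation_required


tracker = EpochTracker()
tracker.observe("instance-a")
tracker.observe("instance-a")  # same instance: epoch stays stable
before = tracker.may_replay(1)
tracker.observe("instance-b")  # replacement instance: epoch increments
after = tracker.may_replay(1)
print(tracker.epoch, before, after)
```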

Restart Recovery

Gateway restarts do not discard already accepted queued work by default.

Current behavior:

requests left in accepted state remain eligible after restart,
requests left in running state are marked failed on startup because the old process died mid-execution,
accepted work can be recovered, coalesced, and executed after restart if the upstream instance continuity is still valid,
accepted work is preserved but not replayed when the new startup detects an epoch change that requires reconciliation.

sequenceDiagram
    participant Old as Old gateway
    participant Q as queue.sqlite
    participant New as New gateway
    participant Up as Upstream
    Old->>Q: leave accepted work<br/>durably stored
    Old-x New: process restart
    New->>Q: fail leftover running work
    New->>Up: inspect current instance id
    alt same instance
        New->>Q: execute accepted work
    else replacement instance
        New->>Q: keep accepted work<br/>blocked by reconciliation
    end
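The startup sequence above can be sketched against a toy database. The `requests` table, its `seq` ordering column, and the `recover_on_startup` function are assumptions for illustration; the real schema and startup code are not shown on this page.

```python
import sqlite3
from typing import List


def recover_on_startup(conn: sqlite3.Connection, same_instance: bool) -> List[str]:
    """Fail leftover running work, then return request ids eligible to execute."""
    with conn:  # commit the state change as one transaction
        conn.execute("UPDATE requests SET state = 'failed' WHERE state = 'running'")
    if not same_instance:
        # Accepted rows stay in place but are blocked by reconciliation.
        return []
    rows = conn.execute(
        "SELECT request_id FROM requests WHERE state = 'accepted' ORDER BY seq"
    ).fetchall()
    return [row[0] for row in rows]


def demo_queue() -> sqlite3.Connection:
    """Build a hypothetical queue left behind by a dead gateway process."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE requests (seq INTEGER PRIMARY KEY, request_id TEXT, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO requests (request_id, state) VALUES (?, ?)",
        [("r1", "running"), ("r2", "accepted"), ("r3", "accepted")],
    )
    conn.commit()
    return conn


same = recover_on_startup(demo_queue(), same_instance=True)
replaced = recover_on_startup(demo_queue(), same_instance=False)
print(same, replaced)
```

In both branches the accepted rows remain stored; only their eligibility differs.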

Current Execution-Adapter Boundary

The live gateway process now selects an execution adapter from manifest-backed authority plus internal bootstrap metadata instead of assuming a single callback path.

Legacy REST-backed adapters may still appear when inspecting old manifests, but new public launches no longer create cao_rest or houmao_server_rest sessions.
A local tmux-backed adapter covers runtime-owned native headless sessions and runtime-owned local_interactive sessions, and resumes that runtime through runtime-owned control.
A passive-server-managed headless adapter covers native headless sessions whose attach metadata publishes managed_api_base_url plus managed_agent_ref, and routes prompt or interrupt work back through the managed-agent API rather than bypassing passive-server-owned turn authority.
Queue durability, reconciliation checks, and gateway-local epoch handling stay the same across those adapters.