Shared Registry Resolution And Ownership¶

This page explains the dynamic rules around the shared registry: how canonical names relate to authoritative ids, what makes a record fresh, how duplicate publishers are handled, and when registry state is treated as stale versus as a hard error.

Mental Model¶

The registry resolves in two stages:

storage-level lookup decides whether there is a fresh usable record for a logical name,
runtime-level control validation decides whether the pointed-to manifest and agent-definition data are still safe to trust.

That distinction is why some failures collapse into “stale” while others still fail fast.

Canonical Names And Agent IDs¶

Registry-facing input accepts either:

canonical HOUMAO-gpu, or
unprefixed gpu.

The runtime canonicalizes both to HOUMAO-gpu before:

reading a record,
publishing a record,
comparing logical ownership.

Important consequences:

prefixed and unprefixed input refer to the same logical identity,
canonical names remain the human-stable lookup surface,
agent_id is the authoritative runtime-wide directory key and direct lookup surface,
convenience lookup by canonical name may need to scan live records and can report ambiguity when multiple live agent_ids share one canonical name.

Freshness And Lease Semantics¶

Freshness is lease-based, not directory-based.

v1 uses a 24-hour soft lease,
lease_expires_at >= now means the record is fresh,
expired records are treated as stale even if the directory still exists,
stale directories are expected after crashes and are cleaned later by houmao-mgr admin cleanup registry.

Timezone matters:

published_at and lease_expires_at must include timezone information,
naive timestamps are rejected rather than interpreted relative to the local machine timezone.

Generation Ownership¶

generation_id answers “which live session instance currently owns this logical name?”

Rules:

a new tmux-backed live session gets one generation_id,
refreshes keep the same generation_id,
resume reuses the persisted generation_id when the same live session is being reclaimed,
a replacement publisher must use a different generation_id,
ownership conflicts are enforced on agent_id, not on canonical name alone.

Fresh duplicate ownership is not allowed.

If a fresh record already exists for the same canonical agent_name with a different generation_id:

the new publish attempt is rejected,
later refreshes also re-check ownership so a losing publisher can stand down instead of quietly coexisting.

Stale Versus Hard-Invalid Outcomes¶

The registry intentionally distinguishes “unusable stale state” from “this would target the wrong session.”

Storage-level stale outcomes¶

resolve_live_agent_record() returns no live record when the stored record.json is:

missing,
malformed JSON,
schema-invalid under the strict model,
expired,
published under a different canonical name than the requested name lookup,
ambiguous because more than one fresh agent_id matches the requested canonical name.

Those cases are treated as not-found or stale discovery state rather than as a live result.

Runtime-level hard validation outcomes¶

After a fresh record is found, name-based control still validates the pointers it is about to trust.

That path still fails explicitly when:

runtime.manifest_path is not absolute,
the manifest file no longer exists,
the resolved manifest backend is not tmux-backed,
the persisted tmux session handle in the manifest does not match the addressed tmux session,
the canonical agent name must be learned from manifest or registry metadata rather than inferred from tmux session naming alone,
runtime.agent_def_dir is required but missing, non-absolute, or stale and no explicit --agent-def-dir override was supplied.

That split is intentional:

malformed or expired registry state should not block recovery forever,
but a fresh record that points at the wrong session should not silently recover to some other target.

Known-Name Resolution Flow¶

At the registry-storage layer, two lookup modes exist:

direct lookup by authoritative agent_id using live_agents/<agent-id>/record.json,
convenience lookup by canonical agent name by scanning live records and returning one unique fresh match.

Name-based lookup can therefore report ambiguity when multiple live agent_ids share one canonical name.

Remove And Cleanup Ownership Boundaries¶

Two cleanup paths exist:

targeted removal during authoritative runtime teardown,
stale-directory cleanup through houmao-mgr admin cleanup registry.

Targeted teardown removal is guarded by generation_id:

if the stored record still belongs to another generation, the caller does not remove it.

Stale cleanup is broader:

missing record.json,
malformed records,
expired records beyond the cleanup grace period

are removal candidates regardless of who originally created them.

Current Implementation Notes¶

Publication is lock-free and uses compare-then-replace semantics, so the design tolerates a narrow race window while still requiring the losing generation to stand down later.
Storage-level resolution returning None does not mean name-based runtime control will always succeed when a record exists; manifest and agent-definition validation still happens after record lookup.
Explicit --agent-def-dir overrides beat registry-published runtime.agent_def_dir.