Shared Registry Resolution And Ownership¶
This page explains the dynamic rules around the shared registry: how canonical names relate to authoritative ids, what makes a record fresh, how duplicate publishers are handled, and when registry state is treated as stale versus as a hard error.
Mental Model¶
The registry resolves in two stages:
- storage-level lookup decides whether there is a fresh usable record for a logical name,
- runtime-level control validation decides whether the pointed-to manifest and agent-definition data are still safe to trust.
That distinction is why some failures collapse into “stale” while others still fail fast.
Canonical Names And Agent IDs¶
Registry-facing input accepts either:
- canonical HOUMAO-gpu, or
- unprefixed gpu.
The runtime canonicalizes both to HOUMAO-gpu before:
- reading a record,
- publishing a record,
- comparing logical ownership.
Important consequences:
- prefixed and unprefixed input refer to the same logical identity,
- canonical names remain the human-stable lookup surface,
- agent_id is the authoritative runtime-wide directory key and direct lookup surface,
- convenience lookup by canonical name may need to scan live records and can report ambiguity when multiple live agent_ids share one canonical name.
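The canonicalization rule above can be sketched as a small helper. This is a minimal illustration, not the runtime's actual code: the function name canonicalize_agent_name is hypothetical, and the sketch assumes the prefix is matched exactly and case-sensitively, which the text above does not confirm.

```python
# Prefix taken from the HOUMAO-gpu example; exact, case-sensitive matching is an assumption.
CANONICAL_PREFIX = "HOUMAO-"


def canonicalize_agent_name(name: str) -> str:
    """Return the canonical form of a registry-facing name (hypothetical helper)."""
    stripped = name.strip()
    if not stripped:
        raise ValueError("agent name must not be empty")
    if stripped.startswith(CANONICAL_PREFIX):
        # Already canonical: return unchanged so canonicalization is idempotent.
        return stripped
    return CANONICAL_PREFIX + stripped
```

Because both spellings map to the same string, reads, publishes, and ownership comparisons can all operate on the canonical form only.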
Freshness And Lease Semantics¶
Freshness is lease-based, not directory-based.
- v1 uses a 24-hour soft lease,
- lease_expires_at >= now means the record is fresh,
- expired records are treated as stale even if the directory still exists,
- stale directories are expected after crashes and are cleaned later by houmao-mgr admin cleanup registry.
Timezone matters:
- published_at and lease_expires_at must include timezone information,
- naive timestamps are rejected rather than interpreted relative to the local machine timezone.
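The lease and timezone rules can be illustrated with a short sketch. It assumes timestamps are stored as ISO 8601 strings, which this page does not state, and the helper names (parse_aware_timestamp, is_fresh, new_lease_expiry) are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SOFT_LEASE = timedelta(hours=24)  # v1 soft lease length described above


def parse_aware_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string and reject naive timestamps (assumed storage format)."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None or parsed.utcoffset() is None:
        # Naive values are rejected instead of being read in the local timezone.
        raise ValueError(f"timestamp lacks timezone information: {value!r}")
    return parsed


def new_lease_expiry(published_at: datetime) -> datetime:
    """Compute the lease expiry for a publish or refresh at published_at."""
    return published_at + SOFT_LEASE


def is_fresh(lease_expires_at: str, now: datetime) -> bool:
    """A record is fresh while lease_expires_at >= now; now must be timezone-aware."""
    return parse_aware_timestamp(lease_expires_at) >= now


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
print(is_fresh("2025-01-02T12:00:00+00:00", now))  # boundary case: still fresh
```

Note that the check never consults the filesystem: a directory that still exists after a crash is stale as soon as its lease has expired.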
Generation Ownership¶
generation_id answers “which live session instance currently owns this logical name?”
Rules:
- a new tmux-backed live session gets one generation_id,
- refreshes keep the same generation_id,
- resume reuses the persisted generation_id when the same live session is being reclaimed,
- a replacement publisher must use a different generation_id,
- ownership conflicts are enforced on agent_id, not on canonical name alone.
Fresh duplicate ownership is not allowed.
If a fresh record already exists for the same canonical agent_name with a different generation_id:
- the new publish attempt is rejected,
- later refreshes also re-check ownership so a losing publisher can stand down instead of quietly coexisting.
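As a rough sketch of the ownership rule, the decision a publisher or refresher makes can be reduced to one predicate over the record currently stored under the same agent_id. The StoredRecord type and may_publish function are hypothetical names introduced for illustration; the real record model has more fields.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class StoredRecord:
    """Illustrative subset of a registry record (hypothetical type)."""
    agent_id: str
    generation_id: str
    lease_expires_at: datetime  # timezone-aware


def may_publish(existing: Optional[StoredRecord], generation_id: str, now: datetime) -> bool:
    """Return True if this generation may publish or refresh under the agent_id."""
    if existing is None:
        return True  # nothing stored yet
    if existing.lease_expires_at < now:
        return True  # stale records do not block a new owner
    # A fresh record may only be refreshed by the generation that owns it.
    return existing.generation_id == generation_id


now = datetime(2025, 1, 2, tzinfo=timezone.utc)
owner = StoredRecord("agent-1", "gen-a", datetime(2025, 1, 3, tzinfo=timezone.utc))
print(may_publish(owner, "gen-a", now), may_publish(owner, "gen-b", now))
```

Running the same predicate on every refresh is what lets a losing publisher notice that it no longer owns the record and stand down.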
Stale Versus Hard-Invalid Outcomes¶
The registry intentionally distinguishes “unusable stale state” from “this would target the wrong session.”
Storage-level stale outcomes¶
resolve_live_agent_record() returns no live record when the stored record.json is:
- missing,
- malformed JSON,
- schema-invalid under the strict model,
- expired,
- published under a different canonical name than the requested name lookup,
- ambiguous because more than one fresh agent_id matches the requested canonical name.
Those cases are treated as not-found or stale discovery state rather than as a live result.
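The per-record part of these stale outcomes might look like the following sketch. It is not the implementation of resolve_live_agent_record(): the function name load_fresh_record is hypothetical, the flat record layout and ISO 8601 timestamps are assumptions, and a simple required-field check stands in for the strict schema model. Ambiguity across several records is a separate step and is not covered here.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed flat layout; the real schema is stricter and has more fields.
REQUIRED_FIELDS = ("agent_id", "agent_name", "generation_id", "lease_expires_at")


def load_fresh_record(record_path: Path, requested_name: str, now: datetime) -> Optional[dict]:
    """Return the record if it is fresh and matches requested_name, else None."""
    try:
        payload = json.loads(record_path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None  # missing record.json
    except ValueError:
        return None  # malformed JSON or undecodable bytes
    if not isinstance(payload, dict):
        return None  # schema-invalid
    if any(not isinstance(payload.get(key), str) for key in REQUIRED_FIELDS):
        return None  # schema-invalid
    try:
        expires = datetime.fromisoformat(payload["lease_expires_at"])
    except ValueError:
        return None  # unparseable timestamp
    if expires.tzinfo is None or expires < now:
        return None  # naive or expired lease
    if payload["agent_name"] != requested_name:
        return None  # published under a different canonical name
    return payload
```

Every failure path returns None instead of raising, which mirrors the intent that broken discovery state reads as not-found.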
Runtime-level hard validation outcomes¶
After a fresh record is found, name-based control still validates the pointers it is about to trust.
That path still fails explicitly when:
- runtime.manifest_path is not absolute,
- the manifest file no longer exists,
- the resolved manifest backend is not tmux-backed,
- the persisted tmux session handle in the manifest does not match the addressed tmux session,
- the canonical agent name must be learned from manifest or registry metadata rather than inferred from tmux session naming alone,
- runtime.agent_def_dir is required but missing, non-absolute, or stale and no explicit --agent-def-dir override was supplied.
That split is intentional:
- malformed or expired registry state should not block recovery forever,
- but a fresh record that points at the wrong session should not silently recover to some other target.
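A few of the hard checks above can be sketched as functions that raise instead of returning None. The exception type and function names are hypothetical, the tmux-related checks are omitted, and "stale" is interpreted here as "the directory no longer exists", which is an assumption.

```python
from pathlib import Path
from typing import Optional


class ControlValidationError(RuntimeError):
    """Raised when a fresh record points at data that cannot be trusted (hypothetical)."""


def validate_manifest_path(manifest_path: str) -> Path:
    """Check the runtime.manifest_path pointer before it is trusted."""
    path = Path(manifest_path)
    if not path.is_absolute():
        raise ControlValidationError(f"manifest path is not absolute: {manifest_path}")
    if not path.is_file():
        raise ControlValidationError(f"manifest file no longer exists: {manifest_path}")
    return path


def resolve_agent_def_dir(published: Optional[str], override: Optional[str]) -> Path:
    """Pick the agent-definition directory; an explicit override wins."""
    if override is not None:
        return Path(override)  # value of an explicit --agent-def-dir
    if not published:
        raise ControlValidationError("runtime.agent_def_dir is required but missing")
    path = Path(published)
    if not path.is_absolute():
        raise ControlValidationError(f"agent_def_dir is not absolute: {published}")
    if not path.is_dir():
        # Assumed meaning of "stale": the published directory is gone.
        raise ControlValidationError(f"agent_def_dir is stale: {published}")
    return path
```

Raising here is deliberate: once a fresh record exists, a silent fallback could address a different session than the one the caller named.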
Known-Name Resolution Flow¶
At the registry-storage layer, two lookup modes exist:
- direct lookup by authoritative agent_id using live_agents/<agent-id>/record.json,
- convenience lookup by canonical agent name by scanning live records and returning one unique fresh match.
Name-based lookup can therefore report ambiguity when multiple live agent_ids share one canonical name.
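The name-based scan can be sketched as a filter over records that have already passed the freshness checks. The function name match_by_canonical_name and the dict-based record shape are assumptions for illustration. Returning the matching ids alongside the result is one possible way to let a caller report ambiguity, not necessarily how the registry does it.

```python
from typing import List, Optional, Sequence, Tuple


def match_by_canonical_name(
    fresh_records: Sequence[dict], canonical_name: str
) -> Tuple[Optional[dict], List[str]]:
    """Return (unique match or None, sorted agent_ids that share the name)."""
    matches = [r for r in fresh_records if r["agent_name"] == canonical_name]
    agent_ids = sorted({r["agent_id"] for r in matches})
    if len(agent_ids) == 1:
        return matches[0], agent_ids
    # Zero matches means not found; two or more means the name is ambiguous.
    return None, agent_ids


records = [
    {"agent_id": "a1", "agent_name": "HOUMAO-gpu"},
    {"agent_id": "a2", "agent_name": "HOUMAO-gpu"},
    {"agent_id": "a3", "agent_name": "HOUMAO-cpu"},
]
print(match_by_canonical_name(records, "HOUMAO-gpu"))  # (None, ['a1', 'a2'])
```

Direct lookup by agent_id avoids this scan entirely, which is why it remains the authoritative path.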
Remove And Cleanup Ownership Boundaries¶
Two cleanup paths exist:
- targeted removal during authoritative runtime teardown,
- stale-directory cleanup through houmao-mgr admin cleanup registry.
Targeted teardown removal is guarded by generation_id:
- if the stored record still belongs to another generation, the caller does not remove it.
Stale cleanup is broader:
- missing record.json,
- malformed records,
- expired records beyond the cleanup grace period
are removal candidates regardless of who originally created them.
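The two removal paths can be contrasted in a short sketch. The function names are hypothetical, the grace period is passed in as a parameter because its actual value is not stated on this page, and the record is assumed to carry an ISO 8601 lease_expires_at field.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path


def may_remove_on_teardown(stored_generation_id: str, caller_generation_id: str) -> bool:
    """Targeted teardown only removes a record owned by the caller's generation."""
    return stored_generation_id == caller_generation_id


def is_cleanup_candidate(agent_dir: Path, now: datetime, grace: timedelta) -> bool:
    """Stale cleanup ignores ownership and looks only at the record's state."""
    record_path = agent_dir / "record.json"
    try:
        payload = json.loads(record_path.read_text(encoding="utf-8"))
        expires = datetime.fromisoformat(payload["lease_expires_at"])
    except FileNotFoundError:
        return True  # missing record.json
    except (ValueError, KeyError, TypeError):
        return True  # malformed JSON, wrong shape, or unparseable timestamp
    if expires.tzinfo is None:
        return True  # naive timestamps count as malformed
    # Expired records become candidates only after the grace period has passed.
    return expires + grace < now
```

The asymmetry is the point: teardown must never delete another generation's live record, while cleanup only ever touches state that no live owner could be relying on.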
Current Implementation Notes¶
- Publication is lock-free and uses compare-then-replace semantics, so the design tolerates a narrow race window while still requiring the losing generation to stand down later.
- Storage-level resolution returning None is not the only failure mode: even when a record exists, name-based runtime control can still fail, because manifest and agent-definition validation happens after record lookup.
- Explicit --agent-def-dir overrides beat registry-published runtime.agent_def_dir.