Role-based Runbooks
This page gives concrete responsibilities and recurring actions by role.
Role Matrix
| Role | Primary Surfaces | Owns | Does Not Own |
|---|---|---|---|
| System Admin | Portal ops, control-plane APIs | policy defaults, admin approvals, security posture | day-to-day contributor node operations |
| Coordinator Owner | coordinator ops view, coordinator config | coordinator health, peer topology, queue behavior | user account auth lifecycle |
| Node Contributor | worker runtime, enrollment workflow | node uptime, registration token handling, local runtime health | global policy and treasury controls |
| Application User | portal user dashboard, wallet views | workload submission, account-level workflow | mesh internals and deployment policy |
System Admin Runbook
Daily
- Verify coordinator, inference, and control-plane health endpoints.
- Review pending approvals and blacklist changes.
- Validate that pricing and issuance windows have current recalculation data.
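
The health verification step above could be scripted roughly as follows. This is a minimal sketch, not the project's tooling: the endpoint URLs, the `/health` path, and the function names are assumptions made for illustration, and the check treats any HTTP 200 response as healthy.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Hypothetical URLs: replace with the deployment's real health routes.
HEALTH_ENDPOINTS: Dict[str, str] = {
    "coordinator": "http://localhost:8001/health",
    "inference": "http://localhost:8002/health",
    "control-plane": "http://localhost:8003/health",
}


def http_probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts, and non-2xx responses count as unhealthy.
        return False


def unhealthy_services(
    endpoints: Dict[str, str],
    probe: Callable[[str], bool] = http_probe,
) -> List[str]:
    """Return the sorted names of services whose probe failed."""
    return sorted(name for name, url in endpoints.items() if not probe(url))
```

Calling `unhealthy_services(HEALTH_ENDPOINTS)` returns an empty list when every service responds. The `probe` parameter is injectable so the logic can be exercised without network access.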
Weekly
- Review rollout controls and model source policy.
- Verify ledger and blacklist audit checks.
- Confirm disaster recovery and backup paths are current.
Incident
- Restrict high-risk paths using policy and approval gates.
- Isolate affected nodes or coordinators.
- Reconcile ledger/audit state before reopening traffic.
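
One way to gate the reopening step on reconciliation is sketched below. The data shape is an assumption: both the ledger and the audit record are modeled as hypothetical mappings from account ID to a balance in integer units, which may not match how the real ledger and audit state are stored.

```python
from typing import Dict, List


def ledger_mismatches(ledger: Dict[str, int], audit: Dict[str, int]) -> List[str]:
    """Return sorted account IDs whose ledger and audit balances differ.

    An account present on only one side is reported as a mismatch.
    """
    accounts = set(ledger) | set(audit)
    return sorted(a for a in accounts if ledger.get(a) != audit.get(a))


def safe_to_reopen(ledger: Dict[str, int], audit: Dict[str, int]) -> bool:
    """Allow traffic to reopen only when no mismatches remain."""
    return not ledger_mismatches(ledger, audit)
```

Integer units are used here so that balance comparisons are exact rather than subject to floating-point rounding.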
Coordinator Owner Runbook
Daily
- Check coordinator runtime health and queue behavior.
- Confirm peer discovery and mesh connectivity.
- Review assignment quality and worker failure rates.
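
The worker failure rate review above can be approximated with a small aggregation. This sketch assumes assignment results are available as `(worker_id, succeeded)` pairs and uses an illustrative 20% threshold; neither the record format nor the threshold comes from the coordinator itself.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def failure_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the per-worker failure rate from (worker_id, succeeded) pairs."""
    total: Counter = Counter()
    failed: Counter = Counter()
    for worker_id, succeeded in results:
        total[worker_id] += 1
        if not succeeded:
            failed[worker_id] += 1
    return {worker: failed[worker] / total[worker] for worker in total}


def workers_over_threshold(rates: Dict[str, float], threshold: float = 0.2) -> List[str]:
    """Return sorted worker IDs whose failure rate exceeds the threshold."""
    return sorted(worker for worker, rate in rates.items() if rate > threshold)
```

Workers returned by `workers_over_threshold` are candidates for closer inspection, not automatic removal; a worker with very few assignments can show a high rate by chance.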
Weekly
- Validate coordinator bootstrap/discovery configuration.
- Tune capacity and assignment controls.
- Review coordinator fee and treasury policy alignment with the System Admin.
Incident
- Pause new assignments if integrity is uncertain.
- Drain or isolate unhealthy workers.
- Resume in staged mode with increased observability.
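
The staged resume step can be pictured as a ramp that advances only while observed error rates stay low. The sketch below is illustrative: the stage fractions, the 5% error limit, and the `StagedResume` class are assumptions, not coordinator settings.

```python
from dataclasses import dataclass

# Hypothetical ramp: fraction of normal assignment volume allowed per stage.
STAGES = (0.0, 0.1, 0.5, 1.0)


@dataclass
class StagedResume:
    stage: int = 0                # index into STAGES; 0 means paused
    max_error_rate: float = 0.05  # assumed limit for advancing to the next stage

    @property
    def traffic_fraction(self) -> float:
        """Fraction of assignments currently allowed."""
        return STAGES[self.stage]

    def observe(self, error_rate: float) -> float:
        """Advance one stage on a healthy observation, or pause again on a bad one."""
        if error_rate > self.max_error_rate:
            self.stage = 0
        elif self.stage < len(STAGES) - 1:
            self.stage += 1
        return self.traffic_fraction
```

Dropping back to the paused stage on any bad observation is a deliberately conservative choice; a gentler policy could step back a single stage instead.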
Node Contributor Runbook
Initial setup
- Enroll node in portal.
- Start worker with registration token.
- Confirm node appears in pending/active lists.
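
The final confirmation step can be automated by polling until the node shows up. In this sketch, `get_status` is a hypothetical callable that returns the node's current status string; how that status is actually fetched from the portal is not specified here, and the status values other than `pending` and `active` are assumed.

```python
import time
from typing import Callable


def wait_until_listed(
    get_status: Callable[[], str],
    attempts: int = 10,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the node is reported as 'pending' or 'active'.

    Raises TimeoutError if the node is not listed after all attempts.
    """
    for attempt in range(attempts):
        status = get_status()
        if status in ("pending", "active"):
            return status
        if attempt < attempts - 1:
            sleep(delay_s)  # wait before the next poll
    raise TimeoutError(f"node not listed after {attempts} attempts")
```

The `sleep` parameter exists only so the function can be tested without real delays.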
Ongoing
- Keep runtime updated.
- Monitor local resource usage and scheduling constraints.
- Rotate credentials or tokens when required by policy.
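
A rotation policy based on token age can be checked with a few lines. This is a sketch under stated assumptions: the policy is expressed as a maximum age, the issue time is known to the contributor, and `needs_rotation` is a hypothetical helper, not part of the worker runtime.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_rotation(
    issued_at: datetime,
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True once the token has reached or exceeded its maximum age.

    Both datetimes should be timezone-aware to avoid comparison errors.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - issued_at >= max_age
```

For example, a token issued on 1 January with a 30-day policy is due for rotation from 31 January onward.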
Application User Runbook
Usage flow
- Authenticate and verify account.
- Submit workload from portal or integrated surface.
- Track status and review outputs.
- Manage wallet/credits for sustained usage.