fix: align dashboard session access with RBAC #82

Merged
jesse merged 13 commits from fix/dashboard-rbac-list-get-consistency into main 2026-05-18 02:04:53 +02:00
Owner

Summary

  • filter list_sessions with the same per-agent session access predicate used by get_session
  • bind dashboard SSE/WS connections to the persisted local-owner identity and refcount shared AgentRegistry entries
  • repair legacy local-owner grants once, document seed-file security, and surface dashboard list warnings without discarding sessions
  • add regression coverage across Rust integration tests and dashboard Playwright tests

Verification

  • cargo fmt --all --check
  • git diff --check
  • diff secret-pattern scan
  • cargo test --workspace
  • cargo clippy --workspace --all-targets -- -D warnings -D clippy::unwrap_used -D clippy::panic -D clippy::dbg_macro -D clippy::print_stdout -D clippy::print_stderr -D clippy::undocumented_unsafe_blocks -D clippy::wildcard_imports
  • npm test -- dashboard.spec.ts (in e2e/dashboard)
  • code-reviewer: no blocking findings
## Summary - filter list_sessions with the same per-agent session access predicate used by get_session - bind dashboard SSE/WS connections to the persisted local-owner identity and refcount shared AgentRegistry entries - repair legacy local-owner grants once, document seed-file security, and surface dashboard list warnings without discarding sessions - add regression coverage across Rust integration tests and dashboard Playwright tests ## Verification - cargo fmt --all --check - git diff --check - diff secret-pattern scan - cargo test --workspace - cargo clippy --workspace --all-targets -- -D warnings -D clippy::unwrap_used -D clippy::panic -D clippy::dbg_macro -D clippy::print_stdout -D clippy::print_stderr -D clippy::undocumented_unsafe_blocks -D clippy::wildcard_imports - npm test -- dashboard.spec.ts (in e2e/dashboard) - code-reviewer: no blocking findings
Two related defects converged on the same user-visible symptom: the dashboard
showed "Could not render - schema may have changed (please file a bug)" when a
user clicked a session their agent could not read.

Root cause (server, primary):
list_sessions called SessionStorage directly with no per-agent filter, while
get_session went through SessionAuthConfig::session_role_resolver. The two
endpoints disagreed on what each agent could see — the list returned every
session in SQLite, then get_session denied most of them.

Root cause (client, contributing):
schema.js assertSessionShape collapsed every non-JSON envelope into
schemaFallbackText, misreporting normal RBAC decisions as schema drift.

Server fix:
- New handle_list_sessions_filtered + SessionAccessCheck in cognix-mcp.
- Bridge in cognix-server/registration reads _cognix.caller_agent_id (already
  injected by auth.rs:568) and applies the same role-resolver as get_session.
- bound_session_id added to RegistrationServices so list mirrors auth-layer
  fallback (caller has implicit access to its own bound session).
- filtered N session(s) warning added when sessions are dropped — no silent
  partials.

Dashboard fix:
- New ERR_PERMISSION_DENIED tag in schema.js; assertSessionShape promotes
  [permission denied] text envelopes to a tagged Error instead of fallback.
- sessions.js selectSession catch renders a clear "You do not have access to
  this session" message + truncated server reason instead of the generic
  schemaFallbackText.
- assertThoughtsTextShape intentionally unchanged (preserves existing
  "header still renders + denial in #thought-cards" contract).

Also fixed in this branch: status.js race where renderStatus() asserted both
health endpoints with the eager || gate, briefly flashing "Schema changed"
on bootstrap (changed to &&; latestHealth/latestServerHealth retain values
across subsequent failures).

Tests:
- 2 unit tests in cognix-mcp covering filter on/off behavior.
- 1 Playwright test for get_session denial in dashboard.
- 1 Playwright test for the status bootstrap race.
- 14 RegistrationServices test sites updated with bound_session_id stub
  (filter inert without _cognix.caller_agent_id, preserves backward compat).

Verification:
- cargo test --workspace: passing
- cargo clippy with project deny list: clean
- cargo fmt: clean
- e2e/dashboard playwright: 27/27 passing
Generalizes the ERR_PERMISSION_DENIED tag (added in 82dd71b) into a
five-code taxonomy so dashboard renderers can distinguish RBAC denials,
missing resources, invalid input, tool failures, and genuine schema drift.

Before: every assert* helper threw `new Error(schemaFallbackText)` ("please
file a bug"), so RBAC denials and missing resources were misreported as
bugs.

After: classifyTextEnvelope() inspects the wire-side text prefix
([permission denied] / [invalid] / [error]) and throws an Error with a
typed `error.code`. Renderers branch on code; schemaFallbackText is
reserved for ERR_SCHEMA_DRIFT only.

Wire contract unchanged — the server still emits `[error] ...` / `[invalid] ...` /
`[permission denied] ...` text envelopes, which other clients depend on.

- schema.js: add ERR_NOT_FOUND, ERR_INVALID_ARG, ERR_TOOL_ERROR,
  ERR_SCHEMA_DRIFT + taggedError(code, msg) + schemaDriftError() factory.
  assertSessionListShape / assertSessionShape / assertThoughtsTextShape
  now route text envelopes through classifyTextEnvelope() before the
  shape check. [permission denied] passthrough in assertThoughtsTextShape
  preserved (documented inline — get_thoughts intentionally renders
  denials inline in #thought-cards instead of blowing away the header).

- sessions.js: factor renderPermissionDenied into renderTypedDetailMessage
  + four typed renderers (denied / not-found / invalid / tool-error) +
  a renderSessionLoadError dispatcher. selectSession's catch now delegates
  to the dispatcher. Unknown/untagged codes still fall through to
  renderFallback (the schemaFallbackText path) as the conservative default.

- status.js: import schemaDriftError + ERR_SCHEMA_DRIFT. renderStatus's
  internal throws use schemaDriftError() so the code propagates. catch
  branches on ERR_SCHEMA_DRIFT (pill="Schema changed") vs other codes
  (pill="Error") — design hedge for future asserts gaining envelope
  classification, so a tool error never gets mislabelled as schema drift.

Out of scope (deferred): charts.js / graph.js still use schemaFallbackText
raw — those panels render markdown blobs / mermaid graphs where the
prefix-classification pattern doesn't directly apply. Tracked for a
future pass.
Symmetric to the just-landed list_sessions/get_session RBAC alignment: an
authenticated caller distinct from local_owner could pass the create_session
auth check (their bound-session role authorises it), succeed, and then have
their own get_session(new_id) and list_sessions filter hide the very session
they just created. The session_role_resolver bound-session fallback does not
apply to a brand-new session_id, so the only way for the caller to see their
own session is to be in its permissions map.

handle_create_session now reads _cognix.caller_agent_id (already injected by
the auth layer for every authorised SessionMetadata call) and, when the
caller differs from local_owner, auto-grants them Primary on the new session
via the existing SessionRegistry::grant path. Stdio/Advisory callers
(caller == local_owner) skip the grant -- register_local_session already
places local_owner as Primary, and a self-grant would be redundant.

Adds three unit tests:
- RED-first repro that caller != local_owner gets Primary on the new session
- Backward-compat: caller == local_owner produces one permissions entry
- Backward-compat: missing _cognix.caller_agent_id falls back cleanly
Regression guard for PR #82 (dashboard RBAC drift). The auth layer's
session_role_resolver denied sessions the dashboard's caller did not
own, while list_sessions returned every row in storage. The two
endpoints now share a per-caller predicate via
handle_list_sessions_filtered; this test asserts they remain in
lockstep across a randomized fixture.

Coverage: 60 generated cases across session counts {0, 1, 5, 20},
roles {Primary, Observer, Reader, Analyst}, shared/unshared sessions,
and callers {owner, granted-agent, fresh-no-access}. Hand-rolled
xorshift64 PRNG with fixed seed — no proptest/quickcheck in workspace,
no new dependencies added.

RED/GREEN proof: temporarily swapping handle_list_sessions_filtered
for the unfiltered handle_list_sessions reproduces the bug; the
filtered path passes. Meta-assertions guard against generator
regressions that would make the property vacuously true.
Adds one positive-case test per typed error code introduced in the
schema.js taxonomy refactor, plus a regression guard for the
ERR_SCHEMA_DRIFT case (so the "please file a bug" copy can't get
accidentally repurposed for normal RBAC / not-found / invalid / tool
errors).

- mock-server.ts: add `state.textEnvelopes: Record<string, string>` knob
  that returns text envelopes with `isError: false`, mirroring the
  production server's `ToolResult::ok(format!("[error] ..."))` pattern.
  Distinct from `toolErrors` (which sets `isError: true` and causes
  rpc.js to throw before the envelope reaches the schema asserts —
  unsuitable for exercising the schema.js classifier).

- dashboard.spec.ts: 4 new tests, all driving get_session through the
  classifier:
    1. ERR_NOT_FOUND   — '[error] session not found: sess-A'
    2. ERR_INVALID_ARG — '[invalid] session_id must be a UUID'
    3. ERR_TOOL_ERROR  — '[error] storage backend unreachable'
    4. ERR_SCHEMA_DRIFT — malformedGetSession (regression guard:
       asserts schemaFallbackText IS shown for genuine drift, so the
       refactor cannot collapse the wording into the typed cases).

Each test asserts (a) the panel does NOT show schemaFallbackText for
non-drift cases and (b) the typed copy contains a stable substring that
distinguishes the class. Per-class wording is implementation-defined;
the tests assert on regex substrings rather than exact copy so cosmetic
changes don't break them.

Full dashboard.spec.ts now: 31 passing (was 27). Full e2e suite: 45
passing.
The `<meta http-equiv="Content-Security-Policy">` duplicated what the
server already sends via `ui/mod.rs::with_csp` HTTP header, and browsers
correctly skip `frame-ancestors` when it arrives via meta — emitting the
console warning the user reported. Two CSP sources also create drift
risk: any future hardening (nonces, additional directives) would have to
land in two places.
The dashboard's `list_sessions` was returning `{sessions:[], total:0,
warnings:["filtered 20 session(s) the caller is not authorised to read"]}`
even though 20 locally-owned sessions existed in storage. Two converged
gaps:

1. `transport/sse.rs::handle_mcp_sse` and `transport/websocket.rs::
   handle_connection` were minting a brand-new ephemeral `AgentSession::
   new(Primary)` per connection. That fresh UUID had no role on any
   stdio-created session, so the post-bridge RBAC filter (from the
   previous commits) correctly dropped every entry.
2. `setup.rs::uses_local_owner_bootstrap` returned true only for stdio,
   so even threading the registry's local_owner identity through to SSE
   would have collapsed: SSE was generating its own random `local_owner`
   UUID at every startup, distinct from the stdio-persisted one.

Fix:
- New `AgentSession::with_id` constructor so transports can bind to a
  pre-existing principal instead of always minting one.
- `SseTransport` and `WsTransport` now accept `local_owner_agent_id:
  Option<AgentId>`, threaded from `main.rs` via `run_transport`. When
  Some, every authenticated connection adopts that canonical identity.
- Refcount-aware cleanup in both transports: only retire the
  `AgentRegistry` entry when the last sharer disconnects. SSE uses its
  existing connections map; WS gains an `Arc<Mutex<HashMap<AgentId,
  usize>>>` for the same purpose. Ephemeral identities have refcount 1
  so legacy behaviour is preserved.
- `uses_local_owner_bootstrap` now returns true whenever storage is
  SQLite, not only for stdio. This is what makes the persisted seed
  file (~/.cognix/) the single source of truth for local_owner across
  every transport sharing the same DB.

Test coverage:
- `test_sse_binds_to_local_owner_when_configured` — pins the
  principal-binding contract.
- `test_sse_mints_ephemeral_agent_when_no_local_owner` — pins the
  back-compat fallback for deployments without a local owner.
- `test_sse_refcount_keeps_local_owner_when_one_of_two_closes` — pins
  the refcount guard so closing one dashboard tab never strands a
  concurrent one.

Existing tests updated to pass `None` for the new arg (status quo).
fix: harden dashboard session access
Some checks failed
CI / Detect Changes (pull_request) Successful in 12s
CI / Integration Tests (pull_request) Has been skipped
CI / Benchmarks (pull_request) Has been skipped
CI / Security Scan (pull_request) Successful in 21s
CI / Format (pull_request) Successful in 28s
CI / Check file lengths (pull_request) Failing after 31s
CI / Conventional Validation (pull_request) Successful in 47s
CI / Clean Build Sample 1 (pull_request) Has been skipped
CI / Clean Build Sample 3 (pull_request) Has been skipped
CI / Clean Build Sample 2 (pull_request) Has been skipped
CI / Clean Build Summary (pull_request) Has been skipped
CI / Test (pull_request) Failing after 39s
CI / Check (linux-aarch64 compile-validation) (pull_request) Successful in 1m27s
CI / Dashboard Browser (pull_request) Successful in 1m48s
CI / Documentation (pull_request) Successful in 2m12s
CI / Clippy (pull_request) Successful in 2m43s
CI / Audit (CVEs) (pull_request) Successful in 5m46s
CI / Deny (pull_request) Successful in 6m4s
CI / Dashboard UI Build (pull_request) Successful in 7m51s
CI / Coverage (80% gate) (pull_request) Successful in 8m44s
CI / RSS gate (P-15) (pull_request) Successful in 6m22s
CI / Build (release) (pull_request) Successful in 8m3s
CI / PR Size Check (pull_request) Successful in 10s
CI / CI Report (pull_request) Successful in 5s
CI / D-02 Clean Build Gate (pull_request) Successful in 3m27s
b3a8ff95c8
refactor: split oversized files to satisfy 800-line CI gate
Some checks failed
CI / Detect Changes (pull_request) Successful in 10s
CI / Deny (pull_request) Failing after 3s
CI / Integration Tests (pull_request) Has been skipped
CI / Benchmarks (pull_request) Has been skipped
CI / Security Scan (pull_request) Failing after 2s
CI / Check file lengths (pull_request) Failing after 2s
CI / Check (linux-aarch64 compile-validation) (pull_request) Failing after 3s
CI / Format (pull_request) Failing after 24s
CI / Conventional Validation (pull_request) Successful in 1m29s
CI / Clean Build Sample 1 (pull_request) Has been skipped
CI / Clean Build Sample 2 (pull_request) Has been skipped
CI / Clean Build Sample 3 (pull_request) Has been skipped
CI / Clean Build Summary (pull_request) Has been skipped
CI / Dashboard Browser (pull_request) Successful in 2m13s
CI / Documentation (pull_request) Successful in 3m35s
CI / D-02 Clean Build Gate (pull_request) Successful in 4m3s
CI / Clippy (pull_request) Successful in 4m19s
CI / Build (release) (pull_request) Has been skipped
CI / RSS gate (P-15) (pull_request) Has been skipped
CI / Dashboard UI Build (pull_request) Failing after 4m31s
CI / PR Size Check (pull_request) Successful in 17s
CI / Audit (CVEs) (pull_request) Successful in 8m35s
CI / Test (pull_request) Successful in 9m45s
CI / Coverage (80% gate) (pull_request) Successful in 10m44s
CI / CI Report (pull_request) Successful in 8s
9672706de0
Both registration/thought.rs (892 lines) and transport/sse.rs (831 lines)
exceeded the CI file-length limit, causing Test and Check file lengths jobs
to fail. Extract the test suite from thought.rs into thought_tests.rs (using
the same #[path] pattern as sse_tests.rs) and move stateless SSE helpers
(auth utils, loopback predicates, dashboard RPC helpers) into sse_utils.rs.

Result: thought.rs → 393 lines, sse.rs → 714 lines — both under 800.
fix: remove unused imports and reformat to pass CI
All checks were successful
CI / Detect Changes (pull_request) Successful in 16s
CI / Format (pull_request) Has been skipped
CI / Test (pull_request) Has been skipped
CI / Clippy (pull_request) Has been skipped
CI / D-02 Clean Build Gate (pull_request) Has been skipped
CI / Benchmarks (pull_request) Has been skipped
CI / Deny (pull_request) Has been skipped
CI / Check (linux-aarch64 compile-validation) (pull_request) Has been skipped
CI / Security Scan (pull_request) Has been skipped
CI / RSS gate (P-15) (pull_request) Has been skipped
CI / Coverage (80% gate) (pull_request) Has been skipped
CI / Integration Tests (pull_request) Has been skipped
CI / Dashboard UI Build (pull_request) Has been skipped
CI / Documentation (pull_request) Has been skipped
CI / Build (release) (pull_request) Has been skipped
CI / Check file lengths (pull_request) Has been skipped
CI / Dashboard Browser (pull_request) Has been skipped
CI / Audit (CVEs) (pull_request) Has been skipped
CI / PR Size Check (pull_request) Successful in 12s
CI / Conventional Validation (pull_request) Successful in 51s
CI / Clean Build Sample 1 (pull_request) Has been skipped
CI / Clean Build Sample 2 (pull_request) Has been skipped
CI / Clean Build Sample 3 (pull_request) Has been skipped
CI / Clean Build Summary (pull_request) Has been skipped
CI / CI Report (pull_request) Successful in 13s
89c07ebf41
Remove unused `JsonRpcRequest` and `ServerHealth` imports from sse.rs
that were left after the dashboard session access refactor. Run cargo fmt
to normalize line breaks in thought_tests.rs and thought.rs.
jesse merged commit 14e35f0091 into main 2026-05-18 02:04:53 +02:00
jesse deleted branch fix/dashboard-rbac-list-get-consistency 2026-05-18 02:04:54 +02:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
jesse/cognix!82
No description provided.