# Setup
Last updated: 2026-04-25
This is a living setup/status file for the local ChromeCard workspace at `/home/user/chromecard`.
Update this file whenever environment status or verified behavior changes.
## Repository Policy
- Treat `/home/user/chromecard/CR_SDK_CK-main` as read-only in this workflow.
- Do not add or modify helper/test scripts inside `CR_SDK_CK-main`.
- Keep host-side helper scripts at workspace root (`/home/user/chromecard`).
## Documentation Maintenance
- Canonical living status docs for this workspace are:
- `/home/user/chromecard/Setup.md`
- `/home/user/chromecard/Workplan.md`
- After each meaningful execution step, update at least:
- `Setup.md` for observed environment/runtime state
- `Workplan.md` for phase progress and next blocking action
- Keep helper script paths consistent in docs:
- `/home/user/chromecard/fido2_probe.py`
- `/home/user/chromecard/webauthn_local_demo.py`
- Treat `CR_SDK_CK-main/README_HOST.md` as historical reference until its script paths are aligned with this workspace policy.
## Scope
- Experimental ChromeCard connected over USB.
- Firmware source tree: `/home/user/chromecard/CR_SDK_CK-main`.
- Host-side FIDO2 demo tools:
- `/home/user/chromecard/fido2_probe.py`
- `/home/user/chromecard/webauthn_local_demo.py`
- Target runtime platform: Qubes OS with 3 AppVMs:
- `k_client` (browser + enrollment process)
- `k_proxy` (card-connected proxy/auth client)
- `k_server` (protected resource/backend)
## Planned Transport Evolution
- Current phase assumption: card is connected directly to `k_proxy` (USB).
- Future target: card is connected to a phone, and `k_proxy` performs validation through a wireless link to that phone.
- Design implication: keep authenticator transport behind an abstraction in `k_proxy` so USB-direct and phone-wireless backends can be swapped without changing client/server API contracts.
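A minimal sketch of that abstraction, assuming hypothetical class names (`AuthenticatorTransport`, `UsbHidTransport`, `PhoneLinkTransport`) that do not exist in the current code:
```python
# Illustrative only: a transport seam so k_proxy auth/session logic never
# depends on how the card is physically reached. Class names are hypothetical.
from abc import ABC, abstractmethod
import glob


class AuthenticatorTransport(ABC):
    """The only surface the proxy auth flow is allowed to depend on."""

    @abstractmethod
    def is_available(self) -> bool:
        """Return True if an authenticator is currently reachable."""

    @abstractmethod
    def verify_user(self, username: str) -> bool:
        """Run the card-backed check; return True on success."""


class UsbHidTransport(AuthenticatorTransport):
    """Current phase: card attached directly to k_proxy over USB HID/CTAPHID."""

    def is_available(self) -> bool:
        return bool(glob.glob("/dev/hidraw*"))

    def verify_user(self, username: str) -> bool:
        raise NotImplementedError("would drive CTAP2/WebAuthn over USB HID here")


class PhoneLinkTransport(AuthenticatorTransport):
    """Future phase: card attached to a phone, reached over a wireless link."""

    def is_available(self) -> bool:
        raise NotImplementedError("wireless link not designed yet")

    def verify_user(self, username: str) -> bool:
        raise NotImplementedError("wireless link not designed yet")
```
With this seam in place, swapping the backend becomes a constructor argument or config switch in `k_proxy`, with no change to the `k_client`/`k_server` API contracts.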
## Target Qubes Topology
- Base template for all AppVMs: `debian-13-xfce`.
- Allowed network paths:
- `k_client` -> `k_proxy` over TLS
- `k_proxy` -> `k_server` over TLS
- Response traffic returns on those established connections.
- Disallowed direct path:
- `k_client` -> `k_server` (direct access should be blocked).
Functional roles:
- `k_client`:
- Browser-only traffic client.
- Runs a user enrollment process.
- `k_proxy`:
- Current: connected to the ChromeCard over USB.
- Future: connects wirelessly to phone-attached card for validation.
- Accepts TLS requests from `k_client`.
- Uses card-backed FIDO2/WebAuthn operations to authenticate user/session.
- Calls `k_server` over TLS after successful authorization.
- Returns proxied data and session information to `k_client`.
- `k_server`:
- Hosts resource(s) requiring login via the proxy-mediated flow.
- Provides a dummy protected resource for early integration testing (a monotonically increasing counter).
- May hold user/session state logic needed for authorization decisions.
UI baseline for each AppVM (start-menu visible apps):
- Firefox
- XFCE Terminal
- File Manager
## Target Request Flow
1. `k_client` sends HTTPS request to `k_proxy`.
2. `k_proxy` validates/authenticates user via card-backed flow.
3. If allowed, `k_proxy` opens HTTPS request to `k_server` resource.
4. `k_server` responds to `k_proxy`.
5. `k_proxy` returns response payload to `k_client` plus session state.
6. Subsequent requests reuse session state so card auth is not required every request.
Implementation note:
- `k_proxy` does not need a full web server stack; a minimal TLS API service is sufficient.
- Session state should be integrity-protected (signed/encrypted token or server-side session ID) with TTL and revocation behavior defined.
- `k_proxy` and `k_server` must be safe under concurrent access (thread-safe state handling).
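As a reference point for the session-state requirement, a minimal sketch of the server-side session-ID variant with TTL and revocation (illustrative only; the Phase 2.5 session note later in this file records what the prototype actually implements):
```python
# Illustrative only: server-side session store with TTL and explicit revocation.
# The lock matters because the proxy handles concurrent requests.
import secrets
import threading
import time


class SessionStore:
    def __init__(self, ttl_seconds: int = 600):
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._sessions: dict[str, dict] = {}  # token -> {"username", "expires_at"}

    def create(self, username: str) -> str:
        token = secrets.token_urlsafe(32)  # opaque bearer token
        with self._lock:
            self._sessions[token] = {
                "username": username,
                "expires_at": time.time() + self._ttl,
            }
        return token

    def validate(self, token: str) -> str | None:
        """Return the username for a live session, or None if missing/expired."""
        now = time.time()
        with self._lock:
            entry = self._sessions.get(token)
            if entry is None or entry["expires_at"] < now:
                self._sessions.pop(token, None)
                return None
            return entry["username"]

    def revoke(self, token: str) -> None:
        with self._lock:
            self._sessions.pop(token, None)
```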
## Minimum Service Behavior (Current Target)
- `k_server`:
- Expose protected endpoint returning an increasing integer value (dummy resource).
- Increment behavior must remain correct under concurrent requests.
- Optionally expose/maintain user/session validation logic.
- `k_proxy`:
- Accept concurrent HTTPS requests from one or more `k_client` instances.
- Perform card-backed auth when no valid session is present.
- Cache and validate session state so repeated requests avoid card access until expiry.
- Forward authorized requests to `k_server` and return upstream data plus session info.
Thread-safety expectation:
- Shared mutable state (counter, session store, user state) must be protected against races.
- Parallel requests must not corrupt session records or return duplicate/skipped counter values caused by unsafe updates.
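A minimal sketch of the lock-guarded counter shape this implies (illustrative, not a copy of `k_server_app.py`); without the lock, two handler threads can read the same value and return a duplicate, or one increment can be lost:
```python
# Illustrative only: a counter whose read-modify-write is atomic, so concurrent
# handler threads cannot produce duplicate or skipped values.
import threading


class ProtectedCounter:
    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def next(self) -> int:
        with self._lock:
            self._value += 1
            return self._value
```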
## Test Topology Requirement
- Support concurrency testing from multiple simultaneous clients:
- multiple browser tabs/processes in one `k_client`, and/or
- multiple `k_client` AppVM instances if available.
- Validate both correctness and stability under load:
- session reuse works as intended
- unauthorized access stays blocked
- protected counter/resource remains consistent.
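A sketch of the kind of client-side fan-out check this calls for, assuming the HTTPS forward and endpoint shape described in the later session notes (the auth header and `value` field name are assumptions; the maintained tool is `phase65_concurrency_probe.py`, introduced under Phase 6.5 below):
```python
# Illustrative only: fire N parallel protected requests and check that returned
# counter values are unique and contiguous. URL, auth header, and field name
# are assumptions about the prototype API, not confirmed details.
import json
import ssl
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE = "https://127.0.0.1:9771"  # client-facing forward to k_proxy (assumed shape)
CA = "/home/user/chromecard/tls/phase2/ca.crt"
TOKEN = "session-token-from-a-prior-login"  # placeholder
CTX = ssl.create_default_context(cafile=CA)


def one_call(_: int) -> int:
    req = urllib.request.Request(
        BASE + "/resource/counter",
        method="POST",
        headers={"Authorization": f"Bearer {TOKEN}"},  # header name is an assumption
    )
    with urllib.request.urlopen(req, context=CTX, timeout=10) as resp:
        return int(json.loads(resp.read())["value"])  # field name is an assumption


def main(n: int = 20, workers: int = 8) -> None:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = sorted(pool.map(one_call, range(n)))
    assert len(set(values)) == n, "duplicate counter values -> unsafe updates"
    assert values[-1] - values[0] == n - 1, "gap in counter values -> lost update"
    print(f"ok: {n} unique, contiguous values {values[0]}..{values[-1]}")


if __name__ == "__main__":
    main()
```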
## Current Status Snapshot (2026-04-24)
- AppVM OS version is confirmed: Debian `13.4` (checked on `k_server`; `k_client` and `k_proxy` report the same).
- Python is available in the AppVMs: `Python 3.13.5`.
- `python3 /home/user/chromecard/fido2_probe.py --list` in `k_proxy` now detects ChromeCard on `/dev/hidraw0` (`vid:pid=4617:5`).
- HID raw device nodes are now visible in `k_proxy`:
- `/dev/hidraw0` -> `crw-rw----+`
- `/dev/hidraw1` -> `crw-------`
- `python3 /home/user/chromecard/fido2_probe.py --json` succeeds and returns CTAP2 `getInfo`:
- versions: `["FIDO_2_0"]`
- aaguid: `1234567890abcdef0123456789abcdef`
- options: `rk=false`, `up=true`, `uv=true`
- max_msg_size: `1024`
- Local WebAuthn demo (`http://localhost:8765` in `k_proxy`) succeeded:
- register: `ok=true`, `username=alice`, `credential_count=1`
- login/auth: `ok=true`, `username=alice`, `authenticated=true`
- Phase 5 prototype services are now available:
- `/home/user/chromecard/k_proxy_app.py`
- `/home/user/chromecard/k_server_app.py`
- `/home/user/chromecard/PHASE5_RUNBOOK.md`
- Remote VM access is now available via SSH/SCP aliases:
- command execution: `ssh <host> <cmd>`
- file copy to VM home: `scp <file> <host>:~`
- validated hosts: `k_client`, `k_proxy`, `k_server`
- `west` is not currently installed/in PATH: `west not found`.
- The checked-out `CR_SDK_CK-main` tree appears incomplete relative to the documented sysbuild role layout:
- missing: `mvp`, `setup`, `components`, `samples`
- `CR_SDK_CK-main/scripts/build_flash_mvp.sh` exists, but it expects the above role directories.
- Python helper scripts were intentionally moved out of `CR_SDK_CK-main/scripts` and are now maintained at workspace root.
- Qubes AppVM baseline is now up: `k_client`, `k_proxy`, `k_server` can start and have terminals running.
Implication:
- Live FIDO2 connectivity from `k_proxy` to ChromeCard is confirmed over USB HID/CTAPHID.
- Local browser WebAuthn register/login flow is confirmed working in `k_proxy`.
- We cannot currently run the documented firmware build/flash flow.
Session note (2026-04-24):
- Markdown tracking was reviewed and normalized around `Setup.md` + `Workplan.md` as the active, continuously updated execution record.
- AppVM template decision recorded: use `debian-13-xfce` for `k_client`, `k_proxy`, and `k_server`.
- VM start attempt failed with Xen toolstack error: `libxenlight have failed to create new domain 'k_client'`.
- VM start blocker was resolved by reducing VM memory to `400` MiB; all three AppVMs now start.
- Runtime check from VMs: Debian `13.4` and Python `3.13.5`; `k_proxy` still shows `no hidraw devices`.
- After USB assignment to `k_proxy`, `/dev/hidraw0` and `/dev/hidraw1` appeared.
- CTAP probe re-run succeeded with detected ChromeCard device and valid CTAP2 `getInfo` response.
- Local WebAuthn demo completed successfully for user `alice` (register + login).
- Phase 5 starter implementation added with session TTL, logout/invalidation, and proxy->server protected counter forwarding.
Session note (2026-04-24, doc maintenance):
- Top-level Markdown files were re-scanned: `PHASE5_RUNBOOK.md`, `Setup.md`, `Workplan.md`.
- `PHASE5_RUNBOOK.md` remains consistent with the current Phase 5 prototype paths and flow.
- No plan/setup drift was found requiring behavioral changes; docs remain aligned.
- SSH-based VM operation was validated for `k_client`, `k_proxy`, `k_server` (Debian `13.4` confirmed remotely).
- SCP file transfer to `k_proxy` home directory was validated with read-back.
Session note (2026-04-24, remote flow diagnostics):
- VM script staging gap found: `/home/user/chromecard/k_proxy_app.py`, `k_server_app.py`, and helper files were missing on AppVMs and were copied via `scp`.
- Services were started in VMs and verified locally:
- `k_proxy` local health OK on `127.0.0.1:8770` and `127.0.0.1:8771`
- `k_server` local health OK on `127.0.0.1:8780`
- Verified VM IPs during this run:
- `k_proxy`: `10.137.0.12`
- `k_server`: `10.137.0.13`
- `k_client`: `10.137.0.16`
- Current chain failure is network pathing/firewall:
- `k_client -> k_proxy` (`10.137.0.12:8771`) times out.
- `k_proxy -> k_server` (`10.137.0.13:8780`) times out.
- Proxy returns upstream error payload: `server unavailable: timed out`.
Session note (2026-04-24, markdown re-scan):
- Re-read top-level workspace Markdown files: `Setup.md`, `Workplan.md`, `PHASE5_RUNBOOK.md`.
- Re-skimmed source-tree reference docs in `CR_SDK_CK-main`, including `BUILD.md`, `README.md`, `README_HOST.md`, `RELEASE.md`, and `distribute_bundle.md`.
- Current workspace docs remain aligned with the verified execution record.
- Source-tree doc drift remains unchanged:
- `README_HOST.md` still points to `./scripts/fido2_probe.py` and `./scripts/webauthn_local_demo.py`.
- Active workspace policy continues to treat those paths as historical; maintained helper paths remain `/home/user/chromecard/fido2_probe.py` and `/home/user/chromecard/webauthn_local_demo.py`.
- Source-tree build docs continue to describe a full SDK layout with `mvp`, `setup`, `components`, and `samples`, which is still not present in the current local checkout snapshot.
Session note (2026-04-24, policy retry):
- Markdown re-scan was retried after local policy changes.
- Re-running the workspace doc scan with a non-login shell completed cleanly, without the earlier SSH/socat startup noise in command output.
Session note (2026-04-24, chain probe retry):
- Re-probed the Qubes access path for `k_client -> k_proxy -> k_server`.
- Local forwarded SSH listener ports still exist on the host:
- `0.0.0.0:2222` -> `qrexec-client-vm 'k_client' qubes.ConnectTCP+22`
- `0.0.0.0:2223` -> `qrexec-client-vm 'k_proxy' qubes.ConnectTCP+22`
- `0.0.0.0:2224` -> `qrexec-client-vm 'k_server' qubes.ConnectTCP+22`
- These forwarded SSH ports currently fail immediately:
- `ssh k_client` / `ssh k_proxy` / `ssh k_server` close immediately on localhost forwarded ports.
- Direct `qrexec-client-vm <target> qubes.ConnectTCP+22` returns `Request refused`.
- Chain ports are currently blocked at the same qrexec layer:
- `qrexec-client-vm k_proxy qubes.ConnectTCP+8770` -> `Request refused`
- `qrexec-client-vm k_server qubes.ConnectTCP+8780` -> `Request refused`
- This means the current blocker is active qrexec policy/service refusal for `qubes.ConnectTCP`, not the Python service code in `k_proxy_app.py` or `k_server_app.py`.
- Separate SSH config issue remains on the host:
- `/etc/ssh/ssh_config.d/20-systemd-ssh-proxy.conf` is still owned `root:root` but mode `777`, which causes OpenSSH to reject it as insecure on the normal login-shell path.
Session note (2026-04-25, post-restart probe):
- Correct client-facing proxy port is `8771` for the current split-VM chain checks.
- SSH to `k_proxy` is working again.
- `k_proxy` card visibility is restored after VM restart and card reconnect:
- `/dev/hidraw0` and `/dev/hidraw1` are present in `k_proxy`
- Current service state after restart:
- `k_proxy` has no listener on `127.0.0.1:8771`
- `k_server` has no listener on `127.0.0.1:8780`
- Current qrexec chain state after restart:
- `qrexec-client-vm k_proxy qubes.ConnectTCP+8771` -> `Request refused`
- `qrexec-client-vm k_server qubes.ConnectTCP+8780` -> `Request refused`
- Practical meaning:
- SSH and card attachment recovered
- phase-5 app services are not currently running in the VMs
- qrexec forwarding for the chain ports is still being refused
Session note (2026-04-25, service restart):
- `k_server_app.py` was restarted successfully in `k_server`:
- PID `1320`
- listening on `127.0.0.1:8780`
- `/health` returns `{"ok": true, "service": "k_server", ...}`
- `k_proxy_app.py` was restarted successfully in `k_proxy`:
- PID `2774`
- listening on `127.0.0.1:8771`
- `/health` returns `{"ok": true, "service": "k_proxy", "active_sessions": 0, ...}`
- Despite local service recovery, qrexec forwarding is still denied:
- `qrexec-client-vm k_proxy qubes.ConnectTCP+8771` -> `Request refused`
- `qrexec-client-vm k_server qubes.ConnectTCP+8780` -> `Request refused`
Session note (2026-04-25, markdown refresh):
- Re-read the active workspace markdown files:
- `Setup.md`
- `Workplan.md`
- `PHASE5_RUNBOOK.md`
- Corrected the Phase 5 runbook to distinguish the old same-VM quickstart from the current split-VM chain usage.
- Current documented client-facing proxy port for split-VM tests is `8771`.
- Current documented blocker remains unchanged:
- local service health inside `k_proxy` and `k_server` is good
- inter-VM forwarding via `qubes.ConnectTCP` is still refused
Session note (2026-04-25, Phase 2 HTTPS bring-up):
- Added direct TLS support to:
- `/home/user/chromecard/k_proxy_app.py`
- `/home/user/chromecard/k_server_app.py`
- Added local certificate generator:
- `/home/user/chromecard/generate_phase2_certs.py`
- Generated local CA and service certs at:
- `/home/user/chromecard/tls/phase2/ca.crt`
- `/home/user/chromecard/tls/phase2/k_proxy.crt`
- `/home/user/chromecard/tls/phase2/k_server.crt`
- Certificate generation was corrected to include subject key identifier and authority key identifier so Python TLS verification succeeds.
- Current validated HTTPS shape is Qubes-localhost forwarding, not raw VM-IP routing:
- in `k_client`: `qvm-connect-tcp 9771:k_proxy:8771`
- in `k_proxy`: `qvm-connect-tcp 9780:k_server:8780`
- `k_proxy` listens on `https://127.0.0.1:8771`
- `k_server` listens on `https://127.0.0.1:8780`
- `k_proxy` upstream is `https://127.0.0.1:9780`
- Verified HTTPS checks:
- `k_client -> k_proxy` `/health` over TLS succeeds with `--cacert /home/user/chromecard/tls/phase2/ca.crt`
- `k_proxy -> k_server` `/health` and `/resource/counter` over TLS succeed through the `9780` forwarder
- end-to-end `k_client -> k_proxy -> k_server` login + session reuse succeeded over HTTPS
- End-to-end verified results:
- login returned `ok=true` for `alice`
- first protected counter call returned value `1`
- second protected counter call returned value `2`
- session status remained valid after reuse
Session note (2026-04-25, Phase 2.5 ownership and concurrency):
- Current prototype state ownership is now explicit:
- `k_proxy` is authoritative for session state
- `k_server` is authoritative for protected resource state
- `k_client` is not authoritative for either session validity or counter/resource state
- Current session model in `k_proxy`:
- server-side in-memory session store only
- opaque bearer token generated by `secrets.token_urlsafe(32)`
- per-session fields are `username` and `expires_at`
- expiry is enforced in `k_proxy`; `k_server` does not validate client sessions directly
- Current resource model in `k_server`:
- in-memory monotonic counter guarded by a lock
- access allowed only when request arrives from `k_proxy` with the expected `X-Proxy-Token`
- Current concurrency model in code:
- both services use `ThreadingHTTPServer`
- `k_proxy` protects session-map mutations and garbage collection with a single lock
- `k_server` protects counter increments with a single lock
- TLS verification and upstream fetches happen outside the session lock in `k_proxy`
- Current runtime assumptions and limits:
- Qubes localhost forwarders are treated as transport plumbing, not as state authorities
- if `k_proxy` restarts, in-memory sessions are lost
- if `k_server` restarts, the in-memory counter resets
- the current shared `X-Proxy-Token` is a prototype trust mechanism, not a final authorization design
- Practical meaning:
- race-free behavior is currently defined for session CRUD and counter increments inside one process per VM
- persistence, distributed session authority, and multi-proxy/multi-server coordination are not implemented yet
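A sketch of the lock-scoping point above (hypothetical names; not a copy of `k_proxy_app.py`): the session lookup is a short critical section, and the TLS-verified upstream fetch happens after the lock is released so slow upstream calls do not serialize session handling:
```python
# Illustrative only: keep the session lock critical section short and do the
# upstream TLS fetch after releasing it, so slow upstream calls cannot
# serialize every session operation. Names are hypothetical.
import threading
import time

_session_lock = threading.Lock()
_sessions: dict[str, dict] = {}  # token -> {"username", "expires_at"}


def fetch_counter_from_k_server() -> int:
    """Placeholder for the pooled, TLS-verified HTTPS call to k_server."""
    raise NotImplementedError


def handle_protected_request(token: str) -> tuple[int, dict]:
    # 1) short critical section: look up and expiry-check the session
    with _session_lock:
        entry = _sessions.get(token)
        if entry is None or entry["expires_at"] < time.time():
            return 401, {"ok": False, "error": "invalid or expired session"}
        username = entry["username"]

    # 2) outside the lock: upstream fetch to k_server
    value = fetch_counter_from_k_server()
    return 200, {"ok": True, "username": username, "value": value}
```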
Session note (2026-04-25, Phase 6 client portal prototype):
- Added browser-facing client process:
- `/home/user/chromecard/k_client_portal.py`
- Current Phase 6 prototype shape:
- portal runs in `k_client` on `http://127.0.0.1:8766`
- portal keeps local enrolled username state in `k_client`
- portal calls `k_proxy` over the validated TLS forward `https://127.0.0.1:9771`
- Current local enrollment model:
- enrollment is a client-local username selection stored by the portal
- no dedicated server-side enrollment API exists yet
- Verified portal API flow in `k_client`:
- `GET /health` returns `ok=true`
- `POST /api/enroll` with `alice` succeeds
- `POST /api/login` succeeds and returns a proxy session token
- `POST /api/status` succeeds
- `POST /api/resource/counter` succeeds twice with upstream values `3` and `4`
- `POST /api/logout` succeeds
- Current implication:
- `k_client` now has a concrete client-side process instead of only runbook curls
- browser-facing flow is now available through the local portal
- next hardening step is to replace client-local enrollment with the intended enrollment contract and decide whether browser traffic should eventually talk to `k_proxy` directly or continue through a local client portal
Session note (2026-04-25, Phase 6 enrollment contract):
- Added proxy-side enrollment API and storage:
- `POST /enroll/register`
- `GET /enroll/status?username=<name>`
- persisted prototype store at `/home/user/chromecard/k_proxy_enrollments.json` in `k_proxy`
- Current enrollment authority is now `k_proxy`, not the `k_client` portal.
- Current portal behavior:
- portal enrollment calls `k_proxy` over TLS
- portal keeps only a preferred local username for convenience
- portal login now depends on proxy-side enrollment existing
- Verified behavior:
- direct proxy login for unenrolled `bob` returns `{"ok": false, "error": "user not enrolled", ...}`
- portal enrollment of `alice` succeeds and persists in proxy-side enrollment storage
- proxy enrollment status for `alice` returns `ok=true`
- portal login and protected counter access still succeed after enrollment
- Practical meaning:
- Phase 6 now has a real `k_client -> k_proxy` enrollment request path
- the remaining gap is not basic routing; it is deciding the final enrollment semantics and whether the browser should stay behind a local portal or talk to `k_proxy` directly
Session note (2026-04-25, browser target moved to k_proxy):
- `k_proxy` now serves the browser-facing portal UI directly on `/` over `https://127.0.0.1:9771`.
- `k_client_portal.py` is now a temporary bridge page:
- it points users to `https://127.0.0.1:9771/`
- it is no longer the primary browser target
- Verified direct browser/API target behavior from `k_client`:
- `GET https://127.0.0.1:9771/` returns the proxy portal HTML
- `GET https://127.0.0.1:9771/health` returns `ok=true`
- direct `POST /enroll/register` for `carol` succeeds
- direct `POST /session/login` for `carol` succeeds
- Current implication:
- browser traffic is now intended to go straight to `k_proxy`
- the `k_client` portal remains only as a temporary bridge/compatibility layer
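For reference, a minimal sketch of that direct `k_client -> k_proxy` call shape over the `9771` forward (the JSON field names are assumptions; the endpoints are the ones recorded in these notes):
```python
# Illustrative only: enroll then log in directly against k_proxy from k_client,
# over the validated TLS forward on 127.0.0.1:9771. JSON field names are assumed.
import json
import ssl
import urllib.request

BASE = "https://127.0.0.1:9771"
CTX = ssl.create_default_context(cafile="/home/user/chromecard/tls/phase2/ca.crt")


def post_json(path: str, payload: dict) -> dict:
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, context=CTX, timeout=60) as resp:
        return json.loads(resp.read())


print(post_json("/enroll/register", {"username": "carol"}))
print(post_json("/session/login", {"username": "carol"}))  # triggers card-backed auth
```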
Session note (2026-04-25, provisional enrollment hardening):
- The enrollment contract in `k_proxy` is now explicit but provisional.
- Current prototype enrollment rules:
- usernames are canonicalized to lowercase
- allowed username pattern is `3-32` chars using lowercase letters, digits, `.`, `_`, `-`
- optional `display_name` is allowed up to `64` chars
- enrollment create is create-only and duplicate create returns `user already enrolled`
- enrollment update is a separate operation
- enrollment delete is a separate operation and removes any active sessions for that username
- Current enrollment endpoints on `k_proxy`:
- `POST /enroll/register`
- `GET /enroll/status?username=<name>`
- `POST /enroll/update`
- `POST /enroll/delete`
- `GET /enroll/list`
- Verified behavior from `k_client` against `https://127.0.0.1:9771`:
- invalid username `A!` is rejected
- create for `dave` with `display_name` succeeds
- duplicate create for `dave` is rejected
- update for `dave` succeeds
- list returns enrolled users and metadata
- delete for `dave` succeeds
- login for deleted `dave` fails with `user not enrolled`
- Deliberate current limit:
- enrollment itself still does not require card presence; only login does
- this was kept lightweight because the enrollment semantics are expected to change later
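A sketch reconstructing those validation rules (the regex is derived from the stated rules, not copied from `k_proxy_app.py`):
```python
# Illustrative only: reconstruction of the stated enrollment validation rules.
import re

# 3-32 chars: lowercase letters, digits, '.', '_', '-'
_USERNAME_RE = re.compile(r"^[a-z0-9._-]{3,32}$")


def canonicalize_username(raw: str) -> str | None:
    """Lowercase the input and return it only if it matches the allowed pattern."""
    name = raw.strip().lower()
    return name if _USERNAME_RE.fullmatch(name) else None


def validate_display_name(raw: str) -> bool:
    """Optional display_name: allowed up to 64 chars."""
    return len(raw) <= 64
```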
Session note (2026-04-25, Phase 6.5 concurrency probe):
- Added reproducible concurrency probe:
- `/home/user/chromecard/phase65_concurrency_probe.py`
- probe now supports `--max-workers` so client-side fan-out can be swept explicitly
- Successful baseline run from `k_client` against direct proxy path:
- `3` users
- `4` protected requests per user
- `12/12` requests succeeded
- counter values were unique and contiguous from `6` to `17`
- max observed latency was about `457 ms`
- Larger follow-up run exposed current limit:
- `5` users
- `5` protected requests per user
- `18/25` requests succeeded
- failures returned TLS EOF / upstream unavailable errors
- successful counter values were still unique and contiguous from `18` to `35`
- max observed latency was about `758 ms`
- Additional Phase 6.5 diagnosis:
- fixed a keep-alive/body-drain bug in the HTTP/1.1 experiment so `k_server` no longer misparses follow-on requests as `{}POST`
- added an upstream connection pool in `k_proxy`; current default/test setting clamps `k_proxy -> k_server` to one pooled TLS connection
- despite that change, a full fan-out run with `25` in-flight protected calls still fails on client-observed TLS EOFs
- a worker-limited run now passes cleanly:
- `5` users
- `5` protected requests per user
- `25/25` requests succeeded with `--max-workers 10`
- raising client-side fan-out still breaks:
- `22/25` requests succeeded with `--max-workers 15`
- `15/25` requests succeeded with fully unbounded `25` workers in the latest rerun
- Current diagnosis:
- the protected counter and session logic stay correct under load; successful values remain unique and contiguous
- `k_proxy` and `k_server` can complete the requests that actually reach them
- the primary collapse point in current testing is the client-facing Qubes forwarder on `9771`
- `qvm_connect_9771.log` shows `qrexec-agent-data` / data-vchan failures and repeated `xs_transaction_start: No space left on device`
- `qvm_connect_9780.log` also showed earlier qrexec failures, but the latest worker-threshold evidence points first to connection fan-out on `k_client -> k_proxy`
- Practical meaning:
- the application logic is good for moderate concurrent use in the current prototype
- the direct browser path appears stable around `10` in-flight protected calls in the current Qubes setup
- the current concurrency ceiling is being set by Qubes forwarding behavior rather than by the monotonic counter logic
Session note (2026-04-25, in-VM forwarding test):
- Tested the intended in-VM forwarding path with `qvm-connect-tcp` instead of host-side `qrexec-client-vm`.
- Forwarders start and bind locally:
- in `k_client`: `qvm-connect-tcp 8771:k_proxy:8771` binds `localhost:8771`
- in `k_proxy`: `qvm-connect-tcp 8780:k_server:8780` binds `localhost:8780`
- But the actual client->proxy connection is still refused when used:
- `k_client` forward log shows `Request refused`
- `socat` reports child exit status `126` and `Connection reset by peer`
- Local login on `k_proxy` reaches the app but fails on the auth dependency:
- `POST /session/login` to `http://127.0.0.1:8771` returns `401`
- details: `Missing dependency: python-fido2 ... No module named 'fido2'`
- `k_server` was not reached during this login test; current `k_server.log` only shows `/health`.
Session note (2026-04-25, after python3-fido2 install):
- `k_proxy` was restarted after `python3-fido2` installation and now listens again on `127.0.0.1:8771`.
- The previous Python import blocker is resolved; local login now reaches the CTAP probe path.
- Current local login result on `k_proxy`:
- `{"ok": false, "error": "card auth failed", "details": "No CTAP HID devices found."}`
- Current forwarded login result from `k_client` is still not completing:
- `curl http://127.0.0.1:8771/session/login` -> `Empty reply from server`
- `qvm_connect_8771.log` still shows repeated `Request refused` and child exit status `126`
- Practical meaning:
- Python dependency issue in `k_proxy` is fixed
- card access inside `k_proxy` is currently missing again at CTAP/HID level
- `k_client -> k_proxy` qrexec forwarding is still effectively denied/refused
Session note (2026-04-25, card reattached):
- Card visibility in `k_proxy` is restored again:
- `/dev/hidraw0` and `/dev/hidraw1` present
- `fido2_probe.py --list` detects ChromeCard on `/dev/hidraw0`
- Local login on `k_proxy` now succeeds again:
- `POST /session/login` on `127.0.0.1:8771` returns `200`
- session creation for user `alice` succeeded
- Remaining failure is isolated to the client-facing qrexec path:
- `k_client` -> `localhost:8771` through `qvm-connect-tcp` still returns `Empty reply from server`
- `qvm_connect_8771.log` still shows `Request refused`
Session note (2026-04-25, clean forward retest):
- Re-ran both forwards and exercised each hop immediately after local bind.
- `k_proxy -> k_server`:
- `qvm-connect-tcp 8780:k_server:8780` binds `localhost:8780` in `k_proxy`
- first real `POST /resource/counter` through that forward returns `Empty reply from server`
- `qvm_connect_8780.log` then records `Request refused` with child exit status `126`
- `k_client -> k_proxy`:
- `qvm-connect-tcp 8771:k_proxy:8771` binds `localhost:8771` in `k_client`
- first real `POST /session/login` through that forward returns `Empty reply from server`
- `qvm_connect_8771.log` records `Request refused` with child exit status `126`
- Conclusion from this retest:
- both forwards fail in the same way
- local bind succeeds, but the actual qrexec `qubes.ConnectTCP` request is refused when the first connection is attempted
Session note (2026-04-25, dom0 policy fix validated):
- After changing dom0 policy to use explicit destination VMs instead of `@default` for `qubes.ConnectTCP`, both forwards now work.
- Verified hop 1:
- in `k_proxy`, `POST http://127.0.0.1:8780/resource/counter` with `X-Proxy-Token: dev-proxy-token` succeeds
- response included counter value `1`
- Verified hop 2:
- in `k_client`, `POST http://127.0.0.1:8771/session/login` succeeds
- session token is returned through the `k_client -> k_proxy` forward
- Verified full end-to-end flow from `k_client`:
- login succeeded and returned session token
- `POST /session/status` succeeded
- `POST /resource/counter` succeeded twice with upstream values `2` and `3`
- `POST /session/logout` succeeded
- post-logout `POST /resource/counter` correctly returned `401 invalid or expired session`
- Current conclusion:
- `k_client -> k_proxy -> k_server` chain is operational
- session reuse and logout behavior are working in the current prototype
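For reproducibility, one plausible shape of that dom0 policy change (the policy file name and exact rule spelling are assumptions; the key point is naming `k_proxy`/`k_server` as explicit destinations instead of `@default`):
```
# dom0, e.g. /etc/qubes/policy.d/30-chromecard.policy  (file name is an assumption)
qubes.ConnectTCP  +8771  k_client  k_proxy   allow
qubes.ConnectTCP  +8780  k_proxy   k_server  allow
```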
Session note (2026-04-25, live chain re-validation and regression helper):
- Re-validated the split-VM chain after restart using the current TLS/localhost-forward shape:
- `k_client` local `9771` -> `k_proxy:8771`
- `k_proxy` local `9780` -> `k_server:8780`
- Verified live service state during this run:
- `k_server` local `https://127.0.0.1:8780/health` returned `ok=true`
- `k_proxy` local `https://127.0.0.1:8771/health` returned `ok=true`
- `k_proxy` local `https://127.0.0.1:9780/health` reached `k_server`
- `k_client` local `https://127.0.0.1:9771/health` reached `k_proxy`
- Verified end-to-end behavior from `k_client`:
- login for `alice` succeeded
- session status succeeded
- protected counter calls succeeded with session reuse
- logout succeeded
- post-logout protected access returned `401 invalid or expired session`
- Added reproducible regression helper at:
- `/home/user/chromecard/phase5_chain_regression.sh`
- Verified the new helper end-to-end on 2026-04-25:
- default run uses `20` requests at parallelism `8`
- returned values were unique and gap-free
- latest verified counter range from the helper was `43..62`
- Practical meaning:
- the current blocker is no longer Qubes forwarding for the base Phase 5 chain
- the current next-step gap is auth semantics, not transport bring-up
Session note (2026-04-25, direct FIDO2 auth attempt):
- Added an experimental direct FIDO2 path in `/home/user/chromecard/k_proxy_app.py`:
- runtime switch: `--auth-mode fido2-direct`
- default runtime remains `probe`
- Added a low-level CTAP helper at `/home/user/chromecard/raw_ctap_probe.py`:
- purpose: bypass `Fido2Client` and exercise raw CTAP2 `makeCredential` / `getAssertion`
- logs keepalive callbacks and exact transport exceptions for host-side debugging
- Direct-mode intent:
- replace the legacy `fido2_probe.py --json` session gate
- perform real credential registration and real assertion verification locally in `k_proxy` with `python-fido2`
- Current observed blocker on `k_proxy`:
- direct `make_credential` fails with `No compatible PIN/UV protocols supported!`
- reproduces outside the app in a minimal VM-side probe, so this is not just a handler bug
- likely cause is the current card / `python-fido2` stack selecting a PIN/UV-dependent CTAP2 path for registration
- Additional probe:
- a forced CTAP1 fallback experiment did not fail immediately, but also did not complete quickly enough to treat as a usable working path in this session
- Latest live blocker (2026-04-25, after refactor/deploy):
- direct probing is currently blocked before the card Yes/No UI stage because `k_proxy` no longer sees any CTAP HID device
- `ssh k_proxy "python3 /home/user/chromecard/fido2_probe.py --list"` now returns `No CTAP HID devices found.`
- `ssh k_proxy "ls -l /dev/hidraw*"` shows no `hidraw` nodes at the moment
- Follow-up after card reattach (2026-04-25):
- `k_proxy` again shows `/dev/hidraw0` and `/dev/hidraw1`
- direct node-open check confirms `/dev/hidraw0` is readable as the normal user
- `/dev/hidraw1` still returns `PermissionError: [Errno 13] Permission denied`
- raw `makeCredential` probe still produced no on-card registration prompt, so the host path is hanging before the firmware Yes/No UI
- Practical outcome for this session:
- the experimental direct mode is kept in code for follow-up work
- the deployed `k_proxy` service was restored to default `probe` mode
- verified `alice` login still works afterward, so the validated Phase 5 baseline remains intact
## Known FIDO2 Transport Boundary
- FIDO2 on this firmware is handled via USB HID (CTAPHID), not Wi-Fi/BLE/MQTT.
- Key code points in `CR_SDK_CK-main`:
- `mgr_fido2.c`: `mgr_fido2_init()` registers `fido2_ctaphid_handle_packet`.
- `ctaphid.c`: `fido2_ctaphid_handle_packet(...)`.
- `cr_config.h`: FIDO2 HID report descriptor definitions.
## Host Bring-Up Steps (How To Get To A Working FIDO2 Check)
1. Confirm USB enumeration and HID visibility.
- Replug card with a known data-capable cable.
- Check: `ls -l /dev/hidraw*`
2. If needed, grant Linux HID access for this device.
- Add rule at `/etc/udev/rules.d/70-chromecard-fido.rules`:
```udev
SUBSYSTEM=="hidraw", ATTRS{idVendor}=="1209", ATTRS{idProduct}=="0005", MODE="0660", TAG+="uaccess"
```
- Reload/apply rules and replug the device.
3. Verify CTAP HID presence.
- `python3 /home/user/chromecard/fido2_probe.py --list`
- Then:
- `python3 /home/user/chromecard/fido2_probe.py --json`
- For raw CTAP debugging on `k_proxy`:
- `python3 /home/user/chromecard/raw_ctap_probe.py info`
- `python3 /home/user/chromecard/raw_ctap_probe.py make-credential --rp-id localhost`
4. Run local WebAuthn bring-up demo.
- `python3 /home/user/chromecard/webauthn_local_demo.py`
- Open `http://localhost:8765` (use `localhost`, not `127.0.0.1`).
5. Execute register/login test.
- Register a user.
- Login with the same user.
- Confirm no origin/challenge mismatch errors.
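As a reference for step 3, a minimal `python-fido2` sketch of what the CTAP HID presence/`getInfo` check amounts to (this is not the contents of `fido2_probe.py`; it assumes `python3-fido2` is installed in the VM):
```python
# Illustrative only: enumerate CTAP HID devices and read CTAP2 getInfo,
# roughly what `fido2_probe.py --list` / `--json` report.
from fido2.hid import CtapHidDevice
from fido2.ctap2 import Ctap2

devices = list(CtapHidDevice.list_devices())
if not devices:
    raise SystemExit("No CTAP HID devices found.")

for dev in devices:
    print("device:", dev)
    info = Ctap2(dev).get_info()
    print("  versions:", info.versions)
    print("  options:", info.options)
```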
## Build/Flash Prerequisites (How To Get To Firmware Build)
1. Ensure full SDK checkout layout exists under `CR_SDK_CK-main`:
- `mvp`
- `setup`
- `components`
- `samples`
2. Ensure toolchain is available in shell:
- `west --version`
- `nrfjprog --version`
3. Once layout/tooling are in place, run:
- `cd /home/user/chromecard/CR_SDK_CK-main`
- `./scripts/build_flash_mvp.sh`
## Open Gaps To Resolve
- Whether a full `CR_SDK_CK-main` checkout (with role directories) is available locally.
- Whether server-side code should be pulled now for broader CIP/WebAuthn integration testing.
- Exact enrollment process interface running in `k_client` and how it reaches `k_proxy`.
- Upgrade Phase 5 auth gate from card-presence probe to full WebAuthn assertion verification for session creation.
- Determine the viable path for real credential registration on `k_proxy`:
- enable whatever PIN/UV support the card expects for direct CTAP2 registration, or
- adopt a different one-time enrollment path that can persist real credential material for later direct assertion verification.
- Restore card visibility inside `k_proxy` so direct probes can reach the card UI again:
- `/dev/hidraw*` must exist in `k_proxy`
- `fido2_probe.py --list` must detect the card before the raw Yes/No probe can continue
- Identify why the host probe hangs before card UI even with `/dev/hidraw0` readable:
- determine which hidraw interface `python-fido2` is selecting on `k_proxy`
- determine whether the blocked path is on the second HID interface or in the Qubes USB mediation layer
- Precise ownership split of session/user state between `k_proxy` and `k_server`.
- Concrete concurrency limits and acceptance criteria (requests/sec, parallel clients, latency/error thresholds).