# Workplan

Last updated: 2026-05-08

This is the execution plan for making ChromeCard FIDO2 development and validation reproducible on this machine.

## Constraints

- Treat `/home/user/chromecard/CR_SDK_CK-main` as read-only.
- Keep helper scripts such as `fido2_probe.py` and `webauthn_local_demo.py` at `/home/user/chromecard`.
- Target deployment model is Qubes OS with 3 AppVMs based on `debian-13-xfce`: `k_client`, `k_proxy`, `k_server`.
- Current authenticator link is card->`k_proxy` (USB), but architecture must allow migration to wireless phone-mediated validation.
- VM execution path is SSH-first for experiments: `ssh <host> <cmd>` and `scp <file> <host>:~`.

## Goals

- Re-establish deterministic host-to-card FIDO2 communication over USB HID/CTAPHID.
- Restore a buildable/flashable firmware workspace for `CR_SDK_CK-main`.
- Turn ad-hoc demos into a repeatable verification flow.
- Stand up chained TLS communication in Qubes: `k_client -> k_proxy -> k_server`.
- Support both login flow (browser in `k_client`) and user enrollment flow (process in `k_client`).
- Minimize repeated card prompts by introducing secure session reuse after successful authentication.
- Implement a protected dummy resource on `k_server` (monotonic counter) for end-to-end validation.
- Ensure `k_proxy` and `k_server` are thread-safe and support concurrent access.
- Prepare `k_proxy` auth path for future transport shift: USB-direct -> wireless phone bridge.

## Phase 0: Qubes VM Baseline (Blocking)

1. Provision/verify AppVMs.
   - Ensure `k_client`, `k_proxy`, `k_server` exist and are based on `debian-13-xfce`.

2. Assign functional responsibilities.
   - `k_client`: browser client + enrollment process.
   - `k_proxy`: USB card access + proxy/auth bridge.
   - `k_server`: protected resource/service endpoint.

3. Define TLS endpoints and certificates.
   - `k_proxy` presents TLS service to `k_client`.
   - `k_server` presents TLS service to `k_proxy`.
   - Trust roots and cert distribution model documented per VM.

Exit criteria:
- All 3 VMs exist, boot, and have clearly defined service ownership.

## Phase 1: Qubes Firewall Policy

1. Enforce allowed forward paths only.
   - Allow `k_client` outbound TLS only to `k_proxy` service port(s).
   - Allow `k_proxy` outbound TLS only to `k_server` service port(s).
   - Deny direct `k_client` to `k_server` traffic.

2. Validate return path behavior.
   - Confirm responses propagate back through established flows.

3. Verify with simple probes.
   - TLS handshake and HTTP(S) checks from `k_client` to `k_proxy`.
   - TLS handshake and HTTP(S) checks from `k_proxy` to `k_server`.

Exit criteria:
- Policy matches intended chain and is test-verified.

Status (2026-04-24, remote diagnostics):
- Confirmed active blocker remains Phase 1 network policy/pathing.
- Evidence from live VM probes:
  - `k_client (10.137.0.16) -> k_proxy (10.137.0.12:8771)`: TCP timeout.
  - `k_proxy (10.137.0.12) -> k_server (10.137.0.13:8780)`: upstream timeout.
- Local service health inside each VM is good, so failure is inter-VM reachability, not local process startup.

Status (2026-04-25, after restart and service recovery):
- Refined blocker: this is currently a qrexec/`qubes.ConnectTCP` refusal problem, not an app-local listener problem.
- Current evidence:
  - `k_proxy` local `/health` is up on `127.0.0.1:8771`
  - `k_server` local `/health` is up on `127.0.0.1:8780`
  - `qrexec-client-vm k_proxy qubes.ConnectTCP+8771` -> `Request refused`
  - `qrexec-client-vm k_server qubes.ConnectTCP+8780` -> `Request refused`
- Immediate next action for Phase 1:
  - verify and fix the dom0 policy/mechanism that should permit `qubes.ConnectTCP` forwarding for the chain ports

Status (2026-04-25, dom0 policy fix validated):
- The forwarding blocker is cleared for the current prototype shape.
- Verified working chain:
  - `k_client` localhost `9771` -> `k_proxy:8771`
  - `k_proxy` localhost `9780` -> `k_server:8780`
- Verified outcome:
  - TLS health checks pass on both hops
  - end-to-end login, session status, protected counter access, and logout all succeed from `k_client`
- Phase 1 is complete for the current localhost-forwarded `qubes.ConnectTCP` design.

## Phase 2: TLS Certificates and Service Endpoints

1. Certificate model.
   - Create or import CA and issue certs for `k_proxy` and `k_server`.
   - Install trust roots in client VM(s) that need validation.

2. Service shape.
   - `k_server`: HTTPS service exposing protected resource endpoint(s), including a monotonic counter endpoint.
   - `k_proxy`: minimal HTTPS API gateway service (full web server framework not required).

3. Endpoint contract.
   - Define request/response schema between `k_client` and `k_proxy`.
   - Define upstream request contract from `k_proxy` to `k_server`.

Exit criteria:
- Mutual TLS trust decisions are documented and tested.
- HTTPS calls succeed on both links with expected cert validation.

Status (2026-04-25):
- Implemented HTTPS listeners in both prototype services.
- Added local CA + service certificate generation in `generate_phase2_certs.py`.
- Verified the working Qubes path is localhost forwarding plus TLS:
  - `k_client` local `9771` forwards to `k_proxy:8771`
  - `k_proxy` local `9780` forwards to `k_server:8780`
- Verified cert validation on both hops using the generated CA.
- Verified end-to-end HTTPS flow:
  - `k_client -> k_proxy` login over TLS
  - `k_proxy -> k_server` protected counter call over TLS
  - session reuse still works across repeated protected requests
- Phase 2 is now effectively complete for the current prototype shape.
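The per-hop TLS health check with CA validation can be approximated with the standard library alone. The following is a minimal sketch, not the code used in the prototype: the function name `probe_tls`, the returned dictionary shape, and any CA file path or certificate hostname passed to it are assumptions for illustration.

```python
import socket
import ssl


def probe_tls(host, port, cafile=None, server_hostname=None, timeout=3.0):
    """Attempt one TLS handshake and report the outcome as a dict."""
    try:
        # The default context verifies both the certificate chain and the hostname.
        context = ssl.create_default_context(cafile=cafile)
        with socket.create_connection((host, port), timeout=timeout) as raw:
            name = server_hostname or host
            with context.wrap_socket(raw, server_hostname=name) as tls:
                return {"ok": True, "tls_version": tls.version(), "error": None}
    except OSError as exc:
        # OSError covers refused connections, timeouts, a missing CA file,
        # and ssl.SSLError (including certificate verification failures).
        return {"ok": False, "tls_version": None,
                "error": f"{type(exc).__name__}: {exc}"}
```

From `k_client`, a call such as `probe_tls("127.0.0.1", 9771, cafile=<generated CA>, server_hostname=<name on the k_proxy cert>)` would exercise the first hop; the same call against `9780` from `k_proxy` would exercise the second. Because the forwarders terminate on localhost, `server_hostname` has to match whatever name the service certificate actually carries.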

## Phase 2.5: Define State Ownership and Concurrency Model

1. State ownership.
   - Decide where user/session state is authoritative (`k_proxy`, `k_server`, or split model).
   - Define token/session format and validation boundary.

2. Concurrency controls.
   - Define thread-safe strategy for session store and shared counters.
   - Define locking/atomic/update semantics for counter increments and session updates.

3. Runtime model.
   - Choose service runtime/config that supports simultaneous requests safely.

Exit criteria:
- Architecture clearly documents state authority and race-free update rules.

Next action (2026-04-25):
- Move into Phase 2.5 and make the current prototype decisions explicit:
  - authority for session state remains `k_proxy`
  - `k_server` remains authority for the protected counter/resource state
  - localhost Qubes forwarders are part of the active runtime model for the two TLS hops
  - define concurrency assumptions and limits around session store, forwarders, and counter access

Status (2026-04-25):
- Current ownership model is now explicit:
  - `k_proxy` is authoritative for session creation, expiry, lookup, and logout
  - `k_server` is authoritative for the protected monotonic counter
  - `k_client` is a client only; it holds bearer tokens but is not a state authority
- Current validation boundary is explicit:
  - `k_proxy` validates bearer tokens against its in-memory session store
  - `k_server` trusts only requests that arrive with the configured `X-Proxy-Token`
  - `k_server` does not currently validate end-user session tokens directly
- Current concurrency strategy is explicit:
  - `k_proxy` uses `ThreadingHTTPServer` plus one lock around the in-memory session map
  - `k_server` uses `ThreadingHTTPServer` plus one lock around counter increments
  - upstream HTTPS calls from `k_proxy` are made outside the session-store lock
- Current runtime limits are explicit:
  - sessions are process-local and disappear on `k_proxy` restart
  - counter state is process-local and resets on `k_server` restart
  - transport relies on Qubes localhost forwarders `9771` and `9780`
- Phase 2.5 is complete for the current prototype shape.
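The session-store rules above (one lock around an in-memory map, TTL expiry, upstream calls made outside the lock) can be illustrated with a short sketch. This is not the code in `k_proxy_app.py`; the class name `SessionStore`, the helper `handle_protected`, the default TTL, and the token format are all assumptions chosen for the example.

```python
import secrets
import threading
import time


class SessionStore:
    """In-memory, process-local session map guarded by a single lock."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._sessions = {}  # token -> (username, expires_at)

    def create(self, username):
        token = secrets.token_urlsafe(32)
        with self._lock:
            self._sessions[token] = (username, self._clock() + self._ttl)
        return token

    def lookup(self, token):
        """Return the username for a live token, or None if missing or expired."""
        with self._lock:
            entry = self._sessions.get(token)
            if entry is None:
                return None
            username, expires_at = entry
            if self._clock() >= expires_at:
                del self._sessions[token]  # expire lazily on access
                return None
            return username

    def logout(self, token):
        with self._lock:
            return self._sessions.pop(token, None) is not None


def handle_protected(store, token, call_upstream):
    """Validate under the lock, then call upstream with the lock released."""
    username = store.lookup(token)  # the lock is held only inside lookup()
    if username is None:
        return 401, None
    return 200, call_upstream(username)  # no lock held during the upstream call
```

Keeping the upstream call outside the lock means a slow `k_server` response delays only the request that triggered it, not every other session lookup.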

## Phase 3: Recover Basic Device Visibility on `k_proxy` (Blocking)

1. Verify physical + USB enumeration path.
   - Check cable/port and confirm device appears in USB listings.
   - Confirm `/dev/hidraw*` nodes appear when card is connected.

2. Validate Linux permissions.
   - Install/update udev rule for ChromeCard HID VID/PID.
   - Reload udev and verify non-root read/write access to hidraw node.

3. Re-run host probe.
   - Run `python3 /home/user/chromecard/fido2_probe.py --list`.
   - Run `python3 /home/user/chromecard/fido2_probe.py --json`.
   - Record VID/PID/path and CTAP2 `getInfo` output in `Setup.md`.

Exit criteria:
- At least one CTAP HID device is listed.
- `--json` returns valid `ctap2_info`.
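Before running the probe, the "non-root read/write access to hidraw node" check can be scripted. The sketch below only inspects permissions for the current user; the function names are illustrative and it does not identify which node is the FIDO interface.

```python
import glob
import os


def hidraw_access(pattern="/dev/hidraw*"):
    """Map each matching device node to the current user's read/write access."""
    report = {}
    for path in sorted(glob.glob(pattern)):
        report[path] = {
            "read": os.access(path, os.R_OK),
            "write": os.access(path, os.W_OK),
        }
    return report


def usable_nodes(report):
    """Nodes the current user can open for both reading and writing."""
    return [path for path, acc in report.items() if acc["read"] and acc["write"]]
```

An empty result from `hidraw_access()` points at enumeration (step 1), while nodes that exist but are missing from `usable_nodes()` point at the udev rule (step 2).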

## Phase 4: Re-validate Local WebAuthn Demo on `k_proxy`

1. Start local demo server.
   - Run `python3 /home/user/chromecard/webauthn_local_demo.py`.
   - Confirm URL is `http://localhost:8765`.

2. Exercise register/login.
   - Register a test user.
   - Authenticate with same user.
   - Capture errors (if any) and update `Setup.md`.

3. Decide next demo hardening step.
   - Keep bring-up-only mode, or
   - add signature verification for attestation/assertion.

Exit criteria:
- Register and login both complete with card interaction prompts.

Status (2026-04-24):
- Completed in `k_proxy` using `http://localhost:8765`.
- Registration result: `ok=true`, `username=alice`, `credential_count=1`.
- Authentication result: `ok=true`, `username=alice`, `authenticated=true`.

## Phase 5: Implement Proxy Auth + Session Reuse

1. Authenticate via card once per session window.
   - `k_proxy` handles initial auth using connected card.
   - On success, create session state for `k_client`.

2. Session model.
   - Prefer server-side session store or signed session token.
   - Include TTL/expiry, rotation, and explicit invalidation/logout path.
   - Do not expose card secrets or long-lived auth material to `k_client`.

3. Proxying behavior.
   - With valid session: `k_proxy` forwards request to `k_server` and returns result.
   - Without valid session: require fresh card-backed auth flow.

Exit criteria:
- Repeated authorized requests do not require card interaction until session expiry.
- Expired/invalid sessions are correctly rejected.

Status (2026-04-24):
- Started with a runnable prototype:
  - `/home/user/chromecard/k_proxy_app.py`
  - `/home/user/chromecard/k_server_app.py`
  - `/home/user/chromecard/PHASE5_RUNBOOK.md`
- Implemented in prototype:
  - session create/status/logout endpoints in `k_proxy`
  - TTL-based server-side session store with expiry garbage collection
  - protected monotonic counter endpoint in `k_server` with thread-safe increments
  - proxy forwarding from `k_proxy` to `k_server` using a shared upstream token
- Current auth gate for session creation is card-presence probe (`fido2_probe.py --json`), pending upgrade to full assertion verification path.

Status (2026-04-25):
- Prototype services were restarted successfully after VM restart.
- Current split-VM test shape is:
  - `k_proxy` listening on `127.0.0.1:8771`
  - `k_server` listening on `127.0.0.1:8780`
- End-to-end validation is now passing through the live chain from `k_client`.
- Current verified behavior:
  - login succeeds for `alice`
  - session status succeeds
  - repeated protected counter requests succeed with session reuse
  - logout succeeds
  - post-logout protected access returns `401`
- Added repeatable host-side regression helper:
  - `/home/user/chromecard/phase5_chain_regression.sh`
- Phase 5 is complete for the current prototype semantics.
- Experimental follow-up in code:
  - `k_proxy_app.py` now also has `--auth-mode fido2-direct`
  - this mode attempts direct credential registration and direct assertion verification with `python-fido2`
  - it is not the deployed default because direct registration currently fails on `k_proxy` with `No compatible PIN/UV protocols supported!`
  - `/home/user/chromecard/raw_ctap_probe.py` now exists for lower-level CTAP2 probing with keepalive/error logging
  - latest retry result: after reattaching the card, `k_proxy` again exposes `/dev/hidraw0` and `/dev/hidraw1`, but raw `makeCredential` still reaches no Yes/No card prompt
  - `/dev/hidraw0` opens successfully as the normal user; `/dev/hidraw1` is still permission-denied
  - manual CTAPHID testing now shows `/dev/hidraw0` is the correct FIDO interface and a direct `INIT` write gets no response at all
  - rerunning `webauthn_local_demo.py` inside `k_proxy` also still gives no card prompt, so the current break is below both browser WebAuthn and direct host probes
  - after a full power cycle and reattach, manual CTAPHID `INIT` replies again and browser registration in `webauthn_local_demo.py` succeeds again
  - direct `raw_ctap_probe.py --device-path /dev/hidraw0 make-credential --rp-id localhost` now also succeeds again after card confirmation
  - `k_proxy_app.py --auth-mode fido2-direct` has been moved onto low-level CTAP2 with hidraw auto-detection; it still accepts `--direct-device-path`, but no longer breaks if the card re-enumerates onto `/dev/hidraw1`
  - after repeated fixes for hidraw lifetime, VM-side `python-fido2` response mapping, and CTAP payload shape, real app registration now succeeds for `directtest`
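The manual CTAPHID `INIT` check referred to above can be reproduced with a few lines of framing code. The sketch below follows the initialization-packet layout from the CTAP specification (4-byte channel ID, command byte with the high bit set, 2-byte big-endian payload length, payload padded to the report size). The 64-byte report size is an assumption about the card's HID descriptor, and the helper names are illustrative rather than taken from `raw_ctap_probe.py`.

```python
import struct

CTAPHID_BROADCAST_CID = 0xFFFFFFFF  # channel used before one is allocated
CTAPHID_INIT = 0x06
TYPE_INIT = 0x80                    # high bit marks an initialization packet
REPORT_SIZE = 64                    # assumed HID report size


def build_init_packet(nonce):
    """Build one CTAPHID_INIT request packet carrying an 8-byte nonce."""
    if len(nonce) != 8:
        raise ValueError("CTAPHID_INIT carries an 8-byte nonce")
    header = struct.pack(">IBH", CTAPHID_BROADCAST_CID,
                         TYPE_INIT | CTAPHID_INIT, len(nonce))
    return (header + nonce).ljust(REPORT_SIZE, b"\x00")


def parse_init_response(packet, nonce):
    """Check the echoed nonce and return the channel ID the device allocated."""
    _cid, cmd, bcnt = struct.unpack(">IBH", packet[:7])
    if cmd != (TYPE_INIT | CTAPHID_INIT) or bcnt < 17:
        raise ValueError("not a CTAPHID_INIT response")
    payload = packet[7:7 + bcnt]
    if payload[:8] != nonce:
        raise ValueError("nonce mismatch: response belongs to another request")
    return struct.unpack(">I", payload[8:12])[0]
```

When writing the packet to a Linux hidraw node, the buffer normally needs a leading report-ID byte (`0x00` for devices without numbered reports), so the write would be `b"\x00" + build_init_packet(nonce)`. A device that is enumerated but never answers this request matches the "no response at all" symptom recorded above.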

## Phase 5.5: Implement Dummy Resource + Access Policy on `k_server`

1. Protected dummy resource.
   - Add endpoint returning increasing number.
   - Require valid upstream auth/session context from `k_proxy`.

2. Optional user/session handling.
   - Add minimal user/session checks if `k_server` is chosen as authority (or partial authority).

3. Correctness under concurrency.
   - Ensure increments are monotonic and race-safe under parallel calls.

Exit criteria:
- Authorized requests obtain consistent increasing values.
- Unauthorized requests are rejected.

Status (2026-04-25):
- The protected counter resource is implemented and validated in the live split-VM chain.
- Verified behavior:
  - authorized requests from `k_proxy` obtain increasing values
  - unauthorized post-logout requests from `k_client` are rejected with `401`
  - `20` concurrent protected requests through the chain returned unique, gap-free values
- Phase 5.5 is complete for the current prototype shape.
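The two properties checked in this phase, rejecting requests without the upstream token and handing out race-free increasing values, fit in a small sketch. It is not the `k_server_app.py` implementation: the class name, the plain header dictionary, and the `401` status for a bad upstream token are assumptions for illustration.

```python
import hmac
import threading


class ProtectedCounter:
    """Monotonic counter that only answers requests carrying the proxy token."""

    def __init__(self, proxy_token):
        self._proxy_token = proxy_token.encode()
        self._lock = threading.Lock()
        self._value = 0

    def handle(self, headers):
        """Return (status, value) for one request's header dict."""
        presented = headers.get("X-Proxy-Token", "").encode()
        # Constant-time comparison avoids leaking token contents via timing.
        if not hmac.compare_digest(presented, self._proxy_token):
            return 401, None
        # Read-modify-write happens under the lock, so parallel handler
        # threads can never observe or return the same value twice.
        with self._lock:
            self._value += 1
            return 200, self._value
```

With one lock around the increment, any number of concurrent authorized calls yields each integer exactly once, which is the "unique, gap-free" property observed in the 20-request check.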

## Phase 6: Integrate Client Enrollment + Proxy Login Flow

1. Enrollment process in `k_client`.
   - Start process from `k_client` that captures new-user enrollment intent/data.
   - Route enrollment requests to `k_proxy` over TLS.

2. Card-mediated login in `k_proxy`.
   - `k_proxy` uses connected card for FIDO2/WebAuthn operations.
   - `k_proxy` authenticates toward `k_server` over TLS.

3. Browser flow in `k_client`.
   - Browser traffic goes only to `k_proxy`.

Immediate next action:
- Preserve the now-working direct auth path as a tested option while keeping the default deployed baseline stable.
- Verified end-to-end state:
  - direct `/enroll/register` succeeds for `directtest`
  - direct `/session/login` succeeds for `directtest`
  - `/session/status` succeeds
  - protected `/resource/counter` succeeds through `k_proxy -> k_server`
  - `/session/logout` succeeds
  - post-logout protected access returns `401`
- Next work should be cleanup/hardening:
  - decide whether to keep `directtest` enrollment
  - rerun `phase5_chain_regression.sh --interactive-card --expect-auth-mode fido2_assertion` against the current direct-auth baseline
  - decide when `fido2-direct` should replace `probe` as the default deployed auth mode

Exit criteria:
- Enrollment and login both function end-to-end via `k_client -> k_proxy -> k_server`.

Status (2026-04-25):
- Added first `k_client` implementation at `/home/user/chromecard/k_client_portal.py`.
- Current prototype flow:
  - browser now targets `k_proxy` directly over `https://127.0.0.1:9771`
  - `k_client_portal.py` also serves a local browser flow page on `http://127.0.0.1:8766`
  - `k_proxy` continues to authenticate with the card and forward to `k_server`
  - the `k_client` page now also lists registered users from `k_proxy`
  - the `k_client` page can unregister users from the browser
  - the portal login action now uses the current username field instead of only the remembered local user
  - a Playwright regression spec now exists for the browser flow in `tests/k_client_portal.spec.js`
  - the Playwright browser regression has now passed end-to-end once from this host against a forwarded portal URL
- Verified end-to-end through the portal:
  - enroll `alice`
  - login succeeds
  - session status succeeds
  - protected counter succeeds repeatedly with session reuse
  - logout succeeds
- Enrollment contract progress:
  - `k_proxy` now exposes prototype enrollment endpoints
  - proxy-side enrollment storage exists and is checked before login is allowed
  - direct browser/API traffic can now use those proxy endpoints without going through the local bridge
- Phase 6 is materially further along for the current prototype shape:
  - direct browser target is on `k_proxy`
  - login/resource flow is integrated on the direct proxy path
  - enrollment now has a real client->proxy path
  - the `k_client` page is now a usable demo/operator surface in addition to the direct proxy path
  - final enrollment semantics are still provisional

Status (2026-04-25, enrollment hardening):
- Added a more explicit provisional enrollment contract in `k_proxy`:
  - username normalization and validation
  - optional `display_name`
  - separate create, update, delete, status, and list operations
  - delete invalidates existing sessions for that username
- Verified the hardened behaviors on the direct proxy path.
- Phase 6 is now strong enough to treat the browser/proxy flow as a stable prototype baseline.
- The remaining reason Phase 6 is not "final" is product semantics, not missing basic mechanics:
  - whether enrollment should require card presence
  - what user attributes belong in enrollment
  - what re-enroll and recovery should mean

Status (2026-04-25, Phase 6.5 initial concurrency results):
- Added reproducible probe script at `/home/user/chromecard/phase65_concurrency_probe.py`.
- Probe now supports `--max-workers` so client-side fan-out can be tested separately from total request count.
- Moderate direct-path concurrency passes:
  - `3 users x 4 requests`
  - `12/12` successful protected calls
  - counter values remained unique and contiguous
- Larger direct-path concurrency currently fails:
  - `5 users x 5 requests`
  - only `18/25` successful protected calls
  - failed calls report TLS EOF / upstream unavailable errors
- Follow-up findings are more precise:
  - body-drain handling was fixed for the HTTP/1.1 keep-alive experiment
  - `k_proxy -> k_server` upstream concurrency is now clampable and currently tested at one pooled connection
  - `5 users x 5 requests` passes at `25/25` when client fan-out is limited to `--max-workers 10`
  - the same total load still fails at higher fan-out:
    - `22/25` at `--max-workers 15`
    - `15/25` at fully unbounded `25` workers in the latest rerun
- Current bottleneck is still not counter correctness:
  - successful results still show unique, contiguous counter values
  - `k_proxy` and `k_server` complete the requests that actually arrive
- Current likely bottleneck is the client-facing Qubes forwarding layer:
  - `qvm_connect_9771.log` shows qrexec data-vchan failures
  - observed message includes `xs_transaction_start: No space left on device`
  - `qvm_connect_9780.log` showed earlier failures too, but the latest threshold test points first to connection fan-out on `k_client -> k_proxy`
- Phase 6.5 is therefore started but not complete:
  - application-level concurrency looks acceptable at moderate load
  - current working envelope is roughly `10` in-flight protected calls on the direct browser path
  - higher-load failures still need Qubes forwarding diagnosis before the phase can be closed

Status (2026-04-25, Phase 5 regression helper):
- Added repeatable split-VM regression helper:
  - `/home/user/chromecard/phase5_chain_regression.sh`
- Verified helper result on the live chain:
  - `20` requests at parallelism `8`
  - login/session-status/counter/logout sequence completed successfully
  - returned counter values were unique and gap-free
  - latest verified helper range was `43..62`
- Current implication:
  - the Phase 5 baseline is now reproducible
  - next work should target auth semantics rather than basic chain bring-up

## Phase 6.5: Concurrency and Multi-Client Test Setup

1. Single-VM concurrency tests.
   - Generate parallel request bursts from `k_client` to `k_proxy`.
   - Verify response integrity, session reuse behavior, and error rates.

2. Multi-client tests.
   - Run requests from multiple `k_client` instances (or equivalent parallel clients) concurrently.
   - Verify isolation between users/sessions.

3. Acceptance checks.
   - No race-related crashes/corruption in `k_proxy` or `k_server`.
   - Counter/resource behavior remains correct under load.
   - Session reuse reduces card prompts while preserving authorization checks.

Exit criteria:
- Test results demonstrate stable concurrent operation with documented limits.
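A burst test that separates total request count from client-side fan-out, and then reports the success ratio alongside counter integrity, could be structured as follows. This is a sketch under assumptions, not `phase65_concurrency_probe.py`: `call` stands in for whatever performs one protected request and returns the counter value, and the summary keys are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def run_burst(call, total_requests, max_workers):
    """Run total_requests calls with at most max_workers in flight at once."""
    def one(index):
        try:
            return ("ok", call(index))
        except Exception as exc:  # record the failure instead of aborting the burst
            return ("error", f"{type(exc).__name__}: {exc}")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, range(total_requests)))


def summarize(results):
    """Report how many calls succeeded and whether their values are gap-free."""
    values = sorted(value for kind, value in results if kind == "ok")
    return {
        "ok": len(values),
        "total": len(results),
        "unique": len(set(values)) == len(values),
        "contiguous": all(b - a == 1 for a, b in zip(values, values[1:])),
    }
```

Running the same `total_requests` at several `max_workers` settings is what makes a limit documentable: the highest setting that still yields `ok == total` is the working envelope for that path.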

## Phase 7: Restore Firmware Build/Flash Path

1. Validate SDK tree completeness.
   - Confirm presence of `mvp`, `setup`, `components`, `samples` under `CR_SDK_CK-main`.
   - If missing, obtain full repository/checkpoint and document source.

2. Install/enable build tools.
   - Ensure `west` and `nrfjprog` are available in shell.
   - Confirm target board/toolchain match (`nrf7002dk/nrf5340/cpuapp`, NCS `v2.9.2` baseline in docs).

3. Run baseline build+flash.
   - From `CR_SDK_CK-main`, run `./scripts/build_flash_mvp.sh`.
   - If flashing fails, run documented recovery and retry.

Exit criteria:
- Successful `west build` and `west flash`.
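Steps 1 and 2 are read-only checks and can be scripted without touching the SDK tree. The sketch below is one possible precheck; the function names are illustrative, and it only verifies that the directories and tools exist, not that their versions match the NCS baseline.

```python
import os
import shutil

REQUIRED_SDK_DIRS = ("mvp", "setup", "components", "samples")
REQUIRED_TOOLS = ("west", "nrfjprog")


def missing_sdk_dirs(sdk_root, required=REQUIRED_SDK_DIRS):
    """Return the required subdirectories that are absent under sdk_root."""
    return [name for name in required
            if not os.path.isdir(os.path.join(sdk_root, name))]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_sdk_dirs("/home/user/chromecard/CR_SDK_CK-main")` and `missing_tools()` before step 3 should return two empty lists; anything else identifies what to fix before attempting a build.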

## Phase 8: Consolidate Documentation and Paths

1. Remove path drift between docs and actual files.
   - Keep `fido2_probe.py` and `webauthn_local_demo.py` at workspace root.
   - Ensure docs never instruct placing helper scripts under `CR_SDK_CK-main`.
   - Update references consistently in all docs.

2. Keep `Setup.md` current.
   - After each significant change, update status snapshot and outcomes.

3. Add minimal reproducibility checklist.
   - One command list for probe + demo + build/flash prechecks.

4. Maintain Markdown execution records continuously.
   - `Setup.md` and `Workplan.md` are the canonical living docs for this workspace.
   - Re-scan relevant `.md` files before each new execution cycle and reconcile drift.
   - Record date-stamped session notes when priorities or blockers change.

Status (2026-04-24, markdown maintenance):
- Re-scanned the active workspace Markdown set and the main source-tree reference docs.
- No workplan phase change was required from this pass.
- Ongoing documentation watch item remains path drift in `CR_SDK_CK-main/README_HOST.md`, which still uses historical `./scripts/...` helper locations instead of workspace-root helper paths.
- Operational note: the markdown scan path now runs cleanly after policy adjustment when invoked without a login shell.

Status (2026-04-24, chain probe retry):
- Phase 1 remains blocked, but the failure point is now narrowed further:
  - current refusal occurs at Qubes `qubes.ConnectTCP` policy/service evaluation for ports `22`, `8770`, and `8780`
  - this happens before any end-to-end app-level request can be retried
- Practical implication:
  - do not spend time on `k_proxy_app.py` / `k_server_app.py` request handling until qrexec forwarding is permitting the intended hops again
  - next recovery action is to fix/activate the relevant Qubes `qubes.ConnectTCP` policy and then re-run the qrexec bridge checks before testing HTTP flow

Status (2026-04-25, post-restart probe):
- Corrected the client-facing proxy port reference to `8771`.
- SSH access to `k_proxy` and card visibility recovered after VM restart.
- New immediate blockers are:
  - `k_proxy` service not listening on `127.0.0.1:8771`
  - `k_server` service not listening on `127.0.0.1:8780`
  - qrexec forwarding for `8771` and `8780` still returns `Request refused`
- Next retry should start services first, then re-test qrexec forwarding and only then attempt end-to-end client flow.

Status (2026-04-25, service restart):
- Local VM services are running again on the intended loopback ports:
  - `k_server`: `127.0.0.1:8780`
  - `k_proxy`: `127.0.0.1:8771`
- Phase 1 remains blocked specifically by qrexec policy/forwarding refusal on those ports.
- Next action is no longer app startup; it is fixing the `qubes.ConnectTCP` allow path for `8771` and `8780`.

Status (2026-04-25, in-VM forwarding test):
- Verified that using `qvm-connect-tcp` inside the source VMs still does not complete the client->proxy hop:
  - bind succeeds locally, but first real connection gets `Request refused`
- Independent app-layer blocker also found in `k_proxy`:
  - `python-fido2` is missing there, so local `/session/login` currently fails before card auth can succeed
- Current ordered blockers:
  - first: effective Qubes/qrexec allow path for `k_client -> k_proxy:8771`
  - second: install `python-fido2` in `k_proxy`
  - third: re-test end-to-end login and then proxy->server counter flow

Status (2026-04-25, after python3-fido2 install):
- `python3-fido2` blocker in `k_proxy` is resolved.
- Updated ordered blockers:
  - first: effective Qubes/qrexec allow path for `k_client -> k_proxy:8771`
  - second: restore CTAP HID device visibility/access in `k_proxy` (`No CTAP HID devices found`)
  - third: re-test end-to-end login and then proxy->server counter flow

Status (2026-04-25, card reattached):
- CTAP HID visibility/access in `k_proxy` is restored.
- Local proxy login is working again with the attached card.
- The only currently confirmed blocker for the end-to-end path is the `k_client -> k_proxy:8771` qrexec/`qvm-connect-tcp` refusal.

Status (2026-04-25, clean forward retest):
- The retest shows the same qrexec failure mode on both hops, not just the client-facing one.
- Updated blocker statement:
  - effective `qubes.ConnectTCP` allow path is failing for both
    - `k_client -> k_proxy:8771`
    - `k_proxy -> k_server:8780`
- App services and card path are currently good; forwarding remains the single active system blocker.

Status (2026-04-25, dom0 policy fix validated):
- The explicit-destination dom0 `qubes.ConnectTCP` policy fix resolved forwarding on both hops.
- Current verified working chain:
  - `k_client -> k_proxy:8771`
  - `k_proxy -> k_server:8780`
- Current verified prototype behavior:
  - session login works from `k_client`
  - session status works
  - protected counter flow reaches `k_server`
  - session reuse avoids re-login for repeated counter calls
  - logout invalidates the session and subsequent protected access returns `401`
- Immediate networking blocker is cleared.

Exit criteria:
- New team member can follow docs end-to-end without path or tooling ambiguity.

## Phase 9: Migrate to Phone-Mediated Wireless Validation

Status (2026-05-04): **ACTIVE: Architecture v2 adopted; Component 1 + Component 2 CONNECT handler complete**

### Architecture v2 changes (2026-05-04)

The following changes replace the v1 architecture. Source: `chromecard_arkitektur_v2.docx`.

**Component 2 no longer calls endpoints:** Component 2 returns the WebAuthn token to whoever asked (Component 1). It is Component 1 that calls the endpoint with the token. This is the most important behavioral change.

**New Component 3 (external client):** A compiled binary (Go recommended, Rust alternative) installed on external client computers. Replaces the old browser-proxy-configuration approach. Tasks: find the phone (currently hardcoded IP+port; rendezvous TBD), forward validation requests to Component 1, receive token back, call the protected endpoint directly, return response to browser.

**Flow A splits into two paths:**
- Phone browser: Browser → Component 1 → Component 2 (returns token) → Component 1 calls endpoint → resource
- External client: Browser → Component 3 → Component 1 → Component 2 (returns token) → Component 1 → Component 3 calls endpoint → resource

**Platform note:** Android needs no extra infrastructure. iOS requires a push-relay (APNs) for background operation; platform priority is an open decision.

**New open decisions:** Rendezvous mechanism for Component 3; iOS vs Android priority.

**Architectural decision (2026-05-08), token binding model:**
- Current choice: per-request authentication. No session is opened. Each request to a gated resource requires a fresh FIDO2 assertion from the card, with the challenge bound to the specific request (URL + method + nonce). The server verifies that the assertion's challenge matches the resource being requested. A token cannot be replayed for a different resource.
- Consequence: one card interaction per request. This is intentional for now.
- May change to: session model (one card interaction opens a time-limited session for all gated resources). If changed, token must at minimum be bound to a specific server (audience) to prevent cross-server replay.
- Trigger for revisiting: user experience, if per-request card interaction proves too slow or disruptive.
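One way to bind a challenge to a specific request is to hash the method, URL, and nonce together and use the digest as the WebAuthn challenge. The sketch below illustrates that idea only; the field encoding (length-prefixed SHA-256), the nonce size, and the function names are assumptions, since the workplan fixes the inputs (URL + method + nonce) but not how they are combined. Signature verification and single-use nonce tracking are out of scope here.

```python
import hashlib
import hmac
import secrets


def new_nonce(size=16):
    """Fresh random nonce for one request (the size is an assumption)."""
    return secrets.token_bytes(size)


def request_challenge(method, url, nonce):
    """Derive a challenge that commits to the method, URL, and nonce."""
    digest = hashlib.sha256()
    for part in (method.upper().encode(), url.encode(), nonce):
        # Length-prefix each field so bytes cannot shift between fields.
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.digest()


def challenge_matches(signed_challenge, method, url, nonce):
    """Server-side check that an assertion's challenge fits this request."""
    expected = request_challenge(method, url, nonce)
    return hmac.compare_digest(signed_challenge, expected)
```

Under this scheme an assertion obtained for `GET` on one URL fails `challenge_matches` for any other method, URL, or nonce, which is the replay property the decision describes.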
|
|
|
|
### Target architecture (v2)
|
|
|
|
Four physical devices: optional client computer, phone, chromecard, server.
|
|
|
|
**Phone components:**
|
|
- **Component 1 — Proxy + gating filter:** Receives requests from phone browser and from external clients via Component 3. Per-request: gated host → forward to Component 2, receive WebAuthn token back, call endpoint with token (TLS); non-gated → forward directly to internet on port 80 (no TLS, bypasses auth entirely).
|
|
- **Component 2 — WebAuthn client + URL recognition:** Always returns token to caller, never calls endpoints itself. Detects registration URL → admin registration flow (admin fingerprint); other gated URLs → FIDO2 assertion flow (user fingerprint → token returned to Component 1).
|
|
- **Registration page:** Local web app on phone; admin fingerprint access control enforced by card.
|
|
- **Component 3 (external client):** Compiled binary, finds phone, relays auth through Component 1, calls endpoint with received token.
|
|
|
|
**Three flows:**
|
|
- **Flow A (phone browser):** Browser → Comp 1 → Comp 2 → card → token → Comp 1 → endpoint → resource
|
|
- **Flow A (external client):** Browser → Comp 3 → Comp 1 → Comp 2 → card → token → Comp 1 → Comp 3 → endpoint → resource
|
|
- **Flow B:** Browser → Comp 1 → Comp 2 (registration URL) → card (admin biometric) → enroll/delete user
|
|
- **Flow C:** Non-gated host → Comp 1 → internet port 80 (no TLS, no card)
|
|
|
|
**Open decisions:** PIN on card; user DB on-card vs. external; network-level access control on registration page; Component 3 rendezvous mechanism; iOS vs Android priority.
|
|
|
|
Development chain (Qubes): `k_client browser → k_phone (Flutter Android) → USB HID → ChromeCard → k_server`
|
|
|
|
The `k_phone` Flutter app replaces `k_proxy` entirely. It presents the same HTTP API as `k_proxy_app.py`, so `k_client_portal.py` and the browser portal work without changes.
|
|
|
|
**Development environment:** Mac (not Qubes). The Android emulator is incompatible with Xen/Qubes, so all `k_phone` development and testing runs on the Mac with the Android emulator and `card_emulator_bridge.py`.
|
|
|
|
### Work completed (2026-04-29)
|
|
|
|
- Flutter project scaffolded at `k_phone/` (no `flutter create` — fully hand-written)
|
|
- 10+ Android build issues resolved (AGP, Gradle, Kotlin, desugaring, notification channel, foreground service type)
|
|
- `k_phone/lib/ctaphid_channel.dart`: full CTAPHID framing + USB/emulator dual-transport
|
|
- Fixed: persistent socket subscription (single-subscription stream cannot use `await for ... break` per packet)
|
|
- Fixed: `_emulatorSocketOpen` flag prevents dead-socket writes from raising `StateError`
|
|
- Fixed: emulator round-trip sends all request packets before reading (no per-packet blocking)
|
|
- `k_phone/lib/proxy_service.dart`: full HTTP proxy — all endpoints implemented, error handling hardened
|
|
- Fixed: card-error try-catch separated from DB StateError catch (was masking socket errors as "user already enrolled")
|
|
- `autoStart: true` for emulator testing; revert to `false` for production builds
|
|
- `k_phone/lib/enrollment_db.dart`: enrollment model + JSON persistence via path_provider
|
|
- `k_phone/lib/fido2_ops.dart`: CTAP2 `makeCredential`, `getAssertion`, ECDSA-P256 assertion verification
|
|
- Fixed: CTAP2 command prefix bytes (0x01/0x02) prepended to CBOR payload per CTAP2-over-CTAPHID spec
|
|
- `k_phone/lib/session_manager.dart`: in-memory bearer token sessions; `hasAnyActiveSession()` added for gated-proxy forwarding (personal-device model: any live session authorises gated traffic)
|
|
- `k_phone/lib/k_server_client.dart`: HTTP forwarder to k_server
|
|
- `k_phone/android/app/src/main/kotlin/.../MainActivity.kt`: USB HID Kotlin platform channel
|
|
- `tests/card_emulator_bridge.py`: asyncio CTAPHID TCP bridge wrapping `CardEmulator` for emulator dev
|
|
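The CTAPHID framing and the CTAP2 command prefix mentioned above can be sketched in Python as follows. This is not the code in `ctaphid_channel.dart`; it is an illustration of the standard 64-byte HID report layout (7-byte header on the initialization packet, 5-byte header on continuation packets). The function names and the channel ID used in the example are hypothetical.

```python
import struct

PACKET_SIZE = 64                  # HID report size used by CTAPHID
CTAPHID_CBOR = 0x10               # CTAPHID command carrying a CTAP2 message
INIT_DATA = PACKET_SIZE - 7       # 57 payload bytes in the initialization packet
CONT_DATA = PACKET_SIZE - 5       # 59 payload bytes in each continuation packet
MAX_MESSAGE = INIT_DATA + 128 * CONT_DATA  # sequence numbers run 0..127


def ctap2_message(cmd: int, cbor_payload: bytes) -> bytes:
    """Prepend the CTAP2 command byte (0x01 makeCredential, 0x02 getAssertion)."""
    return bytes([cmd]) + cbor_payload


def ctaphid_packets(cid: int, hid_cmd: int, message: bytes) -> list[bytes]:
    """Split one message into zero-padded 64-byte CTAPHID packets."""
    if len(message) > MAX_MESSAGE:
        raise ValueError("message too long for CTAPHID")
    # Initialization packet: CID(4) | CMD with high bit set(1) | BCNT(2, big-endian) | data
    head = struct.pack(">IBH", cid, 0x80 | hid_cmd, len(message))
    packets = [(head + message[:INIT_DATA]).ljust(PACKET_SIZE, b"\x00")]
    rest = message[INIT_DATA:]
    seq = 0
    while rest:
        # Continuation packet: CID(4) | SEQ(1) | data
        head = struct.pack(">IB", cid, seq)
        packets.append((head + rest[:CONT_DATA]).ljust(PACKET_SIZE, b"\x00"))
        rest = rest[CONT_DATA:]
        seq += 1
    return packets


# Illustrative sizes only: a 180-byte CBOR payload plus the command byte.
pkts = ctaphid_packets(0x01020304, CTAPHID_CBOR, ctap2_message(0x01, bytes(180)))
print(len(pkts))  # 4
```

Building all packets up front matches the fix noted above, where the emulator round-trip sends every request packet before reading a response.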
|
|
### Work completed (2026-05-02)
|
|
|
|
- `k_phone/lib/filter_proxy.dart`: Component 1 implemented — HTTP proxy with gating filter
|
|
- Plain HTTP to gated host: rewritten to relative path and forwarded to Component 2
|
|
- HTTPS CONNECT to gated host: CONNECT request relayed to Component 2; tunnel opened on 200, denied on 4xx
|
|
- All other traffic forwarded directly to target host
|
|
- Gated hosts file: `gated_hosts.txt` in app documents directory (one `host` or `host:port` per line)
|
|
- Default seeded with `httpbin.org` on first run
|
|
- `k_phone/test/filter_proxy_test.dart`: full test suite for Component 1 (gated matching, HTTP routing, CONNECT routing, edge cases)
|
|
- `k_phone/test/enrollment_test.dart`: full test suite for `EnrollmentDb` (register, list, delete, persistence, update)
|
|
|
|
### Work completed (2026-05-02, session 2)
|
|
|
|
- `k_phone/lib/proxy_service.dart`: `_handleConnect` added to `_ProxyServer`
|
|
- Dispatched from `_handleRequest` for `CONNECT` method
|
|
- Checks `_sessions.hasAnyActiveSession()` — returns 407 if no active session
|
|
- Extracts upstream host:port from `Host` header
|
|
- Opens TCP socket to upstream target (the real external server — httpbin.org, etc.)
|
|
- Detaches the HTTP socket (`detachSocket(writeHeaders: false)`) and writes `200 Connection Established` manually
|
|
- Pipes bytes bidirectionally: client ↔ upstream
|
|
- k_server is not involved in CONNECT tunnels; Component 2 connects directly to the real target
|
|
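The decision part of the CONNECT handling above (session check, then target extraction) can be sketched in Python. This is a simplified illustration, not a port of `_handleConnect`: the function name, the default port of 443, and the 400 response for a malformed authority are assumptions, and bracketed IPv6 literals are not handled.

```python
from typing import Optional, Tuple


def plan_connect(host_header: str, has_active_session: bool,
                 default_port: int = 443) -> Tuple[int, Optional[Tuple[str, int]]]:
    """Return (status, target): 407 without a session, else the upstream host and port."""
    if not has_active_session:
        return 407, None
    authority = host_header.strip()
    host, sep, port_text = authority.rpartition(":")
    if not sep:
        # No explicit port; falling back to default_port is an assumption.
        return (200, (authority, default_port)) if authority else (400, None)
    valid_port = port_text.isascii() and port_text.isdigit() and 0 < int(port_text) < 65536
    if not host or not valid_port:
        return 400, None
    return 200, (host, int(port_text))


print(plan_connect("httpbin.org:443", True))   # (200, ('httpbin.org', 443))
print(plan_connect("httpbin.org:443", False))  # (407, None)
```

Only after a 200 decision would the proxy open the upstream socket, write `200 Connection Established`, and start piping bytes in both directions.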
|
|
### Verified on emulator (2026-04-29)
|
|
|
|
```
POST /enroll/register   → makeCredential via bridge → has_credential: true ✓
POST /session/login     → getAssertion + ECDSA verify → auth_mode: fido2_assertion ✓
POST /session/status    → 299 s remaining ✓
POST /session/logout    → invalidated: true ✓
POST /resource/counter  → internal error (k_server not running locally, expected)
POST /resource/counter (after logout) → 401 invalid or expired session ✓
```
|
|
|
|
Bridge log confirmed:
|
|
```
CTAP2 cmd=0x01 body=180 bytes → makeCredential OK auth_data=164 bytes
CTAP2 cmd=0x02 body=113 bytes → getAssertion OK auth_data=37 bytes sig=71 bytes
```
|
|
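The session lifecycle exercised in the emulator run (login, status with remaining seconds, logout, rejected request afterwards) can be modelled with a small in-memory store. The following Python sketch is not a port of `session_manager.dart`; it assumes that "32-byte hex" means 32 random bytes rendered as 64 hex characters, and the injectable clock is a hypothetical testing aid.

```python
import secrets
import threading
import time


class SessionManager:
    """In-memory bearer-token sessions with a fixed TTL."""

    TTL_SECONDS = 300

    def __init__(self, clock=time.monotonic):
        self._clock = clock              # injectable so expiry can be tested without sleeping
        self._lock = threading.Lock()
        self._sessions = {}              # token -> (username, expires_at)

    def login(self, username: str) -> str:
        token = secrets.token_hex(32)    # 32 random bytes, 64 hex characters
        with self._lock:
            self._sessions[token] = (username, self._clock() + self.TTL_SECONDS)
        return token

    def remaining(self, token: str) -> int:
        """Whole seconds left, or 0 if the token is unknown or expired."""
        with self._lock:
            entry = self._sessions.get(token)
            if entry is None:
                return 0
            left = entry[1] - self._clock()
            if left <= 0:
                del self._sessions[token]
                return 0
            return int(left)

    def logout(self, token: str) -> bool:
        with self._lock:
            return self._sessions.pop(token, None) is not None

    def has_any_active_session(self) -> bool:
        with self._lock:
            now = self._clock()
            return any(expires > now for _, expires in self._sessions.values())
```

With a clock that advances one second after login, `remaining()` returns 299, which lines up with the `299 s remaining` status line in the emulator run.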
|
|
### Work completed (2026-05-05, v2 architecture refactor)
|
|
|
|
**k_phone (Dart):**
|
|
- `filter_proxy_test.dart`: rewritten for v2 semantics — gated HTTP now hits a mock endpoint with Bearer token, not Component 2 directly. 24/24 tests pass.
|
|
- `filter_proxy.dart`: extracted `_writeProxyHeaders` and `_forwardHttpRequest` helpers to eliminate ~30 lines of duplication between `_handleGatedHttp` and `_handleDirectHttp`; simplified `_handleDirectHttp` signature (redundant `host`/`port` params removed).
|
|
- `session_manager.dart`: added `static const int ttlSeconds = 300` (public); `_ttl` now references it.
|
|
- `portal_html.dart` (new): extracted 400-line HTML blobs (`kPortalHtml`, `kEnrollHtml`, `kPortalHtmlBytes`, `kEnrollHtmlBytes`) from `proxy_service.dart`.
|
|
- `proxy_service.dart`: imports `portal_html.dart`; removed `_kSessionTtlSeconds` constant (replaced with `SessionManager.ttlSeconds`); merged `_serveHtml`/`_serveEnrollHtml` into `_serveHtmlBytes(req, bytes)`; extracted `_parseUsername` and `_parseUsernameAndDisplay` helpers eliminating repeated validation boilerplate; removed dead `_loadTlsContext` stub; simplified `start()` TLS branch. File: 872 → 455 lines.
|
|
- `k_server_client.dart`: deleted (dead code — no longer imported anywhere).
|
|
|
|
**component3 (Go):**
|
|
- `gated.go`: signature changed from `IsGated(host string)` to `IsGated(host, port string)`. The old version silently missed `host:port` entries in `gated_hosts.txt`; the new one checks both the bare hostname and `host:port`.
|
|
- `proxy.go`: `handleHTTP` extracts `port` from URL (defaults `"80"`), passes to `IsGated`; `handleConnect` passes `portStr` to `IsGated`.
|
|
- `phone.go`: added `getToken()` calling `/auth/get-token` — avoids FIDO2 card interaction if the phone already has an active session. `EnsureSession()` tries `getToken()` first, falls back to `login()`. Fixed `login()` JSON field: `expires_in` → `ttl_seconds` (actual server field name). `go build ./...` passes.
|
|
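The gating check described for `IsGated(host, port)` can be illustrated with a short Python sketch: an entry in `gated_hosts.txt` matches either as a bare hostname or as `host:port`. This is not the Go or Dart implementation; case-insensitive matching and skipping `#` comment lines are assumptions, and the second host in the example is made up.

```python
def parse_gated_hosts(text: str) -> set[str]:
    """Parse one `host` or `host:port` entry per line, ignoring blank lines."""
    entries = set()
    for line in text.splitlines():
        entry = line.strip().lower()
        # Treating lines that start with "#" as comments is an assumption.
        if entry and not entry.startswith("#"):
            entries.add(entry)
    return entries


def is_gated(entries: set[str], host: str, port: str) -> bool:
    """A target is gated if either the bare host or host:port is listed."""
    host = host.lower()
    return host in entries or f"{host}:{port}" in entries


entries = parse_gated_hosts("httpbin.org\nexample.test:8780\n")
print(is_gated(entries, "httpbin.org", "80"))     # True
print(is_gated(entries, "example.test", "80"))    # False
print(is_gated(entries, "example.test", "8780"))  # True
```

A bare hostname entry gates every port on that host, while a `host:port` entry gates only that port, which is the case the single-argument version missed.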
|
|
### Parallel-change note: Component 1 and Component 3 share the same proxy logic
|
|
|
|
Component 3 (`component3/`) and Component 1 (`k_phone/lib/filter_proxy.dart`) implement the same core behaviour: intercept HTTP/HTTPS traffic, decide per-request whether the target is gated, fetch a WebAuthn token if so, and call the endpoint directly with the token. Any structural change to one (new gating logic, token-binding changes, CONNECT handling, error semantics) will almost certainly need a corresponding change in the other. Treat them as a pair: when modifying Component 3, check Component 1 for the same fix, and vice versa.
|
|
|
|
### Work completed (2026-05-08, per-request token binding)
|
|
|
|
- `fido2_ops.dart`: `GetAssertionResult` now includes `clientDataJson`; `getAssertion()` accepts optional `challenge` param for binding.
|
|
- `proxy_service.dart`: `_handleAuthGetToken` rewritten — accepts `{url, method, nonce}`, derives `challenge = SHA256(url|method|nonce)`, calls card (getAssertion), returns self-contained assertion bundle as base64url Bearer token. No session involved.
|
|
- `filter_proxy.dart`: `_getAuthToken(uri, method)` generates a secure 16-byte nonce, posts `{url, method, nonce}` to Component 2, uses returned assertion token directly.
|
|
- `component3/phone.go`: rewritten as stateless `GetTokenForRequest(url, method)` — no session caching, no mutex, no expiry tracking.
|
|
- `component3/proxy.go`: `handleHTTP` uses `GetTokenForRequest(r.URL.String(), r.Method)`.
|
|
- `component3/main.go`: `--user` flag removed (Component 2 picks the enrolled user).
|
|
- `k_server_app.py`: `_verify_assertion_token()` added — decodes bundle, verifies path+method match, verifies challenge claim, verifies ECDSA-P256 signature over authData||clientDataHash using public key extracted from bundle's credentialData. `_is_proxy_authorized()` accepts either X-Proxy-Token (legacy k_proxy path) or Bearer assertion token.
|
|
- `filter_proxy_test.dart`: 2 new tests for `/auth/get-token` body fields (url, method, nonce). 48/48 tests pass.
|
|
- `tests/test_k_server.py`: 17 Python tests for `_verify_assertion_token` — 12 unit tests with synthetic P-256 keys, 5 round-trip tests via `CardEmulator`. All pass.
|
|
- 48/48 Flutter tests pass; `go build ./...` clean; `flutter analyze` no issues.
|
|
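The request-binding part of the server-side check (path and method match, then challenge match) can be sketched in Python as below. This is not `_verify_assertion_token()`: the bundle field names (`url`, `method`, `nonce`, `client_data_json`) and the JSON-in-base64url layout are hypothetical, the `|` separator is an assumption, and the ECDSA-P256 signature check that the real function performs is deliberately left out.

```python
import base64
import hashlib
import hmac
import json
from urllib.parse import urlsplit


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_demo_token(url: str, method: str, nonce: str) -> str:
    """Build a demo bundle; a real bundle would also carry authData, signature, credentialData."""
    challenge = hashlib.sha256(f"{url}|{method}|{nonce}".encode("utf-8")).digest()
    client_data = json.dumps({"type": "webauthn.get", "challenge": b64url_encode(challenge)})
    bundle = {"url": url, "method": method, "nonce": nonce,
              "client_data_json": b64url_encode(client_data.encode("utf-8"))}
    return b64url_encode(json.dumps(bundle).encode("utf-8"))


def check_request_binding(token: str, request_path: str, request_method: str) -> bool:
    """True only if the token was issued for this path and method and its challenge agrees."""
    try:
        bundle = json.loads(b64url_decode(token))
        url, method, nonce = bundle["url"], bundle["method"], bundle["nonce"]
        if not all(isinstance(v, str) for v in (url, method, nonce)):
            return False
        path = urlsplit(url).path
        client_data = json.loads(b64url_decode(bundle["client_data_json"]))
        claimed = b64url_decode(client_data["challenge"])
    except (ValueError, KeyError, TypeError):
        return False
    if path != request_path or method != request_method:
        return False
    expected = hashlib.sha256(f"{url}|{method}|{nonce}".encode("utf-8")).digest()
    return hmac.compare_digest(expected, claimed)


token = make_demo_token("http://k-server.test:8780/resource/counter", "POST", "abc")
print(check_request_binding(token, "/resource/counter", "POST"))  # True
print(check_request_binding(token, "/resource/counter", "GET"))   # False
```

These checks only have value together with the signature verification: the signature proves the card signed the `clientDataJSON` that carries the challenge, and the challenge comparison ties that signature to one request.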
|
|
### Work completed (2026-05-08, Playwright acceptance tests for k_phone)
|
|
|
|
- `tests/k_phone_portal.spec.js` (new): Portal UI acceptance tests (enroll → login → status → list → logout → delete). DOM assertions against `#storedUser`, `#sessionActive`, `#log`. Also tests empty-username and unknown-user error paths.
|
|
- Run: `K_PHONE_BASE_URL=http://phone-ip:8771 npx playwright test tests/k_phone_portal.spec.js`
|
|
|
|
- `tests/k_phone_proxy.spec.js` (new): Proxy routing acceptance tests. Four serial tests that prove Component 1's routing decisions:
|
|
1. No users → non-gated request passes through (< 500).
|
|
2. No users → gated request rejected with 407 (Component 2 has no enrolled user).
|
|
3. Register user (card fingerprint) → non-gated still passes through.
|
|
4. With enrolled user → gated request succeeds after card assertion (200); response body proves Bearer token was forwarded to target.
|
|
- Uses Node `http` module for proxy requests (absolute URI / proxy protocol).
|
|
- Uses Playwright `page` fixture for enrollment in test 3 (card interaction).
|
|
- `GATED_URL` defaults to `http://httpbin.org/get`; point at `http://k-server-ip:8780/resource/counter` (GATED_METHOD=POST) for full chain validation including token signature verification.
|
|
- Run: `K_PHONE_PROXY=http://phone-ip:8888 K_PHONE_BASE_URL=http://phone-ip:8771 npx playwright test tests/k_phone_proxy.spec.js`
|
|
|
|
### Next action
|
|
|
|
1. Deploy to a real Android phone with physical ChromeCard via USB
|
|
2. Verify USB HID path (Kotlin MainActivity.kt platform channel, hidraw node auto-detection)
|
|
3. Run `phase5_chain_regression.sh` against `k_phone` on Android with k_server running
|
|
|
|
### k_phone API contract (must match k_proxy_app.py exactly)
|
|
|
|
- `GET /health`
|
|
- `POST /enroll/register` `{"username","display_name"}`
|
|
- `GET /enroll/status?username=`
|
|
- `POST /enroll/update` `{"username","display_name"}`
|
|
- `POST /enroll/delete` `{"username"}`
|
|
- `GET /enroll/list`
|
|
- `POST /session/login` `{"username"}`
|
|
- `POST /session/status`
|
|
- `POST /session/logout`
|
|
- `POST /resource/counter` (forwarded to k_server with X-Proxy-Token)
|
|
|
|
### Key design decisions
|
|
|
|
- rp_id: `"localhost"`, origin: `"https://localhost"` (matches k_proxy_app.py defaults)
|
|
- clientDataHash = SHA256(clientDataJSON), where clientDataJSON = `{"type":"webauthn.create","challenge":"<b64>","origin":"https://localhost","crossOrigin":false}`
|
|
- credential_data_b64 stores `AttestedCredentialData` bytes = `aaguid(16) + credIdLen(2) + credId(n) + coseKey`
|
|
- Signature verification: ECDSA-SHA256(authData || clientDataHash, P-256 pubKey extracted from COSE key)
|
|
- No begin/complete HTTP round-trip — registration and auth are each a single HTTP call (same as Python)
|
|
- Sessions: server-side in-memory, TTL 300 s (matching Python default), token = 32-byte hex
|
|
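Two of the byte-level decisions above lend themselves to a short illustration: building `clientDataJSON` with its hash, and splitting stored `AttestedCredentialData` into its parts. The Python sketch below assumes the challenge is base64url-encoded without padding and that `credIdLen` is big-endian (as in WebAuthn); the function names are hypothetical, and decoding the COSE key itself is out of scope because it needs a CBOR library.

```python
import base64
import hashlib
import json
import struct


def build_client_data(challenge: bytes, ceremony: str = "webauthn.create",
                      origin: str = "https://localhost") -> tuple[bytes, bytes]:
    """Return (clientDataJSON, clientDataHash) with the field order listed above."""
    encoded = base64.urlsafe_b64encode(challenge).rstrip(b"=").decode("ascii")
    client_data = {"type": ceremony, "challenge": encoded,
                   "origin": origin, "crossOrigin": False}
    raw = json.dumps(client_data, separators=(",", ":")).encode("utf-8")
    return raw, hashlib.sha256(raw).digest()


def split_attested_credential_data(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Split aaguid(16) + credIdLen(2) + credId(n) + coseKey into its three parts."""
    if len(blob) < 18:
        raise ValueError("too short for aaguid and credential ID length")
    aaguid = blob[:16]
    (cred_len,) = struct.unpack(">H", blob[16:18])
    end = 18 + cred_len
    if len(blob) < end:
        raise ValueError("credential ID is truncated")
    return aaguid, blob[18:end], blob[end:]  # the remainder is the CBOR-encoded COSE key


raw, digest = build_client_data(bytes(32))
print(raw.decode("utf-8"))
aaguid, cred_id, cose_key = split_attested_credential_data(
    bytes(16) + b"\x00\x04" + b"abcd" + b"\xa5rest")
print(cred_id)  # b'abcd'
```

The signature check then runs ECDSA-SHA256 over `authData || clientDataHash`, so the verifier must hash exactly the `clientDataJSON` bytes that were used when the card was called.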
|
|
### Start bridge for emulator testing
|
|
|
|
```bash
uv run --python 3.12 --with fido2 --with cbor2 --with cryptography tests/card_emulator_bridge.py
```
|
|
|
|
### Phase 9 exit criteria
|
|
|
|
- `k_phone` presents identical HTTP API to `k_proxy_app.py` (so k_client works unchanged)
|
|
- Registration and login both complete via `card_emulator_bridge.py` in emulator testing
|
|
- With physical ChromeCard plugged into Android phone: full register → login → counter → logout works
|
|
- `phase5_chain_regression.sh` passes against `k_phone` on Android
|
|
|
|
## Current Next Step
|
|
|
|
Status (2026-04-29):
|
|
- Phase 9 emulator milestone complete: makeCredential + getAssertion verified via CardEmulator bridge.
|
|
- Next blocking step: deploy to real Android phone with ChromeCard over USB.
|
|
- k_server is not running in the Mac test environment; counter endpoint will work once running in Qubes.
|
|
|
|
Phase status (2026-04-29):
|
|
- Phase 6.5 (concurrency): deferred. ~10 in-flight ceiling is acceptable.
|
|
- Phase 7 (firmware build/flash): blocked on Chrome Roads (card vendor).
|
|
- Phase 9 (phone integration): **emulator FIDO2 verified; physical phone + USB HID path is next.**
|
|
|
|
Status (2026-04-26, markdown maintenance):
|
|
- Re-scanned `Setup.md`, `Workplan.md`, and `PHASE5_RUNBOOK.md` against the current workspace files.
|
|
|
|
## Inputs Expected During This Session
|
|
|
|
- Exact observed behavior on reconnect attempts (USB/hidraw/probe).
|
|
- Whether we should pull server-side code now.
|
|
- Any board/firmware variants different from default documentation assumptions.
|
|
- Preferred TLS ports, certificate approach, and hostname scheme for `k_client`, `k_proxy`, `k_server`.
|
|
- Session TTL and invalidation requirements for cached authenticated access.
|
|
- Decision on where user/session authority lives (`k_proxy` vs `k_server` vs split).
|
|
- Target concurrency level for validation (parallel clients and parallel requests per client).
|
|
- Preferred wireless transport/protocol between `k_proxy` and phone (for future phase).
|
|
|
|
## Session Maintenance Notes (2026-04-24)
|
|
|
|
- Top-level Markdown review completed for `PHASE5_RUNBOOK.md`, `Setup.md`, and `Workplan.md`.
|
|
- Current execution plan remains in sync with the Phase 5 runbook:
|
|
- prototype services at `/home/user/chromecard/k_proxy_app.py` and `/home/user/chromecard/k_server_app.py`
|
|
- run sequence documented in `/home/user/chromecard/PHASE5_RUNBOOK.md`
|
|
- No phase ordering or blocker changes were required from this review pass.
|
|
- Remote execution support is now active and validated:
|
|
- `ssh` command execution works for `k_client`, `k_proxy`, `k_server`
|
|
- `scp` push to VM home works (validated on `k_proxy`)
|