
Workplan

Last updated: 2026-04-25

This is the execution plan for making ChromeCard FIDO2 development and validation reproducible on this machine.

Constraints

  • Treat /home/user/chromecard/CR_SDK_CK-main as read-only.
  • Keep helper scripts such as fido2_probe.py and webauthn_local_demo.py at /home/user/chromecard.
  • Target deployment model is Qubes OS with 3 AppVMs based on debian-13-xfce: k_client, k_proxy, k_server.
  • The current authenticator link is card -> k_proxy (USB), but the architecture must allow migration to wireless phone-mediated validation.
  • VM execution path is SSH-first for experiments: ssh <host> <cmd> and scp <file> <host>:~.

Goals

  • Re-establish deterministic host-to-card FIDO2 communication over USB HID/CTAPHID.
  • Restore a buildable/flashable firmware workspace for CR_SDK_CK-main.
  • Turn ad-hoc demos into a repeatable verification flow.
  • Stand up chained TLS communication in Qubes: k_client -> k_proxy -> k_server.
  • Support both login flow (browser in k_client) and user enrollment flow (process in k_client).
  • Minimize repeated card prompts by introducing secure session reuse after successful authentication.
  • Implement a protected dummy resource on k_server (monotonic counter) for end-to-end validation.
  • Ensure k_proxy and k_server are thread-safe and support concurrent access.
  • Prepare k_proxy auth path for future transport shift: USB-direct -> wireless phone bridge.

Phase 0: Qubes VM Baseline (Blocking)

  1. Provision/verify AppVMs.
  • Ensure k_client, k_proxy, k_server exist and are based on debian-13-xfce.
  2. Assign functional responsibilities.
  • k_client: browser client + enrollment process.
  • k_proxy: USB card access + proxy/auth bridge.
  • k_server: protected resource/service endpoint.
  3. Define TLS endpoints and certificates.
  • k_proxy presents TLS service to k_client.
  • k_server presents TLS service to k_proxy.
  • Trust roots and cert distribution model documented per VM.

Exit criteria:

  • All 3 VMs exist, boot, and have clearly defined service ownership.

Phase 1: Qubes Firewall Policy (Blocking)

  1. Enforce allowed forward paths only.
  • Allow k_client outbound TLS only to k_proxy service port(s).
  • Allow k_proxy outbound TLS only to k_server service port(s).
  • Deny direct k_client to k_server traffic.
  2. Validate return path behavior.
  • Confirm responses propagate back through established flows.
  3. Verify with simple probes.
  • TLS handshake and HTTP(S) checks from k_client to k_proxy.
  • TLS handshake and HTTP(S) checks from k_proxy to k_server.
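A hedged sketch of what the allow path could look like, assuming the Qubes 4.1+ policy format; the policy file name is illustrative, and the 9771/9780 local bindings mirror the localhost-forwarder model used later in this plan:

```
# dom0: /etc/qubes/policy.d/30-chromecard.policy (file name illustrative)
# Allow only the intended chain; all other ConnectTCP requests fall through to deny.
qubes.ConnectTCP +8771 k_client @default allow target=k_proxy
qubes.ConnectTCP +8780 k_proxy @default allow target=k_server

# k_client: expose k_proxy:8771 on localhost:9771
qvm-connect-tcp 9771:k_proxy:8771

# k_proxy: expose k_server:8780 on localhost:9780
qvm-connect-tcp 9780:k_server:8780
```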

Exit criteria:

  • Policy matches intended chain and is test-verified.

Status (2026-04-24, remote diagnostics):

  • Confirmed active blocker remains Phase 1 network policy/pathing.
  • Evidence from live VM probes:
    • k_client (10.137.0.16) -> k_proxy (10.137.0.12:8771): TCP timeout.
    • k_proxy (10.137.0.12) -> k_server (10.137.0.13:8780): upstream timeout.
  • Local service health inside each VM is good, so the failure is in inter-VM reachability, not local process startup.

Status (2026-04-25, after restart and service recovery):

  • Refined blocker: this is currently a qrexec/qubes.ConnectTCP refusal problem, not an app-local listener problem.
  • Current evidence:
    • k_proxy local /health is up on 127.0.0.1:8771
    • k_server local /health is up on 127.0.0.1:8780
    • qrexec-client-vm k_proxy qubes.ConnectTCP+8771 -> Request refused
    • qrexec-client-vm k_server qubes.ConnectTCP+8780 -> Request refused
  • Immediate next action for Phase 1:
    • verify and fix the dom0 policy/mechanism that should permit qubes.ConnectTCP forwarding for the chain ports

Phase 2: TLS Certificates and Service Endpoints

  1. Certificate model.
  • Create or import CA and issue certs for k_proxy and k_server.
  • Install trust roots in client VM(s) that need validation.
  2. Service shape.
  • k_server: HTTPS service exposing protected resource endpoint(s), including a monotonic counter endpoint.
  • k_proxy: minimal HTTPS API gateway service (full web server framework not required).
  3. Endpoint contract.
  • Define request/response schema between k_client and k_proxy.
  • Define upstream request contract from k_proxy to k_server.
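As a starting point for the contract work, the required field set can be pinned down with a small JSON check; all field names below are hypothetical placeholders for the contract this step defines, not an existing schema:

```python
import json

# Hypothetical shapes for the k_client -> k_proxy login hop.
LOGIN_REQUEST_FIELDS = {"username"}
LOGIN_RESPONSE_FIELDS = {"ok", "session_token", "expires_in"}

def check_fields(raw, required):
    """Parse a JSON body and confirm the required fields are present."""
    body = json.loads(raw)
    missing = required - body.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return body

# A conforming request passes; a response missing fields raises.
request = check_fields('{"username": "alice"}', LOGIN_REQUEST_FIELDS)
```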

Exit criteria:

  • Mutual TLS trust decisions are documented and tested.
  • HTTPS calls succeed on both links with expected cert validation.

Status (2026-04-25):

  • Implemented HTTPS listeners in both prototype services.
  • Added local CA + service certificate generation in generate_phase2_certs.py.
  • Verified the working Qubes path is localhost forwarding plus TLS:
    • k_client local 9771 forwards to k_proxy:8771
    • k_proxy local 9780 forwards to k_server:8780
  • Verified cert validation on both hops using the generated CA.
  • Verified end-to-end HTTPS flow:
    • k_client -> k_proxy login over TLS
    • k_proxy -> k_server protected counter call over TLS
    • session reuse still works across repeated protected requests
  • Phase 2 is now effectively complete for the current prototype shape.

Phase 2.5: Define State Ownership and Concurrency Model

  1. State ownership.
  • Decide where user/session state is authoritative (k_proxy, k_server, or split model).
  • Define token/session format and validation boundary.
  2. Concurrency controls.
  • Define thread-safe strategy for session store and shared counters.
  • Define locking/atomic/update semantics for counter increments and session updates.
  3. Runtime model.
  • Choose service runtime/config that supports simultaneous requests safely.
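The session-store side of this model can be sketched as a single-lock, TTL-based in-memory store; the class name, TTL default, and method set are illustrative, not the prototype's actual API:

```python
import secrets
import threading
import time

class SessionStore:
    """Thread-safe in-memory session store with TTL expiry (process-local)."""

    def __init__(self, ttl_seconds=300):
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._sessions = {}  # token -> (username, expiry timestamp)

    def create(self, username):
        token = secrets.token_urlsafe(32)
        with self._lock:
            self._sessions[token] = (username, time.monotonic() + self._ttl)
        return token

    def lookup(self, token):
        """Return the username for a live session, or None."""
        with self._lock:
            entry = self._sessions.get(token)
            if entry is None:
                return None
            username, expiry = entry
            if time.monotonic() >= expiry:
                del self._sessions[token]  # lazy expiry garbage collection
                return None
            return username

    def logout(self, token):
        with self._lock:
            return self._sessions.pop(token, None) is not None
```

Any slow upstream call (such as the proxy-to-server hop) should happen outside the lock so one request cannot stall session lookups for others.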

Exit criteria:

  • Architecture clearly documents state authority and race-free update rules.

Next action (2026-04-25):

  • Move into Phase 2.5 and make the current prototype decisions explicit:
    • authority for session state remains k_proxy
    • k_server remains authority for the protected counter/resource state
    • localhost Qubes forwarders are part of the active runtime model for the two TLS hops
    • define concurrency assumptions and limits around session store, forwarders, and counter access

Status (2026-04-25):

  • Current ownership model is now explicit:
    • k_proxy is authoritative for session creation, expiry, lookup, and logout
    • k_server is authoritative for the protected monotonic counter
    • k_client is a client only; it holds bearer tokens but is not a state authority
  • Current validation boundary is explicit:
    • k_proxy validates bearer tokens against its in-memory session store
    • k_server trusts only requests that arrive with the configured X-Proxy-Token
    • k_server does not currently validate end-user session tokens directly
  • Current concurrency strategy is explicit:
    • k_proxy uses ThreadingHTTPServer plus one lock around the in-memory session map
    • k_server uses ThreadingHTTPServer plus one lock around counter increments
    • upstream HTTPS calls from k_proxy are made outside the session-store lock
  • Current runtime limits are explicit:
    • sessions are process-local and disappear on k_proxy restart
    • counter state is process-local and resets on k_server restart
    • transport relies on Qubes localhost forwarders 9771 and 9780
  • Phase 2.5 is complete for the current prototype shape.

Phase 3: Recover Basic Device Visibility on k_proxy (Blocking)

  1. Verify physical + USB enumeration path.
  • Check cable/port and confirm device appears in USB listings.
  • Confirm /dev/hidraw* nodes appear when card is connected.
  2. Validate Linux permissions.
  • Install/update udev rule for ChromeCard HID VID/PID.
  • Reload udev and verify non-root read/write access to hidraw node.
  3. Re-run host probe.
  • Run python3 /home/user/chromecard/fido2_probe.py --list.
  • Run python3 /home/user/chromecard/fido2_probe.py --json.
  • Record VID/PID/path and CTAP2 getInfo output in Setup.md.
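A udev rule of the shape step 2 describes might look like this; the 1234:5678 VID/PID pair and file name are placeholders, to be replaced with the values reported by fido2_probe.py --list:

```
# /etc/udev/rules.d/70-chromecard.rules (file name illustrative)
# Grant the seated user access to the ChromeCard hidraw node via systemd-logind.
KERNEL=="hidraw*", SUBSYSTEM=="hidraw", ATTRS{idVendor}=="1234", ATTRS{idProduct}=="5678", TAG+="uaccess"
```

After installing the rule: sudo udevadm control --reload-rules && sudo udevadm trigger, then replug the card.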

Exit criteria:

  • At least one CTAP HID device is listed.
  • --json returns valid ctap2_info.

Phase 4: Re-validate Local WebAuthn Demo on k_proxy

  1. Start local demo server.
  • Run python3 /home/user/chromecard/webauthn_local_demo.py.
  • Confirm URL is http://localhost:8765.
  2. Exercise register/login.
  • Register a test user.
  • Authenticate with same user.
  • Capture errors (if any) and update Setup.md.
  3. Decide next demo hardening step.
  • Keep bring-up-only mode, or
  • add signature verification for attestation/assertion.

Exit criteria:

  • Register and login both complete with card interaction prompts.

Status (2026-04-24):

  • Completed in k_proxy using http://localhost:8765.
  • Registration result: ok=true, username=alice, credential_count=1.
  • Authentication result: ok=true, username=alice, authenticated=true.

Phase 5: Implement Proxy Auth + Session Reuse

  1. Authenticate via card once per session window.
  • k_proxy handles initial auth using connected card.
  • On success, create session state for k_client.
  2. Session model.
  • Prefer server-side session store or signed session token.
  • Include TTL/expiry, rotation, and explicit invalidation/logout path.
  • Do not expose card secrets or long-lived auth material to k_client.
  3. Proxying behavior.
  • With valid session: k_proxy forwards request to k_server and returns result.
  • Without valid session: require fresh card-backed auth flow.
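If the signed-token variant of the session model is chosen over a server-side store, a stdlib-only sketch could look like this; the key constant, field layout, and function names are illustrative, and real key material would come from secure storage:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET_KEY = b"replace-with-random-key-from-secure-storage"  # illustrative only

def issue_token(username, ttl_seconds=300):
    """Create a signed, expiring session token: base64url(payload).base64url(sig)."""
    payload = json.dumps({"user": username, "exp": time.time() + ttl_seconds}).encode()
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(sig).decode())

def validate_token(token):
    """Return the username if the signature is valid and unexpired, else None."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except (ValueError, TypeError):
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    claims = json.loads(payload)
    if time.time() >= claims["exp"]:
        return None
    return claims["user"]
```

This keeps k_proxy stateless for validation, at the cost of losing per-token revocation unless a denylist is added, which is one reason the server-side store may remain the simpler fit here.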

Exit criteria:

  • Repeated authorized requests do not require card interaction until session expiry.
  • Expired/invalid sessions are correctly rejected.

Status (2026-04-24):

  • Started with a runnable prototype:
    • /home/user/chromecard/k_proxy_app.py
    • /home/user/chromecard/k_server_app.py
    • /home/user/chromecard/PHASE5_RUNBOOK.md
  • Implemented in prototype:
    • session create/status/logout endpoints in k_proxy
    • TTL-based server-side session store with expiry garbage collection
    • protected monotonic counter endpoint in k_server with thread-safe increments
    • proxy forwarding from k_proxy to k_server using a shared upstream token
  • The current auth gate for session creation is a card-presence probe (fido2_probe.py --json), pending upgrade to a full assertion verification path.

Status (2026-04-25):

  • Prototype services were re-started successfully after VM restart.
  • Current split-VM test shape is:
    • k_proxy listening on 127.0.0.1:8771
    • k_server listening on 127.0.0.1:8780
  • Phase 5 application logic is runnable locally inside each VM, but end-to-end validation is still blocked by Phase 1 qrexec forwarding refusal.

Phase 5.5: Implement Dummy Resource + Access Policy on k_server

  1. Protected dummy resource.
  • Add endpoint returning increasing number.
  • Require valid upstream auth/session context from k_proxy.
  2. Optional user/session handling.
  • Add minimal user/session checks if k_server is chosen as authority (or partial authority).
  3. Correctness under concurrency.
  • Ensure increments are monotonic and race-safe under parallel calls.
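The race-safety requirement above can be sketched with a lock-guarded counter exercised by parallel callers; the class name and thread counts are illustrative:

```python
import threading

class MonotonicCounter:
    """Process-local counter whose increments are atomic under a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def next(self):
        with self._lock:
            self._value += 1
            return self._value

counter = MonotonicCounter()
results = []
results_lock = threading.Lock()

def worker(n):
    # Each worker draws n values; under the lock no value can be duplicated.
    for _ in range(n):
        v = counter.next()
        with results_lock:
            results.append(v)

threads = [threading.Thread(target=worker, args=(100,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With 8 workers drawing 100 values each, the collected results should be exactly 1..800 with no gaps or duplicates, which is the acceptance property this step asks for.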

Exit criteria:

  • Authorized requests obtain consistent increasing values.
  • Unauthorized requests are rejected.

Phase 6: Integrate Client Enrollment + Proxy Login Flow

  1. Enrollment process in k_client.
  • Start process from k_client that captures new-user enrollment intent/data.
  • Route enrollment requests to k_proxy over TLS.
  2. Card-mediated login in k_proxy.
  • k_proxy uses connected card for FIDO2/WebAuthn operations.
  • k_proxy authenticates toward k_server over TLS.
  3. Browser flow in k_client.
  • Browser traffic goes only to k_proxy.
  • Validate end-to-end login to k_server resource through proxy chain.

Exit criteria:

  • Enrollment and login both function end-to-end via k_client -> k_proxy -> k_server.

Status (2026-04-25):

  • Added first k_client implementation at /home/user/chromecard/k_client_portal.py.
  • Current prototype flow:
    • browser now targets k_proxy directly over https://127.0.0.1:9771
    • k_client_portal.py remains only as a temporary bridge page
    • k_proxy continues to authenticate with the card and forward to k_server
  • Verified end-to-end through the portal:
    • enroll alice
    • login succeeds
    • session status succeeds
    • protected counter succeeds repeatedly with session reuse
    • logout succeeds
  • Enrollment contract progress:
    • k_proxy now exposes prototype enrollment endpoints
    • proxy-side enrollment storage exists and is checked before login is allowed
    • direct browser/API traffic can now use those proxy endpoints without going through the local bridge
  • Phase 6 is materially further along for the current prototype shape:
    • direct browser target is on k_proxy
    • login/resource flow is integrated on the direct proxy path
    • enrollment now has a real client->proxy path
    • the k_client bridge remains only for transition/compatibility
    • final enrollment semantics are still provisional

Phase 6.5: Concurrency and Multi-Client Test Setup

  1. Single-VM concurrency tests.
  • Generate parallel request bursts from k_client to k_proxy.
  • Verify response integrity, session reuse behavior, and error rates.
  2. Multi-client tests.
  • Run requests from multiple k_client instances (or equivalent parallel clients) concurrently.
  • Verify isolation between users/sessions.
  3. Acceptance checks.
  • No race-related crashes/corruption in k_proxy or k_server.
  • Counter/resource behavior remains correct under load.
  • Session reuse reduces card prompts while preserving authorization checks.
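The burst-test idea can be sketched end to end against a stand-in service; the real target would be the k_proxy TLS endpoint, but a local plain-HTTP stub keeps the sketch self-contained, and the worker/request counts are arbitrary:

```python
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class StubHandler(BaseHTTPRequestHandler):
    """Stand-in for the k_proxy endpoint: always answers 200 'ok'."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):  # keep the burst quiet
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

def one_request(_):
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/", timeout=5) as resp:
        return resp.status

# Fire a parallel burst and collect status codes for the acceptance checks.
with ThreadPoolExecutor(max_workers=16) as pool:
    statuses = list(pool.map(one_request, range(200)))

server.shutdown()
```

Against the real chain, the same harness would additionally carry per-client session tokens so isolation and session-reuse behavior can be asserted, not just status codes.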

Exit criteria:

  • Test results demonstrate stable concurrent operation with documented limits.

Phase 7: Restore Firmware Build/Flash Path

  1. Validate SDK tree completeness.
  • Confirm presence of mvp, setup, components, samples under CR_SDK_CK-main.
  • If missing, obtain the full repository/checkpoint and document its source.
  2. Install/enable build tools.
  • Ensure west and nrfjprog are available in shell.
  • Confirm target board/toolchain match (nrf7002dk/nrf5340/cpuapp, NCS v2.9.2 baseline in docs).
  3. Run baseline build+flash.
  • From CR_SDK_CK-main, run ./scripts/build_flash_mvp.sh.
  • If flashing fails, run documented recovery and retry.

Exit criteria:

  • Successful west build and west flash.

Phase 8: Consolidate Documentation and Paths

  1. Remove path drift between docs and actual files.
  • Keep fido2_probe.py and webauthn_local_demo.py at workspace root.
  • Ensure docs never instruct placing helper scripts under CR_SDK_CK-main.
  • Update references consistently in all docs.
  2. Keep Setup.md current.
  • After each significant change, update status snapshot and outcomes.
  3. Add minimal reproducibility checklist.
  • One command list for probe + demo + build/flash prechecks.
  4. Maintain Markdown execution records continuously.
  • Setup.md and Workplan.md are the canonical living docs for this workspace.
  • Re-scan relevant .md files before each new execution cycle and reconcile drift.
  • Record date-stamped session notes when priorities or blockers change.

Status (2026-04-24, markdown maintenance):

  • Re-scanned the active workspace Markdown set and the main source-tree reference docs.
  • No workplan phase change was required from this pass.
  • Ongoing documentation watch item remains path drift in CR_SDK_CK-main/README_HOST.md, which still uses historical ./scripts/... helper locations instead of workspace-root helper paths.
  • Operational note: the markdown scan path now runs cleanly after policy adjustment when invoked without a login shell.

Status (2026-04-24, chain probe retry):

  • Phase 1 remains blocked, but the failure point is now narrowed further:
    • current refusal occurs at Qubes qubes.ConnectTCP policy/service evaluation for ports 22, 8770, and 8780
    • this happens before any end-to-end app-level request can be retried
  • Practical implication:
    • do not spend time on k_proxy_app.py / k_server_app.py request handling until qrexec forwarding permits the intended hops again
    • next recovery action is to fix/activate the relevant qubes.ConnectTCP policy, then re-run the qrexec bridge checks before testing the HTTP flow

Status (2026-04-25, post-restart probe):

  • Corrected the client-facing proxy port reference to 8771.
  • SSH access to k_proxy and card visibility recovered after VM restart.
  • New immediate blockers are:
    • k_proxy service not listening on 127.0.0.1:8771
    • k_server service not listening on 127.0.0.1:8780
    • qrexec forwarding for 8771 and 8780 still returns Request refused
  • The next retry should start services first, then re-test qrexec forwarding, and only then attempt the end-to-end client flow.

Status (2026-04-25, service restart):

  • Local VM services are running again on the intended loopback ports:
    • k_server: 127.0.0.1:8780
    • k_proxy: 127.0.0.1:8771
  • Phase 1 remains blocked specifically by qrexec policy/forwarding refusal on those ports.
  • Next action is no longer app startup; it is fixing the qubes.ConnectTCP allow path for 8771 and 8780.

Status (2026-04-25, in-VM forwarding test):

  • Verified that using qvm-connect-tcp inside the source VMs still does not complete the client->proxy hop:
    • bind succeeds locally, but first real connection gets Request refused
  • Independent app-layer blocker also found in k_proxy:
    • python3-fido2 is missing there, so local /session/login currently fails before card auth can succeed
  • Current ordered blockers:
    • first: effective Qubes/qrexec allow path for k_client -> k_proxy:8771
    • second: install python3-fido2 in k_proxy
    • third: re-test end-to-end login and then proxy->server counter flow

Status (2026-04-25, after python3-fido2 install):

  • python3-fido2 blocker in k_proxy is resolved.
  • Updated ordered blockers:
    • first: effective Qubes/qrexec allow path for k_client -> k_proxy:8771
    • second: restore CTAP HID device visibility/access in k_proxy (No CTAP HID devices found)
    • third: re-test end-to-end login and then proxy->server counter flow

Status (2026-04-25, card reattached):

  • CTAP HID visibility/access in k_proxy is restored.
  • Local proxy login is working again with the attached card.
  • The only currently confirmed blocker for the end-to-end path is the k_client -> k_proxy:8771 qrexec/qvm-connect-tcp refusal.

Status (2026-04-25, clean forward retest):

  • The retest shows the same qrexec failure mode on both hops, not just the client-facing one.
  • Updated blocker statement:
    • effective qubes.ConnectTCP allow path is failing for both
      • k_client -> k_proxy:8771
      • k_proxy -> k_server:8780
  • App services and card path are currently good; forwarding remains the single active system blocker.

Status (2026-04-25, dom0 policy fix validated):

  • The explicit-destination dom0 qubes.ConnectTCP policy fix resolved forwarding on both hops.
  • Current verified working chain:
    • k_client -> k_proxy:8771
    • k_proxy -> k_server:8780
  • Current verified prototype behavior:
    • session login works from k_client
    • session status works
    • protected counter flow reaches k_server
    • session reuse avoids re-login for repeated counter calls
    • logout invalidates the session and subsequent protected access returns 401
  • Immediate networking blocker is cleared.

Exit criteria:

  • New team member can follow docs end-to-end without path or tooling ambiguity.

Phase 9: Migrate to Phone-Mediated Wireless Validation (Future)

  1. Auth transport abstraction in k_proxy.
  • Introduce/keep a transport interface for authenticator operations.
  • Implement at least two backends:
    • USB-direct backend (current).
    • Phone-wireless backend (future).
  2. Wireless phone integration.
  • Define protocol between k_proxy and phone service.
  • Define secure pairing/authentication and message integrity for wireless link.
  • Add timeout/retry behavior and offline handling.
  3. Functional equivalence tests.
  • Verify login/enrollment behavior is unchanged at API level for k_client.
  • Verify session reuse still works and card prompts are not increased unexpectedly.
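The transport abstraction in step 1 might take this shape; class names and the method set are illustrative, and the real interface would wrap whatever fido2 calls the prototype already makes:

```python
from abc import ABC, abstractmethod

class AuthenticatorTransport(ABC):
    """Interface k_proxy codes against; backends are swappable."""

    @abstractmethod
    def get_info(self):
        """Return authenticator capabilities (CTAP2 getInfo equivalent)."""

    @abstractmethod
    def get_assertion(self, rp_id, client_data_hash):
        """Perform a FIDO2 assertion and return the raw response."""

class UsbDirectTransport(AuthenticatorTransport):
    """Current backend: talks CTAPHID to the card attached to k_proxy."""

    def get_info(self):
        raise NotImplementedError("wrap the existing python3-fido2 USB path here")

    def get_assertion(self, rp_id, client_data_hash):
        raise NotImplementedError

class PhoneWirelessTransport(AuthenticatorTransport):
    """Future backend: relays the same operations to a paired phone."""

    def get_info(self):
        raise NotImplementedError("k_proxy <-> phone protocol not yet defined")

    def get_assertion(self, rp_id, client_data_hash):
        raise NotImplementedError
```

Because both backends satisfy the same interface, the login/enrollment handlers in k_proxy need no changes when the wireless backend is selected, which is exactly the Exit criterion below.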

Exit criteria:

  • k_proxy can validate via wireless phone path with no client-facing API changes.

Inputs Expected During This Session

  • Exact observed behavior on reconnect attempts (USB/hidraw/probe).
  • Whether we should pull server-side code now.
  • Any board/firmware variants different from default documentation assumptions.
  • Preferred TLS ports, certificate approach, and hostname scheme for k_client, k_proxy, k_server.
  • Session TTL and invalidation requirements for cached authenticated access.
  • Decision on where user/session authority lives (k_proxy vs k_server vs split).
  • Target concurrency level for validation (parallel clients and parallel requests per client).
  • Preferred wireless transport/protocol between k_proxy and phone (for future phase).

Session Maintenance Notes (2026-04-24)

  • Top-level Markdown review completed for PHASE5_RUNBOOK.md, Setup.md, and Workplan.md.
  • Current execution plan remains in sync with the Phase 5 runbook:
    • prototype services at /home/user/chromecard/k_proxy_app.py and /home/user/chromecard/k_server_app.py
    • run sequence documented in /home/user/chromecard/PHASE5_RUNBOOK.md
  • No phase ordering or blocker changes were required from this review pass.
  • Remote execution support is now active and validated:
    • ssh command execution works for k_client, k_proxy, k_server
    • scp push to VM home works (validated on k_proxy)