Workplan

Last updated: 2026-04-29

This is the execution plan for making ChromeCard FIDO2 development and validation reproducible on this machine.

Constraints

  • Treat /home/user/chromecard/CR_SDK_CK-main as read-only.
  • Keep helper scripts such as fido2_probe.py and webauthn_local_demo.py at /home/user/chromecard.
  • Target deployment model is Qubes OS with 3 AppVMs based on debian-13-xfce: k_client, k_proxy, k_server.
  • Current authenticator link is card->k_proxy (USB), but the architecture must allow migration to wireless phone-mediated validation.
  • VM execution path is SSH-first for experiments: ssh <host> <cmd> and scp <file> <host>:~.

Goals

  • Re-establish deterministic host-to-card FIDO2 communication over USB HID/CTAPHID.
  • Restore a buildable/flashable firmware workspace for CR_SDK_CK-main.
  • Turn ad-hoc demos into a repeatable verification flow.
  • Stand up chained TLS communication in Qubes: k_client -> k_proxy -> k_server.
  • Support both login flow (browser in k_client) and user enrollment flow (process in k_client).
  • Minimize repeated card prompts by introducing secure session reuse after successful authentication.
  • Implement a protected dummy resource on k_server (monotonic counter) for end-to-end validation.
  • Ensure k_proxy and k_server are thread-safe and support concurrent access.
  • Prepare k_proxy auth path for future transport shift: USB-direct -> wireless phone bridge.

Phase 0: Qubes VM Baseline (Blocking)

  1. Provision/verify AppVMs.
  • Ensure k_client, k_proxy, k_server exist and are based on debian-13-xfce.
  2. Assign functional responsibilities.
  • k_client: browser client + enrollment process.
  • k_proxy: USB card access + proxy/auth bridge.
  • k_server: protected resource/service endpoint.
  3. Define TLS endpoints and certificates.
  • k_proxy presents TLS service to k_client.
  • k_server presents TLS service to k_proxy.
  • Trust roots and cert distribution model documented per VM.

Exit criteria:

  • All 3 VMs exist, boot, and have clearly defined service ownership.

Phase 1: Qubes Firewall Policy

  1. Enforce allowed forward paths only.
  • Allow k_client outbound TLS only to k_proxy service port(s).
  • Allow k_proxy outbound TLS only to k_server service port(s).
  • Deny direct k_client to k_server traffic.
  2. Validate return path behavior.
  • Confirm responses propagate back through established flows.
  3. Verify with simple probes.
  • TLS handshake and HTTP(S) checks from k_client to k_proxy.
  • TLS handshake and HTTP(S) checks from k_proxy to k_server.

Exit criteria:

  • Policy matches intended chain and is test-verified.

Status (2026-04-24, remote diagnostics):

  • Confirmed active blocker remains Phase 1 network policy/pathing.
  • Evidence from live VM probes:
    • k_client (10.137.0.16) -> k_proxy (10.137.0.12:8771): TCP timeout.
    • k_proxy (10.137.0.12) -> k_server (10.137.0.13:8780): upstream timeout.
  • Local service health inside each VM is good, so failure is inter-VM reachability, not local process startup.
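
The split between "local process is healthy" and "inter-VM hop fails" can be checked with a small TCP probe run from each VM. This is a minimal sketch, not one of the workspace helper scripts; the function name `probe_tcp` and its return labels are assumptions chosen for illustration.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connect attempt as 'open', 'refused', 'timeout', or 'error: ...'."""
    try:
        # create_connection resolves the host and applies the timeout to connect().
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # A listener-less port on a reachable host answers with RST.
        return "refused"
    except socket.timeout:
        # No answer at all, which is what a dropped or unrouted path looks like.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Running `probe_tcp("127.0.0.1", 8771)` inside k_proxy and `probe_tcp("10.137.0.12", 8771)` from k_client would separate the two cases recorded above: the first should report `open`, while the second reported a timeout on 2026-04-24.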

Status (2026-04-25, after restart and service recovery):

  • Refined blocker: this is currently a qrexec/qubes.ConnectTCP refusal problem, not an app-local listener problem.
  • Current evidence:
    • k_proxy local /health is up on 127.0.0.1:8771
    • k_server local /health is up on 127.0.0.1:8780
    • qrexec-client-vm k_proxy qubes.ConnectTCP+8771 -> Request refused
    • qrexec-client-vm k_server qubes.ConnectTCP+8780 -> Request refused
  • Immediate next action for Phase 1:
    • verify and fix the dom0 policy/mechanism that should permit qubes.ConnectTCP forwarding for the chain ports

Status (2026-04-25, dom0 policy fix validated):

  • The forwarding blocker is cleared for the current prototype shape.
  • Verified working chain:
    • k_client localhost 9771 -> k_proxy:8771
    • k_proxy localhost 9780 -> k_server:8780
  • Verified outcome:
    • TLS health checks pass on both hops
    • end-to-end login, session status, protected counter access, and logout all succeed from k_client
  • Phase 1 is complete for the current localhost-forwarded qubes.ConnectTCP design.

Phase 2: TLS Certificates and Service Endpoints

  1. Certificate model.
  • Create or import CA and issue certs for k_proxy and k_server.
  • Install trust roots in client VM(s) that need validation.
  2. Service shape.
  • k_server: HTTPS service exposing protected resource endpoint(s), including a monotonic counter endpoint.
  • k_proxy: minimal HTTPS API gateway service (full web server framework not required).
  3. Endpoint contract.
  • Define request/response schema between k_client and k_proxy.
  • Define upstream request contract from k_proxy to k_server.

Exit criteria:

  • Mutual TLS trust decisions are documented and tested.
  • HTTPS calls succeed on both links with expected cert validation.

Status (2026-04-25):

  • Implemented HTTPS listeners in both prototype services.
  • Added local CA + service certificate generation in generate_phase2_certs.py.
  • Verified the working Qubes path is localhost forwarding plus TLS:
    • k_client local 9771 forwards to k_proxy:8771
    • k_proxy local 9780 forwards to k_server:8780
  • Verified cert validation on both hops using the generated CA.
  • Verified end-to-end HTTPS flow:
    • k_client -> k_proxy login over TLS
    • k_proxy -> k_server protected counter call over TLS
    • session reuse still works across repeated protected requests
  • Phase 2 is now effectively complete for the current prototype shape.
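
For reference, certificate validation against a locally generated CA can be expressed with the standard `ssl` module as follows. This is a sketch under assumptions: `make_client_context` and the `ca_file` path are hypothetical names, and it does not reproduce what generate_phase2_certs.py or the prototype services actually do.

```python
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the peer certificate and hostname."""
    # PROTOCOL_TLS_CLIENT enables CERT_REQUIRED and check_hostname by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file is not None:
        # Trust only the given CA bundle (for example the locally generated CA).
        ctx.load_verify_locations(cafile=ca_file)
    else:
        # Fall back to the system trust store.
        ctx.load_default_certs()
    return ctx
```

Because hostname checking stays enabled, a client that dials a localhost forwarder such as `127.0.0.1:9771` only validates successfully if the service certificate carries a matching subject alternative name; whether the generated certificates include one is not stated here.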

Phase 2.5: Define State Ownership and Concurrency Model

  1. State ownership.
  • Decide where user/session state is authoritative (k_proxy, k_server, or split model).
  • Define token/session format and validation boundary.
  2. Concurrency controls.
  • Define thread-safe strategy for session store and shared counters.
  • Define locking/atomic/update semantics for counter increments and session updates.
  3. Runtime model.
  • Choose service runtime/config that supports simultaneous requests safely.

Exit criteria:

  • Architecture clearly documents state authority and race-free update rules.

Next action (2026-04-25):

  • Move into Phase 2.5 and make the current prototype decisions explicit:
    • authority for session state remains k_proxy
    • k_server remains authority for the protected counter/resource state
    • localhost Qubes forwarders are part of the active runtime model for the two TLS hops
    • define concurrency assumptions and limits around session store, forwarders, and counter access

Status (2026-04-25):

  • Current ownership model is now explicit:
    • k_proxy is authoritative for session creation, expiry, lookup, and logout
    • k_server is authoritative for the protected monotonic counter
    • k_client is a client only; it holds bearer tokens but is not a state authority
  • Current validation boundary is explicit:
    • k_proxy validates bearer tokens against its in-memory session store
    • k_server trusts only requests that arrive with the configured X-Proxy-Token
    • k_server does not currently validate end-user session tokens directly
  • Current concurrency strategy is explicit:
    • k_proxy uses ThreadingHTTPServer plus one lock around the in-memory session map
    • k_server uses ThreadingHTTPServer plus one lock around counter increments
    • upstream HTTPS calls from k_proxy are made outside the session-store lock
  • Current runtime limits are explicit:
    • sessions are process-local and disappear on k_proxy restart
    • counter state is process-local and resets on k_server restart
    • transport relies on Qubes localhost forwarders 9771 and 9780
  • Phase 2.5 is complete for the current prototype shape.
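
The "one lock around counter increments" strategy can be illustrated in isolation. The class below is a minimal sketch, not the code in k_server_app.py; the name `MonotonicCounter` is an assumption. The point is that the read-modify-write and the returned value are taken under the same lock, so request threads spawned by a threading HTTP server never observe the same value twice.

```python
import threading


class MonotonicCounter:
    """Process-local counter whose increments are serialized by a single lock."""

    def __init__(self, start: int = 0) -> None:
        self._value = start
        self._lock = threading.Lock()

    def next(self) -> int:
        # Increment and read under the same lock so no two callers share a value.
        with self._lock:
            self._value += 1
            return self._value
```

As with the prototype, the state is process-local: a restart begins again from the start value.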

Phase 3: Recover Basic Device Visibility on k_proxy (Blocking)

  1. Verify physical + USB enumeration path.
  • Check cable/port and confirm device appears in USB listings.
  • Confirm /dev/hidraw* nodes appear when card is connected.
  2. Validate Linux permissions.
  • Install/update udev rule for ChromeCard HID VID/PID.
  • Reload udev and verify non-root read/write access to hidraw node.
  3. Re-run host probe.
  • Run python3 /home/user/chromecard/fido2_probe.py --list.
  • Run python3 /home/user/chromecard/fido2_probe.py --json.
  • Record VID/PID/path and CTAP2 getInfo output in Setup.md.

Exit criteria:

  • At least one CTAP HID device is listed.
  • --json returns valid ctap2_info.
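
Before running the probe, the non-root access check on the hidraw nodes can be scripted. This is a minimal sketch separate from fido2_probe.py; `hidraw_access` is a hypothetical helper name. It only reports permission bits as seen by the current user and does not identify which node is the FIDO interface.

```python
import glob
import os
from typing import Dict, List


def hidraw_access(pattern: str = "/dev/hidraw*") -> List[Dict[str, object]]:
    """Report read/write access for each node matching the pattern, as the current user."""
    report = []
    for path in sorted(glob.glob(pattern)):
        report.append({
            "path": path,
            "readable": os.access(path, os.R_OK),  # True if the user may open for reading
            "writable": os.access(path, os.W_OK),  # True if the user may open for writing
        })
    return report
```

An empty list means no node matched, which corresponds to the enumeration step failing rather than the permission step.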

Phase 4: Re-validate Local WebAuthn Demo on k_proxy

  1. Start local demo server.
  • Run python3 /home/user/chromecard/webauthn_local_demo.py.
  • Confirm URL is http://localhost:8765.
  2. Exercise register/login.
  • Register a test user.
  • Authenticate with same user.
  • Capture errors (if any) and update Setup.md.
  3. Decide next demo hardening step.
  • Keep bring-up-only mode, or
  • add signature verification for attestation/assertion.

Exit criteria:

  • Register and login both complete with card interaction prompts.

Status (2026-04-24):

  • Completed in k_proxy using http://localhost:8765.
  • Registration result: ok=true, username=alice, credential_count=1.
  • Authentication result: ok=true, username=alice, authenticated=true.

Phase 5: Implement Proxy Auth + Session Reuse

  1. Authenticate via card once per session window.
  • k_proxy handles initial auth using connected card.
  • On success, create session state for k_client.
  2. Session model.
  • Prefer server-side session store or signed session token.
  • Include TTL/expiry, rotation, and explicit invalidation/logout path.
  • Do not expose card secrets or long-lived auth material to k_client.
  3. Proxying behavior.
  • With valid session: k_proxy forwards request to k_server and returns result.
  • Without valid session: require fresh card-backed auth flow.

Exit criteria:

  • Repeated authorized requests do not require card interaction until session expiry.
  • Expired/invalid sessions are correctly rejected.

Status (2026-04-24):

  • Started with a runnable prototype:
    • /home/user/chromecard/k_proxy_app.py
    • /home/user/chromecard/k_server_app.py
    • /home/user/chromecard/PHASE5_RUNBOOK.md
  • Implemented in prototype:
    • session create/status/logout endpoints in k_proxy
    • TTL-based server-side session store with expiry garbage collection
    • protected monotonic counter endpoint in k_server with thread-safe increments
    • proxy forwarding from k_proxy to k_server using a shared upstream token
  • The current auth gate for session creation is a card-presence probe (fido2_probe.py --json), pending an upgrade to the full assertion-verification path.
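
A TTL-based server-side session store with expiry garbage collection can be sketched as follows. This is an illustration under assumptions, not the code in k_proxy_app.py: the class name `SessionStore`, its method names, and the injectable `clock` parameter (added so expiry can be tested without sleeping) are all hypothetical.

```python
import secrets
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class SessionStore:
    """In-memory bearer-token sessions with a fixed TTL, guarded by one lock."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._sessions: Dict[str, Tuple[str, float]] = {}  # token -> (username, expires_at)

    def create(self, username: str) -> str:
        token = secrets.token_urlsafe(32)  # unguessable bearer token
        with self._lock:
            self._collect_expired()
            self._sessions[token] = (username, self._clock() + self._ttl)
        return token

    def lookup(self, token: str) -> Optional[str]:
        """Return the username for a live token, or None if unknown or expired."""
        with self._lock:
            self._collect_expired()
            entry = self._sessions.get(token)
            return entry[0] if entry else None

    def logout(self, token: str) -> bool:
        """Invalidate a token; returns False if it was not present."""
        with self._lock:
            return self._sessions.pop(token, None) is not None

    def _collect_expired(self) -> None:
        # Caller must hold the lock. Drops every entry whose expiry has passed.
        now = self._clock()
        for tok in [t for t, (_, exp) in self._sessions.items() if exp <= now]:
            del self._sessions[tok]
```

Garbage collection here runs opportunistically on each create and lookup; a background sweeper is an alternative if the store grows large between requests.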

Status (2026-04-25):

  • Prototype services were re-started successfully after VM restart.
  • Current split-VM test shape is:
    • k_proxy listening on 127.0.0.1:8771
    • k_server listening on 127.0.0.1:8780
  • End-to-end validation is now passing through the live chain from k_client.
  • Current verified behavior:
    • login succeeds for alice
    • session status succeeds
    • repeated protected counter requests succeed with session reuse
    • logout succeeds
    • post-logout protected access returns 401
  • Added repeatable host-side regression helper:
    • /home/user/chromecard/phase5_chain_regression.sh
  • Phase 5 is complete for the current prototype semantics.
  • Experimental follow-up in code:
    • k_proxy_app.py now also has --auth-mode fido2-direct
    • this mode attempts direct credential registration and direct assertion verification with python-fido2
    • it is not the deployed default because direct registration currently fails on k_proxy with the error "No compatible PIN/UV protocols supported!"
    • /home/user/chromecard/raw_ctap_probe.py now exists for lower-level CTAP2 probing with keepalive/error logging
    • latest retry result: after reattaching the card, k_proxy again exposes /dev/hidraw0 and /dev/hidraw1, but raw makeCredential still reaches no Yes/No card prompt
    • /dev/hidraw0 opens successfully as the normal user; /dev/hidraw1 is still permission-denied
    • manual CTAPHID testing now shows /dev/hidraw0 is the correct FIDO interface and a direct INIT write gets no response at all
    • rerunning webauthn_local_demo.py inside k_proxy also still gives no card prompt, so the current break is below both browser WebAuthn and direct host probes
    • after a full power cycle and reattach, manual CTAPHID INIT replies again and browser registration in webauthn_local_demo.py succeeds again
    • direct raw_ctap_probe.py --device-path /dev/hidraw0 make-credential --rp-id localhost now also succeeds again after card confirmation
    • k_proxy_app.py --auth-mode fido2-direct has been moved onto low-level CTAP2 with hidraw auto-detection; it still accepts --direct-device-path, but no longer breaks if the card re-enumerates onto /dev/hidraw1
    • after repeated fixes for hidraw lifetime, VM-side python-fido2 response mapping, and CTAP payload shape, real app registration now succeeds for directtest
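
For the manual CTAPHID INIT test mentioned above, the packet layout is fixed by the CTAP HID transport: a 64-byte report carrying the broadcast channel ID, the INIT command with the high bit set, a 2-byte length, and an 8-byte nonce. The sketch below builds that request and parses the reply; the function names are assumptions and this is not the code in raw_ctap_probe.py.

```python
import struct

CTAPHID_BROADCAST_CID = 0xFFFFFFFF
CTAPHID_INIT = 0x06
REPORT_SIZE = 64


def build_init_packet(nonce: bytes) -> bytes:
    """Build a CTAPHID_INIT request on the broadcast channel, padded to one HID report."""
    if len(nonce) != 8:
        raise ValueError("nonce must be 8 bytes")
    # Header: CID (4 bytes), command with bit 7 set (1 byte), payload length (2 bytes, big-endian).
    header = struct.pack(">IBH", CTAPHID_BROADCAST_CID, 0x80 | CTAPHID_INIT, len(nonce))
    return (header + nonce).ljust(REPORT_SIZE, b"\x00")


def parse_init_response(packet: bytes, nonce: bytes) -> int:
    """Return the channel ID allocated by the authenticator, or raise ValueError."""
    cid, cmd, bcnt = struct.unpack(">IBH", packet[:7])
    if cid != CTAPHID_BROADCAST_CID or cmd != (0x80 | CTAPHID_INIT) or bcnt < 17:
        raise ValueError("not a CTAPHID_INIT response")
    if packet[7:15] != nonce:
        raise ValueError("nonce mismatch")  # reply belongs to another client's request
    return struct.unpack(">I", packet[15:19])[0]
```

On Linux hidraw, a write is prefixed with one report-ID byte (0x00 for devices without numbered reports), so the bytes written to the node would be `b"\x00" + build_init_packet(nonce)`. No reply at all to this packet matches the "direct INIT write gets no response" symptom recorded above.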

Phase 5.5: Implement Dummy Resource + Access Policy on k_server

  1. Protected dummy resource.
  • Add endpoint returning increasing number.
  • Require valid upstream auth/session context from k_proxy.
  2. Optional user/session handling.
  • Add minimal user/session checks if k_server is chosen as authority (or partial authority).
  3. Correctness under concurrency.
  • Ensure increments are monotonic and race-safe under parallel calls.

Exit criteria:

  • Authorized requests obtain consistent increasing values.
  • Unauthorized requests are rejected.
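
The upstream check on k_server (accept only requests carrying the configured X-Proxy-Token) can be sketched with a constant-time comparison. `is_trusted_proxy` is a hypothetical helper, not the prototype's function; it assumes the headers object does its own case-insensitive lookup, as the `http.server` request headers do, whereas a plain dict is case-sensitive.

```python
import hmac
from typing import Mapping


def is_trusted_proxy(headers: Mapping[str, str], expected_token: str) -> bool:
    """Return True only if the request carries the configured shared proxy token."""
    if not expected_token:
        # Refuse everything when no token is configured, rather than matching an empty header.
        return False
    presented = headers.get("X-Proxy-Token", "")
    # compare_digest avoids leaking the match position through timing differences.
    return hmac.compare_digest(presented.encode("utf-8"), expected_token.encode("utf-8"))
```

A handler would return 401 when this check fails and only then touch the counter.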

Status (2026-04-25):

  • The protected counter resource is implemented and validated in the live split-VM chain.
  • Verified behavior:
    • authorized requests from k_proxy obtain increasing values
    • unauthorized post-logout requests from k_client are rejected with 401
    • 20 concurrent protected requests through the chain returned unique, gap-free values
  • Phase 5.5 is complete for the current prototype shape.

Phase 6: Integrate Client Enrollment + Proxy Login Flow

  1. Enrollment process in k_client.
  • Start process from k_client that captures new-user enrollment intent/data.
  • Route enrollment requests to k_proxy over TLS.
  2. Card-mediated login in k_proxy.
  • k_proxy uses connected card for FIDO2/WebAuthn operations.
  • k_proxy authenticates toward k_server over TLS.
  3. Browser flow in k_client.
  • Browser traffic goes only to k_proxy.

Immediate next action:

  • Preserve the now-working direct auth path as a tested option while keeping the default deployed baseline stable.
  • Verified end-to-end state:
    • direct /enroll/register succeeds for directtest
    • direct /session/login succeeds for directtest
    • /session/status succeeds
    • protected /resource/counter succeeds through k_proxy -> k_server
    • /session/logout succeeds
    • post-logout protected access returns 401
  • Next work should be cleanup/hardening:
    • decide whether to keep directtest enrollment
    • rerun phase5_chain_regression.sh --interactive-card --expect-auth-mode fido2_assertion against the current direct-auth baseline
    • decide when fido2-direct should replace probe as the default deployed auth mode

Exit criteria:

  • Enrollment and login both function end-to-end via k_client -> k_proxy -> k_server.

Status (2026-04-25):

  • Added first k_client implementation at /home/user/chromecard/k_client_portal.py.
  • Current prototype flow:
    • browser now targets k_proxy directly over https://127.0.0.1:9771
    • k_client_portal.py also serves a local browser flow page on http://127.0.0.1:8766
    • k_proxy continues to authenticate with the card and forward to k_server
    • the k_client page now also lists registered users from k_proxy
    • the k_client page can unregister users from the browser
    • the portal login action now uses the current username field instead of only the remembered local user
    • a Playwright regression spec now exists for the browser flow in tests/k_client_portal.spec.js
    • the Playwright browser regression has now passed end-to-end once from this host against a forwarded portal URL
  • Verified end-to-end through the portal:
    • enroll alice
    • login succeeds
    • session status succeeds
    • protected counter succeeds repeatedly with session reuse
    • logout succeeds
  • Enrollment contract progress:
    • k_proxy now exposes prototype enrollment endpoints
    • proxy-side enrollment storage exists and is checked before login is allowed
    • direct browser/API traffic can now use those proxy endpoints without going through the local bridge
  • Phase 6 is materially further along for the current prototype shape:
    • direct browser target is on k_proxy
    • login/resource flow is integrated on the direct proxy path
    • enrollment now has a real client->proxy path
    • the k_client page is now a usable demo/operator surface in addition to the direct proxy path
    • final enrollment semantics are still provisional

Status (2026-04-25, enrollment hardening):

  • Added a more explicit provisional enrollment contract in k_proxy:
    • username normalization and validation
    • optional display_name
    • separate create, update, delete, status, and list operations
    • delete invalidates existing sessions for that username
  • Verified the hardened behaviors on the direct proxy path.
  • Phase 6 is now strong enough to treat the browser/proxy flow as a stable prototype baseline.
  • The remaining reason Phase 6 is not "final" is product semantics, not missing basic mechanics:
    • whether enrollment should require card presence
    • what user attributes belong in enrollment
    • what re-enroll and recovery should mean

Status (2026-04-25, Phase 6.5 initial concurrency results):

  • Added reproducible probe script at /home/user/chromecard/phase65_concurrency_probe.py.
  • Probe now supports --max-workers so client-side fan-out can be tested separately from total request count.
  • Moderate direct-path concurrency passes:
    • 3 users x 4 requests
    • 12/12 successful protected calls
    • counter values remained unique and contiguous
  • Larger direct-path concurrency currently fails:
    • 5 users x 5 requests
    • only 18/25 successful protected calls
    • failed calls report TLS EOF / upstream unavailable errors
  • Follow-up findings are more precise:
    • body-drain handling was fixed for the HTTP/1.1 keep-alive experiment
    • k_proxy -> k_server upstream concurrency is now clampable and currently tested at one pooled connection
    • 5 users x 5 requests passes at 25/25 when client fan-out is limited to --max-workers 10
    • the same total load still fails at higher fan-out:
      • 22/25 at --max-workers 15
      • 15/25 at fully unbounded 25 workers in the latest rerun
  • Current bottleneck is still not counter correctness:
    • successful results still show unique, contiguous counter values
    • k_proxy and k_server complete the requests that actually arrive
  • Current likely bottleneck is the client-facing Qubes forwarding layer:
    • qvm_connect_9771.log shows qrexec data-vchan failures
    • observed message includes xs_transaction_start: No space left on device
    • qvm_connect_9780.log showed earlier failures too, but the latest threshold test points first to connection fan-out on k_client -> k_proxy
  • Phase 6.5 is therefore started but not complete:
    • application-level concurrency looks acceptable at moderate load
    • current working envelope is roughly 10 in-flight protected calls on the direct browser path
    • higher-load failures still need Qubes forwarding diagnosis before the phase can be closed
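
The distinction between total request count and client-side fan-out can be made concrete with a bounded thread pool. The sketch below is not phase65_concurrency_probe.py; `run_bounded` is a hypothetical helper that runs a fixed number of calls while capping how many are in flight, and reports the observed peak so the cap can be confirmed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def run_bounded(call: Callable[[int], object], total: int, max_workers: int) -> Tuple[List[object], int]:
    """Run `total` calls with at most `max_workers` in flight; return (results, peak in-flight)."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def wrapped(i: int) -> object:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        try:
            return call(i)
        finally:
            with lock:
                in_flight -= 1

    # The pool size is the fan-out limit; map() preserves submission order in its results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(wrapped, range(total)))
    return results, peak
```

With `total=25` and `max_workers=10`, this mirrors the shape of the 5 users x 5 requests run that passed at 25/25; raising `max_workers` toward 25 corresponds to the fan-out levels where failures appeared.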

Status (2026-04-25, Phase 5 regression helper):

  • Added repeatable split-VM regression helper:
    • /home/user/chromecard/phase5_chain_regression.sh
  • Verified helper result on the live chain:
    • 20 requests at parallelism 8
    • login/session-status/counter/logout sequence completed successfully
    • returned counter values were unique and gap-free
    • latest verified helper range was 43..62
  • Current implication:
    • the Phase 5 baseline is now reproducible
    • next work should target auth semantics rather than basic chain bring-up
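
The "unique and gap-free" acceptance check on returned counter values is simple enough to state exactly. This is a minimal sketch with a hypothetical function name; it is not taken from phase5_chain_regression.sh.

```python
from typing import Iterable


def is_unique_and_gap_free(values: Iterable[int]) -> bool:
    """True if the values, in any order, form one contiguous run with no duplicates."""
    ordered = sorted(values)
    if not ordered:
        return False
    # A duplicate or a gap both make the sorted list differ from the ideal range.
    return ordered == list(range(ordered[0], ordered[0] + len(ordered)))
```

The recorded range 43..62 contains exactly 20 values, which is consistent with the 20 requests issued. The check is only meaningful when no other client increments the counter during the run.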

Phase 6.5: Concurrency and Multi-Client Test Setup

  1. Single-VM concurrency tests.
  • Generate parallel request bursts from k_client to k_proxy.
  • Verify response integrity, session reuse behavior, and error rates.
  2. Multi-client tests.
  • Run requests from multiple k_client instances (or equivalent parallel clients) concurrently.
  • Verify isolation between users/sessions.
  3. Acceptance checks.
  • No race-related crashes/corruption in k_proxy or k_server.
  • Counter/resource behavior remains correct under load.
  • Session reuse reduces card prompts while preserving authorization checks.

Exit criteria:

  • Test results demonstrate stable concurrent operation with documented limits.

Phase 7: Restore Firmware Build/Flash Path

  1. Validate SDK tree completeness.
  • Confirm presence of mvp, setup, components, samples under CR_SDK_CK-main.
  • If missing, obtain full repository/checkpoint and document source.
  2. Install/enable build tools.
  • Ensure west and nrfjprog are available in shell.
  • Confirm target board/toolchain match (nrf7002dk/nrf5340/cpuapp, NCS v2.9.2 baseline in docs).
  3. Run baseline build+flash.
  • From CR_SDK_CK-main, run ./scripts/build_flash_mvp.sh.
  • If flashing fails, run documented recovery and retry.

Exit criteria:

  • Successful west build and west flash.

Phase 8: Consolidate Documentation and Paths

  1. Remove path drift between docs and actual files.
  • Keep fido2_probe.py and webauthn_local_demo.py at workspace root.
  • Ensure docs never instruct placing helper scripts under CR_SDK_CK-main.
  • Update references consistently in all docs.
  2. Keep Setup.md current.
  • After each significant change, update status snapshot and outcomes.
  3. Add minimal reproducibility checklist.
  • One command list for probe + demo + build/flash prechecks.
  4. Maintain Markdown execution records continuously.
  • Setup.md and Workplan.md are the canonical living docs for this workspace.
  • Re-scan relevant .md files before each new execution cycle and reconcile drift.
  • Record date-stamped session notes when priorities or blockers change.

Status (2026-04-24, markdown maintenance):

  • Re-scanned the active workspace Markdown set and the main source-tree reference docs.
  • No workplan phase change was required from this pass.
  • Ongoing documentation watch item remains path drift in CR_SDK_CK-main/README_HOST.md, which still uses historical ./scripts/... helper locations instead of workspace-root helper paths.
  • Operational note: the markdown scan path now runs cleanly after policy adjustment when invoked without a login shell.

Status (2026-04-24, chain probe retry):

  • Phase 1 remains blocked, but the failure point is now narrowed further:
    • current refusal occurs at Qubes qubes.ConnectTCP policy/service evaluation for ports 22, 8770, and 8780
    • this happens before any end-to-end app-level request can be retried
  • Practical implication:
    • do not spend time on k_proxy_app.py / k_server_app.py request handling until qrexec forwarding is permitting the intended hops again
    • next recovery action is to fix/activate the relevant Qubes qubes.ConnectTCP policy and then re-run the qrexec bridge checks before testing HTTP flow

Status (2026-04-25, post-restart probe):

  • Corrected the client-facing proxy port reference to 8771.
  • SSH access to k_proxy and card visibility recovered after VM restart.
  • New immediate blockers are:
    • k_proxy service not listening on 127.0.0.1:8771
    • k_server service not listening on 127.0.0.1:8780
    • qrexec forwarding for 8771 and 8780 still returns Request refused
  • Next retry should start services first, then re-test qrexec forwarding and only then attempt end-to-end client flow.

Status (2026-04-25, service restart):

  • Local VM services are running again on the intended loopback ports:
    • k_server: 127.0.0.1:8780
    • k_proxy: 127.0.0.1:8771
  • Phase 1 remains blocked specifically by qrexec policy/forwarding refusal on those ports.
  • Next action is no longer app startup; it is fixing the qubes.ConnectTCP allow path for 8771 and 8780.

Status (2026-04-25, in-VM forwarding test):

  • Verified that using qvm-connect-tcp inside the source VMs still does not complete the client->proxy hop:
    • bind succeeds locally, but first real connection gets Request refused
  • Independent app-layer blocker also found in k_proxy:
    • python-fido2 is missing there, so local /session/login currently fails before card auth can succeed
  • Current ordered blockers:
    • first: effective Qubes/qrexec allow path for k_client -> k_proxy:8771
    • second: install python-fido2 in k_proxy
    • third: re-test end-to-end login and then proxy->server counter flow

Status (2026-04-25, after python3-fido2 install):

  • python3-fido2 blocker in k_proxy is resolved.
  • Updated ordered blockers:
    • first: effective Qubes/qrexec allow path for k_client -> k_proxy:8771
    • second: restore CTAP HID device visibility/access in k_proxy (No CTAP HID devices found)
    • third: re-test end-to-end login and then proxy->server counter flow

Status (2026-04-25, card reattached):

  • CTAP HID visibility/access in k_proxy is restored.
  • Local proxy login is working again with the attached card.
  • The only currently confirmed blocker for the end-to-end path is the k_client -> k_proxy:8771 qrexec/qvm-connect-tcp refusal.

Status (2026-04-25, clean forward retest):

  • The retest shows the same qrexec failure mode on both hops, not just the client-facing one.
  • Updated blocker statement:
    • effective qubes.ConnectTCP allow path is failing for both
      • k_client -> k_proxy:8771
      • k_proxy -> k_server:8780
  • App services and card path are currently good; forwarding remains the single active system blocker.

Status (2026-04-25, dom0 policy fix validated):

  • The explicit-destination dom0 qubes.ConnectTCP policy fix resolved forwarding on both hops.
  • Current verified working chain:
    • k_client -> k_proxy:8771
    • k_proxy -> k_server:8780
  • Current verified prototype behavior:
    • session login works from k_client
    • session status works
    • protected counter flow reaches k_server
    • session reuse avoids re-login for repeated counter calls
    • logout invalidates the session and subsequent protected access returns 401
  • Immediate networking blocker is cleared.

Exit criteria:

  • New team member can follow docs end-to-end without path or tooling ambiguity.

Phase 9: Migrate to Phone-Mediated Wireless Validation

Status (2026-05-04): ACTIVE — Architecture v2 adopted; Component 1 + Component 2 CONNECT handler complete

Architecture v2 changes (2026-05-04)

The following changes replace the v1 architecture. Source: chromecard_arkitektur_v2.docx.

Component 2 no longer calls endpoints: Component 2 returns the WebAuthn token to whoever asked (Component 1). It is Component 1 that calls the endpoint with the token. This is the most important behavioral change.

New Component 3 (external client): A compiled binary (Go recommended, Rust alternative) installed on external client computers. Replaces the old browser-proxy-configuration approach. Tasks: find the phone (currently hardcoded IP+port — rendezvous TBD), forward validation requests to Component 1, receive token back, call the protected endpoint directly, return response to browser.

Flow A splits into two paths:

  • Phone browser: Browser → Component 1 → Component 2 (returns token) → Component 1 calls endpoint → resource
  • External client: Browser → Component 3 → Component 1 → Component 2 (returns token) → Component 1 → Component 3 calls endpoint → resource

Platform note: Android needs no extra infrastructure. iOS requires a push-relay (APNs) for background operation — platform priority is an open decision.

New open decisions: Rendezvous mechanism for Component 3; iOS vs Android priority.

Architectural decision (2026-05-08), token binding model:

  • Current choice: per-request authentication. No session is opened.
  • Each request to a gated resource requires a fresh FIDO2 assertion from the card, with the challenge bound to the specific request (URL + method + nonce).
  • The server verifies that the assertion's challenge matches the resource being requested. A token cannot be replayed for a different resource.
  • Consequence: one card interaction per request. This is intentional for now.
  • May change to: session model (one card interaction opens a time-limited session for all gated resources). If changed, the token must at minimum be bound to a specific server (audience) to prevent cross-server replay.
  • Trigger for revisiting: user experience, if per-request card interaction proves too slow or disruptive.
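
One way to bind a challenge to URL + method + nonce is to hash the three fields together. The encoding below is purely an assumption for illustration (length-prefixed fields under SHA-256); the workplan does not specify the actual derivation, and `request_challenge` and `challenge_matches` are hypothetical names.

```python
import hashlib
import hmac


def _field(data: bytes) -> bytes:
    # Length-prefix each field so ("GE", "Thttps...") cannot collide with ("GET", "https...").
    return len(data).to_bytes(4, "big") + data


def request_challenge(method: str, url: str, nonce: bytes) -> bytes:
    """Derive a 32-byte challenge bound to one specific request (assumed encoding)."""
    material = _field(method.upper().encode("ascii")) + _field(url.encode("utf-8")) + _field(nonce)
    return hashlib.sha256(material).digest()


def challenge_matches(presented: bytes, method: str, url: str, nonce: bytes) -> bool:
    """Server-side check that a presented challenge belongs to the request being served."""
    return hmac.compare_digest(presented, request_challenge(method, url, nonce))
```

Under this sketch the server would also need to issue each nonce itself and accept it only once; otherwise the same assertion could be replayed against the same resource.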

Target architecture (v2)

Four physical devices: optional client computer, phone, chromecard, server.

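Components (Component 1, Component 2, and the registration page run on the phone; Component 3 runs on the external client computer):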

  • Component 1 — Proxy + gating filter: Receives requests from phone browser and from external clients via Component 3. Per-request: gated host → forward to Component 2, receive WebAuthn token back, call endpoint with token (TLS); non-gated → forward directly to internet on port 80 (no TLS, bypasses auth entirely).
  • Component 2 — WebAuthn client + URL recognition: Always returns token to caller, never calls endpoints itself. Detects registration URL → admin registration flow (admin fingerprint); other gated URLs → FIDO2 assertion flow (user fingerprint → token returned to Component 1).
  • Registration page: Local web app on phone; admin fingerprint access control enforced by card.
  • Component 3 (external client): Compiled binary, finds phone, relays auth through Component 1, calls endpoint with received token.

Three flows (Flow A has two paths):

  • Flow A (phone browser): Browser → Comp 1 → Comp 2 → card → token → Comp 1 → endpoint → resource
  • Flow A (external client): Browser → Comp 3 → Comp 1 → Comp 2 → card → token → Comp 1 → Comp 3 → endpoint → resource
  • Flow B: Browser → Comp 1 → Comp 2 (registration URL) → card (admin biometric) → enroll/delete user
  • Flow C: Non-gated host → Comp 1 → internet port 80 (no TLS, no card)

Open decisions: PIN on card; user DB on-card vs. external; network-level access control on registration page; Component 3 rendezvous mechanism; iOS vs Android priority.

Development chain (Qubes): k_client browser → k_phone (Flutter Android) → USB HID → ChromeCard → k_server

The k_phone Flutter app replaces k_proxy entirely. It presents the same HTTP API as k_proxy_app.py so k_client_portal.py and the browser portal work without changes.

Development environment: Mac (not Qubes). Android emulator is incompatible with Xen/Qubes. All k_phone development and testing runs on the Mac with the Android emulator and card_emulator_bridge.py.

Work completed (2026-04-29)

  • Flutter project scaffolded at k_phone/ (no flutter create — fully hand-written)
  • 10+ Android build issues resolved (AGP, Gradle, Kotlin, desugaring, notification channel, foreground service type)
  • k_phone/lib/ctaphid_channel.dart: full CTAPHID framing + USB/emulator dual-transport
    • Fixed: persistent socket subscription (single-subscription stream cannot use await for ... break per packet)
    • Fixed: _emulatorSocketOpen flag prevents dead-socket writes from raising StateError
    • Fixed: emulator round-trip sends all request packets before reading (no per-packet blocking)
  • k_phone/lib/proxy_service.dart: full HTTP proxy — all endpoints implemented, error handling hardened
    • Fixed: card-error try-catch separated from DB StateError catch (was masking socket errors as "user already enrolled")
    • autoStart: true for emulator testing; revert to false for production builds
  • k_phone/lib/enrollment_db.dart: enrollment model + JSON persistence via path_provider
  • k_phone/lib/fido2_ops.dart: CTAP2 makeCredential, getAssertion, ECDSA-P256 assertion verification
    • Fixed: CTAP2 command prefix bytes (0x01/0x02) prepended to CBOR payload per CTAP2-over-CTAPHID spec
  • k_phone/lib/session_manager.dart: in-memory bearer token sessions; hasAnyActiveSession() added for gated-proxy forwarding (personal-device model: any live session authorises gated traffic)
  • k_phone/lib/k_server_client.dart: HTTP forwarder to k_server
  • k_phone/android/app/src/main/kotlin/.../MainActivity.kt: USB HID Kotlin platform channel
  • tests/card_emulator_bridge.py: asyncio CTAPHID TCP bridge wrapping CardEmulator for emulator dev
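
The CTAPHID framing and CTAP2 command prefix noted above can be illustrated with a minimal Python sketch. It is not the code in ctaphid_channel.dart; the function name frame_ctap2 and the constants are hypothetical. It assumes 64-byte HID reports and the standard CTAPHID layout (initialization packet with 7 header bytes, continuation packets with 5).

```python
import struct

PACKET_SIZE = 64               # HID report size assumed for CTAPHID
CTAPHID_CBOR = 0x10            # CTAPHID command that carries a CTAP2 CBOR message
INIT_DATA = PACKET_SIZE - 7    # payload bytes in the initialization packet
CONT_DATA = PACKET_SIZE - 5    # payload bytes in each continuation packet


def frame_ctap2(cid: bytes, ctap2_cmd: int, cbor_payload: bytes) -> list[bytes]:
    """Split one CTAP2 request into zero-padded 64-byte CTAPHID packets."""
    if len(cid) != 4:
        raise ValueError("channel id must be 4 bytes")
    # The CTAP2 command byte (0x01 makeCredential, 0x02 getAssertion)
    # is prepended to the CBOR payload.
    message = bytes([ctap2_cmd]) + cbor_payload
    if len(message) > INIT_DATA + 128 * CONT_DATA:
        raise ValueError("message too long for CTAPHID")

    # Initialization packet: CID | CMD with bit 7 set | BCNT (big-endian) | data
    head = cid + struct.pack(">BH", 0x80 | CTAPHID_CBOR, len(message))
    packets = [(head + message[:INIT_DATA]).ljust(PACKET_SIZE, b"\x00")]

    # Continuation packets: CID | SEQ (0..127) | data
    rest = message[INIT_DATA:]
    for seq in range((len(rest) + CONT_DATA - 1) // CONT_DATA):
        chunk = rest[seq * CONT_DATA:(seq + 1) * CONT_DATA]
        packets.append((cid + bytes([seq]) + chunk).ljust(PACKET_SIZE, b"\x00"))
    return packets
```

Under these assumptions, a 180-byte CBOR payload becomes a 181-byte message and is sent as one initialization packet followed by three continuation packets.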

Work completed (2026-05-02)

  • k_phone/lib/filter_proxy.dart: Component 1 implemented — HTTP proxy with gating filter
    • Plain HTTP to gated host: rewritten to relative path and forwarded to Component 2
    • HTTPS CONNECT to gated host: CONNECT request relayed to Component 2; tunnel opened on 200, denied on 4xx
    • All other traffic forwarded directly to target host
    • Gated hosts file: gated_hosts.txt in app documents directory (one host or host:port per line)
    • Default seeded with httpbin.org on first run
  • k_phone/test/filter_proxy_test.dart: full test suite for Component 1 (gated matching, HTTP routing, CONNECT routing, edge cases)
  • k_phone/test/enrollment_test.dart: full test suite for EnrollmentDb (register, list, delete, persistence, update)
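
The gated-host matching described above can be sketched in Python as follows. This is an illustration only, not the logic in filter_proxy.dart: the function names are hypothetical, and case-insensitive matching and "#" comment lines are assumptions that the plan does not state.

```python
def parse_gated_hosts(text: str) -> set[str]:
    """Return the entries of a gated-hosts file, one host or host:port per line."""
    entries = set()
    for line in text.splitlines():
        entry = line.strip().lower()
        # Blank lines are skipped; treating '#' lines as comments is an assumption.
        if entry and not entry.startswith("#"):
            entries.add(entry)
    return entries


def is_gated(entries: set[str], host: str, port: int) -> bool:
    """A target is gated if its bare host or its host:port form is listed."""
    host = host.lower()
    return host in entries or f"{host}:{port}" in entries
```

A bare entry such as httpbin.org gates every port on that host, while an entry such as example.test:8443 (a made-up host) gates only that port.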

Work completed (2026-05-02, session 2)

  • k_phone/lib/proxy_service.dart: _handleConnect added to _ProxyServer
    • Dispatched from _handleRequest for CONNECT method
    • Checks _sessions.hasAnyActiveSession() — returns 407 if no active session
    • Extracts upstream host:port from Host header
    • Opens TCP socket to upstream target (the real external server — httpbin.org, etc.)
    • Detaches the HTTP socket (detachSocket(writeHeaders: false)) and writes 200 Connection Established manually
    • Pipes bytes bidirectionally: client ↔ upstream
    • k_server is not involved in CONNECT tunnels; Component 2 connects directly to the real target
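
The session check and Host-header parsing in the CONNECT path can be sketched in Python as below. This is a simplified model, not the Dart implementation: SessionManager here is a stand-in, connect_decision and split_host_port are hypothetical names, the default port of 443 is an assumption, and IPv6 literals are not handled. The actual socket piping is omitted.

```python
import secrets
import time

TTL_SECONDS = 300  # session TTL stated in this plan


class SessionManager:
    """In-memory bearer-token sessions with a fixed TTL."""

    def __init__(self, ttl: float = TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._expiry: dict[str, float] = {}

    def create(self) -> str:
        token = secrets.token_hex(32)  # assumption: 32 random bytes, hex-encoded
        self._expiry[token] = self._clock() + self._ttl
        return token

    def has_any_active_session(self) -> bool:
        now = self._clock()
        # Drop expired entries, then report whether any session is left.
        self._expiry = {t: e for t, e in self._expiry.items() if e > now}
        return bool(self._expiry)


def split_host_port(host_header: str, default_port: int = 443) -> tuple[str, int]:
    """Split 'host:port' taken from the Host header of a CONNECT request."""
    value = host_header.strip()
    host, sep, port = value.rpartition(":")
    if not sep:
        return value, default_port
    return host, int(port)


def connect_decision(sessions: SessionManager, host_header: str):
    """Return (407, None) without a live session, else (200, (host, port))."""
    if not sessions.has_any_active_session():
        return 407, None
    return 200, split_host_port(host_header)
```

Injecting the clock keeps the TTL behaviour testable without sleeping; a real handler would open the upstream socket only after the 200 branch is reached.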

Verified on emulator (2026-04-29)

POST /enroll/register  → makeCredential via bridge → has_credential: true  ✓
POST /session/login    → getAssertion + ECDSA verify → auth_mode: fido2_assertion  ✓
POST /session/status   → 299 s remaining  ✓
POST /session/logout   → invalidated: true  ✓
POST /resource/counter → internal error (k_server not running locally — expected)
POST /resource/counter (after logout) → 401 invalid or expired session  ✓

Bridge log confirmed:

CTAP2 cmd=0x01 body=180 bytes → makeCredential OK auth_data=164 bytes
CTAP2 cmd=0x02 body=113 bytes → getAssertion OK auth_data=37 bytes sig=71 bytes
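
The 37-byte auth_data in the getAssertion line matches the fixed authenticator-data header defined by WebAuthn: rpIdHash (32 bytes), flags (1 byte), signCount (4 bytes, big-endian). The longer makeCredential value would then carry attested credential data after that header. A minimal Python sketch of reading the header follows; parse_auth_data is a hypothetical helper, not part of the bridge.

```python
import hashlib
import struct


def parse_auth_data(auth_data: bytes) -> dict:
    """Decode the fixed 37-byte authenticator-data header."""
    if len(auth_data) < 37:
        raise ValueError("authenticator data shorter than 37 bytes")
    flags = auth_data[32]
    (sign_count,) = struct.unpack(">I", auth_data[33:37])
    return {
        "rp_id_hash": auth_data[:32],                   # SHA-256 of the rp_id
        "user_present": bool(flags & 0x01),             # UP flag
        "user_verified": bool(flags & 0x04),            # UV flag
        "has_attested_credential_data": bool(flags & 0x40),  # AT flag
        "sign_count": sign_count,
        "rest": auth_data[37:],                         # attested credential data, if any
    }


# Example with synthetic data (not captured from the card):
header = hashlib.sha256(b"localhost").digest() + bytes([0x05]) + (7).to_bytes(4, "big")
info = parse_auth_data(header)
```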

Work completed (2026-05-05, v2 architecture refactor)

k_phone (Dart):

  • filter_proxy_test.dart: rewritten for v2 semantics — gated HTTP now hits a mock endpoint with Bearer token, not Component 2 directly. 24/24 tests pass.
  • filter_proxy.dart: extracted _writeProxyHeaders and _forwardHttpRequest helpers to eliminate ~30 lines of duplication between _handleGatedHttp and _handleDirectHttp; simplified _handleDirectHttp signature (redundant host/port params removed).
  • session_manager.dart: added static const int ttlSeconds = 300 (public); _ttl now references it.
  • portal_html.dart (new): extracted 400-line HTML blobs (kPortalHtml, kEnrollHtml, kPortalHtmlBytes, kEnrollHtmlBytes) from proxy_service.dart.
  • proxy_service.dart: file reduced from 872 to 455 lines.
    • Imports portal_html.dart
    • Removed _kSessionTtlSeconds constant (replaced with SessionManager.ttlSeconds)
    • Merged _serveHtml/_serveEnrollHtml into _serveHtmlBytes(req, bytes)
    • Extracted _parseUsername and _parseUsernameAndDisplay helpers, eliminating repeated validation boilerplate
    • Removed dead _loadTlsContext stub
    • Simplified start() TLS branch
  • k_server_client.dart: deleted (dead code — no longer imported anywhere).

component3 (Go):

  • gated.go: signature is now IsGated(host, port string), previously IsGated(host string). The old version silently missed host:port entries in gated_hosts.txt; the new one checks both the bare hostname and host:port.
  • proxy.go: handleHTTP extracts port from URL (defaults "80"), passes to IsGated; handleConnect passes portStr to IsGated.
  • phone.go: added getToken() calling /auth/get-token, which avoids FIDO2 card interaction if the phone already has an active session. EnsureSession() tries getToken() first and falls back to login(). Fixed login() JSON field: expires_in → ttl_seconds (actual server field name). go build ./... passes.

Parallel-change note: Component 1 and Component 3 share the same proxy logic

Component 3 (component3/) and Component 1 (k_phone/lib/filter_proxy.dart) implement the same core behaviour: intercept HTTP/HTTPS traffic, decide per-request whether the target is gated, fetch a WebAuthn token if so, and call the endpoint directly with the token. Any structural change to one (new gating logic, token-binding changes, CONNECT handling, error semantics) will almost certainly need a corresponding change in the other. Treat them as a pair: when modifying Component 3, check Component 1 for the same fix, and vice versa.

Work completed (2026-05-08, per-request token binding)

  • fido2_ops.dart: GetAssertionResult now includes clientDataJson; getAssertion() accepts optional challenge param for binding.
  • proxy_service.dart: _handleAuthGetToken rewritten — accepts {url, method, nonce}, derives challenge = SHA256(url|method|nonce), calls card (getAssertion), returns self-contained assertion bundle as base64url Bearer token. No session involved.
  • filter_proxy.dart: _getAuthToken(uri, method) generates a secure 16-byte nonce, posts {url, method, nonce} to Component 2, uses returned assertion token directly.
  • component3/phone.go: rewritten as stateless GetTokenForRequest(url, method) — no session caching, no mutex, no expiry tracking.
  • component3/proxy.go: handleHTTP uses GetTokenForRequest(r.URL.String(), r.Method).
  • component3/main.go: --user flag removed (Component 2 picks the enrolled user).
  • k_server_app.py: _verify_assertion_token() added — decodes bundle, verifies path+method match, verifies challenge claim, verifies ECDSA-P256 signature over authData||clientDataHash using public key extracted from bundle's credentialData. _is_proxy_authorized() accepts either X-Proxy-Token (legacy k_proxy path) or Bearer assertion token.
  • 46/46 Flutter tests pass; go build ./... clean; flutter analyze no issues.
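
The per-request binding above, challenge = SHA256(url|method|nonce), can be sketched in Python as follows. The exact byte encoding is not stated in this plan, so this sketch assumes UTF-8 strings joined by a literal "|" and a hex-encoded 16-byte nonce; all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_nonce() -> str:
    """16 bytes from the OS CSPRNG; hex encoding is an assumption."""
    return secrets.token_hex(16)


def derive_challenge(url: str, method: str, nonce: str) -> bytes:
    """challenge = SHA256(url|method|nonce), assuming UTF-8 and a literal '|' separator."""
    return hashlib.sha256(f"{url}|{method}|{nonce}".encode("utf-8")).digest()


def challenge_matches(claimed: bytes, url: str, method: str, nonce: str) -> bool:
    """Recompute the challenge from the request actually received and compare in constant time."""
    return hmac.compare_digest(claimed, derive_challenge(url, method, nonce))
```

Because the challenge covers the URL and method, a token issued for one request fails the check when replayed against a different path or verb. Both sides need to agree on the same normalisation (for example, method casing) for the recomputed value to match.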

Next action

  1. Deploy to a real Android phone with physical ChromeCard via USB
  2. Verify USB HID path (Kotlin MainActivity.kt platform channel, hidraw node auto-detection)
  3. Run phase5_chain_regression.sh against k_phone on Android with k_server running

k_phone API contract (must match k_proxy_app.py exactly)

  • GET /health
  • POST /enroll/register {"username","display_name"}
  • GET /enroll/status?username=
  • POST /enroll/update {"username","display_name"}
  • POST /enroll/delete {"username"}
  • GET /enroll/list
  • POST /session/login {"username"}
  • POST /session/status
  • POST /session/logout
  • POST /resource/counter (forwarded to k_server with X-Proxy-Token)

Key design decisions

  • rp_id: "localhost", origin: "https://localhost" (matches k_proxy_app.py defaults)
  • clientDataHash = SHA256(clientDataJSON), where clientDataJSON = {"type":"webauthn.create","challenge":"<b64>","origin":"https://localhost","crossOrigin":false}
  • credential_data_b64 stores AttestedCredentialData bytes = aaguid(16) + credIdLen(2) + credId(n) + coseKey
  • Signature verification: ECDSA-SHA256(authData || clientDataHash, P-256 pubKey extracted from COSE key)
  • No begin/complete HTTP round-trip — registration and auth are each a single HTTP call (same as Python)
  • Sessions: server-side in-memory, TTL 300 s (matching Python default), token = 32-byte hex
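
The credential_data_b64 layout and the signed message described above can be sketched in Python as below. The helper names are hypothetical, and the COSE key is left as raw CBOR bytes; decoding it and verifying the ECDSA signature would need a CBOR and a crypto library, which this sketch deliberately leaves out.

```python
import hashlib
import struct


def split_attested_credential_data(blob: bytes):
    """Split aaguid(16) | credIdLen(2, big-endian) | credId(n) | coseKey."""
    if len(blob) < 18:
        raise ValueError("too short for aaguid + credIdLen")
    aaguid = blob[:16]
    (cred_id_len,) = struct.unpack(">H", blob[16:18])
    end = 18 + cred_id_len
    if len(blob) < end:
        raise ValueError("credential id truncated")
    # Everything after the credential id is the CBOR-encoded COSE public key.
    return aaguid, blob[18:end], blob[end:]


def signed_message(auth_data: bytes, client_data_json: bytes) -> bytes:
    """Bytes covered by the signature: authData || SHA256(clientDataJSON)."""
    return auth_data + hashlib.sha256(client_data_json).digest()
```

The verifier would pass signed_message(...) and the P-256 public key recovered from the COSE key to an ECDSA-SHA256 verify call.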

Start bridge for emulator testing

uv run --python 3.12 --with fido2 --with cbor2 --with cryptography tests/card_emulator_bridge.py

Phase 9 exit criteria

  • k_phone presents identical HTTP API to k_proxy_app.py (so k_client works unchanged)
  • Registration and login both complete via card_emulator_bridge.py in emulator testing
  • With physical ChromeCard plugged into Android phone: full register → login → counter → logout works
  • phase5_chain_regression.sh passes against k_phone on Android

Current Next Step

Status (2026-04-29):

  • Phase 9 emulator milestone complete: makeCredential + getAssertion verified via CardEmulator bridge.
  • Next blocking step: deploy to real Android phone with ChromeCard over USB.
  • k_server is not running in the Mac test environment; the counter endpoint is expected to work once k_server runs in Qubes.

Phase status (2026-04-29):

  • Phase 6.5 (concurrency): deferred. The current ceiling of ~10 in-flight requests is acceptable.
  • Phase 7 (firmware build/flash): blocked on Chrome Roads (card vendor).
  • Phase 9 (phone integration): emulator FIDO2 verified; physical phone + USB HID path is next.

Status (2026-04-26, markdown maintenance):

  • Re-scanned Setup.md, Workplan.md, and PHASE5_RUNBOOK.md against the current workspace files.

Inputs Expected During This Session

  • Exact observed behavior on reconnect attempts (USB/hidraw/probe).
  • Whether we should pull server-side code now.
  • Any board/firmware variants different from default documentation assumptions.
  • Preferred TLS ports, certificate approach, and hostname scheme for k_client, k_proxy, k_server.
  • Session TTL and invalidation requirements for cached authenticated access.
  • Decision on where user/session authority lives (k_proxy vs k_server vs split).
  • Target concurrency level for validation (parallel clients and parallel requests per client).
  • Preferred wireless transport/protocol between k_proxy and phone (for future phase).

Session Maintenance Notes (2026-04-24)

  • Top-level Markdown review completed for PHASE5_RUNBOOK.md, Setup.md, and Workplan.md.
  • Current execution plan remains in sync with the Phase 5 runbook:
    • prototype services at /home/user/chromecard/k_proxy_app.py and /home/user/chromecard/k_server_app.py
    • run sequence documented in /home/user/chromecard/PHASE5_RUNBOOK.md
  • No phase ordering or blocker changes were required from this review pass.
  • Remote execution support is now active and validated:
    • ssh command execution works for k_client, k_proxy, k_server
    • scp push to VM home works (validated on k_proxy)