AI Infrastructure · Deep dive 02

Local when it must be, cloud when it can be

The economics of local versus cloud inference, and an architecture that keeps sensitive work inside the boundary while reaching out only when it is safe.

Draft outline · Cost / latency / architecture lens
The anchor

Self-hosting everything is expensive and hard to keep at frontier capability; sending everything to a cloud API is often not permissible and, as EchoLeak showed, not risk-free. The practical answer is hybrid: route each request by data classification and task need, keeping sensitive work local and using the cloud only when both the data and the task allow it.

Sources we build on
Primary

Neutral framework for the risks that decide what may go to a cloud model and what may not.

Journalism
WIRED / The Register on EchoLeak and cloud-AI exposure

Independent reporting on why routing sensitive context to cloud assistants carries exfiltration risk.

Article outline
  1. The two bad extremes. All-local (costly, capped below frontier capability) versus all-cloud (often not allowed).
  2. Routing by classification. A decision layer that decides local versus cloud per request.
  3. The economics. GPU cost and latency of local inference versus per-token cloud pricing.
  4. Failing safe. What happens when the cloud path is unavailable or disallowed.
  5. Keeping it swappable. Avoiding lock-in at the routing layer.
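
A minimal sketch of how items 2, 4 and 5 could fit together, offered as an illustration rather than a reference design. Everything named here is hypothetical: the `Classification` labels, the `CLOUD_CEILING` threshold, the `needs_frontier` flag and the `route` / `dispatch` helpers are assumptions made for the example, not part of any real library or of the ISM. Backends are plain callables so that either side can be swapped without touching the routing logic.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class Classification(IntEnum):
    # Hypothetical labels; substitute the scheme your organisation actually uses.
    PUBLIC = 0
    INTERNAL = 1
    SENSITIVE = 2


@dataclass(frozen=True)
class Request:
    prompt: str
    classification: Classification
    needs_frontier: bool = False  # assumed flag: task benefits from a cloud model


@dataclass(frozen=True)
class Decision:
    target: str  # "local" or "cloud"
    reason: str


# Assumption: anything classified above this level must stay inside the boundary.
CLOUD_CEILING = Classification.PUBLIC

# A backend is any callable from prompt to completion, which keeps it swappable.
Backend = Callable[[str], str]


def route(req: Request, cloud_available: bool = True) -> Decision:
    """Decide local versus cloud for one request, defaulting to local."""
    if req.classification > CLOUD_CEILING:
        return Decision("local", "classification forbids cloud")
    if not req.needs_frontier:
        return Decision("local", "local model is sufficient")
    if not cloud_available:
        return Decision("local", "cloud unavailable, failing safe to local")
    return Decision("cloud", "data and task both allow cloud")


def dispatch(req: Request, local: Backend, cloud: Backend,
             cloud_available: bool = True) -> str:
    """Send the request to the chosen backend, falling back to local on failure."""
    decision = route(req, cloud_available)
    if decision.target == "cloud":
        try:
            return cloud(req.prompt)
        except (ConnectionError, TimeoutError):
            # Fail safe: a broken cloud path degrades to local, never the reverse.
            return local(req.prompt)
    return local(req.prompt)
```

The design choice worth noting is the direction of the fallback: a cloud-eligible request may drop to local, but nothing in this sketch ever promotes a local-only request to the cloud, so an outage or a misconfigured flag cannot move sensitive data outside the boundary.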
How it aligns to what we do

An architecture and cost piece. It shows we design for the real constraint set (budget, latency, classification) rather than selling a single fashionable answer, reinforcing the vendor-agnostic, sovereign-first, fixed-price stance.

Points to hit
Control it ratifies
ISM / E8: Supports ISM classification-handling controls by ensuring sensitive requests are served inside the boundary; complements the self-hosted and no-lock-in dives.