Headland: Local when it must be, cloud when it can be

The anchor

Self-hosting everything is expensive and hard to keep at frontier capability; sending everything to a cloud API is often not permitted and, as EchoLeak showed, not risk-free. The practical answer is hybrid: route each request by data classification and task need, serving sensitive work locally and using the cloud only when both the data and the task allow it.
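A minimal sketch of such a routing layer, in Python. Everything in it is hypothetical: the `Classification` levels, the `CLOUD_CEILING` policy value, the `needs_frontier` flag and the `Backend` interface are illustrative names, not a product API, and real classification markings (for example ISM/PSPF levels) would replace the placeholder enum. It shows three ideas from this piece together: routing by classification, failing safe (a disallowed or unavailable cloud path falls back to local, and above-ceiling data is refused rather than sent out), and swappability (any provider satisfying the `Backend` protocol can be plugged in).

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Protocol


class Classification(IntEnum):
    # Hypothetical levels, ordered by sensitivity; map to your own markings.
    PUBLIC = 0
    INTERNAL = 1
    SENSITIVE = 2
    RESTRICTED = 3


# Assumed policy value: the most sensitive level the cloud path may receive.
CLOUD_CEILING = Classification.INTERNAL


@dataclass
class Request:
    prompt: str
    classification: Classification
    needs_frontier: bool  # hypothetical: does the task need a frontier-scale model?


class Backend(Protocol):
    # Any provider (local server, cloud API) is swappable behind this interface.
    name: str

    def available(self) -> bool: ...

    def complete(self, prompt: str) -> str: ...


def route(req: Request, local: Backend, cloud: Backend) -> Backend:
    cloud_allowed = req.classification <= CLOUD_CEILING
    # Prefer cloud only when policy permits it AND the task benefits from it.
    if cloud_allowed and req.needs_frontier and cloud.available():
        return cloud
    # Default and fail-safe path: serve inside the boundary.
    if local.available():
        return local
    # Local is down; data at or below the ceiling may still use the cloud.
    if cloud_allowed and cloud.available():
        return cloud
    # Never send above-ceiling data out just because local capacity is down.
    raise RuntimeError("no permitted backend available for this request")


@dataclass
class StubBackend:
    # Stand-in backend for illustration; a real one would call a model server.
    name: str
    up: bool = True

    def available(self) -> bool:
        return self.up

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


local, cloud = StubBackend("local"), StubBackend("cloud")
memo = Request("summarise case file", Classification.RESTRICTED, needs_frontier=True)
blog = Request("draft a public blog post", Classification.PUBLIC, needs_frontier=True)
print(route(memo, local, cloud).name)  # local
print(route(blog, local, cloud).name)  # cloud
cloud.up = False
print(route(blog, local, cloud).name)  # local
```

The ordering of the checks is the design choice that matters: the classification test gates every path to the cloud, so no outage or fallback can move a request above the ceiling out of the boundary.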

Sources we build on

Primary

OWASP GenAI Security Project

Neutral framework for the risks that decide what may go to a cloud model and what may not.

Journalism

WIRED / The Register on EchoLeak and cloud-AI exposure

Independent reporting on why routing sensitive context to cloud assistants carries exfiltration risk.

Article outline

The two bad extremes. All-local (costly, and hard to keep at frontier capability) versus all-cloud (often not permitted).
Routing by classification. A decision layer that sends each request to a local or cloud model based on its data classification and the task's needs.
The economics. GPU cost and latency of local inference versus per-token cloud pricing.
Failing safe. What happens when the cloud path is unavailable or disallowed.
Keeping it swappable. Avoiding lock-in at the routing layer.
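One first-cut way to frame the economics, under simplifying assumptions: local cost is treated as a fixed monthly figure $C_{\text{fixed}}$ (amortised GPU hardware, power, operations), cloud cost scales linearly with monthly tokens $T$ at a blended per-token price $p$, and latency is left out and weighed separately. The symbols are illustrative, not drawn from any pricing sheet.

```latex
C_{\text{local}} = C_{\text{fixed}}, \qquad
C_{\text{cloud}}(T) = p\,T, \qquad
C_{\text{local}} = C_{\text{cloud}}(T^{*}) \;\Rightarrow\; T^{*} = \frac{C_{\text{fixed}}}{p}

% Illustrative numbers only: C_fixed = 10{,}000 per month, p = 10 per million tokens
T^{*} = \frac{10\,000}{10 / 10^{6}} = 10^{9} \ \text{tokens per month}
```

Below $T^{*}$ the cloud path is cheaper for traffic that is allowed to leave the boundary; above it, local wins on cost alone. Because sensitive work has to be served locally anyway, part of $C_{\text{fixed}}$ is already committed, so the real question for cloud-eligible traffic is often whether spare local capacity exists.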

How it aligns to what we do

An architecture and cost piece. It shows we design for the real constraint set (budget, latency, classification) rather than selling a single fashionable answer, reinforcing the vendor-agnostic, sovereign-first, fixed-price stance.

Points to hit

Frame it around cost and latency trade-offs, not around a threat.
Make the routing decision layer the centrepiece.
Reference EchoLeak as the why-not-everything-to-cloud point.
Stress swappability and no lock-in.

Control it ratifies

ISM / E8: Supports ISM classification-handling controls by ensuring sensitive requests are served inside the boundary; complements the self-hosted and no-lock-in deep dives.
