The economics of local versus cloud inference, and an architecture that keeps sensitive work inside the boundary while reaching out only when it is safe.
Draft outline · Cost / latency / architecture lensSelf-hosting everything is expensive and hard to keep at frontier capability; sending everything to a cloud API is often not permissible and, as EchoLeak showed, not risk-free. The practical answer is hybrid: route by classification and need, local for sensitive work, cloud when the data and the task allow.
Neutral framework for the risks that decide what may go to a cloud model and what may not.
Independent reporting on why routing sensitive context to cloud assistants carries exfiltration risk.
An architecture and cost piece. It shows we design for the real constraint set (budget, latency, classification) rather than selling a single fashionable answer, reinforcing the vendor-agnostic, sovereign-first, fixed-price stance.