A wrapper is not a safety architecture

If you open the hood on most child-facing AI products shipped in the last two years, you will find roughly the same thing. A single foundation model. A long system prompt that says, in many polite ways, please do not be unsafe. A profanity blocklist. Maybe a moderation API call on the way out. That is a wrapper. A wrapper is a fine prototype. It is not a safety architecture for a product children actually use.

What a wrapper actually is

A wrapper takes a user message, prepends a prompt, calls a model, sometimes runs a single moderation check, and returns the answer. The entire safety strategy lives in two places: the system prompt and the model's own training. Both are out of your control. The system prompt can be bypassed by clever phrasing, and the model's behaviour changes whenever the provider ships an update.

Wrappers fail in predictable ways. Kids find prompt injections by accident. Role-play loops drift past the guardrails one turn at a time. The model confidently invents facts. A safe topic at turn one becomes an unsafe topic at turn fifteen because nobody is watching the conversation as a whole.

What an architecture looks like

A safety architecture treats the model as one component in a system designed around the child, not the chat box. It has layers, and each layer does one job well.

Input layer: age and identity context, intent classification, prompt sanitisation, risk tagging. Policy layer: per-age-band rules, topic allowlists and blocklists, retrieval constraints, tool permissions. Generation layer: model routing by task, structured output where possible, streaming-aware moderation. Output layer: rewriting unsafe responses into kid-appropriate language instead of blunt refusals, tone checks, citation requirements where relevant. Observability layer: full audit logs, safety scores per turn, escalation paths to humans, parent-visible transparency. Governance layer: red-team suites, regression tests on every model swap, model cards, DPIAs.

Why this matters more for kids

Adults can usually recover from a bad AI answer. They notice it, roll their eyes, and move on. Children often cannot. A confident wrong answer becomes a fact in their head. A friendly tone becomes a relationship. A bad suggestion becomes a dare. The cost of a single unsafe turn is asymmetric, which is exactly why one prompt and one model is not enough.

The honest test

If you removed your system prompt tomorrow, would your product still be safe? If the answer is no, what you have is a wrapper. If the answer is yes, because there are independent layers catching things before they reach a child, you have an architecture. Children deserve the second thing.