Safety — Tell and Show

TL;DR

Every prompt your kid writes and every response the AI sends passes a two-layer filter: local rules first, an external classifier second. Each kid has a G or PG-13 rating you control. Anything blocked is logged with a timestamp, the category, and a redacted excerpt, and you get a parent-readable email. No block is hidden from you. The AI cannot publish anything until you approve it.

The two-layer filter.

Every AI request and every AI response passes through two filters in order. They are not interchangeable; they catch different things.

Layer 1 — local rules

A deterministic library of rules runs on our own servers, instantly, before a prompt reaches the AI and before a response reaches your kid. The rules look for jailbreak attempts, leaked personal info (phone numbers, emails, full names paired with a location), self-harm signals, sexual content, graphic violence, profanity, illicit-activity prompts, harassment, and instructions for building weapons.

These rules are simple, fast, and run synchronously. They have no network dependency and never time out. If a request hits a local rule, it stops there and never reaches Layer 2 or the AI.
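A minimal Python sketch of how a deterministic, first-match rule layer can stop a request before any network call. The patterns, category names, and the functions `check_local_rules` and `screen` are hypothetical illustrations, not our production rules, which are far more extensive.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny rule table of (category, pattern) pairs.
# A production rule library is far larger; this only shows the shape.
LOCAL_RULES = [
    ("jailbreak", re.compile(r"\b(ignore|forget) (all|your) (previous )?(rules|instructions)\b", re.I)),
    ("jailbreak", re.compile(r"pretend the rules don.t apply", re.I)),
    ("privacy", re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")),  # US-style phone number
    ("privacy", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),   # email address
]


def check_local_rules(text: str) -> Optional[str]:
    """Return the first matching category, or None if every rule passes.

    Pure in-process regex matching: no network call, so nothing can time out.
    """
    for category, pattern in LOCAL_RULES:
        if pattern.search(text):
            return category
    return None


def screen(text: str) -> str:
    category = check_local_rules(text)
    if category is not None:
        # Stop here: Layer 2 and the AI never see this text.
        return f"blocked:{category}"
    return "pass-to-layer-2"
```

Because the rules are checked in order and the first hit wins, a blocked request costs no classifier call at all.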

Layer 2 — external classifier

If the local rules pass, the kid’s request, and later the AI’s response, are each sent to an external content classifier with a 10-second timeout. The classifier returns confidence scores across moderation categories. Our policy reads those scores against the kid’s rating: on G, any flag at all blocks; on PG-13, the most severe categories block even at low confidence, while mid-tier categories block only when confidence is above 0.85.
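A minimal sketch of how rating-dependent thresholds can turn classifier scores into a block decision. Only the 0.85 mid-tier threshold comes from the policy above. The `LOW_CONFIDENCE` floor (used here to model both "any flag" on G and "low confidence" on PG-13), the tier assignments, the category names, and `should_block` are assumptions for illustration.

```python
from typing import Mapping

MID_TIER_THRESHOLD = 0.85  # from the policy described above
LOW_CONFIDENCE = 0.3       # hypothetical floor for what counts as a "flag"

# Hypothetical tier assignment; the real mapping is configurable per kid.
SEVERE = {"sexual", "self_harm", "weapons", "illicit"}
MID_TIER = {"violence", "profanity"}


def should_block(scores: Mapping[str, float], rating: str) -> bool:
    """Decide a block from classifier confidence scores (0.0 to 1.0) and the kid's rating."""
    if rating == "G":
        # On G, any flag at all blocks.
        return any(score >= LOW_CONFIDENCE for score in scores.values())
    if rating == "PG-13":
        for category, score in scores.items():
            # Severe categories block even at low confidence.
            if category in SEVERE and score >= LOW_CONFIDENCE:
                return True
            # Mid-tier categories need high confidence to block.
            if category in MID_TIER and score > MID_TIER_THRESHOLD:
                return True
        return False
    raise ValueError(f"unknown rating: {rating!r}")
```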

If the external service is unreachable and the kid’s configuration requires the check, the request is blocked instead of being let through unverified. The default is to fail closed.
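A sketch of the fail-closed behavior, assuming a hypothetical `classify` client that takes the text and a timeout in seconds and raises `OSError` (which includes `ConnectionError` and `TimeoutError`) when the service can’t answer. The `classifier_gate` function and its parameters are illustrative, not our actual interface.

```python
from typing import Callable, Mapping

Scores = Mapping[str, float]


def classifier_gate(
    text: str,
    classify: Callable[[str, float], Scores],
    is_blocked: Callable[[Scores], bool],
    *,
    timeout_s: float = 10.0,
    required: bool = True,
) -> bool:
    """Return True if the text may proceed to the next step."""
    try:
        scores = classify(text, timeout_s)
    except OSError:
        # No verdict (unreachable or too slow): a required check blocks.
        return not required
    return not is_blocked(scores)
```

With the default `required=True`, a network failure or timeout blocks the text; only an explicitly optional check lets content through without a verdict.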

The nine categories the filter watches.

Each category has its own rules and its own thresholds, and each is configurable per kid. A change in one category does not affect the others.

1. Jailbreak attempts

Anything that tries to coax the AI into ignoring its safety configuration. This includes role-play framings, prompt-injection patterns, and instructions to "pretend the rules don’t apply." Blocked on both G and PG-13.

2. Privacy leaks

Phone numbers, email addresses, full names paired with a city or school, and any pattern that looks like a home address. Blocked in both directions — we don’t want a kid telling the AI their address, and we don’t want the AI surfacing one back.

3. Self-harm

Any language indicating self-harm or distress. Blocked. The kid sees a kid-friendly message; the parent gets an email alert with the redacted excerpt. We do not host this kind of conversation, but we do not ignore it either.

4. Sexual content

Sexual content of any kind, blocked across both ratings. Romantic mentions are permitted in PG-13 mode within the limits of mainstream PG-13 media.

5. Graphic violence

Detailed depictions of injury, gore, or torture. Blocked on G; held to the mid-tier threshold on PG-13. Mild action and conflict in service of a story (a sword fight, a chase) are permitted on PG-13.

6. Profanity

Blocked on G. On PG-13, the bar matches mainstream PG-13 films: mild language is permitted, strong slurs are blocked.

7. Illicit activity

Step-by-step instructions for crime, drug manufacture, or evading the law. Blocked across both ratings.

8. Harassment

Targeted insults, group hatred, or anything that reads as bullying aimed at a real person. Blocked across both ratings.

9. Weapons

Instructions or schematics for building functional weapons. Blocked across both ratings, regardless of whether the kid framed it as fiction.
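The nine categories above can be summarized as a per-rating policy table. This Python sketch is illustrative: the dictionary keys, treatment strings, and the `treatment` helper are hypothetical, and the `overrides` argument models a per-kid setting such as the "strict-G on violence but PG-13 on language" combination described under Reporting and edge cases.

```python
# Summary of the nine categories: category -> rating -> treatment.
# "block" means blocked outright; other strings paraphrase the PG-13 allowances.
DEFAULT_POLICY = {
    "jailbreak":        {"G": "block", "PG-13": "block"},
    "privacy":          {"G": "block", "PG-13": "block"},
    "self_harm":        {"G": "block", "PG-13": "block"},
    "sexual":           {"G": "block", "PG-13": "block"},
    "graphic_violence": {"G": "block", "PG-13": "mid-confidence threshold"},
    "profanity":        {"G": "block", "PG-13": "mild allowed"},
    "illicit":          {"G": "block", "PG-13": "block"},
    "harassment":       {"G": "block", "PG-13": "block"},
    "weapons":          {"G": "block", "PG-13": "block"},
}


def treatment(policy, category, rating, overrides=None):
    """Look up how one category is handled for a kid; per-kid overrides win.

    An override maps a category to the rating whose rules it should follow.
    """
    overrides = overrides or {}
    effective_rating = overrides.get(category, rating)
    return policy[category][effective_rating]
```

Because each category is looked up on its own, overriding one leaves every other category on the kid’s normal rating.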

What gets checked — and when.

The filter runs in three directions. Each direction has its own rule set; what’s acceptable in one is not always acceptable in another.

Child-to-AI. Every prompt the kid writes, or builds step by step in a wizard, gets checked before it reaches the AI partner. The kid can’t accidentally (or deliberately) prompt the AI into a category they shouldn’t be in.
AI-to-child. Every response the AI produces gets checked before it reaches the kid. Even a perfectly safe prompt can sometimes produce a flagged response; the response is intercepted before the kid sees it, and the kid gets a friendly message instead.
Publish. Every project that goes through the parent-approval flow gets a final pass before a public URL is minted. Content that slipped through earlier, or was added offline, still has to clear this gate.
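One way to picture "each direction has its own rule set" is a lookup from checkpoint to an ordered list of checks. In this sketch the `Direction` enum, the rule-set names in `RULE_SETS`, and `run_checkpoint` are all hypothetical; the checks passed in stand in for real rules.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Sequence


class Direction(Enum):
    CHILD_TO_AI = "in"
    AI_TO_CHILD = "out"
    PUBLISH = "publish"


# Hypothetical rule-set names per checkpoint. Privacy runs in every
# direction; publishing adds a final review step on top.
RULE_SETS: Dict[Direction, Sequence[str]] = {
    Direction.CHILD_TO_AI: ("jailbreak", "privacy", "content"),
    Direction.AI_TO_CHILD: ("privacy", "content"),
    Direction.PUBLISH: ("privacy", "content", "publish_review"),
}

Check = Callable[[str], bool]  # returns True when the text trips the check


def run_checkpoint(text: str, direction: Direction, checks: Dict[str, Check]) -> Optional[str]:
    """Run this direction's checks in order; return the first that trips, else None."""
    for name in RULE_SETS[direction]:
        if checks[name](text):
            return name
    return None
```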

What happens when something is blocked.

Three things happen, in this order. The first two happen every time, without exception; the email is sent unless you have turned off alerts for that category.

1. The kid sees a friendly message.

If the block is on the way in, the AI partner refuses to act on the prompt; if it is on the way out, the AI’s response never reaches the kid. Either way, the kid sees something like: "That’s outside the safety setting for this account. I stopped it and let your parent know." The message explains that something was blocked. It does not echo the blocked text, so kids do not learn what to retype to get around the filter.

2. An audit event is recorded.

The block is written to your kid’s safety log with the category, the rating in effect at the time, the direction (in / out / publish), a timestamp, the redacted excerpt, and a SHA-256 hash of the offending content. The unredacted content is never stored, only the hash: enough for us to confirm "yes, this exact phrase was blocked" without holding the words themselves.
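A minimal sketch of hash-only recording with Python’s standard `hashlib`, assuming a hypothetical `AuditEvent` record that keeps the digest in place of the text (it omits the redacted excerpt for brevity). Confirming a phrase means hashing the candidate and comparing digests.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def content_hash(text: str) -> str:
    """SHA-256 of the UTF-8 bytes, as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AuditEvent:  # hypothetical record shape
    category: str
    rating: str
    direction: str  # "in", "out", or "publish"
    content_sha256: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def record_block(text: str, category: str, rating: str, direction: str) -> AuditEvent:
    # Only the digest goes on the event; the text itself is not kept.
    return AuditEvent(category, rating, direction, content_hash(text))


def was_blocked(event: AuditEvent, candidate: str) -> bool:
    """Confirm that a specific phrase is the one behind this event."""
    return event.content_sha256 == content_hash(candidate)
```

The digest alone does not reveal the text, but anyone holding a candidate phrase can test it for a match; that is the "confirm this exact phrase" property described above.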

Each kid’s log retains up to 50 events. A global log retains the most recent 300 events across the whole product for system-wide auditing — never tied back to a kid’s identity.
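A sketch of the retention limits using `collections.deque` with `maxlen`, assuming older events simply roll off when a log is full. The `log_event` function and the `IDENTIFYING_FIELDS` set are hypothetical; the global copy drops those fields so it can’t be tied back to a kid.

```python
from collections import defaultdict, deque

KID_LOG_LIMIT = 50
GLOBAL_LOG_LIMIT = 300
IDENTIFYING_FIELDS = {"kid_id", "kid_name"}  # hypothetical field names

# A deque with maxlen discards its oldest entry when a new one is appended to a full log.
kid_logs = defaultdict(lambda: deque(maxlen=KID_LOG_LIMIT))
global_log = deque(maxlen=GLOBAL_LOG_LIMIT)


def log_event(kid_id: str, event: dict) -> None:
    kid_logs[kid_id].append(event)
    # The global copy keeps the event details but drops anything identifying.
    global_log.append({k: v for k, v in event.items() if k not in IDENTIFYING_FIELDS})
```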

3. The parent is emailed.

The subject line is "Tell and Show safety alert." The body includes your kid’s name, the category, the rating, a redacted excerpt (phone numbers replaced with [phone], emails replaced with [email]), and a link to open the studio and review. You can disable the alert email per category in your notification preferences if you prefer to read the log directly.
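A minimal sketch of the redaction step, assuming simple regular expressions. The `PHONE` and `EMAIL` patterns and the `redact` function are illustrative only; real personal-information detection covers many more formats.

```python
import re

# Hypothetical patterns: optional country code, optional parentheses, common separators.
PHONE = re.compile(r"(?:\+?\d{1,2}[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Replace email addresses with [email] and phone numbers with [phone]."""
    # Emails first, so digits inside an address aren't mistaken for a phone number.
    text = EMAIL.sub("[email]", text)
    return PHONE.sub("[phone]", text)
```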

The G vs. PG-13 rating.

Every kid profile carries one of two safety ratings. You set it from the parental controls in your dashboard.

G — appropriate for the youngest end of 6–14

Strict filters on violence, scary content, romantic themes, profanity, and external links. As a reference, think of mainstream family films such as Frozen, Toy Story, and Paddington. If a sentence would feel out of place in those, it’s blocked on G.

PG-13 — appropriate for the older end

Permits mild action, mild conflict, and stronger language. Still blocks sexual content, self-harm, graphic violence, and identifying personal information. As a reference, think of mainstream PG-13 films such as The Hunger Games, Spider-Man, and Marvel’s Avengers films. For example, Theo (age 9) is on G; an older sibling could be on PG-13 in the same household without one setting bleeding into the other.

The rating is per kid, not per family. Siblings of different ages can have different settings. Changes apply on the next AI request — not retroactively to past projects.

The safety log.

The safety log is your audit trail. It lives in your dashboard under each kid’s profile, and you can read it without having to learn the studio.

For each event, you see:

When it happened (timestamp)
The category (one of the nine)
The direction (child-to-AI, AI-to-child, or publish)
The rating in effect at the time
A redacted excerpt of the content (with personal details such as phone numbers and emails replaced)

The unredacted content is never stored. Behind each event sits only the redacted excerpt and a SHA-256 hash, enough to confirm a specific phrase tripped the filter without us holding the phrase itself.

What parents cannot do.

Two things, deliberately.

You cannot read every prompt.

The audit log captures blocks, not the full transcript of what your kid types to the AI. We treat day-to-day creative chat the same way we’d treat a paper journal — private to the kid, with the safety net that anything boundary-crossing surfaces to you. We made this choice deliberately: a panopticon doesn’t produce kids who think out loud, and thinking out loud is most of what creative work is.

You cannot bypass server-side enforcement.

The safety policy is authoritative on our servers. There is no client-only "trust mode" you can flip that makes the AI partner ignore the rules for your kid. The filter cannot be disabled from your dashboard. The rating can be changed (G vs. PG-13). The category sensitivities can be tuned. But the filter itself runs every time, in every direction, on every account.

Per-track narrative guardrails.

The nine categories sit on top of additional guardrails tuned per creative track. In the Game track, weapons can appear as game mechanics (a bow in a sword-and-arrow level) but not as build instructions. In the Story track, dramatic tension is permitted; gore is not. In the Site track, public-by-default URLs come with their own publishing review. In the Movie track, music and dialogue go through both audio and text moderation.

These guardrails are tuned by us, not the kid — and they are reviewed every time a track ships a new wizard or expansion.

Reporting and edge cases.

If you see something in your safety log you want a human to look at, or if you think the filter let something through, write us at hello@tellandshow.ai. We read every safety report personally and respond within 24 hours on business days.

If your kid is using PG-13 mode and you want a tighter filter on a specific category — for instance, strict-G on violence but PG-13 on language — ask. We can do that today. The tuning UI for it is coming to the dashboard.

Last updated 2026-05-15. Material changes are dated and announced via email to active accounts.

The most boring page on the site. By design.