Production-Hardening a Voice AI App No One Asked For

The app worked. That was the problem. It worked so well that anyone who found the URL could burn through my ElevenLabs credits running mock interviews all day. No auth, no rate limiting, no session tracking. A fully open voice AI endpoint connected to a pay-per-minute API.

Time to close the door.

Email Gate

The first real hardening pass was an email-gated access system. A Cloudflare Worker sits in front of the app: you submit your email, the worker validates it against an allowlist, issues a session token, and enforces a per-user session budget. Once you’ve used your allocation, you’re done until I manually top you up. Simple, brutal, effective.
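The post doesn't show the worker's code, so here is a minimal sketch of the allowlist-plus-budget flow. It assumes in-memory `Map`/`Set` stand-ins for the worker's storage, and the names `requestSession`, `topUp`, `allowlist`, `remaining`, and `sessions` are all hypothetical.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical in-memory stand-ins for persistent storage (e.g. Workers KV).
const allowlist = new Set<string>(["alice@example.com"]);
const remaining = new Map<string, number>([["alice@example.com", 2]]);
const sessions = new Map<string, string>(); // session token -> email

type GateResult =
  | { ok: true; token: string }
  | { ok: false; reason: "not_allowed" | "budget_exhausted" };

// Normalize the email, check the allowlist, spend one session from the budget,
// and issue an opaque session token.
export function requestSession(rawEmail: string): GateResult {
  const email = rawEmail.trim().toLowerCase();
  if (!allowlist.has(email)) return { ok: false, reason: "not_allowed" };
  const left = remaining.get(email) ?? 0;
  if (left <= 0) return { ok: false, reason: "budget_exhausted" };
  remaining.set(email, left - 1);
  const token = randomUUID();
  sessions.set(token, email);
  return { ok: true, token };
}

// Manual top-up: the only way back in once the budget hits zero.
export function topUp(rawEmail: string, extra: number): void {
  const email = rawEmail.trim().toLowerCase();
  remaining.set(email, (remaining.get(email) ?? 0) + extra);
}
```

One design caveat: Workers KV is eventually consistent, so two near-simultaneous requests could both read the same budget value. If the budget has to be exact, a Durable Object is a better home for the counter.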

The tricky part was the prod-demo branch — a separate deployment at tutor-prod.tonari.ai that bypasses the email gate for live demos. Two deploys from the same codebase, different auth behavior. Cloudflare Pages made this painless with branch-based deployments.
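The post doesn't say how the worker tells the two deployments apart. One option is to key the bypass off the request hostname, sketched below; `GATE_BYPASS_HOSTS` and `requiresEmailGate` are hypothetical names, and only `tutor-prod.tonari.ai` comes from the post.

```typescript
// Hypothetical: hosts that skip the email gate. Only the demo host is listed.
const GATE_BYPASS_HOSTS = new Set<string>(["tutor-prod.tonari.ai"]);

// Every other host (production, preview branches) stays behind the gate.
export function requiresEmailGate(requestUrl: string): boolean {
  const { hostname } = new URL(requestUrl);
  return !GATE_BYPASS_HOSTS.has(hostname);
}
```

Defaulting to "gated" means a new preview deployment is locked down unless someone explicitly adds its host to the bypass list.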

Webhook Hell

ElevenLabs sends post-call data — transcript, scores, evaluation criteria — via webhook to a CF Worker after each session ends. The worker stores it in KV and emails a formatted report to the user. Straightforward on paper.

In practice: webhook signature verification was a mess. ElevenLabs shipped a v1 signature format without deprecating v0, so my worker had to accept both. The March 20 fix was just adding a fallback chain — try v1, fall back to v0, reject if neither matches.
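A sketch of that fallback chain follows. The header layout (`t=<timestamp>,v1=<sig>,v0=<sig>`), the signed payload (`<timestamp>.<body>`), and the hex-vs-base64 difference between versions are all assumptions for illustration; check the ElevenLabs webhook docs for the real format. It uses Node's `crypto` module for brevity, which a Worker can only import with Node.js compatibility enabled. Otherwise you would swap in `crypto.subtle`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Hypothetical signature schemes, tried newest first.
const SCHEMES: Array<{ version: string; sign: (secret: string, payload: string) => string }> = [
  { version: "v1", sign: (s, p) => createHmac("sha256", s).update(p).digest("base64") },
  { version: "v0", sign: (s, p) => createHmac("sha256", s).update(p).digest("hex") },
];

// Parse "t=...,v1=...,v0=..." into a map. Split on the first "=" only,
// so base64 padding in the value survives.
function parseHeader(header: string): Map<string, string> {
  const parts = new Map<string, string>();
  for (const piece of header.split(",")) {
    const idx = piece.indexOf("=");
    if (idx > 0) parts.set(piece.slice(0, idx).trim(), piece.slice(idx + 1).trim());
  }
  return parts;
}

// Constant-time comparison; timingSafeEqual throws on length mismatch, so check first.
function safeEqual(a: string, b: string): boolean {
  const ab = Buffer.from(a);
  const bb = Buffer.from(b);
  return ab.length === bb.length && timingSafeEqual(ab, bb);
}

// Try v1, fall back to v0, reject (null) if neither matches.
export function verifySignature(header: string, body: string, secret: string): string | null {
  const parts = parseHeader(header);
  const timestamp = parts.get("t");
  if (!timestamp) return null;
  const payload = `${timestamp}.${body}`;
  for (const { version, sign } of SCHEMES) {
    const received = parts.get(version);
    if (received && safeEqual(received, sign(secret, payload))) return version;
  }
  return null;
}
```

A production version would also reject stale timestamps so a captured webhook can't be replayed later.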

But the bigger problem was timing. Webhooks are asynchronous: delivery can take seconds or sometimes minutes after a call ends. Users would finish an interview and stare at a loading screen. I built a polling mechanism with a countdown timer and a manual fallback button. That helped, but it still depended on the webhook eventually arriving.

The real fix: a direct ElevenLabs API fallback. If the webhook hasn’t arrived by the time the countdown expires, the worker queries the ElevenLabs API directly. The webhook path is still preferred because it carries richer evaluation data, but the API fallback means users always get a report.
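The poll-then-fallback flow can be sketched as follows, assuming both data sources are injected as functions so the control flow stays testable. `getReport`, `readStored`, `fetchFromApi`, and the default interval and attempt counts are hypothetical. The real `fetchFromApi` would call the ElevenLabs API, and where the loop runs (browser or worker) is left open. Counting attempts instead of reading the clock keeps the countdown deterministic.

```typescript
type Report = { source: "webhook" | "api"; data: unknown };

interface ReportSources {
  // Returns the webhook payload once it has been stored (e.g. in KV), else null.
  readStored: (conversationId: string) => Promise<unknown | null>;
  // Queries the ElevenLabs API directly for the same conversation.
  fetchFromApi: (conversationId: string) => Promise<unknown>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function getReport(
  conversationId: string,
  sources: ReportSources,
  { intervalMs = 3000, maxAttempts = 20 } = {},
): Promise<Report> {
  // Prefer the webhook result: poll until the countdown
  // (maxAttempts x intervalMs) runs out.
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const stored = await sources.readStored(conversationId);
    if (stored !== null) return { source: "webhook", data: stored };
    await sleep(intervalMs);
  }
  // Countdown expired: fall back to the direct API query.
  return { source: "api", data: await sources.fetchFromApi(conversationId) };
}
```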

Mic Check

This one should have been in from day one. Users were starting interviews with broken audio — wrong input device, muted mic, Bluetooth headphones half-connected. I added a device selector and audio test on Step 2. Pick your mic, see the level meter move, confirm it works. Simple gate, huge reduction in wasted sessions.
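In the browser, the device list comes from `navigator.mediaDevices.enumerateDevices()`, and the chosen mic is opened with `getUserMedia({ audio: { deviceId: { exact: id } } })`. The level meter itself is plain arithmetic over the samples an `AnalyserNode` hands back from `getFloatTimeDomainData()`. Here is a sketch of that part, where `rmsLevel`, `toDbfs`, `micLooksLive`, and the -50 dBFS threshold are hypothetical choices.

```typescript
// Root-mean-square level of one buffer of samples in [-1, 1].
export function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

// Convert to dBFS for display; pure silence maps to -Infinity.
export function toDbfs(rms: number): number {
  return 20 * Math.log10(rms);
}

// Hypothetical threshold: anything quieter is treated as "no signal".
export function micLooksLive(samples: Float32Array, thresholdDb = -50): boolean {
  return toDbfs(rmsLevel(samples)) > thresholdDb;
}
```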

The XSS Fix I Should Have Caught Earlier

The email report template was interpolating user-supplied values — name, email, session notes — directly into HTML. No escaping. A user could inject arbitrary HTML into their own email report, which is a vector I don’t want to think about. The March 14 fix was a basic escapeHtml utility applied to every user-supplied value before it hits the template. Embarrassing that it shipped without this.
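The post doesn't show its `escapeHtml`, but a typical version looks like the sketch below. It escapes the five characters that matter in HTML text and quoted attribute values. `renderHeader` is a hypothetical template fragment showing where the escaping goes.

```typescript
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace each special character with its entity; everything else passes through.
export function escapeHtml(value: string): string {
  return value.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical report fragment: every user-supplied value is escaped at interpolation.
export function renderHeader(name: string, email: string): string {
  return `<h1>Interview report for ${escapeHtml(name)}</h1><p>${escapeHtml(email)}</p>`;
}
```

Entity escaping covers text and quoted attributes. It does not make a user-supplied URL safe inside `href`, because a `javascript:` scheme needs separate validation.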

Design System Migration

On March 1, before any of the security work, I migrated Tutor from its ad-hoc styles to the Tonari design system — the same gold-accent dark theme running across all our apps. This was mostly a find-and-replace of raw hex values with semantic tokens, but it meant Tutor finally looked like it belonged to the same family as Beat and the main site.
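As a rough sketch of that find-and-replace, the function below maps known hex literals to CSS custom properties and leaves unknown colors alone for manual review. The hex values and token names are invented for illustration; the actual Tonari palette isn't given here.

```typescript
// Hypothetical mapping from raw hex values (lowercase) to semantic tokens.
const TOKENS: Record<string, string> = {
  "#d4a017": "var(--color-accent)",
  "#121212": "var(--color-bg)",
  "#f5f5f5": "var(--color-text)",
};

// Replace every known 6-digit hex literal (case-insensitive) with its token.
export function tokenizeColors(css: string): string {
  return css.replace(/#[0-9a-fA-F]{6}\b/g, (hex) => TOKENS[hex.toLowerCase()] ?? hex);
}
```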

The whole month was a grind from “it works for me” to “it works for other people without burning money or leaking data.” Not glamorous. But this is the work that separates a prototype from something you can actually hand to someone.