Research, findings, and thoughts from TONARI LABS. Currently exploring audio-to-synth-parameter estimation, ML for sound design, and whatever else catches my ear.
Path traversal, error leaks, missing input limits, and HTTP webhooks — what a security audit found before publishing to npm.
16 tools across 4 domains, built in a single session. How the MCP SDK makes it fast to stand up a dedicated server for an API that isn't covered by official connectors.
A technical look at how Google's new open model maps onto the Patch Pilot roadmap — what it replaces, what it can't, and why the v4 agentic layer just became a lot more real.
Current state of the voice interview practice tool: four languages, structured feedback, SEO groundwork, and what's next.
Google's most capable open model family just shipped under Apache 2.0 — with native audio input, function calling, and local inference. Here's what it means for indie audio tool builders.
ElevenLabs has an MCP connector. It handles TTS and voice cloning. It doesn't touch the Conversational AI API.
The manual workflow of managing four voice agents through a web dashboard, and why automation became necessary.
Email gates, webhook pain, XSS fixes, and the long road from 'it works on my machine' to something I'd let other people use.
The concrete step-by-step plan for building the retrieval-first MVP — target synth, renderer, dataset generation, embedding pipeline, and the four experiments that determine if it works.
Why one multilingual agent didn't work and how separate per-language agents with tuned TTS configs solved the problem.
Existing synth parameter datasets, why we chose Surge XT on data-legality grounds, and the synthetic generation strategy.
Most of the popular audio embedding models are the wrong tool for synth patch retrieval. Here's the model-by-model breakdown and our two-stage architecture.
An honest assessment of Google Magenta's DDSP for subtractive synth parameter inference — what it can do, what breaks, and how we'll use it instead.
Supervised regression, DDSP, retrieval-based matching, reinforcement learning, and generative models — what works, what doesn't, and what we're using.
Why turning audio into synth parameters is mathematically hard, what tools exist today, and what open-source research is available.
Why the official embed was a dead end for a custom interview UI, and what it took to migrate to the WebSocket SDK.
Why I built a voice-based interview practice tool on ElevenLabs Conversational AI, and what the first commit looked like.
What it was like to build an ML audio tool using Claude Code, the mistakes that come from moving fast without understanding deeply, and why we're doing research before code this time.
A walkthrough of the four-layer architecture behind the original Patch Pilot — from input handling to synth parameter output — and the wild December sprint that brought it to life.
Why we shelved the v1, what we learned, and how we're approaching the research reboot for an audio-to-synth-parameter tool.