Context

I’m not an ML engineer. I’m a music producer and bass player who works in catalog operations by day and builds audio tools by night. Everything I know about Python, machine learning, and audio DSP has been learned in the process of building things — mostly with AI assistance.

Patch Pilot was the most ambitious thing I’d attempted. And the way it was built says a lot about both the power and the pitfalls of AI-assisted development when you’re learning as you go.

The Claude Code workflow

Every major phase of Patch Pilot was co-authored with Claude Code. The commit messages don’t hide it — they all carry the Co-Authored-By: Claude tag.

Here’s what that looked like in practice: I’d describe what I wanted (“build an audio analysis module that extracts spectral features, pitch, and ADSR envelope from a WAV file”), Claude Code would generate the implementation with tests, I’d review it, run the tests, and commit. Everything from Phase 2-2 through Phase 3 — audio analysis, synth matching, and the CLI — was built this way in a single afternoon.

That speed is real: 3,500 lines of well-structured, well-tested Python in four hours is not something I could have produced alone. The code was clean, typed, and organized.

But speed has a cost.

What I didn’t understand

The code worked. The tests passed. But I didn’t deeply understand several things:

The embedding models were set up and never used. CLAP, VGGish, and Wav2Vec2 were all configured in the embeddings module from Phase 1. They loaded models, generated embeddings, cached weights. But the actual synth matching in Phase 2-3 used rule-based heuristics — spectral centroid maps to filter cutoff, that sort of thing. The embeddings were meant for a retrieval-based approach (compare input audio to a database of known synth patches), but that database never existed. I didn’t fully grasp that the embeddings were infrastructure for a system that hadn’t been designed yet.
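The rule-based layer was simple enough to sketch. Patch Pilot’s actual mapping rules aren’t reproduced here, but a centroid-to-cutoff heuristic of the kind described looks roughly like this — the headroom factor and cutoff bounds are illustrative assumptions, not the project’s real values:

```python
import numpy as np

def spectral_centroid(signal: np.ndarray, sr: int) -> float:
    """Magnitude-weighted mean frequency of the signal's spectrum."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

def centroid_to_cutoff(centroid_hz: float, headroom: float = 1.5) -> float:
    """Heuristic: put the lowpass cutoff a bit above the centroid,
    clamped to a plausible synth filter range (made-up bounds)."""
    return float(np.clip(centroid_hz * headroom, 100.0, 18000.0))

# A one-second 440 Hz sine puts nearly all its energy in one FFT bin,
# so the centroid lands at (almost exactly) 440 Hz.
sr = 44100
t = np.arange(sr) / sr
c = spectral_centroid(np.sin(2 * np.pi * 440.0 * t), sr)
```

The appeal of this approach is obvious: no dataset, no model, instant results. The limitation is equally obvious — it only works for sounds whose character is captured by a handful of scalar features.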

The Vital preset format was a bad bet. We spent significant effort building a VitalPresetGenerator that produced .vital JSON files. But Vital’s preset format is complex and undocumented — the generator never produced files that actually loaded correctly in Vital. It was removed entirely five days after the audio synthesis layer made it redundant. I should have validated that the format was achievable before investing in it.

The audio synthesis was too simple to be useful. Basic waveforms (sine, saw, square) with Butterworth filters and ADSR envelopes can approximate simple sounds, but they can’t reproduce the complex timbres that make reverse sound design interesting — FM synthesis, wavetable morphing, granular textures, layered effects chains. The output sounded like a textbook example, not like the input.
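To make “too simple” concrete, the entire expressive range of that synthesis layer amounts to something like the following sketch — not Patch Pilot’s actual code; the envelope times and filter order are placeholder values:

```python
import numpy as np
from scipy.signal import butter, lfilter

def adsr(n: int, sr: int, a=0.01, d=0.1, s=0.7, r=0.2) -> np.ndarray:
    """Linear ADSR envelope over n samples (sustain level s)."""
    na, nd, nr = int(a * sr), int(d * sr), int(r * sr)
    ns = max(n - na - nd - nr, 0)
    env = np.concatenate([
        np.linspace(0.0, 1.0, na, endpoint=False),  # attack
        np.linspace(1.0, s, nd, endpoint=False),    # decay
        np.full(ns, s),                             # sustain
        np.linspace(s, 0.0, nr),                    # release
    ])
    return env[:n]

def simple_patch(freq: float, dur: float, cutoff: float, sr: int = 44100) -> np.ndarray:
    """Naive sawtooth -> Butterworth lowpass -> ADSR. That's the whole chain."""
    t = np.arange(int(dur * sr)) / sr
    saw = 2.0 * (t * freq - np.floor(t * freq + 0.5))  # sawtooth in [-1, 1)
    b, a = butter(4, cutoff / (sr / 2), btype="low")
    return lfilter(b, a, saw) * adsr(len(t), sr)

out = simple_patch(220.0, 1.0, 2000.0)
```

Every sound this can make is a filtered basic waveform with a four-stage amplitude envelope. Nothing in that space resembles an FM pluck, a morphing wavetable pad, or a granular texture.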

There was no feedback loop. The system analyzed a sound, guessed at parameters, and output results — but never checked whether the output actually resembled the input. That comparison (the “does this sound right?” step) is arguably the core of the problem, and it was completely absent.
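Even a crude version of that check would have been a few lines. Here’s a sketch of the kind of comparison that was absent — a plain log-spectrum distance; a real system would want a perceptual or embedding-based metric, but even this would have flagged “output sounds nothing like input”:

```python
import numpy as np

def log_spectrum(signal: np.ndarray) -> np.ndarray:
    """Log-magnitude spectrum, a crude proxy for perceptual content."""
    return np.log1p(np.abs(np.fft.rfft(signal)))

def spectral_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean distance between log spectra of equal-length signals."""
    return float(np.linalg.norm(log_spectrum(a) - log_spectrum(b)))

# A slightly detuned sine should score closer to the target
# than a harmonically rich square wave does.
sr = 44100
t = np.arange(sr) / sr
target = np.sin(2 * np.pi * 440.0 * t)
close = np.sin(2 * np.pi * 442.0 * t)
far = np.sign(np.sin(2 * np.pi * 440.0 * t))  # square wave
```

With a scalar “how close is this?” score in place, parameter guessing stops being open-loop: you can rank candidates, or even optimize against the score.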

The dormancy pattern

The git history shows a clear pattern: intense bursts of activity separated by months of silence.

  • July 2025: Project setup. 1 day.
  • August 2025: Input layer. 1 day. Then dormant for 3.5 months.
  • Late November 2025: Environment setup refresh. Then dormant for 3 days.
  • December 3-12, 2025: The entire rest of the project. 4 sessions over 10 days. Then dormant for 3 months.
  • March 2026: Revival — but as a research reboot, not continued development.

This isn’t unusual for side projects, but it’s worth noting that the AI-assisted workflow may have contributed. When you can build a lot in one session, there’s less natural momentum to come back the next day — the “I need to finish this” pressure doesn’t build up the same way. And when you do come back months later, the gap between “code that exists” and “code I understand” has widened.

The pre-commit friction

A small but telling detail: several commits were blocked by mypy strict mode catching type annotation issues. The actual code was correct and tested, but couldn’t pass the pre-commit hooks. This friction directly contributed to the project stalling — the Phase 1 audio analysis was “complete and tested” but the commit was blocked by type mismatches in dictionary return types and missing scipy stubs.

When you’re learning and building with AI, strict linting can feel like a wall. The AI generates code that works but doesn’t satisfy every mypy constraint. Fixing those issues requires understanding the type system at a level that’s separate from understanding the domain logic. It’s one more thing that can kill momentum.
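For concreteness, here’s the shape of the fix those blocked commits needed. The field names are invented, not Patch Pilot’s real schema — the point is that under strict mypy, a heterogeneous result annotated as a plain dict fails, and a TypedDict is the usual way out:

```python
from typing import TypedDict

# `dict[str, float]` breaks under `mypy --strict` the moment one value
# is a list or a string. A TypedDict types each key explicitly.
# (Missing scipy stubs are a separate issue, typically handled with an
# ignore_missing_imports override in mypy config or a stubs package.)
class AnalysisResult(TypedDict):
    centroid_hz: float
    pitch_hz: float
    adsr: list[float]  # attack, decay, sustain, release times/level

def analyze_stub(centroid: float, pitch: float) -> AnalysisResult:
    """Placeholder returning a well-typed analysis result."""
    return {
        "centroid_hz": centroid,
        "pitch_hz": pitch,
        "adsr": [0.01, 0.1, 0.7, 0.2],
    }
```

None of this changes what the code does at runtime — which is exactly why it feels like pure friction when the domain logic already works.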

What I’d do differently

Going into the research reboot, a few principles:

  1. Research before code. The v1 jumped straight to building without understanding what existed. DDSP, existing synth parameter datasets, and specialized audio embedding models were all out there — we just didn’t look.

  2. Validate the hard parts first. The hard part of Patch Pilot isn’t input processing or CLI design — it’s the comparison between input audio and synthesized output. That should have been the first thing prototyped, not the last thing planned.

  3. Understand what the AI builds. Claude Code is incredible for velocity, but the developer still needs to understand the architecture well enough to make good decisions about what to build next. The embedding models sitting unused for months is a direct result of not fully grasping the system design.

  4. Smaller scope, tighter loop. Instead of building all four layers end-to-end, the next version should start with the smallest possible feedback loop: “given a simple synth sound, can we estimate parameters and synthesize something that sounds similar?” If that works for sine waves, try sawtooths. Build outward from a working core.
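That smallest loop really is small. A sketch of the sine-wave base case, assuming nothing fancier than an FFT peak for pitch estimation:

```python
import numpy as np

def estimate_freq(signal: np.ndarray, sr: int) -> float:
    """Estimate the fundamental as the FFT magnitude peak (fine for sines)."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(freqs[int(np.argmax(mag))])

def resynthesize(freq: float, n: int, sr: int) -> np.ndarray:
    """Render a sine at the estimated frequency."""
    return np.sin(2 * np.pi * freq * np.arange(n) / sr)

# The full loop: analyze -> synthesize -> verify similarity.
sr = 44100
target = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
est = estimate_freq(target, sr)
attempt = resynthesize(est, len(target), sr)
error = float(np.max(np.abs(target - attempt)))
```

Trivial, yes — but it closes the loop: estimate, render, measure. Swapping in sawtooths, then filters, then envelopes grows the system outward from something that demonstrably works at every step.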


This is part of an ongoing series documenting the Patch Pilot project. See the v1 architecture overview and the research reboot plan.