The Dataset Problem: What Exists and What We're Building
Existing synth parameter datasets, why we chose Surge XT for data legality, and the synthetic generation strategy.
Existing paired datasets
Paired datasets exist but have significant constraints. “Paired” often means (features, params) rather than full-waveform audio, and datasets from commercial synths are typically not redistributable.
Multi-Task ASP Preset Dataset (Faronbi, NYU / Zenodo) — ~12.9 GB mel NPY files from Serum, Diva, and TyrellN6. No raw audio, but mel spectrograms + param arrays are ready to use for a quick supervised prototype without rendering.
DAFx24 Massive Dataset (Bruford et al.) — ~22 GB, 1M samples from NI Massive. 64-bin mels + 16 continuous params. Not publicly available (commercial synth), but a template for building our own pipeline.
Sound2Synth / Preset-Gen-VAE — ~30k examples each from Dexed (DX7 FM). 155 and 144 params respectively. FM-only, narrow scope but well-documented.
synth1B1 (torchsynth) — 1 billion 4-second sounds, GPU-rendered from torchsynth’s generic “Voice” synth. Massive scale, but the Voice is not a synth that producers actually use (neither a commercial plugin nor an established open-source one). Good for pretraining.
SynthCAT — 3M monophonic samples from Xfer Serum. 250 timbres × 120 ADSR configurations gives 30,000 patches, so the 3M total implies roughly 100 renders per patch. Excellent for learning how envelopes shape wavetable sounds.
Why we chose Surge XT
Data legality is a real constraint: it determines whether we can publish work, share datasets, or distribute a tool. With commercial synths (Serum, Diva, Massive), all generated data has to be treated as non-redistributable by default, even if the audio output may technically be yours.
Surge XT is free and open-source under the GPLv3. The Surge Synth Team FAQ confirms that the GPLv3 governs modification and distribution of the software code, while the audio output is ours. This means we can generate, publish, and share our patch-audio dataset freely.
It’s also a hybrid engine — subtractive, FM, and wavetable — making it a genuine “universal” starting point. And it’s Python-controllable via Pedalboard and surgepy.
Synthetic data generation strategy
Generating our own dataset is not only feasible — it’s the standard approach in published research. The DAFx24 paper generated 1 million samples in ~24 hours on a 2018 Mac Mini using Pedalboard with multiple parallel plugin instances.
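The DAFx24 figure can be turned into a rough planning number. The sketch below only restates the arithmetic (1 million renders in about 24 hours). The helper names and the 250,000-sample target are illustrative assumptions, and our own throughput on Surge XT may differ.

```python
SECONDS_PER_HOUR = 3600


def renders_per_second(total_renders, hours):
    """Aggregate throughput across all parallel plugin instances."""
    return total_renders / (hours * SECONDS_PER_HOUR)


def hours_needed(total_renders, rate_per_second):
    """Wall-clock hours to render a dataset at a given aggregate rate."""
    return total_renders / rate_per_second / SECONDS_PER_HOUR


# Reported DAFx24 figure: ~1M samples in ~24 hours.
dafx_rate = renders_per_second(1_000_000, 24)
print(f"{dafx_rate:.2f} renders/s")

# Hypothetical target size for a first dataset at the same rate.
print(f"{hours_needed(250_000, dafx_rate):.1f} hours")
```

At roughly 11.6 renders per second in aggregate, a quarter-million-sample dataset is an overnight job on comparable hardware.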
Primary tool: Pedalboard (Spotify) — Loads VST3 instruments on macOS, Windows, and Linux (and Audio Units on macOS), supports MIDI-driven rendering, and exposes plugin parameters programmatically. Used in the DAFx24 pipeline.
Alternative: DawDreamer — More complex graph support, better for multi-effect chains.
GPU-native: SynthAX / torchsynth — Up to 90,000x real-time on GPU. Use when we need millions of samples quickly but don’t need a specific VST.
Discovery: Pluginary — Scans VST3 plugins, extracts parameter metadata, caches to SQLite. Essential for reliably enumerating Surge XT’s full parameter list.
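A minimal sketch of a single Pedalboard render might look like the following. It assumes Pedalboard is installed and that Surge XT's VST3 is available at a path you supply. The velocity of 100, the one-second release tail, and the helper names are choices made for this sketch. MIDI events are passed as (bytes, seconds) tuples, which to our understanding Pedalboard accepts alongside mido Message objects.

```python
NOTE_ON, NOTE_OFF = 0x90, 0x80  # MIDI status bytes, channel 1


def note_events(note=60, velocity=100, note_seconds=2.0):
    """Describe one held note as (midi_bytes, time_in_seconds) tuples.

    Note 60 is C4 under the common middle-C convention.
    """
    return [
        (bytes([NOTE_ON, note, velocity]), 0.0),
        (bytes([NOTE_OFF, note, 0]), note_seconds),
    ]


def render_patch(plugin_path, params, note_seconds=2.0,
                 tail_seconds=1.0, sample_rate=44100):
    """Load the plugin, apply named parameters, and render one note."""
    # Imported lazily so the helper above works without Pedalboard installed.
    from pedalboard import load_plugin

    synth = load_plugin(plugin_path)
    for name, value in params.items():
        # synth.parameters maps Python-friendly names to parameter objects.
        if name not in synth.parameters:
            raise KeyError(f"plugin has no parameter named {name!r}")
        setattr(synth, name, value)

    # Returns a float array shaped (channels, samples).
    return synth(
        note_events(note_seconds=note_seconds),
        duration=note_seconds + tail_seconds,
        sample_rate=sample_rate,
    )


print(note_events())
```

Calling `render_patch` with the path to the installed Surge XT VST3 and a dictionary of parameter values would return the rendered audio. The valid parameter names are whatever `synth.parameters.keys()` reports for the plugin, so they should be read from there rather than guessed.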
Critical requirements for our pipeline
Fixed MIDI note, velocity, and duration for each render (C4, standard velocity, 2–4 seconds).
Silence rejection below -60 dB RMS.
Parameter sampling from preset distributions rather than uniform, which biases toward “usable” sounds.
State reset between renders to prevent leakage.
Process isolation per plugin instance with crash auto-restart.
Determinism checks: same parameter seed → same audio hash.
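Two of these requirements, silence rejection and the determinism check, are easy to sketch with the standard library. The version below assumes mono audio as a list of floats in [-1, 1] and measures RMS relative to full scale. Quantising to 16-bit before hashing is a choice made here so that the hash reflects what a 16-bit WAV file would contain; hashing raw float bytes would be a stricter alternative.

```python
import hashlib
import math
import struct

SILENCE_FLOOR_DB = -60.0  # threshold from the requirements above


def rms_db(samples):
    """RMS level in dB relative to full scale (amplitude 1.0)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def is_silent(samples, floor_db=SILENCE_FLOOR_DB):
    """True when a render should be rejected as silent."""
    return rms_db(samples) < floor_db


def audio_hash(samples):
    """SHA-256 of the audio after clipping and 16-bit quantisation."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    pcm = struct.pack(f"<{len(ints)}h", *ints)
    return hashlib.sha256(pcm).hexdigest()


# One second of a 440 Hz sine at half amplitude stands in for a render.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
print(round(rms_db(tone), 2), is_silent(tone))
print(audio_hash(tone) == audio_hash(list(tone)))
```

In a real pipeline the determinism check would render the same parameter seed twice and compare the two hashes. A mismatch usually points at free-running oscillators, random phase, or state leaking between renders.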
Difficulty by synthesis paradigm
Not all synthesis types are equally amenable to ML estimation. This directly affects our scoping:
Simple subtractive (virtual analog) — LOW difficulty. Filter cutoff, resonance, ADSR are visually distinct in spectrograms. Effectively solved for in-domain matching.
Complex subtractive (Surge XT, Serum) — MEDIUM. Unison detune, hard sync, wavetable scanning, modulation routing increase parameter count and interdependence. Workable with a constrained parameter subset (16–32 params).
Wavetable — MEDIUM-HIGH. Wavetable index selection is a retrieval problem in itself (thousands of frames).
FM (Dexed, DX7) — HIGH. Non-linear, small parameter changes cause massive spectral shifts. Operator permutation symmetry. Requires specialist models.
Additive / Granular — VERY HIGH. Research frontier. Because the oscillators can be reordered without changing the sound, a single 16-oscillator additive patch has up to 16! (about 20.9 trillion) equivalent parameter configurations.
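The permutation count is quick to verify, and a toy example shows why it hurts regression targets. In the sketch below each oscillator is a hypothetical (frequency ratio, amplitude) pair. Sorting the oscillators into a canonical order is one common workaround, shown here as an illustration and not as something this series has committed to.

```python
import math
from itertools import permutations

# Number of ways to order 16 interchangeable oscillators.
print(math.factorial(16))  # 20922789888000


def canonical_order(oscillators):
    """Sort (frequency_ratio, amplitude) pairs so that every reordering
    of the same patch maps to a single training target."""
    return sorted(oscillators)


# A 3-oscillator patch: 3! = 6 orderings that all sound identical.
patch = [(2.0, 0.5), (1.0, 1.0), (3.0, 0.25)]
targets = {tuple(canonical_order(p)) for p in permutations(patch)}
print(len(targets))  # 1
```

Without some canonical form (or a permutation-invariant loss), a model is penalised for predicting a parameter vector that would render the correct sound with the oscillators in a different order.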
We’re starting with Surge XT in subtractive mode, restricting to continuous parameters only. This is the “solved-ish” regime that gets a working system up quickly.
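With only continuous parameters in scope, the preset-biased sampling from the pipeline requirements can be sketched as picking a preset and jittering its values. The sketch assumes every parameter is normalised to [0, 1]. The parameter names, preset values, and the 0.05 jitter width are hypothetical placeholders, not Surge XT's real parameter list.

```python
import random

# Hypothetical presets with normalised continuous parameters.
PRESETS = [
    {"filter_cutoff": 0.70, "filter_resonance": 0.20, "amp_attack": 0.05},
    {"filter_cutoff": 0.35, "filter_resonance": 0.60, "amp_attack": 0.30},
]


def sample_near_preset(presets, rng, jitter=0.05):
    """Pick a random preset and perturb each parameter with Gaussian
    noise, clamping the result to the valid [0, 1] range."""
    base = rng.choice(presets)
    return {
        name: min(1.0, max(0.0, value + rng.gauss(0.0, jitter)))
        for name, value in base.items()
    }


rng = random.Random(0)  # a seeded generator keeps sampling reproducible
print(sample_near_preset(PRESETS, rng))
```

Passing a seeded `random.Random` instance keeps the sampled parameters reproducible. That is a prerequisite for the "same parameter seed → same audio hash" check.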
Part of the Patch Pilot research series. Previous: Audio embeddings. Next: MVP architecture & build plan.