The connector exists

ElevenLabs ships an official MCP connector. It works. For its intended purpose — text-to-speech generation, voice cloning, sound effects, audio isolation — it’s genuinely useful. You can pipe text into it and get audio back. You can manage your voice library, clone a new voice from a sample, isolate vocals from a track. The audio generation surface of ElevenLabs is well covered.

That’s not the surface I need.

Two products, one brand

ElevenLabs has two largely independent product surfaces. The first is their audio generation API: TTS, voice design, sound effects. The second is their Conversational AI API: real-time voice agents with system prompts, LLM routing, knowledge bases, webhooks, and conversation history.

The official MCP connector is built for the first surface. It doesn’t know the second one exists.

What it can do: generate speech from text, clone voices, isolate audio, manage the voice library, create sound effects.

What it cannot do: list conversational agents, read or update system prompts, change the backing LLM model or temperature, view conversation transcripts, manage knowledge base documents, configure webhooks, check agent analytics.
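Every one of those missing operations maps onto a documented Conversational AI REST endpoint. Here is a rough sketch of that mapping as I understand it from the ElevenLabs API docs; treat the exact paths as assumptions, not gospel.

```python
# Sketch: Conversational AI operations the official connector lacks,
# mapped to the REST endpoints I believe cover them (paths are my
# reading of the ElevenLabs docs -- verify before relying on them).
BASE = "https://api.elevenlabs.io"

MISSING_OPS = {
    "list_agents":        ("GET",   "/v1/convai/agents"),
    "get_agent":          ("GET",   "/v1/convai/agents/{agent_id}"),
    "update_agent":       ("PATCH", "/v1/convai/agents/{agent_id}"),
    "list_conversations": ("GET",   "/v1/convai/conversations"),
    "get_transcript":     ("GET",   "/v1/convai/conversations/{conversation_id}"),
    "list_kb_docs":       ("GET",   "/v1/convai/knowledge-base"),
}

def build_request(op: str, **path_params) -> tuple:
    """Return (method, full URL) for one of the missing operations."""
    method, path = MISSING_OPS[op]
    return method, BASE + path.format(**path_params)
```

Nothing exotic: build_request("get_agent", agent_id="abc") resolves to a plain GET against /v1/convai/agents/abc. The point is that the surface is small and regular.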

For someone building a conversational AI product on ElevenLabs — which is exactly what Tonari Tutor is — the official connector covers none of the daily workflow I described in the previous post.

The API key problem

It goes deeper than missing endpoints. The official connector’s API key doesn’t even request the convai_write permission scope. Agent configuration changes — updating a system prompt, swapping the LLM, modifying knowledge base documents — require a key with Conversational AI scopes explicitly enabled. The standard TTS key that the official connector expects simply can’t authorize those operations.
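You can see the scope boundary directly by probing a key against a Conversational AI endpoint and reading the status code. A minimal sketch: the xi-api-key header is ElevenLabs’ standard auth header, but the endpoint path and the exact failure codes are assumptions from my reading of the docs.

```python
import urllib.error
import urllib.request

def classify_convai_access(status: int) -> str:
    """Interpret an HTTP status from a Conversational AI endpoint."""
    if status == 200:
        return "key has Conversational AI scope"
    if status in (401, 403):
        return "key lacks Conversational AI scope (or is invalid)"
    return f"inconclusive (HTTP {status})"

def probe_key(api_key: str) -> str:
    """Hit the (assumed) list-agents endpoint and classify the result."""
    req = urllib.request.Request(
        "https://api.elevenlabs.io/v1/convai/agents",
        headers={"xi-api-key": api_key},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return classify_convai_access(resp.status)
    except urllib.error.HTTPError as e:
        return classify_convai_access(e.code)
```

Run probe_key with the key the official connector uses and you get the "lacks scope" branch; that is the whole problem in one request.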

This isn’t a bug. It’s a scope boundary that reflects how ElevenLabs treats these as separate products internally. The audio generation API and the Conversational AI API have different permission models, different rate limits, and different pricing structures. The official connector was built by (or for) the audio team, not the conversational AI team.

The gap in practice

My daily loop looks like this: read an agent’s current system prompt, edit it, push the update, test a conversation, read the transcript, adjust. Repeat across four language-specific agents. None of that is possible through the official connector.
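Scripted against the REST API directly, the read-edit-push part of that loop is only a few lines. A sketch with the HTTP transport injected so the shape is visible without a live key; the field path to the system prompt inside the agent config (conversation_config.agent.prompt.prompt) is my assumption about the schema.

```python
from typing import Callable, Optional

# transport(method, path, json_body) -> dict; injected so the loop's
# shape is testable without a live API key.
Transport = Callable[[str, str, Optional[dict]], dict]

def update_system_prompt(transport: Transport, agent_id: str,
                         edit: Callable[[str], str]) -> str:
    """Read an agent's system prompt, apply an edit, push it back."""
    agent = transport("GET", f"/v1/convai/agents/{agent_id}", None)
    # Assumed location of the system prompt in the agent config.
    prompt = agent["conversation_config"]["agent"]["prompt"]["prompt"]
    new_prompt = edit(prompt)
    transport("PATCH", f"/v1/convai/agents/{agent_id}", {
        "conversation_config": {"agent": {"prompt": {"prompt": new_prompt}}},
    })
    return new_prompt
```

Repeating this across four language-specific agents is then just a loop over agent IDs, which is exactly the kind of thing an MCP tool should be doing for me.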

Before MCP, this meant keeping the ElevenLabs dashboard open in a browser tab and copy-pasting between the UI and my editor. The workflow I documented last time was entirely manual — not because tooling didn’t exist, but because the tooling that existed was pointed at the wrong API surface.

What this motivated

Once I understood the gap wasn’t going to close on its own timeline, the path was obvious. The Conversational AI API is well-documented, REST-based, and straightforward. The MCP protocol is a standard interface. Wiring one to the other is not complicated work.
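The wiring is little more than a dispatch table: each MCP tool name resolves to a method and an endpoint path, the tool arguments fill in the path, and the HTTP response comes back as the tool result. A toy sketch of that shape, not the real MCP SDK, with endpoint paths that are again my assumptions from the docs:

```python
from typing import Callable

# Toy dispatch from MCP-style tool names to (assumed) endpoints.
TOOLS = {
    "list_agents":  ("GET",   "/v1/convai/agents"),
    "get_agent":    ("GET",   "/v1/convai/agents/{agent_id}"),
    "update_agent": ("PATCH", "/v1/convai/agents/{agent_id}"),
}

def call_tool(transport: Callable[[str, str, dict], dict],
              name: str, args: dict) -> dict:
    """Resolve a tool call to a REST request and return the response."""
    method, path = TOOLS[name]
    body = args.get("body", {})
    return transport(method, path.format(**args), body)
```

Everything else a real server needs (the MCP handshake, tool schemas, auth) is boilerplate the protocol SDKs already handle.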

So I built my own. That’s next.