Choosing the right mode can improve response time, naturalness, and overall call experience.
1. Pipeline
| Attribute | Details |
|---|---|
| Label in UI | Pipeline |
| How it works | Speech-to-Text → LLM → Text-to-Speech |
| Latency | ~800 – 1500 ms (depends on language & model) |
| Best for | Complex reasoning, dynamic prompts, multi-sentence replies |
- Supports all voices in the library (including custom-cloned voices).
- Handles long-form answers or paragraph-style responses well.
- Allows the LLM to inject variables and reference earlier context cleanly.
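The "How it works" row describes three stages run in sequence. The Python sketch below illustrates why that adds latency. It is a simplified, non-streaming model that uses hypothetical stub functions (`speech_to_text`, `run_llm`, `text_to_speech`) in place of real STT, LLM, and TTS services. Because each stage waits for the previous one to finish, the turn latency in this model is the sum of the stage latencies.

```python
from __future__ import annotations

import time
from typing import Callable


# Hypothetical stage stubs: each sleeps briefly to stand in for a call
# to an STT service, an LLM, and a TTS engine respectively.
def speech_to_text(audio: bytes) -> str:
    time.sleep(0.01)
    return audio.decode("utf-8")  # pretend the "audio" is already a transcript


def run_llm(transcript: str) -> str:
    time.sleep(0.02)
    return f"You said: {transcript}"


def text_to_speech(text: str) -> bytes:
    time.sleep(0.01)
    return text.encode("utf-8")  # pretend these bytes are synthesized audio


def pipeline_turn(audio: bytes) -> tuple[bytes, dict[str, float]]:
    """Run one caller turn through STT -> LLM -> TTS, timing each stage."""
    stages: list[tuple[str, Callable]] = [
        ("stt", speech_to_text),
        ("llm", run_llm),
        ("tts", text_to_speech),
    ]
    timings: dict[str, float] = {}
    data = audio
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)  # each stage must finish before the next one starts
        timings[name] = time.perf_counter() - start
    # Computed before "total" is added, so this is the sum of the three stages.
    timings["total"] = sum(timings.values())
    return data, timings


if __name__ == "__main__":
    reply, timings = pipeline_turn(b"what are your opening hours")
    print(reply)
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1000:.1f} ms")
```

Real deployments may stream partial results between stages to overlap some of this work. The sketch deliberately leaves that out so the additive cost is easy to see.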
When to choose Pipeline
- You need rich, multi-sentence answers (e.g. support queries, detailed explanations).
- The assistant must reason over structured data or complex prompts.
- You need full control over the spoken voice (a cloned or brand voice).
2. Speech-to-Speech (Multimodal)
| Attribute | Details |
|---|---|
| Label in UI | Speech-to-speech |
| How it works | Direct speech-to-speech generation (no intermediate text) |
| Latency | ~300 – 600 ms (ultra-low) |
| Best for | Natural back-and-forth, short & reactive replies |
- Fast turn-taking – callers experience near-instant responses.
- Generates more expressive prosody natively (intonation, fillers).
- Currently supports a limited voice set, but more are added regularly.
When to choose Speech-to-Speech
- The conversation needs to feel snappy (sales, booking confirmations).
- Your replies are generally short sentences or quick acknowledgements.
- You’re happy to use the system-provided voices in exchange for faster responses.
Speech-to-speech is evolving rapidly. If you need a custom cloned voice with low latency, try Dualplex.
3. Dualplex (Beta)
| Attribute | Details |
|---|---|
| Label in UI | Dualplex |
| How it works | Multimodal STT + LLM (speech-to-speech) with ElevenLabs TTS output |
| Latency | Low (varies by voice and model) |
| Best for | Fast, natural replies with high-quality or cloned brand voices |
- Near-instant turn-taking similar to speech-to-speech.
- Access to ElevenLabs voice library, including custom-cloned voices.
- Great for short to medium replies with expressive prosody.
- Recommended default for most use cases today, though still in Beta.
When to choose Dualplex
- You want fast back-and-forth but need a branded or cloned voice.
- You want more expressive delivery without giving up precise voice choice.
- You’re comfortable using a new feature that is still in Beta.
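The "When to choose" lists for the three modes can be condensed into a rough decision heuristic. The Python sketch below is one possible reading of those lists, not an official selection rule. `CallNeeds` and `suggest_mode` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class CallNeeds:
    long_replies: bool   # rich, multi-sentence answers or complex reasoning
    custom_voice: bool   # a cloned or brand voice is required
    beta_ok: bool        # comfortable running a feature that is still in Beta


def suggest_mode(needs: CallNeeds) -> str:
    """Rough heuristic distilled from the 'When to choose' lists (hypothetical)."""
    if needs.long_replies:
        return "Pipeline"          # best for complex reasoning and long answers
    if needs.beta_ok:
        return "Dualplex"          # fast turn-taking plus ElevenLabs voices
    if needs.custom_voice:
        return "Pipeline"          # the non-Beta mode that supports cloned voices
    return "Speech-to-speech"      # fastest, but limited to system-provided voices


if __name__ == "__main__":
    booking_bot = CallNeeds(long_replies=False, custom_voice=True, beta_ok=False)
    print(suggest_mode(booking_bot))
```

Treat the result as a starting point for testing, not a final answer. Real calls are the better judge.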
Switching modes
You can pick the mode for each assistant in Assistant → Settings → Voice Engine. Test all three modes to see which delivers the best balance of speed and quality for your use case, keeping in mind that Dualplex is still in Beta.
Pro Tip: Record a test call in each mode and compare the caller’s perceived latency and engagement level to decide which fits your flow.
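One way to make the latency comparison concrete is to note, for each turn in a recording, when the caller stopped speaking and when the assistant started replying (for example by listening back to the recording). The Python sketch below assumes that hypothetical `(caller_stopped_at, assistant_started_at)` format in seconds and reports the median response gap per mode. The function names and the sample timestamps are placeholders for illustration, not real measurements.

```python
from __future__ import annotations

from statistics import median

# (caller_stopped_at, assistant_started_at), both in seconds from call start
Turn = tuple[float, float]


def response_gaps_ms(turns: list[Turn]) -> list[float]:
    """Silence between the caller finishing and the assistant starting, in ms."""
    return [(assistant - caller) * 1000 for caller, assistant in turns]


def compare_modes(calls: dict[str, list[Turn]]) -> dict[str, float]:
    """Median response gap per mode. A lower value should feel snappier."""
    return {mode: median(response_gaps_ms(turns)) for mode, turns in calls.items()}


if __name__ == "__main__":
    # Placeholder timestamps for illustration only, not real measurements.
    calls = {
        "Pipeline": [(2.0, 3.0), (9.5, 10.75)],
        "Speech-to-speech": [(2.0, 2.5), (9.5, 9.875)],
        "Dualplex": [(2.0, 2.5), (9.5, 10.0)],
    }
    for mode, gap in sorted(compare_modes(calls).items(), key=lambda kv: kv[1]):
        print(f"{mode}: median gap {gap:.0f} ms")
```

The median is used rather than the mean so that a single slow turn does not dominate the comparison. Engagement is still best judged by listening to the calls.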

