What Are Configurations?
Configurations are the AI model settings that power your voice agents. They control how agents listen, think, and speak during conversations. Proper configuration is essential for creating natural, responsive, and effective voice AI experiences.
Configurations include:
- STT (Speech-to-Text) - How agents convert customer speech to text
- TTS (Text-to-Speech) - How agents convert responses to speech
- LLM (Language Models) - The AI that powers conversation understanding and generation
- Voices - Specific voice characteristics for agent speech
Configuration Architecture
Default vs Override Configurations
Agents can use configurations in two ways:
Default Configurations:
Speech-to-Text (STT) Configurations
What is STT?
STT converts customer speech into text that the LLM can understand. Accurate STT is critical for:
- Understanding customer intent
- Capturing key information
- Reducing misunderstandings
- Enabling fast responses
Available STT Configs
Get available STT configurations:
STT Configuration Options
Common STT configurations include:
Common STT Providers
Deepgram Nova 3 - High accuracy, low latency
Text-to-Speech (TTS) Configurations
What is TTS?
TTS converts agent text responses into natural-sounding speech. Good TTS creates:
- Natural conversation flow
- Clear, understandable speech
- Appropriate pacing and emotion
- Consistent voice quality
Available TTS Configs
Get available TTS configurations:
TTS Configuration Options
Common TTS Providers
Deepgram Aura 2 - Ultra-low latency
Language Model (LLM) Configurations
What is an LLM?
The LLM is the “brain” that:
- Understands customer intent
- Generates appropriate responses
- Decides when to use tools
- Maintains conversation context
- Makes decisions during calls
Available LLM Configs
Get available LLM configurations:
LLM Configuration Options
Common LLM Models
GPT-4o - Most capable, best reasoning
- Best for: Complex conversations, nuanced understanding
- Context window: 128K tokens
- Speed: Moderate
- Cost: Higher
- Best for: Simple conversations, high volume
- Context window: 128K tokens
- Speed: Very fast
- Cost: Lower
- Best for: Natural dialogue, ethical reasoning
- Context window: 200K tokens
- Speed: Fast
- Cost: Moderate
Temperature Settings Guide
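The temperature ranges this guide recommends (0.5-0.7 as a conservative starting point, 0.3-0.5 when behavior is inconsistent) can be captured in a small lookup. This is a minimal sketch: the profile names and the `pick_temperature` helper are hypothetical and not part of any platform API.

```python
# Hypothetical profile names; the ranges are the ones this guide recommends.
TEMPERATURE_RANGES = {
    "consistent": (0.3, 0.5),    # for agents that behave inconsistently
    "conservative": (0.5, 0.7),  # suggested starting point
}


def pick_temperature(profile: str) -> float:
    """Return the midpoint of the recommended range for a profile."""
    low, high = TEMPERATURE_RANGES[profile]
    return round((low + high) / 2, 2)
```

Starting at the midpoint leaves room to adjust in either direction while staying inside the recommended range.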
Voices
Available Voices
Get list of available voices:
Voice Selection
Choose voices based on:
Brand alignment:
- Professional and authoritative
- Friendly and approachable
- Young and energetic
- Mature and experienced
Use case:
- Support: Calm, patient, helpful
- Sales: Confident, enthusiastic
- Scheduling: Efficient, clear
- Notifications: Neutral, informative
Audience:
- Match target audience
- Consider cultural preferences
- Gender considerations
- Age appropriateness
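The use-case pairings above (Support, Sales, Scheduling, Notifications) can be used to rank candidate voices by how many of the desired traits they share. This is only a sketch: the `traits` field, the voice IDs, and the `rank_voices` helper are assumptions for illustration, not the platform's voice schema.

```python
# Desired traits per use case, taken from the list above.
VOICE_TRAITS_BY_USE_CASE = {
    "support": {"calm", "patient", "helpful"},
    "sales": {"confident", "enthusiastic"},
    "scheduling": {"efficient", "clear"},
    "notifications": {"neutral", "informative"},
}


def rank_voices(use_case: str, voices: list) -> list:
    """Sort candidate voices by how many desired traits they share, best first."""
    wanted = VOICE_TRAITS_BY_USE_CASE[use_case]
    return sorted(voices, key=lambda v: len(wanted & set(v["traits"])), reverse=True)


# Hypothetical catalog entries; real voice metadata will look different.
candidates = [
    {"id": "voice-a", "traits": ["confident", "energetic"]},
    {"id": "voice-b", "traits": ["calm", "patient", "clear"]},
]
best_for_support = rank_voices("support", candidates)[0]
```

A ranking like this only narrows the shortlist; listening to samples is still the deciding step.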
Voice Examples
Configuration Best Practices
STT Best Practices
- Language matching - Use correct language code for your audience
- Enable punctuation - Improves LLM understanding
- Test in production conditions - Account for phone line quality
- Monitor accuracy - Track misrecognitions in transcripts
- Consider latency - Balance accuracy with response time
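The first two practices above can be checked automatically before a configuration is deployed. The sketch below assumes a configuration represented as a dictionary with hypothetical `language` and `punctuate` fields; actual field names depend on the STT provider.

```python
import re

# Accepts codes such as "en" or "en-US"; real providers may allow more forms.
LANGUAGE_CODE = re.compile(r"^[a-z]{2,3}(-[A-Z]{2})?$")


def check_stt_config(config: dict) -> list:
    """Return a list of warnings for an STT config (hypothetical field names)."""
    warnings = []
    if not LANGUAGE_CODE.match(config.get("language", "")):
        warnings.append("language: expected a code such as 'en-US'")
    if not config.get("punctuate", False):
        warnings.append("punctuate: enable punctuation to help the LLM")
    return warnings
```

An empty list means both checks passed.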
TTS Best Practices
- Natural speech rate - 0.95-1.05 speed for most use cases
- Consistent voice - Use same voice throughout conversation
- Test with real users - Verify voice quality and clarity
- Match brand personality - Voice should align with brand
- Phone optimization - Enable phone filter for call quality
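The speed range and phone filter recommendations above can be applied as a normalization step. This is a sketch under assumptions: the `speed` and `phone_filter` keys are hypothetical names, and the real TTS configuration may expose these settings differently.

```python
SPEED_RANGE = (0.95, 1.05)  # natural speech rate recommended above


def normalize_tts_config(config: dict) -> dict:
    """Return a copy with speed clamped to the natural range and the phone filter on."""
    low, high = SPEED_RANGE
    fixed = dict(config)  # leave the caller's dict untouched
    fixed["speed"] = min(max(config.get("speed", 1.0), low), high)
    fixed["phone_filter"] = True
    return fixed
```

Clamping rather than rejecting keeps an out-of-range config usable, but logging the original value is worth considering so the change does not go unnoticed.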
LLM Best Practices
- Start conservative - Lower temperature (0.5-0.7) for consistency
- Limit response length - Shorter responses for voice (200-300 tokens)
- Monitor token usage - Optimize for cost and performance
- Test extensively - Validate behavior across scenarios
- Version control - Track configuration changes
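One lightweight way to track configuration changes is to record a diff whenever a config is updated. The sketch below is an illustration only; the `diff_config` helper is hypothetical, and the keys simply reuse parameter names mentioned in this guide.

```python
def diff_config(old: dict, new: dict) -> dict:
    """Map each changed key to its (old, new) values; a missing key shows as None."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


v1 = {"temperature": 0.7, "maxTokens": 300}
v2 = {"temperature": 0.5, "maxTokens": 300, "top_p": 0.9}
changes = diff_config(v1, v2)
```

Storing `changes` alongside a timestamp makes it easier to correlate a behavior shift with the setting that caused it.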
Voice Best Practices
- Listen to samples - Test voices before deployment
- Consider context - Different voices for different agents
- Get feedback - Ask customers about voice quality
- A/B test - Compare voice performance
- Match expectations - Voice should fit agent personality
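For the A/B testing practice above, callers need to be split between voices in a stable way so that a repeat caller hears the same voice. A minimal sketch using a hash of a caller identifier follows; the `assign_voice` helper and the idea of keying on a caller ID are assumptions, not a platform feature.

```python
import hashlib


def assign_voice(caller_id: str, voices: list) -> str:
    """Deterministically map a caller to one voice variant."""
    digest = hashlib.sha256(caller_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer bucket index.
    return voices[int.from_bytes(digest[:8], "big") % len(voices)]
```

Because the assignment depends only on the caller ID and the list of voices, no assignment table has to be stored.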
Performance Optimization
Latency Optimization
Reduce response time:
Quality Optimization
Maximize quality:
Cost Optimization
Reduce costs:
Common Use Cases
Customer Support
STT: Accurate, TTS: Calm & Clear, LLM: Helpful & Patient
Sales Outreach
STT: Fast, TTS: Energetic & Warm, LLM: Persuasive
Appointment Booking
STT: Reliable, TTS: Efficient & Professional, LLM: Task-focused
Surveys & Feedback
STT: Patient, TTS: Neutral, LLM: Question-focused
Troubleshooting
Poor Speech Recognition
Solutions:
- Try different STT provider
- Enable punctuation and diarization
- Check audio quality settings
- Test with different accents/languages
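When comparing STT providers or settings, it helps to measure accuracy rather than judge by ear. Word error rate (WER) is a standard metric: the word-level edit distance between a reference transcript and the recognized text, divided by the number of reference words. A self-contained sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Running this over the same set of hand-corrected call transcripts for each candidate configuration gives a like-for-like comparison.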
Unnatural Speech
Solutions:
- Adjust TTS speed (0.95-1.05)
- Try different voice
- Reduce pitch modifications
- Test with different TTS provider
Slow Response Times
Solutions:
- Use faster STT/TTS/LLM configs
- Reduce maxTokens
- Enable interim results
- Lower audio sample rates
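Before changing any of the settings above, it is useful to know which stage dominates the response time. The sketch below sums per-stage latencies and flags the slowest one; the `latency_report` helper, the stage names, and the numbers are illustrative assumptions, not measurements from any provider.

```python
def latency_report(stage_ms: dict, budget_ms: int) -> dict:
    """Summarize per-stage latencies against a response-time budget."""
    total = sum(stage_ms.values())
    return {
        "total_ms": total,
        "over_budget": total > budget_ms,
        "slowest": max(stage_ms, key=stage_ms.get),
    }


# Made-up measurements for illustration; substitute your own timings.
report = latency_report({"stt": 150, "llm": 600, "tts": 200}, budget_ms=800)
```

In this made-up case the LLM stage is the bottleneck, so reducing `maxTokens` or switching to a faster model would be the first thing to try.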
Inconsistent Behavior
Solutions:
- Lower temperature (0.3-0.5)
- Add stop sequences
- Reduce top_p
- Use more specific prompts