What Are Configurations?

Configurations are the AI model settings that power your voice agents. They control how agents listen, think, and speak during conversations. Proper configuration is essential for creating natural, responsive, and effective voice AI experiences. Configurations include:
  • STT (Speech-to-Text) - How agents convert customer speech to text
  • TTS (Text-to-Speech) - How agents convert responses to speech
  • LLM (Language Models) - The AI that powers conversation understanding and generation
  • Voices - Specific voice characteristics for agent speech

Configuration Architecture

Default vs Override Configurations

Agents can use configurations in two ways: by referencing predefined configs, or by layering custom overrides on top of them.

Default Configurations:
{
  sttConfigId: "nova-3-stt",      // Reference to predefined STT config
  ttsConfigId: "aura-2-tts",      // Reference to predefined TTS config
  llmConfigId: "gpt-4o-mini-llm", // Reference to predefined LLM config
  voiceId: "6011b4c8-6140-4b7e-8a92-d9880de97b77"
}
Override Configurations:
{
  sttConfigId: "nova-3-stt",
  overrideSTTConfig: {
    // Custom STT settings
    language: "es-US",
    enablePunctuation: true
  },

  ttsConfigId: "aura-2-tts",
  overrideTTSConfig: {
    // Custom TTS settings
    speed: 1.1,
    pitch: 0
  },

  llmConfigId: "gpt-4o-mini-llm",
  overrideLLMConfig: {
    // Custom LLM settings
    temperature: 0.7,
    maxTokens: 500
  }
}
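
Conceptually, an override is layered on top of the referenced base config: any setting you supply in the override replaces the corresponding setting from the predefined config, and everything else falls back to the base. A minimal sketch of that layering (baseSTTDefaults and resolveConfig are illustrative names, not part of the API):

// Illustrative only: how an override layers over a base config.
// `baseSTTDefaults` stands in for whatever the predefined config provides.
const baseSTTDefaults = {
  language: "en-US",
  enablePunctuation: true,
  interimResults: true
};

function resolveConfig(baseConfig, overrideConfig = {}) {
  // Fields present in the override win; everything else comes from the base.
  return { ...baseConfig, ...overrideConfig };
}

const effectiveSTT = resolveConfig(baseSTTDefaults, {
  language: "es-US"   // only the language is customized
});
// => { language: "es-US", enablePunctuation: true, interimResults: true }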

Speech-to-Text (STT) Configurations

What is STT?

STT converts customer speech into text that the LLM can understand. Accurate STT is critical for:
  • Understanding customer intent
  • Capturing key information
  • Reducing misunderstandings
  • Enabling fast responses

Available STT Configs

Get available STT configurations:
const response = await fetch('https://prod.featherhq.com/api/v1/stt-configs', {
  headers: {
    'X-API-Key': API_KEY
  }
});

const sttConfigs = await response.json();

sttConfigs.forEach(config => {
  console.log(`${config.id}: ${config.name}`);
  console.log(`  Provider: ${config.provider}`);
  console.log(`  Languages: ${config.supportedLanguages.join(', ')}`);
});
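
Because each config lists its supportedLanguages, you can select one programmatically instead of hard-coding an ID. A small sketch using only the fields printed above:

// Find an STT config that supports a given language code.
function findSTTConfigForLanguage(sttConfigs, languageCode) {
  return sttConfigs.find(config =>
    config.supportedLanguages.includes(languageCode)
  );
}

const spanishConfig = findSTTConfigForLanguage(sttConfigs, 'es-US');
if (spanishConfig) {
  console.log(`Using ${spanishConfig.id} for Spanish callers`);
} else {
  console.log('No STT config supports es-US');
}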

STT Configuration Options

Common STT configurations include:
{
  // Base config to use
  sttConfigId: "nova-3-stt",

  // Override settings
  overrideSTTConfig: {
    // Language and locale
    language: "en-US",          // Language code

    // Accuracy settings
    enablePunctuation: true,    // Add punctuation to transcript
    enableDiarization: false,   // Separate speakers
    profanityFilter: false,     // Filter profanity

    // Performance settings
    interimResults: true,       // Stream partial results
    singleUtterance: false,     // End on first pause

    // Model selection
    model: "latest",            // Use latest model version

    // Audio settings
    sampleRate: 16000,          // Audio sample rate (Hz)
    encoding: "LINEAR16"        // Audio encoding format
  }
}

Common STT Providers

Deepgram Nova 3 - High accuracy, low latency
{
  sttConfigId: "nova-3-stt",
  overrideSTTConfig: {
    language: "en-US",
    model: "nova-3",
    enablePunctuation: true
  }
}
Google Speech-to-Text - Wide language support
{
  sttConfigId: "google-stt",
  overrideSTTConfig: {
    language: "en-US",
    model: "phone_call",
    enableAutomaticPunctuation: true
  }
}
OpenAI Whisper - Excellent for noisy environments
{
  sttConfigId: "whisper-stt",
  overrideSTTConfig: {
    language: "en",
    model: "whisper-1"
  }
}

Text-to-Speech (TTS) Configurations

What is TTS?

TTS converts agent text responses into natural-sounding speech. Good TTS creates:
  • Natural conversation flow
  • Clear, understandable speech
  • Appropriate pacing and emotion
  • Consistent voice quality

Available TTS Configs

Get available TTS configurations:
const response = await fetch('https://prod.featherhq.com/api/v1/tts-configs', {
  headers: {
    'X-API-Key': API_KEY
  }
});

const ttsConfigs = await response.json();

ttsConfigs.forEach(config => {
  console.log(`${config.id}: ${config.name}`);
  console.log(`  Provider: ${config.provider}`);
  console.log(`  Quality: ${config.quality}`);
});

TTS Configuration Options

{
  // Base config to use
  ttsConfigId: "aura-2-tts",

  // Voice selection
  voiceId: "6011b4c8-6140-4b7e-8a92-d9880de97b77",

  // Override settings
  overrideTTSConfig: {
    // Speech characteristics
    speed: 1.0,              // 0.5 - 2.0 (1.0 = normal)
    pitch: 0,                // -20 to +20 (0 = normal)
    volume: 0,               // -96 to +16 dB (0 = normal)

    // Quality settings
    sampleRate: 24000,       // Audio quality (Hz)
    encoding: "LINEAR16",    // Audio encoding

    // Prosody
    emphasisLevel: "moderate",  // "strong", "moderate", "reduced"

    // Effects
    phoneFilterEnabled: true    // Simulate phone call quality
  }
}
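
The speed, pitch, and volume ranges above are easy to overshoot when values come from user input or experimentation. A minimal sketch that clamps an override to the documented ranges before it is used (the platform's own validation behavior is not assumed here):

// Clamp TTS override values to the documented ranges:
// speed 0.5 - 2.0, pitch -20 to +20, volume -96 to +16 dB.
const clamp = (value, min, max) => Math.min(Math.max(value, min), max);

function sanitizeTTSOverride(override) {
  const result = { ...override };
  if (result.speed !== undefined)  result.speed  = clamp(result.speed, 0.5, 2.0);
  if (result.pitch !== undefined)  result.pitch  = clamp(result.pitch, -20, 20);
  if (result.volume !== undefined) result.volume = clamp(result.volume, -96, 16);
  return result;
}

console.log(sanitizeTTSOverride({ speed: 2.5, pitch: -30 }));
// => { speed: 2, pitch: -20 }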

Common TTS Providers

Deepgram Aura 2 - Ultra-low latency
{
  ttsConfigId: "aura-2-tts",
  voiceId: "aura-helios-en",  // Warm, friendly male voice
  overrideTTSConfig: {
    speed: 1.0
  }
}
ElevenLabs - Highly natural, expressive
{
  ttsConfigId: "elevenlabs-tts",
  voiceId: "pNInz6obpgDQGcFmaJgB",  // Adam voice
  overrideTTSConfig: {
    stability: 0.5,          // Voice consistency
    similarityBoost: 0.75    // Match to original voice
  }
}
OpenAI TTS - Natural conversation
{
  ttsConfigId: "openai-tts",
  voiceId: "alloy",          // Neutral, balanced voice
  overrideTTSConfig: {
    model: "tts-1-hd",       // High-definition model
    speed: 1.0
  }
}

Language Model (LLM) Configurations

What is an LLM?

The LLM is the “brain” that:
  • Understands customer intent
  • Generates appropriate responses
  • Decides when to use tools
  • Maintains conversation context
  • Makes decisions during calls

Available LLM Configs

Get available LLM configurations:
const response = await fetch('https://prod.featherhq.com/api/v1/llm-configs', {
  headers: {
    'X-API-Key': API_KEY
  }
});

const llmConfigs = await response.json();

llmConfigs.forEach(config => {
  console.log(`${config.id}: ${config.name}`);
  console.log(`  Provider: ${config.provider}`);
  console.log(`  Context Window: ${config.contextWindow} tokens`);
});

LLM Configuration Options

{
  // Base config to use
  llmConfigId: "gpt-4o-mini-llm",

  // Override settings
  overrideLLMConfig: {
    // Creativity vs consistency
    temperature: 0.7,         // 0.0 - 2.0 (lower = more consistent)

    // Response length
    maxTokens: 500,           // Max response length

    // Sampling control
    topP: 1.0,               // Nucleus sampling (0.0 - 1.0)
    frequencyPenalty: 0.0,   // Penalize repetition (-2.0 to 2.0)
    presencePenalty: 0.0,    // Encourage topic diversity (-2.0 to 2.0)

    // Advanced
    stop: ["\n\n", "###"],   // Stop sequences
    logitBias: {}            // Token probability adjustments
  }
}
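
For voice agents, these sampling options are mostly useful for keeping spoken replies short and non-repetitive over long calls. A sketch of an override built from the options documented above; the specific values are illustrative starting points, not platform recommendations:

// Illustrative override: keep spoken replies short and avoid repeated phrasing.
const concisenessOverride = {
  maxTokens: 300,           // keep responses short enough to speak naturally
  frequencyPenalty: 0.3,    // mildly discourage repeating the same wording
  presencePenalty: 0.1,     // nudge the model toward new information
  stop: ["\n\n"]            // cut off multi-paragraph answers
};

const agentUpdate = {
  llmConfigId: "gpt-4o-mini-llm",
  overrideLLMConfig: concisenessOverride
};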

Common LLM Models

GPT-4o - Most capable, best reasoning
{
  llmConfigId: "gpt-4o-llm",
  overrideLLMConfig: {
    temperature: 0.7,
    maxTokens: 500
  }
}
  • Best for: Complex conversations, nuanced understanding
  • Context window: 128K tokens
  • Speed: Moderate
  • Cost: Higher
GPT-4o-mini - Fast and cost-effective
{
  llmConfigId: "gpt-4o-mini-llm",
  overrideLLMConfig: {
    temperature: 0.7,
    maxTokens: 300
  }
}
  • Best for: Simple conversations, high volume
  • Context window: 128K tokens
  • Speed: Very fast
  • Cost: Lower
Claude 3.5 Sonnet - Excellent conversation quality
{
  llmConfigId: "claude-3-5-sonnet-llm",
  overrideLLMConfig: {
    temperature: 0.7,
    maxTokens: 500
  }
}
  • Best for: Natural dialogue, ethical reasoning
  • Context window: 200K tokens
  • Speed: Fast
  • Cost: Moderate

Temperature Settings Guide

// Very consistent - customer support
temperature: 0.2
// Agent gives nearly the same answer every time
// Good for: FAQs, policies, factual information

// Balanced - general use
temperature: 0.7
// Natural variation while staying on topic
// Good for: Sales, general conversations

// Creative - exploratory
temperature: 1.0
// More creative and varied responses
// Good for: Brainstorming, discovery calls
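
If you standardize on these three presets, a small helper keeps the choice explicit in code. The preset names below are this guide's shorthand, not platform terminology:

// Map the presets above to temperature values.
const TEMPERATURE_PRESETS = {
  consistent: 0.2,   // FAQs, policies, factual information
  balanced:   0.7,   // sales, general conversations
  creative:   1.0    // brainstorming, discovery calls
};

function llmOverrideFor(preset, maxTokens = 300) {
  return {
    temperature: TEMPERATURE_PRESETS[preset],
    maxTokens
  };
}

console.log(llmOverrideFor('consistent'));
// => { temperature: 0.2, maxTokens: 300 }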

Voices

Available Voices

Get list of available voices:
const response = await fetch('https://prod.featherhq.com/api/v1/available-voices', {
  headers: {
    'X-API-Key': API_KEY
  }
});

const voices = await response.json();

voices.forEach(voice => {
  console.log(`${voice.name} (${voice.gender}, ${voice.language})`);
  console.log(`  ID: ${voice.id}`);
  console.log(`  Provider: ${voice.provider}`);
  console.log(`  ${voice.description}`);
});
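
The fields printed above (language, gender, provider) are enough to narrow a long voice list down to candidates worth auditioning. A sketch, assuming the filter values match the strings returned by the API:

// Narrow the voice list to candidates worth auditioning.
function filterVoices(voices, { language, gender, provider } = {}) {
  return voices.filter(voice =>
    (!language || voice.language === language) &&
    (!gender   || voice.gender === gender) &&
    (!provider || voice.provider === provider)
  );
}

const candidates = filterVoices(voices, { language: 'en-US', gender: 'female' });
candidates.forEach(voice => console.log(`${voice.name}: ${voice.id}`));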

Voice Selection

Choose voices based on the following factors.

Brand alignment:
  • Professional and authoritative
  • Friendly and approachable
  • Young and energetic
  • Mature and experienced
Use case:
  • Support: Calm, patient, helpful
  • Sales: Confident, enthusiastic
  • Scheduling: Efficient, clear
  • Notifications: Neutral, informative
Demographics:
  • Match target audience
  • Consider cultural preferences
  • Gender considerations
  • Age appropriateness

Voice Examples

// Professional male - tech support
{
  voiceId: "aura-helios-en",
  overrideTTSConfig: {
    speed: 0.95,  // Slightly slower for clarity
    pitch: -2      // Slightly lower for authority
  }
}

// Friendly female - sales
{
  voiceId: "aura-stella-en",
  overrideTTSConfig: {
    speed: 1.05,   // Slightly faster for energy
    pitch: 2       // Slightly higher for warmth
  }
}

// Neutral - automated notifications
{
  voiceId: "aura-luna-en",
  overrideTTSConfig: {
    speed: 1.0,
    pitch: 0
  }
}

Configuration Best Practices

STT Best Practices

  1. Language matching - Use correct language code for your audience
  2. Enable punctuation - Improves LLM understanding
  3. Test in production conditions - Account for phone line quality
  4. Monitor accuracy - Track misrecognitions in transcripts
  5. Consider latency - Balance accuracy with response time

TTS Best Practices

  1. Natural speech rate - 0.95-1.05 speed for most use cases
  2. Consistent voice - Use same voice throughout conversation
  3. Test with real users - Verify voice quality and clarity
  4. Match brand personality - Voice should align with brand
  5. Phone optimization - Enable phone filter for call quality

LLM Best Practices

  1. Start conservative - Lower temperature (0.5-0.7) for consistency
  2. Limit response length - Shorter responses for voice (200-300 tokens)
  3. Monitor token usage - Optimize for cost and performance
  4. Test extensively - Validate behavior across scenarios
  5. Version control - Track configuration changes

Voice Best Practices

  1. Listen to samples - Test voices before deployment
  2. Consider context - Different voices for different agents
  3. Get feedback - Ask customers about voice quality
  4. A/B test - Compare voice performance
  5. Match expectations - Voice should fit agent personality

Performance Optimization

Latency Optimization

Reduce response time:
{
  // Use fastest configs
  sttConfigId: "nova-3-stt",      // Deepgram - very fast
  ttsConfigId: "aura-2-tts",       // Deepgram Aura - ultra-low latency
  llmConfigId: "gpt-4o-mini-llm",  // Fast and capable

  overrideSTTConfig: {
    interimResults: true    // Stream results
  },

  overrideLLMConfig: {
    maxTokens: 200,         // Shorter responses
    temperature: 0.5        // More predictable responses
  },

  overrideTTSConfig: {
    sampleRate: 16000       // Lower quality = faster
  }
}

Quality Optimization

Maximize quality:
{
  // Use best configs
  sttConfigId: "whisper-stt",      // Most accurate
  ttsConfigId: "elevenlabs-tts",   // Most natural
  llmConfigId: "gpt-4o-llm",       // Most capable

  overrideSTTConfig: {
    enablePunctuation: true,
    enableDiarization: true
  },

  overrideLLMConfig: {
    temperature: 0.7,
    maxTokens: 500
  },

  overrideTTSConfig: {
    sampleRate: 24000,        // Higher quality
    stability: 0.5            // Voice consistency (ElevenLabs)
  }
}

Cost Optimization

Reduce costs:
{
  // Use economical configs
  llmConfigId: "gpt-4o-mini-llm",  // Lower cost

  overrideLLMConfig: {
    maxTokens: 200,           // Shorter responses
    temperature: 0.5          // More consistent
  }
}

Common Use Cases

Customer Support

STT: Accurate, TTS: Calm & Clear, LLM: Helpful & Patient

Sales Outreach

STT: Fast, TTS: Energetic & Warm, LLM: Persuasive

Appointment Booking

STT: Reliable, TTS: Efficient & Professional, LLM: Task-focused

Surveys & Feedback

STT: Patient, TTS: Neutral, LLM: Question-focused
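
One way to keep these profiles consistent across agents is to capture each use case as a reusable preset built from configs shown earlier in this guide. The combinations below are illustrative, not platform defaults:

// Illustrative presets combining configs from earlier sections.
const USE_CASE_PRESETS = {
  customerSupport: {
    sttConfigId: "whisper-stt",       // accuracy first
    ttsConfigId: "aura-2-tts",        // clear, low-latency speech
    llmConfigId: "gpt-4o-llm",
    overrideLLMConfig: { temperature: 0.3, maxTokens: 300 }
  },
  salesOutreach: {
    sttConfigId: "nova-3-stt",        // fast recognition
    ttsConfigId: "elevenlabs-tts",    // warm, expressive delivery
    llmConfigId: "gpt-4o-llm",
    overrideLLMConfig: { temperature: 0.7, maxTokens: 300 }
  },
  appointmentBooking: {
    sttConfigId: "nova-3-stt",
    ttsConfigId: "aura-2-tts",
    llmConfigId: "gpt-4o-mini-llm",   // task-focused, high volume
    overrideLLMConfig: { temperature: 0.3, maxTokens: 200 }
  }
};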

Troubleshooting

Poor Speech Recognition

Solutions:
  • Try different STT provider
  • Enable punctuation and diarization
  • Check audio quality settings
  • Test with different accents/languages

Unnatural Speech

Solutions:
  • Adjust TTS speed (0.95-1.05)
  • Try different voice
  • Reduce pitch modifications
  • Test with different TTS provider

Slow Response Times

Solutions:
  • Use faster STT/TTS/LLM configs
  • Reduce maxTokens
  • Enable interim results
  • Lower audio sample rates

Inconsistent Behavior

Solutions:
  • Lower temperature (0.3-0.5)
  • Add stop sequences
  • Reduce topP
  • Use more specific prompts
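
Putting those adjustments together, an override along these lines (values are illustrative) tightens the model's behavior:

// Illustrative override combining the fixes above.
const consistencyOverride = {
  temperature: 0.4,        // lower temperature for steadier answers
  topP: 0.9,               // slightly tighter nucleus sampling
  stop: ["\n\n", "###"]    // stop sequences to bound responses
};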

Next Steps