What is the Testing Lab?

The Testing Lab is Feather’s automated testing framework for voice AI agents. It allows you to create test scenarios, run them against your agents, and validate behavior before production deployment. Think of it as your agent’s quality assurance system that:
  • Creates test scenarios - Define specific conversation flows to test
  • Simulates conversations - Run automated tests without real phone calls
  • Validates outcomes - Check if agents behave as expected
  • Generates test cases - AI-powered scenario generation
  • Tracks performance - Monitor test results over time
  • Prevents regressions - Catch issues before they reach customers

Why Test Your Agents?

Before Production Deployment

  • Verify agents handle common scenarios correctly
  • Test edge cases and error conditions
  • Validate tool integrations work as expected
  • Ensure conversation flow is natural
  • Check that prompt changes don’t break functionality

Continuous Validation

  • Regression testing after prompt updates
  • Validate new agent versions
  • Test different configurations
  • Compare agent performance
  • Quality assurance for agent updates

Test Scenarios

What is a Test Scenario?

A scenario defines a specific test case with:
{
  id: "scenario-123",
  name: "Product Inquiry - Enterprise Plan",
  personality: "interested-buyer",  // How the test "customer" behaves
  phoneNumber: "+14155551234",      // Test phone number
  instructions: "Ask about Enterprise plan pricing and features. Show high interest. Request a demo.",
  expectedOutcomePrompt: "Agent should provide pricing, explain key features, and schedule a demo using Calendly.",
  scenarioLanguage: "en-US"
}

Scenario Components

Name: Descriptive test case name
"Product Inquiry - Enterprise Plan"
"Support - Password Reset Request"
"Objection Handling - Price Too High"

Personality: How the simulated customer behaves
"interested-buyer"     // Engaged and ready to purchase
"skeptical-prospect"   // Hesitant, asks lots of questions
"angry-customer"       // Frustrated, needs de-escalation
"confused-user"        // Needs extra guidance
"price-sensitive"      // Focused on cost
"technical-buyer"      // Asks detailed technical questions

Instructions: What the test customer should do
"Call to inquire about return policy. Provide order number ORD-12345. Ask about timeline for refund."

Expected Outcome: What success looks like
"Agent should locate order, explain 30-day return policy, initiate return process, and provide return label."

Managing Test Scenarios

List All Scenarios

const response = await fetch('https://prod.featherhq.com/api/v1/testing-lab/scenarios', {
  headers: {
    'X-API-Key': API_KEY
  }
});

const scenarios = await response.json();

scenarios.forEach(scenario => {
  console.log(`${scenario.name} - ${scenario.personality}`);
});

Create a Scenario

const scenario = {
  name: "Product Demo Request",
  personality: "interested-buyer",
  phoneNumber: "+14155551234",
  instructions: `You are interested in the Enterprise plan. Ask about:
1. Pricing for 50 users
2. API integration capabilities
3. SSO support
Request a demo when offered.`,
  expectedOutcomePrompt: "Agent provides Enterprise pricing, confirms API and SSO features, schedules demo via Calendly, and ends with CALENDLY disposition.",
  scenarioLanguage: "en-US"
};

const response = await fetch('https://prod.featherhq.com/api/v1/testing-lab/scenarios', {
  method: 'POST',
  headers: {
    'X-API-Key': API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(scenario)
});

Generate Scenarios with AI

Let AI create test scenarios for you:
const generateRequest = {
  agentId: "agent-123",
  count: 10,
  focus: "customer-objections" // Optional: focus area
};

const response = await fetch('https://prod.featherhq.com/api/v1/testing-lab/scenarios/generate', {
  method: 'POST',
  headers: {
    'X-API-Key': API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(generateRequest)
});

const generatedScenarios = await response.json();

AI generation creates diverse scenarios based on:
  • Agent’s purpose and configuration
  • Common use cases for your industry
  • Edge cases and error handling
  • Tool usage validation
  • Conversation flow testing

Update a Scenario

const updates = {
  instructions: "Updated instructions for the test scenario",
  expectedOutcomePrompt: "Updated expected outcome"
};

const response = await fetch(
  `https://prod.featherhq.com/api/v1/testing-lab/scenarios/${scenarioId}`,
  {
    method: 'PATCH',
    headers: {
      'X-API-Key': API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(updates)
  }
);

Delete a Scenario

const response = await fetch(
  `https://prod.featherhq.com/api/v1/testing-lab/scenarios/${scenarioId}`,
  {
    method: 'DELETE',
    headers: {
      'X-API-Key': API_KEY
    }
  }
);

Running Tests

Execute Test Scenarios

Run scenarios against an agent:
const testRun = {
  agentId: "agent-123",
  scenarioIds: [
    "scenario-001",
    "scenario-002",
    "scenario-003"
  ]
};

const response = await fetch('https://prod.featherhq.com/api/v1/testing-lab/run', {
  method: 'POST',
  headers: {
    'X-API-Key': API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(testRun)
});

const results = await response.json();

Test Results

Results include:
{
  runId: "run-456",
  agentId: "agent-123",
  startedAt: "2024-01-20T10:00:00Z",
  completedAt: "2024-01-20T10:15:30Z",
  totalScenarios: 10,
  passed: 8,
  failed: 2,
  results: [
    {
      scenarioId: "scenario-001",
      scenarioName: "Product Demo Request",
      status: "passed",
      callId: "call-789",
      disposition: "CALENDLY",
      transcript: [...],
      evaluation: {
        matchesExpectedOutcome: true,
        score: 95,
        feedback: "Agent successfully provided pricing, explained features, and scheduled demo."
      }
    },
    {
      scenarioId: "scenario-002",
      scenarioName: "Price Objection",
      status: "failed",
      callId: "call-790",
      disposition: "ENDED",
      evaluation: {
        matchesExpectedOutcome: false,
        score: 60,
        feedback: "Agent did not offer alternative pricing options or discounts as expected."
      }
    }
  ]
}

Test Evaluation

Automatic Evaluation

Feather automatically evaluates test results.
Pass Criteria:
  • Conversation followed the expected flow
  • Correct disposition was set
  • Required tools were called
  • Key information was provided
  • Expected outcome was achieved
Evaluation Score (0-100):
  • 90-100: Excellent - Exceeded expectations
  • 75-89: Good - Met all requirements
  • 60-74: Acceptable - Minor issues
  • Below 60: Needs improvement
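
A minimal sketch of how these score bands might be summarized across a run, assuming the results shape shown in the Test Results example above (summarizeByScoreBand is illustrative, not part of the API):
// Bucket run results by evaluation score band
function summarizeByScoreBand(runResults) {
  const bands = { excellent: 0, good: 0, acceptable: 0, needsImprovement: 0 };

  runResults.results.forEach(r => {
    const score = r.evaluation.score;
    if (score >= 90) bands.excellent++;
    else if (score >= 75) bands.good++;
    else if (score >= 60) bands.acceptable++;
    else bands.needsImprovement++;
  });

  return bands;
}

console.log(summarizeByScoreBand(results));
// e.g. { excellent: 5, good: 3, acceptable: 1, needsImprovement: 1 }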

Manual Review

Review failed tests:
// Find a failed result in the parsed run response (results from the run above)
const result = results.results.find(r => r.status === 'failed');

console.log('Scenario:', result.scenarioName);
console.log('Expected:', result.scenario.expectedOutcomePrompt);
console.log('Score:', result.evaluation.score);
console.log('Feedback:', result.evaluation.feedback);

// Review transcript
result.transcript.forEach(turn => {
  console.log(`${turn.speaker}: ${turn.text}`);
});

// Listen to recording
console.log('Recording:', result.recordingUrl);

Test Scenario Patterns

Happy Path Testing

Test ideal customer journeys:
{
  name: "Perfect Demo Booking",
  personality: "interested-buyer",
  instructions: "You're interested in the product. Answer all questions positively. Book a demo when offered.",
  expectedOutcomePrompt: "Demo successfully booked with CALENDLY disposition"
}

Edge Case Testing

Test unusual situations:
{
  name: "Multiple Interruptions",
  personality: "distracted-customer",
  instructions: "Interrupt the agent frequently. Change topics mid-conversation. Ask unrelated questions.",
  expectedOutcomePrompt: "Agent stays patient, redirects conversation, and attempts to achieve goal"
}

Objection Handling

Test sales objection scenarios:
{
  name: "Price Too High",
  personality: "price-sensitive",
  instructions: "Express strong interest but object to the price. Say competitors offer lower pricing.",
  expectedOutcomePrompt: "Agent explains value proposition, offers to schedule call with sales team, or mentions current promotions"
}

Error Handling

Test system failures:
{
  name: "CRM System Down",
  personality: "interested-buyer",
  instructions: "Provide information for CRM update. Agent should handle gracefully if CRM tool fails.",
  expectedOutcomePrompt: "Agent acknowledges issue, collects information manually, and promises follow-up"
}

Tool Usage Validation

Test specific tool integrations:
{
  name: "Knowledge Base Query",
  personality: "confused-user",
  instructions: "Ask specific question that requires knowledge base search: 'What's your return policy for opened items?'",
  expectedOutcomePrompt: "Agent uses knowledge-search tool and provides accurate answer from documentation"
}

Best Practices

Scenario Design

  1. Be specific - Clear instructions and expected outcomes
  2. Test one thing - Each scenario should focus on specific behavior
  3. Use realistic data - Test with production-like information
  4. Cover edge cases - Don’t just test happy paths
  5. Update regularly - Evolve scenarios as agents improve

Test Coverage

Create scenarios for:
  • All major conversation flows
  • Each tool integration
  • Common customer objections
  • Error conditions
  • Edge cases and exceptions
  • Different customer personalities
  • Various outcomes (success, transfer, decline, etc.)

Continuous Testing

  1. Test before deployment - Run full suite before going live
  2. Regression testing - Re-run tests after changes
  3. Version comparison - Test new versions against old ones (see the sketch after this list)
  4. Monitor failures - Track which scenarios fail frequently
  5. Iterate on failures - Improve prompts based on test results
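
For version comparison, one approach is to run the same scenario set against two agent configurations and compare pass rates. The helpers below are an illustrative sketch built on the run endpoint shown earlier; they use only the request fields documented there:
// Illustrative wrapper: run a fixed scenario set against one agent and return parsed results
async function runAgainstAgent(agentId, scenarioIds) {
  const response = await fetch('https://prod.featherhq.com/api/v1/testing-lab/run', {
    method: 'POST',
    headers: {
      'X-API-Key': API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ agentId, scenarioIds })
  });
  return response.json();
}

// Compare pass rates between two agent configurations (e.g. old vs new setup)
async function compareConfigurations(agentA, agentB, scenarioIds) {
  const [resultsA, resultsB] = await Promise.all([
    runAgainstAgent(agentA, scenarioIds),
    runAgainstAgent(agentB, scenarioIds)
  ]);

  const passRate = r => r.passed / r.totalScenarios;
  console.log(`${agentA}: ${(passRate(resultsA) * 100).toFixed(1)}% passed`);
  console.log(`${agentB}: ${(passRate(resultsB) * 100).toFixed(1)}% passed`);
}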

Test Organization

// Organize scenarios by category
const scenarioCategories = {
  "sales": [
    "demo-booking",
    "price-objection",
    "feature-inquiry"
  ],
  "support": [
    "password-reset",
    "refund-request",
    "technical-issue"
  ],
  "edge-cases": [
    "multiple-interruptions",
    "system-failure",
    "angry-customer"
  ]
};
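
A possible follow-up is to run one category at a time. The helper below is illustrative and assumes your scenario names slugify to the entries above; neither the naming convention nor runCategory is a Testing Lab feature:
// Run every scenario whose slugified name appears in a category
async function runCategory(agentId, category) {
  const scenarios = await fetch('https://prod.featherhq.com/api/v1/testing-lab/scenarios', {
    headers: { 'X-API-Key': API_KEY }
  }).then(r => r.json());

  const toSlug = name => name.toLowerCase().replace(/[^a-z0-9]+/g, '-');
  const slugs = scenarioCategories[category];
  const scenarioIds = scenarios
    .filter(s => slugs.includes(toSlug(s.name)))
    .map(s => s.id);

  return fetch('https://prod.featherhq.com/api/v1/testing-lab/run', {
    method: 'POST',
    headers: {
      'X-API-Key': API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ agentId, scenarioIds })
  }).then(r => r.json());
}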

Integration with CI/CD

Automated Testing Pipeline

// In your CI/CD pipeline
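// Note: getAllScenarios, runTestScenarios, and deployAgentVersion are placeholder
// helpers in this sketch; wire them to the Testing Lab and deployment endpoints
// used elsewhere in these docs.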
async function testAgentBeforeDeployment(agentId: string, versionId: string) {
  // Get all test scenarios
  const scenarios = await getAllScenarios();

  // Run tests
  const results = await runTestScenarios({
    agentId,
    versionId,
    scenarioIds: scenarios.map(s => s.id)
  });

  // Check pass rate
  const passRate = results.passed / results.totalScenarios;

  if (passRate < 0.9) { // Require a 90% pass rate
    throw new Error(`Tests failed. Pass rate: ${(passRate * 100).toFixed(1)}%`);
  }

  console.log(`✅ All tests passed. Deploying agent version ${versionId}`);

  // Deploy the version
  await deployAgentVersion(agentId, versionId);
}

Git Hooks

#!/bin/bash
# pre-commit hook

# Run tests before allowing commit
echo "Running agent tests..."
node scripts/run-tests.js

if [ $? -ne 0 ]; then
  echo "❌ Tests failed. Fix issues before committing."
  exit 1
fi

echo "✅ Tests passed"

Common Use Cases

  • Pre-Deployment Testing - Validate agents before production release
  • Regression Testing - Ensure updates don’t break existing functionality
  • A/B Testing - Compare different agent configurations
  • Quality Assurance - Maintain consistent agent performance
  • Tool Validation - Verify custom tool integrations work correctly
  • Edge Case Coverage - Test unusual scenarios and error conditions

Troubleshooting

Tests Failing Unexpectedly

Check:
  • Agent version is deployed
  • Tools and integrations are working
  • Test scenario instructions are clear
  • Expected outcomes are realistic
  • Phone numbers in scenarios are valid

Inconsistent Results

Causes:
  • Non-deterministic LLM behavior
  • External API variability
  • Timing-dependent scenarios
Solutions:
  • Run tests multiple times
  • Make expected outcomes more flexible
  • Use appropriate LLM temperature settings
  • Add retry logic for flaky tests (see the sketch below)
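
For example, a simple retry wrapper around a single-scenario run, sketched against the run endpoint shown earlier (runWithRetry and maxAttempts are illustrative):
// Retry a flaky scenario a few times before treating it as a real failure
async function runWithRetry(agentId, scenarioId, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const results = await fetch('https://prod.featherhq.com/api/v1/testing-lab/run', {
      method: 'POST',
      headers: {
        'X-API-Key': API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ agentId, scenarioIds: [scenarioId] })
    }).then(r => r.json());

    if (results.passed === results.totalScenarios) return results;
    console.warn(`Scenario ${scenarioId} failed attempt ${attempt}/${maxAttempts}`);
  }
  throw new Error(`Scenario ${scenarioId} still failing after ${maxAttempts} attempts`);
}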

Slow Test Execution

Optimize:
  • Run scenarios in parallel (see the sketch below)
  • Use shorter test conversations
  • Reduce number of scenarios in each run
  • Cache scenario results
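
One way to parallelize is to issue several smaller runs concurrently. This is an illustrative sketch; the batch size is a placeholder, and you should confirm your account's concurrency limits before fanning out real test calls:
// Split scenario IDs into batches and run the batches concurrently
async function runInBatches(agentId, scenarioIds, batchSize = 5) {
  const batches = [];
  for (let i = 0; i < scenarioIds.length; i += batchSize) {
    batches.push(scenarioIds.slice(i, i + batchSize));
  }

  const runs = await Promise.all(
    batches.map(batch =>
      fetch('https://prod.featherhq.com/api/v1/testing-lab/run', {
        method: 'POST',
        headers: {
          'X-API-Key': API_KEY,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ agentId, scenarioIds: batch })
      }).then(r => r.json())
    )
  );

  // Aggregate pass/fail counts across the batch runs
  return runs.reduce(
    (totals, r) => ({ passed: totals.passed + r.passed, failed: totals.failed + r.failed }),
    { passed: 0, failed: 0 }
  );
}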

Next Steps