What is the Testing Lab?

The Testing Lab is Feather’s automated testing framework for voice AI agents. It allows you to create test scenarios, run them against your agents, and validate behavior before production deployment. Think of it as your agent’s quality assurance system that:
  • Creates test scenarios - Define specific conversation flows to test
  • Simulates conversations - Run automated tests without real phone calls
  • Validates outcomes - Check if agents behave as expected
  • Generates test cases - AI-powered scenario generation
  • Tracks performance - Monitor test results over time
  • Prevents regressions - Catch issues before they reach customers

Why Test Your Agents?

Before Production Deployment

  • Verify agents handle common scenarios correctly
  • Test edge cases and error conditions
  • Validate tool integrations work as expected
  • Ensure conversation flow is natural
  • Check that prompt changes don’t break functionality

Continuous Validation

  • Regression testing after prompt updates
  • Validate new agent versions
  • Test different configurations
  • Compare agent performance
  • Quality assurance for agent updates

Test Scenarios

What is a Test Scenario?

A scenario defines a specific test case with:
{
  id: "scenario-123",
  name: "Product Inquiry - Enterprise Plan",
  personality: "interested-buyer",  // How the test "customer" behaves
  phoneNumber: "+14155551234",      // Test phone number
  instructions: "Ask about Enterprise plan pricing and features. Show high interest. Request a demo.",
  expectedOutcomePrompt: "Agent should provide pricing, explain key features, and schedule a demo using Calendly.",
  scenarioLanguage: "en-US"
}

Scenario Components

Name: Descriptive test case name
"Product Inquiry - Enterprise Plan"
"Support - Password Reset Request"
"Objection Handling - Price Too High"

Personality: How the simulated customer behaves
"interested-buyer"     // Engaged and ready to purchase
"skeptical-prospect"   // Hesitant, asks lots of questions
"angry-customer"       // Frustrated, needs de-escalation
"confused-user"        // Needs extra guidance
"price-sensitive"      // Focused on cost
"technical-buyer"      // Asks detailed technical questions

Instructions: What the test customer should do
"Call to inquire about return policy. Provide order number ORD-12345. Ask about timeline for refund."

Expected Outcome: What success looks like
"Agent should locate order, explain 30-day return policy, initiate return process, and provide return label."

Managing Test Scenarios

List All Scenarios

const response = await fetch('https://prod.featherhq.com/api/v1/testing-lab/scenarios', {
  headers: {
    'X-API-Key': API_KEY
  }
});

const scenarios = await response.json();

scenarios.forEach(scenario => {
  console.log(`${scenario.name} - ${scenario.personality}`);
});

Create a Scenario

const scenario = {
  name: "Product Demo Request",
  personality: "interested-buyer",
  phoneNumber: "+14155551234",
  instructions: `You are interested in the Enterprise plan. Ask about:
1. Pricing for 50 users
2. API integration capabilities
3. SSO support
Request a demo when offered.`,
  expectedOutcomePrompt: "Agent provides Enterprise pricing, confirms API and SSO features, schedules demo via Calendly, and ends with CALENDLY disposition.",
  scenarioLanguage: "en-US"
};

const response = await fetch('https://prod.featherhq.com/api/v1/testing-lab/scenarios', {
  method: 'POST',
  headers: {
    'X-API-Key': API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(scenario)
});

Generate Scenarios with AI

Let AI create test scenarios for you:
const generateRequest = {
  agentId: "agent-123",
  count: 10,
  focus: "customer-objections" // Optional: focus area
};

const response = await fetch('https://prod.featherhq.com/api/v1/testing-lab/scenarios/generate', {
  method: 'POST',
  headers: {
    'X-API-Key': API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(generateRequest)
});

const generatedScenarios = await response.json();

AI generation creates diverse scenarios based on:
  • Agent’s purpose and configuration
  • Common use cases for your industry
  • Edge cases and error handling
  • Tool usage validation
  • Conversation flow testing

Update a Scenario

const updates = {
  instructions: "Updated instructions for the test scenario",
  expectedOutcomePrompt: "Updated expected outcome"
};

const response = await fetch(
  `https://prod.featherhq.com/api/v1/testing-lab/scenarios/${scenarioId}`,
  {
    method: 'PATCH',
    headers: {
      'X-API-Key': API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(updates)
  }
);

Delete a Scenario

const response = await fetch(
  `https://prod.featherhq.com/api/v1/testing-lab/scenarios/${scenarioId}`,
  {
    method: 'DELETE',
    headers: {
      'X-API-Key': API_KEY
    }
  }
);

Running Tests

Execute Test Scenarios

Run scenarios against an agent:
const testRun = {
  agentId: "agent-123",
  scenarioIds: [
    "scenario-001",
    "scenario-002",
    "scenario-003"
  ]
};

const response = await fetch('https://prod.featherhq.com/api/v1/testing-lab/run', {
  method: 'POST',
  headers: {
    'X-API-Key': API_KEY,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(testRun)
});

const results = await response.json();

Test Results

Results include:
{
  runId: "run-456",
  agentId: "agent-123",
  startedAt: "2024-01-20T10:00:00Z",
  completedAt: "2024-01-20T10:15:30Z",
  totalScenarios: 10,
  passed: 8,
  failed: 2,
  results: [
    {
      scenarioId: "scenario-001",
      scenarioName: "Product Demo Request",
      status: "passed",
      callId: "call-789",
      disposition: "CALENDLY",
      transcript: [...],
      evaluation: {
        matchesExpectedOutcome: true,
        score: 95,
        feedback: "Agent successfully provided pricing, explained features, and scheduled demo."
      }
    },
    {
      scenarioId: "scenario-002",
      scenarioName: "Price Objection",
      status: "failed",
      callId: "call-790",
      disposition: "ENDED",
      evaluation: {
        matchesExpectedOutcome: false,
        score: 60,
        feedback: "Agent did not offer alternative pricing options or discounts as expected."
      }
    }
  ]
}

Test Evaluation

Automatic Evaluation

Feather automatically evaluates test results.
Pass Criteria:
  • Conversation followed the expected flow
  • Correct disposition was set
  • Required tools were called
  • Key information was provided
  • Expected outcome was achieved
Evaluation Score (0-100):
  • 90-100: Excellent - Exceeded expectations
  • 75-89: Good - Met all requirements
  • 60-74: Acceptable - Minor issues
  • Below 60: Needs improvement
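
A minimal sketch of how these score bands might be summarized across a run, assuming the results shape shown in the Test Results example above (summarizeByScoreBand is illustrative, not part of the API):
// Bucket run results by evaluation score band
function summarizeByScoreBand(runResults) {
  const bands = { excellent: 0, good: 0, acceptable: 0, needsImprovement: 0 };

  runResults.results.forEach(r => {
    const score = r.evaluation.score;
    if (score >= 90) bands.excellent++;
    else if (score >= 75) bands.good++;
    else if (score >= 60) bands.acceptable++;
    else bands.needsImprovement++;
  });

  return bands;
}

console.log(summarizeByScoreBand(results));
// e.g. { excellent: 5, good: 3, acceptable: 1, needsImprovement: 1 }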

Manual Review

Review failed tests:
// Find a failed result in the parsed run response (results from the run above)
const result = results.results.find(r => r.status === 'failed');

console.log('Scenario:', result.scenarioName);
console.log('Expected:', result.scenario.expectedOutcomePrompt);
console.log('Score:', result.evaluation.score);
console.log('Feedback:', result.evaluation.feedback);

// Review transcript
result.transcript.forEach(turn => {
  console.log(`${turn.speaker}: ${turn.text}`);
});

// Listen to recording
console.log('Recording:', result.recordingUrl);

Test Scenario Patterns

Happy Path Testing

Test ideal customer journeys:
{
  name: "Perfect Demo Booking",
  personality: "interested-buyer",
  instructions: "You're interested in the product. Answer all questions positively. Book a demo when offered.",
  expectedOutcomePrompt: "Demo successfully booked with CALENDLY disposition"
}

Edge Case Testing

Test unusual situations:
{
  name: "Multiple Interruptions",
  personality: "distracted-customer",
  instructions: "Interrupt the agent frequently. Change topics mid-conversation. Ask unrelated questions.",
  expectedOutcomePrompt: "Agent stays patient, redirects conversation, and attempts to achieve goal"
}

Objection Handling

Test sales objection scenarios:
{
  name: "Price Too High",
  personality: "price-sensitive",
  instructions: "Express strong interest but object to the price. Say competitors offer lower pricing.",
  expectedOutcomePrompt: "Agent explains value proposition, offers to schedule call with sales team, or mentions current promotions"
}

Error Handling

Test system failures:
{
  name: "CRM System Down",
  personality: "interested-buyer",
  instructions: "Provide information for CRM update. Agent should handle gracefully if CRM tool fails.",
  expectedOutcomePrompt: "Agent acknowledges issue, collects information manually, and promises follow-up"
}

Tool Usage Validation

Test specific tool integrations:
{
  name: "Knowledge Base Query",
  personality: "confused-user",
  instructions: "Ask specific question that requires knowledge base search: 'What's your return policy for opened items?'",
  expectedOutcomePrompt: "Agent uses knowledge-search tool and provides accurate answer from documentation"
}

Best Practices

Scenario Design

  1. Be specific - Clear instructions and expected outcomes
  2. Test one thing - Each scenario should focus on specific behavior
  3. Use realistic data - Test with production-like information
  4. Cover edge cases - Don’t just test happy paths
  5. Update regularly - Evolve scenarios as agents improve

Test Coverage

Create scenarios for:
  • All major conversation flows
  • Each tool integration
  • Common customer objections
  • Error conditions
  • Edge cases and exceptions
  • Different customer personalities
  • Various outcomes (success, transfer, decline, etc.)

Continuous Testing

  1. Test before deployment - Run full suite before going live
  2. Regression testing - Re-run tests after changes
  3. Version comparison - Test new versions against old ones (see the sketch after this list)
  4. Monitor failures - Track which scenarios fail frequently
  5. Iterate on failures - Improve prompts based on test results
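
For version comparison, one approach is to run the same scenario set against two agent configurations and compare pass rates. The helpers below are an illustrative sketch built on the run endpoint shown earlier; they use only the request fields documented there:
// Illustrative wrapper: run a fixed scenario set against one agent and return parsed results
async function runAgainstAgent(agentId, scenarioIds) {
  const response = await fetch('https://prod.featherhq.com/api/v1/testing-lab/run', {
    method: 'POST',
    headers: {
      'X-API-Key': API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ agentId, scenarioIds })
  });
  return response.json();
}

// Compare pass rates between two agent configurations (e.g. old vs new setup)
async function compareConfigurations(agentA, agentB, scenarioIds) {
  const [resultsA, resultsB] = await Promise.all([
    runAgainstAgent(agentA, scenarioIds),
    runAgainstAgent(agentB, scenarioIds)
  ]);

  const passRate = r => r.passed / r.totalScenarios;
  console.log(`${agentA}: ${(passRate(resultsA) * 100).toFixed(1)}% passed`);
  console.log(`${agentB}: ${(passRate(resultsB) * 100).toFixed(1)}% passed`);
}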

Test Organization

// Organize scenarios by category
const scenarioCategories = {
  "sales": [
    "demo-booking",
    "price-objection",
    "feature-inquiry"
  ],
  "support": [
    "password-reset",
    "refund-request",
    "technical-issue"
  ],
  "edge-cases": [
    "multiple-interruptions",
    "system-failure",
    "angry-customer"
  ]
};
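
A possible follow-up is to run one category at a time. The helper below is illustrative and assumes your scenario names slugify to the entries above; neither the naming convention nor runCategory is a Testing Lab feature:
// Run every scenario whose slugified name appears in a category
async function runCategory(agentId, category) {
  const scenarios = await fetch('https://prod.featherhq.com/api/v1/testing-lab/scenarios', {
    headers: { 'X-API-Key': API_KEY }
  }).then(r => r.json());

  const toSlug = name => name.toLowerCase().replace(/[^a-z0-9]+/g, '-');
  const slugs = scenarioCategories[category];
  const scenarioIds = scenarios
    .filter(s => slugs.includes(toSlug(s.name)))
    .map(s => s.id);

  return fetch('https://prod.featherhq.com/api/v1/testing-lab/run', {
    method: 'POST',
    headers: {
      'X-API-Key': API_KEY,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ agentId, scenarioIds })
  }).then(r => r.json());
}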

Integration with CI/CD

Automated Testing Pipeline

// In your CI/CD pipeline
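// Note: getAllScenarios, runTestScenarios, and deployAgentVersion are placeholder
// helpers in this sketch; wire them to the Testing Lab and deployment endpoints
// used elsewhere in these docs.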
async function testAgentBeforeDeployment(agentId: string, versionId: string) {
  // Get all test scenarios
  const scenarios = await getAllScenarios();

  // Run tests
  const results = await runTestScenarios({
    agentId,
    versionId,
    scenarioIds: scenarios.map(s => s.id)
  });

  // Check pass rate
  const passRate = results.passed / results.totalScenarios;

  if (passRate < 0.9) { // Require a 90% pass rate
    throw new Error(`Tests failed. Pass rate: ${(passRate * 100).toFixed(1)}%`);
  }

  console.log(`✅ All tests passed. Deploying agent version ${versionId}`);

  // Deploy the version
  await deployAgentVersion(agentId, versionId);
}

Git Hooks

#!/bin/bash
# pre-commit hook

# Run tests before allowing commit
echo "Running agent tests..."
node scripts/run-tests.js

if [ $? -ne 0 ]; then
  echo "❌ Tests failed. Fix issues before committing."
  exit 1
fi

echo "✅ Tests passed"

Common Use Cases

  • Pre-Deployment Testing - Validate agents before production release
  • Regression Testing - Ensure updates don’t break existing functionality
  • A/B Testing - Compare different agent configurations
  • Quality Assurance - Maintain consistent agent performance
  • Tool Validation - Verify custom tool integrations work correctly
  • Edge Case Coverage - Test unusual scenarios and error conditions

Troubleshooting

Tests Failing Unexpectedly

Check:
  • Agent version is deployed
  • Tools and integrations are working
  • Test scenario instructions are clear
  • Expected outcomes are realistic
  • Phone numbers in scenarios are valid

Inconsistent Results

Causes:
  • Non-deterministic LLM behavior
  • External API variability
  • Timing-dependent scenarios
Solutions:
  • Run tests multiple times
  • Make expected outcomes more flexible
  • Use appropriate LLM temperature settings
  • Add retry logic for flaky tests (see the sketch below)
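
For example, a simple retry wrapper around a single-scenario run, sketched against the run endpoint shown earlier (runWithRetry and maxAttempts are illustrative):
// Retry a flaky scenario a few times before treating it as a real failure
async function runWithRetry(agentId, scenarioId, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const results = await fetch('https://prod.featherhq.com/api/v1/testing-lab/run', {
      method: 'POST',
      headers: {
        'X-API-Key': API_KEY,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ agentId, scenarioIds: [scenarioId] })
    }).then(r => r.json());

    if (results.passed === results.totalScenarios) return results;
    console.warn(`Scenario ${scenarioId} failed attempt ${attempt}/${maxAttempts}`);
  }
  throw new Error(`Scenario ${scenarioId} still failing after ${maxAttempts} attempts`);
}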

Slow Test Execution

Optimize:
  • Run scenarios in parallel (see the sketch below)
  • Use shorter test conversations
  • Reduce number of scenarios in each run
  • Cache scenario results
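
One way to parallelize is to issue several smaller runs concurrently. This is an illustrative sketch; the batch size is a placeholder, and you should confirm your account's concurrency limits before fanning out real test calls:
// Split scenario IDs into batches and run the batches concurrently
async function runInBatches(agentId, scenarioIds, batchSize = 5) {
  const batches = [];
  for (let i = 0; i < scenarioIds.length; i += batchSize) {
    batches.push(scenarioIds.slice(i, i + batchSize));
  }

  const runs = await Promise.all(
    batches.map(batch =>
      fetch('https://prod.featherhq.com/api/v1/testing-lab/run', {
        method: 'POST',
        headers: {
          'X-API-Key': API_KEY,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ agentId, scenarioIds: batch })
      }).then(r => r.json())
    )
  );

  // Aggregate pass/fail counts across the batch runs
  return runs.reduce(
    (totals, r) => ({ passed: totals.passed + r.passed, failed: totals.failed + r.failed }),
    { passed: 0, failed: 0 }
  );
}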

Next Steps