Configuration

Configure your agent's session with resources, memory, language, and provider settings

When you create a session via WebRTC or WebSockets, you send a request body that configures the agent for that session. This page documents every field in that configuration.

Agent configuration

These top-level fields define which companion, resources, and metadata to use for the session.

  • companionId (string, required): The ID of the companion to use for this session.
  • functions (string[], optional): Tool IDs to attach to the session. The companion can invoke these during the conversation.
  • faqCollections (string[], optional): FAQ collection IDs. The companion prioritizes matching Q&A pairs from these collections when responding.
  • knowledgeBaseId (string, optional): Knowledge collection ID. The companion retrieves relevant documents from this collection during the conversation.
  • language (string, optional): The language for the session (e.g. English, Spanish, French).
  • externalClientId (string, optional): Your own user identifier. Enables cross-session memory for the companion.
  • externalClientProfile (object, optional): Arbitrary key-value object with metadata about the end user. The companion can reference this context during the conversation.
  • tags (object, optional): Key-value string pairs for labeling the session. Use tags to categorize sessions by source, campaign, environment, or any dimension relevant to your application.
  • disableIdleTimeout (boolean, optional): When true, the session stays open indefinitely even if no audio or messages are exchanged. By default, idle sessions are automatically closed after a timeout.
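
Only companionId is required; everything else is optional. A minimal configuration that also sets a few optional fields (IDs and values here are illustrative) looks like:

```json
{
  "companionId": "comp_abc123",
  "language": "English",
  "externalClientId": "user_12345",
  "tags": { "env": "staging" }
}
```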

Attaching resources

The functions, faqCollections, and knowledgeBaseId fields let you wire resources to a session at creation time. These resources must already exist in your project — create them via the Tools, FAQs, and Knowledge endpoints first.

{
  "companionId": "comp_abc123",
  "functions": ["fn_weather", "fn_booking"],
  "faqCollections": ["faqc_returns", "faqc_shipping"],
  "knowledgeBaseId": "col_product_docs"
}

You can attach multiple tools and FAQ collections to a single session. Only one knowledge collection is supported per session.

Language

The language field sets the language for the session. Use plain language names — English, Spanish, French, German, etc. — not ISO codes.

{
  "companionId": "comp_abc123",
  "language": "Spanish"
}

Memory and user identity

When you pass an externalClientId, the Companion API enables memory for that session. The companion remembers what was said and can pick up the conversation in future sessions with the same user.

{
  "externalClientId": "user_12345"
}

Memory is scoped to a companion + external client ID pair. This means:

  • If the same user (user_12345) talks to a different companion, memory starts fresh — conversations are not shared across companions.
  • If you change the externalClientId for the same user, the companion treats them as a new person with no history.
  • If the same externalClientId connects to the same companion again, the companion recalls previous conversations and can reference them.

Use a stable, unique identifier from your system — such as a user ID or account ID — so memory persists reliably across sessions.
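
Any stable string works; the important part is deriving it deterministically from your own user record. A minimal sketch — the `user_` prefix and the account-ID source are illustrative conventions, not required by the API:

```python
def external_client_id(account_id: str) -> str:
    """Derive a stable session identity from an internal account ID.

    The "user_" prefix is purely illustrative -- the API accepts any
    stable string. What matters is that the same person always maps
    to the same value, so memory carries across sessions.
    """
    return f"user_{account_id}"

# The same account always yields the same ID, so the companion
# can recall earlier conversations with this user.
print(external_client_id("12345"))  # -> user_12345
```

Avoid ephemeral values such as session tokens or device IDs here, since a new value makes the companion treat the user as a stranger.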

User context

The externalClientProfile field lets you pass structured information about the end user into the session. The companion can reference this data during the conversation to personalize its responses.

{
  "externalClientId": "user_12345",
  "externalClientProfile": {
    "name": "Jane",
    "plan": "premium",
    "company": "Acme Corp"
  }
}

The object is unstructured — include any fields that would help the companion tailor its behavior. For example, passing the user's name lets the companion greet them personally, and passing their subscription tier lets it adjust recommendations accordingly.

Tagging sessions

Tags are string-to-string key-value pairs attached to the session for your own tracking:

{
  "tags": {
    "source": "ios-app",
    "experiment": "v2-onboarding",
    "env": "production"
  }
}

Tag keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.

Tags are stored on the session and returned when you retrieve session details. Use them for filtering, analytics, or debugging.
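
Because the API enforces the 64-character key and 512-character value limits, it can be useful to validate tags client-side before creating the session. A minimal sketch — the function name and error handling are our own, not part of the API:

```python
def validate_tags(tags: dict) -> dict:
    """Check session tags against the documented limits.

    Keys: strings of at most 64 characters.
    Values: strings of at most 512 characters.
    Returns the tags unchanged if they pass, raises ValueError otherwise.
    """
    for key, value in tags.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError(f"tag {key!r}: keys and values must be strings")
        if len(key) > 64:
            raise ValueError(f"tag key {key!r} exceeds 64 characters")
        if len(value) > 512:
            raise ValueError(f"tag value for key {key!r} exceeds 512 characters")
    return tags

validate_tags({"source": "ios-app", "env": "production"})  # passes
```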

Provider configuration

The providerConfig object controls the voice, video, and model behavior for the session.

The provider settings below are specific to OpenAI Realtime. Support for additional providers may be added in the future.

  • providerConfig.voiceId (string, optional): The OpenAI voice to use for the companion's speech output (e.g. alloy, echo, shimmer).
  • providerConfig.useGreenVideo (boolean, optional): When true, the companion's video stream uses a green-screen background that you can composite in your application. Only applies to WebRTC connections with video enabled.
  • providerConfig.settings (object, optional): Model and audio processing settings for the session.

Settings

The providerConfig.settings object lets you fine-tune the model and audio pipeline:

  • settings.instructions (string): Overrides the companion's system prompt for this session only. Use this to replace the default prompt with session-specific instructions without modifying the companion itself.
  • settings.temperature (float): Controls randomness in the companion's responses. Lower values (e.g. 0.3) produce more focused output; higher values (e.g. 0.9) increase variety.
  • settings.turnDetection (object): Voice Activity Detection (VAD) configuration. Controls how the system detects when the user starts and stops speaking. Pass an empty object {} for defaults.
  • settings.noiseReduction (object): Noise reduction configuration. Reduces background noise from the user's microphone input.
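
These settings combine freely. For example, a session that overrides the companion's prompt and lowers the temperature (the prompt text and value here are illustrative) might send:

```json
{
  "providerConfig": {
    "settings": {
      "instructions": "You are a concise support agent. Keep answers under two sentences.",
      "temperature": 0.4
    }
  }
}
```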

Turn detection

The turnDetection object configures Voice Activity Detection (VAD) — how the system decides when the user has started and stopped speaking. This directly affects interruption behavior and response timing.

  • threshold (float): Activation threshold for detecting speech. Higher values (e.g. 0.8) require louder, clearer speech to trigger; lower values (e.g. 0.3) are more sensitive.
  • prefix_padding_ms (integer): Milliseconds of audio to include before detected speech begins. Prevents clipping the start of an utterance.
  • silence_duration_ms (integer): Milliseconds of silence required before the system considers the user done speaking. Lower values (e.g. 200) make the agent respond faster; higher values (e.g. 800) wait longer for the user to continue.

{
  "providerConfig": {
    "settings": {
      "turnDetection": {
        "threshold": 0.9,
        "prefix_padding_ms": 400,
        "silence_duration_ms": 500
      }
    }
  }
}

Pass an empty object {} to use the provider's default VAD settings.

Noise reduction

The noiseReduction object accepts a single type field:

  • nearField: Optimized for close-range microphones — laptops, headsets, phones held to the ear.
  • farField: Optimized for distant microphones — speakerphones, smart speakers, conference room setups.

{
  "providerConfig": {
    "settings": {
      "noiseReduction": {
        "type": "nearField"
      }
    }
  }
}

If you omit noiseReduction, no noise processing is applied. Choose the mode that matches your user's typical microphone distance for best results.
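
If your application runs on several device types, you can pick the mode from what you know about the client. A minimal sketch — the device categories and the mapping are our own convention, not part of the API:

```python
# Hypothetical mapping from our own device categories to the two
# documented noiseReduction modes: "nearField" for close-range
# microphones, "farField" for distant ones.
NOISE_MODE_BY_DEVICE = {
    "headset": "nearField",
    "laptop": "nearField",
    "phone": "nearField",
    "speakerphone": "farField",
    "smart_speaker": "farField",
    "conference_room": "farField",
}

def noise_reduction_config(device: str) -> dict:
    """Build the noiseReduction fragment for a known device type.

    Returns {} for unknown devices, which leaves noiseReduction unset --
    per the docs, no noise processing is then applied.
    """
    mode = NOISE_MODE_BY_DEVICE.get(device)
    if mode is None:
        return {}
    return {"noiseReduction": {"type": mode}}

print(noise_reduction_config("headset"))
# -> {'noiseReduction': {'type': 'nearField'}}
```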
