# Configuration
Configure your agent's session with resources, memory, language, and provider settings
When you create a session via WebRTC or WebSockets, you send a request body that configures the agent for that session. This page documents every field in that configuration.
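As a concrete starting point, here is a minimal sketch of building such a request body in TypeScript. The endpoint path, base URL, and bearer-auth header in the commented-out request are assumptions for illustration; check your project's actual API reference for the real values.

```typescript
// Shape of the session configuration body described on this page.
// Only companionId is required; everything else is optional.
interface SessionConfig {
  companionId: string;
  language?: string;
  externalClientId?: string;
  tags?: Record<string, string>;
}

const config: SessionConfig = {
  companionId: "comp_abc123",
  language: "English",
  externalClientId: "user_12345",
  tags: { env: "production" },
};

// Sending the request (URL and auth scheme are assumptions):
// await fetch("https://api.example.com/v1/sessions", {
//   method: "POST",
//   headers: {
//     "Content-Type": "application/json",
//     Authorization: `Bearer ${apiKey}`,
//   },
//   body: JSON.stringify(config),
// });
```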
- **Attaching resources**: Wire tools, FAQ collections, and knowledge to a session
- **Language**: Set the session language for speech and responses
- **Memory and user identity**: Enable cross-session memory with external client IDs
- **User context**: Pass end-user metadata the companion can reference
- **Tagging sessions**: Label sessions with key-value pairs for filtering and analytics
- **Provider configuration**: Voice selection, model settings, and noise reduction
## Agent configuration
These top-level fields define which companion, resources, and metadata to use for the session.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `companionId` | string | Yes | The ID of the companion to use for this session. |
| `functions` | string[] | No | Tool IDs to attach to the session. The companion can invoke these during the conversation. |
| `faqCollections` | string[] | No | FAQ collection IDs. The companion prioritizes matching Q&A pairs from these collections when responding. |
| `knowledgeBaseId` | string | No | Knowledge collection ID. The companion retrieves relevant documents from this collection during the conversation. |
| `language` | string | No | The language for the session (e.g. English, Spanish, French). |
| `externalClientId` | string | No | Your own user identifier. Enables cross-session memory for the companion. |
| `externalClientProfile` | object | No | Arbitrary key-value object with metadata about the end user. The companion can reference this context during the conversation. |
| `tags` | object | No | Key-value string pairs for labeling the session. Use tags to categorize sessions by source, campaign, environment, or any dimension relevant to your application. |
| `disableIdleTimeout` | boolean | No | When `true`, the session stays open indefinitely even if no audio or messages are exchanged. By default, idle sessions are automatically closed after a timeout. |
## Attaching resources
The `functions`, `faqCollections`, and `knowledgeBaseId` fields let you wire resources to a session at creation time. These resources must already exist in your project — create them via the Tools, FAQs, and Knowledge endpoints first.
```json
{
  "companionId": "comp_abc123",
  "functions": ["fn_weather", "fn_booking"],
  "faqCollections": ["faqc_returns", "faqc_shipping"],
  "knowledgeBaseId": "col_product_docs"
}
```

You can attach multiple tools and FAQ collections to a single session. Only one knowledge collection is supported per session.
## Language

The `language` field sets the language for the session. Use plain language names — English, Spanish, French, German, etc. — not ISO codes.
```json
{
  "companionId": "comp_abc123",
  "language": "Spanish"
}
```

## Memory and user identity
When you pass an `externalClientId`, the Companion API enables memory for that session. The companion remembers what was said and can pick up the conversation in future sessions with the same user.
```json
{
  "externalClientId": "user_12345"
}
```

Memory is scoped to a companion + external client ID pair. This means:

- If the same user (`user_12345`) talks to a different companion, memory starts fresh — conversations are not shared across companions.
- If you change the `externalClientId` for the same user, the companion treats them as a new person with no history.
- If the same `externalClientId` connects to the same companion again, the companion recalls previous conversations and can reference them.
Use a stable, unique identifier from your system — such as a user ID or account ID — so memory persists reliably across sessions.
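To make the stability point concrete, here is a hedged sketch of deriving the `externalClientId` from your own user record. The `AppUser` type, its field names, and the `user_` prefix are assumptions for illustration, not part of the API.

```typescript
// Your application's user record (hypothetical shape).
interface AppUser {
  accountId: string; // stable primary key in your system
  email: string;     // avoid as an ID: it can change, breaking memory continuity
}

// Map a user to a stable externalClientId. Using the immutable
// account ID means the same user always hits the same memory scope,
// even if mutable attributes like email change later.
function externalClientIdFor(user: AppUser): string {
  return `user_${user.accountId}`;
}
```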
## User context

The `externalClientProfile` field lets you pass structured information about the end user into the session. The companion can reference this data during the conversation to personalize its responses.
```json
{
  "externalClientId": "user_12345",
  "externalClientProfile": {
    "name": "Jane",
    "plan": "premium",
    "company": "Acme Corp"
  }
}
```

The object is unstructured — include any fields that would help the companion tailor its behavior. For example, passing the user's name lets the companion greet them personally, and passing their subscription tier lets it adjust recommendations accordingly.
## Tagging sessions
Tags are string-to-string key-value pairs attached to the session for your own tracking:
```json
{
  "tags": {
    "source": "ios-app",
    "experiment": "v2-onboarding",
    "env": "production"
  }
}
```

Tag keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
Tags are stored on the session and returned when you retrieve session details. Use them for filtering, analytics, or debugging.
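If you build tag objects dynamically, a small client-side guard can catch limit violations before the request is sent. This is a sketch based on the documented limits (keys up to 64 characters, values up to 512); whether the API also rejects oversized tags server-side is not stated here.

```typescript
// Validate a tags object against the documented length limits.
// Returns a list of human-readable problems; empty means valid.
function validateTags(tags: Record<string, string>): string[] {
  const errors: string[] = [];
  for (const [key, value] of Object.entries(tags)) {
    if (key.length > 64) {
      errors.push(`tag key exceeds 64 chars: "${key.slice(0, 20)}..."`);
    }
    if (value.length > 512) {
      errors.push(`tag value exceeds 512 chars for key "${key}"`);
    }
  }
  return errors;
}
```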
## Provider configuration

The `providerConfig` object controls the voice, video, and model behavior for the session.
The provider settings below are specific to OpenAI Realtime. Support for additional providers may be added in the future.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `providerConfig.voiceId` | string | No | The OpenAI voice to use for the companion's speech output (e.g. `alloy`, `echo`, `shimmer`). |
| `providerConfig.useGreenVideo` | boolean | No | When `true`, the companion's video stream uses a green-screen background that you can composite in your application. Only applies to WebRTC connections with video enabled. |
| `providerConfig.settings` | object | No | Model and audio processing settings for the session. |
### Settings

The `providerConfig.settings` object lets you fine-tune the model and audio pipeline:
| Parameter | Type | Description |
|---|---|---|
| `settings.instructions` | string | Overrides the companion's system prompt for this session only. Use this to replace the default prompt with session-specific instructions without modifying the companion itself. |
| `settings.temperature` | float | Controls randomness in the companion's responses. Lower values (e.g. 0.3) produce more focused output; higher values (e.g. 0.9) increase variety. |
| `settings.turnDetection` | object | Voice Activity Detection (VAD) configuration. Controls how the system detects when the user starts and stops speaking. Pass an empty object `{}` for defaults. |
| `settings.noiseReduction` | object | Noise reduction configuration. Reduces background noise from the user's microphone input. |
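A small sketch of assembling these settings, clamping the temperature before sending. The valid temperature range used here (0.0 to 1.0) is an assumption for illustration; check the provider's documented bounds for the actual limits.

```typescript
// Build a settings object for providerConfig.settings.
// Clamping guards against out-of-range temperatures from user input
// or config files (assumed range 0.0-1.0).
function buildSettings(instructions: string, temperature: number) {
  return {
    instructions,
    temperature: Math.min(1.0, Math.max(0.0, temperature)),
    turnDetection: {}, // empty object -> provider default VAD
  };
}
```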
### Turn detection

The `turnDetection` object configures Voice Activity Detection (VAD) — how the system decides when the user has started and stopped speaking. This directly affects interruption behavior and response timing.
| Parameter | Type | Description |
|---|---|---|
| `threshold` | float | Activation threshold for detecting speech. Higher values (e.g. 0.8) require louder, clearer speech to trigger; lower values (e.g. 0.3) are more sensitive. |
| `prefix_padding_ms` | integer | Milliseconds of audio to include before detected speech begins. Prevents clipping the start of an utterance. |
| `silence_duration_ms` | integer | Milliseconds of silence required before the system considers the user done speaking. Lower values (e.g. 200) make the agent respond faster; higher values (e.g. 800) wait longer for the user to continue. |
```json
{
  "providerConfig": {
    "settings": {
      "turnDetection": {
        "threshold": 0.9,
        "prefix_padding_ms": 400,
        "silence_duration_ms": 500
      }
    }
  }
}
```

Pass an empty object `{}` to use the provider's default VAD settings.
### Noise reduction

The `noiseReduction` object accepts a single `type` field:
| Value | Description |
|---|---|
| `nearField` | Optimized for close-range microphones — laptops, headsets, phones held to the ear. |
| `farField` | Optimized for distant microphones — speakerphones, smart speakers, conference room setups. |
```json
{
  "providerConfig": {
    "settings": {
      "noiseReduction": {
        "type": "nearField"
      }
    }
  }
}
```

If you omit `noiseReduction`, no noise processing is applied. Choose the mode that matches your user's typical microphone distance for best results.
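One way to apply that guidance is to select the mode from what you know about the user's device. The device categories below are assumptions for illustration; map them to whatever signals your application actually has.

```typescript
type NoiseReductionType = "nearField" | "farField";
type DeviceKind = "headset" | "laptop" | "phone" | "speakerphone" | "smart-speaker";

// Close-range microphones get nearField; distant microphones get farField.
function pickNoiseReduction(device: DeviceKind): NoiseReductionType {
  return device === "speakerphone" || device === "smart-speaker"
    ? "farField"
    : "nearField";
}
```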