Configuration

Configure your agent's session with resources, memory, language, and provider settings

When you create a session via WebRTC or WebSockets, you send a request body that configures the agent for that session. This page documents every field in that configuration.

Agent configuration

These top-level fields define which companion, resources, and metadata to use for the session.

  • companionId (string, required): The ID of the companion to use for this session.
  • functions (string[], optional): Tool IDs to attach to the session. The companion can invoke these during the conversation.
  • faqCollections (string[], optional): FAQ collection IDs. The companion prioritizes matching Q&A pairs from these collections when responding.
  • knowledgeBaseId (string, optional): Knowledge collection ID. The companion retrieves relevant documents from this collection during the conversation.
  • language (string, optional): The language for the session (e.g. English, Spanish, French).
  • externalClientId (string, optional): Your own user identifier. Enables cross-session memory for the companion.
  • externalClientProfile (object, optional): Arbitrary key-value object with metadata about the end user. The companion can reference this context during the conversation.
  • tags (object, optional): Key-value string pairs for labeling the session. Use tags to categorize sessions by source, campaign, environment, or any dimension relevant to your application.
  • disableIdleTimeout (boolean, optional): When true, the session stays open indefinitely even if no audio or messages are exchanged. By default, idle sessions are automatically closed after a timeout.
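
Only companionId is required; everything else is optional. A minimal configuration that also sets a few optional fields (IDs and values here are illustrative) looks like:

```json
{
  "companionId": "comp_abc123",
  "language": "English",
  "externalClientId": "user_12345",
  "tags": { "env": "staging" }
}
```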

Attaching resources

The functions, faqCollections, and knowledgeBaseId fields let you wire resources to a session at creation time. These resources must already exist in your project — create them via the Tools, FAQs, and Knowledge endpoints first.

{
  "companionId": "comp_abc123",
  "functions": ["fn_weather", "fn_booking"],
  "faqCollections": ["faqc_returns", "faqc_shipping"],
  "knowledgeBaseId": "col_product_docs"
}

You can attach multiple tools and FAQ collections to a single session. Only one knowledge collection is supported per session.

Language

The language field sets the language for the session. Use plain language names — English, Spanish, French, German, etc. — not ISO codes.

{
  "companionId": "comp_abc123",
  "language": "Spanish"
}

Memory and user identity

When you pass an externalClientId, the Companion API enables memory for that session. The companion remembers what was said and can pick up the conversation in future sessions with the same user.

{
  "externalClientId": "user_12345"
}

Memory is scoped to a companion + external client ID pair. This means:

  • If the same user (user_12345) talks to a different companion, memory starts fresh — conversations are not shared across companions.
  • If you change the externalClientId for the same user, the companion treats them as a new person with no history.
  • If the same externalClientId connects to the same companion again, the companion recalls previous conversations and can reference them.

Use a stable, unique identifier from your system — such as a user ID or account ID — so memory persists reliably across sessions.
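
Any stable string works; the important part is deriving it deterministically from your own user record. A minimal sketch — the `user_` prefix and the account-ID source are illustrative conventions, not required by the API:

```python
def external_client_id(account_id: str) -> str:
    """Derive a stable session identity from an internal account ID.

    The "user_" prefix is purely illustrative -- the API accepts any
    stable string. What matters is that the same person always maps
    to the same value, so memory carries across sessions.
    """
    return f"user_{account_id}"

# The same account always yields the same ID, so the companion
# can recall earlier conversations with this user.
print(external_client_id("12345"))  # -> user_12345
```

Avoid ephemeral values such as session tokens or device IDs here, since a new value makes the companion treat the user as a stranger.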

User context

The externalClientProfile field lets you pass structured information about the end user into the session. The companion can reference this data during the conversation to personalize its responses.

{
  "externalClientId": "user_12345",
  "externalClientProfile": {
    "name": "Jane",
    "plan": "premium",
    "company": "Acme Corp"
  }
}

The object is unstructured — include any fields that would help the companion tailor its behavior. For example, passing the user's name lets the companion greet them personally, and passing their subscription tier lets it adjust recommendations accordingly.

Tagging sessions

Tags are string-to-string key-value pairs attached to the session for your own tracking:

{
  "tags": {
    "source": "ios-app",
    "experiment": "v2-onboarding",
    "env": "production"
  }
}

Tag keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.

Tags are stored on the session and returned when you retrieve session details. Use them for filtering, analytics, or debugging.
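
Because the API enforces the 64-character key and 512-character value limits, it can be useful to validate tags client-side before creating the session. A minimal sketch — the function name and error handling are our own, not part of the API:

```python
def validate_tags(tags: dict) -> dict:
    """Check session tags against the documented limits.

    Keys: strings of at most 64 characters.
    Values: strings of at most 512 characters.
    Returns the tags unchanged if they pass, raises ValueError otherwise.
    """
    for key, value in tags.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError(f"tag {key!r}: keys and values must be strings")
        if len(key) > 64:
            raise ValueError(f"tag key {key!r} exceeds 64 characters")
        if len(value) > 512:
            raise ValueError(f"tag value for key {key!r} exceeds 512 characters")
    return tags

validate_tags({"source": "ios-app", "env": "production"})  # passes
```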

Provider configuration

The providerConfig object controls the voice, video, and model behavior for the session.

The provider settings below are specific to OpenAI Realtime. Support for additional providers may be added in the future.

  • providerConfig.voiceId (string, optional): The OpenAI voice to use for the companion's speech output (e.g. alloy, echo, shimmer).
  • providerConfig.useGreenVideo (boolean, optional): When true, the companion's video stream uses a green-screen background that you can composite in your application. Only applies to WebRTC connections with video enabled.
  • providerConfig.settings (object, optional): Model and audio processing settings for the session.

Settings

The providerConfig.settings object lets you fine-tune the model and audio pipeline:

  • settings.instructions (string): Overrides the companion's system prompt for this session only. Use this to replace the default prompt with session-specific instructions without modifying the companion itself.
  • settings.temperature (float): Controls randomness in the companion's responses. Lower values (e.g. 0.3) produce more focused output; higher values (e.g. 0.9) increase variety.
  • settings.turnDetection (object): Voice Activity Detection (VAD) configuration. Controls how the system detects when the user starts and stops speaking. Pass an empty object {} for defaults.
  • settings.noiseReduction (object): Noise reduction configuration. Reduces background noise from the user's microphone input.
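
These settings combine freely. For example, a session that overrides the companion's prompt and lowers the temperature (the prompt text and value here are illustrative) might send:

```json
{
  "providerConfig": {
    "settings": {
      "instructions": "You are a concise support agent. Keep answers under two sentences.",
      "temperature": 0.4
    }
  }
}
```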

Turn detection

The turnDetection object configures Voice Activity Detection (VAD) — how the system decides when the user has started and stopped speaking. This directly affects interruption behavior and response timing.

  • threshold (float): Activation threshold for detecting speech. Higher values (e.g. 0.8) require louder, clearer speech to trigger; lower values (e.g. 0.3) are more sensitive.
  • prefix_padding_ms (integer): Milliseconds of audio to include before detected speech begins. Prevents clipping the start of an utterance.
  • silence_duration_ms (integer): Milliseconds of silence required before the system considers the user done speaking. Lower values (e.g. 200) make the agent respond faster; higher values (e.g. 800) wait longer for the user to continue.

{
  "providerConfig": {
    "settings": {
      "turnDetection": {
        "threshold": 0.9,
        "prefix_padding_ms": 400,
        "silence_duration_ms": 500
      }
    }
  }
}

Pass an empty object {} to use the provider's default VAD settings.

Noise reduction

The noiseReduction object accepts a single type field:

  • nearField: Optimized for close-range microphones — laptops, headsets, phones held to the ear.
  • farField: Optimized for distant microphones — speakerphones, smart speakers, conference room setups.

{
  "providerConfig": {
    "settings": {
      "noiseReduction": {
        "type": "nearField"
      }
    }
  }
}

If you omit noiseReduction, no noise processing is applied. Choose the mode that matches your user's typical microphone distance for best results.
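
If your application runs on several device types, you can pick the mode from what you know about the client. A minimal sketch — the device categories and the mapping are our own convention, not part of the API:

```python
# Hypothetical mapping from our own device categories to the two
# documented noiseReduction modes: "nearField" for close-range
# microphones, "farField" for distant ones.
NOISE_MODE_BY_DEVICE = {
    "headset": "nearField",
    "laptop": "nearField",
    "phone": "nearField",
    "speakerphone": "farField",
    "smart_speaker": "farField",
    "conference_room": "farField",
}

def noise_reduction_config(device: str) -> dict:
    """Build the noiseReduction fragment for a known device type.

    Returns {} for unknown devices, which leaves noiseReduction unset --
    per the docs, no noise processing is then applied.
    """
    mode = NOISE_MODE_BY_DEVICE.get(device)
    if mode is None:
        return {}
    return {"noiseReduction": {"type": mode}}

print(noise_reduction_config("headset"))
# -> {'noiseReduction': {'type': 'nearField'}}
```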
