WebSockets

WebSockets provide a lightweight, audio-only connection to your agent. This is ideal for server-side integrations, headless clients, or any use case where you don't need video.

Connecting to an agent

Create a session

Call POST /public/ws-connections to create a WebSocket session. Specify the companion, provider configuration, and any resources you want to attach:

curl -X POST https://companion-api.napster.com/public/ws-connections \
  -H "X-Api-Key: $NAPSTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "companionId": "comp_abc123",
    "providerConfig": {
      "voiceId": "alloy",
      "settings": {}
    },
    "functions": ["fn_tool1", "fn_tool2"],
    "faqCollections": ["faqc_01"],
    "knowledgeBaseId": "col_8f2a3b"
  }'

The response is a JSON object containing an encoded token and a connection object that holds the session id:

{
  "token": "eyJhbGci...",
  "connection": {
    "id": "conn_abc123"
  }
}

Store the connection id on your backend to retrieve session details later, such as transcripts or duration.

For the full list of request body fields — including language, tags, external client identification, and provider settings — see Configuration.
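The same request can be made from a Python backend. The following is a minimal sketch using only the standard library; the helper names `build_session_request` and `create_session` are illustrative and not part of any SDK, and the field values are the placeholders from the curl example above.

```python
import json
import urllib.request

API_URL = "https://companion-api.napster.com/public/ws-connections"


def build_session_request(api_key: str, companion_id: str, **options) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a WebSocket session."""
    # Extra keyword arguments (providerConfig, functions, ...) are passed through as-is.
    body = {"companionId": companion_id, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def create_session(api_key: str, companion_id: str, **options) -> tuple[str, str]:
    """Send the request and return (token, connection_id) from the response."""
    request = build_session_request(api_key, companion_id, **options)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return payload["token"], payload["connection"]["id"]
```

Calling `create_session(api_key, "comp_abc123", providerConfig={"voiceId": "alloy", "settings": {}})` would return the encoded token and the connection id to store on your backend.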

Decode the token

The token is base64-encoded. Decode it to extract two values:

WebSocket URL — the endpoint to connect to
Authentication token — used to authenticate the WebSocket connection

Node.js:

const decoded = JSON.parse(Buffer.from(token, "base64").toString());
const { url, authToken } = decoded;

Python:

import base64
import json

decoded = json.loads(base64.b64decode(token))
url = decoded["url"]
auth_token = decoded["authToken"]

Browser:

const decoded = JSON.parse(atob(token));
const { url, authToken } = decoded;
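In production code it can help to fail loudly when the token is malformed. The sketch below is one possible Python approach; the helper name `decode_session_token` is hypothetical, and the tolerance for stripped padding or the URL-safe alphabet is a defensive assumption rather than documented token behavior.

```python
import base64
import json


def decode_session_token(token: str) -> tuple[str, str]:
    """Return (url, auth_token) extracted from the base64-encoded session token."""
    # Assumption: restore any "=" padding that may have been stripped in transit.
    padded = token + "=" * (-len(token) % 4)
    # urlsafe_b64decode accepts both the standard and the URL-safe alphabet.
    decoded = json.loads(base64.urlsafe_b64decode(padded))
    if not isinstance(decoded, dict):
        raise ValueError("session token did not decode to a JSON object")
    missing = {"url", "authToken"} - decoded.keys()
    if missing:
        raise ValueError(f"session token is missing fields: {sorted(missing)}")
    return decoded["url"], decoded["authToken"]
```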

Open the WebSocket connection

Use the decoded URL and authentication token to establish the WebSocket connection:

Node.js:

import WebSocket from "ws";

const ws = new WebSocket(url, {
  headers: {
    Authorization: `Bearer ${authToken}`,
  },
});

ws.on("open", () => {
  console.log("Connected to agent");
});

ws.on("message", (data) => {
  // The ws library delivers messages as a Buffer; convert to a string before parsing
  const event = JSON.parse(data.toString());
  // Handle incoming events from the agent
});

Python:

# Requires the websocket-client package (pip install websocket-client)
import websocket
import json

ws = websocket.WebSocketApp(
    url,
    header={"Authorization": f"Bearer {auth_token}"},
    on_open=lambda ws: print("Connected to agent"),
    on_message=lambda ws, msg: print(json.loads(msg)),
)

# Blocks the calling thread until the connection closes
ws.run_forever()

The browser WebSocket API does not support custom headers. Pass the authentication token as a query parameter instead:

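// Build the URL with searchParams so the token is escaped and any existing query string is preserved
const wsUrl = new URL(url);
wsUrl.searchParams.set("token", authToken);
const ws = new WebSocket(wsUrl.toString());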

ws.addEventListener("open", () => {
  console.log("Connected to agent");
});

ws.addEventListener("message", (event) => {
  const data = JSON.parse(event.data);
  // Handle incoming events from the agent
});

Audio protocol

Audio is streamed as 16-bit integer PCM, 16 kHz, mono, base64-encoded in both directions.
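As a rough illustration of this format, the Python sketch below converts floating-point samples to 16-bit PCM and base64-encodes the result. The function names are hypothetical, and two details are assumptions the text above does not state: little-endian byte order and a 20 ms chunk duration.

```python
import base64
import struct

SAMPLE_RATE_HZ = 16_000   # 16 kHz, mono
BYTES_PER_SAMPLE = 2      # 16-bit integer samples


def chunk_size_bytes(duration_ms: int) -> int:
    """Number of PCM bytes needed to hold duration_ms of mono audio."""
    return SAMPLE_RATE_HZ * duration_ms // 1000 * BYTES_PER_SAMPLE


def floats_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to 16-bit PCM (little-endian is an assumption)."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]  # clip out-of-range values
    ints = [int(s * 32767) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def encode_pcm(pcm: bytes) -> str:
    """Base64-encode raw PCM bytes for transport inside a JSON message."""
    return base64.b64encode(pcm).decode("ascii")
```

With these numbers, a 20 ms chunk is 320 samples, or 640 bytes before base64 encoding.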

Sending audio

Send audio to the agent using the send_audio message type:

{
  "type": "send_audio",
  "data": {
    "data": "<base64-encoded PCM audio>"
  }
}
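A small Python helper can wrap raw PCM bytes in this message shape. This is a sketch: the function names are illustrative, and the default chunk size of 640 bytes (20 ms at 16 kHz, 16-bit mono) is an assumption, since no chunk size is specified here.

```python
import base64
import json


def build_send_audio_message(pcm: bytes) -> str:
    """Serialize one chunk of PCM audio as a send_audio message."""
    return json.dumps({
        "type": "send_audio",
        "data": {"data": base64.b64encode(pcm).decode("ascii")},
    })


def iter_send_audio_messages(pcm: bytes, chunk_bytes: int = 640):
    """Split a PCM buffer into chunks and yield one send_audio message per chunk."""
    for start in range(0, len(pcm), chunk_bytes):
        yield build_send_audio_message(pcm[start:start + chunk_bytes])
```

Each yielded string can be passed to the `send` method of whichever WebSocket client you use.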

Receiving audio

The agent sends audio back via audio_received events:

{
  "event": "audio_received",
  "data": {
    "data": "<base64-encoded PCM audio>"
  }
}
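On the receiving side, a client might pull the PCM bytes out of each incoming message along these lines. This is a minimal Python sketch; `extract_audio` is a hypothetical helper name, and any message that is not an audio_received event is simply ignored.

```python
import base64
import json
from typing import Optional


def extract_audio(raw_message: str) -> Optional[bytes]:
    """Return decoded PCM bytes for audio_received events, or None for anything else."""
    event = json.loads(raw_message)
    if event.get("event") != "audio_received":
        return None
    return base64.b64decode(event["data"]["data"])
```

The returned bytes are 16-bit, 16 kHz mono PCM and can be handed to an audio output buffer.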

The server sends events to your client throughout the session — state changes, speech activity, transcription, and response streaming. For the full event reference, message structure, and lifecycle details, see Server Events.

You can also send commands back to the server during a session to inject text messages or update configuration in real time. See Client Commands.

Barge-in (interruption)

Turn detection is always active. The server continuously listens for user speech during agent output and sends a speech_started event when the user interrupts. To handle barge-in correctly, your client should:

Stop all queued audio playback immediately. When the server detects an interruption, it cancels the current response — but any audio already buffered on the client will keep playing unless you clear it.
Keep sending microphone audio at all times. Do not mute the mic during agent playback. If the mic is muted, the server cannot detect that the user is speaking, and interruption becomes impossible.
Enable echo cancellation. In browser environments, set echoCancellation: true in getUserMedia. Without it, the agent's own audio gets picked up by the microphone and creates a feedback loop where the agent constantly interrupts itself.
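The first requirement above, dropping buffered audio on interruption, can be sketched as a small client-side playback queue in Python. The class name is hypothetical, and the code assumes speech_started arrives in the same `{"event": ...}` envelope as audio_received; check Server Events for the actual structure. If your audio output device keeps its own buffer, that would need flushing as well.

```python
import base64
import json
from collections import deque
from typing import Optional


class PlaybackQueue:
    """Holds agent audio awaiting playback and drops it when the user barges in."""

    def __init__(self) -> None:
        self._chunks = deque()

    def handle_message(self, raw_message: str) -> None:
        event = json.loads(raw_message)
        name = event.get("event")
        if name == "audio_received":
            # Queue decoded PCM for the playback loop.
            self._chunks.append(base64.b64decode(event["data"]["data"]))
        elif name == "speech_started":
            # Assumed event shape: the user interrupted, so discard pending audio.
            self._chunks.clear()

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to play, or None if nothing is queued."""
        return self._chunks.popleft() if self._chunks else None

    def __len__(self) -> int:
        return len(self._chunks)
```

A playback loop would call `next_chunk()` repeatedly while the microphone keeps streaming in parallel.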

WebSocket connections are audio-only. If you need video, use WebRTC with the Web SDK instead.