Web Voice Testing
Test your Vocobase voice agent directly from a web application using real-time WebRTC audio. This guide covers everything you need to connect your frontend to a voice agent, and it works with any JavaScript framework or with vanilla JS.
How it works
Your web app starts a voice session via the Vocobase API, receives WebRTC credentials, and connects using the open-source Pipecat client SDK. Audio flows in real time — your user speaks, the agent responds.
Your App                     Vocobase API                WebRTC
   |                              |                         |
   |  POST /api/sessions/start    |                         |
   |  { agent_name }              |                         |
   | ---------------------------> |                         |
   |                              |  (starts bot + room)    |
   | <--------------------------- |                         |
   |  { session_id,               |                         |
   |    room_url, token }         |                         |
   |                              |                         |
   |  client.connect({ url, token })                        |
   | -----------------------------------------------------> |
   |                              |                         |
   | <======= real-time voice conversation (WebRTC) ======> |
   |                              |                         |
   |  client.disconnect()                                   |
   | -----------------------------------------------------> |
   |                              |                         |
   |                              |     (session ends,      |
   |                              |     credits deducted)   |
Install dependencies
npm install @pipecat-ai/client-js @pipecat-ai/daily-transport
For React projects, also install:
npm install @pipecat-ai/client-react
| Package | Purpose |
| --- | --- |
| @pipecat-ai/client-js | Core client — manages connection, events, audio |
| @pipecat-ai/daily-transport | WebRTC transport layer |
| @pipecat-ai/client-react | React provider + audio component (optional) |
Start a session
Call the Vocobase API to create a voice session. The response contains WebRTC credentials you’ll use to connect.
curl -X POST https://api.vocobase.com/api/sessions/start \
-H "Authorization: Bearer rg_live_abc123def456ghi789jkl012" \
-H "Content-Type: application/json" \
-d '{"agent_name": "my-agent"}'
Response:
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "room_url": "https://vocobase.daily.co/abc123",
  "token": "eyJhbGciOi..."
}
| Field | Description |
| --- | --- |
| session_id | Unique session identifier. Use this to fetch session details later. |
| room_url | WebRTC room URL. Pass this to client.connect(). |
| token | One-time access token for the room. Pass this to client.connect(). |
Error responses
| Status | Cause | Action |
| --- | --- | --- |
| 400 | Missing or invalid agent_name | Check the request body |
| 401 | Invalid API key | Verify key format (rg_live_*) |
| 402 | Insufficient credits | Top up in the dashboard |
| 403 | Account not active | Contact Vocobase support |
| 404 | Agent not found | Verify agent name (case-sensitive) |
| 429 | Rate or concurrency limit | Wait and retry |
| 502 | Bot server temporarily unavailable | Retry after a few seconds |
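The 429 and 502 statuses above are transient, so a client can retry them with backoff. A minimal sketch (the helper name is illustrative; Retry-After is the standard header Vocobase sends on 429, in seconds):

```javascript
// Decide whether a failed session-start is worth retrying, and how long
// to wait before the next attempt. Only 429 and 502 are retryable here;
// a Retry-After header (seconds) takes precedence over exponential backoff.
function retryDelayMs(status, retryAfterHeader, attempt) {
  if (status !== 429 && status !== 502) return null; // not retryable
  const retryAfter = Number(retryAfterHeader);
  if (Number.isFinite(retryAfter) && retryAfter > 0) return retryAfter * 1000;
  return Math.min(1000 * 2 ** attempt, 10000); // 1s, 2s, 4s... capped at 10s
}
```

Your backend proxy can loop on this until it gets `null` (give up) or a successful response.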
Never expose your API key in client-side code. Proxy the session start call through your own backend — your backend adds the API key, gets the credentials, and returns room_url + token to the browser. The WebRTC connection itself requires no API key.
Connect with vanilla JavaScript
This approach works with any framework — plain JS, Vue, Angular, Svelte, or no framework at all.
Basic connection
import { PipecatClient, RTVIEvent } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";

// Handle bot audio playback
function handleBotAudio(track, participant) {
  if (participant.local || track.kind !== "audio") return;
  const audio = document.createElement("audio");
  audio.srcObject = new MediaStream([track]);
  audio.play();
}

// Create client (once per page)
const client = new PipecatClient({
  transport: new DailyTransport(),
  enableMic: true,
  enableCam: false,
  callbacks: {
    onTrackStarted: handleBotAudio,
    onBotReady: () => console.log("Agent is ready"),
  },
});

// Start session via your backend (which calls the Vocobase API)
const res = await fetch("/api/voice-session", { method: "POST" });
const session = await res.json();

// Connect to the voice agent
await client.connect({
  url: session.room_url,
  token: session.token,
});
Listen to events
// User's speech (transcribed in real time)
client.on(RTVIEvent.UserTranscript, (data) => {
  if (!data.final) return; // Only use final transcripts
  console.log("User:", data.text);
});

// Agent's response (arrives in TTS chunks — see aggregation below)
client.on(RTVIEvent.BotTtsText, (data) => {
  console.log("Agent:", data.text);
});

// Connection lifecycle
client.on(RTVIEvent.Connected, () => console.log("Connected"));
client.on(RTVIEvent.Disconnected, () => console.log("Disconnected"));
client.on(RTVIEvent.BotReady, () => console.log("Agent ready to talk"));

// Errors
client.on(RTVIEvent.Error, (message) => {
  console.error("Error:", message?.data?.message || message);
});
End the call
await client.disconnect();
Microphone controls
// Mute
client.enableMic(false);

// Unmute
client.enableMic(true);
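Beyond a simple mute button, enableMic can also drive a push-to-talk UI. A sketch with the SDK call injected so the toggle logic stands alone (in the real app, setMicEnabled would be `(on) => client.enableMic(on)`):

```javascript
// Push-to-talk wiring: hold a button/key to talk, release to mute.
// The guards make repeated press/release events idempotent.
function createPushToTalk(setMicEnabled) {
  let talking = false;
  return {
    press() {
      if (!talking) { talking = true; setMicEnabled(true); }
    },
    release() {
      if (talking) { talking = false; setMicEnabled(false); }
    },
    get talking() { return talking; },
  };
}
```

Wire `press` to mousedown/keydown and `release` to mouseup/keyup (and to blur, so a missed keyup doesn't leave the mic hot).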
Aggregating bot responses
RTVIEvent.BotTtsText fires once per TTS chunk, not once per complete response. To build full agent messages, aggregate chunks until the next user turn:
let currentBotMessage = "";

client.on(RTVIEvent.BotTtsText, (data) => {
  currentBotMessage += (currentBotMessage ? " " : "") + data.text;
  updateLastAgentMessage(currentBotMessage); // your UI update function
});

client.on(RTVIEvent.UserTranscript, (data) => {
  if (!data.final) return;
  currentBotMessage = ""; // Reset for next agent turn
  addUserMessage(data.text); // your UI update function
});
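The same per-turn aggregation can be factored into a pure reducer, which keeps the merging logic out of the event handlers and makes it unit-testable. A sketch using an illustrative event shape ({ type, final, text } is an assumption for this example, not the SDK's payload):

```javascript
// Pure reducer: folds transcript events into an array of turns.
// Consecutive TTS chunks merge into one agent turn; a final user
// transcript starts a new user turn (which ends the agent's turn).
function reduceTranscript(turns, event) {
  if (event.type === "user" && event.final) {
    return [...turns, { role: "user", content: event.text }];
  }
  if (event.type === "tts") {
    const last = turns[turns.length - 1];
    if (last?.role === "agent") {
      return [
        ...turns.slice(0, -1),
        { role: "agent", content: last.content + " " + event.text },
      ];
    }
    return [...turns, { role: "agent", content: event.text }];
  }
  return turns; // ignore non-final user transcripts and unknown events
}
```

Inside your RTVIEvent handlers you would map each event to this shape and call the reducer; the React example below uses the same pattern inline with setState.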
Complete HTML example
A self-contained page you can use to test your integration. Serve it through a bundler or dev server (for example, Vite) so the bare npm import specifiers resolve:
<!DOCTYPE html>
<html>
<head><title>Voice Agent Test</title></head>
<body>
  <div id="status">Ready</div>
  <div id="transcript"></div>
  <button id="startBtn">Start Call</button>
  <button id="endBtn" disabled>End Call</button>

  <script type="module">
    import { PipecatClient, RTVIEvent } from "@pipecat-ai/client-js";
    import { DailyTransport } from "@pipecat-ai/daily-transport";

    const statusEl = document.getElementById("status");
    const transcriptEl = document.getElementById("transcript");
    const startBtn = document.getElementById("startBtn");
    const endBtn = document.getElementById("endBtn");

    function handleBotAudio(track, participant) {
      if (participant.local || track.kind !== "audio") return;
      const audio = document.createElement("audio");
      audio.srcObject = new MediaStream([track]);
      audio.play();
    }

    const client = new PipecatClient({
      transport: new DailyTransport(),
      enableMic: true,
      enableCam: false,
      callbacks: { onTrackStarted: handleBotAudio },
    });

    let currentBotMsg = "";

    function addMessage(role, text) {
      const div = document.createElement("div");
      div.textContent = `${role}: ${text}`;
      transcriptEl.appendChild(div);
    }

    client.on(RTVIEvent.UserTranscript, (data) => {
      if (!data.final) return;
      currentBotMsg = "";
      addMessage("You", data.text);
    });

    client.on(RTVIEvent.BotTtsText, (data) => {
      currentBotMsg += (currentBotMsg ? " " : "") + data.text;
      // Update the last agent message or add a new one
      const lastDiv = transcriptEl.lastElementChild;
      if (lastDiv && lastDiv.textContent.startsWith("Agent:")) {
        lastDiv.textContent = "Agent: " + currentBotMsg;
      } else {
        addMessage("Agent", currentBotMsg);
      }
    });

    client.on(RTVIEvent.Connected, () => {
      statusEl.textContent = "Connected";
      startBtn.disabled = true;
      endBtn.disabled = false;
    });

    client.on(RTVIEvent.Disconnected, () => {
      statusEl.textContent = "Disconnected";
      startBtn.disabled = false;
      endBtn.disabled = true;
    });

    client.on(RTVIEvent.Error, (msg) => {
      statusEl.textContent = "Error: " + (msg?.data?.message || "Unknown");
    });

    startBtn.addEventListener("click", async () => {
      statusEl.textContent = "Connecting...";
      startBtn.disabled = true;
      try {
        // Replace with your backend endpoint
        const res = await fetch("/api/voice-session", { method: "POST" });
        if (!res.ok) throw new Error(`HTTP ${res.status}`);
        const session = await res.json();
        await client.connect({ url: session.room_url, token: session.token });
      } catch (err) {
        statusEl.textContent = "Error: " + err.message;
        startBtn.disabled = false;
      }
    });

    endBtn.addEventListener("click", async () => {
      statusEl.textContent = "Disconnecting...";
      await client.disconnect();
    });
  </script>
</body>
</html>
Connect with React
The @pipecat-ai/client-react package provides PipecatClientProvider and PipecatClientAudio — a provider for context and a component that automatically handles bot audio playback (replacing the manual onTrackStarted approach).
import { useState, useEffect, useCallback } from "react";
import { PipecatClient, RTVIEvent } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";
import {
  PipecatClientProvider,
  PipecatClientAudio,
} from "@pipecat-ai/client-react";

interface TranscriptEntry {
  role: "user" | "agent";
  content: string;
}
function VoiceChat({ agentName }: { agentName: string }) {
  const [client, setClient] = useState<PipecatClient | null>(null);
  const [status, setStatus] = useState<
    "idle" | "connecting" | "connected" | "error"
  >("idle");
  const [transcript, setTranscript] = useState<TranscriptEntry[]>([]);
  const [error, setError] = useState<string | null>(null);
  const [isMuted, setIsMuted] = useState(false);

  // Create client once
  useEffect(() => {
    const pc = new PipecatClient({
      transport: new DailyTransport(),
      enableMic: true,
      enableCam: false,
    });
    setClient(pc);
    return () => { pc.disconnect().catch(() => {}); };
  }, []);
  // Subscribe to events
  useEffect(() => {
    if (!client) return;

    const onUserTranscript = (data: any) => {
      if (!data.final) return;
      setTranscript((prev) => [...prev, { role: "user", content: data.text }]);
    };

    const onBotTtsText = (data: any) => {
      setTranscript((prev) => {
        const last = prev[prev.length - 1];
        if (last?.role === "agent") {
          return [
            ...prev.slice(0, -1),
            { ...last, content: last.content + " " + data.text },
          ];
        }
        return [...prev, { role: "agent", content: data.text }];
      });
    };

    const onConnected = () => setStatus("connected");
    const onDisconnected = () => { setStatus("idle"); setIsMuted(false); };
    const onError = (msg: any) => {
      setError(msg?.data?.message || msg?.data?.error || "Connection error");
      setStatus("error");
    };

    client.on(RTVIEvent.UserTranscript, onUserTranscript);
    client.on(RTVIEvent.BotTtsText, onBotTtsText);
    client.on(RTVIEvent.Connected, onConnected);
    client.on(RTVIEvent.Disconnected, onDisconnected);
    client.on(RTVIEvent.Error, onError);

    return () => {
      client.off(RTVIEvent.UserTranscript, onUserTranscript);
      client.off(RTVIEvent.BotTtsText, onBotTtsText);
      client.off(RTVIEvent.Connected, onConnected);
      client.off(RTVIEvent.Disconnected, onDisconnected);
      client.off(RTVIEvent.Error, onError);
    };
  }, [client]);
  const connect = useCallback(async () => {
    if (!client || status === "connecting") return;
    setError(null);
    setStatus("connecting");
    setTranscript([]);
    try {
      // Call YOUR backend, which proxies to the Vocobase API
      const res = await fetch("/api/voice-session", { method: "POST" });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      const session = await res.json();
      await client.connect({
        url: session.room_url,
        token: session.token,
      });
    } catch (err) {
      setError(err instanceof Error ? err.message : "Failed to connect");
      setStatus("error");
    }
  }, [client, status]);

  const disconnect = useCallback(async () => {
    if (!client) return;
    await client.disconnect();
  }, [client]);

  const toggleMute = useCallback(() => {
    if (!client) return;
    client.enableMic(isMuted);
    setIsMuted(!isMuted);
  }, [client, isMuted]);
  if (!client) return null;

  return (
    <PipecatClientProvider client={client}>
      <div>
        <p>Status: {status}</p>
        {error && <p style={{ color: "red" }}>{error}</p>}
        {transcript.map((entry, i) => (
          <div key={i}>
            <strong>{entry.role === "user" ? "You" : "Agent"}:</strong>{" "}
            {entry.content}
          </div>
        ))}
        {status === "connected" ? (
          <>
            <button onClick={toggleMute}>
              {isMuted ? "Unmute" : "Mute"}
            </button>
            <button onClick={disconnect}>End Call</button>
          </>
        ) : (
          <button onClick={connect} disabled={status === "connecting"}>
            {status === "connecting" ? "Connecting..." : "Start Call"}
          </button>
        )}
        {/* Handles bot audio playback automatically */}
        {status === "connected" && <PipecatClientAudio />}
      </div>
    </PipecatClientProvider>
  );
}

export default function App() {
  return <VoiceChat agentName="my-agent" />;
}
PipecatClientAudio renders a hidden <audio> element that plays the agent’s voice. Mount it when connected — it replaces the manual onTrackStarted callback used in the vanilla JS approach.
Events reference
| Event | Payload | Description |
| --- | --- | --- |
| RTVIEvent.Connected | — | WebRTC connection established |
| RTVIEvent.Disconnected | — | Connection closed |
| RTVIEvent.BotReady | — | Agent initialized and ready to talk |
| RTVIEvent.UserTranscript | { text, final } | User's speech transcribed. Only use entries where final is true. |
| RTVIEvent.BotTtsText | { text } | Agent's response text. Arrives in chunks — aggregate them per turn. |
| RTVIEvent.TransportStateChanged | state: string | Low-level transport state changes |
| RTVIEvent.Error | { data: { message, error } } | Connection or runtime error |
Backend proxy example
Your backend should proxy the session start call to keep the API key server-side.
const express = require("express");
const app = express();

app.post("/api/voice-session", async (req, res) => {
  try {
    const response = await fetch(
      "https://api.vocobase.com/api/sessions/start",
      {
        method: "POST",
        headers: {
          "Authorization": `Bearer ${process.env.VOCOBASE_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ agent_name: "my-agent" }),
      }
    );
    if (!response.ok) {
      const err = await response.json();
      return res.status(response.status).json(err);
    }
    const session = await response.json();
    res.json(session);
  } catch (err) {
    res.status(500).json({ error: "Failed to start voice session" });
  }
});
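If you let the browser choose which agent to call instead of hardcoding it, validate the requested name server-side before forwarding it, so the API key can't be used to start sessions against arbitrary agents. A sketch (the agent names and default are placeholders, not part of the Vocobase API):

```javascript
// Server-side allowlist for client-supplied agent names.
// Returns the agent to use, or null to reject the request with a 400.
const ALLOWED_AGENTS = new Set(["my-agent", "support-agent"]);

function resolveAgentName(requested) {
  if (requested === undefined) return "my-agent"; // fall back to a default
  return ALLOWED_AGENTS.has(requested) ? requested : null;
}
```

In the Express handler above you would call this with `req.body.agent_name` (after adding `express.json()` middleware) and pass the result into the proxied request body.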
Get session details
After a session ends, you can retrieve its transcript, duration, and credit usage.
curl -X GET https://api.vocobase.com/api/sessions/SESSION_ID \
-H "Authorization: Bearer rg_live_abc123def456ghi789jkl012"
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "COMPLETED",
  "durationSecs": 45,
  "creditsUsed": 0.75,
  "transcript": {
    "messages": [
      { "role": "user", "content": "Hello" },
      { "role": "assistant", "content": "Hi! How can I help you today?" }
    ]
  },
  "startedAt": "2025-01-15T10:30:00Z",
  "endedAt": "2025-01-15T10:30:45Z"
}
Billing
Voice sessions are billed at 1 credit per 60 seconds, prorated by the second. Credits are deducted after the session ends.
See Credits & Billing for full details.
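As a sanity check, this pricing maps directly onto the creditsUsed field in the session details response above (the 45-second example session costs 0.75 credits). A sketch, assuming no rounding beyond per-second proration:

```javascript
// Estimated credit cost: 1 credit per 60 seconds, prorated by the second.
// The authoritative figure is always creditsUsed in the session details.
function creditsForDuration(durationSecs) {
  return durationSecs / 60;
}
```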
Rate limits
| Limit | Value |
| --- | --- |
| Session starts | 10 per minute per API key |
| Concurrent sessions | Configurable per key (default: 5) |
Rate limit headers are included on all responses: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After (on 429).
See Authentication for full rate limit details.
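A small helper can surface these headers so your backend knows when to slow down. This sketch assumes the header names listed above and accepts any header-lookup function, so it works with a Fetch Headers object (`(n) => response.headers.get(n)`) or a plain object alike:

```javascript
// Read Vocobase rate-limit headers off a response.
// Missing headers come back as null; Retry-After only appears on 429.
function parseRateLimit(getHeader) {
  const num = (name) => {
    const value = getHeader(name);
    return value == null ? null : Number(value);
  };
  return {
    limit: num("X-RateLimit-Limit"),
    remaining: num("X-RateLimit-Remaining"),
    reset: num("X-RateLimit-Reset"),
    retryAfter: num("Retry-After"),
  };
}
```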
Troubleshooting
| Issue | Solution |
| --- | --- |
| No audio from agent | Vanilla JS: ensure onTrackStarted creates an <audio> element and calls .play(). React: ensure <PipecatClientAudio /> is mounted when connected. Browsers may block autoplay — always connect via a user gesture (button click). |
| Microphone not working | Check navigator.mediaDevices.getUserMedia permission. HTTPS is required in production. |
| Connection drops immediately | Verify room_url and token from the API are passed correctly to client.connect(). Tokens are single-use. |
| Multiple audio elements | In vanilla JS, track the <audio> element and remove the old one before creating a new one in onTrackStarted. |
| enableCam: false but camera prompt | The WebRTC transport layer may request camera access internally. Setting enableCam: false ensures the camera is never activated. |
HTTPS is required for microphone access in all browsers except localhost. Your production deployment must use HTTPS.
Next steps
Credits & Billing Understand how voice sessions consume credits.
Authentication API key format, rate limits, and error codes.
Webhook Payloads Receive session.completed events with transcripts and duration.
Quick Start Create an agent and make your first call.