Web Voice Testing

Test your Vocobase voice agent directly from a web application using real-time WebRTC audio. This guide covers everything you need to connect your frontend to a voice agent, and it works with any JavaScript framework or vanilla JS.

How it works

Your web app starts a voice session via the Vocobase API, receives WebRTC credentials, and connects using the open-source Pipecat client SDK. Audio flows in real time — your user speaks, the agent responds.
Your App                      Vocobase API                  WebRTC
  |                               |                           |
  |  POST /api/sessions/start     |                           |
  |  { agent_name }               |                           |
  | ----------------------------> |                           |
  |                               |  (starts bot + room)      |
  | <---------------------------- |                           |
  |  { session_id,                |                           |
  |    room_url, token }          |                           |
  |                               |                           |
  |  client.connect({ url, token })                           |
  | --------------------------------------------------------> |
  |                               |                           |
  | <========= real-time voice conversation (WebRTC) =======> |
  |                               |                           |
  |  client.disconnect()                                      |
  | --------------------------------------------------------> |
  |                               |                           |
  |                               |  (session ends,           |
  |                               |   credits deducted)       |

Install dependencies

npm install @pipecat-ai/client-js @pipecat-ai/daily-transport
For React projects, also install:
npm install @pipecat-ai/client-react
| Package | Purpose |
| --- | --- |
| @pipecat-ai/client-js | Core client — manages connection, events, audio |
| @pipecat-ai/daily-transport | WebRTC transport layer |
| @pipecat-ai/client-react | React provider + audio component (optional) |

Start a session

Call the Vocobase API to create a voice session. The response contains WebRTC credentials you’ll use to connect.
curl -X POST https://api.vocobase.com/api/sessions/start \
  -H "Authorization: Bearer rg_live_abc123def456ghi789jkl012" \
  -H "Content-Type: application/json" \
  -d '{"agent_name": "my-agent"}'
Response:
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "room_url": "https://vocobase.daily.co/abc123",
  "token": "eyJhbGciOi..."
}
| Field | Description |
| --- | --- |
| session_id | Unique session identifier. Use this to fetch session details later. |
| room_url | WebRTC room URL. Pass this to client.connect(). |
| token | One-time access token for the room. Pass this to client.connect(). |

Error responses

| Status | Cause | Action |
| --- | --- | --- |
| 400 | Missing or invalid agent_name | Check the request body |
| 401 | Invalid API key | Verify key format (rg_live_*) |
| 402 | Insufficient credits | Top up in the dashboard |
| 403 | Account not active | Contact Vocobase support |
| 404 | Agent not found | Verify agent name (case-sensitive) |
| 429 | Rate or concurrency limit | Wait and retry |
| 502 | Bot server temporarily unavailable | Retry after a few seconds |
Never expose your API key in client-side code. Proxy the session start call through your own backend — your backend adds the API key, gets the credentials, and returns room_url + token to the browser. The WebRTC connection itself requires no API key.
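On the backend, you may want to translate these statuses into user-facing messages before returning them to the browser. A sketch of one way to do it; the function name and wording are illustrative, not part of the API:

```javascript
// Illustrative mapping (not part of the SDK) from the error statuses
// above to messages a frontend could show the user.
function sessionStartErrorMessage(status) {
  const messages = {
    400: "Invalid request. Check the agent name.",
    401: "Server misconfiguration: invalid API key.",
    402: "Out of credits. Top up in the dashboard.",
    403: "Account not active. Contact support.",
    404: "Agent not found. Names are case-sensitive.",
    429: "Too many calls. Please wait and try again.",
    502: "Voice service busy. Please retry in a few seconds.",
  };
  return messages[status] ?? "Unexpected error starting voice session.";
}
```

Your backend proxy can return `sessionStartErrorMessage(response.status)` alongside the raw status so the browser never needs to interpret Vocobase error codes itself.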

Connect with vanilla JavaScript

This approach works with any framework — plain JS, Vue, Angular, Svelte, or no framework at all.

Basic connection

import { PipecatClient, RTVIEvent } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";

// Handle bot audio playback
function handleBotAudio(track, participant) {
  if (participant.local || track.kind !== "audio") return;
  const audio = document.createElement("audio");
  audio.srcObject = new MediaStream([track]);
  audio.play();
}

// Create client (once per page)
const client = new PipecatClient({
  transport: new DailyTransport(),
  enableMic: true,
  enableCam: false,
  callbacks: {
    onTrackStarted: handleBotAudio,
    onBotReady: () => console.log("Agent is ready"),
  },
});

// Start session via your backend (which calls Vocobase API)
const res = await fetch("/api/voice-session", { method: "POST" });
const session = await res.json();

// Connect to the voice agent
await client.connect({
  url: session.room_url,
  token: session.token,
});

Listen to events

// User's speech (transcribed in real time)
client.on(RTVIEvent.UserTranscript, (data) => {
  if (!data.final) return; // Only use final transcripts
  console.log("User:", data.text);
});

// Agent's response (arrives in TTS chunks — see aggregation below)
client.on(RTVIEvent.BotTtsText, (data) => {
  console.log("Agent:", data.text);
});

// Connection lifecycle
client.on(RTVIEvent.Connected, () => console.log("Connected"));
client.on(RTVIEvent.Disconnected, () => console.log("Disconnected"));
client.on(RTVIEvent.BotReady, () => console.log("Agent ready to talk"));

// Errors
client.on(RTVIEvent.Error, (message) => {
  console.error("Error:", message?.data?.message || message);
});

End the call

await client.disconnect();

Microphone controls

// Mute
client.enableMic(false);

// Unmute
client.enableMic(true);

Aggregating bot responses

RTVIEvent.BotTtsText fires once per TTS chunk, not once per complete response. To build full agent messages, aggregate chunks until the next user turn:
let currentBotMessage = "";

client.on(RTVIEvent.BotTtsText, (data) => {
  currentBotMessage += (currentBotMessage ? " " : "") + data.text;
  updateLastAgentMessage(currentBotMessage);
});

client.on(RTVIEvent.UserTranscript, (data) => {
  if (!data.final) return;
  currentBotMessage = ""; // Reset for next agent turn
  addUserMessage(data.text);
});

Complete HTML example

A self-contained page you can use to test your integration:
<!DOCTYPE html>
<html>
<head><title>Voice Agent Test</title></head>
<body>
  <div id="status">Ready</div>
  <div id="transcript"></div>
  <button id="startBtn">Start Call</button>
  <button id="endBtn" disabled>End Call</button>

  <script type="module">
    import { PipecatClient, RTVIEvent } from "@pipecat-ai/client-js";
    import { DailyTransport } from "@pipecat-ai/daily-transport";

    const statusEl = document.getElementById("status");
    const transcriptEl = document.getElementById("transcript");
    const startBtn = document.getElementById("startBtn");
    const endBtn = document.getElementById("endBtn");

    function handleBotAudio(track, participant) {
      if (participant.local || track.kind !== "audio") return;
      const audio = document.createElement("audio");
      audio.srcObject = new MediaStream([track]);
      audio.play();
    }

    const client = new PipecatClient({
      transport: new DailyTransport(),
      enableMic: true,
      enableCam: false,
      callbacks: { onTrackStarted: handleBotAudio },
    });

    let currentBotMsg = "";

    function addMessage(role, text) {
      const div = document.createElement("div");
      div.textContent = `${role}: ${text}`;
      transcriptEl.appendChild(div);
    }

    client.on(RTVIEvent.UserTranscript, (data) => {
      if (!data.final) return;
      currentBotMsg = "";
      addMessage("You", data.text);
    });

    client.on(RTVIEvent.BotTtsText, (data) => {
      currentBotMsg += (currentBotMsg ? " " : "") + data.text;
      // Update last agent message or add new one
      const lastDiv = transcriptEl.lastElementChild;
      if (lastDiv && lastDiv.textContent.startsWith("Agent:")) {
        lastDiv.textContent = "Agent: " + currentBotMsg;
      } else {
        addMessage("Agent", currentBotMsg);
      }
    });

    client.on(RTVIEvent.Connected, () => {
      statusEl.textContent = "Connected";
      startBtn.disabled = true;
      endBtn.disabled = false;
    });

    client.on(RTVIEvent.Disconnected, () => {
      statusEl.textContent = "Disconnected";
      startBtn.disabled = false;
      endBtn.disabled = true;
    });

    client.on(RTVIEvent.Error, (msg) => {
      statusEl.textContent = "Error: " + (msg?.data?.message || "Unknown");
    });

    startBtn.addEventListener("click", async () => {
      statusEl.textContent = "Connecting...";
      startBtn.disabled = true;

      try {
        // Replace with your backend endpoint
        const res = await fetch("/api/voice-session", { method: "POST" });
        if (!res.ok) throw new Error(`HTTP ${res.status}`);
        const session = await res.json();

        await client.connect({
          url: session.room_url,
          token: session.token,
        });
      } catch (err) {
        statusEl.textContent = "Error: " + err.message;
        startBtn.disabled = false;
      }
    });

    endBtn.addEventListener("click", async () => {
      statusEl.textContent = "Disconnecting...";
      await client.disconnect();
    });
  </script>
</body>
</html>

Connect with React

The @pipecat-ai/client-react package provides PipecatClientProvider and PipecatClientAudio — a provider for context and a component that automatically handles bot audio playback (replacing the manual onTrackStarted approach).
import { useState, useEffect, useCallback } from "react";
import { PipecatClient, RTVIEvent } from "@pipecat-ai/client-js";
import { DailyTransport } from "@pipecat-ai/daily-transport";
import {
  PipecatClientProvider,
  PipecatClientAudio,
} from "@pipecat-ai/client-react";

interface TranscriptEntry {
  role: "user" | "agent";
  content: string;
}

function VoiceChat({ agentName }: { agentName: string }) {
  const [client, setClient] = useState<PipecatClient | null>(null);
  const [status, setStatus] = useState<
    "idle" | "connecting" | "connected" | "error"
  >("idle");
  const [transcript, setTranscript] = useState<TranscriptEntry[]>([]);
  const [error, setError] = useState<string | null>(null);
  const [isMuted, setIsMuted] = useState(false);

  // Create client once
  useEffect(() => {
    const pc = new PipecatClient({
      transport: new DailyTransport(),
      enableMic: true,
      enableCam: false,
    });
    setClient(pc);
    return () => { pc.disconnect().catch(() => {}); };
  }, []);

  // Subscribe to events
  useEffect(() => {
    if (!client) return;

    const onUserTranscript = (data: any) => {
      if (!data.final) return;
      setTranscript((prev) => [
        ...prev,
        { role: "user", content: data.text },
      ]);
    };

    const onBotTtsText = (data: any) => {
      setTranscript((prev) => {
        const last = prev[prev.length - 1];
        if (last?.role === "agent") {
          return [
            ...prev.slice(0, -1),
            { ...last, content: last.content + " " + data.text },
          ];
        }
        return [...prev, { role: "agent", content: data.text }];
      });
    };

    const onConnected = () => setStatus("connected");
    const onDisconnected = () => { setStatus("idle"); setIsMuted(false); };
    const onError = (msg: any) => {
      setError(msg?.data?.message || msg?.data?.error || "Connection error");
      setStatus("error");
    };

    client.on(RTVIEvent.UserTranscript, onUserTranscript);
    client.on(RTVIEvent.BotTtsText, onBotTtsText);
    client.on(RTVIEvent.Connected, onConnected);
    client.on(RTVIEvent.Disconnected, onDisconnected);
    client.on(RTVIEvent.Error, onError);

    return () => {
      client.off(RTVIEvent.UserTranscript, onUserTranscript);
      client.off(RTVIEvent.BotTtsText, onBotTtsText);
      client.off(RTVIEvent.Connected, onConnected);
      client.off(RTVIEvent.Disconnected, onDisconnected);
      client.off(RTVIEvent.Error, onError);
    };
  }, [client]);

  const connect = useCallback(async () => {
    if (!client || status === "connecting") return;
    setError(null);
    setStatus("connecting");
    setTranscript([]);

    try {
      // Call YOUR backend, which proxies to Vocobase API.
      // Forward the agentName prop so the backend can pass it along.
      const res = await fetch("/api/voice-session", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ agent_name: agentName }),
      });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      const session = await res.json();

      await client.connect({
        url: session.room_url,
        token: session.token,
      });
    } catch (err) {
      setError(err instanceof Error ? err.message : "Failed to connect");
      setStatus("error");
    }
  }, [client, status, agentName]);

  const disconnect = useCallback(async () => {
    if (!client) return;
    await client.disconnect();
  }, [client]);

  const toggleMute = useCallback(() => {
    if (!client) return;
    client.enableMic(isMuted);
    setIsMuted(!isMuted);
  }, [client, isMuted]);

  if (!client) return null;

  return (
    <PipecatClientProvider client={client}>
      <div>
        <p>Status: {status}</p>
        {error && <p style={{ color: "red" }}>{error}</p>}

        {transcript.map((entry, i) => (
          <div key={i}>
            <strong>{entry.role === "user" ? "You" : "Agent"}:</strong>{" "}
            {entry.content}
          </div>
        ))}

        {status === "connected" ? (
          <>
            <button onClick={toggleMute}>
              {isMuted ? "Unmute" : "Mute"}
            </button>
            <button onClick={disconnect}>End Call</button>
          </>
        ) : (
          <button
            onClick={connect}
            disabled={status === "connecting"}
          >
            {status === "connecting" ? "Connecting..." : "Start Call"}
          </button>
        )}

        {/* Handles bot audio playback automatically */}
        {status === "connected" && <PipecatClientAudio />}
      </div>
    </PipecatClientProvider>
  );
}

export default function App() {
  return <VoiceChat agentName="my-agent" />;
}
PipecatClientAudio renders a hidden <audio> element that plays the agent's voice. Mount it when connected; it replaces the manual onTrackStarted callback used in the vanilla JS approach.

Events reference

| Event | Payload | Description |
| --- | --- | --- |
| RTVIEvent.Connected | — | WebRTC connection established |
| RTVIEvent.Disconnected | — | Connection closed |
| RTVIEvent.BotReady | — | Agent initialized and ready to talk |
| RTVIEvent.UserTranscript | { text, final } | User's speech transcribed. Only use entries where final is true. |
| RTVIEvent.BotTtsText | { text } | Agent's response text. Arrives in chunks — aggregate them per turn. |
| RTVIEvent.TransportStateChanged | state: string | Low-level transport state changes |
| RTVIEvent.Error | { data: { message, error } } | Connection or runtime error |

Backend proxy example

Your backend should proxy the session start call to keep the API key server-side.
const express = require("express");
const app = express();
app.use(express.json()); // parse JSON request bodies

app.post("/api/voice-session", async (req, res) => {
  try {
    const response = await fetch(
      "https://api.vocobase.com/api/sessions/start",
      {
        method: "POST",
        headers: {
          "Authorization": `Bearer ${process.env.VOCOBASE_API_KEY}`,
          "Content-Type": "application/json",
        },
        // Use the agent requested by the client, defaulting to "my-agent"
        body: JSON.stringify({ agent_name: req.body?.agent_name || "my-agent" }),
      }
    );

    if (!response.ok) {
      const err = await response.json();
      return res.status(response.status).json(err);
    }

    const session = await response.json();
    res.json(session);
  } catch (err) {
    res.status(500).json({ error: "Failed to start voice session" });
  }
});

Get session details

After a session ends, you can retrieve its transcript, duration, and credit usage.
curl -X GET https://api.vocobase.com/api/sessions/SESSION_ID \
  -H "Authorization: Bearer rg_live_abc123def456ghi789jkl012"
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "COMPLETED",
  "durationSecs": 45,
  "creditsUsed": 0.75,
  "transcript": {
    "messages": [
      { "role": "user", "content": "Hello" },
      { "role": "assistant", "content": "Hi! How can I help you today?" }
    ]
  },
  "startedAt": "2025-01-15T10:30:00Z",
  "endedAt": "2025-01-15T10:30:45Z"
}
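For display or logging, the transcript messages can be flattened into plain text. A minimal sketch over the response shape shown above; formatTranscript is an illustrative helper, not an SDK function:

```javascript
// Illustrative helper: flatten a session-details response (shape shown
// above) into "Role: text" lines for logs or a review UI.
function formatTranscript(session) {
  return session.transcript.messages
    .map((m) => `${m.role === "user" ? "User" : "Agent"}: ${m.content}`)
    .join("\n");
}
```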

Billing

Voice sessions are billed at 1 credit per 60 seconds, pro-rata by the second. Credits are deducted after the session ends. See Credits & Billing for full details.
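As a sketch of the rate above (1 credit per 60 seconds, pro-rata by the second), the helper below estimates a session's cost; the function is illustrative, not part of the SDK:

```javascript
// Illustrative helper: estimate credits for a session, assuming the
// documented rate of 1 credit per 60 seconds, pro-rata by the second.
function estimateCredits(durationSecs) {
  return durationSecs / 60;
}

// The 45-second sample session above costs 0.75 credits.
console.log(estimateCredits(45)); // 0.75
console.log(estimateCredits(90)); // 1.5
```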

Rate limits

| Limit | Value |
| --- | --- |
| Session starts | 10 per minute per API key |
| Concurrent sessions | Configurable per key (default: 5) |
Rate limit headers are included on all responses: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After (on 429). See Authentication for full rate limit details.
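When a session start returns 429, the Retry-After header tells you how long to wait. A hedged sketch of a retry wrapper; the function names are illustrative, and startSession is assumed to be your own wrapper around the backend proxy call:

```javascript
// Illustrative helper: pick a retry delay for a 429 response, preferring
// the Retry-After header and falling back to exponential backoff.
function retryDelaySecs(retryAfterHeader, attempt) {
  const headerSecs = Number(retryAfterHeader);
  if (Number.isFinite(headerSecs) && headerSecs > 0) return headerSecs;
  return 2 ** attempt; // 1s, 2s, 4s, ...
}

// Illustrative wrapper: retry a session start on 429, honoring the delay.
async function startSessionWithRetry(startSession, maxRetries = 3) {
  let res;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    res = await startSession();
    if (res.status !== 429) return res;
    const delay = retryDelaySecs(res.headers.get("Retry-After"), attempt);
    await new Promise((resolve) => setTimeout(resolve, delay * 1000));
  }
  return res; // still rate-limited after maxRetries
}
```

Because the concurrency limit also returns 429, retrying only helps once an existing session has ended; surface the error to the user if retries are exhausted.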

Troubleshooting

| Issue | Solution |
| --- | --- |
| No audio from agent | Vanilla JS: ensure onTrackStarted creates an <audio> element and calls .play(). React: ensure <PipecatClientAudio /> is mounted when connected. Browsers may block autoplay — always connect via a user gesture (button click). |
| Microphone not working | Check navigator.mediaDevices.getUserMedia permission. HTTPS is required in production. |
| Connection drops immediately | Verify room_url and token from the API are passed correctly to client.connect(). Tokens are single-use. |
| Multiple audio elements | In vanilla JS, track the <audio> element and remove the old one before creating a new one in onTrackStarted. |
| Camera prompt despite enableCam: false | The WebRTC transport layer may request camera access internally. Setting enableCam: false ensures the camera is never activated. |
HTTPS is required for microphone access in all browsers except localhost. Your production deployment must use HTTPS.
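Before attempting to connect, you can check that the page is running in a context where microphone capture is even possible. A small sketch; canUseMicrophone is illustrative and takes a window.location-like object:

```javascript
// Illustrative check: microphone capture requires a secure context,
// meaning HTTPS anywhere, or plain HTTP only on localhost in development.
function canUseMicrophone({ protocol, hostname }) {
  if (protocol === "https:") return true;
  return hostname === "localhost" || hostname === "127.0.0.1";
}
```

In the browser, call it as `canUseMicrophone(window.location)` and show a clear error instead of a failed connection attempt when it returns false.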

Next steps

Credits & Billing

Understand how voice sessions consume credits.

Authentication

API key format, rate limits, and error codes.

Webhook Payloads

Receive session.completed events with transcripts and duration.

Quick Start

Create an agent and make your first call.