Realtime API Overview
MiniCPM-o Realtime API protocol overview
MiniCPM-o exposes two real-time full-duplex conversation modes over WebSocket.
API Host
https://minicpmo45.modelbest.cn
Endpoint
wss://host/v1/realtime?mode={video|audio}
| Parameter | Required | Description |
|---|---|---|
| mode | No | video by default, or audio. It determines the session duration limit and the recommended input modality. |
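As a minimal sketch of how a client might assemble the endpoint URL, the helper below combines a host with the mode parameter. The function name build_realtime_url is hypothetical, and the sketch assumes that the host placeholder in the endpoint template is the hostname of the API Host listed above; neither detail is confirmed by this document.

```python
from urllib.parse import urlencode, urlsplit

# The two modes documented for the endpoint.
VALID_MODES = ("video", "audio")


def build_realtime_url(api_host: str, mode: str = "video") -> str:
    """Return the wss:// endpoint for a host and mode (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Accept either a bare hostname or a full URL such as https://example.com.
    parts = urlsplit(api_host if "://" in api_host else f"//{api_host}")
    if not parts.netloc:
        raise ValueError(f"could not extract a host from {api_host!r}")
    # Assumption: the "host" placeholder is the API Host's hostname.
    return f"wss://{parts.netloc}/v1/realtime?{urlencode({'mode': mode})}"
```

Calling build_realtime_url("https://minicpmo45.modelbest.cn", "audio") yields a URL of the form wss://host/v1/realtime?mode=audio. Omitting the second argument falls back to video, mirroring the documented default.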
session_id is generated by the server after the connection is established. The format is rt_{timestamp_ms} and the value is returned in the session.created event. Clients do not need to, and should not, pass session_id in the URL.
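To illustrate the rt_{timestamp_ms} format, here is a small sketch that validates a session_id string and recovers its timestamp. The helper name parse_session_id is hypothetical, and the sketch assumes that timestamp_ms is a Unix epoch value in milliseconds, which this document does not state. How the value is laid out inside the session.created event is also not shown here, so the helper takes the string directly.

```python
import re
from datetime import datetime, timezone

# Matches the documented format: "rt_" followed by a millisecond timestamp.
SESSION_ID_RE = re.compile(r"rt_(\d+)")


def parse_session_id(session_id: str) -> datetime:
    """Validate a session_id and return its creation time (hypothetical helper)."""
    match = SESSION_ID_RE.fullmatch(session_id)
    if match is None:
        raise ValueError(f"not a valid session_id: {session_id!r}")
    timestamp_ms = int(match.group(1))
    # Assumption: timestamp_ms counts milliseconds since the Unix epoch.
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
```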
Modes
| Mode | Endpoint Example | Upstream Data | Session Duration | Effective Conversation |
|---|---|---|---|---|
| Video Full-Duplex | wss://host/v1/realtime?mode=video | Audio + video frames | 5 minutes | ~90 seconds |
| Audio Full-Duplex | wss://host/v1/realtime?mode=audio | Audio only | 10 minutes | ~8 minutes |
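A client may want to track how much of the session duration limit is left. The sketch below encodes the limits from the table and computes a client-side estimate. The names SESSION_LIMIT_SECONDS and seconds_remaining are hypothetical, and the document does not describe how the server enforces the limit or signals expiry, so treat the result as an approximation only.

```python
import time
from typing import Optional

# Session duration limits from the table above, in seconds.
SESSION_LIMIT_SECONDS = {"video": 5 * 60, "audio": 10 * 60}


def seconds_remaining(mode: str, started_at: float, now: Optional[float] = None) -> float:
    """Estimate the time left in a session (hypothetical client-side helper).

    started_at and now are readings of time.monotonic(); now defaults to
    the current reading.
    """
    if mode not in SESSION_LIMIT_SECONDS:
        raise ValueError(f"unknown mode: {mode!r}")
    if now is None:
        now = time.monotonic()
    elapsed = now - started_at
    # Clamp at zero once the limit has passed.
    return max(0.0, SESSION_LIMIT_SECONDS[mode] - elapsed)
```

A client could record time.monotonic() when session.created arrives and poll seconds_remaining to warn the user before the limit is reached.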
Both modes share the same event naming and message structure. At the message level, the only difference is the recommended input modality:
- Video Full-Duplex: input_audio_buffer.append should usually include video_frames.
- Audio Full-Duplex: input_audio_buffer.append should not include video_frames; behavior is undefined if it does.
The mode cannot be changed during a session.
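The per-mode rules for input_audio_buffer.append can be checked on the client before a message is sent. The following is a sketch under stated assumptions: check_append_payload is a hypothetical helper, the payload is modeled as a plain dict, and only the video_frames field comes from this document. The full message layout is defined in the protocol documents, not here.

```python
from typing import List


def check_append_payload(mode: str, payload: dict) -> List[str]:
    """Return warnings for an input_audio_buffer.append payload (hypothetical helper)."""
    if mode not in ("video", "audio"):
        raise ValueError(f"unknown mode: {mode!r}")
    warnings = []
    if mode == "audio" and "video_frames" in payload:
        # Audio mode: behavior is undefined if video_frames is included.
        warnings.append("audio mode: remove video_frames, behavior is undefined")
    if mode == "video" and not payload.get("video_frames"):
        # Video mode: frames are recommended ("should usually include"), not required.
        warnings.append("video mode: video_frames is usually expected")
    return warnings
```

An empty list means the payload follows the recommendation for its mode. Because the mode is fixed for the lifetime of a session, a client can bind the mode once at connection time and reuse it for every check.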
Protocol Documents
- Video Full-Duplex Protocol — full-duplex conversation with video frames.
- Audio Full-Duplex Protocol — audio-only full-duplex conversation.
- JSON Schema — machine-readable message schema.
Example Code
For complete client implementations, see the full-duplex demo pages in this repository. They use the Realtime API protocol directly:
| Page | Description |
|---|---|
| static/omni/ | Video Full-Duplex: real-time audio/video conversation |
| static/audio-duplex/ | Audio Full-Duplex: real-time audio-only conversation |
Core client wrapper: static/duplex/lib/realtime-session.js
Repository: https://github.com/OpenBMB/MiniCPM-o-Demo/tree/realtime-protocol